DNA Binding: ChIP-seq
- Pre-alignment quality assessment:
  - Per-base sequence quality
  - Per-base sequence content
  - Per-base GC content
  - Search for overrepresented sequences (adapters, primers, etc.)
- Alignment to a reference genome using Bowtie:
  - Homo sapiens
  - Mus musculus
  - Rattus norvegicus
  - Bos taurus
  - Canis familiaris
  - Gallus gallus
  - Drosophila melanogaster
  - Arabidopsis thaliana
  - Caenorhabditis elegans
  - Saccharomyces cerevisiae
- Peak calling using MACS (Model-based Analysis of ChIP-Seq)
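The pre-alignment checks listed above (per-base quality, per-base sequence and GC content, overrepresented sequences) are the kind of summaries reported by QC tools such as FastQC. As a minimal sketch only, assuming unwrapped four-line FASTQ records with Phred+33 quality encoding, they could be computed as follows. The function names and the 0.1% overrepresentation cutoff are illustrative assumptions, not the core's actual pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def read_fastq(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse unwrapped 4-line FASTQ records into (sequence, quality) pairs."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(lines) % 4:
        raise ValueError("FASTQ line count is not a multiple of 4")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        records.append((seq, qual))
    return records


def per_base_stats(records, phred_offset: int = 33):
    """Per-position mean Phred quality, A/C/G/T fractions, and GC fraction.

    Positions past the end of a shorter read are not counted for that read.
    Ambiguous bases (e.g. N) count toward the total but not toward A/C/G/T.
    """
    max_len = max(len(seq) for seq, _ in records)
    qual_sum = [0] * max_len
    n_at_pos = [0] * max_len
    base_counts = [Counter() for _ in range(max_len)]
    for seq, qual in records:
        for pos, (base, q) in enumerate(zip(seq, qual)):
            qual_sum[pos] += ord(q) - phred_offset  # ASCII char -> Phred score
            base_counts[pos][base.upper()] += 1
            n_at_pos[pos] += 1
    mean_q = [s / n for s, n in zip(qual_sum, n_at_pos)]
    content: List[Dict[str, float]] = [
        {b: c[b] / n for b in "ACGT"} for c, n in zip(base_counts, n_at_pos)
    ]
    gc_frac = [f["G"] + f["C"] for f in content]
    return mean_q, content, gc_frac


def overrepresented(records, min_fraction: float = 0.001):
    """Whole-read sequences making up more than min_fraction of all reads."""
    counts = Counter(seq for seq, _ in records)
    total = len(records)
    return [(s, n / total) for s, n in counts.most_common() if n / total > min_fraction]


# Tiny illustrative input: 'I' = Q40, '5' = Q20, '#' = Q2 in Phred+33.
example = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCA", "+", "II5#"]
records = read_fastq(example)
mean_q, content, gc = per_base_stats(records)
```

A falling `mean_q` toward the 3' end, or a `gc` profile that drifts away from the genome's expected GC content, are the typical signals these checks look for; real QC tools add plots and pass/warn/fail thresholds on top of the same per-position tallies.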
The core uses BWA for alignment and MACS for peak calling.
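MACS models read counts in candidate windows with a Poisson distribution and, instead of relying on a single genome-wide rate, uses a local background rate taken as the largest of several estimates (genome-wide and control-derived). The sketch below illustrates only that scoring idea; it is a simplification, not MACS's implementation, and the function names, window size, genome size, and control rates are illustrative assumptions.

```python
import math
from typing import Sequence


def poisson_sf(k: int, lam: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0

    def pmf(i: int) -> float:
        # Evaluated in log space so large counts do not overflow.
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k <= lam:
        # The tail is large here, so 1 - CDF loses no meaningful precision.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))

    # k > lam: terms shrink from i = k onward, so sum the tail directly
    # to keep tiny p-values instead of rounding 1 - CDF to zero.
    total, term, i = 0.0, pmf(k), k
    while term > 0.0 and term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)


def genome_background(total_reads: int, window_bp: int, genome_bp: int) -> float:
    """Expected reads per window if reads were spread uniformly (hypothetical helper)."""
    return total_reads * window_bp / genome_bp


def local_lambda(lambda_bg: float, control_lambdas: Sequence[float]) -> float:
    """Dynamic background in the spirit of MACS: the largest of the genome-wide
    rate and control-derived rates around the window (assumed already scaled
    to the ChIP sample's sequencing depth)."""
    return max([lambda_bg, *control_lambdas])


# Illustrative numbers only: 20M reads, 200 bp windows, 3 Gb genome.
lam_bg = genome_background(20_000_000, 200, 3_000_000_000)
lam = local_lambda(lam_bg, [3.0, 2.5])  # made-up control-derived rates
p = poisson_sf(25, lam)                 # 25 ChIP reads observed in the window
score = -math.log10(p)
```

Taking the maximum of the background estimates is the conservative choice: a region that is also elevated in the control (for example, open chromatin or a copy-number gain) raises the local rate and so lowers the enrichment score.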
Q: How much sequence do I need?
ChIP-seq, like RNA-seq, is a counting application: coverage is measured in number of reads rather than in gigabases, as it is for genome sequencing. Whereas RNA-seq standards call for ~30 million reads per sample, the appropriate number of reads for ChIP-seq varies much more widely and depends on the DNA-binding factor being analyzed. Transcription factors with few binding sites need far fewer reads for high resolution (e.g., 5 million) than ubiquitous DNA-binding proteins such as histones (e.g., 30 million). Often the binding profile of a factor is unknown before the ChIP-seq experiment is performed. In these cases, the ENCODE consortium recommends sequencing at least 10 million uniquely mapped reads per sample for mammalian experiments, or at least 2 million uniquely mapped reads per sample for worm or fly; 80% of these reads should be distinct (nonidentical).
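The ENCODE guideline above reduces to two checks on a sample's alignment statistics: enough uniquely mapped reads, and a high enough fraction of them distinct. The sketch below is a minimal illustration, assuming you already have uniquely mapped and distinct read counts from your alignment. The dictionary keys and function names are hypothetical, and `expected_unique_rate` is a planning assumption you would supply yourself, not an ENCODE figure.

```python
import math
from typing import List

# Thresholds quoted in the text above for ChIP-seq experiments whose binding
# profile is unknown in advance. The keys are hypothetical labels.
MIN_UNIQUE_READS = {"mammal": 10_000_000, "worm": 2_000_000, "fly": 2_000_000}
MIN_DISTINCT_FRACTION = 0.80


def check_depth(organism: str, uniquely_mapped: int, distinct: int) -> List[str]:
    """Return a list of problems; an empty list means both thresholds are met."""
    problems = []
    needed = MIN_UNIQUE_READS[organism]
    if uniquely_mapped < needed:
        problems.append(f"{uniquely_mapped:,} uniquely mapped reads; need {needed:,}")
    # Distinct fraction = distinct reads / uniquely mapped reads.
    if uniquely_mapped == 0 or distinct / uniquely_mapped < MIN_DISTINCT_FRACTION:
        problems.append("fewer than 80% of uniquely mapped reads are distinct")
    return problems


def reads_to_sequence(organism: str, expected_unique_rate: float) -> int:
    """Raw reads to order so that, at an assumed unique-mapping rate,
    the uniquely mapped count reaches the minimum for the organism."""
    if not 0 < expected_unique_rate <= 1:
        raise ValueError("expected_unique_rate must be in (0, 1]")
    return math.ceil(MIN_UNIQUE_READS[organism] / expected_unique_rate)
```

For example, `check_depth("mammal", 12_000_000, 9_000_000)` passes the depth check but flags the distinct-read fraction (75%), which usually points to a low-complexity library rather than too little sequencing; adding more reads will not fix it.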
Q: Which sequencing platform should I use?
See Table 1, "Comparison of Sequencing Technologies," in this paper from the January 2012 issue of Nature Reviews Genetics, which compares read length, insert size, run time, reads per run, error rate, and relative cost per nucleotide for several common sequencing platforms, including Roche 454, Illumina GAIIx, Illumina HiSeq 2000, Illumina MiSeq, PacBio, and Ion Torrent. If you would like to schedule a joint meeting with directors from both the Bioinformatics Core and the DNA Sciences (Sequencing) Core, please email us and indicate this on your consultation request form.
Q: How much will this cost me?
Bioinformatics costs are separate from sequencing costs and are the same whether your data are generated here at the DNA Sciences Core or elsewhere by an external vendor. Read the About page, and submit a consultation request form.