This pipeline identifies regions of significant protein binding ("peaks") in ChIP-seq data mapped to a reference genome. A 12-hour minimum ($876 internal, $1116 external) applies per project.
1. Quality Assessment
Data quality is assessed with FastQC, and the reports are aggregated with MultiQC. The quality assessment results are evaluated before downstream analysis.
- Deliverables:
- Reports generated by FastQC and MultiQC.
- Tools Used:
- FastQC: (Andrews 2010) used to generate quality summaries of data:
- Per base sequence quality report: useful for deciding whether trimming is necessary.
- Sequence duplication levels: evaluation of library complexity.
- Overrepresented sequences: evaluation of adapter contamination.
- MultiQC: (https://multiqc.info/) used to aggregate FastQC, alignment, and other reports.
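As an illustration of how the FastQC results might be screened before downstream analysis, the sketch below reads the tab-separated summary.txt that FastQC writes for each input file (status, module name, file name) and lists the modules that did not pass. The function name and the sample file name are hypothetical, and this is not the in-house evaluation procedure.

```python
from collections import defaultdict


def flag_fastqc_modules(summary_text, statuses=("WARN", "FAIL")):
    """Group non-passing FastQC modules by input file.

    Assumes the layout of FastQC's summary.txt:
    STATUS <tab> module name <tab> input file name
    """
    flagged = defaultdict(list)
    for line in summary_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, filename = line.split("\t", 2)
        if status in statuses:
            flagged[filename].append((module, status))
    return dict(flagged)


# Hypothetical summary.txt content for a single sample.
example = "\n".join([
    "PASS\tBasic Statistics\tchip_rep1.fastq.gz",
    "FAIL\tPer base sequence quality\tchip_rep1.fastq.gz",
    "WARN\tSequence Duplication Levels\tchip_rep1.fastq.gz",
    "PASS\tAdapter Content\tchip_rep1.fastq.gz",
])

for filename, modules in flag_fastqc_modules(example).items():
    for module, status in modules:
        print(f"{filename}: {module} -> {status}")
```

A failing "Per base sequence quality" module would point toward trimming, while a duplication warning is worth reading alongside library complexity, as described in the list above.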
2. Mapping
Reads are mapped to the reference genome using BWA.
- Deliverables:
- Mapping results as BAM files, together with mapping statistics.
- Tools Used:
- BWA: (Li 2013) primary aligner used to generate read alignments.
- Samtools: (Li 2009) used to prepare BAMs and generate mapping statistics.
- In-house statistics generation scripts
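To give a sense of what the mapping statistics contain, here is a minimal sketch that parses the text output of `samtools flagstat` and derives the mapped and duplicate fractions. It is not the in-house script; the function name and the example numbers are made up for illustration.

```python
import re

# Matches flagstat lines such as "950 + 0 mapped (95.00% : N/A)".
# Group 1: QC-passed count, group 2: QC-failed count, group 3: category label.
_FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \(.*)?$")


def parse_flagstat(text):
    """Return {category label: QC-passed count} from samtools flagstat text."""
    counts = {}
    for line in text.splitlines():
        match = _FLAGSTAT_LINE.match(line.strip())
        if match:
            counts[match.group(3)] = int(match.group(1))
    return counts


# Hypothetical flagstat output (abbreviated).
example = """1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
50 + 0 duplicates
950 + 0 mapped (95.00% : N/A)
"""

counts = parse_flagstat(example)
# flagstat counts alignment records, so secondary and supplementary
# alignments (zero in this example) are included in both totals.
mapped_fraction = counts["mapped"] / counts["in total"]
duplicate_fraction = counts["duplicates"] / counts["in total"]
print(f"mapped: {mapped_fraction:.1%}, duplicates: {duplicate_fraction:.1%}")
```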
4. Peak Calling
Normalized ChIP-seq read counts are compared against a background control (input or mock ChIP) to identify regions of binding enrichment.
- Deliverables:
- Peak calls as narrowPeak (BED6+4) files, containing p-value, q-value, and fold enrichment scores for each peak.
- Per-base normalized signal files as bigWigs.
- Tools Used:
- MACS2: (Zhang 2008) used to identify and score peak regions.
- bedtools: (Quinlan 2010) used for optional blacklist filtering.
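For readers who want to work with the narrowPeak deliverable directly, the following sketch parses one line of the format. It assumes the standard ten-column layout, in which column 7 holds the signal value (fold enrichment in MACS2 output), columns 8 and 9 hold -log10 p-value and -log10 q-value, and column 10 holds the summit offset from the peak start. The class name, field names, and example line are illustrative only.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int            # 0-based start, as in BED
    end: int
    name: str
    fold_enrichment: float
    neg_log10_p: float
    neg_log10_q: float
    summit_offset: int    # summit position relative to start


def parse_narrowpeak_line(line):
    """Parse one tab-separated narrowPeak line into a Peak."""
    fields = line.rstrip("\n").split("\t")
    return Peak(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        name=fields[3],
        fold_enrichment=float(fields[6]),
        neg_log10_p=float(fields[7]),
        neg_log10_q=float(fields[8]),
        summit_offset=int(fields[9]),
    )


# Hypothetical peak record.
example_line = "chr1\t1000\t1500\tsample_peak_1\t250\t.\t8.5\t30.2\t25.0\t210"
peak = parse_narrowpeak_line(example_line)

q_value = 10 ** (-peak.neg_log10_q)       # convert back to a plain q-value
summit = peak.start + peak.summit_offset  # genomic coordinate of the summit
print(peak.chrom, summit, peak.fold_enrichment, q_value)
```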
5. Significance Threshold Analysis
Statistical analysis and informed heuristics are used to determine appropriate significance threshold(s) for selecting peaks for downstream analysis.
- Deliverables:
- Summary file outlining peak counts at selected stringency levels (High, Medium, and Low).
- Master file containing peak counts over a wide range of q-values and fold enrichment values.
- Plots of peak count vs. q-value and fold enrichment.
- Tools Used:
- R and in-house scripts used to produce peak count statistics and plots.
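The idea behind the master file can be sketched as a grid of peak counts over q-value and fold enrichment cutoffs. The example below uses Python rather than the R scripts named above, and the cutoffs and peak scores are invented for illustration; they are not the thresholds used to define the High, Medium, and Low stringency levels.

```python
import math


def count_peaks(peaks, max_q, min_fold):
    """Count peaks with q-value <= max_q and fold enrichment >= min_fold.

    `peaks` is a list of (fold_enrichment, neg_log10_q) pairs, matching
    the scores stored in narrowPeak columns 7 and 9.
    """
    min_neg_log10_q = -math.log10(max_q)
    return sum(
        1
        for fold, neg_log10_q in peaks
        if neg_log10_q >= min_neg_log10_q and fold >= min_fold
    )


def threshold_table(peaks, q_values, fold_values):
    """Return {(q cutoff, fold cutoff): peak count} for every combination."""
    return {
        (q, fold): count_peaks(peaks, q, fold)
        for q in q_values
        for fold in fold_values
    }


# Hypothetical (fold enrichment, -log10 q-value) scores for five peaks.
peaks = [(2.1, 1.5), (4.0, 3.2), (8.5, 25.0), (12.0, 60.0), (3.0, 2.5)]

table = threshold_table(peaks, q_values=[0.05, 0.01, 1e-5], fold_values=[2, 5])
for (q, fold), n in table.items():
    print(f"q <= {q:g}, fold >= {fold}: {n} peaks")
```

Plotting these counts against the cutoffs shows how quickly the peak set shrinks as stringency increases, which is the kind of information that informs the choice of threshold.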