This pipeline uses an annotated genome to identify differentially expressed genes/transcripts. 15 hour minimum ($1470 internal, $1860 external) per project.
1. Quality Assessment
...
- Deliverables:
- Trimmed/filtered fastq files.
- Tools Used:
- Fastx-toolkit: Used to preprocess fastq files.
- Fastq quality trimmer: Trimming reads based on quality.
- Fastq quality filter: Filtering reads based on quality.
- Cutadapt: Used to remove adapter sequences from reads.
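The quality trimming and filtering steps listed above can be pictured with a minimal Python sketch. It is an illustration only, not the Fastx-toolkit or Cutadapt implementation: the function names, the Phred+33 encoding, and the thresholds (`min_quality`, `min_length`, `min_fraction`) are all assumptions chosen for the example.

```python
# Illustrative sketch only: NOT the Fastx-toolkit or Cutadapt code.
# Assumes Phred+33 quality encoding; all thresholds are hypothetical.

def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_quality=20):
    """Drop bases from the 3' end until one reaches min_quality."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], quality[:end]


def passes_filter(quality, min_quality=20, min_fraction=0.8):
    """Keep a read if at least min_fraction of its bases reach min_quality."""
    scores = phred_scores(quality)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_quality)
    return good / len(scores) >= min_fraction


def preprocess(records, min_quality=20, min_length=20, min_fraction=0.8):
    """Trim each (name, seq, quality) record, then filter by length and quality."""
    kept = []
    for name, seq, quality in records:
        seq, quality = trim_3prime(seq, quality, min_quality)
        if len(seq) < min_length:
            continue  # too short after trimming
        if not passes_filter(quality, min_quality, min_fraction):
            continue  # too many low-quality bases remain
        kept.append((name, seq, quality))
    return kept
```

For example, `preprocess([("r1", "ACGTACGT", "IIIIII##")], min_length=4)` trims the two low-quality trailing bases and keeps the read as `("r1", "ACGTAC", "IIIIII")`.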
3. Mapping
Mapping to a transcriptome reference is performed using the Kallisto pseudoaligner, or mapping to a genome reference is performed using BWA-mem or HISAT2.
- Deliverables:
- Mapping results as BAM files (when mapped using HISAT2) and mapping statistics.
- Tools Used:
- Kallisto: (Bray 2016) pseudoaligner and RNA-Seq quantification tool.
- BWA-mem: (Li 2013) primary aligner used to generate read alignments.
- HISAT2: (Kim 2015) aligner used to generate read alignments in a splice-aware manner and identify novel junctions.
- Samtools: (Li 2009) used to generate mapping statistics.
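To make "mapping statistics" concrete, the sketch below derives a mapped-read fraction from the FLAG column of SAM-formatted text. It is a simplified illustration and not how Samtools computes its reports: the function name `mapping_stats` is hypothetical, and the choice to count only primary records (skipping secondary and supplementary alignments) is an assumption made for this example.

```python
# Illustrative sketch only: a simplified mapped-read summary from SAM text.
# Not the Samtools implementation; counting only primary records is an
# assumption made for this example.

FLAG_UNMAPPED = 0x4         # read is unmapped
FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment


def mapping_stats(sam_lines):
    """Count primary records and how many of them are mapped."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & FLAG_UNMAPPED:
            mapped += 1
    fraction = mapped / total if total else 0.0
    return {"total": total, "mapped": mapped, "mapped_fraction": fraction}
```

In practice the SAM text would come from a BAM file converted with a tool such as `samtools view -h`; here any iterable of lines works.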
4. Gene/Transcript Counting
Counting the number of reads mapping to annotated intervals to obtain abundance of genes/transcripts.
- Deliverables:
- Raw gene/transcript counts
- Variance stabilized gene/transcript counts
- Tools Used:
- Kallisto: (Bray 2016) pseudoaligner and RNA-Seq quantification tool
- HTSeq-count: (Anders 2014) used to count reads overlapping gene intervals.
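The idea of counting reads that overlap annotated gene intervals can be sketched in a few lines of Python. This is a simplified illustration, not the HTSeq-count implementation: it ignores strand and spliced alignments, uses half-open coordinates, and assigns a read to a gene only when exactly one gene overlaps it. The function name `count_reads` and the tuple layouts are assumptions; the `__no_feature` and `__ambiguous` labels mirror the special counters that HTSeq-count reports.

```python
# Illustrative sketch only: NOT the HTSeq-count implementation.
# Assumes half-open [start, end) coordinates, ignores strand and splicing.
from collections import Counter


def count_reads(genes, reads):
    """Count reads per gene.

    genes: list of (gene_id, chrom, start, end)
    reads: list of (chrom, start, end)
    """
    counts = Counter({gene_id: 0 for gene_id, _, _, _ in genes})
    for chrom, r_start, r_end in reads:
        hits = {
            gene_id
            for gene_id, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and r_start < g_end and g_start < r_end
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1      # unambiguous assignment
        elif not hits:
            counts["__no_feature"] += 1  # overlaps no annotated gene
        else:
            counts["__ambiguous"] += 1   # overlaps more than one gene
    return dict(counts)
```

A linear scan over all genes for every read is fine for a sketch; a real counter would use an interval index to avoid the quadratic cost.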
5. DEG Identification
Normalization and statistical testing to identify differentially expressed genes.
- Deliverables:
- DEG Summary and master file containing fold changes and p values for every gene, MA Plots.
- Tools Used:
- DESeq2: (Love 2014) used to perform normalization and test for differential expression using the negative binomial distribution.
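The normalization step can be illustrated with the median-of-ratios idea that DESeq2 uses for size factors. The Python sketch below is a standalone illustration under stated assumptions, not the DESeq2 code: the function names and the input layout (a dict mapping each gene to a list of raw counts, one per sample) are hypothetical, and genes with a zero count in any sample are simply skipped when estimating the factors.

```python
# Illustrative sketch only: median-of-ratios size factors in plain Python.
# Not the DESeq2 implementation; the input layout is an assumption.
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample.

    counts: dict mapping gene -> list of raw counts (one entry per sample).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if any(v <= 0 for v in values):
            continue  # geometric mean is undefined with a zero count
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide each raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return {
        gene: [v / f for v, f in zip(values, factors)]
        for gene, values in counts.items()
    }
```

If one sample is sequenced twice as deeply as another, its size factor comes out larger by that ratio, so the normalized counts of unchanged genes line up across samples.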
6. Visualizations
Standard visualizations of the RNA-Seq data using in-house R scripts.
- Deliverables:
...
...
- Sample dendrogram
- Sample-Sample correlation plot
- Pair plot: Matrix of scatter plots showing relationship of every sample metadata variable to every other variable.
- Expression heatmap with clustering of samples
- Volcano plot: Scatter plot of fold-change versus significance
- Box plots of top 10 upregulated and top 10 downregulated genes.
- PCA plot: Orthogonal transformation of the data to look at underlying structure of data.
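As an illustration of one of the plots listed above, the volcano plot places each gene at its log2 fold change on the x axis and the negative log10 of its p value on the y axis. The Python sketch below only prepares those coordinates and a category label; it is not the in-house R script, and the function name, the cutoffs (`lfc_cutoff`, `p_cutoff`), and the `min_p` floor used to avoid taking the logarithm of zero are all assumptions.

```python
# Illustrative sketch only: computes volcano plot coordinates and labels.
# Not the pipeline's R code; the cutoffs and min_p floor are hypothetical.
import math


def volcano_points(results, lfc_cutoff=1.0, p_cutoff=0.05, min_p=1e-300):
    """Turn (gene, log2_fold_change, p_value) rows into plot points.

    Returns a list of (gene, x, y, label), where x is the log2 fold change,
    y is -log10(p value), and label is "up", "down", or "ns".
    """
    points = []
    for gene, log2_fc, p_value in results:
        y = -math.log10(max(p_value, min_p))  # floor avoids log10(0)
        if p_value < p_cutoff and log2_fc >= lfc_cutoff:
            label = "up"
        elif p_value < p_cutoff and log2_fc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"  # not significant at the chosen cutoffs
        points.append((gene, log2_fc, y, label))
    return points
```

The returned tuples can be passed to any plotting library, coloring points by their label.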