This RNA-Seq analysis pipeline uses an annotated genome to identify differentially expressed genes/transcripts. Projects carry a 15-hour minimum ($1470 internal, $1860 external). The pipeline consists of the following steps:

1. Quality Assessment

Data quality is assessed with FastQC, and the results are evaluated before any downstream analysis.

Deliverables:
- Reports generated by FastQC.
Tools Used:
- FastQC (Andrews 2010): used to generate quality summaries of the data, including:
  - Per-base sequence quality report: useful for deciding whether trimming is necessary.
  - Sequence duplication levels: evaluation of library complexity. Higher duplication levels may be expected for high-coverage RNA-Seq data.
  - Overrepresented sequences: evaluation of adapter contamination.
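
To make two of these summaries concrete, the Python sketch below computes the mean Phred score at each read position and a crude duplicate-read fraction. It is an illustration, not FastQC. It assumes Phred+33 (Sanger/Illumina 1.8+) quality encoding and plain four-line FASTQ records. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from plain 4-line FASTQ text (no wrapped sequences)."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def per_base_mean_quality(records: Iterable[Record], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, qual in records:
        for pos, ch in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset  # decode the Phred+33 character
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def duplicate_fraction(records: Iterable[Record]) -> float:
    """Fraction of reads whose exact sequence already occurred earlier in the input."""
    seq_counts = Counter(seq for _, seq, _ in records)
    total = sum(seq_counts.values())
    return (total - len(seq_counts)) / total if total else 0.0


fastq = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGT", "+", "II#5",
    "@r3", "TTGA", "+", "5555",
]
quality_by_position = per_base_mean_quality(read_fastq(fastq))
dup = duplicate_fraction(read_fastq(fastq))  # one of three reads repeats "ACGT"
```

FastQC's own duplication module is more involved (for example, it works on a subset of reads), so the fraction here is only a rough proxy for library complexity.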

2. Fastq Preprocessing

The quality assessment is used to decide whether the raw data require any preprocessing. If so, preprocessing is performed.

Deliverables:
- Trimmed/filtered FASTQ files.
Tools Used:
- FASTX-Toolkit: used to preprocess FASTQ files.
  - FASTQ Quality Trimmer: trims reads based on quality.
  - FASTQ Quality Filter: filters reads based on quality.
- Cutadapt: used to remove adapter sequences from reads.
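
The three preprocessing operations above can be sketched in Python as follows. This is a simplified illustration, not the FASTX-Toolkit or Cutadapt code. It assumes Phred+33 qualities. Adapter matching here is exact, whereas Cutadapt tolerates mismatches. The function names, thresholds and adapter sequence are hypothetical.

```python
from typing import List, Tuple


def phred(qual: str, offset: int = 33) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - offset for c in qual]


def trim_low_quality_end(seq: str, qual: str, threshold: int) -> Tuple[str, str]:
    """Remove bases from the 3' end while their quality is below `threshold`."""
    scores = phred(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]


def passes_quality_filter(qual: str, min_quality: int, min_percent: float) -> bool:
    """Keep a read only if at least `min_percent` % of its bases reach `min_quality`."""
    scores = phred(qual)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_quality)
    return 100.0 * good / len(scores) >= min_percent


def trim_adapter_exact(seq: str, qual: str, adapter: str,
                       min_overlap: int = 3) -> Tuple[str, str]:
    """Cut at the first exact adapter occurrence, or at the longest adapter
    prefix (>= min_overlap bases) that runs off the 3' end of the read."""
    full = seq.find(adapter)
    if full != -1:
        return seq[:full], qual[:full]
    for length in range(min(len(adapter), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:length]):
            cut = len(seq) - length
            return seq[:cut], qual[:cut]
    return seq, qual


# Example: quality-trim first, then remove a partial adapter, then filter.
read_seq, read_qual = "ACGTACGTAGATCGG", "IIIIIIIIIIIII##"
s, q = trim_low_quality_end(read_seq, read_qual, threshold=20)
s, q = trim_adapter_exact(s, q, "AGATCGGAAGAGC")
keep = passes_quality_filter(q, min_quality=20, min_percent=80)
```

Trimming before filtering means that a read is judged on the bases that survive, which is one common ordering. The actual order used in a given project depends on the quality assessment.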

3. Mapping

Reads are either pseudoaligned to a transcriptome reference using Kallisto or mapped to a genome reference using BWA-MEM or HISAT2.

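Deliverables:
- Mapping results as BAM files (when mapped using HISAT2) and mapping statistics.
Tools Used:
- Kallisto (Bray 2016): pseudoalignment to a transcriptome reference.
- BWA-MEM or HISAT2: alignment to a genome reference.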
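
The idea behind pseudoalignment can be illustrated with the toy Python sketch below. Each k-mer is mapped to the set of transcripts containing it, and a read is assigned the intersection of those sets. This is a deliberate simplification of what Kallisto does. The real tool builds a transcriptome de Bruijn graph, handles both strands and uses k = 31 by default. Here only the forward strand is indexed, k-mers missing from the index are skipped, and the transcripts are hypothetical.

```python
from functools import reduce
from typing import Dict, FrozenSet

KmerIndex = Dict[str, FrozenSet[str]]


def build_kmer_index(transcripts: Dict[str, str], k: int) -> KmerIndex:
    """Map every k-mer to the set of transcripts that contain it."""
    index: Dict[str, set] = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return {kmer: frozenset(names) for kmer, names in index.items()}


def pseudoalign(read: str, index: KmerIndex, k: int) -> FrozenSet[str]:
    """Transcripts compatible with every indexed k-mer of the read.

    K-mers missing from the index are skipped (a simplification of this sketch);
    an empty result means the read does not pseudoalign.
    """
    hits = [index[read[i:i + k]]
            for i in range(len(read) - k + 1)
            if read[i:i + k] in index]
    if not hits:
        return frozenset()
    return reduce(frozenset.intersection, hits)


# Toy transcripts (hypothetical) that share their first eight bases.
transcripts = {"tx1": "ACGTACGGTTCA", "tx2": "ACGTACGGAAGT"}
idx = build_kmer_index(transcripts, k=5)
shared = pseudoalign("CGTACGG", idx, k=5)  # compatible with both transcripts
unique = pseudoalign("TACGGTT", idx, k=5)  # compatible with tx1 only
```

A read compatible with several transcripts is not discarded. Kallisto resolves such reads statistically during quantification.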

4. Gene/Transcript Counting

Reads mapping to annotated intervals are counted to obtain the abundance of each gene/transcript.

Deliverables:
- Raw gene/transcript counts.
- Variance-stabilized gene/transcript counts.
Tools Used:
- Kallisto (Bray 2016): pseudoaligner and RNA-Seq quantification tool.
- HTSeq-count (Anders 2014): used to count reads overlapping gene intervals.
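
Interval-based counting in the spirit of HTSeq-count's default "union" overlap mode is sketched below in Python. A read counts toward a gene only if exactly one gene overlaps it. Reads overlapping no gene, or more than one gene, go to separate buckets. The sketch makes several assumptions. Intervals are half-open, strand is ignored, and each read is given as a list of aligned blocks on one chromosome. HTSeq-count itself reads SAM/BAM and GTF files and handles strandedness and alignment quality, none of which is modelled here. The example coordinates are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Exon = Tuple[str, int, int, str]          # (chromosome, start, end, gene_id), half-open
Read = Tuple[str, List[Tuple[int, int]]]  # (chromosome, aligned blocks), half-open


def genes_hit(read: Read, exons: Iterable[Exon]) -> Set[str]:
    """Union of gene IDs whose exons overlap any aligned block of the read."""
    chrom, blocks = read
    return {
        gene
        for exon_chrom, ex_start, ex_end, gene in exons
        if exon_chrom == chrom
        for start, end in blocks
        if start < ex_end and ex_start < end  # half-open interval overlap test
    }


def count_reads(reads: Iterable[Read], exons: List[Exon]) -> Dict[str, int]:
    counts: Counter = Counter()
    for read in reads:
        genes = genes_hit(read, exons)
        if len(genes) == 1:
            counts[next(iter(genes))] += 1  # unambiguous: count for that gene
        elif not genes:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1  # overlaps more than one gene
    return dict(counts)


exons = [("chr1", 100, 200, "geneA"), ("chr1", 300, 400, "geneA"),
         ("chr1", 350, 450, "geneB")]
reads = [
    ("chr1", [(150, 190)]),              # inside geneA
    ("chr1", [(180, 210), (310, 340)]),  # spliced read, both blocks in geneA
    ("chr1", [(360, 380)]),              # geneA and geneB overlap here
    ("chr1", [(500, 550)]),              # no annotated exon
    ("chr2", [(150, 190)]),              # different chromosome
]
counts = count_reads(reads, exons)
```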

5. DEG Identification

Normalization and statistical testing are performed to identify differentially expressed genes.
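
For reference, the negative binomial model that DESeq2 fits can be summarized as follows, following the notation of Love 2014. Here K_ij is the raw count of gene i in sample j, s_j is a per-sample size factor, α_i is a gene-wise dispersion, x_j is the design-matrix row for sample j, and β_i holds the gene's log2 fold-change coefficients. By default, each coefficient is tested with a Wald statistic W compared against a standard normal distribution.

```latex
\begin{aligned}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i), & \operatorname{Var}(K_{ij}) &= \mu_{ij} + \alpha_i \mu_{ij}^2, \\
\mu_{ij} &= s_j \, q_{ij}, & \log_2 q_{ij} &= \textstyle\sum_r x_{jr} \beta_{ir}, \\
W_{ir} &= \hat{\beta}_{ir} / \mathrm{SE}(\hat{\beta}_{ir}).
\end{aligned}
```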

Deliverables:
- DEG summary and a master file containing fold changes and p-values for every gene.
- MA plots.
Tools Used:
- DESeq2 (Love 2014): used to perform normalization and to test for differential expression with a negative binomial model.
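
The normalization step can be illustrated with the "median of ratios" size factors described for DESeq2. Each gene's geometric mean across samples serves as a pseudo-reference. A sample's size factor is the median ratio of its counts to that reference, taken over genes with no zero counts. The Python below is an illustrative re-implementation, not DESeq2 (which does this in R through `estimateSizeFactors`). The function names are hypothetical.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """counts[i][j] is the raw count of gene i in sample j."""
    n_samples = len(counts[0])
    # Genes with a zero in any sample have an undefined log geometric mean; skip them.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))  # median ratio on the log scale
    return factors


def normalize(counts: List[List[int]], factors: List[float]) -> List[List[float]]:
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# Sample 2 was sequenced about twice as deeply as sample 1 (hypothetical counts).
raw = [[10, 20], [20, 40], [30, 60], [0, 5]]
factors = size_factors(raw)
normalized = normalize(raw, factors)
```

Using the median makes the factors robust to a minority of genes that are truly differentially expressed, which is the reason this estimator is preferred over total-count scaling.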

6. Visualizations

Standard visualizations of the RNA-Seq data are generated using in-house R scripts.

Deliverables:
- Sample dendrogram.
- Sample-sample correlation plot.
- Pair plot: matrix of scatter plots showing the relationship of every sample metadata variable to every other variable.
- Expression heatmap with clustering of samples.
- Volcano plot: scatter plot of fold change versus significance.
- Box plots of the top 10 upregulated and top 10 downregulated genes.
- PCA plot: orthogonal transformation of the data to reveal its underlying structure.
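
The numbers behind three of these plots can be sketched in Python as follows. It is not the in-house R scripts. The sketch assumes DE results are (gene, log2 fold change, adjusted p-value) tuples. It also assumes that "top" genes are ranked by fold change among genes with an adjusted p-value below a hypothetical cutoff of 0.05. Correlations would typically be computed on transformed values such as variance-stabilized counts; that choice is likewise an assumption here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Result = Tuple[str, float, float]  # (gene, log2 fold change, adjusted p-value)


def volcano_points(results: Sequence[Result]) -> Dict[str, Tuple[float, float]]:
    """x = log2 fold change, y = -log10(adjusted p); p = 0 is capped to stay finite."""
    return {gene: (lfc, -math.log10(max(padj, 1e-300))) for gene, lfc, padj in results}


def top_up_and_down(results: Sequence[Result], n: int = 10,
                    alpha: float = 0.05) -> Tuple[List[str], List[str]]:
    """Top-n up- and downregulated genes by log2 fold change among padj < alpha."""
    significant = [r for r in results if r[2] < alpha]
    up = sorted((r for r in significant if r[1] > 0), key=lambda r: r[1], reverse=True)
    down = sorted((r for r in significant if r[1] < 0), key=lambda r: r[1])
    return [r[0] for r in up[:n]], [r[0] for r in down[:n]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(samples: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Sample-by-sample Pearson correlations, as shown in a correlation plot."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


# Hypothetical DE results for five genes.
results = [("g1", 2.0, 0.001), ("g2", -1.5, 0.01), ("g3", 0.2, 0.8),
           ("g4", -3.0, 0.2), ("g5", 1.0, 0.04)]
points = volcano_points(results)
up, down = top_up_and_down(results, n=2)  # g4 is excluded: padj is not below 0.05
```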

