RNA-Seq Normalization Overview

Why Normalize?

Normalization reduces technical variation between the samples we are comparing, so that the differences that remain can be attributed more confidently to biology.

We usually normalize for:

  1. Sequencing depth: Say we are comparing gene counts in sample A against sample B. If sample A has 10 million reads and sample B has 1 million, an apparent 10-fold increase in expression in sample A can be explained entirely by its greater sequencing depth.
  2. Gene length: At the same expression level, a gene that is twice as long is expected to produce about twice as many reads.
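  3. GC content: Read coverage can vary with the GC content of the sequenced fragments, and the strength of this bias can differ from sample to sample.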
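
As a rough illustration of the sequencing depth point, the sketch below scales raw counts to counts per million (CPM). The function name and the toy counts are made up for this example, and the library sizes are far smaller than the 10 million vs 1 million reads described above; only the 10-fold depth ratio is kept.

```python
def counts_per_million(counts):
    """Scale one sample's raw gene counts to counts per million reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


# Hypothetical toy data: sample A is sequenced 10x deeper than sample B,
# but the relative abundance of each gene is identical.
sample_a = {"gene1": 500, "gene2": 9500}
sample_b = {"gene1": 50, "gene2": 950}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
```

The raw count of gene1 is 10 times higher in sample A, but after scaling both samples report the same value for it, so the apparent difference was due to depth alone.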

Some Normalization Methods

RPKM: 
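
RPKM stands for reads per kilobase of transcript per million mapped reads, so it corrects for both gene length and sequencing depth. A minimal sketch of the usual definition follows; the function name, argument names, and example numbers are assumptions made for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: two genes at the same expression level.
# The longer gene collects twice the reads but gets the same RPKM.
long_gene = rpkm(read_count=2000, gene_length_bp=2000, total_mapped_reads=10_000_000)
short_gene = rpkm(read_count=1000, gene_length_bp=1000, total_mapped_reads=10_000_000)
```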

 

Median scaling (DESeq method):
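
In the DESeq approach, each sample gets a size factor: the median, over genes, of the ratio between that sample's count and the gene's geometric mean across all samples. Raw counts are then divided by the size factor. The sketch below is a simplified pure-Python version written for this page, not the DESeq code itself; the function name and input layout are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts, with genes
    in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean per gene; genes with a zero in any sample are skipped.
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        if not log_ratios:
            raise ValueError("no gene has nonzero counts in every sample")
        factors[s] = math.exp(median(log_ratios))
    return factors


# Hypothetical toy data: sample A is sequenced 10x deeper than sample B.
toy_counts = {"A": [100, 200, 300, 0], "B": [10, 20, 30, 5]}
factors = size_factors(toy_counts)
normalized = {s: [c / factors[s] for c in toy_counts[s]] for s in toy_counts}
```

Because the median is used, a handful of strongly differentially expressed genes has little influence on the size factor.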

 

TMM:
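
TMM (trimmed mean of M-values) compares each sample to a reference sample. It computes per-gene log fold changes (M) and average log abundances (A), trims the extremes of both, and takes a weighted mean of the remaining M values. The sketch below is a simplified version under stated assumptions: the reference sample is supplied by the caller, the trim fractions (30% for M, 5% for A) are passed as defaults, and the function name is made up. It does not reproduce every detail of the edgeR implementation.

```python
import math


def tmm_factor(sample, reference, trim_m=0.3, trim_a=0.05):
    """Simplified TMM scaling factor of `sample` relative to `reference`.

    Both arguments are lists of raw counts with genes in the same order.
    """
    n_s, n_r = sum(sample), sum(reference)
    rows = []  # (M, A, weight) for each usable gene
    for y_s, y_r in zip(sample, reference):
        if y_s > 0 and y_r > 0:
            p_s, p_r = y_s / n_s, y_r / n_r
            m = math.log2(p_s / p_r)        # log fold change
            a = 0.5 * math.log2(p_s * p_r)  # average log abundance
            # Approximate variance of M; its inverse is the gene's weight.
            var = (n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)
            if var > 0:
                rows.append((m, a, 1.0 / var))
    n = len(rows)
    if n == 0:
        raise ValueError("no usable genes")

    def kept(values, trim):
        # Indices that survive trimming `trim` of the genes from each end.
        order = sorted(range(n), key=lambda i: values[i])
        cut = int(n * trim)
        return set(order[cut:n - cut])

    keep = kept([r[0] for r in rows], trim_m) & kept([r[1] for r in rows], trim_a)
    if not keep:
        raise ValueError("all genes were trimmed")
    total_weight = sum(rows[i][2] for i in keep)
    mean_m = sum(rows[i][0] * rows[i][2] for i in keep) / total_weight
    return 2 ** mean_m


# Hypothetical toy data: same composition, 10x the depth.
reference = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
deeper = [10 * y for y in reference]
factor_depth_only = tmm_factor(deeper, reference)

# Same as above, but the last gene is strongly up-regulated.
skewed = deeper[:-1] + [10000]
factor_skewed = tmm_factor(skewed, reference)
```

When only depth differs, the factor is approximately 1, because M-values are computed on library-size-scaled proportions. In the second case the factor drops below 1: one highly expressed gene takes up a large share of the reads, and the trimming keeps that gene out of the estimate. In the usual formulation the factor is multiplied into the library size to give an effective library size.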

 

Quantile:
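
Quantile normalization forces every sample to have the same distribution of values: each sample is sorted, the values at each rank are averaged across samples, and each original value is replaced by the average for its rank. A minimal sketch, assuming samples are given as equal-length lists; ties are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (each a list of gene values)."""
    n_genes = len(samples[0])
    if any(len(col) != n_genes for col in samples):
        raise ValueError("all samples must cover the same genes")

    # Mean of the values at each rank across samples.
    sorted_cols = [sorted(col) for col in samples]
    rank_means = [
        sum(col[i] for col in sorted_cols) / len(samples) for i in range(n_genes)
    ]

    result = []
    for col in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: col[i])
        normalized = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            normalized[gene_index] = rank_means[rank]
        result.append(normalized)
    return result


# Hypothetical toy data: two samples, three genes.
norm_a, norm_b = quantile_normalize([[5, 2, 3], [40, 10, 60]])
```

After normalization both samples contain the same set of values; only the assignment of values to genes differs, following each gene's rank within its own sample.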