Why Normalize?
Normalization removes technical variation between the samples we are comparing, so that the differences that remain can be attributed more confidently to biology.
We usually normalize for:
- Sequencing depth: Say we are comparing gene counts in sample A against sample B. If sample A has 10 million reads and sample B has 1 million, a gene can appear 10-fold more highly expressed in sample A purely because A was sequenced more deeply.
- Gene length: At the same expression level, a gene that is twice as long is expected to yield roughly twice as many reads.
- GC content
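
To make the sequencing depth point concrete, here is a minimal sketch that rescales raw counts to counts per million (CPM). The gene names and counts are made-up toy values, and `counts_per_million` is a hypothetical helper, not a function from any particular package.

```python
def counts_per_million(counts):
    """Scale raw gene counts so that the sample's total is one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

# Toy data (assumed for illustration): sample A was sequenced 10x deeper than B.
sample_a = {"geneX": 1_000, "geneY": 9_999_000}   # 10 million reads in total
sample_b = {"geneX": 100, "geneY": 999_900}       # 1 million reads in total

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

# The raw counts for geneX differ 10-fold, but the depth-scaled values agree.
print(sample_a["geneX"] / sample_b["geneX"])
print(cpm_a["geneX"], cpm_b["geneX"])
```

After scaling, geneX has the same value in both samples, so the raw 10-fold difference came from depth alone.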
Some Normalization Methods
RPKM:
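RPKM stands for reads per kilobase of transcript per million mapped reads. It divides a gene's read count by both the gene length (in kilobases) and the sequencing depth (in millions of mapped reads). The sketch below is a direct transcription of that definition; the function name, argument names, and numbers are assumptions chosen for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Equivalent to read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Toy example: the long gene has twice the reads only because it is twice as long.
total_reads = 1_000_000
short_gene = rpkm(read_count=100, gene_length_bp=1_000, total_mapped_reads=total_reads)
long_gene = rpkm(read_count=200, gene_length_bp=2_000, total_mapped_reads=total_reads)
print(short_gene, long_gene)
```

Both genes end up with the same RPKM, which is the intended effect of the length correction.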
Median scaling (DESeq method):
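The DESeq approach is commonly described as a median-of-ratios method: for each gene, take the geometric mean of its counts across samples as a pseudo-reference, divide each sample's count by that reference, and use the median of those ratios as the sample's size factor. The sketch below follows that description in plain Python. It is not the DESeq code itself, and the input layout (a dict of equal-length count lists) is an assumption made for the example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts (same gene order).
    Genes with a zero count in any sample are skipped, since their
    geometric mean is zero. Raises StatisticsError if no gene is usable.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log of the per-gene geometric mean across samples (None if unusable).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor = median ratio of the sample's counts to the pseudo-reference.
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - lgm
                      for g, lgm in enumerate(log_geo_means) if lgm is not None]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Toy data: A is 10x deeper than B, except the last gene, which truly differs.
toy = {"A": [100, 200, 300, 4000], "B": [10, 20, 30, 40]}
factors = size_factors(toy)
normalized = {s: [c / factors[s] for c in toy[s]] for s in toy}
print(factors)
```

Because the median is used, the one gene that really differs between the samples has little influence on the size factors.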
TMM:
Quantile:
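Quantile normalization forces every sample to share the same distribution of values: sort each sample, average the values at each rank across samples, and give each original value the average for its rank. Below is a minimal sketch under simplifying assumptions: all samples have the same number of genes, and ties are broken by position rather than averaged. The function name is hypothetical.

```python
def quantile_normalize(samples):
    """samples: list of equal-length lists of values, one list per sample."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]

    # Reference distribution: mean across samples of the value at each rank.
    reference = [sum(s[r] for s in sorted_samples) / len(samples) for r in range(n)]

    normalized = []
    for s in samples:
        # Gene indices ordered from the smallest to the largest value.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

result = quantile_normalize([[5, 2, 3], [4, 1, 6]])
print(result)
```

After this step each sample contains exactly the same set of values, and only the assignment of values to genes differs.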