This brief tutorial will walk you through data analysis of an RNA-seq experiment.
In this experiment, E. coli was inoculated into culture, and the culture was then sampled at 4 hours and 24 hours post-inoculation. The experiment was run in triplicate.
RNA was extracted from the six samples (two time points, three replicates each), fragmented, and sequenced. All sequencing runs were of the paired-end 2x100 type, so each RNA fragment is read from both ends, 100 bp from each end.
Here is a table showing the data we have:
Sample | Condition | Replicate | Sequencing Runs | Data Files |
---|---|---|---|---|
MURI_17 | 4 hr | 1 | SA13172 | MURI_17_SA13172_ATGTCA_L007 |
MURI_26 | 4 hr | 2 | SA14027 | MURI_26_SA14027_TTAGGC_L006 |
MURI_98 | 4 hr | 3 | SA14008 | MURI_98_SA14008_TTAGGC_L005, MURI_98_SA14008_TTAGGC_L006 |
MURI_21 | 24 hr | 1 | SA13172 | MURI_21_SA13172_GTGGCC_L007 |
MURI_30 | 24 hr | 2 | SA14027 | MURI_30_SA14027_CAGATC_L006 |
MURI_102 | 24 hr | 3 | SA14008, SA14032 | MURI_102_SA14008_CAGATC_L005, MURI_102_SA14008_CAGATC_L006, MURI_102_SA14032_CAGATC_L006 |
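
The table shows that two samples are spread over more than one data file, which is what the homework asks about. As a minimal sketch, the names can be split into their parts and grouped by sample. This assumes that each name follows the pattern sample_run_barcode_lane (inferred from the table, not documented here); the helper names `parse_name` and `group_by_sample` are made up for illustration.

```python
from collections import defaultdict

# Data file names copied from the table above.
DATA_FILES = [
    "MURI_17_SA13172_ATGTCA_L007",
    "MURI_26_SA14027_TTAGGC_L006",
    "MURI_98_SA14008_TTAGGC_L005",
    "MURI_98_SA14008_TTAGGC_L006",
    "MURI_21_SA13172_GTGGCC_L007",
    "MURI_30_SA14027_CAGATC_L006",
    "MURI_102_SA14008_CAGATC_L005",
    "MURI_102_SA14008_CAGATC_L006",
    "MURI_102_SA14032_CAGATC_L006",
]


def parse_name(name):
    """Split a data file name into its parts.

    Assumed layout (inferred from the table): MURI_<n>_<run>_<barcode>_<lane>.
    """
    prefix, number, run, barcode, lane = name.split("_")
    return {
        "sample": f"{prefix}_{number}",
        "run": run,
        "barcode": barcode,
        "lane": lane,
    }


def group_by_sample(names):
    """Map each sample to the list of (run, lane) pairs it was sequenced on."""
    groups = defaultdict(list)
    for name in names:
        record = parse_name(name)
        groups[record["sample"]].append((record["run"], record["lane"]))
    return dict(groups)


groups = group_by_sample(DATA_FILES)
for sample, units in groups.items():
    note = "multiple files" if len(units) > 1 else "single file"
    print(sample, units, note)
```

Grouping by (run, lane) rather than by run alone keeps the two cases apart: MURI_98 has two lanes from one run, while MURI_102 has files from two different runs.
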
In class, we will explore and characterize the raw data. Here are some elements (programs & techniques) we may use (you will need some of these for the homework):
For your homework, you will investigate the validity of combining data files from different sequencing runs. Only a few of these questions require working at a computer keyboard, but I encourage you to work in groups to solve the entire set of questions.
1. Based on what you learned about the t-test (that is, using terms associated with a t-test), explain what criteria you might use to consider it "invalid" to combine the multiple raw sequence data files from a sample.
2. Outline the steps needed to reduce the raw data to numbers suitable for evaluating your criteria from question #1.
3. Perform the steps you outlined in #2 and state whether or not it was valid to combine the data files.
4. Starting with the raw "count" data, explore the effect on PCA of NOT normalizing.
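
For the PCA question, one possible way to set up the comparison is sketched below. Everything here is an assumption for illustration: the count matrix is synthetic (it is NOT the MURI data), the sequencing depths are invented, and log2 counts-per-million is just one simple normalization, not necessarily the one used in class. PCA is done with a plain SVD from NumPy.

```python
import numpy as np


def make_synthetic_counts(seed=0):
    """Build a made-up 6-sample x 100-gene count matrix (not real data)."""
    rng = np.random.default_rng(seed)
    conditions = np.array(["4 hr"] * 3 + ["24 hr"] * 3)
    # Invented relative sequencing depths, deliberately unequal.
    depth = np.array([1.0, 4.0, 2.0, 3.0, 1.5, 5.0])
    profile_4h = np.arange(1, 101, dtype=float)
    profile_24h = profile_4h.copy()
    profile_24h[20:30] *= 2.0  # ten genes doubled at 24 hr
    is_4h = (conditions == "4 hr")[:, None]
    profiles = np.where(is_4h, profile_4h, profile_24h)
    expected = 100.0 * depth[:, None] * profiles
    return rng.poisson(expected).astype(float), conditions


def log_cpm(counts):
    """Scale each sample to counts per million, then take log2(x + 1)."""
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    return np.log2(cpm + 1.0)


def first_pc_scores(matrix):
    """Return each sample's score on the first principal component."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def compare_pca(seed=0):
    """Run PCA on raw and on normalized counts and summarize PC1."""
    counts, conditions = make_synthetic_counts(seed)
    library_size = counts.sum(axis=1)
    raw_pc1 = first_pc_scores(counts)
    norm_pc1 = first_pc_scores(log_cpm(counts))
    return {
        "conditions": conditions,
        "raw_pc1": raw_pc1,
        "norm_pc1": norm_pc1,
        # Correlation of PC1 with library size (total reads per sample).
        "raw_pc1_vs_depth": np.corrcoef(raw_pc1, library_size)[0, 1],
        "norm_pc1_vs_depth": np.corrcoef(norm_pc1, library_size)[0, 1],
    }


result = compare_pca()
print("raw  PC1 vs library size:", round(result["raw_pc1_vs_depth"], 3))
print("norm PC1 vs library size:", round(result["norm_pc1_vs_depth"], 3))
for cond, score in zip(result["conditions"], result["norm_pc1"]):
    print(cond, round(score, 2))
```

In a toy setup like this one, the thing to look at is whether PC1 follows sequencing depth or experimental condition in each case; with real data the picture may be less clean, so treat this only as a starting point for your own exploration.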