Analyzing RNA-Seq data for differential gene expression
We will...
Download data files
The data files for this example are in the path:
| Code Block |
|---|
/corral-repl/utexas/BioITeam/ngs_course/listeria_RNA_seq/data
|
| File Name | Description | Sample |
|---|---|---|
| | Single-end Illumina 36-bp reads | wild-type, replicate 1 |
| | Single-end Illumina 36-bp reads | ΔsigB mutant, replicate 1 |
| | Single-end Illumina 36-bp reads | wild-type, replicate 2 |
| | Single-end Illumina 36-bp reads | ΔsigB mutant, replicate 2 |
| | Reference Genome | Listeria monocytogenes strain 10403S |
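As a minimal sketch, the data files could be copied into a personal working directory before starting. The `stage_data` helper and the `rnaseq_work` directory name are hypothetical; only the source path comes from the text above.

```shell
#!/bin/bash
# Hypothetical helper: copy the contents of a data directory into a
# working directory, creating the working directory if needed.
stage_data() {
    local src="$1"    # directory holding the course data files
    local dest="$2"   # working directory to create and fill
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" && cp -r "$src"/. "$dest"/
}

# Possible usage on the course system (the destination name is an assumption):
# stage_data /corral-repl/utexas/BioITeam/ngs_course/listeria_RNA_seq/data "$HOME/rnaseq_work"
```

Working on a private copy keeps the shared course directory untouched.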
...
Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs
...
- RNA-Seq data in FASTQ format (e.g., dataset1.fastq, dataset2.fastq)
...
Install software
Bowtie2 (first choice)
...
Paired-end data can be processed like this: %BR%
<code>$far --source datasetX_R1.fastq --source2 datasetX_R2.fastq --target datasetX.noadaptor --adaptive-overlap --trim-end any --adapters gsaf_illumina_adapters.fasta --format fastq-sanger</code>
---++++ Compile and install FAR on MacOSX
Unfortunately, FAR ships only with Windows and Linux binaries. To build FAR (2.0) for MacOSX:
1 Install Apple Developer Tools
1 Install cmake: %BR% <code>$ sudo port install cmake</code>
1 Check out code: %BR% <code>$ svn co https://theflexibleadap.svn.sourceforge.net/svnroot/theflexibleadap theflexibleada</code>
1 Compile code: %BR% <code>$ cd theflexibleada</code> %BR% <code>$ cmake CMakeLists.txt </code> %BR% <code>$ make</code>
1 Copy executable and library: %BR% <code>$ cp lib/libtbb.dylib ~/local/lib</code> %BR% <code>$ cp build/* ~/local/bin </code>
1 Add these locations to your search paths by adding these lines to ~/.profile: %BR% <code>export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$HOME/local/lib" %BR% export PATH="$PATH:$HOME/local/bin" </code>
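After opening a new shell, one way to confirm that the directories from the last step were picked up is a small membership check on the colon-separated variables. This is only a sketch; the `in_path_list` function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: succeed if directory $1 appears as a complete
# entry in the colon-separated list $2 (for example "$PATH").
in_path_list() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage:
# in_path_list "$HOME/local/bin" "$PATH" && echo "local bin directory is on PATH"
# in_path_list "$HOME/local/lib" "$DYLD_LIBRARY_PATH" && echo "local lib directory is on DYLD_LIBRARY_PATH"
```

Wrapping both the list and the directory in colons avoids false matches on partial directory names.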
---+++ Align reads to reference genome
...
And convert to BAM format (assumes single-end data):
| Code Block |
|---|
login1$ samtools faidx REL606.fna
login1$ samtools import REL606.fna datasetX.sam datasetX.unsorted.bam
login1$ samtools sort datasetX.unsorted.bam datasetX
login1$ samtools index datasetX.bam
|
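With several samples, the same conversion steps can be generated in a loop. The sketch below only prints the commands instead of running them. The sample names dataset1 and dataset2 are placeholders, the `print_bam_steps` function is hypothetical, and the command forms are copied from the block above. Newer samtools releases changed the `sort` syntax, so check the usage message of the installed version first.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) the SAM-to-BAM steps for one sample.
print_bam_steps() {
    local ref="$1" sample="$2"
    echo "samtools import $ref $sample.sam $sample.unsorted.bam"
    echo "samtools sort $sample.unsorted.bam $sample"
    echo "samtools index $sample.bam"
}

# Index the reference once, then emit the per-sample steps.
echo "samtools faidx REL606.fna"
for sample in dataset1 dataset2; do
    print_bam_steps REL606.fna "$sample"
done
```

Reviewing the printed commands before piping them to `sh` is a cheap way to catch a wrong file name.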
Exercise: Is this a strand-specific RNA-seq library? Try using IGV to view some of the data.
h2. Analyze differential gene expression
- Use bedtools to count reads in features.
- Convert mapped reads to feature counts.
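To make "counting reads in features" concrete, here is a toy illustration in plain awk of the overlap counting that a tool such as bedtools performs. It is not bedtools itself. It assumes tab-separated BED-style inputs with 0-based, half-open coordinates, and the `count_reads_in_features` name and the input file names are hypothetical.

```shell
#!/bin/bash
# Toy sketch: for each feature (chrom, start, end, name), count the reads
# (chrom, start, end) that overlap it by at least one base.
count_reads_in_features() {
    local features="$1" reads="$2"
    awk 'BEGIN { FS = OFS = "\t" }
         # First file: remember every feature.
         NR == FNR {
             n++
             fchrom[n] = $1; fstart[n] = $2 + 0; fend[n] = $3 + 0; fname[n] = $4
             next
         }
         # Second file: compare each read against every feature.
         {
             for (i = 1; i <= n; i++)
                 if ($1 == fchrom[i] && ($2 + 0) < fend[i] && ($3 + 0) > fstart[i])
                     count[i]++
         }
         # Print one line per feature, including features with zero reads.
         END { for (i = 1; i <= n; i++) print fname[i], count[i] + 0 }' "$features" "$reads"
}

# Possible usage (file names are placeholders):
# count_reads_in_features genes.bed reads.bed > counts.tsv
```

The nested loop is quadratic, which is fine for a demonstration but too slow for real data; dedicated tools use indexed or sorted-sweep algorithms.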
The data files for this example are in the path:
| Code Block |
|---|
/corral-repl/utexas/BioITeam/ngs_course/ecoli_rnaseq
|
...
| File Name | Description |
|---|---|
| ... | Illumina reads, 0K generation individual clone from population |
| SRR032374.fastq.gz | Illumina reads, 20K generation mixed population |
| SRR032376.fastq.gz | Illumina reads, 40K generation mixed population |
| NC_012967.1.fasta.gz | ... |
h3. Using DESeq
...