Analyzing RNA-Seq data for differential gene expression

We will...

Download data files

The data files for this example are in the path:

Code Block

/corral-repl/utexas/BioITeam/ngs_course/listeria_RNA_seq/data

| File Name | Description | Sample |
| SRR034450.fastq | Single-end Illumina 36-bp reads | wild-type, replicate 1 |
| SRR034451.fastq | Single-end Illumina 36-bp reads | ΔsigB mutant, replicate 1 |
| SRR034452.fastq | Single-end Illumina 36-bp reads | wild-type, replicate 2 |
| SRR034453.fastq | Single-end Illumina 36-bp reads | ΔsigB mutant, replicate 2 |
| NC_017544.1.gbk | Reference Genome | Listeria monocytogenes strain 10403S |
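One way to stage the files listed above into a working directory is sketched below. The function name `stage_rnaseq_data` and the default destination directory `listeria_rnaseq` are assumptions made for illustration; only the source path and file names come from this page.

```shell
# Copy the four FASTQ files and the GenBank reference into a working directory.
# Arguments (both optional):
#   $1  source directory (defaults to the course data path given above)
#   $2  destination directory (hypothetical name, chosen for this sketch)
stage_rnaseq_data() {
    data_dir="${1:-/corral-repl/utexas/BioITeam/ngs_course/listeria_RNA_seq/data}"
    work_dir="${2:-listeria_rnaseq}"
    mkdir -p "$work_dir" || return 1
    # The glob matches SRR034450.fastq through SRR034453.fastq.
    cp "$data_dir"/SRR03445[0-3].fastq "$data_dir"/NC_017544.1.gbk "$work_dir"/
}
```

Calling `stage_rnaseq_data` with no arguments would copy from the course path into `./listeria_rnaseq`; pass two arguments to use other locations.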

...

Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs

...

  • RNA-Seq data in FASTQ format (Ex: dataset1.fastq, dataset2.fastq)

...

Install software

Bowtie2 (first choice)

...

There is an option to process paired-end data like this: %BR%
<code>$far --source datasetX_R1.fastq --source2 datasetX_R2.fastq --target datasetX.noadaptor --adaptive-overlap --trim-end any --adapters gsaf_illumina_adapters.fasta --format fastq-sanger</code>

---++++ Compile and install FAR on MacOSX

Unfortunately, FAR comes only with Windows and Linux binaries. To build FAR(2.0) for MacOSX:

1 Install Apple Developer Tools
1 Install cmake: <code>$ sudo port install cmake</code>
1 Check out code: %BR% <code>$ svn co https://theflexibleadap.svn.sourceforge.net/svnroot/theflexibleadap theflexibleada</code>
1 Compile code: %BR% <code>$ cd theflexibleada</code> %BR% <code>$ cmake CMakeLists.txt</code> %BR% <code>$ make</code>
1 Copy executable and library: %BR% <code>$ cp lib/libtbb.dylib ~/local/lib</code> %BR% <code>$ cp build/* ~/local/bin </code>
1 Add these locations to your path with lines in ~/.profile: %BR% <code>export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$HOME/local/lib"</code> %BR% <code>export PATH="$PATH:$HOME/local/bin"</code>

---+++ Align reads to reference genome

...

And convert to BAM format (assumes single-end data):

Code Block

login1$ samtools faidx REL606.fna
login1$ samtools import REL606.fna datasetX.sam datasetX.unsorted.bam

...


login1$ samtools sort datasetX.unsorted.bam datasetX
login1$ samtools index datasetX.bam
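When several samples need the same conversion, the commands in this section can be generated per sample. The sketch below only prints the commands so they can be reviewed first. The function name `print_bam_commands` is hypothetical, and the commands use the legacy samtools syntax shown on this page. Newer samtools releases changed these subcommands, so check the help output of your installed version.

```shell
# Print (do not run) the SAM-to-sorted-BAM commands for each sample name.
# Arguments:
#   $1      reference FASTA (e.g. REL606.fna)
#   $2...   sample prefixes; each is expected to have a matching <prefix>.sam
print_bam_commands() {
    ref="$1"
    shift
    # The reference only needs to be indexed once.
    echo "samtools faidx $ref"
    for sample in "$@"; do
        echo "samtools import $ref $sample.sam $sample.unsorted.bam"
        echo "samtools sort $sample.unsorted.bam $sample"
        echo "samtools index $sample.bam"
    done
}

print_bam_commands REL606.fna datasetX
```

Piping the output to `sh` would execute the commands, but it is worth reading them before doing so.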

...


Exercise: Is this a strand-specific RNA-seq library? Try using IGV to view some of the data.

Analyze differential gene expression

  • Use bedtools to count reads in features.
  • Convert mapped reads to feature counts.
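As a rough illustration of the counting step, the sketch below builds a `bedtools multicov` command, which reports one read count per feature for each sorted, indexed BAM file. The helper name `multicov_command` and the file names `features.gff`, `wt_rep1.bam`, and `wt_rep2.bam` are placeholders, not files from the course directory, and this may not be the exact bedtools subcommand the course uses.

```shell
# Build (and print) a bedtools multicov command line.
# Arguments:
#   $1      feature file (BED or GFF)
#   $2...   sorted, indexed BAM files, one per sample
multicov_command() {
    features="$1"
    shift
    echo "bedtools multicov -bams $* -bed $features"
}

# Placeholder file names for illustration only.
multicov_command features.gff wt_rep1.bam wt_rep2.bam
```

The output of the real command repeats each feature line with one extra count column per BAM file, in the order the BAM files were given.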

The data files for this example are in the path:

Code Block

/corral-repl/utexas/BioITeam/ngs_course/ecoli_rnaseq

...

File Name

...

Description

...


Illumina reads, 0K generation individual clone from population

...

SRR032374.fastq.gz

...

Illumina reads, 20K generation mixed population

...

SRR032376.fastq.gz

...

Illumina reads, 40K generation mixed population

...

NC_012967.1.fasta.gz

...

Using DESeq

...