Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

File Name

Description

Sample

SRR034450.fastq

Single-end Illumina 36-bp reads

wild-type, biological replicate 1

SRR034451.fastq

Single-end Illumina 36-bp reads

ΔsigB mutant, biological replicate 1

SRR034452.fastq

Single-end Illumina 36-bp reads

wild-type, biological replicate 2

SRR034453.fastq

Single-end Illumina 36-bp reads

ΔsigB mutant, biological replicate 2

NC_017544.1.gbkfasta

Reference Genome sequence (FASTA)

Listeria monocytogenes strain 10403S

NC_017544.1.gff

Reference Genome features (GFF)

Listeria monocytogenes strain 10403S

...

We've done this several times before, so you should be able to come up with the full command lines if you refer back to the original lesson. Be careful we are now mapping single-end reads, so you may have to look at the bowtie help to figure out how to do that!

You will need to first build the index file, just once and in "interactive mode" is fine. Then, you will need to submit a commands file with four lines to the TACC queue.

...

Expand
Just give me the answer...
Just give me the answer...
Code Block
module load bowtie
bowtie-build NC_017544.1.fasta NC_017544.1

Now create a commands file that looks like this:

Code Block

---+++ Align reads to reference genome

---++++ Using bowtie2

First, index your genome so bowtie2 can map read to it: %BR%
<code>$bowtie2-build REL606.fna REL606</code> %BR%

Then, align each data set: %BR%
<code>$bowtie2 -x REL606 -U datasetX.fastq --phred33 -S REL606.sam</code> %BR%

Optionally, add the <code>--local</code> flag if your reads do not map end-to-end.

---++++ Using BWA
First, index your genome so BWA can map read to it: %BR%
<code>$bwa index REL606.fna</code> %BR%

Then, align each data set: %BR%
<code>$bwa aln REL606.fna dataset1.fastq > datasetX.sai </code> %BR%

And convert to SAM format (assumes single-end data):
<code>$bwa samse REL606.fna datasetX.sai datasetX.fastq > datasetX.sam </code> %BR%

---++ Count reads mapping to genes

<code>breseq RNASEQ -f REL606.fna -r REL606.gbk -o datasetX.count.tab datasetX.sam</code> %BR%

---++ Convert alignments to BAM

And convert to BAM format (assumes single-end data):

Code Block

login1$ samtools faidx REL606.fna
login1$ samtools import REL606.fna datasetX.sam datasetX.unsorted.bam
login1$ samtools sort datasetX.unsorted.bam datasetX
login1$ samtools index datasetX.bam

...


bowtie --sam NC_017544.1 SRR034450.fastq SRR034450.sam
bowtie --sam NC_017544.1 SRR034451.fastq SRR034451.sam
bowtie --sam NC_017544.1 SRR034452.fastq SRR034452.sam
bowtie --sam NC_017544.1 SRR034453.fastq SRR034453.sam

Create the launcher script:

Code Block

module load python
launcher_creator.py -n bowtie -e you@somewhere.com

Run it:

Code Block

qsub launcher.sge

Convert alignments to BAM

Edit your commands file so that you convert all of these files from SAM to sorted and indexed BAM.

Linux expert tip: you can string together commands all on one line, so that they are sent to the same core one after another by separating them on the line with ;.

Note the use of the variable $FILE, which means that is the only part of the line that we have to change. This is a mini-use of shell scripting.

Code Block

FILE=SRR034450 && samtools import NC_017544.1.fasta $FILE.sam $FILE.unsorted.bam && samtools sort $FILE.unsorted.bam $FILE && samtools index $FILE.bam
FILE=SRR034451 ; samtools import NC_017544.1.fasta $FILE.sam $FILE.unsorted.bam ; samtools sort $FILE.unsorted.bam $FILE ; samtools index $FILE.bam
FILE=SRR034452 ; samtools import NC_017544.1.fasta $FILE.sam $FILE.unsorted.bam ; samtools sort $FILE.unsorted.bam $FILE ; samtools index $FILE.bam
FILE=SRR034453 ; samtools import NC_017544.1.fasta $FILE.sam $FILE.unsorted.bam ; samtools sort $FILE.unsorted.bam $FILE ; samtools index $FILE.bam

Re-create the launcher script and submit this new job to the queue.:

Code Block

module load samtools
launcher_creator.py -n samtools -e you@somewhere.com
qsub launcher.sge

Optional Exercise

  • Is this a strand-specific RNA-seq library? Try using IGV to view some of the BAM file data and examine the reads mapped to each gene.

...

hg2. Analyze differential gene expression

  • Use bedtools to count reads in features.
  • Converting mapped reads to feature counts.

hg3. Using DESeq

...

titleDESeq installation

...

Count reads mapping to genes using bedtools

bedtools is a great utility for working with sequence features and mapped reads in BAM, BED, VCF, and GFF formats.

We are going to use it to count the number of reads that map to each gene in the genome. Load the module and check out the help for bedtools and the multicov specific command that we are going to use:

Code Block

module load bedtools
bedtools
bedtools multicov

The multicov command takes a feature file (GFF) and counts how many reads in. By default it counts how many reads overlap the feature on either strand, but it can be made specific with an option.

Our GFF file has a lot of redundant features that describe a gene multiple times, so we are going to trim it just to have "gene" features using grep.

Code Block

grep '^NC_017544[[:space:]]*GenBank[[:space:]]*gene' NC_017544.1.gff > NC_017544.1.genes.gff

What is this doing? It's taking all the lines that begin with (^), then "NC_017544", then any number of spaces or tabs, then "GenBank", then any number of spaces or tabs, then "gene". Use head to see the before and after.

Code Block

head -n 50 NC_017544.1.gff
head -n 50 NC_017544.1.genes.gff

In order to use the bedtools command on our data, submit this commands file to the TACC queue:

Code Block

bedtools multicov -bams SRR034450.bam SRR034451.bam SRR034452.bam SRR034453.bam -bed NC_017544.1.genes.gff > gene_counts.gff

hg2. Analyze differential gene expression (DESeq)

...

Code Block
titleUsing DESeq
login1$ R
...
> library("DESeq")
> combined = read.csv("combined.csv", header=T, row.names=1)
> design <- data.frame(
  row.names = colnames( combined ),
  condition = c( "Anc", "Anc", "EL", "EL", "EW", "EW"),
  libType = c( "single-end", "single-end", "single-end",
  "single-end", "single-end", "single-end" ) )
> design
> conds <- factor(design$condition)
> cds <- newCountDataSet( combined, conds )
> cds <- estimateSizeFactors( cds )
> sizeFactors( cds )
> res <- nbinomTest( cds, "EL", "EW" )
> write.csv(res, "EL-vs-EW.csv")

...