...
| File Name | Description | Sample |
|---|---|---|
| | Single-end Illumina 36-bp reads | wild-type, biological replicate 1 |
| | Single-end Illumina 36-bp reads | ΔsigB mutant, biological replicate 1 |
| | Single-end Illumina 36-bp reads | wild-type, biological replicate 2 |
| | Single-end Illumina 36-bp reads | ΔsigB mutant, biological replicate 2 |
| | Reference Genome sequence (FASTA) | Listeria monocytogenes strain 10403S |
| | Reference Genome features (GFF) | Listeria monocytogenes strain 10403S |
...
To get right to the new stuff, you can copy the mapped read BAM files and the reference sequence files that you will need using these commands:
| Code Block |
|---|
cds
cp -r $BI/ngs_course/listeria_RNA_seq/mapped_data listeria_RNA_seq |
...
Many of the modules for doing statistical tests on NGS data have been written in the "R" language for statistical computing. If you're not familiar with R, then this section is probably going to be a bit confusing. (You might be thinking "Stop with the new languages already guys! Uncle!") To orient you, we are going to run the R command, which launches the R shell inside our terminal. Like the bash shell that we normally use, the R shell interprets commands, but now they are R commands rather than bash commands. The prompt changes from login1$ to > when you are in the R shell, to help clue you in to this fact. The R shell is inside the bash shell, so when you quit R, you will be back where you were in the bash shell.
R is the favorite language of pirates.
...
Like other languages, R can be expanded by loading modules. The R equivalent of Bioperl or Biopython is Bioconductor. Bioconductor can theoretically do things for you like convert sequences (none of us use it for that), but where it really shines is in doing statistical tests (where it is second to none among these languages). Many functions for analyzing microarray data are implemented in R, and this strength has now carried over to the analysis of RNAseq data.
Here's how you install two modules that we will need for this exercise:
| Warning |
|---|
The install commands may take several minutes to complete. You can read ahead while they run. |
| Code Block | ||
|---|---|---|
| ||
login1$ module load R
login1$ R
R version 2.14.0 (2011-10-31)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> source("http://bioconductor.org/biocLite.R")
...
> biocLite("DESeq")
...
> biocLite("edgeR")
...
> q()
Save workspace image? [y/n/c]: n
|
When you start R later, you will not need to reinstall the modules. You can load them with just these commands:
...
These commands will work for any Bioconductor module!
Align the reads
For RNA-seq analysis we're mainly counting the reads that align well, so we choose to use bowtie. (You could also use BWA or many other mappers.)
We've done this several times before, so you should be able to come up with the full command lines if you refer back to the original lesson.
| Warning |
|---|
Be careful: we are now mapping single-end reads, so you may have to look at the bowtie help to figure out how to do that! |
You will first need to build the index file. This only has to be done once, and doing it in "interactive mode" is fine (it's fast, so you don't need an idev shell). Then, you will need to submit a commands file with four lines to the TACC queue.
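If you get stuck, here is one rough sketch of what the two steps could look like. It is only an illustration under assumptions: the reference name `reference.fasta`, the index prefix `listeria_index`, and the four sample names are hypothetical placeholders, not the actual file names in the course directory, so substitute your own. The index build is guarded so the script still runs where bowtie or the reference file is missing.

```shell
#!/bin/bash
# Sketch only: all file names here are hypothetical placeholders.
REF=reference.fasta       # hypothetical reference FASTA file
INDEX=listeria_index      # hypothetical bowtie index prefix

# Step 1: build the index once. Skipped if bowtie-build or the
# reference file is not available in the current directory.
if command -v bowtie-build >/dev/null 2>&1 && [ -f "$REF" ]; then
    bowtie-build "$REF" "$INDEX"
fi

# Step 2: write one bowtie command per sample into a commands file.
# For single-end reads the FASTQ file is a plain positional argument
# (no -1/-2 pair); -S asks for SAM output and -t prints timing.
: > commands
for sample in wt_rep1 sigB_rep1 wt_rep2 sigB_rep2; do
    echo "bowtie -t -S $INDEX $sample.fastq $sample.sam" >> commands
done

# Show the four lines that would be submitted to the queue.
cat commands
```

Writing the commands with a loop keeps the four lines consistent with each other; editing a four-line file by hand works just as well.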
...
| Expand | ||||||
|---|---|---|---|---|---|---|
| ||||||
Create the launcher script and run it:
|
...