Mapping with Bowtie2

Get your data

Six raw data files have been provided for all our further RNA-seq analysis:

  • c1_r1, c1_r2, c1_r3 from the first biological condition
  • c2_r1, c2_r2, and c2_r3 from the second biological condition


Get set up for the exercises
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course/bowtie_exercise . &
cd bowtie_exercise

Run Bowtie

Load the bowtie module

module load bowtie

What version of bowtie was loaded?

 Commands you could use to find this out...
bowtie2 --version
module list

Next, index the reference file. THIS IS ALREADY DONE! The command you need is:

bowtie2-build

Try typing this alone in the terminal and figuring out what to do from the help.

 If you're stuck...
bowtie2-build reference/genome.fa reference/genome.fa

The first argument is the reference FASTA. The second argument is the "base" file name to use for the created index files. It will create a bunch of files beginning bowtie/NC_012967.1*.

Finally, map the reads! The command you need is:

bowtie2

Try reading the help to figure out how to run the command yourself. This command takes a while. This is longer than we want to run a job on the head node (especially when all of us are doing it at once).

So, you will want to submit the full job to the cluster like you learned in the introduction.

But first, try to figure out the command and start it in interactive mode. Remember these are paired-end reads. Use control-c to stop the job once you are sure it is running without an immediate error! Then, submit your command that is working to the TACC queue.

Submit to the TACC queue or run in an idev shell

Create a commands file and use launcher_creator.py followed by qsub.

 Not sure what to do...

Put this in your commands file:

bowtie2 -t reference/genome.fa -1 data/GSM794483_C1_R1_1.fq -2 data/GSM794483_C1_R1_2.fq -s C1_R1.sam

bowtie2 -t reference/genome.fa -1 data/GSM794484_C1_R2_1.fq -2 data/GSM794484_C1_R2_2.fq -s C1_R1.sam
 
and so forth

What does the -t option do?

Your final output file is in SAM format. It's just a text file, so you can peek at it and see what it's like inside. Two warnings though:

  1. SAM files can be enormously humongous text files (maybe >1 gigabytes). Attempting to open the entire file at once can cause your computer to lock up or your text editor to crash. You are generally safer only looking at a portion at a time using linux commands like head or grep or using a viewer like IGV, which we will cover later.
  2. SAM files have some rather complicated information encoded as text, like a binary encoded FLAGS field and CIGAR strings. We'll take a look at some of these later, if we have time.

Still, you should recognize some of the information on a line in a SAM file from the input FASTQ, and some of the other information is relatively straightforward to understand, like the position where the read mapped. Give this a try:

head C1_R1.sam