Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Objectives

In this lab, you will explore a popular fast mapper called BWA. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total).  The objectives of this lab is mainly to:

 

  • Learn how BWA works and how to use it.

Introduction

BWA (the Burrows-Wheeler Aligner) is a fast short read aligner. It 's the successor to another aligner you might have used or heard of called MAQ (Mapping and Assembly with Quality)is an unspliced mapper. As the name suggests, it uses the burrows-wheeler transform to perform alignment in a time and memory efficient manner.

BWA Variants

BWA has three different algorithms:

...

  • BWA-SW
  • BWA-MEM : Newer! Typically faster and more accurate.

Get your data

Six raw data files have been provided for all our further RNA-seq analysis:

  • c1_r1, c1_r2, c1_r3 from the first biological condition

  • c2_r1, c2_r2, and c2_r3 from the second biological condition

...

...

Get set up for the exercises
Code Block
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course_2015/bwa_exercise . &
cd bwa_exercisecd day_2/bwa_exercise

Lets look at the data files and reference files

Get set up for the exercises
Code Block
ls ../data
ls ../reference
 
#transcriptome
head ../reference/transcripts.fasta 
#see how many transcripts there are in the file
grep -c '^>' ../reference/transcripts.fasta
 
#genome
head ../reference/genome.fa
#see how many sequences there are in the file
grep -c '^>' ../reference/genome.fa
  
#annotation
head ../reference/genes.formatted.gtf
#see how many entries there are in this file
wc -l ../reference/genes.formatted.gtf

Run BWA

Load the module:

Code Block
module load bwa/0.7.7 

There are multiple versions of BWA on TACC, so you might want to check which one you have loaded for when you write up your awesome publication that was made possible by your analysis of next-gen sequencing data.

module spider bwa module list bwa
biocontainers
module load bwa

#to get the full command for running bwa from the container
type bwa
Expand
titleHere are some commands that could help...
Code Block

You can see the different commands available under the bwa package from the command line help:

Code Block
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa

...

 
#this may need to run in an idev session since biocontainer modules cannot be run on the login nodes.

Part 1. Create a index of your reference

NO NEED TO RUN THIS NOW- YOUR INDEX HAS ALREADY BEEN BUILT!

All the files starting with the prefix transcripts.fasta are your BWA index files.

Code Block
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa index -a bwtsw ../reference/genometranscripts.fa
fasta

Part 2a2. Align the samples to reference using bwa aln/samse/sampe

We will be using this set of commands (with options that you should try to figure out) in this order, on each sample:

    
    bwa aln
    bwa samse or sampe

Let's submit the bwa aln job

...

mem

 Running alignment using the newest and greatest, BWA MEM to the transcriptome. Alignment is just one single step with bwa mem.

an shell

nano commands.bwa

 

Make sure each command is one line in your commands file.

Put this in your commands file
:

bwa aln -f GSM794483_C1_R1_1.sai reference/genome.fa data/GSM794483_C1_R1_1.fq

bwa aln -f GSM794483_C1_R1_2.sai reference/genome.fa data/GSM794483_C1_R1_2.fq

bwa aln -f GSM794484_C1_R2_1.sai reference/genome.fa data/GSM794484_C1_R2_1.fq

bwa aln -f GSM794484_C1_R2_2.sai reference/genome.fa data/GSM794484_C1_R2_2.fq

bwa aln -f GSM794485_C1_R3_1.sai reference/genome.fa data/GSM794485_C1_R3_1.fq

bwa aln -f GSM794485_C1_R3_2.sai reference/genome.fa data/GSM794485_C1_R3_2.fq

bwa aln -f GSM794486_C2_R1_1.sai reference/genome.fa data/GSM794486_C2_R1_1.fq

bwa aln -f GSM794486_C2_R1_2.sai reference/genome.fa data/GSM794486_C2_R1_2.fq

bwa aln -f GSM794487_C2_R2_1.sai reference/genome.fa data/GSM794487_C2_R2_1.fq

bwa aln -f GSM794487_C2_R2_2.sai reference/genome.fa data/GSM794487_C2_R2_2.fq

bwa aln -f GSM794488_C2_R3_1.sai reference/genome.fa data/GSM794488_C2_R3_1.fq

bwa aln -f GSM794488_C2_R3_2.sai reference/genome.fa data/GSM794488_C2_R3_2.fq

 

Warning

Submit to the TACC queue or run in

idev

session

Create a commands file and use launcher_creator.py followed by sbatch.

Expand
Expand
titleUse this Launcher_creator command

launcher_creator.py -n aln -t 04:00:00 -j commands.bwa -q normal -a UT-2015-05-18 -m "module load bwa/0.7.7" -l bwa_launcher.slurm

 *.sai file is a file containing "alignment seeds" in a file format specific to BWA.  We still need to extend these seed matches into alignments of entire reads, choose the best matches, and convert the output to SAM format. Do we use sampe or samse?

Lets submit the bwa sampe job, but have it be on hold till previous job is finished.

nano commands.bwa.sampe

 

Put this in your commands file:

bwa sampe -f C1_R1.sam reference/genome.fa GSM794483_C1_R1_1.sai GSM794483_C1_R1_2.sai data/GSM794483_C1_R1_1.fq data/GSM794483_C1_R1_2.fq

bwa sampe -f C1_R2.sam reference/genome.fa GSM794484_C1_R2_1.sai GSM794484_C1_R2_2.sai data/GSM794484_C1_R2_1.fq data/GSM794484_C1_R2_2.fq

bwa sampe -f C1_R3.sam reference/genome.fa GSM794485_C1_R3_1.sai GSM794485_C1_R3_2.sai
Warning
titleSubmit to the TACC queue or run in an idev shell

Create a commands file and use launcher_creator.py followed by sbatch.

Expand
titleI need some help figuring out the options...
Code Block
nano commands.mem
 
#Enter these lines into the file
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C1_R1.mem.sam ../reference/transcripts.fasta ../data/GSM794483_C1_R1_1.fq ../data/GSM794483_C1_R1_2.fq 
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C1_R2.mem.sam ../reference/transcripts.fasta ../data/GSM794484_C1_R2_1.fq ../data/GSM794484_C1_R2_2.fq 
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C1_R3.mem.sam ../reference/transcripts.fasta ../data/GSM794485_C1_R3_1.fq ../data/GSM794485_C1_R3_2.fq

bwa sampe -f C2_R1.sam reference/genome.fa GSM794486_C2_R1_1.sai GSM794486_C2_R1_2.sai data/GSM794486_C2_R1_1.fq data/GSM794486_C2_R1_2.fq

bwa sampe -f C2_R2.sam reference/genome.fa GSM794487_C2_R2_1.sai GSM794487_C2_R2_2.sai data/GSM794487_C2_R2_1.fq data/GSM794487_C2_R2_2.fq

bwa sampe -f C2_R3.sam reference/genome.fa GSM794488_C2_R3_1.sai GSM794488_C2_R3_2.sai data/GSM794488_C2_R3_1.fq data/GSM794488_C2_R3_2.fq

Expand
titleUse this Launcher_creator command

launcher_creator.py -n sampe -t 04:00:00 -j commands.bwa.sampe -q normal -a UT-2015-05-18 -m "module load bwa/0.7.7" -l bwa_sampe_launcher.slurm

sbatch --dependency=afterok:<aln-job-ID> bwa_sampe_launcher.slurm

Part 2b. Align the samples to reference using bwa mem

Alternatively, lets also try running alignment using the newest and greatest, BWA MEM. Alignment is just one single step with bwa mem.

 

...

titleSubmit to the TACC queue or run in an idev shell

Create a commands file and use launcher_creator.py followed by sbatch.

...

titleI need some help figuring out the options...

Put this in your commands file:

...

 
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C2_R1.mem.sam ../reference/transcripts.fasta ../data/GSM794486_C2_R1_1.fq ../data/GSM794486_C2_R1_2.fq 
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C2_R2.mem.sam ../reference/transcripts.fasta ../data/GSM794487_C2_R2_1.fq ../data/

...

GSM794487_

...

C2_R2_2.fq 

...

 

...


singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C2_R3.mem.sam ../reference/transcripts.fasta ../data/GSM794488_C2_R3_1.fq ../data/GSM794488_C2_R3_2.fq

...

 

#ctrl+X to exit nano
#Y, followed by enter to save file
Expand
titleUse this Launcher_creator command

launcher_creator.py -n mem -t 04:00:00 -j commands.mem -q normal -a

...

OTH21164 -m "module unload xalt;module load

...

biocontainers;module load bwa" -l bwa_mem_launcher.slurm

Expand
titleUse sbatch to submit your job to the queue

sbatch --reservation=rna-seq-class-0610 bwa_mem_launcher.slurm

#or if reservation is giving us issues

sbatch bwa_mem_launcher.slurm

Since

...

this will take a while to run, you can look at already generated results at:

...

bwa_mem_results

...

 Help! I have a lots of reads and a large number of reads. Make BWA go faster!

...

Use threading option in the bwa command ( bwa -t <number of threads>)

...

_transcriptome

Alternatively, we can also use bwa to map to the genome (reference/genome.fa).

Now that we are done mapping, lets look at how to assess mapping results.