Objectives
In this lab, you will explore a popular fast mapper called BWA. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). The objectives of this lab is mainly to:
Introduction
BWA (the Burrows-Wheeler Aligner) is a fast short read aligner. It is an unspliced mapper. It's the successor to another aligner you might have used or heard of called MAQ (Mapping and Assembly with Quality). As the name suggests, it uses the burrows-wheeler transform to perform alignment in a time and memory efficient manner.
BWA Variants
BWA has three different algorithms:
...
- BWA-SW
- BWA-MEM : Newer! Typically faster and more accurate.
Get your data
Six raw data files have been provided for all our further RNA-seq analysis:
...
c1_r1, c1_r2, c1_r3 from the first biological condition
c2_r1, c2_r2, and c2_r3 from the second biological condition
...
Get set up for the exercises
| Code Block |
|---|
cds
cd my_rnaseq_course
cd day_2/bwa_exercise |
Lets look at the data files and reference files
...
Get set up for the exercises
| Code Block |
|---|
ls ../data
ls ../reference
#transcriptome
head ../reference/transcripts.fasta
#see how many transcripts there are in the file
grep -c '^>' ../reference/transcripts.fasta
#genome
head ../reference/genome.fa
#see how many sequences there are in the file
grep -c '^>' ../reference/genome.fa
#annotation
head ../reference/genes.formatted.gtf
#see how many entries there are in this file
wc -l ../reference/genes.formatted.gtf |
Run BWA
Load the module:
...
| Code Block |
|---|
module load biocontainers
module load bwa
|
There are multiple versions of BWA on TACC, so you might want to check which one you have loaded for when you write up your awesome publication that was made possible by your analysis of next-gen sequencing data.
| Expand |
|---|
| title | Here are some commands that could help... |
|---|
|
| Code Block | module spider bwa
module list
bwa
#to get the full command for running bwa from the container
type bwa |
You can see the different commands available under the bwa package from the command line help:
| Code Block |
|---|
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa
#this may need to run in an idev session since biocontainer modules cannot be run on the login nodes. |
Part 1. Create a index of your reference
...
All the files starting with the prefix transcripts.fasta are your BWA index files.
| Code Block |
|---|
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa index -a bwtsw ../reference/transcripts.fasta |
Part 2. Align the samples to reference using bwa mem
Running alignment using the newest and greatest, BWA MEM to the transcriptome. Alignment is just one single step with bwa mem.
...
...
Submit to the TACC queue or run in idev session |
Run all the bwa commands in the background (using the &) in the idev session.
...
Create a commands file and use launcher_creator.py followed by sbatch. Make sure each command is one line in your commands file. Put this in your commands file |
...
...
#Enter these lines into the file
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C1_R1.mem.sam |
|
...
../reference/transcripts.fasta ../data/ |
|
...
...
...
...
...
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C1_R2.mem.sam ../reference/transcripts.fasta ../data/ |
|
...
...
...
...
...
...
...
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C1_R3.mem.sam ../reference/transcripts.fasta ../data/ |
|
...
...
...
...
...
| title | Option 2: Submit to the TACC queue |
|---|
Create a commands file and use launcher_creator.py followed by sbatch.
Make sure each command is one line in your commands file.
...
| title | Put this in your commands file |
|---|
...
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C2_R1.mem.sam |
|
...
../reference/transcripts.fasta ../data/ |
|
...
...
...
...
...
...
...
singularity exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C2_R2.mem.sam ../reference/transcripts.fasta ../data/ |
|
...
...
...
...
...
...
...
exec ${BIOCONTAINER_DIR}/biocontainers/bwa/bwa-0.7.17--pl5.22.0_2.simg bwa mem -o C2_R3.mem.sam |
|
...
../reference/transcripts.fasta ../data/GSM794488_C2_R3_1.fq ../data/GSM794488_C2_R3_2.fq |
|
...
#ctrl+X to exit nano
#Y, followed by enter to save file |
| Expand |
|---|
| title | Use this Launcher_creator command |
|---|
| launcher_creator.py -n mem -t 04:00:00 -j commands.mem -q normal -a |
|
...
OTH21164 -m "module unload xalt;module load biocontainers |
|
...
...
bwa" -l bwa_mem_launcher.slurm |
| Expand |
|---|
| title | Use sbatch to submit your job to the queue |
|---|
| sbatch --reservation= |
|
...
rna-seq-class-0610 bwa_mem_launcher.slurm #or if reservation is giving us issues sbatch bwa_mem_launcher.slurm |
|
Since this will take a while to run, you can look at already generated results at: bwa_mem_results_transcriptome
Alternatively, we can also use bwa to
...
map to the genome (reference/genome.fa).
...
Now that we are done mapping, lets look at how to assess mapping results.