Objectives
In this lab, you will explore a popular fast mapper called BWA. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). The main objective of this lab is to:
- Learn how BWA works and how to use it.
Introduction
BWA (the Burrows-Wheeler Aligner) is a fast short-read aligner. It's the successor to another aligner you might have used or heard of called MAQ (Mapping and Assembly with Quality). As the name suggests, it uses the Burrows-Wheeler transform to perform alignment in a time- and memory-efficient manner.
Run BWA
Load the module:
module load bwa
There are multiple versions of BWA on TACC, so you might want to check which one you have loaded for when you write up your awesome publication that was made possible by your analysis of next-gen sequencing data.
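One way to record the version is sketched below. It relies on the fact that running bwa with no arguments prints a usage message (on stderr) that includes a Version line; the function name bwa_version is just a label chosen for this example. On systems that use environment modules, module list should also show which bwa module is loaded.

```shell
# Print the "Version: ..." line from BWA's usage message.
# bwa writes its help text to stderr, so redirect stderr into the pipe.
bwa_version() {
    bwa 2>&1 | grep -i '^Version'
}

# Usage, after "module load bwa":
#   bwa_version
```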
Create a fresh output directory, so that we don't write over the output from bowtie. Be sure you are back in your main intro_to_mapping directory. Then:
mkdir bwa
Try to figure out how to index and map from the command line help:
bwa
You will need to run this set of commands (with options that you should try to figure out) in this order:
bwa index
bwa aln
bwa samse or sampe
What's going on at each step?
Remember to use the option that enables multithreading, if there is one, for each BWA command.
First, run the index command (index) on the reference file. This is fast, so you can run it interactively.
BWA doesn't give you a choice of where to create your index files. It creates them in the same directory as the FASTA that you input. So copy the FASTA from your intro_to_mapping directory to your new bwa directory.
Then, run the index command using the copied FASTA as input.
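The copy-then-index step could look roughly like the sketch below. The function name index_reference and the file name reference.fasta are placeholders invented for this example; substitute the FASTA you were actually given.

```shell
# Copy a reference FASTA into the bwa directory and index the copy.
# $1 = path to the reference FASTA (placeholder name in the usage below)
index_reference() {
    mkdir -p bwa                      # harmless if the directory already exists
    cp "$1" bwa/                      # the index files will land next to this copy
    bwa index "bwa/$(basename "$1")"  # build the index from the copied FASTA
}

# Usage, from the intro_to_mapping directory:
#   index_reference reference.fasta
```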
Take a look at your output directory using ls bwa to see what new files appear after indexing.
Then, run the mapping command (aln). Note that you need to map each set of reads in the pairs separately with BWA because of how it separates the initial mapping and the later alignment steps.
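As a rough sketch, mapping the two read files separately might be wrapped like this. bwa aln writes its result to standard output, and -t sets the number of threads; the function name map_reads, the file names, and the thread count are all placeholders for illustration.

```shell
# Map one FASTQ file against an indexed reference, writing a .sai file.
# $1 = indexed reference FASTA
# $2 = FASTQ file for one end of the pairs
# $3 = output .sai file
# $4 = number of threads
map_reads() {
    bwa aln -t "$4" "$1" "$2" > "$3"
}

# Usage (placeholder file names), once per read file:
#   map_reads bwa/reference.fasta reads_R1.fastq bwa/reads_R1.sai 4
#   map_reads bwa/reference.fasta reads_R2.fastq bwa/reads_R2.sai 4
```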
Submit to the TACC queue or run in an idev shell
Create a commands file and use launcher_creator.py followed by qsub.
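A commands file is plain text with one job per line. The sketch below writes one for the mapping step; the read and reference file names and the thread count are placeholders. The options for launcher_creator.py are not shown here because they depend on your allocation and queue, so check its help output.

```shell
# Write a commands file with one mapping job per line.
# File names and the thread count are placeholders; adjust them to your data.
cat > commands <<'EOF'
bwa aln -t 4 bwa/reference.fasta reads_R1.fastq > bwa/reads_R1.sai
bwa aln -t 4 bwa/reference.fasta reads_R2.fastq > bwa/reads_R2.sai
EOF

# Check the file before handing it to the launcher.
cat commands
```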
Again, take a look at your output directory using ls bwa to see what new files have appeared. What is a *.sai file? It's a file containing "alignment seeds" in a file format specific to BWA. Many programs produce this kind of "intermediate" file in their own format and then at the end have tools for converting things to a "community" format shared by many downstream programs.
We still need to extend these seed matches into alignments of entire reads, choose the best matches, and convert the output to SAM format.
Do we use sampe or samse?
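Since the reads in this lab are paired-end, sampe is the subcommand that applies (samse is the single-end counterpart). A minimal sketch follows: it takes the reference, the two .sai files, and the two FASTQ files in that order, and writes SAM to standard output. The function name pair_to_sam and all file names are placeholders.

```shell
# Combine the two .sai files with their FASTQ files into one SAM file.
# $1 = indexed reference FASTA
# $2, $3 = .sai files for read 1 and read 2
# $4, $5 = FASTQ files for read 1 and read 2
# $6 = output SAM file
pair_to_sam() {
    bwa sampe "$1" "$2" "$3" "$4" "$5" > "$6"
}

# Usage (placeholder file names):
#   pair_to_sam bwa/reference.fasta bwa/reads_R1.sai bwa/reads_R2.sai \
#       reads_R1.fastq reads_R2.fastq bwa/reads.sam
```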
Submit to the TACC queue or run in an idev shell
Create a commands file and use launcher_creator.py followed by qsub.
run BWA pipeline