In this lab, you will explore a popular fast mapper called BWA. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). The objectives of this lab are mainly to:
BWA (the Burrows-Wheeler Aligner) is a fast short-read aligner. It is an unspliced mapper. It is the successor to MAQ (Mapping and Assembly with Quality), another aligner you may have used or heard of. As the name suggests, it uses the Burrows-Wheeler transform (BWT) to perform alignment in a time- and memory-efficient manner.
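To make the transform less abstract, here is a toy bash sketch (not BWA's implementation) that computes the BWT of a short string by sorting all of its rotations and keeping the last character of each. BWA builds an FM-index on top of this transform so it can search the reference without scanning it; that indexing and search step is not shown. The function name bwt and the '$' end marker are our own choices for illustration.

```bash
#!/usr/bin/env bash
# Toy Burrows-Wheeler transform: sort every rotation of the input and
# keep the last character of each sorted rotation.
# Assumes the input already ends with a unique end marker ('$' here),
# which sorts before A/C/G/T under the C locale.
bwt() {
  local s="$1"
  local n=${#s}
  local i rot out=""
  # Generate all n rotations, sort them byte-wise, then collect the
  # last column of the sorted rotation matrix.
  while IFS= read -r rot; do
    out+="${rot: -1}"
  done < <(for ((i = 0; i < n; i++)); do printf '%s\n' "${s:i}${s:0:i}"; done | LC_ALL=C sort)
  printf '%s\n' "$out"
}

bwt 'ACAACG$'   # prints GC$AAAC
```

Runs of identical characters in the output (the "AAA" above) are what make the BWT compress well, which is part of why BWA's index stays small.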
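BWA has three different algorithms: BWA-backtrack (bwa aln, followed by bwa samse or bwa sampe), designed for Illumina reads up to 100 bp; BWA-SW (bwa bwasw); and BWA-MEM (bwa mem). The latter two are designed for longer sequences, from about 70 bp up to 1 Mbp, and BWA-MEM is generally recommended for high-quality reads because it is faster and more accurate. This lab uses bwa mem.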
Six raw data files have been provided for all our further RNA-seq analysis:
```bash
cds                           # TACC shortcut for: cd $SCRATCH
cd my_rnaseq_course
cd day_1_partB/bwa_exercise
```
Let's look at the data files and the reference files:
```bash
ls ../data
ls ../reference

# transcriptome
head ../reference/transcripts.fasta
# see how many transcripts there are in the file
grep -c '^>' ../reference/transcripts.fasta

# genome
head ../reference/genome.fa
# see how many sequences there are in the file
grep -c '^>' ../reference/genome.fa

# annotation
head ../reference/genes.formatted.gtf
# see how many entries there are in this file
wc -l ../reference/genes.formatted.gtf
```
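The same kind of quick check works for the reads. Assuming the files in ../data are uncompressed FASTQ with four lines per record (if they are gzipped, feed them through zcat instead), this small awk-based sketch reports how many reads a file holds and which read lengths occur, so you can confirm the 75 bp read length. The function name fastq_summary is hypothetical.

```bash
#!/usr/bin/env bash
# Summarise an uncompressed FASTQ file: number of reads and the read
# lengths observed. Assumes standard 4-line records
# (header, sequence, '+', qualities), so sequences sit on lines 2, 6, 10, ...
fastq_summary() {
  awk 'NR % 4 == 2 { n++; len[length($0)]++ }
       END {
         printf "reads\t%d\n", n
         for (l in len) printf "length_%d\t%d\n", l, len[l]
       }' "$1"
}

# Example (the *.fastq pattern is an assumption; check the names with ls ../data):
# for f in ../data/*.fastq; do echo "== $f"; fastq_summary "$f"; done
```

The length lines come out in no particular order; pipe through sort if you want them ordered.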
```bash
module load bwa
```
There are multiple versions of BWA on TACC, so check which one you have loaded: you will need the version number when you write up the awesome publication made possible by your analysis of next-gen sequencing data.
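One way to record the version is sketched below. On TACC, module list (Lmod) shows what is loaded, and running bwa with no arguments prints a usage message to stderr that includes a "Version:" line. The helper names parse_bwa_version and bwa_version are our own, hypothetical choices; they simply pull that line out.

```bash
#!/usr/bin/env bash
# Extract the version string from BWA's usage message.
# `bwa` with no arguments prints a header such as
#   Program: bwa (alignment via Burrows-Wheeler transformation)
#   Version: 0.7.x-rNNNN
# on stderr, so stderr is merged into the pipe below.
parse_bwa_version() {
  awk '/^Version:/ { print $2; exit }'
}

bwa_version() {
  bwa 2>&1 | parse_bwa_version
}

# Usage on TACC (commented out so the sketch runs anywhere):
# module list              # shows the loaded bwa module and its version
# module spider bwa        # lists the bwa versions installed on the system
# echo "BWA version: $(bwa_version)"
```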
You can see the different commands available under the bwa package from the command line help:
```bash
bwa    # with no arguments, prints the usage message and the list of subcommands
```
Part 1. Create an index of your reference
NO NEED TO RUN THIS NOW. YOUR INDEX HAS ALREADY BEEN BUILT!
All the files starting with the prefix transcripts.fasta (transcripts.fasta.amb, .ann, .bwt, .pac and .sa) are your BWA index files.
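```bash
# -a bwtsw selects the BWT construction algorithm intended for large references.
# The index files are written next to the FASTA file, using it as the prefix.
bwa index -a bwtsw ../reference/transcripts.fasta
```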
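Before aligning, it can be worth confirming that the index is complete. A minimal sketch, assuming the five files that bwa index writes next to the reference (.amb, .ann, .bwt, .pac, .sa); check_bwa_index is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Check that every file of a BWA index exists and is non-empty.
# Argument: the index prefix, i.e. the path that was passed to `bwa index`.
# Returns 0 if all five files are present, 1 otherwise.
check_bwa_index() {
  local prefix="$1"
  local ext status=0
  for ext in amb ann bwt pac sa; do
    if [[ ! -s "${prefix}.${ext}" ]]; then
      echo "missing or empty: ${prefix}.${ext}" >&2
      status=1
    fi
  done
  return "$status"
}

# Example:
# check_bwa_index ../reference/transcripts.fasta && echo "index looks complete"
```

Reporting every missing file instead of stopping at the first one makes a partially built index easy to spot.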
Part 2. Align the samples to the reference using bwa mem
Create a
Since this will take a while to run, you can look at already generated results at: bwa_mem_results_transcriptome
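Alternatively, we can also use bwa to map the reads to the genome (reference/genome.fa). Because BWA is unspliced, reads that span an exon-exon junction cannot align end to end against the genome; bwa mem will usually soft-clip them or report them as split (supplementary) alignments. Those already generated results are at: bwa_mem_results_genome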
Use the threading option of bwa mem (bwa mem -t <number of threads>) to speed up the alignment.
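If you do want to run the alignments yourself, one option is to write one bwa mem command per sample into a commands file and submit that file as a batch job. The sketch below only prints the commands. It assumes paired files named <sample>_R1.fastq and <sample>_R2.fastq (hypothetical; check the real names in ../data), a reference that has already been indexed, and 4 threads per alignment by default; the function name make_bwa_commands is ours.

```bash
#!/usr/bin/env bash
# Print one `bwa mem` command per paired-end sample.
# Arguments:
#   $1  index prefix (e.g. ../reference/transcripts.fasta)
#   $2  directory containing the FASTQ files
#   $3  output directory for the SAM files
#   $4  threads per alignment (optional, default 4)
# Assumes read files are named <sample>_R1.fastq / <sample>_R2.fastq.
make_bwa_commands() {
  local ref="$1" fq_dir="$2" out_dir="$3"
  local threads="${4:-4}"
  local r1 r2 sample
  for r1 in "$fq_dir"/*_R1.fastq; do
    [[ -e "$r1" ]] || continue            # glob matched nothing
    sample=$(basename "$r1" _R1.fastq)
    r2="$fq_dir/${sample}_R2.fastq"
    # bwa mem writes SAM to stdout; -t sets the number of threads
    printf 'bwa mem -t %s %s %s %s > %s/%s.sam\n' \
      "$threads" "$ref" "$r1" "$r2" "$out_dir" "$sample"
  done
}

# Example (output directory names are illustrative; genome.fa must be indexed too):
# make_bwa_commands ../reference/transcripts.fasta ../data my_bwa_transcriptome 4 > commands.txt
# make_bwa_commands ../reference/genome.fa ../data my_bwa_genome 4 >> commands.txt
```

Printing commands rather than running them keeps the sketch safe to try on a login node and lets you inspect the file before submitting it.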
Now that we are done mapping, let's look at how to assess mapping results.
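samtools flagstat is the usual tool for a first summary. To show what such a summary is built from, here is a dependency-free awk sketch that counts primary alignment records in a SAM file and how many of them are mapped, using FLAG bits from the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The function name sam_mapping_rate is hypothetical, and its numbers are not meant to reproduce flagstat's output line for line.

```bash
#!/usr/bin/env bash
# Count primary records in a SAM file and how many of them are mapped.
# FLAG bits (SAM spec): 0x4 = unmapped, 0x100 = secondary,
# 0x800 = supplementary. Header lines start with '@' and are skipped.
# For paired-end data each mate is its own record, so counts are per read,
# not per fragment. Bits are tested with integer division so this works
# in any POSIX awk (no bitwise and() needed).
sam_mapping_rate() {
  awk -F'\t' '
    /^@/ { next }
    {
      flag = $2 + 0
      if (int(flag / 256) % 2 || int(flag / 2048) % 2) next   # skip secondary/supplementary
      total++
      if (int(flag / 4) % 2 == 0) mapped++
    }
    END {
      pct = total ? 100 * mapped / total : 0
      printf "primary\t%d\nmapped\t%d\npercent\t%.2f\n", total, mapped, pct
    }' "$1"
}

# Example (file name is illustrative; pick any SAM file from the results folders):
# sam_mapping_rate bwa_mem_results_transcriptome/<sample>.sam
```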