Objectives
In this lab, you will explore BWA, a popular fast mapper. Simulated RNA-seq data will be provided to you; the data consist of 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulate two biological groups with three biological replicates per group (6 samples total). The main objectives of this lab are to:
...
| Code Block |
|---|
| title | Get set up for the exercises |
|---|
|
ls ../data
ls ../reference
#transcriptome
head ../reference/transcripts.fasta
#see how many transcripts there are in the file
grep -c '^>' ../reference/transcripts.fasta
#genome
head ../reference/genome.fa
#see how many sequences there are in the file
grep -c '^>' ../reference/genome.fa
#annotation
head ../reference/genes.formatted.gtf
#see how many entries there are in this file
wc -l ../reference/genes.formatted.gtf |
Run BWA
Load the module:
| Code Block |
|---|
module load intel/17.0.4
module load bwa
|
You can see the different commands available under the bwa package from the command line help:
...
| Warning |
|---|
| title | Submit to the TACC queue or run in idev session |
|---|
|
Create a commands file and use launcher_creator.py followed by sbatch. Make sure each command is one line in your commands file.
| Code Block |
|---|
| title | Put this in your commands file |
|---|
| nano commands.mem
#Enter these lines into the file
bwa mem ../reference/transcripts.fasta ../data/GSM794483_C1_R1_1.fq ../data/GSM794483_C1_R1_2.fq > C1_R1.mem.sam
bwa mem ../reference/transcripts.fasta ../data/GSM794484_C1_R2_1.fq ../data/GSM794484_C1_R2_2.fq > C1_R2.mem.sam
bwa mem ../reference/transcripts.fasta ../data/GSM794485_C1_R3_1.fq ../data/GSM794485_C1_R3_2.fq > C1_R3.mem.sam
bwa mem ../reference/transcripts.fasta ../data/GSM794486_C2_R1_1.fq ../data/GSM794486_C2_R1_2.fq > C2_R1.mem.sam
bwa mem ../reference/transcripts.fasta ../data/GSM794487_C2_R2_1.fq ../data/GSM794487_C2_R2_2.fq > C2_R2.mem.sam
bwa mem ../reference/transcripts.fasta ../data/GSM794488_C2_R3_1.fq ../data/GSM794488_C2_R3_2.fq > C2_R3.mem.sam |
| Expand |
|---|
| title | Use this Launcher_creator command |
|---|
| launcher_creator.py -n mem -t 04:00:00 -j commands.mem -q normal -a UT-2015-05-18 -m "module load intel/17.0.4;module load bwa" -l bwa_mem_launcher.slurm |
| Expand |
|---|
| title | Use sbatch to submit your job to the queue |
|---|
| sbatch --reservation=RNAday2 bwa_mem_launcher.slurm
#or, if the reservation is giving us issues
sbatch bwa_mem_launcher.slurm |
|
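The commands in the commands file differ only in the sample name, so the file can also be generated with a short loop instead of being typed by hand. The following is a minimal sketch, assuming the sample names and paths used in the commands above; it writes one complete command per line, as the launcher requires.

```shell
# Sketch: write one bwa mem command per sample to the commands file.
# Sample names and paths are copied from the commands listed above.
: > commands.mem   # start with an empty commands file
for sample in GSM794483_C1_R1 GSM794484_C1_R2 GSM794485_C1_R3 \
              GSM794486_C2_R1 GSM794487_C2_R2 GSM794488_C2_R3; do
    name=${sample#*_}   # strip the accession prefix, leaving e.g. C1_R1
    echo "bwa mem ../reference/transcripts.fasta ../data/${sample}_1.fq ../data/${sample}_2.fq > ${name}.mem.sam" >> commands.mem
done
# Check that the file has one line per sample
wc -l commands.mem
```

The loop only writes text to commands.mem; nothing is mapped until the file is passed to launcher_creator.py and submitted with sbatch.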
Since this will take a while to run, you can look at already generated results at: bwa_mem_results_transcriptome
Alternatively, we can also use bwa to map to the genome (reference/genome.fa).
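A genome-mapping command can be derived from a transcriptome command by swapping the reference path. The sketch below rewrites one command line with sed. It assumes a BWA index already exists for ../reference/genome.fa (this lab does not state whether one does), and the .mem.genome.sam output suffix is a hypothetical name chosen only to avoid overwriting the transcriptome results.

```shell
# One transcriptome command, used here as example input.
line='bwa mem ../reference/transcripts.fasta ../data/GSM794483_C1_R1_1.fq ../data/GSM794483_C1_R1_2.fq > C1_R1.mem.sam'

# Swap the reference for the genome and rename the output file.
# The .mem.genome.sam suffix is an illustrative assumption.
genome_line=$(printf '%s\n' "$line" | sed \
    -e 's#\.\./reference/transcripts\.fasta#../reference/genome.fa#' \
    -e 's#\.mem\.sam$#.mem.genome.sam#')

# Print the rewritten command; it is not executed here.
printf '%s\n' "$genome_line"
```

The same two sed expressions can be applied to the whole commands file to produce a second commands file for the genome run.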
Now that we are done mapping, let's look at how to assess the mapping results.