Objectives

...

Code Block
titleGet set up for the exercises
cds
cd my_rnaseq_course
cd day_1_partB2/bwa_exercise

Let's look at the data files and reference files.

Code Block
titleLook at the data and reference files
ls ../data
ls ../reference
 
#transcriptome
head ../reference/transcripts.fasta 
#see how many transcripts there are in the file
grep -c '^>' ../reference/transcripts.fasta
 
#genome
head ../reference/genome.fa
#see how many sequences there are in the file
grep -c '^>' ../reference/genome.fa
 
 
#annotation
head ../reference/genes.formatted.gtf
#see how many entries there are in this file
wc -l ../reference/genes.formatted.gtf

Run BWA

Load the module:

Code Block
module load biocontainers
module load bwa

There are multiple versions of BWA on TACC, so check which one you have loaded. You will need to report the version when you write up your awesome publication that was made possible by your analysis of next-gen sequencing data.

...


Warning
titleSubmit to the TACC queue or run in an idev shell


Create a commands file and use launcher_creator.py followed by sbatch.

Make sure each command is one line in your commands file.

Code Block
titlePut this in your commands file
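# At the prompt (not in the commands file): open the file in an editor
nano commands.mem
 
# Contents of commands.mem, one command per line: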
bwa mem ../reference/transcripts.fasta ../data/GSM794483_C1_R1_1.fq ../data/GSM794483_C1_R1_2.fq > C1_R1.mem.sam
bwa mem ../reference/transcripts.fasta ../data/GSM794484_C1_R2_1.fq ../data/GSM794484_C1_R2_2.fq > C1_R2.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794485_C1_R3_1.fq ../data/GSM794485_C1_R3_2.fq > C1_R3.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794486_C2_R1_1.fq ../data/GSM794486_C2_R1_2.fq > C2_R1.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794487_C2_R2_1.fq ../data/GSM794487_C2_R2_2.fq > C2_R2.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794488_C2_R3_1.fq ../data/GSM794488_C2_R3_2.fq > C2_R3.mem.sam
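
The six commands above differ only in the sample name, so the commands file can also be generated with a loop instead of being typed by hand. The following is a minimal sketch, assuming the sample prefixes and paths are exactly those used in this exercise. The loop variable `s` and the helper variable `out` are just illustrative names.

```shell
# Sketch: build commands.mem with one bwa mem command per sample.
# Start with an empty commands file.
: > commands.mem

# Sample prefixes copied from the file names used in this exercise.
for s in GSM794483_C1_R1 GSM794484_C1_R2 GSM794485_C1_R3 \
         GSM794486_C2_R1 GSM794487_C2_R2 GSM794488_C2_R3; do
    # Drop the GSM accession for the output name, e.g. GSM794483_C1_R1 -> C1_R1
    out="${s#*_}"
    # echo writes the whole command as a single line of text; nothing is run here
    echo "bwa mem ../reference/transcripts.fasta ../data/${s}_1.fq ../data/${s}_2.fq > ${out}.mem.sam" >> commands.mem
done

# Check that there is one line per sample
wc -l < commands.mem
```

Looking at the file with `cat commands.mem` before creating the launcher is a quick way to confirm that each command sits on its own line.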

Use this launcher_creator.py command:

launcher_creator.py -n mem -t 04:00:00 -j commands.mem -q normal -a UT-2015-05-18 -m "module load biocontainers; module load bwa" -l bwa_mem_launcher.slurm

Use sbatch to submit your job to the queue:

sbatch --reservation=BIO_DATA_week_1 bwa_mem_launcher.slurm


# or, if the reservation is giving us issues

sbatch bwa_mem_launcher.slurm

Since this will take a while to run, you can look at already generated results at: bwa_mem_results_transcriptome

Alternatively, we can also use bwa to map to the genome (reference/genome.fa). Those already generated results are at: bwa_mem_results_genome

 Help! I have a large number of reads. Make BWA go faster!

  • Use the threading option of the bwa subcommand (bwa mem -t <number of threads>)

  • Split one data file into smaller chunks and run multiple instances of bwa. Finally concatenate the output.
    • WAIT! We have a pipeline for that!
    • Look for runBWA.sh in $BI/bin  (it should be in your path)
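
The split-and-concatenate idea can be sketched on toy data. This is only an illustration under stated assumptions, not what runBWA.sh does: the file names (`toy_1.fq`, `chunk_*`, `merged.sam`) and the chunk size are hypothetical, the bwa step is shown as a comment rather than run, and stand-in SAM files take the place of real alignments. Two details matter: both paired files are split with the same line count, a multiple of 4, so reads stay whole and mates stay in matching chunks; and when merging, the SAM header lines (those starting with `@`) are kept from one chunk only.

```shell
# Work in a scratch directory so nothing in the exercise folder is touched
work=$(mktemp -d)
cd "$work"

# Toy paired FASTQ files: 6 reads each, 4 lines per read
for i in 1 2 3 4 5 6; do
    printf '@read%s/1\nACGT\n+\nIIII\n' "$i" >> toy_1.fq
    printf '@read%s/2\nTGCA\n+\nIIII\n' "$i" >> toy_2.fq
done

# Split both files with the same line count (8 lines = 2 reads per chunk)
split -l 8 toy_1.fq chunk_1_
split -l 8 toy_2.fq chunk_2_

# Each chunk pair would then be mapped separately, for example (not run here):
#   bwa mem ../reference/transcripts.fasta chunk_1_aa chunk_2_aa > chunk_aa.sam

# Stand-in SAM files (one header line, one record) in place of real bwa output
for c in aa ab ac; do
    printf '@SQ\tSN:toy\tLN:100\n' > "chunk_$c.sam"
    printf 'read_%s\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' "$c" >> "chunk_$c.sam"
done

# Merge: header from the first chunk only, then the records from every chunk
grep '^@' chunk_aa.sam > merged.sam
cat chunk_*.sam | grep -v '^@' >> merged.sam
```

With real data, a much larger chunk size (still a multiple of 4) would be used, and each per-chunk bwa command would go on its own line of a commands file.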

Now that we are done mapping, let's look at how to assess mapping results.