Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Objectives

...

Code Block
titleGet set up for the exercises
ls ../data
ls ../reference
 
#transcriptome
head ../reference/transcripts.fasta 
#see how many transcripts there are in the file
grep -c '^>' ../reference/transcripts.fasta
 
#genome
head ../reference/genome.fa
#see how many sequences there are in the file
grep -c '^>' ../reference/genome.fa
 
 
#annotation
head ../reference/genes.formatted.gtf
#see how many entries there are in this file
wc -l ../reference/genes.formatted.gtf

Load the module:Run BWA

Code Block
module load bwa

...


Warning
titleSubmit to the TACC queue or run in an idev shell

Create a commands file and use launcher_creator.py followed by sbatch.

Code Block
titlePut this in your commands file
nano commands.mem
 
bwa mem ../reference/transcripts.fasta ../data/GSM794483_C1_R1_1.fq ../data/GSM794483_C1_R1_2.fq > C1_R1.mem.sam
bwa mem ../reference/transcripts.fasta ../data/GSM794484_C1_R2_1.fq ../data/GSM794484_C1_R2_2.fq > C1_R2.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794485_C1_R3_1.fq ../data/GSM794485_C1_R3_2.fq > C1_R3.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794486_C2_R1_1.fq ../data/GSM794486_C2_R1_2.fq > C2_R1.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794487_C2_R2_1.fq ../data/GSM794487_C2_R2_2.fq > C2_R2.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794488_C2_R3_1.fq ../data/GSM794488_C2_R3_2.fq > C2_R3.mem.sam
Expand
titleUse this Launcher_creator command

launcher_creator.py -n mem -t 04:00:00 -j commands.mem -q normal -a UT-2015-05-18 -m "module load bwa" -l bwa_mem_launcher.slurm

Expand
titleUse sbatch to submit your job to the queue

sbatch --reservation=CCBB _5.23.17PM bwa_mem_launcher.slurm

Since this will take a while to run, you can look at already generated results at: bwa_mem_results_transcriptome

Alternatively, we can also use bwa to make to the genome (reference/genome.fa). Those already generated results are at: bwa_mem_results_genome

 Help! I have a lots of reads and a large number of reads. Make BWA go faster!

  • Use threading option in the bwa command ( bwa -t <number of threads>)

  • Split one data file into smaller chunks and run multiple instances of bwa. Finally concatenate the output.
    • WAIT! We have a pipeline for that!
    • Look for runBWA.sh in $BI/bin  (it should be in your path)

Now that we are done mapping, lets look at how to assess mapping results.