Content Comparison

Objectives

In this lab, you will explore a popular fast mapper called BWA. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). The objectives of this lab is mainly to:

...

Code Block

title	Get set up for the exercises

ls ../data
ls ../reference
 
#transcriptome
head ../reference/transcripts.fasta 
#see how many transcripts there are in the file
grep -c '^>' ../reference/transcripts.fasta
 
#genome
head ../reference/genome.fa
#see how many sequences there are in the file
grep -c '^>' ../reference/genome.fa
 
 
#annotation
head ../reference/genes.formatted.gtf
#see how many entries there are in this file
wc -l ../reference/genes.formatted.gtf

Load the module:Run BWA

Code Block
module load intel/17.0.4 module load bwa

...

Running alignment using the newest and greatest, BWA MEM to the transcriptome. Alignment is just one single step with bwa mem. Warning

title	Option 1: Run in an idev session

Run all the bwa commands in the background (using the &) in the idev session.

Code Block

title	Put this in your commands file

bwa mem ../reference/transcripts.fasta ../data/GSM794483_C1_R1_1.fq ../data/GSM794483_C1_R1_2.fq > C1_R1.mem.sam & bwa mem ../reference/transcripts.fasta ../data/GSM794484_C1_R2_1.fq ../data/GSM794484_C1_R2_2.fq > C1_R2.mem.sam & bwa mem ../reference/transcripts.fasta ../data/GSM794485_C1_R3_1.fq ../data/GSM794485_C1_R3_2.fq > C1_R3.mem.sam & bwa mem ../reference/transcripts.fasta ../data/GSM794486_C2_R1_1.fq ../data/GSM794486_C2_R1_2.fq > C2_R1.mem.sam & bwa mem ../reference/transcripts.fasta ../data/GSM794487_C2_R2_1.fq ../data/GSM794487_C2_R2_2.fq > C2_R2.mem.sam & bwa mem ../reference/transcripts.fasta ../data/GSM794488_C2_R3_1.fq ../data/GSM794488_C2_R3_2.fq > C2_R3.mem.sam &

Warning

title	Option 2: Submit to the TACC queue or run in idev session

Create a commands file and use launcher_creator.py followed by sbatch.

Make sure each command is one line in your commands file.

Code Block

title	Put this in your commands file

nano commands.mem
 
bwa mem ../reference/transcripts.fasta ../data/GSM794483_C1_R1_1.fq ../data/GSM794483_C1_R1_2.fq > C1_R1.mem.sam
bwa mem ../reference/transcripts.fasta ../data/GSM794484_C1_R2_1.fq ../data/GSM794484_C1_R2_2.fq > C1_R2.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794485_C1_R3_1.fq ../data/GSM794485_C1_R3_2.fq > C1_R3.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794486_C2_R1_1.fq ../data/GSM794486_C2_R1_2.fq > C2_R1.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794487_C2_R2_1.fq ../data/GSM794487_C2_R2_2.fq > C2_R2.mem.sam 
bwa mem ../reference/transcripts.fasta ../data/GSM794488_C2_R3_1.fq ../data/GSM794488_C2_R3_2.fq > C2_R3.mem.sam

Expand

title	Use this Launcher_creator command

launcher_creator.py -n mem -t 04:00:00 -j commands.mem -q normal -a UT-2015-05-18 -m "module load biocontainersintel/17.0.1.04;module load bwa/ctr-v0.7.12_cv2bwa" -l bwa_mem_launcher.slurm

Expand

title	Use sbatch to submit your job to the queue

sbatch --reservation=BIO_DATA_week_1 bwa_mem_launcher.slurm

#or if reservation is giving us issues

sbatch bwa_mem_launcher.slurm

Since this will take a while to run, you can look at already generated results at: bwa_mem_results_transcriptome

Alternatively, we can also use bwa to make to the genome (reference/genome.fa). Those already generated results are at: bwa_mem_results_genome

Now that we are done mapping, lets look at how to assess mapping results.

Version	Old Version 28	New Version 29
Changes made by	Dhivya Arasappan	Dhivya Arasappan
Saved on	Jun 07, 2021	Jun 07, 2021

Versions Compared

Key

Objectives

Load the module:Run BWA