Objectives
...
| Code Block |
|---|
cds
cd my_rnaseq_course
cd day_1_partB2/bwa_exercise
|
Let's look at the data files and reference files.
| Code Block |
|---|
ls ../data
ls ../reference
#transcriptome
head ../reference/transcripts.fasta
#see how many transcripts there are in the file
grep -c '^>' ../reference/transcripts.fasta
#genome
head ../reference/genome.fa
#see how many sequences there are in the file
grep -c '^>' ../reference/genome.fa
#annotation
head ../reference/genes.formatted.gtf
#see how many entries there are in this file
wc -l ../reference/genes.formatted.gtf
|
Run BWA
Load the module:
| Code Block |
|---|
module load biocontainers
module load bwa
|
There are multiple versions of BWA on TACC, so you might want to check which one you have loaded for when you write up your awesome publication that was made possible by your analysis of next-gen sequencing data.
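One way to record the version is to read it from bwa's own usage message, which includes a "Version:" line when bwa is run without arguments. The sketch below assumes the modules above are already loaded; the helper name `bwa_version` is made up for this example.

```shell
# Print only the version string from bwa's usage message.
# bwa writes its usage text (including a "Version:" line) to stderr when
# run without arguments, so stderr is redirected into the pipe.
bwa_version() {
    bwa 2>&1 | awk '/^Version:/ {print $2}'
}

if command -v bwa >/dev/null 2>&1; then
    echo "Loaded BWA version: $(bwa_version)"
else
    echo "bwa is not on the PATH; load the modules first" >&2
fi
```

Running `module list` should also show which bwa module is currently loaded.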
...
| Warning |
|---|
Create a commands file and make sure each command is on a single line in it.
|
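As an illustration of the one-command-per-line rule, the sketch below writes a small commands file and checks that every line starts with the program name. The file name `commands.bwa` and the sample FASTQ names are hypothetical; substitute the files listed by `ls ../data`. It also assumes the reference has already been indexed with `bwa index`.

```shell
# Work in a scratch directory so nothing in the exercise folder is overwritten.
workdir=$(mktemp -d)
cmdfile="$workdir/commands.bwa"   # hypothetical file name

# One complete bwa command per line; sample names are placeholders.
cat > "$cmdfile" <<'EOF'
bwa mem ../reference/transcripts.fasta ../data/sample1_R1.fastq ../data/sample1_R2.fastq > sample1.transcriptome.sam
bwa mem ../reference/transcripts.fasta ../data/sample2_R1.fastq ../data/sample2_R2.fastq > sample2.transcriptome.sam
EOF

# A command that was accidentally wrapped onto a second line would produce
# a non-empty line that does not start with "bwa ", so the counts would differ.
n_total=$(grep -c . "$cmdfile")
n_bwa=$(grep -c '^bwa ' "$cmdfile")
echo "non-empty lines: $n_total, lines starting with 'bwa': $n_bwa"
```

If the two counts differ, a command has probably been split across lines.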
Since this will take a while to run, you can look at already generated results at: bwa_mem_results_transcriptome
Alternatively, we can also use bwa to map to the genome (reference/genome.fa). Those already generated results are at: bwa_mem_results_genome
Help! I have a large number of reads. Make BWA go faster!
- Use the threading option in the bwa command (bwa mem -t <number of threads>)
- Split one data file into smaller chunks and run multiple instances of bwa. Finally concatenate the output.
- WAIT! We have a pipeline for that!
- Look for runBWA.sh in $BI/bin (it should be in your path)
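The splitting step can be sketched on a toy FASTQ file. This is only an illustration and not what runBWA.sh does internally. The point it shows is that the chunk size must be a multiple of 4 lines so that no FASTQ record is cut in half. The file and chunk names are made up; for paired-end data, both files would need to be split with the same chunk size.

```shell
workdir=$(mktemp -d)

# Toy FASTQ with 5 reads (4 lines per read), standing in for a real data file.
for i in 1 2 3 4 5; do
    printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
done > "$workdir/reads.fastq"

# Split into chunks of 2 reads each (8 lines per chunk).
reads_per_chunk=2
split -l $((reads_per_chunk * 4)) "$workdir/reads.fastq" "$workdir/chunk_"

n_chunks=$(ls "$workdir"/chunk_* | wc -l | tr -d ' ')
echo "chunks written: $n_chunks"

# Sanity check: concatenating the chunks in order reproduces the input.
cat "$workdir"/chunk_* > "$workdir/rejoined.fastq"
if cmp -s "$workdir/reads.fastq" "$workdir/rejoined.fastq"; then
    rejoined_ok=yes
else
    rejoined_ok=no
fi
echo "chunks rejoin to the original: $rejoined_ok"

# Each chunk would then be mapped separately, for example:
#   bwa mem ../reference/genome.fa chunk_aa > chunk_aa.sam
# When combining the per-chunk SAM files, keep the header lines (starting
# with '@') from only one of them.
```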
Now that we are done mapping, let's look at how to assess mapping results.