Assessing Mapping Results I 2014
Objectives
In this lab, you will explore the mapping results generated previously. You will learn how to assess mapping results using samtools and unix commands. You will also look at the differences between mapping with BWA and mapping with Tophat2.
Get your data
To try out exercises, you'll need to access your mapping results.
Get set up for the exercises
# If using your own BWA mapped results (Sam files), they should be at:
$SCRATCH/my_rnaseq_course/bwa_exercise
# If not, you can get the results from:
/corral-repl/utexas/BioITeam/rnaseq_course/bwa_aln_results
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course/bwa_aln_results .Samtools
Utilities for parsing and manipulating alignment files in SAM and BAM formats. It is useful for:
Sorting alignment files
Merging multiple alignment files
Converting from SAM to BAM and vice versa.
Retrieving reads based on different criteria : reads mapping to a particular region, reads mapping with a certain quality, unmapped reads, etc.
Collecting statistics about your mapping result.
With mapping results in SAM format, we need to convert it to BAM format, sort the BAM file and create and index for the BAM file.
Exercise 1: Convert SAM file to BAM format.
samtools view syntax
samtools view -b -S samfile > bamfile
Exercise 2: Sort and index newly created BAM file.
samtools sort syntax
samtools sort bamfile sortedbamfile
samtools index sortedbamfile
WE ARE NOT GOING TO DO THIS RIGHT NOW, BUT TAKE A LOOK!
Submit to the TACC queue or run in an idev shell
Create a commands file and use launcher_creator.py followed by qsub.
Get already created results
#For bwa mem results
cp $SCRATCH/my_rnaseq_course/bwa_exercise/results/bwa_mem/C1_R1.mem.bam* .
# If not, you can get the results from:
/corral-repl/utexas/BioITeam/rnaseq_course/bwa_mem_results
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course/bwa_mem_results .
Exercise 3: Let's get some statistics: Samtools flagstat
Samtools flagstat can generate useful statistics about a mapped BAM file. Let's try it on one of our samples- C1_R1. Let's look at the BAM files for this sample, generated from bwa aln/sampe and generated from bwa mem. The two files are: C1_R1.bam and C1_R1.mem.bam
samtools flagstat command
samtools flagstat C1_R1.bam
samtools flagstat C1_R1.mem.bam
Exercise 4: Let's get some statistics: Samtools idxstats
samtools flagstat command
samtools idxstats C1_R1.bam
samtools idxstats C1_R1.mem.bamRNASEQC
A java program that generates rnaseq specific metrics for an input BAM file. Not available on TACC yet. Metrics like:
• total read number, number of unique reads, and number of duplicate reads
• duplication rate (number of duplicates/total reads)
• number of reads mapped/aligned
• number of unique reads mapped
• number of reads that are mapped to rRNA regions
• intragenic rate (reads mapped to intragenic regions/mapped unique reads)
• exonic rate (reads mapped to exonic regions/mapped unique reads)
• coding rate (reads mapped to coding regions/mapped unique reads)
• intergenic rate (reads mapped to intergenic regions/mapped unique reads)
• strand specificity metrics
• coverage metrics
Images from: http://www.broadinstitute.org/cancer/cga/sites/default/files/data/tools/rnaseqc/RNA-SeQC_Help_v1.1.2.pdf
BACK TO COURSE OUTLINE