...
Expand | |||||
---|---|---|---|---|---|
| |||||
Here's one way to do it:
|
Filtering high-quality reads
About mapping quality
Mapping qualities are a measure of how likely a given sequence alignment to its reported location is correct. If a read's mapping quality is low (especially if it is zero, or mapQ 0 for short) the read maps to multiple locations on the genome (they are multi-hit reads), and we can't be sure whether the reported location is the correct one.
...
- bwa aln (global alignment) and bowtie2 with default parameters (both --local and --global) report at most one location for a read that maps
- this will be the location with the best mapping quality and alignment
- if a read maps equally well to multiple locations, these aligners pick one location at random
- bwa aln will always report a 0 mapping quality for these multi-hit reads
- bowtie2 will report a low mapping quality (< 10), based on the complexity (information content) of the sequence
- bwa mem (local alignment) can always report more than one location for a mapped read
- for example, if one part of a read maps to one location and other part maps somewhere else (e.g. because of RNA splicing)
- there is no way to disable reporting of secondary reads with bwa mem.
- bowtie2 can be configured to report more than one location for a mapped read
- the -k N option says to report up to N alignments for a read
- most transcriptome-aware RNA-seq aligners by defaultĀ can report more than one location for a mapped read
- e.g. tophat2, hisat2, star.
...
Filtering high-quality
...
reads
Using our yeast_pe.sort.bam file, let's do some some quality filtering.
...
Expand | |||||||
---|---|---|---|---|---|---|---|
| |||||||
|
Exercise: How many records are in the filtered BAM compared to the original? How many read
Expand | ||
---|---|---|
| ||
samtools view -c |
Expand | |||||||
---|---|---|---|---|---|---|---|
| |||||||
There were 1184360 alignment records in the original BAM, and only 456890 in the quality-filtered BAM, around 38% of our starting reads. |
Exercise: If our original BAM contained secondary reads, (0x100 = 1) how would we exclude those also?
...