Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Expand
titleAnswer

Here's one way to do it:

Code Block
languagebash
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000 | awk '{$2 = sprintf("0x%x", $2); print $2}' | grep -c '3$'

Filtering high-quality reads

About mapping quality

Mapping qualities are a measure of how likely a given sequence alignment to its reported location is correct. If a read's mapping quality is low (especially if it is zero, or mapQ 0 for short) the read maps to multiple locations on the genome (they are multi-hit reads), and we can't be sure whether the reported location is the correct one.

...

  • bwa aln (global alignment) and bowtie2 with default parameters (both --local and --global) report at most one location for a read that maps
    • this will be the location with the best mapping quality and alignment
    • if a read maps equally well to multiple locations, these aligners pick one location at random
      • bwa aln will always report a 0 mapping quality for these multi-hit reads
      • bowtie2 will report a low mapping quality (< 10), based on the complexity (information content) of the sequence
  • bwa mem (local alignment) can always report more than one location for a mapped read
    • for example, if one part of a read maps to one location and other part maps somewhere else (e.g. because of RNA splicing)
    • there is no way to disable reporting of secondary reads with bwa mem.
  • bowtie2 can be configured to report more than one location for a mapped read
    • the -k N option says to report up to N alignments for a read
  • most transcriptome-aware RNA-seq aligners by defaultĀ can report more than one location for a mapped read
    • e.g. tophat2, hisat2, star.

...

Filtering high-quality

...

reads

Using our yeast_pe.sort.bam file, let's do some some quality filtering.

...

Expand
titleSolution
Code Block
languagebash
titlesolution
# if needed...
cd $SCRATCH/core_ngs/alignment/samtools
module load samtools

samtools view -b -F 0x04 -f 0x2 -q 20 -o yeast_pe.sort.filt.bam yeast_pe.sort.bam

Exercise: How many records are in the filtered BAM compared to the original? How many read

Expand
titleHint

samtools view -c


Expand
titleAnswer
Code Block
languagebash
titlesolution
samtools view -c yeast_pe.sort.bam
samtools view -c yeast_pe.sort.filt.bam

There were 1184360 alignment records in the original BAM, and only 456890 in the quality-filtered BAM, around 38% of our starting reads.

Exercise: If our original BAM contained secondary reads, (0x100 = 1) how would we exclude those also?

...