Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
languagebash
# if needed...
cd $SCRATCH/core_ngs/alignment/samtools
module load samtools

# count the number of reads mapped to chromosome 2 (chrII)
samtools view -c -F 0x4 yeast_pe.sort.bam chrII

# count the number of reads mapped to chromosomes 1 or M (chrI, chrM)
samtools view -c -F 0x4 yeast_pe.sort.bam chrI chrM

# count the number of reads mapped to chromosomes 1 that overlap coordinates 1000-2000
samtools view -c -F 0x4 yeast_pe.sort.bam chrI:1000-2000

# since there are only 20 reads in the chrI:1000-2000 region, examine them individually
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000

# look at a subset of field for chrI:1000-2000 reads
#   2=flags, 3=contig, 4=start, 5=mapping quality, 6=CIGAR, 9=insert size
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000 | cut -f 2-6,9

Since you will find yourself wanting to interpret the flags field, and it's easier to do that when it is represented as hexadecimal, let's use awk to help do that. 

Code Block
languagebash
# look at a subset of field for chrI:1000-2000 reads
#   2=flags, 3=contig, 4=start, 5=mapping quality, 6=CIGAR, 9=insert size
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000 | cut -f 2-6,9 | awk '{$1 = sprintf("0x%x", $1); print}'

Notes:

  • There is a sprintf (string print formatted) function in awk, just as there is in many modern programming languages.
  • We use sprintf to re-format the 1st input field (here the cut flags) in hex (%x), also adding a literal "0x" prefix to denote the base.
  • The

...

  • results of sprintf are stored back into the 1st field ($1), replacing the original value.
  • awk's print statement is then used with no other arguments to print all its input fields.

Exercise: How many of the chrI:1000-2000 alignments are from properly paired mapped reads?

Expand
titleHint 1

Properly paired reads have the 0x2 flag set (1). All our reads also have the 0x1 flag set because they are paired-end reads. Mapped reads will have the 0x4 flag cleared (0), and properly paired mapped reads will have the 0x8 flag cleared (0) as well (mate not unmapped = mate mapped). So all mapped proper pairs will end in 0x3 = binary 0011.

Expand
titleHint 2
Using '$' in a grep pattern anchors the pattern to the end of the line (only patterns matching before EOL are printed).
Expand
titleAnswer

Here's one way to do it:

Code Block
languagebash
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000 | awk '{$2 = sprintf("0x%x", $2); print $2}' | grep -c '3$'

Filtering high quality reads

...