Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

In addition to the rate, converted to a percentage, we also output the literal percent sign ( % ). The double quotes ( " ) denote a literal string in awk, and the comma between the number and the string says to separate the two fields with whatever the default Output Field Separator (OFS) is. By default, both awk's Input Input Field Separator (IFSFS) and Output Field Separator are whitespace (any number of space characters).

So what if we want to get fancy and do all this in a one-liner command? We can use echo and backtick evaluation to put both numbers in a string, then pipe that string to awk. This time we use awk body code, and refer to the two whitespace-separated fields being passed in: total count ($1) and indel count ($2).

Code Block
languagebash
echo "`samtools view -F 0x4 -c yeast_pe.sort.bam` `samtools view -F 0x4 yeast_pe.sort.bam | cut -f 6 | grep -P '[ID]' | wc -l`" | awk '{ print 100 * $2/$1,"%" }'

Filtering

...

by location range

Sometimes you just want to examine a subset of reads in detail. Once you have a sorted and indexed BAM, you can use the coordinate filtering options of samtools view to do this. Here are some examples:

Code Block
languagebash
# if needed...
cd $SCRATCH/core_ngs/alignment/samtools
module load samtools

# count the number of reads mapped to chromosome 2 (chrII)
samtools view -c -F 0x4 yeast_pe.sort.bam chrII

# count the number of reads mapped to chromosomes 1 or M (chrI, chrM)
samtools view -c -F 0x4 yeast_pe.sort.bam chrI chrM

# count the number of reads mapped to chromosomes 1 that overlap coordinates 1000-2000
samtools view -c -F 0x4 yeast_pe.sort.bam chrI:1000-2000

# since there are only 20 reads in the chrI:1000-2000 region, examine them individually
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000

# look at a subset of field for chrI:1000-2000 reads
# 2=flags, 3=contig, 4=start, 5=mapping quality, 6=CIGAR, 9=insert size
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000 | cut -f 2-6,9

The code block below details some basic samtools functionality:

Filtering high quality reads

The code block below details some basic samtools functionality:

...

We have a sorted, indexed BAM file.  Now we can use other samtools functionality to filter this file and count mapped vs unmapped reads in a given region. samtools allows you to sort based on certain flags that are specified on page 4 on the sam format specification.  We'll focus on a couple, below.Image Removed


Here are three of the most useful flags to sort on. We'll be using the unmapped flag.

...