Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Expand
titleAnswer
Code Block
languagebash
grep -v 'Dubious' yeast_mrna_gene_counts.bed | awk '
 BEGIN{FS="\t";sum=0;tot=0}
 {if($9 > 0) { sum = sum + $9; tot = tot + 1 }}
 END{print sum,"overlapping reads in",tot,"records"}'

There are 1093140 overlapping reads in 5578 records

...

Use bedtools merge to collapse overlapping annotations

One issue that often arises when dealing with BED regions is that they can overlap one another. For example, on the yeast genome, which has very few non-coding areas, there are some overlapping ORFs (Open Reading Frames), especially Dubious ORFs that overlap Verified or Uncharacterized ones. When bedtools looks for overlaps, it will count a read that overlaps any of those overlapping ORFs – so some reads can be counted twice.

One way to avoid this double-counting is to collapse the overlapping regions into a merged set of non-overlapping regions – and that's what the bedtools merge utility does (http://bedtools.readthedocs.io/en/latest/content/tools/merge.html).

Here we're going to use bedtools merge to collapse our gene annotations into a non-overlapping set, first for all genes, then for only non-Dubious genes.