...

Answer
Code Block
languagebash
cut -f 8 sc_genes.bed | sort | uniq -c

You should see this:

Code Block
    810 Dubious
    897 Uncharacterized
   4896 Verified
      4 Verified|silenced_gene

If you want to further order this output listing the most abundant category first, add another sort statement:

Code Block
languagebash
cut -f 8 sc_genes.bed | sort | uniq -c | sort -k1,1nr

The -k1,1nr option tells sort to use the 1st field (whitespace delimited) of the input as the key, compare it numerically, and reverse the order (i.e., largest first). This produces:

Code Block
   4896 Verified
    897 Uncharacterized
    810 Dubious
      4 Verified|silenced_gene
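
To see the effect of the sort key in isolation, here is a minimal sketch that runs the same sort on a few made-up lines shaped like uniq -c output. The counts and the variable names (counts, sorted) are assumptions for illustration only, not values from sc_genes.bed.

```shell
# Hypothetical lines mimicking `uniq -c` output (counts are invented)
counts=$(printf '%s\n' '     12 Dubious' '      7 Uncharacterized' '    105 Verified')

# -k1,1 : use only field 1 as the sort key
# n     : compare that key as a number (leading blanks are ignored)
# r     : reverse the order, so the largest count comes first
sorted=$(printf '%s\n' "$counts" | sort -k1,1nr)

printf '%s\n' "$sorted"
```

With these sample lines, the Verified line (105) is listed first and the Uncharacterized line (7) last.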

...

Setup (if needed)
Code Block
languagebash
idev -p development -m 120 -A UT-2015-05-18 -N 1 -n 24 --reservation=BIO_DATA_week_1
module load biocontainers
module load samtools
module load bedtools

mkdir -p $SCRATCH/core_ngs/bedtools
cd $SCRATCH/core_ngs/bedtools
cp $CORENGS/yeast_rna/*.gff .
cp $CORENGS/yeast_rna/sc_genes.bed* .
cp $CORENGS/yeast_rna/yeast_mrna.sort.filt.bam* .

Run bedtools multicov to count BAM alignments overlapping a set of genes:

Code Block
languagebash
cd $SCRATCH/core_ngs/bedtools
module load bedtools
bedtools multicov -s -bams yeast_mrna.sort.filt.bam \
  -bed sc_genes.bed > yeast_mrna_gene_counts.bed

...

Answer
Code Block
languagebash
cat yeast_mrna_gene_counts.bed | awk '
 BEGIN{FS="\t";sum=0;tot=0}
 {if($9 > 0) { sum = sum + $9; tot = tot + 1 }}
 END{printf("%d overlapping reads in %d genes\n", sum, tot) }'

There are 1144990 overlapping reads in 6235 gene annotations.
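
To check what the awk script computes, it can help to run it on a tiny file where the answer is easy to work out by hand. The sketch below assumes a made-up 9-column, tab-delimited file whose last column stands in for the count that bedtools multicov appends. The gene names, coordinates, and counts are all hypothetical.

```shell
# Hypothetical records: column 8 is a classification, column 9 a read count
tmp=$(mktemp)
printf 'chrI\t100\t200\tgeneA\t0\t+\t.\tVerified\t5\n'  >  "$tmp"
printf 'chrI\t300\t400\tgeneB\t0\t-\t.\tDubious\t0\n'   >> "$tmp"
printf 'chrII\t50\t90\tgeneC\t0\t+\t.\tVerified\t12\n'  >> "$tmp"

# Same logic as the answer: add up column 9 and count records where it is > 0
summary=$(awk 'BEGIN{FS="\t"; sum=0; tot=0}
  $9 > 0 { sum += $9; tot++ }
  END{ printf("%d overlapping reads in %d genes\n", sum, tot) }' "$tmp")

echo "$summary"
rm -f "$tmp"
```

Here geneB has a count of 0 and is skipped, so the script reports 17 reads (5 + 12) in 2 genes.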

Recall that in the yeast annotations from SGD there are 3 gene classifications: Verified, Uncharacterized and Dubious, and the Dubious ones have no experimental evidence.
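
One possible way to relate read counts back to these classifications is to total column 9 for each value of column 8. This is only a sketch on made-up data, not the course's own solution. The records, the temporary file, and the by_class variable are assumptions for illustration.

```shell
# Hypothetical 9-column records (classification in column 8, count in column 9)
tmp=$(mktemp)
printf 'chrI\t100\t200\tgeneA\t0\t+\t.\tVerified\t5\n'         >  "$tmp"
printf 'chrI\t300\t400\tgeneB\t0\t-\t.\tDubious\t1\n'          >> "$tmp"
printf 'chrII\t50\t90\tgeneC\t0\t+\t.\tVerified\t12\n'         >> "$tmp"
printf 'chrII\t500\t700\tgeneD\t0\t-\t.\tUncharacterized\t3\n' >> "$tmp"
printf 'chrIV\t10\t80\tgeneE\t0\t+\t.\tDubious\t0\n'           >> "$tmp"

# Sum the counts per classification, then list the largest total first
by_class=$(awk 'BEGIN{FS=OFS="\t"}
  { reads[$8] += $9 }
  END{ for (c in reads) print reads[c], c }' "$tmp" | sort -k1,1nr)

printf '%s\n' "$by_class"
rm -f "$tmp"
```

On real output the input file would be yeast_mrna_gene_counts.bed in place of the temporary file.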

...