...
```bash
cut -f 8 sc_genes.bed | sort | uniq -c
```

You should see this:

```
810 Dubious
897 Uncharacterized
4896 Verified
4 Verified|silenced_gene
```

To list the most abundant category first, add another `sort` to the pipeline:

```bash
cut -f 8 sc_genes.bed | sort | uniq -c | sort -k1,1nr
```

The `-k1,1nr` option tells `sort` to use only the 1st (whitespace-delimited) field of each line as the sort key, to compare it numerically (`n`), and to sort in reverse order (`r`), i.e. largest first. This produces:

```
4896 Verified
897 Uncharacterized
809 Dubious
4 Verified|silenced_gene
```
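Two details of these pipelines are easy to miss:

- `uniq -c` only collapses *adjacent* identical lines, which is why `sort` has to come before it.
- The `n` in `-k1,1nr` makes `sort` compare the counts as numbers rather than as text.

The sketch below demonstrates both on made-up input. The `toy_classes` list is hypothetical, and the counts reuse the numbers above purely as sort input. Setting `LC_ALL=C` is an added assumption, there only to make the text ordering reproducible across locales.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `cut -f 8 sc_genes.bed` (not real annotation data).
toy_classes=$(printf '%s\n' Verified Dubious Verified Uncharacterized Verified Dubious)

# uniq -c only merges *adjacent* duplicates, so without sorting the same
# category is reported on several separate lines.
unsorted=$(printf '%s\n' "$toy_classes" | uniq -c)

# Sorting first groups identical lines, so each category is counted once.
counted=$(printf '%s\n' "$toy_classes" | LC_ALL=C sort | uniq -c)

# Count-first lines (numbers reused from the output above), sorted as text vs. as numbers.
counts=$(printf '%s\n' '897 Uncharacterized' '4896 Verified' '810 Dubious' '4 Verified|silenced_gene')
text_desc=$(printf '%s\n' "$counts" | LC_ALL=C sort -k1,1r)     # compares "897" vs "4896" as strings
numeric_desc=$(printf '%s\n' "$counts" | LC_ALL=C sort -k1,1nr) # compares 897 vs 4896 as numbers

printf '%s\n' '--- uniq -c without sort ---' "$unsorted" \
              '--- sort | uniq -c ---' "$counted" \
              '--- sort -k1,1r (text) ---' "$text_desc" \
              '--- sort -k1,1nr (numeric) ---' "$numeric_desc"
```

In the text-sorted result, `897` lands above `4896`. The keys are compared character by character, and '8' sorts after '4'. That is exactly the ordering the `n` modifier avoids.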
...
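```bash
# Start an interactive compute session (course-specific allocation and reservation)
idev -p development -m 120 -A UT-2015-05-18 -N 1 -n 24 --reservation=BIO_DATA_week_1

# Load the tools used in this section
module load biocontainers
module load samtools
module load bedtools

# Create a working directory and stage the input files
# (the trailing * patterns also pick up any accompanying index files)
mkdir -p $SCRATCH/core_ngs/bedtools
cd $SCRATCH/core_ngs/bedtools
cp $CORENGS/yeast_rna/*.gff .
cp $CORENGS/yeast_rna/sc_genes.bed* .
cp $CORENGS/yeast_rna/yeast_mrna.sort.filt.bam* .
```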
Run `bedtools multicov` to count BAM alignments overlapping a set of genes:

```bash
cd $SCRATCH/core_ngs/bedtools
module load bedtools
# -s: only count alignments on the same strand as the gene
# Output: each sc_genes.bed record followed by one count column per BAM given to -bams
bedtools multicov -s -bams yeast_mrna.sort.filt.bam \
  -bed sc_genes.bed > yeast_mrna_gene_counts.bed
```
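To make the appended count column concrete, here is a deliberately naive awk sketch of stranded overlap counting on made-up intervals.

It is not how bedtools is implemented. It also ignores everything else multicov takes into account, such as mapping quality, duplicate flags and spliced alignments. It only mirrors the basic rule behind `-s`: a read counts toward a gene when they are on the same chromosome and strand and their half-open BED intervals share at least one base.

The files `toy_genes.bed`, `toy_reads.bed` and `toy_gene_counts.bed` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical toy inputs, tab-separated BED6: chrom start end name score strand.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  chrI 100 200 geneA 0 + \
  chrI 300 400 geneB 0 - > toy_genes.bed
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  chrI 150 175 r1 0 + \
  chrI 190 210 r2 0 + \
  chrI 120 130 r3 0 - \
  chrI 350 360 r4 0 - \
  chrI 400 450 r5 0 - > toy_reads.bed

# Naive O(genes x reads) stranded overlap count (NOT the bedtools algorithm):
# a read counts toward a gene when chromosome and strand match and
# read_start < gene_end and read_end > gene_start (half-open BED coordinates).
# Like multicov, the count is appended to each gene record as a new last column.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { rchr[FNR] = $1; rbeg[FNR] = $2 + 0; rend[FNR] = $3 + 0; rstr[FNR] = $6; n = FNR; next }
     {
       count = 0
       for (i = 1; i <= n; i++)
         if (rchr[i] == $1 && rstr[i] == $6 && rbeg[i] < $3 + 0 && rend[i] > $2 + 0) count++
       print $0, count
     }' toy_reads.bed toy_genes.bed > toy_gene_counts.bed

cat toy_gene_counts.bed
```

With these toy intervals:

- `geneA` gets 2, because r3 lies inside it but on the opposite strand.
- `geneB` gets 1, because r5 starts exactly at the gene's end coordinate and so does not overlap under half-open coordinates.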
...
```bash
# Column 9 holds the multicov count; only genes with at least one overlapping read are tallied
cat yeast_mrna_gene_counts.bed | awk '
BEGIN{FS="\t"; sum=0; tot=0}
{if($9 > 0) { sum = sum + $9; tot = tot + 1 }}
END{printf("%d overlapping reads in %d genes\n", sum, tot)}'
```

There are 1144990 overlapping reads in 6235 genes (the genes with at least one overlapping read).
Recall that the SGD yeast annotations use three gene classifications (Verified, Uncharacterized and Dubious), and that Dubious genes have no experimental evidence supporting them.
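The counts file keeps the classification (column 8) next to the multicov count (column 9, the `$9` used in the summary awk command). That means the same awk idiom can total reads per classification using associative arrays.

Below is a sketch on a hypothetical four-gene toy file. The rows and counts are invented, and columns 1-7 are placeholders. The real file's layout is only assumed to match the `$8`/`$9` usage in this section.

```shell
#!/usr/bin/env bash
# Hypothetical toy stand-in for yeast_mrna_gene_counts.bed: 8 annotation columns
# (column 8 = classification, columns 1-7 placeholders) plus one count column (column 9).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chrI 100 200 geneA 0 + nameA Verified 12 \
  chrI 300 400 geneB 0 - nameB Verified 5 \
  chrI 500 600 geneC 0 + nameC Dubious 0 \
  chrI 700 800 geneD 0 - nameD Uncharacterized 3 > toy_mrna_gene_counts.bed

# Per classification: total overlapping reads, number of genes, genes with >= 1 read.
# awk's for-in order is unspecified, so the result is sorted for stable output.
per_class=$(awk 'BEGIN { FS = "\t" }
  { reads[$8] += $9; genes[$8]++; if ($9 > 0) with_reads[$8]++ }
  END { for (c in reads) printf("%s\t%d\t%d\t%d\n", c, reads[c], genes[c], with_reads[c] + 0) }' \
  toy_mrna_gene_counts.bed | LC_ALL=C sort)

printf 'class\treads\tgenes\tgenes_with_reads\n%s\n' "$per_class"
```

The `genes_with_reads` column applies the same `$9 > 0` test that the summary command uses for `tot`. The per-class numbers are therefore directly comparable to that overall total.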
...