...
Expand |
---|
|
Output column 5 has the gene count. Code Block |
---|
| cut -f 5 merged.sc_genes.txt | sort | uniq -c | sort -k2,2n |
Produces this histogram: Code Block |
---|
| 6374 1
105 2
4 3
1 4
1 7 |
There are 111 regions (105 + 4 + 1 + 1) where more than one gene contributed. |
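To see what each stage of the histogram pipeline contributes, here is a minimal sketch on a small made-up file. The file name `toy.merged.txt` and its contents are hypothetical, not course data; the sketch assumes, as above, that the gene count is in column 5.

```shell
# Build a tiny stand-in for merged.sc_genes.txt (hypothetical data).
# Columns: chrom, start, end, strand, gene count, collapsed gene names.
printf 'chrI\t100\t200\t+\t1\tgeneA\n'             >  toy.merged.txt
printf 'chrI\t300\t500\t+\t2\tgeneB,geneC\n'       >> toy.merged.txt
printf 'chrI\t600\t700\t-\t1\tgeneD\n'             >> toy.merged.txt
printf 'chrII\t50\t400\t-\t3\tgeneE,geneF,geneG\n' >> toy.merged.txt
printf 'chrII\t900\t950\t+\t1\tgeneH\n'            >> toy.merged.txt

# cut -f 5    : keep only the gene-count column
# sort        : bring identical counts together so uniq can tally them
# uniq -c     : prefix each distinct count with its number of occurrences
# sort -k2,2n : order the histogram numerically by gene count (2nd field)
cut -f 5 toy.merged.txt | sort | uniq -c | sort -k2,2n
```

For this toy input the histogram has three rows: three regions with 1 gene, one region with 2 genes, and one region with 3 genes. The first output column is the number of regions and the second is the gene count, which is the same layout as the histogram above.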
Exercise: Repeat the steps above, but first create a good.sc_genes.bed file that does not include Dubious ORFs.
Expand |
---|
|
Code Block |
---|
| cd $SCRATCH/core_ngs/bedtools
grep -v 'Dubious' sc_genes.bed > good.sc_genes.bed
sort -k1,1 -k2,2n good.sc_genes.bed > good.sc_genes.sorted.bed
bedtools merge -i good.sc_genes.sorted.bed -s -c 4,4 -o count,collapse > merged.good.sc_genes.txt
wc -l good.sc_genes.bed merged.good.sc_genes.txt |
There were 5797 "good" (non-Dubious) genes before merging and 5770 merged regions after. Code Block |
---|
| cut -f 5 merged.good.sc_genes.txt | sort | uniq -c | sort -k2,2n |
Produces this histogram: Code Block |
---|
| 5750 1
18 2
1 4
1 7 |
Now there are only 20 regions (18 + 1 + 1) where more than one gene was collapsed, down from 111. Clearly, eliminating the Dubious ORFs helped. |
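As a cross-check, the number of multi-gene regions can be counted directly rather than summed from the histogram by hand. The following is a minimal sketch using awk on a made-up file. The name `toy.multi.txt` and its rows are hypothetical, and the sketch again assumes the gene count is in column 5.

```shell
# Hypothetical merged output: chrom, start, end, strand, gene count, names.
printf 'chrI\t100\t200\t+\t1\tgeneA\n'             >  toy.multi.txt
printf 'chrI\t300\t500\t+\t2\tgeneB,geneC\n'       >> toy.multi.txt
printf 'chrII\t50\t400\t-\t3\tgeneE,geneF,geneG\n' >> toy.multi.txt
printf 'chrII\t900\t950\t+\t1\tgeneH\n'            >> toy.multi.txt

# Count the regions whose gene count (column 5) is greater than 1.
# "n+0" makes awk print 0 instead of an empty line when nothing matches.
multi=$(awk -F'\t' '$5 > 1 {n++} END {print n+0}' toy.multi.txt)
echo "regions with more than one gene: $multi"
```

Pointing the same awk command at merged.sc_genes.txt and merged.good.sc_genes.txt should give totals that match the sums read off the two histograms.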