...
There were 5797 "good" (non-Dubious) genes before merging and 5770 after.
Produces this histogram:
Now there are only 20 regions where more than one gene was collapsed. Clearly, eliminating the Dubious ORFs helped.
Exercise: Why did we name the merged file with the extension .txt instead of .bed? What would we need to do to convert it to a proper BED6 file?
The output does not follow the BED6 specification, whose columns are chrom, start, end, name, score, strand. The first three output columns comply with the BED3 standard (chrom, start, end), but if strand is included, it must be in column 6. Column 4 should be the name (we'll put the collapsed gene name list there), and column 5 a score (we'll put the region count there). We can use awk to re-order the fields.
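As a minimal sketch of that re-ordering, the commands below assume the merged output has the column order chrom, start, end, strand, count, names. That order is an assumption, so check your own file with `head` before relying on it. The file names `merged.sample.txt` and `merged.sample.bed` and the two sample rows are made up for illustration.

```shell
# Build a tiny, made-up sample in the ASSUMED merged column order:
#   chrom, start, end, strand, count, names
# (hypothetical file name and rows, for illustration only)
printf 'chrI\t100\t500\t+\t2\tgeneA,geneB\nchrI\t900\t1200\t-\t1\tgeneC\n' \
    > merged.sample.txt

# Re-order to BED6: chrom, start, end, name, score, strand.
# FS and OFS are both set to tab so the output stays tab-delimited.
awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3, $6, $5, $4 }' \
    merged.sample.txt > merged.sample.bed

cat merged.sample.bed
```

If your merged file lists its columns in a different order, only the field numbers in the `print` statement need to change. Note that the BED format expects the score to be an integer between 0 and 1000, which a small region count satisfies.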