...
| Code Block |
|---|
cd $SCRATCH/my_rnaseq_course/day_4b cp /corral-repl/utexas/BioITeam/rnaseq_course_2015/day_4b/human_mirnaseq.sam . |
miRNA Profiling with SAMTools
Recall that, because each contig in our reference is a feature, we do not need a true annotation to quantify our miRNAs. Instead, we just need to count how many reads aligned to each contig, and sort them accordingly. Fortunately, samtools has a simple utility called idxstats that reports exactly this. The following commands will produce this information by converting the SAM file to BAM, sorting and indexing it, then running idxstats:
| Code Block |
|---|
module load samtools/0.1.19 cd $SCRATCH/my_rnaseq_course/day_4b samtools view -bS human_mirnaseq.sam > human_mirnaseq.bam samtools sort human_mirnaseq.bam human_mirnaseq.sorted samtools index human_mirnaseq.sorted.bam samtools idxstats human_mirnaseq.sorted.bam |
...
> idxstats.txt |
This will create a couple new files, but the one we care about is idxstats.txt, which contains our miRNA profile. If you wanted to look at the top ten most abundant miRNAs, then simply Unix commands will do the trick:
| Code Block |
|---|
sort -r -n -k 3 idxstats.txt | head |
This should report something like this:
| Code Block |
|---|
hsa-mir-302b 73 147605 0
hsa-mir-302a 69 17719 0
hsa-mir-21 72 14999 0
hsa-mir-302c 68 9590 0
hsa-mir-5588 63 3147 0
hsa-mir-18a 71 2884 0
hsa-mir-20a 71 2246 0
hsa-mir-16-2 81 1972 0
hsa-mir-335 94 1499 0
hsa-mir-15b 98 1466 0 |
The second column is contig length and the fourth is "unaligned reads," which is obviously zero. So, the third column gives read count and the first gives contig name.
Exercise #2: Photo-Activatable Ribonucleoside Cross-Linking and Immuno-Precipitation (PARCLIP-Seq)