Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
cd $SCRATCH/my_rnaseq_course/day_4b
cp /corral-repl/utexas/BioITeam/rnaseq_course_2015/day_4b/human_mirnaseq.sam .

miRNA Profiling with SAMTools

Recall that, because each contig in our reference is a feature, we do not need a true annotation to quantify our miRNAs.  Instead, we just need to count how many reads aligned to each contig, and sort them accordingly.  Fortunately, samtools has a simple utility called idxstats that reports exactly this.  The following commands will produce this information by converting the SAM file to BAM, sorting and indexing it, then running idxstats:

Code Block
module load samtools/0.1.19
cd $SCRATCH/my_rnaseq_course/day_4b
samtools view -bS human_mirnaseq.sam > human_mirnaseq.bam
samtools sort human_mirnaseq.bam human_mirnaseq.sorted
samtools index human_mirnaseq.sorted.bam
samtools idxstats human_mirnaseq.sorted.bam

 

 

 

...

 > idxstats.txt

This will create a couple new files, but the one we care about is idxstats.txt, which contains our miRNA profile.  If you wanted to look at the top ten most abundant miRNAs, then simply Unix commands will do the trick:

Code Block
sort -r -n -k 3 idxstats.txt | head

This should report something like this:

Code Block
hsa-mir-302b	73	147605	0
hsa-mir-302a	69	17719	0
hsa-mir-21	72	14999	0
hsa-mir-302c	68	9590	0
hsa-mir-5588	63	3147	0
hsa-mir-18a	71	2884	0
hsa-mir-20a	71	2246	0
hsa-mir-16-2	81	1972	0
hsa-mir-335	94	1499	0
hsa-mir-15b	98	1466	0

The second column is contig length and the fourth is "unaligned reads," which is obviously zero.  So, the third column gives read count and the first gives contig name.

Exercise #2: Photo-Activatable Ribonucleoside Cross-Linking and Immuno-Precipitation (PARCLIP-Seq)