...

The new tuxedo suite consists of three tools for transcript-level analysis of RNA-Seq data:

  • HISAT2:  Spliced alignment of reads to the genome
  • StringTie:  Assembly of transcripts, including novel ones, from reads mapped to the genome, and quantification of those transcripts.
  • Ballgown: Identification of differentially expressed genes and transcripts.


We are going to run this on Stampede2 as well.  Let's do this in an idev session today so you can see how fast it is.

Get set up for the exercise:

    cds
    cd my_rnaseq_course
    cd day_3_partB/stringtie_ballgown

...

The -p option brings up a key concept when submitting jobs on TACC: wayness. Go back to Submitting Jobs to Lonestar6 for more.

OUTPUT: Each stringtie command generates a new gtf file, so you will have one for each sample.
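
As a rough illustration of the one-GTF-per-sample pattern, the sketch below prints one stringtie command per sample, in the style of a commands file. The sample names, directory layout, reference path, and thread count are all hypothetical placeholders, not the files used in this exercise; only the stringtie options -G, -p, and -o are taken from the tool itself.

```shell
# Hypothetical sketch: print one stringtie command per sample.
# All paths and sample names passed in are placeholders.
make_stringtie_commands() {
    ref_gtf=$1      # reference annotation passed to -G
    threads=$2      # number of threads passed to -p
    shift 2
    for sample in "$@"; do
        # Each command has its own -o target, so N samples give N GTF files.
        echo "stringtie bam/${sample}.bam -G ${ref_gtf} -p ${threads} -o results/${sample}.gtf"
    done
}

# Example with three made-up samples and 4 threads per command
make_stringtie_commands reference/genes.gtf 4 sampleA sampleB sampleC
```

Redirecting the output to a file would give a commands file with one line per sample, which is where the choice of -p interacts with wayness.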

...

Priority  Code  Description
1         =     Complete match of intron chain
2         c     Contained
3         j     Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript
4         e     Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment
5         i     A transfrag falling entirely within a reference intron
6         o     Generic exonic overlap with a reference transcript
7         p     Possible polymerase run-on fragment (within 2Kbases of a reference transcript)
8         r     Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lower case
9         u     Unknown, intergenic transcript
10        x     Exonic overlap with reference on the opposite strand
11        s     An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors)
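
To get a feel for how the transcripts are distributed across these class codes, something like the following sketch could be used. It assumes the annotated GTF stores the code as a class_code "x" attribute, as in the gffcompare output parsed later in this exercise; the function name count_class_codes is made up for illustration.

```shell
# Hypothetical helper: tally class codes in a gffcompare annotated GTF.
# Prints "code<TAB>count", most frequent code first.
count_class_codes() {
    grep -o 'class_code "[^"]*"' "$1" \
        | cut -d '"' -f 2 \
        | sort | uniq -c | sort -rn \
        | awk '{print $2 "\t" $1}'
}

# Run on the merged annotation only if it is present in this directory
if [ -f results/stringtie_merged.annotated.gtf ]; then
    count_class_codes results/stringtie_merged.annotated.gtf
fi
```

A large share of "=" codes suggests the assembly mostly recovers known transcripts, while "j" and "u" point to candidates for novel isoforms and intergenic transcripts.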


Parse gffcompare results

Construct a unix command to pull out the first 10 genes with potentially novel (class code j) transcripts.
Hint: Use a combination of cut and grep (looking for class_code "j")

One possible solution:

    cut -f 9 results/stringtie_merged.annotated.gtf | grep 'class_code "j"' | cut -d ';' -f 3 | head
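
Because head only returns the first 10 matching lines, a gene with several class code j transcripts will appear more than once. One way to rank genes by how many such transcripts they have is sketched below. It assumes each transcript line carries a gene_name "..." attribute, which depends on the reference annotation used; the function name top_novel_genes is hypothetical.

```shell
# Hypothetical helper: rank genes by their number of class code j transcripts.
# Usage: top_novel_genes <annotated.gtf> [how_many]   (default: 10)
top_novel_genes() {
    gtf=$1
    n=${2:-10}
    cut -f 9 "$gtf" \
        | grep 'class_code "j"' \
        | grep -o 'gene_name "[^"]*"' \
        | cut -d '"' -f 2 \
        | sort | uniq -c | sort -rn \
        | head -n "$n"
}

# Run on the merged annotation only if it is present in this directory
if [ -f results/stringtie_merged.annotated.gtf ]; then
    top_novel_genes results/stringtie_merged.annotated.gtf 10
fi
```

Each output line is a count followed by a gene name. Transcripts without a gene_name attribute are silently skipped in this sketch.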


4. Redo stringtie abundance calculations using the newly merged transcripts. 

...