...

Get set up:
cds    # on TACC systems, cds changes to your $SCRATCH directory
cd my_rnaseq_course

# If you have already copied the day_3 files over, skip the next two lines
mkdir -p day_3
cp -r /corral-repl/utexas/BioITeam/rnaseq_course_2015/day_3/cufflinks_results day_3/

cd day_3/cufflinks_results

Step 1: TopHat

...

Cufflinks output files:
# If you have a local copy:
ls -l C1_R1_clout

# If you don't have a local copy:
ls -l /corral-repl/utexas/BioITeam/rnaseq_course_2015/day_3/cufflinks_results/C1_R1_clout

# Expected listing (owner, group and timestamps will differ for you):
-rw------- 1 daras G-801020   627673 May 17 16:58 genes.fpkm_tracking
-rw------- 1 daras G-801020  1021025 May 17 16:58 isoforms.fpkm_tracking
-rw------- 1 daras G-801020        0 May 17 16:50 skipped.gtf
-rw------- 1 daras G-801020 14784740 May 17 16:58 transcripts.gtf
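To peek inside genes.fpkm_tracking without memorising its column layout, you can locate the FPKM column by its header name and rank genes by expression. The sketch below is a minimal example that builds a tiny, made-up three-gene file (the gene names and values are hypothetical) so it runs anywhere. It assumes the real file is tab-delimited with a header row containing a column named exactly FPKM; on the course data, point FPKM_FILE at C1_R1_clout/genes.fpkm_tracking instead of the temporary sample.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for C1_R1_clout/genes.fpkm_tracking (tab-delimited, header row).
# Assumption: the real file has a header column named exactly "FPKM".
FPKM_FILE=$(mktemp)
printf 'tracking_id\tgene_id\tFPKM\n'  > "$FPKM_FILE"
printf 'geneA\tgeneA\t12.5\n'         >> "$FPKM_FILE"
printf 'geneB\tgeneB\t310.2\n'        >> "$FPKM_FILE"
printf 'geneC\tgeneC\t0\n'            >> "$FPKM_FILE"

# Find the FPKM column by name in the header row, then print tracking_id and FPKM
# for every data row, sorted numerically from highest to lowest FPKM.
top_genes=$(awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FPKM") col = i; next }
  { print $1 "\t" $col }
' "$FPKM_FILE" | sort -t $'\t' -k2,2gr)

echo "$top_genes"
rm -f "$FPKM_FILE"
```

Looking the column up by name, rather than hard-coding its position, keeps the command working even if your Cufflinks version orders the columns differently. Piping the real output through head -n 20 gives a quick list of the most highly expressed genes.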

 

DESCRIPTION OF THE transcripts.gtf FILE

...

Parsing the transcripts.gtf file:
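The secret lies in the gene_id attribute (in column 9): novel loci assembled by Cufflinks are given gene IDs that start with CUFF.

# Count entries (lines) belonging to novel loci
grep 'CUFF' C1_R1_clout/transcripts.gtf | wc -l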

54936
 
# Count entries (lines) belonging to annotated genes
grep -v 'CUFF' C1_R1_clout/transcripts.gtf | wc -l

88724
 
What do you think grep -v does?
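To check your answer, and to see what these numbers actually count, here is a minimal sketch on a tiny made-up GTF. In transcripts.gtf each transcript has one "transcript" line followed by one "exon" line per exon, so grep ... | wc -l counts lines, not transcripts or genes. The IDs and coordinates below are hypothetical: CUFF.1 stands in for a novel Cufflinks locus and refGene1 for an annotated gene. The pattern is anchored to the gene_id attribute so that CUFF appearing elsewhere on a line is not counted.

```shell
#!/usr/bin/env bash

# Hypothetical miniature transcripts.gtf (tab-delimited): one "transcript" line per
# transcript, followed by its "exon" lines. All IDs and coordinates are made up.
GTF=$(mktemp)
{
  printf 'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";\n'
  printf 'chr1\tCufflinks\texon\t100\t300\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";\n'
  printf 'chr1\tCufflinks\texon\t700\t900\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";\n'
  printf 'chr2\tCufflinks\ttranscript\t50\t400\t1000\t-\t.\tgene_id "refGene1"; transcript_id "refTx1";\n'
  printf 'chr2\tCufflinks\texon\t50\t400\t1000\t-\t.\tgene_id "refGene1"; transcript_id "refTx1";\n'
} > "$GTF"

# Line counts, like the commands above, but matching only inside the gene_id attribute.
# grep -c counts matching lines; adding -v counts the lines that do NOT match.
novel_lines=$(grep -c 'gene_id "CUFF' "$GTF")
annotated_lines=$(grep -vc 'gene_id "CUFF' "$GTF")

# Lines are not transcripts: count only records whose feature type (column 3) is "transcript".
novel_transcripts=$(awk -F'\t' '$3 == "transcript" && $9 ~ /gene_id "CUFF/ { n++ } END { print n + 0 }' "$GTF")

# Distinct genes: extract each gene_id attribute, de-duplicate, then split novel vs annotated.
gene_ids=$(grep -o 'gene_id "[^"]*"' "$GTF" | sort -u)
novel_genes=$(printf '%s\n' "$gene_ids" | grep -c 'CUFF')
annotated_genes=$(printf '%s\n' "$gene_ids" | grep -vc 'CUFF')

echo "novel lines: $novel_lines, annotated lines: $annotated_lines"
echo "novel transcripts: $novel_transcripts"
echo "novel genes: $novel_genes, annotated genes: $annotated_genes"
rm -f "$GTF"
```

On this sample the novel locus contributes 3 lines but only 1 transcript and 1 gene, which is why line counts from the real file overstate the number of novel transcripts.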

...

Priority  Code  Description
1         =     Complete match of intron chain
2         c     Contained
3         j     Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript
4         e     Single-exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment
5         i     A transfrag falling entirely within a reference intron
6         o     Generic exonic overlap with a reference transcript
7         p     Possible polymerase run-on fragment (within 2 kb of a reference transcript)
8         r     Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lowercase
9         u     Unknown, intergenic transcript
10        x     Exonic overlap with reference on the opposite strand
11        s     An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors)
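A minimal sketch for tallying how many transcripts fall into each class code from the table above. It assumes the codes are stored as a class_code "<code>" attribute in column 9 of the GTF, as in cuffcompare and cuffmerge output; check your own file with head before relying on this. The sample lines, XLOC/TCONS IDs and coordinates are hypothetical. Because one transcript can span several exon lines, each transcript_id is counted only once.

```shell
#!/usr/bin/env bash

# Hypothetical sample of a cuffcompare/cuffmerge-style GTF (tab-delimited, exon lines only).
# Assumption: each line carries class_code "<code>" in column 9.
GTF=$(mktemp)
{
  printf 'chr1\tCufflinks\texon\t100\t300\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "=";\n'
  printf 'chr1\tCufflinks\texon\t500\t800\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_2"; class_code "j";\n'
  printf 'chr1\tCufflinks\texon\t900\t950\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_2"; class_code "j";\n'
  printf 'chr3\tCufflinks\texon\t10\t90\t.\t-\t.\tgene_id "XLOC_2"; transcript_id "TCONS_3"; class_code "u";\n'
} > "$GTF"

# For each line, extract the transcript_id and class_code values (without quotes),
# count each transcript only the first time it is seen, then print "code<TAB>count"
# sorted from most to least frequent code.
tally=$(awk -F'\t' '
  {
    tid = ""; code = ""
    if (match($9, /transcript_id "[^"]*"/)) tid = substr($9, RSTART + 15, RLENGTH - 16)
    if (match($9, /class_code "[^"]*"/))    code = substr($9, RSTART + 12, RLENGTH - 13)
    if (tid != "" && code != "" && !(tid in seen)) { seen[tid] = 1; count[code]++ }
  }
  END { for (c in count) print c "\t" count[c] }
' "$GTF" | sort -t $'\t' -k2,2nr)

echo "$tally"
rm -f "$GTF"
```

Counting lines instead would report the two-exon transcript TCONS_2 twice under j; de-duplicating on transcript_id avoids that.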

...