All these steps have already been run. We'll be spending time looking at the commands and output. Let's get set up.

Get to the results

cds
cd my_rnaseq_course
cp /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results .

Step 1: Tophat

We've already gone over how tophat results look here so let's move on to step 2. 

 Step 2: Cufflinks

HOW WAS IT RUN?

#If you've copied it over successfully:
cat run_commands/commands.cufflinks
 
#If you don't have a local copy, you can read it from the source: 
cat /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/commands.cufflinks


HOW DOES THE OUTPUT LOOK?

 Take a look at output for one of our samples, C1_R1. The important file is transcripts.gtf, which contains Tophat's assembled junctions for C1_R1.

#If you have a local copy:
ls -l C1_R1_clout
 
 
#If you don't have a local copy:
ls -l /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/C1_R1_clout

-rw------- 1 daras G-801020   627673 May 17 16:58 genes.fpkm_tracking
-rw------- 1 daras G-801020  1021025 May 17 16:58 isoforms.fpkm_tracking
-rw------- 1 daras G-801020        0 May 17 16:50 skipped.gtf
-rw------- 1 daras G-801020 14784740 May 17 16:58 transcripts.gtf

Step 3: Cuffmerge

HOW WAS IT RUN?

We first created a file listing the paths of all per-sample transcripts.gtf files so far, then pass that to cuffmerge:

find . -name transcripts.gtf > assembly_list.txt
 
#If you have a local copy:
cat assembly_list.txt

#If you don't have a local copy:
cat /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/assembly_list.txt
cat run_commands/commands.cuffmerge

 
     cuffmerge  -g reference/genes.exons.gtf  assembly_list.txt

 

HOW DOES THE OUTPUT LOOK?

The most important file is merged.gif, which contains the consensus transcriptome annotations cuffmerge has calculated.

#If you have a local copy:
ls -l merged_asm
 
#If you don't have a local copy:
cat /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/merged_asm

-rwxrwxr-x  1 daras G-803889  1571816 Aug 16  2012 genes.fpkm_tracking
-rwxrwxr-x  1 daras G-803889  2281319 Aug 16  2012 isoforms.fpkm_tracking
drwxrwxr-x  2 daras G-803889    32768 Aug 16  2012 logs
-r-xrwxr-x  1 daras G-803889 32090408 Aug 16  2012 merged.gtf
-rwxrwxr-x  1 daras G-803889        0 Aug 16  2012 skipped.gtf
drwxrwxr-x  2 daras G-803889    32768 Aug 16  2012 tmp
-rwxrwxr-x  1 daras G-803889 34844830 Aug 16  2012 transcripts.gtf

Step 4: Cuffdiff

HOW WAS IT RUN?

#If you have a local copy:
 cat run_commands/commands.cuffdiff
 
cuffdiff -o diff_out -b reference/genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf C1_R1_thout/accepted_hits.bam,C1_R2_thout/accepted_hits.bam,C1_R3_thout/accepted_hits.bam C2_R1_thout/accepted_hits.bam,C2_R2_thout/accepted_hits.bam,C2_R3_thout/accepted_hits.bam
 
#If you don't have a local copy:
cat /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/commands.cuffdiff


HOW DOES THE OUTPUT LOOK?
#If you have a local copy:
ls -l diff_out
 
#If you don't have a local copy:
cat /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_results/diff_out

-rwxr-x--- 1 daras G-801020  2691192 Aug 21 12:20 isoform_exp.diff  : Differential expression testing for transcripts
-rwxr-x--- 1 daras G-801020  1483520 Aug 21 12:20 gene_exp.diff     : Differential expression testing for genes
-rwxr-x--- 1 daras G-801020  1729831 Aug 21 12:20 tss_group_exp.diff: Differential expression testing for primary transcripts
-rwxr-x--- 1 daras G-801020  1369451 Aug 21 12:20 cds_exp.diff      : Differential expression testing for coding sequences
 
-rwxr-x--- 1 daras G-801020  3277177 Aug 21 12:20 isoforms.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1628659 Aug 21 12:20 genes.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1885773 Aug 21 12:20 tss_groups.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1477492 Aug 21 12:20 cds.fpkm_tracking
 
-rwxr-x--- 1 daras G-801020  1349574 Aug 21 12:20 splicing.diff  : Differential splicing tests
-rwxr-x--- 1 daras G-801020  1158560 Aug 21 12:20 promoters.diff : Differential promoter usage
-rwxr-x--- 1 daras G-801020   919690 Aug 21 12:20 cds.diff       : Differential coding output


PARSING CUFFDIFF OUTPUT

Here is a basic command useful for parsing/sorting the gene_exp.diff or isoform_exp.diff files:

cat diff_out/isoform_exp.diff | awk '{print $10 "\t" $4}' | sort -n -r | head

Exercise 1: Find the 10 most up-regulated genes, by fold change that are classified as significantly changed. 

cat diff_out/gene_exp.diff |grep 'yes'|sort -k10nr,10|head

 
#Lets pull out just gene names
cat diff_out/gene_exp.diff |grep 'yes'|sort -k10nr,10|head|cut -f 3

Top 10 upregulated genes

scf

crc

KdelR

Nacalpha

CG8979

CG4389

Vha68-2

regucalcin

CG3835

CG1746


Exercise 2: Find the 10 most down-regulated genes, by fold change that are classified as significantly changed. 

cat diff_out/gene_exp.diff |grep 'yes'|sort -k10n,10|head|cut -f 3

Top 10 downregulated genes

CG17065

Git

CG18519

del

CG13319

RNaseX25

msb1l

achi

CG15611

CG11309


 

Exercise 3: Find the 10 most up-regulated isoforms, by fold change that are classified as significantly changed. What genes do they belong to?

cat diff_out/isoform_exp.diff |grep 'yes'|sort -k10nr,10|head

 
#To pull out gene names:
cat diff_out/isoform_exp.diff |grep 'yes'|sort -k10nr,10|head 

sick

mub

CG5021

akirin

CG4587

c(3)G

Ank2

CG1674

CrebB-17A

stai