All these steps have already been run. We'll be spending time looking at the commands and output. Let's get set up.

Get to the data and results

cd /corral-repl/utexas/BioITeam/rnaseq_course/cufflinks_exercise

Step 1: Run tophat2

We've already gone over how this is done and looked over the results. Let's move on to step 2. 

Step 2: Run cufflinks

cufflinks [options] <hits.bam>

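Some of the important options:
-p/--num-threads: number of threads to use
-G/--GTF: quantify expression against the supplied reference annotation only; no novel transcripts are assembled
-g/--GTF-guide: use the supplied reference annotation to guide assembly; the output contains both reference and novel transcripts
-b/--frag-bias-correct: correct for fragment bias, using the supplied genome FASTA file
-u/--multi-read-correct: weight reads that map to multiple locations in the genome more accurately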
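
To see how these options fit together, here is a minimal sketch of a single-sample run. The file names (genes.gtf, genome.fa, the C1_R1_thout directory) and the thread count are assumptions for illustration, not the values used in commands.cufflinks. The -o option (output directory) is a standard cufflinks option not listed above. The script only prints the command, so you can check it before running it.

```shell
# Hypothetical inputs: replace these with your own files.
threads=4
annotation="genes.gtf"               # reference annotation (assumed name)
genome="genome.fa"                   # genome FASTA, required by -b (assumed name)
bam="C1_R1_thout/accepted_hits.bam"  # tophat2 alignments (assumed path)
outdir="C1_R1_clout"                 # where cufflinks writes its output

# -g guides assembly with the annotation; use -G instead (not both)
# to quantify known transcripts only.
cufflinks_cmd="cufflinks -p $threads -o $outdir -g $annotation -b $genome -u $bam"

# Dry run: print the command instead of executing it.
echo "$cufflinks_cmd"
```

Once the paths point at real files, paste the printed line into your shell to run it.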

Look at $BI/rnaseq_course/cufflinks_exercise/run_commands/commands.cufflinks to see how it was run.

cat $BI/rnaseq_course/cufflinks_exercise/run_commands/commands.cufflinks

Take a minute to look at the output files produced by one cufflinks run.

The important file is transcripts.gtf, which contains the transcripts that cufflinks assembled for C1_R1.

cd $BI/rnaseq_course/cufflinks_exercise/results/C1_R1_clout
ls -l
-rw------- 1 daras G-801020   627673 May 17 16:58 genes.fpkm_tracking
-rw------- 1 daras G-801020  1021025 May 17 16:58 isoforms.fpkm_tracking
-rw------- 1 daras G-801020        0 May 17 16:50 skipped.gtf
-rw------- 1 daras G-801020 14784740 May 17 16:58 transcripts.gtf
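
A quick way to get a feel for transcripts.gtf is to count its records by feature type (column 3 of a GTF file), which cufflinks sets to "transcript" or "exon". The helper below is a small sketch; the function name count_gtf_features is made up for this example.

```shell
# Count the records of one feature type (GTF column 3) in a GTF file.
# Usage: count_gtf_features FILE FEATURE
count_gtf_features() {
    awk -F '\t' -v feature="$2" '$3 == feature { n++ } END { print n + 0 }' "$1"
}

# Example, run inside a cufflinks output directory:
#   count_gtf_features transcripts.gtf transcript
#   count_gtf_features transcripts.gtf exon
```

Dividing the exon count by the transcript count gives a rough average number of exons per assembled transcript.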

Step 3: Merging assemblies using cuffmerge

We first create a file listing the paths of all the per-sample transcripts.gtf files, then pass that file to cuffmerge:

find . -name transcripts.gtf > assembly_list.txt
cuffmerge <assembly_list.txt>
cat $BI/rnaseq_course/cufflinks_exercise/assembly_list.txt
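
One thing to watch: cuffmerge writes its own transcripts.gtf into its output directory, so rerunning a plain find after the merge would pick that file up too. The sketch below restricts the search to per-sample directories. It assumes those directories all end in "_clout", as C1_R1_clout does; the function name make_assembly_list is made up for this example.

```shell
# Write the paths of the per-sample assemblies under DIR to OUT, one per line.
# Assumption: per-sample cufflinks directories end in "_clout".
# Usage: make_assembly_list DIR OUT
make_assembly_list() {
    find "$1" -path '*_clout/transcripts.gtf' | sort > "$2"
}

# Example:
#   make_assembly_list . assembly_list.txt
#   wc -l < assembly_list.txt    # should equal the number of samples
```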

Take a minute to look at the output files produced by cuffmerge. The most important file is merged.gtf, which contains the consensus transcriptome annotation that cuffmerge built from the per-sample assemblies.

cd $BI/rnaseq_course/cufflinks_exercise/merged_asm
ls -l

-rwxrwxr-x  1 daras G-803889  1571816 Aug 16  2012 genes.fpkm_tracking
-rwxrwxr-x  1 daras G-803889  2281319 Aug 16  2012 isoforms.fpkm_tracking
drwxrwxr-x  2 daras G-803889    32768 Aug 16  2012 logs
-r-xrwxr-x  1 daras G-803889 32090408 Aug 16  2012 merged.gtf
-rwxrwxr-x  1 daras G-803889        0 Aug 16  2012 skipped.gtf
drwxrwxr-x  2 daras G-803889    32768 Aug 16  2012 tmp
-rwxrwxr-x  1 daras G-803889 34844830 Aug 16  2012 transcripts.gtf

Step 4: Finding differentially expressed genes and isoforms using cuffdiff

cuffdiff [options] <merged.gtf> <sample1_rep1.bam,sample1_rep2.bam> <sample2_rep1.bam,sample2_rep2.bam>
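
The part that trips people up is the grouping: replicates of the same condition are joined with commas into one argument, and conditions are separated by spaces. Here is a minimal sketch that builds such a command. The BAM file names, the diff_out directory, the thread count and the C1/C2 labels are assumptions for illustration; -o, -p and -L (condition labels) are standard cuffdiff options. As before, the command is only printed.

```shell
# Join all arguments with commas (replicates of one condition).
join_by_comma() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Hypothetical replicate BAM files for two conditions.
c1=$(join_by_comma C1_R1.bam C1_R2.bam)
c2=$(join_by_comma C2_R1.bam C2_R2.bam)

# One comma-separated argument per condition, in the same order as the labels.
cuffdiff_cmd="cuffdiff -p 4 -o diff_out -L C1,C2 merged_asm/merged.gtf $c1 $c2"

# Dry run: print the command instead of executing it.
echo "$cuffdiff_cmd"
```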

Exercise: What does cuffdiff -b do?

-b enables fragment bias correction. As with cufflinks -b/--frag-bias-correct, it takes the genome FASTA file as its argument.