Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Figure from: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks, Trapnell et al, Nature Protocols, 2013. 

Get your data

Code Block
titleGet set up
#IF YOU HAVE NOT COPIED THIS OVER ALREADY
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course_2015/day_3/cuffdiff_results .

IF YOU HAVE ALREADY COPIED IT OVER
cds
cd my_rnaseq_course
cd day_3/cuffdiff_results
 

#We need the cuffdiff results because that is the input to cummeRbund.

Introduction

CummeRbund is powerful package with many different functions. Cuffdiff produces a lot of output files and all the output files are related by sets by IDs.

Code Block
titlecuffdiff output
#If you have a local copy:
ls -l cuffdiff_results

-rwxr-x--- 1 daras G-801020  2691192 Aug 21 12:20 isoform_exp.diff  : Differential expression testing for transcripts
-rwxr-x--- 1 daras G-801020  1483520 Aug 21 12:20 gene_exp.diff     : Differential expression testing for genes
-rwxr-x--- 1 daras G-801020  1729831 Aug 21 12:20 tss_group_exp.diff: Differential expression testing for primary transcripts
-rwxr-x--- 1 daras G-801020  1369451 Aug 21 12:20 cds_exp.diff      : Differential expression testing for coding sequences
 
-rwxr-x--- 1 daras G-801020  3277177 Aug 21 12:20 isoforms.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1628659 Aug 21 12:20 genes.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1885773 Aug 21 12:20 tss_groups.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1477492 Aug 21 12:20 cds.fpkm_tracking
 
-rwxr-x--- 1 daras G-801020  1349574 Aug 21 12:20 splicing.diff  : Differential splicing tests
-rwxr-x--- 1 daras G-801020  1158560 Aug 21 12:20 promoters.diff : Differential promoter usage
-rwxr-x--- 1 daras G-801020   919690 Aug 21 12:20 cds.diff       : Differential coding output

...

Figures from http://compbio.mit.edu/cummeRbund/manual_2_0.html

...


MAKE SURE YOU ARE IN THE IDEV SESSION FOR THIS!

...

titleGet set up

...

 

A) First load R and enter R environment

Code Block
#module load R will not work for this R package because its needs the latest version of R. So, we'll load a local version.

/work/01184/daras/bin/R

 

B) DON'T RUN THIS-IT HAS ALREADY BEEN INSTALLED! Within R environment, install package cummeRbundthis is how cummeRbund was installed. 

Code Block
#DON'T RUN THIS-IT HAS ALREADY BEEN INSTALLED!
source("http://bioconductor.org/biocLite.R")
biocLite("cummeRbund")

...

Code Block
library(cummeRbund)
cuff_data <- readCufflinks('cuffdiffdiff_resultsout')

 
 

 

D) Use cummeRbund to visualize the differential expression results.

...

Code Block
titleTo pull out significantly differentially expressed genes and isoforms
#Below command is equivalent to looking at the gene_exp.diff file that we spent a lot of time parsing yesterday
gene_diff_data <- diffData(genes(cuff_data))   
#Do gene_diff_data followed by tab to see all the variables in this data object
 
sig_gene_data  <- subset(gene_diff_data, (significant ==  'yes'))

#Count how many we have
nrow(sig_gene_data)

#Below command is equivalent to looking at the isoform_exp.diff file 
 
isoform_diff_data <-diffData(isoforms(cuff_data))
sig_isoform_data <- subset(isoform_diff_data, (significant == 'yes'))

#Count how many we have
nrow(sig_isoform_data)

 
Why do we
have
so
many? Yesterday we saw a lot less.

...

Exercise 2b:    Pull out from your data, genes that are significantly differentially expressed and above log2 fold change of 1

Code Block
titleTo pull out genes
#Below command is equivalent to looking at the gene_exp.diff file that we spent a lot of time parsing yesterday
gene_diff_data <- diffData(genes(cuff_data))   
#Do gene_diff_data followed by tab to see all the variables in this data object
 
sig_gene_data  <- subset(gene_diff_data, (significant ==  'yes'))
up_gene_data  <- subset(sig_gene_data, (log2_fold_change > 1))
#How many
nrow(up_gene_data)
 
#Pause here to consider a mistake I made yesterday

 
   
down_gene_data <- subset(sig_gene_data, (log2_fold_change < -1))
nrow(up_gene_data)

...