...
In this lab, we'll look at how to use cummeRbund to visualize our gene expression results from cuffdiff. CummeRbund is part of the tuxedo pipeline and it is an R package that is capable of plotting the results generated by cuffdiff.
Figure from: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks, Trapnell et al, Nature Protocols, 2013.
Get your data
| Code Block | ||
|---|---|---|
| ||
#IF YOU HAVE NOT COPIED THIS OVER ALREADY
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course_2015/day_3_partB/cuffdiff_results .
IF YOU HAVE ALREADY COPIED IT OVER
cds
cd my_rnaseq_course
cd day_3_partB/cuffdiff_results
#We need the cuffdiff results because that is the input to cummeRbund. |
Introduction
CummeRbund is powerful package with many different functions. Cuffdiff produces a lot of output files and all the output files are related by sets by IDs.
| Code Block | ||
|---|---|---|
| ||
#If you have a local copy: ls -l cuffdiff_results -rwxr-x--- 1 daras G-801020 2691192 Aug 21 12:20 isoform_exp.diff : Differential expression testing for transcripts -rwxr-x--- 1 daras G-801020 1483520 Aug 21 12:20 gene_exp.diff : Differential expression testing for genes -rwxr-x--- 1 daras G-801020 1729831 Aug 21 12:20 tss_group_exp.diff: Differential expression testing for primary transcripts -rwxr-x--- 1 daras G-801020 1369451 Aug 21 12:20 cds_exp.diff : Differential expression testing for coding sequences -rwxr-x--- 1 daras G-801020 3277177 Aug 21 12:20 isoforms.fpkm_tracking -rwxr-x--- 1 daras G-801020 1628659 Aug 21 12:20 genes.fpkm_tracking -rwxr-x--- 1 daras G-801020 1885773 Aug 21 12:20 tss_groups.fpkm_tracking -rwxr-x--- 1 daras G-801020 1477492 Aug 21 12:20 cds.fpkm_tracking -rwxr-x--- 1 daras G-801020 1349574 Aug 21 12:20 splicing.diff : Differential splicing tests -rwxr-x--- 1 daras G-801020 1158560 Aug 21 12:20 promoters.diff : Differential promoter usage -rwxr-x--- 1 daras G-801020 919690 Aug 21 12:20 cds.diff : Differential coding output |
...
Figures from http://compbio.mit.edu/cummeRbund/manual_2_0.html
...
MAKE SURE YOU ARE IN THE IDEV SESSION FOR THIS!
...
| title | Get set up |
|---|
...
A) First load R and enter R environment
| Code Block |
|---|
#module load R will not work for this R package because its needs the latest version of R. So, we'll load a local version. /work/01184/daras/bin/R |
B) DON'T RUN THIS-IT HAS ALREADY BEEN INSTALLED! Within R environment, install package cummeRbundthis is how cummeRbund was installed.
| Code Block |
|---|
#DON'T RUN THIS-IT HAS ALREADY BEEN INSTALLED!
source("http://bioconductor.org/biocLite.R")
biocLite("cummeRbund")
|
...
| Code Block |
|---|
library(cummeRbund)
cuff_data <- readCufflinks('cuffdiffdiff_resultsout')
|
D) Use cummeRbund to visualize the differential expression results.
...
| Code Block |
|---|
pdf(file="scatterplot.pdf") csScatter(genes(cuff_data), 'C1', 'C2') dev.off() #How does this plot look? What is it telling us? |
The resultant plot is here.
Exercise 2a: Pull out from your data, significantly differentially expressed genes and isoforms.
| Code Block | ||
|---|---|---|
| ||
#Below command is equivalent to looking at the gene_exp.diff file that we spent a lot of time parsing yesterday gene_diff_data <- diffData(genes(cuff_data)) #Do gene_diff_data followed by tab to see all the variables in this data object sig_gene_data <- subset(gene_diff_data, (significant == 'yes')) #Count how many we have nrow(sig_gene_data) #Below command is equivalent to looking at the isoform_exp.diff file isoform_diff_data <-diffData(isoforms(cuff_data)) sig_isoform_data <- subset(isoform_diff_data, (significant == 'yes')) #Count how many we have nrow(sig_isoform_data) Why do we have so many? Yesterday we saw a lot less. |
Exercise 2b: Pull out from your data, genes that are significantly differentially expressed and above log2 fold change of 1
| Code Block | ||
|---|---|---|
| ||
#Below command is equivalent to looking at the gene_exp.diff file that we spent a lot of time parsing yesterday gene_diff_data <- diffData(genes(cuff_data)) #Do gene_diff_data followed by tab to see all the variables in this data object sig_gene_data <- subset(gene_diff_data, (significant == 'yes')) up_gene_data <- subset(sig_gene_data, (log2_fold_change > 1)) #How many nrow(up_gene_data) #Pause here to consider a mistake I made yesterday down_gene_data <- subset(sig_gene_data, (log2_fold_change < -1)) nrow(updown_gene_data) |
Exercise 3: For a gene, regucalcin, plot gene and isoform level expression.
| Code Block | ||
|---|---|---|
| ||
pdf(file="regucalcin.pdf") mygene1 <- getGene(cuff_data,'regucalcin') expressionBarplot(mygene1) expressionBarplot(isoforms(mygene1)) dev.off() |
The resultant plot is here.
Exercise 4: For a gene, Rala, plot gene and isoform level expression.
| Code Block | ||
|---|---|---|
| ||
pdf(file="rala.pdf") mygene2 <- getGene(cuff_data, 'Rala') expressionBarplot(mygene2) expressionBarplot(isoforms(mygene2)) dev.off() |
The figure is here.
Take cummeRbund for a spin...
...
| Expand | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||
|
The resultant graph is here.
| Expand | ||||
|---|---|---|---|---|
| ||||
Use csBoxplot function on cuff_data object to generate a boxplot of gene or isoform level fpkms. |
The resultant plot is here.
Exercise 6: Visualize the significant vs non-significant genes using a volcano plot.
...
| Expand | ||||
|---|---|---|---|---|
| ||||
Use csVolcano function on cuff_data object to generate a volcano plot. |
The resultant plot is here.
Exercise 7: MORE COMPLICATED! Generate a heatmap for genes that are upregulated and significant.
...
| Expand | ||||
|---|---|---|---|---|
| ||||
Use csHeatmap function on the up_gene_data data structure we created. But its a little tricky. |
The resultant plot is here.