...
In this lab, we'll look at how to use cummeRbund to visualize our gene expression results from cuffdiff. CummeRbund is part of the tuxedo pipeline and it is an R package that is capable of plotting the results generated by cuffdiff.
Figure from: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks, Trapnell et al, Nature Protocols, 2013. Get your data
| Code Block | ||
|---|---|---|
| ||
#IF YOU HAVE NOT COPIED THIS OVER ALREADY cds cd my_rnaseq_course cp -r /corral-repl/utexas/BioITeam/rnaseq_course_2015/day_3_partB/cuffdiff_results . IF YOU HAVE ALREADY COPIED IT OVER cds cd my_rnaseq_course cd day_3_partB/cuffdiff_results #We need the cuffdiff results because that is the input to cummeRbund. |
Introduction
CummeRbund is powerful package with many different functions. Cuffdiff produces a lot of output files and all the output files are related by sets by IDs.
| Code Block | ||
|---|---|---|
| ||
#If you have a local copy:
ls -l cuffdiff_results
-rwxr-x--- 1 daras G-801020 2691192 Aug 21 12:20 isoform_exp.diff : Differential expression testing for transcripts
-rwxr-x--- 1 daras G-801020 1483520 Aug 21 12:20 gene_exp.diff : Differential expression testing for genes
-rwxr-x--- 1 daras G-801020 1729831 Aug 21 12:20 tss_group_exp.diff: Differential expression testing for primary transcripts
-rwxr-x--- 1 daras G-801020 1369451 Aug 21 12:20 cds_exp.diff : Differential expression testing for coding sequences
-rwxr-x--- 1 daras G-801020 3277177 Aug 21 12:20 isoforms.fpkm_tracking
-rwxr-x--- 1 daras G-801020 1628659 Aug 21 12:20 genes.fpkm_tracking
-rwxr-x--- 1 daras G-801020 1885773 Aug 21 12:20 tss_groups.fpkm_tracking
-rwxr-x--- 1 daras G-801020 1477492 Aug 21 12:20 cds.fpkm_tracking
-rwxr-x--- 1 daras G-801020 1349574 Aug 21 12:20 splicing.diff : Differential splicing tests
-rwxr-x--- 1 daras G-801020 1158560 Aug 21 12:20 promoters.diff : Differential promoter usage
-rwxr-x--- 1 daras G-801020 919690 Aug 21 12:20 cds.diff : Differential coding output |
...
Figures from http://compbio.mit.edu/cummeRbund/manual_2_0.html
...
MAKE SURE YOU ARE IN THE IDEV SESSION FOR THIS!
...
...
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course/diff_results .
cd diff_results
#We need the cuffdiff results because that is the input to cummeRbund.
A) First load R and enter R environment
| Code Block |
|---|
#module load R will not work for this R package because its needs the latest version of R. So, we'll load a local version. $BI /work/01184/daras/bin/R What is $BI here? |
B) DON'T RUN THIS-IT HAS ALREADY BEEN INSTALLED! Within R environment, install package cummeRbundthis is how cummeRbund was installed.
| Code Block |
|---|
#DON'T RUN THIS-IT HAS ALREADY BEEN INSTALLED!
source("http://bioconductor.org/biocLite.R")
biocLite("cummeRbund")
|
...
| Code Block |
|---|
library(cummeRbund)
cuff_data <- readCufflinks('cuffdiffdiff_resultsout')
|
D) Use cummeRbund to visualize the differential expression results.
...
| Code Block |
|---|
pdf(file="scatterplot.pdf") csScatter(genes(cuff_data), 'C1', 'C2') dev.off() #How does this plot look? What is it telling us? |
The resultant plot is here.
Exercise 2a: Pull out from your data, significantly differentially expressed genes and isoforms.
| Code Block | ||
|---|---|---|
| ||
#Below command is equivalent to looking at the gene_exp.diff file that we spent a lot of time parsing yesterday gene_diff_data <- diffData(genes(cuff_data)) #Do gene_diff_data followed by tab to see all the variables in this data object sig_gene_data <- subset(gene_diff_data, (significant == 'yes')) #Count how many we have nrow(sig_gene_data) #Below command is equivalent to looking at the isoform_exp.diff file isoform_diff_data <-diffData(isoforms(cuff_data)) sig_isoform_data <- subset(isoform_diff_data, (significant == 'yes')) #Count how many we have nrow(sig_isoform_data) Why do we have so many? Yesterday we saw a lot less. |
...
Exercise 2b: Pull out from your data, genes that are significantly differentially expressed and above log2 fold change of 1
| Code Block | ||
|---|---|---|
| ||
#Below command is equivalent to looking at the gene_exp.diff file that we spent a lot of time parsing yesterday gene_diff_data <- diffData(genes(cuff_data)) #Do gene_diff_data followed by tab to see all the variables in this data object sig_gene_data <- subset(gene_diff_data, (significant == 'yes')) up_gene_data <- subset(sig_gene_data, (log2_fold_change > 1)) #How many nrow(up_gene_data) #Pause here to consider a mistake I made yesterday down_gene_data <- subset(sig_gene_data, (log2_fold_change < -1)) nrow(updown_gene_data) |
Exercise 3: For a gene, regucalcin, plot gene and isoform level expression.
| Code Block | ||
|---|---|---|
| ||
pdf(file="regucalcin.pdf") mygene1 <- getGene(cuff_data,'regucalcin') expressionBarplot(mygene1) expressionBarplot(isoforms(mygene1)) dev.off() |
The resultant plot is here.
Exercise 4: For a gene, Rala, plot gene and isoform level expression.
| Code Block | ||
|---|---|---|
| ||
pdf(file="rala.pdf") mygene2 <- getGene(cuff_data, 'Rala') expressionBarplot(mygene2) expressionBarplot(isoforms(mygene2)) dev.off() |
The figure is here.
Take cummeRbund for a spin...
...
| Expand | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||
|
The resultant graph is here.
| Expand | ||||
|---|---|---|---|---|
| ||||
Use csBoxplot function on cuff_data object to generate a boxplot of gene or isoform level fpkms. |
The resultant plot is here.
Exercise 6: Visualize the significant vs non-significant genes using a volcano plot.
...
| Expand | ||||
|---|---|---|---|---|
| ||||
Use csVolcano function on cuff_data object to generate a volcano plot. |
The resultant plot is here.
Exercise 7: MORE COMPLICATED! Generate a heatmap for genes that are upregulated and significant.
...
| Expand | ||||
|---|---|---|---|---|
| ||||
Use csHeatmap function on the up_gene_data data structure we created. But its a little tricky. |
The resultant plot is here.