...
R script: WGCNAshortTutorial.R
Code Block | ||
---|---|---|
|
...
Warning |
---|
When following along here, please switch to your idev session for running these example commands. |
If you have not requested an idev session, do so now:
Code Block | ||
---|---|---|
| ||
ssh <username>@stampede2.tacc.utexas.edu
idev -m 120 -q normal -A UT-2015-05-18 -r RNAday4
|
Code Block | ||
---|---|---|
| ||
#We will be doing all this in the idev session
cds
cd my_rnaseq_course/day_4_partA/wgcna |
...
cds
cd my_rnaseq_course/day_4_partA/wgcna |
Explanation of sample dataset: Time series of coral larval development from 4 hours post fertilization (Day 0) to 245 hours post fertilization (Day 12). Multiple other quantitative traits were measured through the time series. Only green and red fluorescence are added as quantitative traits in the sample dataset. Dataset has 48 samples total, four replicates (A-D) over 12 days. The goal is to find genes that correlate with developmental traits through time and differences in gene expression between early larval development and late larval development.
The complete R script has been provided for you, so you run it using R CMD BATCH. Or you can open up an R prompt and run key pieces of it by copy-pasting bits of code from below. This is to understand what the code is actually doing.
TRAIT DATA FILE: Traits_23May2015.csv
Code Block | ||
---|---|---|
| ||
module load intel/18.0.2
module load Rstats/4.0.3 |
...
title | Load biocontainer R modules |
---|
...
-pasting bits of code from below. This is to understand what the code is actually doing.
TRAIT DATA FILE: Traits_23May2015.csv
While WGCNA and flashClust (the two libraries needed) are available as modules on lonestar6 (as part of biocontainers) and seem to use the same version of R (a pretty old version), they do not seem to play well with each other. So, we are not able to run the WGCNA script right now on lonestar6. But, I've included the steps for reference.
Code Block | ||
---|---|---|
| ||
module load biocontainers module load r-flashclust/ctr-1.01_2--r3.3.2_0 module load r-wgcna/ctr-1.51--r3.3.2_1 | ||
Code Block | ||
| ||
0
#run the R script
R CMD BATCH WGCNAshortTutorial.R |
Below are the details of the WGCNA R script:
Step 1: upload data into R and reformat for WGCNA (This is all run under the R console)
Code Block | ||
---|---|---|
| ||
# Only run the following commands once to install flashClust if needed #install.packages("flashClust") # Load WGCNA and flashClust libraries every time you open R library(WGCNA) library(flashClust) # Uploading data into R and formatting it for WGCNA # This creates an object called "datExpr" that contains the normalized counts file output from DESeq2 datExpr = read.csv("SampleTimeSeriesRLD.csv") # "head" the file to preview it head(datExpr) # You see that genes are listed in a column named "X" and samples are in columns # Manipulate file so it matches the format WGCNA needs row.names(datExpr) = datExpr$X datExpr$X = NULL datExpr = as.data.frame(t(datExpr)) # now samples are rows and genes are columns dim(datExpr) # 48 samples and 1000 genes (you will have many more genes in reality) # Run this to check if there are gene outliers gsg = goodSamplesGenes(datExpr, verbose = 3) gsg$allOK #If the last statement returns TRUE, all genes have passed the cuts. If not, we remove the offending genes and samples from the data with the following: #if (!gsg$allOK) # {if (sum(!gsg$goodGenes)>0) # printFlush(paste("Removing genes:", paste(names(datExpr)[!gsg$goodGenes], collapse= ", "))); # if (sum(!gsg$goodSamples)>0) # printFlush(paste("Removing samples:", paste(rownames(datExpr)[!gsg$goodSamples], collapse=", "))) # datExpr= datExpr[gsg$goodSamples, gsg$goodGenes] # } #Create an object called "datTraits" that contains your trait data datTraits = read.csv("Traits_23May2015.csv") head(datTraits) #form a data frame analogous to expression data that will hold the clinical traits. rownames(datTraits) = datTraits$Sample datTraits$Sample = NULL table(rownames(datTraits)==rownames(datExpr)) #should return TRUE if datasets align correctly, otherwise your names are out of order head(datTraits) # You have finished uploading and formatting expression and trait data # Expression data is in datExpr, corresponding traits are datTraits save(datExpr, datTraits, file="SamplesAndTraits.RData") #load("SamplesAndTraits.RData") |
...