Content Comparison

...

Now, we are going to load the GFF file straight into R, remove the columns we don't want, name the columns and rows like R expects, and write out this file. You could do this in any other scripting language, or even Excel. We will write out the first few lines of the file at each step, so that you can see what the command is doing.

Do not copy the > characters in the example below. They indicate the R prompt, to remind you which commands you are running inside of R!

Using DESeq

login1$ R
...
> library("DESeq")
> counts = read.table("gene_counts.tab", header=F, row.names=1, sep="\t")
> head(counts)
> colnames(counts) = c("wt1", "mut1", "wt2", "mut2")
> head(counts)
> my.design <- data.frame(
  row.names = colnames( counts ),
  condition = c( "wt", "mut", "wt", "mut"),
  libType = c( "single-end", "single-end", "single-end", "single-end" ) 
)
> conds <- factor(my.design$condition)

> cds <- newCountDataSet( counts, conds )
> summary(cds)

> cds <- estimateSizeFactors( cds )
> sizeFactors( cds )

> cds <- estimateDispersions( cds )

> pdf("dispersion_estimates.pdf")
> plot(
  rowMeans( counts( cds, normalized=TRUE ) ),
  fitInfo(cds)$perGeneDispEsts,
  pch = '.', log="xy" )
> xg <- 10^seq( -.5, 5, length.out=300 )
> lines( xg, fitInfo(cds)$dispFun( xg ), col="red" )
> dev.off()

> result <- nbinomTest( cds, "wt", "mut" )
> head(result)

> result = result[order(result$pval), ]
> head(result)

> write.csv(result, "wt-vs-mut.csv")

> pdf("MA-plot.pdf")
> plot(
  result$baseMean,
  result$log2FoldChange,
  log="x", pch=20, cex=.3,
  col = ifelse( result$padj < .1, "red", "black" ) )
> dev.off()

> q()
Save workspace image? [y/n/c]: n
login1$ head wt-vs-mut.csv

wt-vs-mut.csv is a comma-delimited file that could be reloaded into R or viewed in Excel.
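As a minimal sketch of reloading such a file into R, the example below writes a small made-up table to a temporary CSV and reads it back. The rows and the file name toy-wt-vs-mut.csv are hypothetical stand-ins; the column names are assumed to match what nbinomTest() reports (id, baseMean, log2FoldChange, pval, padj), so check them against head(result) in your own session.

```r
# Made-up rows standing in for nbinomTest() output (hypothetical values).
toy_result <- data.frame(
  id = c("gene1", "gene2", "gene3", "gene4"),
  baseMean = c(1200, 35, 480, 9),
  log2FoldChange = c(-2.1, 0.4, 1.8, -0.2),
  pval = c(1e-8, 0.30, 2e-4, 0.85),
  padj = c(4e-8, 0.40, 4e-4, 0.85)
)

# Write the toy table the same way the tutorial writes its results.
csv_path <- file.path(tempdir(), "toy-wt-vs-mut.csv")
write.csv(toy_result, csv_path)

# write.csv stores the row names in the first column, so read them back as such.
result <- read.csv(csv_path, row.names = 1, stringsAsFactors = FALSE)

# Keep genes below the same adjusted p-value cutoff used to color the MA plot.
significant <- result[!is.na(result$padj) & result$padj < 0.1, ]
print(significant)

# Tally the sign of the fold change among the significant genes.
direction <- ifelse(significant$log2FoldChange > 0,
                    "positive log2FC", "negative log2FC")
print(table(direction))
```

With the real wt-vs-mut.csv, replacing csv_path with the file's name and running the last two steps gives a quick count of how many significant genes changed in each direction.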

You should copy the two *.pdf files that were created back to your Desktop to view them.

Questions

What are the numbers returned by sizeFactors( cds )?

Roughly speaking, they are the relative average coverage of each data set. There are roughly 5 times as many counts of reads in genes for wt2 as there are for mut2. Specifically, DESeq estimates each size factor as the median, across genes, of the ratio of that sample's count to the gene's geometric mean count across all samples.
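The median-of-ratios idea that DESeq uses for size factors can be sketched in base R without loading the package. The count matrix below is made up (chosen so that wt2 has 5 times the counts of mut2, mirroring the answer above), and median_ratio_size_factors is a hypothetical helper name, not a DESeq function.

```r
# Toy count matrix (made-up numbers, not the tutorial's data):
# rows are genes, columns are samples.
toy_counts <- matrix(
  c(100, 200, 500, 100,
     10,  20,  50,  10,
     30,  60, 150,  30),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("wt1", "mut1", "wt2", "mut2"))
)

# Geometric mean of each gene across samples, computed on the log scale.
log_geo_means <- rowMeans(log(toy_counts))

# For each sample, take the median over genes of count / geometric mean.
# Genes with a zero count in any sample have a log geometric mean of -Inf
# and are skipped.
median_ratio_size_factors <- function(counts, log_geo_means) {
  apply(counts, 2, function(sample_counts) {
    keep <- is.finite(log_geo_means) & sample_counts > 0
    exp(median(log(sample_counts[keep]) - log_geo_means[keep]))
  })
}

size_factors <- median_ratio_size_factors(toy_counts, log_geo_means)
print(size_factors)

# Dividing each column by its size factor puts the samples on a common scale.
normalized <- sweep(toy_counts, 2, size_factors, "/")
print(normalized)
```

Because every gene in this toy matrix differs between samples by the same factor, the normalized counts come out identical across columns. Real data will not be this clean, which is why a median over genes is used.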

What are the dispersion estimates?

The model assumes that the variance in the counts observed for each gene has a per-gene component beyond Poisson sampling noise, so the counts are fit to a negative binomial distribution (an overdispersed Poisson distribution). The program fits a trend in which lower counts are expected to show more dispersion (red line in the graph), so a given change in counts becomes less significant as the counts get lower.
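To make the overdispersion idea concrete, here is a small base R sketch of the negative binomial variance, mean + dispersion * mean^2, compared with the Poisson variance, which equals the mean. The trend in toy_dispersion is invented for illustration (it only imitates the falling shape of the red line) and is not the curve DESeq fits to your data.

```r
set.seed(1)

# Hypothetical dispersion trend: higher dispersion at low mean counts.
# The constants 0.01 and 2 are made up for this sketch.
toy_dispersion <- function(mean_count) 0.01 + 2 / mean_count

# Negative binomial variance: Poisson noise plus an overdispersion term.
nb_variance <- function(mean_count, dispersion) {
  mean_count + dispersion * mean_count^2
}

means <- c(5, 50, 500, 5000)
disp <- toy_dispersion(means)
summary_table <- data.frame(
  mean = means,
  dispersion = disp,
  poisson_sd = sqrt(means),
  nb_sd = sqrt(nb_variance(means, disp)),
  cv = sqrt(nb_variance(means, disp)) / means  # relative noise
)
print(summary_table)

# Check the formula by simulation. rnbinom() is parameterized so that
# variance = mu + mu^2 / size, hence size = 1 / dispersion.
sim <- rnbinom(100000, mu = 50, size = 1 / toy_dispersion(50))
print(c(simulated_var = var(sim),
        expected_var = nb_variance(50, toy_dispersion(50))))
```

The cv column shows the relative noise shrinking as the mean count grows, which is why the same fold change is harder to call significant for a weakly expressed gene.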

What was the predominant effect of the mutation on gene expression in this Listeria strain?

Additional Points

In an actual RNAseq analysis, you might want to trim stray adaptor sequences from your data using a tool like the FASTX-Toolkit or FAR before aligning.
You can get a lot more information from RNAseq data than you could from a microarray experiment. You can map transcriptional start sites, areas of unexpected transcription, splice sites, etc. - all because you have full sequence information that we have barely used in this example.

Version	Old Version 19	New Version 20
Changes made by	Jeffrey E Barrick	Jeffrey E Barrick
Saved on	May 23, 2012	May 23, 2012
