...
| Code Block | ||
|---|---|---|
| ||
login1$ R
...
> library("DESeq")
> counts = read.delim("gene_counts.tab", header=F, row.names=1)
> head(counts)
> colnames(counts) = c("wt1", "mut1", "wt2", "mut2")
> head(counts)
> my.design <- data.frame(
row.names = colnames( counts ),
condition = c( "wt", "mut", "wt", "mut"),
libType = c( "single-end", "single-end", "single-end", "single-end" )
)
> conds <- factor(my.design$condition)
> cds <- newCountDataSet( counts, conds )
> cds
> cds <- estimateSizeFactors( cds )
> sizeFactors( cds )
> cds <- estimateDispersions( cds )
> pdf("DESeq-dispersion_estimates.pdf")
> plot(
rowMeans( counts( cds, normalized=TRUE ) ),
fitInfo(cds)$perGeneDispEsts,
pch = '.', log="xy"
)
> xg <- 10^seq( -.5, 5, length.out=300 )
> lines( xg, fitInfo(cds)$dispFun( xg ), col="red" ) )
> dev.off()
> result <- nbinomTest( cds, "wt", "mut" )
> head(result)
> result = result[order(result$pval), ]
> head(result)
> write.csv(result, "DESeq-wt-vs-mut.csv")
> pdf("DESeq-MA-plot.pdf")
> plot(
result$baseMean,
result$log2FoldChange,
log="x", pch=20, cex=.3,
col = ifelse( result$padj < .1, "red", "black" ) )
> dev.off()
> q()
Save workspace image? [y/n/c]: n
login1$ head DESeq-wt-vs-mut.csv
|
DESeq-wt-vs-mut.csv is a comma-delimited file that could be reloaded into R or viewed in Excel.
You should copy the two *.pdf files that were created back to your Desktop local computer to view them.
Exercises
- What are the numbers returned by
sizeFactors( cds )?Expand Answer... Answer... They are, roughly speaking, the relative average coverage of each data set. Specifically, they are the size parameter of the negative binomial fit to the counts per gene per data file.
- What are the dispersion estimates?
Expand Answer... Answer... The model assumes there is also a per-gene aspect to the variance in counts observed, that is again fit to a negative binomial distribution (=overdispersed Poisson distribution). In this model, the lower the counts are, the more dispersion relative to the mean is expected (red line in graph). Thus, higher fold changes are required in lowly expressed genes to call the same observed fold-change difference as significant.
- What was the predominant effect of the mutation on gene expression in this Listeria strain?
...