Content Comparison

...

Code Block
bedtools multicov -bams SRR034450.bam SRR034451.bam SRR034452.bam SRR034453.bam -bed NC_017544.1.genes.gff > gene_counts.gff

...

head gene_counts.gff

Analyze differential gene expression (DESeq)

DESeq Manual and Instructions

Our data that is cluttered with a lot of extra columns and one column stuffed with tag\=value information (including the gene names that we want!). Let's clean it up a bit before loading into R - which likes to work on simple tables. GFF are tab-delimited files.

We can do this cleanup many ways, but a quick one is to use the Unix string editor sed. This command replaces the entire beginning of the line up to locus_tag= with nothing (that is, it deletes it). This conveniently leaves us with just the locus_tag and the columns of read counts in each gene. If you were writing a real pipeline, you would probably want to use a Perl script that would check to be sure that each line had the locus_tag (they do)

...

, among other things.

Code Block

title	Reformatting gene_counts.gff


head gene_counts.gff
sed -e 's/^.*locus_tag=//' < gene_counts.gff > gene_counts.tab
head gene_counts.tab

Now, we are going to load the GFF file straight into R, remove the columns we don't want, name the columns and rows like R expects, and write out this file. You could do this in any other scripting language, or even Excel. We will write out the first few lines of the file at each step, so that you can see what the command is doing.

vignette("DEseq")

Code Block

title	Using DESeq

login1$ R
...
> library("DESeq")
> combinedcounts = read.csvtable("combinedgene_counts.csvtab", header=TF, row.names=1, sep="\t")
> head(counts)
> colnames(counts) = c("wt1", "mut1", "wt2", "mut2")
> head(counts)
> my.design <- data.frame(
  row.names = colnames( combinedcounts ),
  condition = c( "Ancwt", "Ancmut", "ELwt", "EL", "EW", "EW"mut"),
  libType = c( "single-end", "single-end", "single-end",
  "single-end", "single-end", "single-end" ) 
)
)
> design
> conds <- factor(my.design$condition)
> cds <- newCountDataSet( combinedcounts, conds )
> summary(cds)
> cds <- estimateSizeFactors( cds )
> sizeFactors( cds )
> cds <- estimateDispersions( cds )

> pdf("dispersion_estimates.pdf")
> plotDispEsts( cds )
> res dev.off()

> result <- nbinomTest( cds, "ELwt", "EW" )
mut" )
> head(result)
> result = result[order(result$pval), ]

> write.csv(resresult, "ELwt-vs-EWmut.csv")

> pdf("MA-plot")
> plot(
  result$baseMean,
  result$log2FoldChange,
  log="x", pch=20, cex=.3,
  col = ifelse( result$padj < .1, "red", "black" ) )
> dev.off()

> q()
Save workspace image? [y/n/c]: n
login1$ head wt-vs-mut.csv

Additional Points

In an actual RNAseq analysis, you might want to trim stray adaptor sequences from your data using a tool like the FASTX-Toolkit or FAR before aligning.

Version	Old Version 18	New Version 19
Changes made by	Jeffrey E Barrick	Jeffrey E Barrick
Saved on	May 22, 2012	May 23, 2012

Versions Compared

Key

Analyze differential gene expression (DESeq)

Additional Points