Quick tips on GO analysis

Scott likes two approaches:
1. GoMiner: http://discover.nci.nih.gov/gominer/htgm.servlet
GoMiner requires gene names in a text file (the file name MUST end in .txt). A useful hack for getting gene names from, e.g., blast -m 8 results (presuming fish.t contains only the BLAST result identifiers, one per line, e.g. "gi|5729849|ref|NM_006496.1"):


grep -F -f fish.t /home/scott/data/blastdb/human_rna.annot > fish.t.genes
awk '{for (i=1;i<NF;i++) {if (substr($i,1,1)=="("&&substr($i,length($i),1)==",") {print substr($i,2,length($i)-3)} } }' fish.t.genes > fish.t.genes.genenames

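The first command retrieves the entire reference line for each identifier from the BLAST database annotation file; the second parses out the gene name (not attractively) by printing any field of the form "(NAME)," with the parenthesis and comma stripped.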
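To see what the parsing step does, here is the same awk logic wrapped in a function and run on a single line. This is only a sketch: the function name extract_gene_names and the sample annotation line (including the symbol EXMPL1) are made up for illustration, and it assumes the annot file has lines where the gene symbol appears as "(SYMBOL)," before the final word.

```shell
# Hypothetical wrapper around the same awk logic as above.
# Prints every field that starts with "(" and ends with ",",
# minus the leading "(" and the trailing "),".
extract_gene_names() {
    awk '{
        for (i = 1; i < NF; i++) {              # the last field is never checked
            head_ch = substr($i, 1, 1)
            tail_ch = substr($i, length($i), 1)
            if (head_ch == "(" && tail_ch == ",") {
                print substr($i, 2, length($i) - 3)
            }
        }
    }'
}

# Made-up annotation line, NOT a real record:
printf '%s\n' 'gi|0000000|ref|NM_000000.0| Homo sapiens example gene 1 (EXMPL1), mRNA' \
    | extract_gene_names
# prints: EXMPL1
```

Note that a description containing some other parenthesised word followed by a comma would also be printed, so it is worth eyeballing the output before uploading it.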

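Then send fish.t.genes.genenames off to GoMiner (renamed so that it ends in .txt, per the requirement above), allowing it to select the background. (You could send gene names parsed from the whole annot file as the background, of course...)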

2. DAVID: http://david.abcc.ncifcrf.gov/tools.jsp
In contrast to GoMiner, DAVID accepts NCBI identifiers (NM_..., NP_..., etc.), which can easily be parsed out of blast results.
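A rough sketch of that parsing, assuming the blast -m 8 (tab-separated) output has subject IDs in column 2 of the form "gi|5729849|ref|NM_006496.1|". The function name extract_accessions and the file names in the usage comment are hypothetical. Dropping the ".1" version suffix is also an assumption about what the upload form prefers; delete that line to keep versions.

```shell
# Hypothetical helper: read blast -m 8 lines on stdin and print the
# unique RefSeq accessions found in column 2 (the subject ID).
extract_accessions() {
    awk -F'\t' '{
        n = split($2, parts, "|")       # gi|5729849|ref|NM_006496.1| -> parts[4]
        if (n >= 4) {
            acc = parts[4]
            sub(/\.[0-9]+$/, "", acc)   # strip the version suffix, e.g. ".1"
            print acc
        }
    }' | sort -u
}

# Example usage (file names are assumptions):
# extract_accessions < fish.blast > fish.accessions.txt
```

If fish.t already holds bare identifiers, one per line, the same split works with the whole line in place of column 2.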

Why would you choose one or the other?

GoMiner provides very clean output, suitable for grant applications and summaries of gene lists, but doesn't give you much to explore. DAVID's output is intended for interactive exploration, for when you're really trying to work out the biology.