...
GO terms provide a standardized vocabulary to describe genes and gene products from different species. GO terms allow us to assign functionality to genes. The following properties are described for gene products:
cellular component: describes where in a cell a gene product acts and what cellular unit it is part of;
molecular function: describes the function carried out by the gene product, such as binding or catalysis;
biological process: a set of molecular functions with a defined beginning and end, describing a biological phenomenon such as DNA replication.
All GO terms have an ID that looks like GO:0006260 and a name like DNA replication.
...
GO terms are hierarchical, consisting of broader parent GO terms and narrower child GO terms. For example, DNA replication is a child of the GO term cellular metabolic process, and it has child GO terms of its own, such as regulation of DNA replication and strand elongation.
WHAT IS GO ENRICHMENT?
...
GO enrichment is a way of summarizing the FUNCTIONS AND TYPES of genes that are differentially expressed.
CLASSICAL GO ENRICHMENT
Input:
A. Total number of genes we are looking at (ALL genes).
B. Number of genes of interest, that is, in our DEG list (DEG).
C. Total number of genes in the GO term.
D. Number of genes from our genes of interest (DEG) that are also in the GO term.
Enrichment test: asks whether the DEG list contains more genes from a given GO category than expected by chance (Fisher’s exact test, hypergeometric test, or a similar test).
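As a worked form of the test, the inputs A, B, C and D listed above can be plugged into the standard one-sided hypergeometric tail, which is also what a one-sided Fisher's exact test computes. This is only a sketch of the textbook formula; individual tools may differ in details such as multiple-testing correction.

```latex
% A = all genes, B = genes in the DEG list,
% C = genes annotated to the GO term, D = DEG genes annotated to the GO term
\mathbb{E}[X] = \frac{B \cdot C}{A}
\qquad
p = P(X \ge D) = \sum_{k=D}^{\min(B,\,C)}
    \frac{\binom{C}{k}\,\binom{A-C}{B-k}}{\binom{A}{B}}
```

The first expression is the overlap expected by chance; a GO term is called enriched when the observed overlap D is large enough that p is small.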
...
Web-based tool for GO/pathway enrichment: Enrichr
Run GOrilla - Classical enrichment
...
...
Get the data for running GOrilla
| Code Block |
|---|
#Make sure you are in the right location
cds
cd my_rnaseq_course/day_4_partA/go_enrichment |
...
Get all input files from DESeq2 output:
"","id","baseMean","baseMeanA","baseMeanB","foldChange","log2FoldChange","pval","padj"
"131","FBgn0000370",7637.91654540105,4217.77033402576,11058.0627567763,2.62177925326286,1.39054621964443,1.2887282997047e-116,7.22484613473489e-113
"2489","FBgn0025682",6038.35042952997,3300.21617337019,8776.48468568974,2.65936660649935,1.41108267336748,1.36704751839828e-116,7.22484613473489e-113
......
INPUT FILE 1: DEG (contains the 76 genes that meet our fold change and p-value cutoffs)
FBgn0000370
FBgn0025682
FBgn0086904
...
...
Pull out all the gene ids corresponding to DEGs
| Code Block |
|---|
#Alter this old command to pull out Gene ids corresponding to DEGs and store it in a file called DEG
sed 's/,/\t/g' deseq2_htseq_C1_vs_C2.csv|awk '{if ((($3>=1)||($3<=-1))&&($6<=0.05)) print $1}' |sed 's/"//g'|grep '^FB' > DEG |
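Hard-coded column numbers such as $3 and $6 depend on the exact layout of the CSV, so it is worth checking them against the header of your own file. As a hedged alternative, the sketch below looks the columns up by header name. The function name select_degs, its argument order, and the rule "use a column named id if there is one, otherwise column 1" are assumptions made for illustration, not part of the course scripts; which p-value column to filter on is left as a parameter.

```shell
# Sketch: select DEG gene ids by column NAME instead of column NUMBER.
# Assumptions (hypothetical, not from the course scripts): the name select_degs,
# the argument order, and the "id column if present, else column 1" rule.
select_degs() {
  # $1 = DESeq2 CSV, $2 = |log2FoldChange| cutoff, $3 = p-value cutoff,
  # $4 = name of the p-value column to filter on (for example padj)
  sed 's/"//g' "$1" | awk -F',' -v lfc="$2" -v alpha="$3" -v pname="$4" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i        # header name -> column index
      if (!("log2FoldChange" in col) || !(pname in col)) {
        print "missing expected column in header" > "/dev/stderr"
        exit 1
      }
      idc = ("id" in col) ? col["id"] : 1          # where the gene id lives
      lc = col["log2FoldChange"]; pc = col[pname]
      next
    }
    $lc == "NA" || $pc == "NA" { next }            # skip genes without a test result
    ($lc + 0 >= lfc + 0 || $lc + 0 <= -lfc) && $pc + 0 <= alpha + 0 { print $idc }
  '
}

# Possible usage (file name taken from the command above):
# select_degs deseq2_htseq_C1_vs_C2.csv 1 0.05 padj | grep '^FB' > DEG
```

If the resulting list does not have the number of genes you expect, compare the column you filtered on with the one used in the original command.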
.....
INPUT FILE 2: ALL (contains all 14869 genes)
FBgn0000370
FBgn0025682
FBgn0086904
.....
...
Pull out all the gene ids
| Code Block |
|---|
#Command to pull out ALL gene ids and store it in a file called ALL
sed 's/,/\t/g' deseq2_htseq_C1_vs_C2.csv|cut -f 1|sed 's/"//g'|grep '^FB' > ALL |
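Before uploading the two lists, it can help to confirm that their sizes match what you expect (76 and 14869 in this exercise) and that every gene in the target list also appears in the background list. The helper below is a minimal sketch; check_gene_lists is a hypothetical name, not something provided by the course.

```shell
# Sketch: sanity-check a target list (DEG) against a background list (ALL).
# check_gene_lists is a hypothetical helper name used only for illustration.
check_gene_lists() {
  # $1 = target list (one gene id per line), $2 = background list
  n_target=$(sort -u "$1" | wc -l | tr -d ' ')
  n_background=$(sort -u "$2" | wc -l | tr -d ' ')
  # grep -vxFf prints target lines that match no whole line of the background
  n_missing=$(grep -vxFf "$2" "$1" | wc -l | tr -d ' ')
  echo "target=$n_target background=$n_background missing_from_background=$n_missing"
  # succeed only when every target gene is present in the background
  [ "$n_missing" -eq 0 ]
}

# Possible usage:
# check_gene_lists DEG ALL
```

A non-zero missing count usually means the two files were extracted with different commands or from different result files.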
...
SCP THE DATA OVER TO YOUR COMPUTER:
| Code Block |
|---|
#ON ls6: copy the path for the ALL and DEG files
pwd
#ON LOCAL COMPUTER: from a terminal tab
scp <username>@ls6.tacc.utexas.edu:<pathtofileson/DEG> .
scp <username>@ls6.tacc.utexas.edu:<pathtofileson/ALL> . |
RUN GORILLA USING THE UNRANKED METHOD: http://cbl-gorilla.cs.technion.ac.il/
Run GOrilla - Rank-based enrichment
INPUT FILE: ALLRANKED (all genes, ranked by adjusted p-value)
FBgn0000370
FBgn0025682
FBgn0086904
...
...
Pull out all the gene ids, ranked by adjusted p-value
| Code Block |
|---|
##Command to pull out ALL gene ids, sorted by adjpvalue store it in a file called ALLRANKED
#Remember we already sorted our results by adjusted pvalue in the deseq2 script before writing it out to a file.
#So you just need to pull out the gene ids in the order it already is in.
sed 's/,/\t/g' deseq2_htseq_C1_vs_C2.csv|cut -f 1|sed 's/"//g'|grep '^FB' > ALLRANKED |
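The command above relies on the results file already being sorted. If you are not sure that it is, the ranking can be made explicit. The sketch below is one hedged way to do that: rank_by_column is a hypothetical helper, it assumes the header contains the column you name (for example padj), and it uses sort -g, which is available in GNU and BSD sort but is not part of POSIX. Genes with NA in that column are placed last.

```shell
# Sketch: print gene ids ordered by a named p-value column, smallest first.
# rank_by_column is a hypothetical helper name; the "id column if present,
# else column 1" rule is an assumption about the CSV layout.
rank_by_column() {
  # $1 = DESeq2 CSV, $2 = name of the column to rank by (for example padj)
  sed 's/"//g' "$1" | awk -F',' -v pname="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i        # header name -> column index
      if (!(pname in col)) { print "column not found: " pname > "/dev/stderr"; exit 1 }
      idc = ("id" in col) ? col["id"] : 1
      pc = col[pname]
      next
    }
    # p-values never exceed 1, so using 2 for NA pushes those genes to the end
    { p = ($pc == "NA") ? 2 : $pc; print p "\t" $idc }
  ' | LC_ALL=C sort -s -g -k1,1 | cut -f2
}

# Possible usage:
# rank_by_column deseq2_htseq_C1_vs_C2.csv padj | grep '^FB' > ALLRANKED
```

The -s flag keeps tied genes in their original file order, so an already sorted file comes out unchanged.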
SCP THE DATA OVER TO YOUR COMPUTER:
| Code Block |
|---|
#ON ls6: copy the path for the ALLRANKED file
pwd
#ON LOCAL COMPUTER: from a terminal tab
scp <username>@ls6.tacc.utexas.edu:<pathtofileson/ALLRANKED> . |
RUN GORILLA USING THE RANKED METHOD: http://cbl-gorilla.cs.technion.ac.il/