Gene Ontology Enrichment (GOE) analysis is a method for interpreting a set of genes using the Gene Ontology classification system, in which genes are assigned to predefined categories according to their functional characteristics. This post demonstrates gene ontology enrichment analysis with the R package GOfuncR, which is based on the ontology enrichment software FUNC.
library(GOfuncR)
library(dplyr)
The input is loaded from a text file with gene symbols in the first column and, in the second column, 1 for candidate genes and 0 for background genes. If no background genes are defined, all remaining genes from the internal dataset are used as background. The first rows of the input look like this:
## Gene_Ids Is_candidate
## 1 TSPOAP1 1
## 2 UBE4A 1
## 3 RIC1 1
## 4 FBXL3 1
## 5 FBXO11 1
## 6 KIAA1109 1
Input_Genes <- read.delim2("Input_Genes.txt", header = TRUE)
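To make the expected file layout concrete, here is a minimal sketch that builds a small input table in memory, checks it, and round-trips it through a tab-delimited file. The gene symbols are taken from the preview above, but the 0/1 labels, the `Toy_*` names, and the temporary file are assumptions made only for illustration.

```r
# Hypothetical toy input: symbols from the preview above,
# candidate/background labels invented for illustration.
Toy_Input <- data.frame(
  Gene_Ids     = c("TSPOAP1", "UBE4A", "RIC1", "FBXL3", "FBXO11"),
  Is_candidate = c(1, 1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Basic sanity checks before running an enrichment:
# two columns, labels restricted to 0/1, no duplicated symbols.
stopifnot(
  ncol(Toy_Input) == 2,
  all(Toy_Input$Is_candidate %in% c(0, 1)),
  !anyDuplicated(Toy_Input$Gene_Ids)
)

# Write a tab-delimited file and read it back the same way as above.
Toy_File <- tempfile(fileext = ".txt")
write.table(Toy_Input, Toy_File, sep = "\t", quote = FALSE, row.names = FALSE)
Toy_Reloaded <- read.delim2(Toy_File, header = TRUE, stringsAsFactors = FALSE)
```

Checking the table up front tends to be cheaper than debugging an enrichment run that silently dropped or mislabeled genes.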
Run the hypergeometric test, which is the default test in go_enrich:
Go_Enrich_Out<- go_enrich(Input_Genes)
Results<-Go_Enrich_Out$results
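For intuition about what a hypergeometric test measures for a single GO category, the sketch below computes over- and under-representation p-values with base R's `phyper`. All counts are hypothetical, and this is only a conceptual illustration; GOfuncR's internal computation may differ in its details.

```r
# Hypothetical counts, chosen only for illustration
N_total     <- 1000  # candidate + background genes
N_in_cat    <- 50    # genes annotated to the GO category
N_candidate <- 100   # candidate genes
K_overlap   <- 12    # candidate genes annotated to the category

# Over-representation: P(X >= K_overlap), the chance of seeing at least
# this many candidate genes in the category by random draw.
p_overrep <- phyper(K_overlap - 1, N_in_cat, N_total - N_in_cat,
                    N_candidate, lower.tail = FALSE)

# Under-representation: P(X <= K_overlap)
p_underrep <- phyper(K_overlap, N_in_cat, N_total - N_in_cat, N_candidate)

# Probability of observing exactly K_overlap; counted in both tails
p_exact <- dhyper(K_overlap, N_in_cat, N_total - N_in_cat, N_candidate)
```

With these made-up counts about 5 candidate genes would be expected in the category by chance, so observing 12 yields a small over-representation p-value and a large under-representation p-value.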
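Enrichment can be classified as over-representation or under-representation based on the raw p-values: a category is considered over-represented if raw_p_overrep <= 0.05 and under-represented if raw_p_underrep <= 0.05. Note that these raw p-values are not corrected for multiple testing; the results table also reports family-wise error rates (FWER_overrep and FWER_underrep) that can be used for a stricter cutoff.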
Over_Representation<-Results[Results$raw_p_overrep<=0.05,]
Under_Representation<- Results[Results$raw_p_underrep<=0.05,]
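The thresholding step can be tried on a toy table without running an enrichment. In the sketch below the column names follow the ones used in this post, while the GO IDs, the p-values, and the `Toy_*` and `Alpha` names are placeholders made up for illustration.

```r
# Toy results table with made-up values
Toy_Results <- data.frame(
  node_id        = c("GO:0000001", "GO:0000002", "GO:0000003"),
  raw_p_underrep = c(0.99, 0.03, 0.60),
  raw_p_overrep  = c(0.01, 0.98, 0.45),
  stringsAsFactors = FALSE
)

Alpha <- 0.05  # significance cutoff used in this post

# Keep categories that pass the cutoff in each direction
Toy_Over  <- Toy_Results[Toy_Results$raw_p_overrep  <= Alpha, ]
Toy_Under <- Toy_Results[Toy_Results$raw_p_underrep <= Alpha, ]

# Sort so that the most significant categories come first
Toy_Over  <- Toy_Over[order(Toy_Over$raw_p_overrep), ]
Toy_Under <- Toy_Under[order(Toy_Under$raw_p_underrep), ]
```

Keeping the cutoff in one variable makes it easy to rerun the filter with a different threshold.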
Gene symbols that are not found in the annotation database are excluded from the analysis. From the genes that were actually used, we keep those marked as candidate genes.
Genes<- Go_Enrich_Out$genes
Candidate_Gene <-Genes[Genes$Is_candidate==1,]
Get all GO categories associated with the candidate genes, then merge them with the enrichment results so that each enriched category lists its candidate genes:
Gene_all_GO<-get_anno_categories(Candidate_Gene$Gene_Ids)
Out_Over_Representation<-merge(Over_Representation,Gene_all_GO, by.x = "node_id",by.y = "go_id", all.x = TRUE, all.y = FALSE)
Out_Under_Representation<-merge(Under_Representation,Gene_all_GO, by.x = "node_id",by.y = "go_id", all.x = TRUE, all.y = FALSE)
Results_Over_Representation<- Out_Over_Representation %>% group_by(node_id) %>% mutate(gene= paste(gene, collapse=",")) %>% unique %>% na.omit
Results_Under_Representation<- Out_Under_Representation %>% group_by(node_id) %>% mutate(gene= paste(gene, collapse=",")) %>% unique %>% na.omit
Results_Over_Representation[ ,c(9,10)] <- NULL
Results_Under_Representation[ ,c(9,10)] <- NULL
write.table(Results_Over_Representation,"Results_Over_Representation",quote = FALSE, row.names = F,sep = "\t")
write.table(Results_Under_Representation,"Results_Under_Representation",quote = FALSE, row.names = F,sep = "\t")
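The merge-and-collapse steps above can be sketched on toy data using base R only. This is an alternative formulation with `merge` and `aggregate`, not the dplyr pipeline used in this post; the gene names, GO IDs, and p-values are invented for illustration.

```r
# Toy enrichment results (made-up values)
Toy_Over <- data.frame(
  node_id       = c("GO:0000001", "GO:0000002"),
  raw_p_overrep = c(0.01, 0.02),
  stringsAsFactors = FALSE
)

# Toy gene-to-category annotations with gene / go_id columns
Toy_Anno <- data.frame(
  gene  = c("GENE_A", "GENE_B", "GENE_C"),
  go_id = c("GO:0000001", "GO:0000001", "GO:0000003"),
  stringsAsFactors = FALSE
)

# Left join: keep every enriched category, attach annotated genes if any
Toy_Merged <- merge(Toy_Over, Toy_Anno,
                    by.x = "node_id", by.y = "go_id", all.x = TRUE)

# Drop categories without any matching candidate gene
Toy_Merged <- Toy_Merged[!is.na(Toy_Merged$gene), ]

# Collapse to one row per category with a comma-separated gene list
Toy_Collapsed <- aggregate(gene ~ node_id + raw_p_overrep,
                           data = Toy_Merged, FUN = paste, collapse = ",")
```

In this toy case GO:0000002 has no matching gene and is dropped, while GO:0000001 ends up as a single row listing both of its genes.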