About

GOE (Gene Ontology Enrichment) analysis is a method for interpreting a set of genes using the Gene Ontology classification system, in which genes are assigned to predefined categories according to their functional characteristics. This demo runs a gene ontology enrichment analysis with the R package GOfuncR, which is based on the ontology enrichment software FUNC.

Required R packages

library(GOfuncR)
library(dplyr)

Input genes

The input is loaded from a tab-delimited text file with gene symbols in the first column and, in the second column, 1 for candidate genes and 0 for background genes. If no background genes are defined, all remaining genes from the internal dataset are used as background.

##   Gene_Ids Is_candidate
## 1  TSPOAP1            1
## 2    UBE4A            1
## 3     RIC1            1
## 4    FBXL3            1
## 5   FBXO11            1
## 6 KIAA1109            1
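
As a minimal sketch, a file in this layout could be produced from a vector of gene symbols as shown here. The symbols are the six from the listing above. The temporary output path and the variable names are assumptions for illustration only; the demo itself reads Input_Genes.txt from the working directory.

```r
# Candidate gene symbols taken from the example listing above
candidate_symbols <- c("TSPOAP1", "UBE4A", "RIC1", "FBXL3", "FBXO11", "KIAA1109")

# First column: gene symbol; second column: 1 = candidate, 0 = background
input_table <- data.frame(
  Gene_Ids = candidate_symbols,
  Is_candidate = 1L,
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row (temporary path, illustration only)
input_path <- file.path(tempdir(), "Input_Genes.txt")
write.table(input_table, input_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the layout
check <- read.delim(input_path, header = TRUE, stringsAsFactors = FALSE)
print(check)
```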

Load the input gene list into R

Input_Genes <- read.delim2("Input_Genes.txt", header = TRUE, stringsAsFactors = FALSE)

Test GO categories for enrichment

Run the hypergeometric test, which is the default test of go_enrich.

Go_Enrich_Out <- go_enrich(Input_Genes)
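
To illustrate what a hypergeometric enrichment test computes for a single category, here is a worked sketch in base R. All counts are hypothetical, and this is not the go_enrich implementation, only the underlying calculation for one category.

```r
# Hypothetical counts for one GO category
N <- 20000  # all genes (candidate + background)
K <- 200    # genes annotated to the category
n <- 100    # candidate genes
k <- 5      # candidate genes annotated to the category

# Over-representation: probability of k or more annotated candidates
p_overrep <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Under-representation: probability of k or fewer annotated candidates
p_underrep <- phyper(k, K, N - K, n, lower.tail = TRUE)

print(c(overrep = p_overrep, underrep = p_underrep))
```

With these counts only one annotated candidate is expected by chance (100 * 200 / 20000), so observing five points towards over-representation.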

Get the enrichment analysis results

Results <- Go_Enrich_Out$results

Categorize the enrichment results by p-value

Enrichment results can be classified as over-representation or under-representation based on the raw p-value. A category with raw_p_overrep <= 0.05 is considered over-represented, and a category with raw_p_underrep <= 0.05 is considered under-represented.

Over_Representation <- Results[Results$raw_p_overrep <= 0.05, ]
Under_Representation <- Results[Results$raw_p_underrep <= 0.05, ]
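
Raw p-values are not corrected for the number of categories tested. As a hedged sketch, a stricter selection could filter on a family-wise error rate column instead. The toy table below is made up, and the column name FWER_overrep is an assumption about the go_enrich results table; check names(Results) before relying on it.

```r
# Made-up results table; column names are assumed to mirror the go_enrich output
toy_results <- data.frame(
  node_id = c("GO:0000001", "GO:0000002", "GO:0000003"),
  raw_p_overrep = c(0.001, 0.040, 0.600),
  FWER_overrep = c(0.020, 0.700, 1.000),
  stringsAsFactors = FALSE
)

# Lenient selection on the raw p-value
raw_hits <- toy_results[toy_results$raw_p_overrep <= 0.05, ]

# Stricter selection on the family-wise error rate
fwer_hits <- toy_results[toy_results$FWER_overrep <= 0.05, ]

print(raw_hits$node_id)
print(fwer_hits$node_id)
```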

Get the genes used in the enrichment analysis

Gene symbols that are not found in the database are excluded from the analysis. Here we keep only the genes that were used as candidate genes.

Genes <- Go_Enrich_Out$genes
Candidate_Gene <- Genes[Genes$Is_candidate == 1, ]
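
To see which input symbols were excluded, the input list can be compared with the genes that were kept. This is a minimal sketch on hypothetical toy data frames (toy_input and toy_used stand in for the input table and the genes table of the result); "NOT_A_GENE" is an invented symbol.

```r
# Hypothetical input table with one symbol that no database would contain
toy_input <- data.frame(
  Gene_Ids = c("UBE4A", "RIC1", "NOT_A_GENE"),
  Is_candidate = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Stand-in for the genes that were actually used in the analysis
toy_used <- toy_input[toy_input$Gene_Ids != "NOT_A_GENE", ]

# Symbols present in the input but absent from the analysed genes
dropped <- setdiff(toy_input$Gene_Ids, toy_used$Gene_Ids)
print(dropped)
```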

Get the GO categories of the candidate genes

Retrieve all GO categories associated with the genes in the candidate gene list.

Gene_all_GO <- get_anno_categories(Candidate_Gene$Gene_Ids)

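Add the gene symbols

Merge each enrichment table with the gene annotations on the GO identifier, keeping every enriched category (all.x = TRUE) even when no candidate gene is annotated to it.

Out_Over_Representation <- merge(Over_Representation, Gene_all_GO, by.x = "node_id", by.y = "go_id", all.x = TRUE, all.y = FALSE)

Out_Under_Representation <- merge(Under_Representation, Gene_all_GO, by.x = "node_id", by.y = "go_id", all.x = TRUE, all.y = FALSE)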

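Collapse the gene names per GO category

Combine all candidate genes annotated to the same GO category into one comma-separated string, then drop the redundant annotation columns (columns 9 and 10 of the merged table). Note that na.omit removes every row with a missing value, which includes categories to which no candidate gene is annotated.

Results_Over_Representation <- Out_Over_Representation %>% group_by(node_id) %>%
  mutate(gene = paste(gene, collapse = ",")) %>% unique() %>% na.omit()
Results_Under_Representation <- Out_Under_Representation %>% group_by(node_id) %>%
  mutate(gene = paste(gene, collapse = ",")) %>% unique() %>% na.omit()
Results_Over_Representation[, c(9, 10)] <- NULL
Results_Under_Representation[, c(9, 10)] <- NULL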
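
The merge and collapse steps can be illustrated on a toy example. This sketch uses made-up tables and swaps in dplyr's summarise for the mutate plus unique combination used above; it is an alternative way to obtain one row per category, not the method of this demo. The GO identifiers, category names, and p-values are hypothetical.

```r
library(dplyr)

# Hypothetical enrichment results for two GO categories
enriched <- data.frame(
  node_id = c("GO:0000001", "GO:0000002"),
  node_name = c("category A", "category B"),
  raw_p_overrep = c(0.01, 0.03),
  stringsAsFactors = FALSE
)

# Hypothetical gene annotations (gene symbol and GO identifier)
anno <- data.frame(
  gene = c("UBE4A", "RIC1", "FBXL3"),
  go_id = c("GO:0000001", "GO:0000001", "GO:0000003"),
  stringsAsFactors = FALSE
)

# Left join: every enriched category is kept, unmatched ones get gene = NA
merged <- merge(enriched, anno, by.x = "node_id", by.y = "go_id", all.x = TRUE)

# One row per category, genes joined by commas; categories without genes dropped
collapsed <- merged %>%
  filter(!is.na(gene)) %>%
  group_by(node_id, node_name, raw_p_overrep) %>%
  summarise(gene = paste(gene, collapse = ","), .groups = "drop")

print(collapsed)
```

Here category B has no annotated gene and is filtered out explicitly, which makes the effect of the na.omit step above visible.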

Export the results

Write both tables as tab-delimited text files.

write.table(Results_Over_Representation, "Results_Over_Representation", quote = FALSE, row.names = FALSE, sep = "\t")
write.table(Results_Under_Representation, "Results_Under_Representation", quote = FALSE, row.names = FALSE, sep = "\t")