The aim of using pathway enrichment analysis is to consider the genetic interactions of DEGs and use the advantages of the size and directions of enriched pathways(KEGG,REACTOME). This will help to explain the downstream and pathway level effects.
While the gene set enrichment analysis was used to see whether there is any chance that the group of genes is related to the phenotype by over representation analysis.
We used a network based approach to create clusters of subnetworks from the provided DEGs based on their interactions which are described on stringDBs.
The created subnetworks were enriched in KEGG3 and REACTOME.
The similarity ranking pathways from the chosen databases were calculated by the network-based distance similarity (measured using the XD-score) of the DEGs.
This network similarity xd score measures the distance of genes set, with interactions, to the pathways.
The significance of the gene set overlap between the pathway and DEGs is measured by a Fisher-test.
We calculate the xd score of different tissue-types for each pathway.
The resulting pathways were also compared with experimentally verified databases such as innateDB5 and NCBI DBs.
The workflow of pathway and gene set enrichment analysis
def main():
import sys
#the filename below with the name of tab-delimited input network file
#(in human ENSEMBL gene format)
input_file = ""
opener = urllib2.build_opener(MultipartPostHandler)
params = { "upFile" : "yes",
"file" : open("C:/xampp/htdocs/enrichnet/"+input_file,
"rb") }
response = opener.open('http://lcsb-enrichnet-server.embl.de/enrichnet/
fileupload.php', params).read()
# the parameters to run the network-based enrichment analysis
#(replace them with your genes and pathways of interest)
genes = ''#put the gene_ids
idtype = '' #identify the gene ids type
pathdb = '' #specify the reference database
graph = 'own'
params = { "genes" : genes, "idtype" : idtype, "pathdb" : pathdb, "graph"
: graph }
url = 'http://elephant.embl.de/enrichnet/index.php'
params = urllib.urlencode({ 'genes':genes, 'idtype':idtype, 'pathdb':pathdb,
'graph':graph})
response = urllib2.urlopen(url+'?'+params).read()
print "The results will appear below:"
print response
| Annotation (pathway/process) | XD-score | Fisher q-value | Pathway size | Overlap size |
|---|---|---|---|---|
| hsa00920:Sulfur metabolism | 0.5116 | 1 | 12 | 1 |
| hsa00604:Glycosphingolipid biosynthesis - ganglio series | 0.4045 | 1 | 14 | 1 |
| hsa04910:Insulin signaling pathway | 0.0974 | 1 | 134 | 5 |
| hsa04010:MAPK signaling pathway | 0.13810 | 0.98 | 263 | 11 |
| Reactome:Nuclear receptor transcription pathways | 0.28216 | 1 | 138 | 2 |
#Function to link and install the used packages between R and cytoscape
installation_responses <- c()
cyto_app_toinstall <- c("stringapp","genemania")
cytoscape_version <- unlist(strsplit( cytoscapeVersionInfo()['cytoscapeVersion']
,split = "\\."))
if(length(cytoscape_version) == 3 && as.numeric(cytoscape_version[1]>=3)
&& as.numeric(cytoscape_version[2]>=7)){
for(i in 1:length(cyto_app_toinstall)){
#check to see if the app is installed.
#Only install it if it hasn't been installed
if(!grep(commandsGET(paste("apps status app=\"", cyto_app_toinstall[i],"\"",
sep="")),
pattern = "status: Installed")){
installation_response <-commandsGET(paste("apps install app=\"",
cyto_app_toinstall[i],"\"",
sep=""))
installation_responses <- c(installation_responses,installation_response)
} else{
installation_responses <- c(installation_responses,"already installed")
}
}
installation_summary <- data.frame(name = cyto_app_toinstall,
status = installation_responses)
knitr::kable(list(installation_summary),
booktabs = TRUE, caption =
'A Summary of automated app installation'
)
}
#Retrieve networks for genes of interest
string_interaction<- paste('string protein query taxonID=9606 limit=150
cutoff=0.9 query="',
paste(overlaped_genes, collapse=","),'"',sep="")
commandsGET(string_interaction
if(file.exists(initial_string_network_png_file_name)){
#cytoscape hangs waiting for user response if file already exists.
#Remove it first
response <- file.remove(initial_string_network_png_file_name)
}
response <- exportImage(initial_string_network, type = "png")
Insulin signaling genes (blue colours) associations
NR4A3 associations with nuclear receptor family
Associations between insulin signaling genes from DEGs (blue colour),insulin signaling genes of PD map (red colour), the intrmediae candidates(white colours)and NR4A3 molecule(black colour)
#plot the results
Tissue specificity %>%
mutate(name = fct_reorder(name, desc(Tissue_xd_score))) %>%
ggplot( aes(x=name, y=Tissue_xd_score)) +
geom_bar(stat="identity", fill="RED", alpha=.6, width=.4) +
coord_flip() +
xlab("") +
theme_bw()
Tissue specificity of insulin signaling
##read the input genes
genes<-system.file("extdata", "inputGenes.txt", package = "TissueEnrich")
inputGenes<-scan(genes,character())
## Specify the geneIdType parameter in the input GeneSet object
gs<-GeneSet(geneIds=inputGenes,organism="Homo Sapiens",
geneIdType=SymbolIdentifier())
## The output is a list object containing the enrichment results
output<-teEnrichment(inputGenes = gs)
### Exploring tissue-specific gene enrichment results
seEnrichmentOutput<-output[[1]]
enrichmentOutput<-setNames(data.frame(assay(seEnrichmentOutput),
row.names = rowData(seEnrichmentOutput)[,1]), colData(seEnrichmentOutput)[,1])
enrichmentOutput$Tissue<-row.names(enrichmentOutput)
head(enrichmentOutput)
### Plot tissue-specific gene enrichment results
ggplot(enrichmentOutput,aes(x=reorder(Tissue,-fold.change),y=fold.change,
label = Tissue.Specific.Genes,fill = Tissue))+
geom_bar(stat = 'identity')+
labs(x='', y = 'Fold change')+
theme_bw()+
theme(legend.position="none")+
theme(plot.title = element_text(hjust = 0.5,size = 20),
axis.title = element_text(size=15))+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
panel.grid.major= element_blank(),panel.grid.minor = element_blank())
### Retrieval of input tissue-specific genes
seGroupInf<-output[[3]][[""]]#specify the tissue you want
groupInf<-data.frame(assay(seGroupInf))
print(head(groupInf))
# Retrieve data for a specific collection, such as the hallmark gene sets.
h_gene_sets = msigdbr(species = "human", category = "H")
# Use the gene sets data frame for enrichment with genes as gene symbols
msigdbr_t2g = h_gene_sets %>% dplyr::distinct(gs_name, gene_symbol) %>%
as.data.frame()
result <- enricher(gene = gene_symbols_vector, TERM2GENE = msigdbr_t2g)
| Name | P-value | Adjusted p-value | Odds Ratio | Combined score |
|---|---|---|---|---|
| Epithelial Mesenchymal Transition | 1.044E-16 | 5.011E-15 | 7.65 | 281.52 |
| E2F Targets | 1.046E-08 | 2.51E-07 | 4.86 | 89.32 |
| G2-M Checkpoint | 6.165E-05 | 0.0009864 | 3.38 | 32.74 |
| Coagulation | 0.0002328 | 0.001863 | 3.68 | 30.79 |
| Mitotic Spindle | 0.0002007 | 0.001863 | 3.16 | 26.9 |
| Apical Junction | 0.000212 | 0.001863 | 3.14 | 26.58 |
| Apoptosis | 0.002995 | 0.02054 | 2.82 | 16.41 |
| Enriched area | p-val (adj) |
|---|---|
| Wnt signaling | 0.000511479918396 |
| Neuroinflammation | 0.009952404929447 |
| Postsynaptic terminal of dopaminergic neuron | 0.022056074874101 |
The ovrlapped genes in Wnt-pi3k/AKT signaling in disease map
The enrichment analysis shows that ACACB, INPP5D, PYGL, PPP1R3B and TRIP10 are found in the insulin signalling pathway (KEGG). INPP5D and PYGL are more deregulated than ACACB, PPP1R3B and TRIP10 in response to PINK1 mutation. INPP5D interacts with different molecules in the pathway which may affect other genes (Figure 2). The enrichment analysis reveals that INPP5D has a direct interaction with PTEN which is an essential molecule in insulin signalling in Parkinson’s disease map. INPP5D has indirect interactions with IRS1 and IGF1 through PDPK1 and DOK1 respectively. A research study reported the role of INPP5D to regulate PTEN and IRS1. PYGL is found to interact directly with the overlapped genes and indirectly with IRS1 in Parkinson’s disease map. PPP1R3B regulates the expression of IRS1 and IGF1. It interacts indirectly with IGF1 through IGFBP2.
NR4A3 and AR are enriched in the Nuclear receptor pathway(REACTOME). NR4A3 modulates different nuclear mechanisms and affects insulin expression. NR4A3 interacts with nuclear process group NR4A1 NR4A2 which are fully described in the PD map. NR4A3 was found to have common associations with the insulin signalling pathway.
In sulphur metabolism, Cytosolic sulfotransferase 1A3 (SULT1A1) is down-regulated and induced by dopamine and protects neuronal cells from dopamine toxicityPMID: 24136195. SULT1A1 interacts with the sulfotransferase family which may act as a cofactor for vitamin B production. It was reported that deregulation of SULT1A1 is associated with type two diabetes PMC1828071.
In Glycosphingolipid biosynthesis - ganglio series pathway, ST6GALNAC3 associated with the medial and lateral substantia nigra in Parkinson’s disease PMID: 17211632. It is also reported as one of the circulating protein signatures and causal candidates for type 2 diabetes PMID: 32385057. ST6GALNAC3 was also reported as a common interplayer in different subgroups through the identification of type 2 diabetes [PMID: 26511511 (https://pubmed.ncbi.nlm.nih.gov/26511511/).
In MAPK pathways, MECOM, FGF1, FAS, GADD45B, RRAS and TGFBR2 were found to have a significant effect according to the dataset. FGF1 was reported as new key gene therapy for type 2 diabetes. A single intracerebroventricular injection of FGF1 can induce long-lasting remission of type 2 diabetes mellitus phenotype PMID: 28664920. FAS is associated with apoptosis in type 2 diabetes mellitus PMID: 23565468, https://doi.org/10.4103). Gadd45b is required in part for the metabolic benefits of CAR activation.-which has a role in type 2 diabetes PMID: 33643822.
The tissue specificity annotations are retrieved from all non-overlapping gene pairs, which are connected through paths of interactions in a molecular network. Specifically, pairs of genes that have been found co-expressed in a cell are considered to have the potential to interact physically in the same tissue, whereas all other gene pairs are assumed to lack this potential. In insulin signalling, the Nuclear signalling pathway and Glycosphingolipid biosynthesis ganglio-series, skeletal muscle, liver, ciliary ganglion, cervical ganglion and trigeminal ganglion were the most common related tissues, suggesting that the interactions of overlapped genes with their coexpressed molecules have a potential to interact physically in those tissues.
The significant pathways of PINK1 and D2M are :
Insulin signaling pathway
Nuclear receptor pathway
Sulphur metabolism (SULT1A1)
Glycosphingolipid biosynthesis - ganglio series (ST6GALNAC3)
MAPK pathways (MECOM, FGF1, FAS, GADD45B, RRAS, TGFBR2)