Motivation

The motivation is to investigate the functional associations between the differential expressed genes (DEGs) and the disease pathways, thus we get a better interpretation by applying both pathway and gene set enrichment analysis by enriching them in different data resources.

Aim

The aim of using pathway enrichment analysis is to consider the genetic interactions of DEGs and use the advantages of the size and directions of enriched pathways(KEGG,REACTOME). This will help to explain the downstream and pathway level effects.
While the gene set enrichment analysis was used to see whether there is any chance that the group of genes is related to the phenotype by over representation analysis.

Workflow

Functional pathway enrichment analysis

We used a network based approach to create clusters of subnetworks from the provided DEGs based on their interactions which are described on stringDBs.
The created subnetworks were enriched in KEGG3 and REACTOME.
The similarity ranking pathways from the chosen databases were calculated by the network-based distance similarity (measured using the XD-score) of the DEGs.
This network similarity xd score measures the distance of genes set, with interactions, to the pathways.
The significance of the gene set overlap between the pathway and DEGs is measured by a Fisher-test.
We calculate the xd score of different tissue-types for each pathway.
The resulting pathways were also compared with experimentally verified databases such as innateDB5 and NCBI DBs.

Gene set enrichment analysis

The entire DEGs (without interactions) were enriched in parkinson’s disease map and MSigDB hallmark 2020 to identify the sets in which the DEGs are overrepresented or underrepresented in pathways. We specified the gene set enrichment into the disease map and the hallmarks because they reduce the noise and redundancy and provide a better understanding for the biological interpretations.

The workflow of pathway and gene set enrichment analysis

Results

1. Enrichment analysis using Enrichnet APIs:

def main():
    import sys 
#the filename below with the name of tab-delimited input network file
#(in human ENSEMBL gene format)
    input_file = "" 
    
    opener = urllib2.build_opener(MultipartPostHandler)
    
    params = { "upFile" : "yes",
                         "file" : open("C:/xampp/htdocs/enrichnet/"+input_file,
                                       "rb") }
  
    response = opener.open('http://lcsb-enrichnet-server.embl.de/enrichnet/
                           fileupload.php', params).read()
    
    
# the parameters to run the network-based enrichment analysis
#(replace them with your genes and pathways of interest)
    genes = ''#put the gene_ids
    idtype = '' #identify the gene ids type
    pathdb = '' #specify the reference database
    graph = 'own'
    params = { "genes" : genes, "idtype" : idtype, "pathdb" : pathdb, "graph" 
      : graph }
    
    url = 'http://elephant.embl.de/enrichnet/index.php'
    params = urllib.urlencode({ 'genes':genes, 'idtype':idtype, 'pathdb':pathdb,
      'graph':graph})
    
    response = urllib2.urlopen(url+'?'+params).read()
    
    print "The results will appear below:"
    print response

1.1. Siginficant pathways of PINK1_T2DM

Annotation (pathway/process)	XD-score	Fisher q-value	Pathway size	Overlap size
hsa00920:Sulfur metabolism	0.5116	1	12	1
hsa00604:Glycosphingolipid biosynthesis - ganglio series	0.4045	1	14	1
hsa04910:Insulin signaling pathway	0.0974	1	134	5
hsa04010:MAPK signaling pathway	0.13810	0.98	263	11
Reactome:Nuclear receptor transcription pathways	0.28216	1	138	2

1.2. Find the associations between the overlaped gene(s) and PD map

I used the cytoscape versions of two different tools (STRING and Genemania) to facilitate the network data retrieval and arrange the nodes with conceptual layouts.
STRING is molecular interaction database was used to find the associations between the overlapped gene(s)and PD map
Genemania contain a wealth of human genomics and proteomics database from NCBI database.

#Function to link and install the used packages between R and cytoscape
installation_responses <- c()
cyto_app_toinstall <- c("stringapp","genemania")

cytoscape_version <- unlist(strsplit( cytoscapeVersionInfo()['cytoscapeVersion']
                                      ,split = "\\."))
if(length(cytoscape_version) == 3 && as.numeric(cytoscape_version[1]>=3) 
   && as.numeric(cytoscape_version[2]>=7)){
  for(i in 1:length(cyto_app_toinstall)){
    #check to see if the app is installed.
    #Only install it if it hasn't been installed
    if(!grep(commandsGET(paste("apps status app=\"", cyto_app_toinstall[i],"\"",
                               sep="")), 
             pattern = "status: Installed")){
      installation_response <-commandsGET(paste("apps install app=\"", 
                                                cyto_app_toinstall[i],"\"", 
                                                sep=""))
      installation_responses <- c(installation_responses,installation_response)
    } else{
      installation_responses <- c(installation_responses,"already installed")
    }
  }
  installation_summary <- data.frame(name = cyto_app_toinstall, 
                                     status = installation_responses)
  
  knitr::kable(list(installation_summary),
               booktabs = TRUE, caption = 
                 'A Summary of automated app installation'
  )
}

#Retrieve networks for genes of interest
string_interaction<- paste('string protein query taxonID=9606 limit=150 
                           cutoff=0.9 query="',
                           paste(overlaped_genes, collapse=","),'"',sep="")
commandsGET(string_interaction

if(file.exists(initial_string_network_png_file_name)){
  #cytoscape hangs waiting for user response if file already exists. 
  #Remove it first
  response <- file.remove(initial_string_network_png_file_name)
} 

response <- exportImage(initial_string_network, type = "png")

Insulin signaling genes (blue colours) associations

NR4A3 associations with nuclear receptor family

Associations between insulin signaling genes from DEGs (blue colour),insulin signaling genes of PD map (red colour), the intrmediae candidates(white colours)and NR4A3 molecule(black colour)

1.3. Tissue specificity of insulin signaling pathway

1.3.1 Tissue xd score from KEGG annotation.

#plot the results
Tissue specificity %>%
  mutate(name = fct_reorder(name, desc(Tissue_xd_score))) %>%
  ggplot( aes(x=name, y=Tissue_xd_score)) +
  geom_bar(stat="identity", fill="RED", alpha=.6, width=.4) +
  coord_flip() +
  xlab("") +
  theme_bw()

Tissue specificity of insulin signaling

1.3.2. Tissue-specific gene enrichment using TissueEnrich package (GTEx and HPA)

##read the input genes
genes<-system.file("extdata", "inputGenes.txt", package = "TissueEnrich")
inputGenes<-scan(genes,character())

## Specify the geneIdType parameter in the input GeneSet object
gs<-GeneSet(geneIds=inputGenes,organism="Homo Sapiens",
            geneIdType=SymbolIdentifier())

## The output is a list object containing the enrichment results
output<-teEnrichment(inputGenes = gs)

### Exploring tissue-specific gene enrichment results
seEnrichmentOutput<-output[[1]]
enrichmentOutput<-setNames(data.frame(assay(seEnrichmentOutput),
                                      row.names = rowData(seEnrichmentOutput)[,1]), colData(seEnrichmentOutput)[,1])
enrichmentOutput$Tissue<-row.names(enrichmentOutput)
head(enrichmentOutput)

### Plot tissue-specific gene enrichment results
ggplot(enrichmentOutput,aes(x=reorder(Tissue,-fold.change),y=fold.change,
                            label = Tissue.Specific.Genes,fill = Tissue))+
  geom_bar(stat = 'identity')+
  labs(x='', y = 'Fold change')+
  theme_bw()+
  theme(legend.position="none")+
  theme(plot.title = element_text(hjust = 0.5,size = 20),
        axis.title = element_text(size=15))+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        panel.grid.major= element_blank(),panel.grid.minor = element_blank())

### Retrieval of input tissue-specific genes
seGroupInf<-output[[3]][[""]]#specify the tissue you want
groupInf<-data.frame(assay(seGroupInf))
print(head(groupInf))

2. Enrichment in specific MSigDB human hallmark

# Retrieve data for a specific collection, such as the hallmark gene sets.
h_gene_sets = msigdbr(species = "human", category = "H")

# Use the gene sets data frame for enrichment with genes as gene symbols
msigdbr_t2g = h_gene_sets %>% dplyr::distinct(gs_name, gene_symbol) %>% 
  as.data.frame()
result <- enricher(gene = gene_symbols_vector, TERM2GENE = msigdbr_t2g)

Name	P-value	Adjusted p-value	Odds Ratio	Combined score
Epithelial Mesenchymal Transition	1.044E-16	5.011E-15	7.65	281.52
E2F Targets	1.046E-08	2.51E-07	4.86	89.32
G2-M Checkpoint	6.165E-05	0.0009864	3.38	32.74
Coagulation	0.0002328	0.001863	3.68	30.79
Mitotic Spindle	0.0002007	0.001863	3.16	26.9
Apical Junction	0.000212	0.001863	3.14	26.58
Apoptosis	0.002995	0.02054	2.82	16.41

3. Enrichment in Disease map

Enriched area	p-val (adj)
Wnt signaling	0.000511479918396
Neuroinflammation	0.009952404929447
Postsynaptic terminal of dopaminergic neuron	0.022056074874101

The ovrlapped genes in Wnt-pi3k/AKT signaling in disease map

Discussion

The enrichment analysis shows that ACACB, INPP5D, PYGL, PPP1R3B and TRIP10 are found in the insulin signalling pathway (KEGG). INPP5D and PYGL are more deregulated than ACACB, PPP1R3B and TRIP10 in response to PINK1 mutation. INPP5D interacts with different molecules in the pathway which may affect other genes (Figure 2). The enrichment analysis reveals that INPP5D has a direct interaction with PTEN which is an essential molecule in insulin signalling in Parkinson’s disease map. INPP5D has indirect interactions with IRS1 and IGF1 through PDPK1 and DOK1 respectively. A research study reported the role of INPP5D to regulate PTEN and IRS1. PYGL is found to interact directly with the overlapped genes and indirectly with IRS1 in Parkinson’s disease map. PPP1R3B regulates the expression of IRS1 and IGF1. It interacts indirectly with IGF1 through IGFBP2.
NR4A3 and AR are enriched in the Nuclear receptor pathway(REACTOME). NR4A3 modulates different nuclear mechanisms and affects insulin expression. NR4A3 interacts with nuclear process group NR4A1 NR4A2 which are fully described in the PD map. NR4A3 was found to have common associations with the insulin signalling pathway.
In sulphur metabolism, Cytosolic sulfotransferase 1A3 (SULT1A1) is down-regulated and induced by dopamine and protects neuronal cells from dopamine toxicityPMID: 24136195. SULT1A1 interacts with the sulfotransferase family which may act as a cofactor for vitamin B production. It was reported that deregulation of SULT1A1 is associated with type two diabetes PMC1828071.
In Glycosphingolipid biosynthesis - ganglio series pathway, ST6GALNAC3 associated with the medial and lateral substantia nigra in Parkinson’s disease PMID: 17211632. It is also reported as one of the circulating protein signatures and causal candidates for type 2 diabetes PMID: 32385057. ST6GALNAC3 was also reported as a common interplayer in different subgroups through the identification of type 2 diabetes [PMID: 26511511 (https://pubmed.ncbi.nlm.nih.gov/26511511/).
In MAPK pathways, MECOM, FGF1, FAS, GADD45B, RRAS and TGFBR2 were found to have a significant effect according to the dataset. FGF1 was reported as new key gene therapy for type 2 diabetes. A single intracerebroventricular injection of FGF1 can induce long-lasting remission of type 2 diabetes mellitus phenotype PMID: 28664920. FAS is associated with apoptosis in type 2 diabetes mellitus PMID: 23565468, https://doi.org/10.4103). Gadd45b is required in part for the metabolic benefits of CAR activation.-which has a role in type 2 diabetes PMID: 33643822.
The tissue specificity annotations are retrieved from all non-overlapping gene pairs, which are connected through paths of interactions in a molecular network. Specifically, pairs of genes that have been found co-expressed in a cell are considered to have the potential to interact physically in the same tissue, whereas all other gene pairs are assumed to lack this potential. In insulin signalling, the Nuclear signalling pathway and Glycosphingolipid biosynthesis ganglio-series, skeletal muscle, liver, ciliary ganglion, cervical ganglion and trigeminal ganglion were the most common related tissues, suggesting that the interactions of overlapped genes with their coexpressed molecules have a potential to interact physically in those tissues.

Summary

The significant pathways of PINK1 and D2M are :

Insulin signaling pathway
Nuclear receptor pathway
Sulphur metabolism (SULT1A1)
Glycosphingolipid biosynthesis - ganglio series (ST6GALNAC3)
MAPK pathways (MECOM, FGF1, FAS, GADD45B, RRAS, TGFBR2)

Enrichment analysis for PINK1_T2DM

Ahmed Hemedan

6/8/2021