Motivation

Aim

Workflow

Functional pathway enrichment analysis

  • We used a network based approach to create clusters of subnetworks from the provided DEGs based on their interactions which are described on stringDBs.

  • The created subnetworks were enriched in KEGG3 and REACTOME.

  • The similarity ranking pathways from the chosen databases were calculated by the network-based distance similarity (measured using the XD-score) of the DEGs.

  • This network similarity xd score measures the distance of genes set, with interactions, to the pathways.

  • The significance of the gene set overlap between the pathway and DEGs is measured by a Fisher-test.

  • We calculate the xd score of different tissue-types for each pathway.

  • The resulting pathways were also compared with experimentally verified databases such as innateDB5 and NCBI DBs.

Gene set enrichment analysis

  • The entire DEGs (without interactions) were enriched in parkinson’s disease map and MSigDB hallmark 2020 to identify the sets in which the DEGs are overrepresented or underrepresented in pathways. We specified the gene set enrichment into the disease map and the hallmarks because they reduce the noise and redundancy and provide a better understanding for the biological interpretations.

The workflow of pathway and gene set enrichment analysis

Results

1. Enrichment analysis using Enrichnet APIs:

def main():
    import sys 
#the filename below with the name of tab-delimited input network file
#(in human ENSEMBL gene format)
    input_file = "" 
    
    opener = urllib2.build_opener(MultipartPostHandler)
    
    params = { "upFile" : "yes",
                         "file" : open("C:/xampp/htdocs/enrichnet/"+input_file,
                                       "rb") }
  
    response = opener.open('http://lcsb-enrichnet-server.embl.de/enrichnet/
                           fileupload.php', params).read()
    
    
# the parameters to run the network-based enrichment analysis
#(replace them with your genes and pathways of interest)
    genes = ''#put the gene_ids
    idtype = '' #identify the gene ids type
    pathdb = '' #specify the reference database
    graph = 'own'
    params = { "genes" : genes, "idtype" : idtype, "pathdb" : pathdb, "graph" 
      : graph }
    
    url = 'http://elephant.embl.de/enrichnet/index.php'
    params = urllib.urlencode({ 'genes':genes, 'idtype':idtype, 'pathdb':pathdb,
      'graph':graph})
    
    response = urllib2.urlopen(url+'?'+params).read()
    
    print "The results will appear below:"
    print response

1.1. Siginficant pathways of PINK1_T2DM

Annotation (pathway/process) XD-score Fisher q-value Pathway size Overlap size
hsa00920:Sulfur metabolism 0.5116 1 12 1
hsa00604:Glycosphingolipid biosynthesis - ganglio series 0.4045 1 14 1
hsa04910:Insulin signaling pathway 0.0974 1 134 5
hsa04010:MAPK signaling pathway 0.13810 0.98 263 11
Reactome:Nuclear receptor transcription pathways 0.28216 1 138 2

1.2. Find the associations between the overlaped gene(s) and PD map

  • I used the cytoscape versions of two different tools (STRING and Genemania) to facilitate the network data retrieval and arrange the nodes with conceptual layouts.
  • STRING is molecular interaction database was used to find the associations between the overlapped gene(s)and PD map
  • Genemania contain a wealth of human genomics and proteomics database from NCBI database.
#Function to link and install the used packages between R and cytoscape
installation_responses <- c()
cyto_app_toinstall <- c("stringapp","genemania")

cytoscape_version <- unlist(strsplit( cytoscapeVersionInfo()['cytoscapeVersion']
                                      ,split = "\\."))
if(length(cytoscape_version) == 3 && as.numeric(cytoscape_version[1]>=3) 
   && as.numeric(cytoscape_version[2]>=7)){
  for(i in 1:length(cyto_app_toinstall)){
    #check to see if the app is installed.
    #Only install it if it hasn't been installed
    if(!grep(commandsGET(paste("apps status app=\"", cyto_app_toinstall[i],"\"",
                               sep="")), 
             pattern = "status: Installed")){
      installation_response <-commandsGET(paste("apps install app=\"", 
                                                cyto_app_toinstall[i],"\"", 
                                                sep=""))
      installation_responses <- c(installation_responses,installation_response)
    } else{
      installation_responses <- c(installation_responses,"already installed")
    }
  }
  installation_summary <- data.frame(name = cyto_app_toinstall, 
                                     status = installation_responses)
  
  knitr::kable(list(installation_summary),
               booktabs = TRUE, caption = 
                 'A Summary of automated app installation'
  )
}

#Retrieve networks for genes of interest
string_interaction<- paste('string protein query taxonID=9606 limit=150 
                           cutoff=0.9 query="',
                           paste(overlaped_genes, collapse=","),'"',sep="")
commandsGET(string_interaction

if(file.exists(initial_string_network_png_file_name)){
  #cytoscape hangs waiting for user response if file already exists. 
  #Remove it first
  response <- file.remove(initial_string_network_png_file_name)
} 

response <- exportImage(initial_string_network, type = "png")

Insulin signaling genes (blue colours) associations

NR4A3 associations with nuclear receptor family

Associations between insulin signaling genes from DEGs (blue colour),insulin signaling genes of PD map (red colour), the intrmediae candidates(white colours)and NR4A3 molecule(black colour)

1.3. Tissue specificity of insulin signaling pathway

1.3.1 Tissue xd score from KEGG annotation.

#plot the results
Tissue specificity %>%
  mutate(name = fct_reorder(name, desc(Tissue_xd_score))) %>%
  ggplot( aes(x=name, y=Tissue_xd_score)) +
  geom_bar(stat="identity", fill="RED", alpha=.6, width=.4) +
  coord_flip() +
  xlab("") +
  theme_bw()

Tissue specificity of insulin signaling

1.3.2. Tissue-specific gene enrichment using TissueEnrich package (GTEx and HPA)

##read the input genes
genes<-system.file("extdata", "inputGenes.txt", package = "TissueEnrich")
inputGenes<-scan(genes,character())

## Specify the geneIdType parameter in the input GeneSet object
gs<-GeneSet(geneIds=inputGenes,organism="Homo Sapiens",
            geneIdType=SymbolIdentifier())

## The output is a list object containing the enrichment results
output<-teEnrichment(inputGenes = gs)

### Exploring tissue-specific gene enrichment results
seEnrichmentOutput<-output[[1]]
enrichmentOutput<-setNames(data.frame(assay(seEnrichmentOutput),
                                      row.names = rowData(seEnrichmentOutput)[,1]), colData(seEnrichmentOutput)[,1])
enrichmentOutput$Tissue<-row.names(enrichmentOutput)
head(enrichmentOutput)

### Plot tissue-specific gene enrichment results
ggplot(enrichmentOutput,aes(x=reorder(Tissue,-fold.change),y=fold.change,
                            label = Tissue.Specific.Genes,fill = Tissue))+
  geom_bar(stat = 'identity')+
  labs(x='', y = 'Fold change')+
  theme_bw()+
  theme(legend.position="none")+
  theme(plot.title = element_text(hjust = 0.5,size = 20),
        axis.title = element_text(size=15))+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        panel.grid.major= element_blank(),panel.grid.minor = element_blank())

### Retrieval of input tissue-specific genes
seGroupInf<-output[[3]][[""]]#specify the tissue you want
groupInf<-data.frame(assay(seGroupInf))
print(head(groupInf))

2. Enrichment in specific MSigDB human hallmark

# Retrieve data for a specific collection, such as the hallmark gene sets.
h_gene_sets = msigdbr(species = "human", category = "H")

# Use the gene sets data frame for enrichment with genes as gene symbols
msigdbr_t2g = h_gene_sets %>% dplyr::distinct(gs_name, gene_symbol) %>% 
  as.data.frame()
result <- enricher(gene = gene_symbols_vector, TERM2GENE = msigdbr_t2g)
Name P-value Adjusted p-value Odds Ratio Combined score
Epithelial Mesenchymal Transition 1.044E-16 5.011E-15 7.65 281.52
E2F Targets 1.046E-08 2.51E-07 4.86 89.32
G2-M Checkpoint 6.165E-05 0.0009864 3.38 32.74
Coagulation 0.0002328 0.001863 3.68 30.79
Mitotic Spindle 0.0002007 0.001863 3.16 26.9
Apical Junction 0.000212 0.001863 3.14 26.58
Apoptosis 0.002995 0.02054 2.82 16.41

3. Enrichment in Disease map

Enriched area p-val (adj)
Wnt signaling 0.000511479918396
Neuroinflammation 0.009952404929447
Postsynaptic terminal of dopaminergic neuron 0.022056074874101

The ovrlapped genes in Wnt-pi3k/AKT signaling in disease map

Discussion

Summary

The significant pathways of PINK1 and D2M are :