Motivation

*The motivation is to investigate the functional associations between the differential expressed genes (DEGs) and the disease pathways, thus we get a better interpretation by applying both pathway and gene set enrichment analysis by enriching them in different data resources.

Aim

Workflow

Functional pathway enrichment analysis

  • We used a network based approach to create clusters of subnetworks from the provided DEGs based on their interactions which are described on stringDBs.

  • The created subnetworks were enriched in KEGG3 and REACTOME.

  • The similarity ranking pathways from the chosen databases were calculated by the network-based distance similarity (measured using the XD-score) of the DEGs.

  • This network similarity xd score measures the distance of genes set, with interactions, to the pathways.

  • The significance of the gene set overlap between the pathway and DEGs is measured by a Fisher-test.

  • We calculate the xd score of different tissue-types for each pathway.

  • The resulting pathways were also compared with experimentally verified databases such as innateDB5 and NCBI DBs.

Gene set enrichment analysis

  • The entire DEGs (without interactions) were enriched in parkinson’s disease map and MSigDB hallmark 2020 to identify the sets in which the DEGs are overrepresented or underrepresented in pathways. We specified the gene set enrichment into the disease map and the hallmarks because they reduce the noise and redundancy and provide a better understanding for the biological interpretations.

The workflow of pathway and gene set enrichment analysis

Results

1. Enrichment analysis using Enrichnet APIs:

def main():
    import sys 
#the filename below with the name of tab-delimited input network file
#(in human ENSEMBL gene format)
    input_file = "" 
    
    opener = urllib2.build_opener(MultipartPostHandler)
    
    params = { "upFile" : "yes",
                         "file" : open("C:/xampp/htdocs/enrichnet/"+input_file,
                                       "rb") }
  
    response = opener.open('http://lcsb-enrichnet-server.embl.de/enrichnet/
                           fileupload.php', params).read()
    
    
# the parameters to run the network-based enrichment analysis
#(replace them with your genes and pathways of interest)
    genes = ''#put the gene_ids
    idtype = '' #identify the gene ids type
    pathdb = '' #specify the reference database
    graph = 'own'
    params = { "genes" : genes, "idtype" : idtype, "pathdb" : pathdb, "graph" 
      : graph }
    
    url = 'http://elephant.embl.de/enrichnet/index.php'
    params = urllib.urlencode({ 'genes':genes, 'idtype':idtype, 'pathdb':pathdb,
      'graph':graph})
    
    response = urllib2.urlopen(url+'?'+params).read()
    
    print "The results will appear below:"
    print response

1.1. Siginficant pathways of GBA N307S

Annotation (pathway/process) XD-score Fisher q-value Pathway size Overlap size
hsa00603:Glycosphingolipid biosynthesis - globo series 2.83 0.027 14 5
hsa00604:Glycosphingolipid biosynthesis - ganglio series 1.55 1 14 3
hsa04950:Maturity onset diabetes of the young 0.37 1 24 2

1.2. Find the associations between the overlaped gene(s) and PD map

  • I used the cytoscape versions of two different tools (STRING and Genemania) to facilitate the network data retrieval and arrange the nodes with conceptual layouts.
  • STRING is molecular interaction database was used to find the associations between the overlapped gene(s)and PD map
  • Genemania contain a wealth of human genomics and proteomics database from NCBI database.
#Function to link and install the used packages between R and cytoscape
installation_responses <- c()
cyto_app_toinstall <- c("stringapp","genemania")

cytoscape_version <- unlist(strsplit( cytoscapeVersionInfo()['cytoscapeVersion']
                                      ,split = "\\."))
if(length(cytoscape_version) == 3 && as.numeric(cytoscape_version[1]>=3) 
   && as.numeric(cytoscape_version[2]>=7)){
  for(i in 1:length(cyto_app_toinstall)){
    #check to see if the app is installed.
    #Only install it if it hasn't been installed
    if(!grep(commandsGET(paste("apps status app=\"", cyto_app_toinstall[i],"\"",
                               sep="")), 
             pattern = "status: Installed")){
      installation_response <-commandsGET(paste("apps install app=\"", 
                                                cyto_app_toinstall[i],"\"", 
                                                sep=""))
      installation_responses <- c(installation_responses,installation_response)
    } else{
      installation_responses <- c(installation_responses,"already installed")
    }
  }
  installation_summary <- data.frame(name = cyto_app_toinstall, 
                                     status = installation_responses)
  
  knitr::kable(list(installation_summary),
               booktabs = TRUE, caption = 
                 'A Summary of automated app installation'
  )
}

#Retrieve networks for genes of interest
string_interaction<- paste('string protein query taxonID=9606 limit=150 
                           cutoff=0.9 query="',
                           paste(overlaped_genes, collapse=","),'"',sep="")
commandsGET(string_interaction

if(file.exists(initial_string_network_png_file_name)){
  #cytoscape hangs waiting for user response if file already exists. 
  #Remove it first
  response <- file.remove(initial_string_network_png_file_name)
} 

response <- exportImage(initial_string_network, type = "png")

Glycosphingolipid biosynthesis genes (blue colours) associations

Associations between glycosphingolipid biosynthesis from DEGs (blue colour),insulin signaling genes of PD map (red colour), the intrmediae candidates(white colours)

1.3. Tissue specificity of insulin signaling pathway

1.3.1 Tissue xd score from KEGG annotation.

#plot the results
Tissue specificity %>%
  mutate(name = fct_reorder(name, desc(Tissue_xd_score))) %>%
  ggplot( aes(x=name, y=Tissue_xd_score)) +
  geom_bar(stat="identity", fill="RED", alpha=.6, width=.4) +
  coord_flip() +
  xlab("") +
  theme_bw()

Tissue specificity of insulin signaling

1.3.2. Tissue-specific gene enrichment using TissueEnrich package (GTEx and HPA)

##read the input genes
genes<-system.file("extdata", "inputGenes.txt", package = "TissueEnrich")
inputGenes<-scan(genes,character())

## Specify the geneIdType parameter in the input GeneSet object
gs<-GeneSet(geneIds=inputGenes,organism="Homo Sapiens",
            geneIdType=SymbolIdentifier())

## The output is a list object containing the enrichment results
output<-teEnrichment(inputGenes = gs)

### Exploring tissue-specific gene enrichment results
seEnrichmentOutput<-output[[1]]
enrichmentOutput<-setNames(data.frame(assay(seEnrichmentOutput),
                                      row.names = rowData(seEnrichmentOutput)[,1]), colData(seEnrichmentOutput)[,1])
enrichmentOutput$Tissue<-row.names(enrichmentOutput)
head(enrichmentOutput)

### Plot tissue-specific gene enrichment results
ggplot(enrichmentOutput,aes(x=reorder(Tissue,-fold.change),y=fold.change,
                            label = Tissue.Specific.Genes,fill = Tissue))+
  geom_bar(stat = 'identity')+
  labs(x='', y = 'Fold change')+
  theme_bw()+
  theme(legend.position="none")+
  theme(plot.title = element_text(hjust = 0.5,size = 20),
        axis.title = element_text(size=15))+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        panel.grid.major= element_blank(),panel.grid.minor = element_blank())

### Retrieval of input tissue-specific genes
seGroupInf<-output[[3]][[""]]#specify the tissue you want
groupInf<-data.frame(assay(seGroupInf))
print(head(groupInf))

Tissue specificity of insulin signaling

2. Enrichment in specific MSigDB human hallmark

# Retrieve data for a specific collection, such as the hallmark gene sets.
h_gene_sets = msigdbr(species = "human", category = "H")

# Use the gene sets data frame for enrichment with genes as gene symbols
msigdbr_t2g = h_gene_sets %>% dplyr::distinct(gs_name, gene_symbol) %>% 
  as.data.frame()
result <- enricher(gene = gene_symbols_vector, TERM2GENE = msigdbr_t2g)
name, Adjusted_p_value
TNF-alpha Signaling via NF-kB, 0.04583
Epithelial Mesenchymal Transition 0.04583

Tissue specificity of insulin signaling

3. Enrichment in Disease map

Enriched area p-val (adj)
Dopamine secretion and recycling 0.000136634450016
Vesicle dynamics 0.016653051815289
Synaptic terminal of striatal neuron 0.031868828564528

The ovrlapped genes in Wnt-pi3k/AKT signaling in disease map

Discussion

*The insulin signalling genes in Parkinson’s disease map including IGF1R, IRS1, RET interacted with B3GALNT1, ST3GAL1 and ST8SIA1 through GDNF family receptor alpha 1 GFRA1.

Summary

The significant pathways of GBA N307S are :

Annotation (pathway/process) XD-score Fisher q-value Pathway size Overlap size
hsa00603:Glycosphingolipid biosynthesis - globo series 2.83 0.027 14 5
hsa00604:Glycosphingolipid biosynthesis - ganglio series 1.55 1 14 3
hsa04512:ECM-receptor interaction 1.13 0.0038 83 14
hsa00601:Glycosphingolipid biosynthesis - lacto and neolacto series 1.06 1 25 4
hsa00512:O-Glycan biosynthesis 0.82 1 30 4
hsa00260:Glycine, serine and threonine metabolism 0.78 1 31 4
hsa00360:Phenylalanine metabolism 0.74 1 16 2
hsa00340:Histidine metabolism 0.58 1 28 3
hsa00380:Tryptophan metabolism 0.54 1 39 4
hsa00020:Citrate cycle (TCA cycle) 0.52 1 30 3
hsa04540:Gap junction 0.45 1 86 8
hsa04920:Adipocytokine signaling pathway 0.42 1 67 6
hsa00310:Lysine degradation 0.40 1 46 4
hsa04610:Complement and coagulation cascades 0.40 1 69 6
hsa04950:Maturity onset diabetes of the young 0.37 1 24 2
hsa04146:Peroxisome 0.35 1 74 6
hsa00563:Glycosylphosphatidylinositol(GPI)-anchor biosynthesis 0.34 1 25 2
hsa00330:Arginine and proline metabolism 0.31 1 52 4
hsa04960:Aldosterone-regulated sodium reabsorption 0.31 1 39 3
hsa05100:Bacterial invasion of epithelial cells 0.28 1 68 5
hsa00982:Drug metabolism - cytochrome P450 0.28 1 68 5
hsa04722:Neurotrophin signaling pathway 0.27 1 124 9
hsa02010:ABC transporters 0.24 1 43 3
hsa00520:Amino sugar and nucleotide sugar metabolism 0.23 1 44 3
hsa00480:Glutathione metabolism 0.23 1 44 3
hsa04662:B cell receptor signaling pathway 0.22 1 74 5
hsa04144:Endocytosis 0.22 1 194 13
hsa00533:Glycosaminoglycan biosynthesis - keratan sulfate 0.22 1 15 1
hsa04330:Notch signaling pathway 0.19 1 47 3
hsa04080:Neuroactive ligand-receptor interaction 0.19 1 269 17