*The motivation is to investigate the functional associations between the differential expressed genes (DEGs) and the disease pathways, thus we get a better interpretation by applying both pathway and gene set enrichment analysis by enriching them in different data resources.
The aim of using pathway enrichment analysis is to consider the genetic interactions of DEGs and use the advantages of the size and directions of enriched pathways(KEGG,REACTOME). This will help to explain the downstream and pathway level effects.
While the gene set enrichment analysis was used to see whether there is any chance that the group of genes is related to the phenotype by over representation analysis.
We used a network based approach to create clusters of subnetworks from the provided DEGs based on their interactions which are described on stringDBs.
The created subnetworks were enriched in KEGG3 and REACTOME.
The similarity ranking pathways from the chosen databases were calculated by the network-based distance similarity (measured using the XD-score) of the DEGs.
This network similarity xd score measures the distance of genes set, with interactions, to the pathways.
The significance of the gene set overlap between the pathway and DEGs is measured by a Fisher-test.
We calculate the xd score of different tissue-types for each pathway.
The resulting pathways were also compared with experimentally verified databases such as innateDB5 and NCBI DBs.
The workflow of pathway and gene set enrichment analysis
def main():
import sys
#the filename below with the name of tab-delimited input network file
#(in human ENSEMBL gene format)
input_file = ""
opener = urllib2.build_opener(MultipartPostHandler)
params = { "upFile" : "yes",
"file" : open("C:/xampp/htdocs/enrichnet/"+input_file,
"rb") }
response = opener.open('http://lcsb-enrichnet-server.embl.de/enrichnet/
fileupload.php', params).read()
# the parameters to run the network-based enrichment analysis
#(replace them with your genes and pathways of interest)
genes = ''#put the gene_ids
idtype = '' #identify the gene ids type
pathdb = '' #specify the reference database
graph = 'own'
params = { "genes" : genes, "idtype" : idtype, "pathdb" : pathdb, "graph"
: graph }
url = 'http://elephant.embl.de/enrichnet/index.php'
params = urllib.urlencode({ 'genes':genes, 'idtype':idtype, 'pathdb':pathdb,
'graph':graph})
response = urllib2.urlopen(url+'?'+params).read()
print "The results will appear below:"
print response
| Annotation (pathway/process) | XD-score | Fisher q-value | Pathway size | Overlap size |
|---|---|---|---|---|
| hsa00603:Glycosphingolipid biosynthesis - globo series | 2.83 | 0.027 | 14 | 5 |
| hsa00604:Glycosphingolipid biosynthesis - ganglio series | 1.55 | 1 | 14 | 3 |
| hsa04950:Maturity onset diabetes of the young | 0.37 | 1 | 24 | 2 |
#Function to link and install the used packages between R and cytoscape
installation_responses <- c()
cyto_app_toinstall <- c("stringapp","genemania")
cytoscape_version <- unlist(strsplit( cytoscapeVersionInfo()['cytoscapeVersion']
,split = "\\."))
if(length(cytoscape_version) == 3 && as.numeric(cytoscape_version[1]>=3)
&& as.numeric(cytoscape_version[2]>=7)){
for(i in 1:length(cyto_app_toinstall)){
#check to see if the app is installed.
#Only install it if it hasn't been installed
if(!grep(commandsGET(paste("apps status app=\"", cyto_app_toinstall[i],"\"",
sep="")),
pattern = "status: Installed")){
installation_response <-commandsGET(paste("apps install app=\"",
cyto_app_toinstall[i],"\"",
sep=""))
installation_responses <- c(installation_responses,installation_response)
} else{
installation_responses <- c(installation_responses,"already installed")
}
}
installation_summary <- data.frame(name = cyto_app_toinstall,
status = installation_responses)
knitr::kable(list(installation_summary),
booktabs = TRUE, caption =
'A Summary of automated app installation'
)
}
#Retrieve networks for genes of interest
string_interaction<- paste('string protein query taxonID=9606 limit=150
cutoff=0.9 query="',
paste(overlaped_genes, collapse=","),'"',sep="")
commandsGET(string_interaction
if(file.exists(initial_string_network_png_file_name)){
#cytoscape hangs waiting for user response if file already exists.
#Remove it first
response <- file.remove(initial_string_network_png_file_name)
}
response <- exportImage(initial_string_network, type = "png")
Glycosphingolipid biosynthesis genes (blue colours) associations
Associations between glycosphingolipid biosynthesis from DEGs (blue colour),insulin signaling genes of PD map (red colour), the intrmediae candidates(white colours)
#plot the results
Tissue specificity %>%
mutate(name = fct_reorder(name, desc(Tissue_xd_score))) %>%
ggplot( aes(x=name, y=Tissue_xd_score)) +
geom_bar(stat="identity", fill="RED", alpha=.6, width=.4) +
coord_flip() +
xlab("") +
theme_bw()
Tissue specificity of insulin signaling
##read the input genes
genes<-system.file("extdata", "inputGenes.txt", package = "TissueEnrich")
inputGenes<-scan(genes,character())
## Specify the geneIdType parameter in the input GeneSet object
gs<-GeneSet(geneIds=inputGenes,organism="Homo Sapiens",
geneIdType=SymbolIdentifier())
## The output is a list object containing the enrichment results
output<-teEnrichment(inputGenes = gs)
### Exploring tissue-specific gene enrichment results
seEnrichmentOutput<-output[[1]]
enrichmentOutput<-setNames(data.frame(assay(seEnrichmentOutput),
row.names = rowData(seEnrichmentOutput)[,1]), colData(seEnrichmentOutput)[,1])
enrichmentOutput$Tissue<-row.names(enrichmentOutput)
head(enrichmentOutput)
### Plot tissue-specific gene enrichment results
ggplot(enrichmentOutput,aes(x=reorder(Tissue,-fold.change),y=fold.change,
label = Tissue.Specific.Genes,fill = Tissue))+
geom_bar(stat = 'identity')+
labs(x='', y = 'Fold change')+
theme_bw()+
theme(legend.position="none")+
theme(plot.title = element_text(hjust = 0.5,size = 20),
axis.title = element_text(size=15))+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
panel.grid.major= element_blank(),panel.grid.minor = element_blank())
### Retrieval of input tissue-specific genes
seGroupInf<-output[[3]][[""]]#specify the tissue you want
groupInf<-data.frame(assay(seGroupInf))
print(head(groupInf))
Tissue specificity of insulin signaling
# Retrieve data for a specific collection, such as the hallmark gene sets.
h_gene_sets = msigdbr(species = "human", category = "H")
# Use the gene sets data frame for enrichment with genes as gene symbols
msigdbr_t2g = h_gene_sets %>% dplyr::distinct(gs_name, gene_symbol) %>%
as.data.frame()
result <- enricher(gene = gene_symbols_vector, TERM2GENE = msigdbr_t2g)
| name, | Adjusted_p_value |
|---|---|
| TNF-alpha Signaling via NF-kB, | 0.04583 |
| Epithelial Mesenchymal Transition | 0.04583 |
Tissue specificity of insulin signaling
| Enriched area | p-val (adj) |
|---|---|
| Dopamine secretion and recycling | 0.000136634450016 |
| Vesicle dynamics | 0.016653051815289 |
| Synaptic terminal of striatal neuron | 0.031868828564528 |
The ovrlapped genes in Wnt-pi3k/AKT signaling in disease map
The results revealed that there is dysregulation of different beta galactosyltransferase regulating genes for examples B3GALNT1 which enriched in glycosphingolipids biosynthesis in ganglio and globo series. Glycosphingolipids are a well-defined subclass of lipids that regulate crucial aspects of brain function and recently emerged as potent regulators of the inflammatory process. The deregulation of galactosyltransferase was reported in Parkinson’s disease. B3GALNT1 and ST3GAL1 were significantly associated with the disease type 2 diabetes mellitus in GWAS and other genetic association datasets from the GWASdb SNP-Disease Associations datasetPMID: 22139925.
B3GALNT1 was found to interact with RET indirectly through a docking protein called DOK6 and GDNF family receptor alpha 1 which were reported in type 2 diabetes mellitus in GWAS.
*The insulin signalling genes in Parkinson’s disease map including IGF1R, IRS1, RET interacted with B3GALNT1, ST3GAL1 and ST8SIA1 through GDNF family receptor alpha 1 GFRA1.
IAPP was enriched in the maturity-onset diabetes pathway, it was reported to be a common interplayer in the crosstalks between type-2 diabetes and Parkinson’s disease PMID:27791129(https://pubmed.ncbi.nlm.nih.gov/27791129/). In type-2 diabetes (T2D) and Parkinson’s disease (PD), polypeptide assembly into amyloid fibres plays central roles: in PD, α-synuclein (aS) forms amyloids and in T2D, amylin [islet amyloid polypeptide (IAPP)] forms amyloids.
The deficiency of NEUROD1 was reported to cause diabetes PMID: 28664602. However, the expression of this gene increased in response to the GBA N307 mutation.
The tissue specificity annotations are retrieved from all non-overlapping gene pairs, which are connected through paths of interactions in a molecular network. Specifically, pairs of genes that have been found co-expressed in a cell are considered to have the potential to interact physically in the same tissue, whereas all other gene pairs are assumed to lack this potential. In the glycosphingolipid biosynthesis pathways, liver, dorsal root ganglion and trigeminal ganglion were the most common related tissues, suggesting that the interactions of the overlapped genes with their coexpressed molecules have a potential to interact physically in those tissues.
The significant pathways of GBA N307S are :
| Annotation (pathway/process) | XD-score | Fisher q-value | Pathway size | Overlap size |
|---|---|---|---|---|
| hsa00603:Glycosphingolipid biosynthesis - globo series | 2.83 | 0.027 | 14 | 5 |
| hsa00604:Glycosphingolipid biosynthesis - ganglio series | 1.55 | 1 | 14 | 3 |
| hsa04512:ECM-receptor interaction | 1.13 | 0.0038 | 83 | 14 |
| hsa00601:Glycosphingolipid biosynthesis - lacto and neolacto series | 1.06 | 1 | 25 | 4 |
| hsa00512:O-Glycan biosynthesis | 0.82 | 1 | 30 | 4 |
| hsa00260:Glycine, serine and threonine metabolism | 0.78 | 1 | 31 | 4 |
| hsa00360:Phenylalanine metabolism | 0.74 | 1 | 16 | 2 |
| hsa00340:Histidine metabolism | 0.58 | 1 | 28 | 3 |
| hsa00380:Tryptophan metabolism | 0.54 | 1 | 39 | 4 |
| hsa00020:Citrate cycle (TCA cycle) | 0.52 | 1 | 30 | 3 |
| hsa04540:Gap junction | 0.45 | 1 | 86 | 8 |
| hsa04920:Adipocytokine signaling pathway | 0.42 | 1 | 67 | 6 |
| hsa00310:Lysine degradation | 0.40 | 1 | 46 | 4 |
| hsa04610:Complement and coagulation cascades | 0.40 | 1 | 69 | 6 |
| hsa04950:Maturity onset diabetes of the young | 0.37 | 1 | 24 | 2 |
| hsa04146:Peroxisome | 0.35 | 1 | 74 | 6 |
| hsa00563:Glycosylphosphatidylinositol(GPI)-anchor biosynthesis | 0.34 | 1 | 25 | 2 |
| hsa00330:Arginine and proline metabolism | 0.31 | 1 | 52 | 4 |
| hsa04960:Aldosterone-regulated sodium reabsorption | 0.31 | 1 | 39 | 3 |
| hsa05100:Bacterial invasion of epithelial cells | 0.28 | 1 | 68 | 5 |
| hsa00982:Drug metabolism - cytochrome P450 | 0.28 | 1 | 68 | 5 |
| hsa04722:Neurotrophin signaling pathway | 0.27 | 1 | 124 | 9 |
| hsa02010:ABC transporters | 0.24 | 1 | 43 | 3 |
| hsa00520:Amino sugar and nucleotide sugar metabolism | 0.23 | 1 | 44 | 3 |
| hsa00480:Glutathione metabolism | 0.23 | 1 | 44 | 3 |
| hsa04662:B cell receptor signaling pathway | 0.22 | 1 | 74 | 5 |
| hsa04144:Endocytosis | 0.22 | 1 | 194 | 13 |
| hsa00533:Glycosaminoglycan biosynthesis - keratan sulfate | 0.22 | 1 | 15 | 1 |
| hsa04330:Notch signaling pathway | 0.19 | 1 | 47 | 3 |
| hsa04080:Neuroactive ligand-receptor interaction | 0.19 | 1 | 269 | 17 |