MeSH Analysis Tutorial

Why using MeSH?

Whole-genome sequencing studies have accelerated in recent decades, mainly due to the low cost of currently available technologies. As a result, the volume of data is increasing rapidly, and complete and incomplete genomes are now available for many species. Bioconductor has a large number of resources for genetic markers and genes, as well as metabolic pathways, gene ontology, and annotations, that can be used to analyze and comprehend genomic data. Annotation tools such as MeSH (Medical Subject Headings) make these procedures easier. MeSH is a comprehensive collection of annotations, that are annually updated, containing more than 27,000 clinical and biological annotations for more than 70 species, including plant and livestock (Morota et al. 2015, 2016; Beissinger; Morota, 2017). MeSH software packages (Tsuyuzaki et al. 2015) have been successfully applied for swine, cattle, horses, chicken, and turkey (Morota et al. 2015, 2016; Júnior et al. 2017; Abdalla et al. 2020). These studies demonstrated MeSH utility in improving the biological interpretation of gene sets, and understand the genetic architecture of complex traits in domestic animals.

1. Download and Install required R/Bioconductor packages

BiocManager::install("biomaRt")
BiocManager::install("AnnotationHub", ask = FALSE)
BiocManager::install("meshr", ask = FALSE)
BiocManager::install("MeSHDbi")
install.packages("readxl")  # Read Excel Files

Load required packages

library("meshr")
library("MeSHDbi")
library("AnnotationHub")

ah <- AnnotationHub()  # metadata of all the data stored in the AnnotationHub server

Create a vector of background genes

We first create a vector of background genes.

## access to biomaRt
mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "btaurus_gene_ensembl",
    host = "www.ensembl.org")
univ.geneID <- getBM(attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
    mart = mart)  # 25898  
## remove genes with no corresponding Entrez Gene ID
univ.geneID2 <- univ.geneID[!is.na(univ.geneID[, 2]), ]  # 18279 
## remove duplicated Entrez Gene ID
univ.geneID3 <- univ.geneID2[!duplicated(univ.geneID2[, 2]), ]  # 17745
# or load univgeneID3.Rda
load("univgeneID3.Rda")

Note that if this code can be easily switched to any species by changing the “btaurus_gene_ensembl” parameter on mart.

Download and read input file

In this tutorial, we are going to use data from a RNA-seq study by Li et al. (2015). The authors reported a list of differentially expressed genes in the ruminal wall of grass-fed and grain-fed Angus cattle. Please download the file S3 Table from the journal website.

## read data
my.geneID <- read_excel("pone.0116437.s004.XLS", col_names = TRUE, sheet = 1, na = "NA")  # 342
colnames(my.geneID)[1] <- "ensembl_gene_id"  # changing the 1st column name to 'ensembl_gene_id'.

## merge two files
my.geneID2 <- merge(my.geneID, univ.geneID3, by = "ensembl_gene_id")  # 307
## remove duplicated Entrez Gene ID
my.geneID3 <- my.geneID2[!duplicated(my.geneID2$entrezgene), ]  # 307

Access to MeSH data on AnnotationHub

MeSH terms are designated by articles indexed in PubMed by the National Library of Medicine and can be directly mapped to genes for the development of gene annotations. There are currently 19 categories.

Among the categories used in MeSH, in the animal context, five of them are most important: * A (Anatomy), * B (Organisms), * C (Diseases), * D (Chemicals and Drugs), * G (Phenomena and Process).

The specifications of the MeSH ORA framework have changed in the new Bioconductor release. All annotation packages will no longer be distributed, and the policy will be changed to one in which the data will be stored on a cloud server called AnnotationHub, and users will only be able to access it when it is necessary. In this new release, data reproducibility is ensured (e.g. users can specify versions of the data, such as “v001”).

The first step is to download MeSH annotated genes for Bos taurus from the cloud. We will first download MeSH datasets using the AnnotationHub package, which returns the metadata of all the data stored in the AnnotationHub server.

library("AnnotationHub")
ah <- AnnotationHub()

Creating the Cattle dataset - MeSH.Bta.eg.db

dbfile1 <- query(ah, c("MeSHDB", "Bos taurus", "v002"))[[1]]
MeSH.Bta.eg.db <- MeSHDbi::MeSHDb(dbfile1)  # converting SQLite file into MESHDb object
MeSH.Bta.eg.db

Note that if this code can be easily switched to any species by changing the “Bos taurus” parameter on dbfile1.

MeSH Enrichment Analysis

MeSH enrichment analysis can be performed using the meshr package. Let’s perfom MeSH-ORA for the category Chemicals and Drugs by setting category=“D”.

library(meshr)
# Chemicals and Drugs
meshParams.D <- new("MeSHHyperGParams",
                    geneIds = my.geneID3[, 6],
                    universeGeneIds = univ.geneID3[,2],
                    annotation = "MeSH.Bta.eg.db", # correspondence between MeSH IDs and Entrez Gene IDs
                    meshdb = "MeSH.db", 
                    category = "D",
                    database = "gene2pubmed", # data from PubMed
                    pvalueCutoff = 0.5, pAdjust = "none")
meshR.D <- meshHyperGTest(meshParams.D)

head(summary(meshR.D)) # check your results

Users can easily change MeSH-ORA category by changing the category parameter. Let’s perfom MeSH-ORA for Diseases category by changing “D” for “C”.

meshParams.C <- new("MeSHHyperGParams", geneIds = my.geneID3[, 6], universeGeneIds = univ.geneID3[,
    2], annotation = "MeSH.Bta.eg.db", meshdb = "MeSH.db", category = "C", database = "gene2pubmed",
    pvalueCutoff = 0.5, pAdjust = "none")
meshR.C <- meshHyperGTest(meshParams.C)

head(summary(meshR.C))  # check your results

MeSH-ORA for Anatomy (category = “A”).

meshParams.C <- new("MeSHHyperGParams", geneIds = my.geneID3[, 6], universeGeneIds = univ.geneID3[,
    2], annotation = "MeSH.Bta.eg.db", meshdb = "MeSH.db", category = "A", database = "gene2pubmed",
    pvalueCutoff = 0.5, pAdjust = "none")
meshR.C <- meshHyperGTest(meshParams.C)

head(summary(meshR.C))  # check your results

MeSH ORA for Phenomena and Processes (category = “G”).

meshParams.C <- new("MeSHHyperGParams", geneIds = my.geneID3[, 6], universeGeneIds = univ.geneID3[,
    2], annotation = "MeSH.Bta.eg.db", meshdb = "MeSH.db", category = "G", database = "gene2pubmed",
    pvalueCutoff = 0.5, pAdjust = "none")
meshR.C <- meshHyperGTest(meshParams.C)

head(summary(meshR.C))  # check your results

Visualization for MeSH Enrichment Analysis

Download the packages to plot your results.

install.packages("tagcloud")
install.packages("RColorBrewer")

Load the packages

library(tagcloud)
library(RColorBrewer)

tagcloud creates word clouds or weighted lists. Usually, a large number of words is displayed in size correlated with a numerical value, in our case, the frequency of enrichment of a MeSH term.

Let’s create a Word Cloud for Chemicals and Drugs category.

meshR <- meshR.D  # use  your meshR.D results from the Mesh-ORA analysis
meshR.summary <- summary(meshR)[!duplicated(summary(meshR)[, 7]), ]  # extract unique elements
tags <- strmultline(meshR.summary$MESHTERM)
weights <- -log(meshR.summary$Pvalue)
or <- meshR.summary$OddsRatio
colors <- smoothPalette(or, max = 4)
tagcloud(tags, weights = weights, col = colors)

# Here we are using RColorBrewer to include colors in the word cloud.
colors <- smoothPalette(weights, pal = brewer.pal(11, "Spectral"))
tagcloud(tags, weights = weights, col = colors, order = "size")

Note: This is a suggested color palette. Users are free to choose any color palette. More options can be found at https://www.r-graph-gallery.com/38-rcolorbrewers-palettes.html.

# Diseases
meshR <- meshR.C
meshR.summary <- summary(meshR)[!duplicated(summary(meshR)[, 7]), ]
tags <- strmultline(meshR.summary$MESHTERM)
weights <- -log(meshR.summary$Pvalue)
or <- meshR.summary$OddsRatio
colors <- smoothPalette(or, max = 4)
tagcloud(tags, weights = weights, col = colors)

colors <- smoothPalette(weights, pal = brewer.pal(11, "Spectral"))
tagcloud(tags, weights = weights, col = colors, order = "size")

# Anatomy
meshR <- meshR.A
meshR.summary <- summary(meshR)[!duplicated(summary(meshR)[, 7]), ]
tags <- strmultline(meshR.summary$MESHTERM)
weights <- -log(meshR.summary$Pvalue)
or <- meshR.summary$OddsRatio
colors <- smoothPalette(or, max = 4)
tagcloud(tags, weights = weights, col = colors)

colors <- smoothPalette(weights, pal = brewer.pal(11, "Spectral"))
tagcloud(tags, weights = weights, col = colors, order = "size")

# Phenomena and Processes
meshR <- meshR.G
meshR.summary <- summary(meshR)[!duplicated(summary(meshR)[, 7]), ]
tags <- strmultline(meshR.summary$MESHTERM)
weights <- -log(meshR.summary$Pvalue)
or <- meshR.summary$OddsRatio
colors <- smoothPalette(or, max = 4)
tagcloud(tags, weights = weights, col = colors)

colors <- smoothPalette(weights, pal = brewer.pal(11, "Spectral"))
tagcloud(tags, weights = weights, col = colors, order = "size")

Final Considerations

Functional annotation is one of the most important processes in the in silico analysis of genetic material. The number of sequences deposited in the databases demands functional analysis, to name the gene products and, thus, understand the biological processes of the studied organisms and guide further research. Given that MeSH database is updated annually to reflect changes in medicine and medical terminology, we believe that MeSH enrichment analysis may help to provide new insights into the biological interpretation of genes influencing complex traits in livestock species.