Research Data in Zebrafish Optic Nerve Regeneration was chosen from the Metabolomics Workbench site. The Positive and Negative Ion named metabolite data was obtained and formatted for analysis
library(devtools)
library(impute)
library(pcaMethods)
library(globaltest)
library(GlobalAncova)
library(Rgraphviz)
library(preprocessCore)
library(genefilter)
library(SSPA)
library(sva)
library(limma)
library(KEGGgraph)
library(siggenes)
library(MSnbase)
library(multtest)
library(RBGL)
library(edgeR)
library(fgsea)
library(MetaboAnalystR)
library(magick)
library(ggplot2)
library(pheatmap)
library(ellipse)
library(ggrepel)
library(pls)
library(knitr)
library(ComplexHeatmap)
library(circlize)
setwd("/Users/ardicsahin/Desktop/MetabolomicsTrialRun")
mSet <- InitDataObjects("pktable", "stat", FALSE)
## Starting Rserve:
## /Library/Frameworks/R.framework/Resources/bin/R CMD /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/Rserve/libs//Rserve --no-save
##
## [1] "MetaboAnalyst R objects initialized ..."
mSet <- Read.TextData(mSet, "/Users/ardicsahin/Desktop/MetabolomicsTrialRun/neggroupNEW.csv", "colu", "disc")
mSet <- SanityCheckData(mSet)
## [1] "Successfully passed sanity check!"
## [2] "Samples are not paired."
## [3] "2 groups were detected in samples."
## [4] "Only English letters, numbers, underscore, hyphen and forward slash (/) are allowed."
## [5] "<font color=\"orange\">Other special characters or punctuations (if any) will be stripped off.</font>"
## [6] "All data values are numeric."
## [7] "<font color=\"red\"> 3 features with a constant or single value across samples were found and deleted.</font>"
## [8] "A total of 0 (0%) missing values were detected."
## [9] "<u>By default, missing values will be replaced by 1/5 of min positive values of their corresponding variables</u>"
## [10] "Click the <b>Proceed</b> button if you accept the default practice;"
## [11] "Or click the <b>Missing Values</b> button to use other methods."
print(mSet[["msgSet"]][["read.msg"]])
## [1] "Samples are in columns and features in rows."
## [2] "The uploaded file is in comma separated values (.csv) format."
## [3] "The uploaded data file contains 12 (samples) by 326 (peaks(mz/rt)) data matrix."
mSet <- ReplaceMin(mSet)
To prevent specific metabolite peak intensities from dominating across all the data, the data was normalized for further analysis.Normalization of the raw data aims to make signals comparable without unwanted variation caused by different environmental sources. The normalization of the samples is done in three steps: Sample normalization, data transformation and data scaling.Sample normalization corrects the differences in total signal between samples. For peak intensity LC-MS samples, normalization by sum is used since it divides each feature intensity by the total intensity for that sample so it can be used to compare total ion intensities. Data transformation reduces the difference between the raw peak intensities to create a more normalized data set. Log base 10 transformation is used to compress the high intensity values and expand the low intensity values, stabilizing the variance across metabolites. Data scaling equalises the amount each metabolite contributes to the variance, but does not overexpress them. Paretro Scaling works by mean-centering and dividing by the square root of the standard deviation of each variable, keeping medium-scale differences but preventing overexpression of a specific metabolite and does not distort the data structure. All three of these steps turn the filtered raw data into a form which can be used in analysis. Figure 1 represents the metabolite peak intensities among all identified metabolites, without normalization some metabolites would overdominate the results. Figure 2 represents the distribution of metabolite intensities between groups.
mSet<-PreparePrenormData(mSet)
mSet <- Normalization(mSetObj = mSet, rowNorm = "SumNorm", transNorm = "LogNorm", scaleNorm = "ParetoNorm")
mSet <- PlotNormSummary(mSet,imgName = "NegNorm", format = "png", dpi = 72, width = 10.5)
mSet <- PlotSampleNormSummary(mSet, imgName = "NegSampleNorm", format = "png", dpi = 72, width = 10.5)
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NegNormdpi72.png")
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NegSampleNormdpi72.png")
PCA (Principle Component Analysis) is an analysis technique that reduces the number of variables in a dataset by transforming correlated variables into a smaller set of variables/ principal components. PCA is a mathematical rotation. It finds the “direction” in your data that has the maximum variance (PC1), then the next orthogonal direction with the second most variance (PC2), and so on.It compresses your thousands of metabolites into just a few “Scores” (PC1, PC2) to visualize global patterns without needing to know the group labels beforehand (“Unsupervised”). The smaller number of variables gives a clearer image of natural clustering patterns and group separation. PC1 is the direction where the data points vary the most and it has the largest percentage of information retained from the normalized data. For the scatter plot analysis, PC1, PC2, PC3, PC4 and PC5 were used. Two 2D scores plots were generated, one between PC1 and PC2 and one between PC3 and PC4. The PC1 vs PC2 2D scores plot gives the most critical view of the data. Since the plot does not show clear seperation, the difference between groups could be subtle.The PC3 vs PC4 2D scores plot gives the a minor view of the data which can be used to indicate subtler differences.
mSet <- PCA.Anal(mSet)
mSet <-PlotPCAPairSummary(mSet, "NEG_PCA_mSet_Summary", format = "png", dpi = 72, width=NA, 5)
mSet <- PlotPCA2DScore(mSetObj = mSet, imgName = "NEG_PCA_mSet_2DScores_1_2", format = "png", dpi = 72
, width = NA, pcx = 1, pcy = 2, reg = 0.95, show = 1, grey.scale = 0)
mSet <- PlotPCA2DScore(mSetObj = mSet, imgName = "NEG_PCA_mSet_2DScores_3_4", format = "png", dpi = 72
, width = NA, pcx = 3, pcy = 4, reg = 0.95, show = 1, grey.scale = 0)
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NEG_PCA_mSet_Summarydpi72.png")
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NEG_PCA_mSet_2DScores_1_2dpi72.png")
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NEG_PCA_mSet_2DScores_3_4dpi72.png")
ANOVA compares the variance between groups to the variance within groups by calculating the F-statistic. If F is large, the groups are statistically different. To determine which groups differ from which groups, the Post-Hoc test is performed. Post_Hoc performs several t-test between groups for each possible combination to find the source of variation between groups.ANOVA can detect the individual metabolites and the plot displays the top significant metabolites sorted by p-value which are specific molecules that change the most consistently between your conditions.
mSet <- ANOVA.Anal(mSet, F, 0.05, "fisher")
## [1] "A total of 0 significant features were found."
mSet <- PlotANOVA(mSet, "NEG_aov_0_", "png", 72, width=NA)
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NEG_aov_0_dpi72.png")
PLS-DA looks for variance that maximizes the separation between your groups with supervised classification, resulting in clearer separation of groups. Since PLS-DA knows the groups, it can seperate groups clearly compared to PCA, which might indicate a false separation. VIP scores calculate the importance of each variable/metabolite in the separation of groups. Metabolites with VIP scores 1< are plotted to easily interpret the data.
mSet<-PLSR.Anal(mSet, reg=FALSE)
mSet <-PLSDA.CV(mSet)
##
## Attaching package: 'caret'
## The following object is masked from 'package:pls':
##
## R2
## The following object is masked from 'package:MetaboAnalystR':
##
## splsda
## The following object is masked from 'package:ProtGenerics':
##
## tolerance
## The following object is masked from 'package:mzR':
##
## tolerance
## The following object is masked from 'package:survival':
##
## cluster
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
mSet<-PlotPLSPairSummary(mSet, "NEGpls_pair_0_", "png", 72, width=NA, 5)
mSet<-PlotPLS2DScore(mSet, "NEGpls_score2d_0_", "png", 72, width=NA, 1,2,0.95,1,0)
mSet<-PlotPLS.Imp(mSet, "NEGpls_imp_0_", "png", 72, width= 12, "vip", "Comp. 1", 15, FALSE)
## Warning in RColorBrewer::brewer.pal(25, colorpalette): n too large, allowed maximum for palette RdYlBu is 11
## Returning the palette you asked for with that many colors
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NEGpls_pair_0_dpi72.png")
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NEGpls_score2d_0_dpi72.png")
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NEGpls_imp_0_dpi72.png")
Heatmaps show the general clustering and correlation of metabolites to groups. The relative abundance of each metabolite between samples is color coded to show the contrast. The top ten metabolites based on vip scores were used to generate the heatmap.
data_norm <- mSet$dataSet$norm
if(!is.null(mSet$analSet$plsr$vip.mat)) {
vip_scores <- mSet$analSet$plsr$vip.mat[,1]
} else {
vip_scores <- mSet$analSet$plsda$vip.mat[,1]
}
top_features_raw <- names(sort(vip_scores, decreasing = TRUE))[1:25]
top_features_safe <- make.names(top_features_raw)
valid_features <- intersect(top_features_safe, colnames(data_norm))
if(length(valid_features) == 0) {
valid_features <- intersect(top_features_raw, colnames(data_norm))
}
data_subset <- data_norm[, valid_features]
mat <- t(data_subset)
mat_scaled <- t(scale(t(mat)))
mat_scaled[is.infinite(mat_scaled)] <- 0
mat_scaled[is.nan(mat_scaled)] <- 0
sample_groups <- mSet$dataSet$cls
group_colors <- c("Injured" = "firebrick", "Uninjured" = "forestgreen")
column_ha <- HeatmapAnnotation(
Group = sample_groups,
col = list(Group = group_colors),
show_annotation_name = TRUE)
png("NEG_Complex_Heatmap.png", width = 1000, height = 1200)
ht <- ComplexHeatmap::Heatmap(mat_scaled,
name = "Z-score",
top_annotation = column_ha, cluster_rows = TRUE,cluster_columns = FALSE,show_column_names = FALSE,
row_names_gp = gpar(fontsize = 10), column_title = "Top 10 Metabolites by Group",
col = colorRamp2(c(-2, 0, 2), c("navy", "white", "firebrick")))
ComplexHeatmap::draw(ht)
dev.off()
## quartz_off_screen
## 2
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NEG_Complex_Heatmap.png")
Volcano plots plot the combined values of log Fold Change and P values to find the statistically significant metabolites. The most significant metabolites can be found close to the top corners of the plot, the colors representing expression level between groups.
mSet <- Volcano.Anal(mSetObj = mSet,paired = FALSE, 2.0, 0, F, 0.1, TRUE, "raw" )
## Loading required package: memoise
## [1] "Performing regular t-tests ...."
## [1] "A total of 16 significant features were found."
mSet<-PlotVolcano(mSet, "NEG_volcano_plot_1", 1, 0, format ="png", dpi=72, width=NA)
knitr::include_graphics("/Users/ardicsahin/Desktop/MetabolomicsTrialRun/NEG_volcano_plot_1dpi72.png")