The incessant contraction of cardiomyocytes demands tremendous amount of energy supply, thus, maintaining a robust metabolic homeostasis is essential for the heart. For decades, accumulating evidence suggests that perturbation of cardiac metabolism plays an important role in the pathological progression of HF. Accordingly, comprehensive quantitation of metabolite abundances in blood plasma, termed metabolite or metabolomics profiling, can provide mechanistic insights into the molecular alterations underlying maladaptive remodeling and HF. The large-scale quantification of circulating metabolites across multiple pathways may also identify metabolic changes that reflect clinical classification and outcomes of HF patients before and after optimal medical therapy and surgical interventions, thereby advancing important venues of biomarker and drug-target development together with validation of treatments.
Our workflow involves taking whole blood samples from 26 patients receiving MCSD implantation. The whole blood was extracted at various timepoints, and the plasma was isolated in order to determine the metabolic profile. Temporal profiling of plasma metabolites enabled the identification of significantly altered metabolites as well as metabolites that maintain a relative stable concentration in plasma regardless of pathophysiological changes (housekeeping metabolites). Two methods were used to identify significantly altered metabolites: differential expression analysis and a linear mixed model. In total, the metabolites selected through these processes were subjected to the following statistical tests.
In order to work with my data, I had to pull it from a data object created by another person working on my project. The manipulations I performed are below, but essentially, my process was as follows:
#put all the metabolite quantifications into a workable file
metabolite_new <- Data_Obj_RMRedundancy$quantification
#1.create overlapping metabolites matrix
metabolite_select <- metabolite_new[,c("C12_p180","C14_p180","C14:2_p180", "C16:2_p180", "PC aa C36:1_p180","PE aa C26:4_Lipid", "PE aa C36:1_Lipid", "PE aa C36:2_Lipid", "PE aa C36:3_Lipid", "SM C19:0_Lipid", "Testosterone_Steroids")]
#2. transform the data and add the necessary identifiers
metabolite_select = data.frame(metabolite_select)
metabolite_select$Patient = Data_Obj_RMRedundancy$subjectNumber
metabolite_select$Time = Data_Obj_RMRedundancy$timepoint
metabolite_select$Surv = Data_Obj_RMRedundancy$deathStatus
#3. reshape the data so it looks good
select <- metabolite_select[c(12:14,1:11)]
row.names(select) <- paste0(select$Patient, select$Time)
select$Type <- substr(select$Patient, 1, 1)
select <- select[c(15,1:14)]
#4. write the csv
write.csv(select, file = "Selected Metabolites")
#set up other DF - differential expression
#1. select out the relevant metabolites
metabolite_select_diff <- metabolite_new[,c("PC aa C32:2_Lipid", "PC aa C36:1_Lipid", "PG aa C36:3_Lipid", "PE aa C36:1_Lipid", "PE aa C36:2_Lipid", "PE aa C36:3_Lipid", "PE ae C34:1_Lipid", "PE ae C36:3_Lipid", "LPC a C18:2_Lipid", "PC aa C36:3_Lipid", "LPE a C16:0_Lipid", "LPE e C18:0_Lipid", "PE aa C26:4_Lipid", "PC ae C36:3_Lipid", "PE ae C34:3_Lipid", "Glu_p180", "Dopamine_p180",
"PC aa C42:0_p180", "N-C20:0-Cer(2H)_Lipid", "SM C19:0_Lipid", "SM C23:3_Lipid", "Testosterone_Steroids", "C12_p180", "C14_p180", "C16:1_p180", "C16:2_p180", "alpha-KGA_Energy Metabolism", "Fum_Energy Metabolism", "DHA_PUFAs", "AA_PUFAs", "N-C9:0-Cer_Lipid",
"N-C15:1-Cer_Lipid", "N-C20:0(OH)-Cer_Lipid", "N-C20:0(OH)-Cer(2H)_Lipid", "C14:0_FFA", "C14:2_p180")]
#2. apply the correct identifiers
metabolite_select_diff = data.frame(metabolite_select_diff)
metabolite_select_diff$Patient = Data_Obj_RMRedundancy$subjectNumber
metabolite_select_diff$Time = Data_Obj_RMRedundancy$timepoint
metabolite_select_diff$Surv = Data_Obj_RMRedundancy$deathStatus
#3. reorganize
select_diff <- metabolite_select_diff[c(37:39,1:36)]
row.names(select_diff) <- paste0(select_diff$Patient, select_diff$Time)
select_diff$Type <- substr(select_diff$Patient, 1, 1)
select_diff <- select_diff[c(40,1:39)]
#4. write the csv
write.csv(select_diff, file = "Selected Metabolites Diff")
#set up other DF - LMM
#1. select out the relevant metabolites
metabolite_select_LMM <- metabolite_new[,c("Kyn_p180", "C0_p180", "C2_p180", "C4_p180", "C4:1_p180", "C4-OH (C3-DC)_p180", "C5_p180", "C5:1_p180", "C5:1-DC_p180","C5-DC (C6-OH)_p180", "C5-OH (C3-DC-M)_p180", "C6 (C4:1-DC)_p180", "C7-DC_p180", "C8_p180", "C9_p180", "C10_p180", "C10:1_p180", "C10:2_p180", "C12_p180", "C12:1_p180", "C14_p180", "C14:1-OH_p180", "C14:2_p180", "C14:2-OH_p180", "C16:2_p180", "C16-OH_p180", "LPC a C16:0_p180", "LPC a C17:0_p180", "LPC a C18:0_p180", "PC aa C32:0_p180", "PC aa C32:1_p180", "PC aa C34:1_p180", "SM (OH) C16:1_p180", "SM C18:0_p180", "SM C18:1_p180", "LPC e C18:0_Lipid", "PC aa C36:1_Lipid", "PG aa C32:1_Lipid", "PG ae C32:0_Lipid", "PE aa C26:4_Lipid", "PE aa C38:3_Lipid", "PE aa C34:0_Lipid", "PE aa C34:1_Lipid", "PE aa C34:2_Lipid", "PE aa C34:3_Lipid", "PE aa C36:1_Lipid", "PE aa C40:4_Lipid", "PE aa C36:2_Lipid", "PE aa C36:3_Lipid", "PC ae C36:5_Lipid", "PE ae C36:1_Lipid", "PE ae C36:2_Lipid", "PE ae C40:4_Lipid", "PE ae C36:4_Lipid", "PE ae C36:5_Lipid", "PE ae C38:1_Lipid", "PE ae C38:2_Lipid", "PE ae C38:3_Lipid", "N-C21:0(OH)-Cer_Lipid", "N-C22:0(OH)-Cer_Lipid", "N-C23:0(OH)-Cer_Lipid", "N-C24:0(OH)-Cer_Lipid", "N-C7:0-Cer(2H)_Lipid", "SM C17:0_Lipid", "SM C19:0_Lipid", "SM C19:1_Lipid", "SM C19:2_Lipid", "SM C20:1_Lipid", "11-Deoxycorticosterone_Steroids", "11-Deoxycortisol_Steroids", "Testosterone_Steroids", "DOPA_p180")]
#2. apply the correct identifiers
metabolite_select_LMM = data.frame(metabolite_select_LMM)
metabolite_select_LMM$Patient = Data_Obj_RMRedundancy$subjectNumber
metabolite_select_LMM$Time = Data_Obj_RMRedundancy$timepoint
metabolite_select_LMM$Surv = Data_Obj_RMRedundancy$deathStatus
#3. reorganize
select_diff <- metabolite_select_LMM[c(73:75,1:72)]
row.names(select_diff) <- paste0(select_diff$Patient, select_diff$Time)
select_diff$Type <- substr(select_diff$Patient, 1, 1)
select_diff <- select_diff[c(76,1:75)]
#4. write the csv
write.csv(select_diff, file = "Selected Metabolites LMM")
The metabolites identified through this statistical method are summarized below.
Upregulated | Downregulated |
---|---|
PC aa C32:2 | Glu |
PC aa C36:1 | Dopamine |
PG aa C36:3 | PC aa C42:0 |
PE aa C36:1 | N-C20:0-Cer(2H) |
PE aa C36:2 | SM C19:0 |
PE aa C36:3 | SM C23:3 |
PE ae C34:1 | Testosterone |
PE ae C36:3 | C12 |
LPC a C18:2 | C14 |
PC aa C36:3 | C16:1 |
LPE a C16:0 | C16:2 |
LPE e C18:0 | alpha-KGA |
PE aa C26:4 | Fum |
PC ae C36:3 | DHA |
PE ae C34:3 | AA |
- | N-C9:0-Cer |
- | N-C15:1-Cer |
- | N-C20:0(OH)-Cer |
- | N-C20:0(OH)-Cer(2H) |
- | C14:0 |
- | C14:2 |
setwd("/Users/jenniferpolson/Desktop/Metabolomics")
DF <- read.csv("Selected Metabolites Diff")
row.names(DF) <- DF$X
DF <- DF[,2:41]
#set patients and control
DF_patient <- subset(DF, Type == "P")
DF_control <- subset(DF, Type == "C")
#set survival stats
DF_surv <- subset(DF, Surv == "Live")
DF_dead <- subset(DF, Surv == "Died")
nmetab = length(metabolites_diff)
I first wanted to see how well these metabolites would cluster the patients vs. controls, as well as the patients who survived vs. those who did not. I normalized the metabolite abundances, and performed a Principal Component Analysis (PCA). Using the dudi.pca function, I gathered the necessary analysis to visualize how these groups would cluster. The results are summarized in Figure 1
NA2Mean <- function(x){
aColMean = colMeans(x, na.rm = TRUE);
for (i in c(1:length(colnames(x)))){
x[is.na(x[,i]),i] = aColMean[i];
}
y = x;
return(y)
}
NormByCol <- function(x){
aColMean = apply(x,2,mean, na.rm = TRUE);
aColStd = apply(x,2,sd, na.rm = TRUE);
y = sweep(x,2,aColMean, "-");
y = sweep(y, 2, aColStd, "/");
return(y);
}
par(mfrow = c(1,2), oma = c(0,0,2,0))
DF$Type <- as.character(DF$Type)
DF$Type[DF$Type == "P"] <- "Patient"
DF$Type[DF$Type == "C"] <- "Control"
diff0 <- DF[,6:ncol(DF)]
diff1 <- NA2Mean(diff0);
norm_diff <- NormByCol(diff1);
pcaData = dudi.pca(norm_diff, center = TRUE, scale = TRUE, scan = FALSE)
s.class(pcaData$li, factor(DF$Type),
clabel = 1.5, col = c("cornflowerblue","darkorchid"), cpoint = 0.2,
sub = "1A: Clustering Based on Type of Sample", csub = 0.8,
xlim = c(-10,10), ylim = c(-10,10));
diff0 <- DF_patient[,6:ncol(DF_patient)]
diff1 <- NA2Mean(diff0);
norm_diff <- NormByCol(diff1);
pcaData = dudi.pca(norm_diff, center = TRUE, scale = TRUE, scan = FALSE)
s.class(pcaData$li, factor(DF_patient$Surv),
clabel = 1.5, col = c("forestgreen","gold"), cpoint = 0.2,
sub = "1B: Clustering Based on Survival Status", csub = 0.8,
xlim = c(-10,10), ylim = c(-10,10));
mtext("Figure 1: PCA and Clustering of Patients vs. Controls - Differential Metabolites", outer = TRUE, cex = 2, font = 2)
Figure 1 shows the Principal Component Analyses clustering the patients and controls (Figure 1A) as well as the patients based on survival status (Figure 1B) from differentially expressed metabolites.
The results of this show that the patients are well clustered when compared to the controls, but the patients do not cluster well based on survival status. As such, I chose to move forward with performing recursive t-tests for each metabolite at every time point, for both two-group comparisons.
The overall goal of this analysis is to see the statistical differences between groups for the aforementioned selected metabolites at each timepoint for which there is a measurement. The reason for this comparison is that, if one can find a statistical difference at every time point, then that would greatly buttress the metabolites ability to predict status. Understandably, this is much more possible in theory than in practice. This requires subsetting the previously created file into the groups that will be tested, according to type (patient vs. control) and survival (alive vs. dead). Due to the fact that there is unequal variance between the population sizes, the t-test that will be performed is Welch’s t-test.
To perform this, a function was created to return the numeric p-value of the test results; this was applied to every subset for every metabolite. The results are visualized in a heatmap below. Figure 1 depicts the p-values for each test, while Figure 2 shows the adjusted p-values (q-values), using the false discovery rate method of adjustment.
#t-test function that will return Welch's statistic as numeric value
t_test <- function(x, y){
result = t.test(x,y);
return(as.numeric(result$p.value));
}
#create control vector
vector <- unique(DF_control$Time)
#creating the p-values for every timepoint and metabolite
result = c();
for (i in vector){
#print(i);
DF_S <- subset(DF_control, Time == i)[,-(1:4)];
DF_D <- subset(DF_patient, Time == i)[,-(1:4)];
tmp = lapply(DF_S, DF_D, FUN = t_test)
result = rbind(result, tmp)
}
#set the timepoints
rownames(result) = vector
#change to numeric
result <- round(as.numeric(result),5)
p.result <- matrix(result, ncol = nmetab, byrow=T)
rownames(p.result) <- vector
colnames(p.result) <- metabolites_diff
#set the q-values
qvalues <- round(p.adjust(p.result, method = "fdr"), 5)
q.result <- matrix(qvalues, ncol = nmetab, byrow = T)
rownames(q.result) <- vector
colnames(q.result) <- metabolites_diff
#setup the color palette
my_palette <- colorRampPalette(c("tomato4", "sienna1","white"))(n = 10)
p.result <- t(p.result)
q.result <- t(q.result)
#create the graphs
par(mfrow = c(1,1), oma = c(2,2,2,8))
heatmap.2(p.result, Colv = FALSE,
trace = "none", cellnote=ifelse(p.result>0.01, NA, p.result),
notecol = "white", tracecol = "black", col = my_palette,
main = "Fig. 2: p-values of Differential Metabolites at Each Timepoint",
dendrogram = "none")
Figure 2 illustrates the results of Welch’s t-test when comparing the patients and controls for each metabolite, at every time point. The values below 0.01 are listed in the corresponding box.
Figure 3 illustrates the results of the adjusted p-values (q-values) from Figure 2 using the false discovery rate method. The values below 0.01 are listed in the corresponding box.
When comparing patients and controls, the differentially expressed metabolites were all statistically different for each group. Moving forward, I chose to perform the same statistical analyses when comparing patients who survived, and patients who did not. The results are summarized in Figures 4 & 5.
#create timepoint vector to assign
vector <- unique(DF$Time)
vector = vector[-12]
#creating the p-values for every timepoint and metabolite
result = c();
for (i in vector){
#print(i);
DF_S <- subset(DF_surv, Time == i)[,-(1:4)];
DF_D <- subset(DF_dead, Time == i)[,-(1:4)];
tmp = lapply(DF_S, DF_D, FUN = t_test)
result = rbind(result, tmp)
}
#set the timepoints
rownames(result) = vector
#reorder to get proper order
result <- result[c(1,6,2,7,8,3,4,5,9,10,11), ]
vector <- vector[c(1,6,2,7,8,3,4,5,9,10,11) ]
#change to numeric
result <- round(as.numeric(result),5)
p.result <- matrix(result, ncol = nmetab, byrow=T)
rownames(p.result) <- vector
colnames(p.result) <- metabolites_diff
qvalues <- round(p.adjust(p.result, method = "fdr"), 5)
q.result <- matrix(qvalues, ncol = nmetab, byrow = T)
rownames(q.result) <- vector
colnames(q.result) <- metabolites_diff
#setup the color palette
my_palette <- colorRampPalette(c("goldenrod3","lightgoldenrod1", "white"))(n = 10)
p.result <- t(p.result)
q.result <- t(q.result)
#create the graphs4
par(mfrow = c(1,1), oma = c(2,2,2,8))
heatmap.2(p.result, Colv = FALSE,
trace = "none", cellnote=ifelse(p.result>0.01, NA, p.result),
notecol = "white", tracecol = "black", col = my_palette,
main = "Fig. 4: p-values of Differential Metabolites at Each Timepoint",
dendrogram = "none")
Figure 4 illustrates the results of Welch’s t-test when comparing the patients who survived and patients who did not for each metabolite, at every time point. The values below 0.01 are listed in the corresponding box.
Figure 5 illustrates the results of the adjusted p-values (q-values) from Figure 4 using the false discovery rate method. The values below 0.01 are listed in the corresponding box.
Similar to the q-values generated for the comparison of patients vs. controls (Figure 3), the comparison of survived vs. died (Figure 3) had q-value results proved to be significant at many timepoints; indeed, there were differential q-values. That being said, it was not possible to glean conclusive evidence as to the functional pathways that are perturbed at each time point (functional analysis, using Reactome, not pictured) due to the fact that there was not one metabolite that was significant across all time points. Therefore, I chose to look at the metabolites that were identified for the linear mixed model, which totaled 72 (Table 2). I performed the same analyses as above for this list of metabolites; the results are summarized in Figures 6-10.
Metabolite | Metabolite | Metabolite |
---|---|---|
Kyn | C16:2 | PE aa C36:3 |
C0 | C16-OH | PC ae C36:5 |
C2 | LPC a C16:0 | PE ae C36:1 |
C4 | LPC a C17:0 | PE ae C36:2 |
C4:1 | LPC a C18:0 | PE ae C40:4 |
C4-OH (C3-DC) | PC aa C32:0 | PE ae C36:4 |
C5 | PC aa C32:1 | PE ae C36:5 |
C5:1 | PC aa C34:1 | PE ae C38:1 |
C5:1-DC SM (OH) | C16:1 | PE ae C38:2 |
C5-DC (C6-OH) | SM C18:0 | PE ae C38:3 |
C5-OH (C3-DC-M) | SM C18:1 | N-C21:0(OH)-Cer |
C6 (C4:1-DC) | LPC e C18:0 | N-C22:0(OH)-Cer |
C7-DC | PC aa C36:1 | N-C23:0(OH)-Cer |
C8 | PG aa C32:1 | N-C24:0(OH)-Cer |
C9 | PG ae C32:0 | N-C7:0-Cer(2H) |
C10 | PE aa C26:4 | SM C17:0 |
C10:1 | PE aa C38:3 | SM C19:0 |
C10:2 | PE aa C34:0 | SM C19:1 |
C12 | PE aa C34:1 | SM C19:2 |
C12:1 | PE aa C34:2 | SM C20:1 |
C14 | PE aa C34:3 | 11-Deoxycorticosterone |
C14:1-OH | PE aa C36:1 | 11-Deoxycortisol |
C14:2 | PE aa C40:4 | Testosterone |
C14:2-OH | PE aa C36:2 | DOPA |
Figure 6 shows the Principal Component Analyses clustering the patients and controls (Figure 6A) as well as the patients based on survival status (Figure 6B) from differentially expressed metabolites.
The PCA in Figure 5 showed better clustering among the patients based on the survival status factor. This could potentially mean that there might be metabolites that are significantly different across all time points.
Figure 7 illustrates the results of Welch’s t-test when comparing the patients and controls for each metabolite, at every time point. The values below 0.01 are listed in the corresponding box.
Figure 8 illustrates the results of the adjusted p-values (q-values) from Figure 7 using the false discovery rate method. The values below 0.01 are listed in the corresponding box.
Figure 9 illustrates the results of Welch’s t-test when comparing the patients who survived and patients who did not for each metabolite, at every time point. The values below 0.01 are listed in the corresponding box.
Figure 10 illustrates the results of the adjusted p-values (q-values) from Figure 9 using the false discovery rate method. The values below 0.01 are listed in the corresponding box.
In total, there were 8 metabolites that were identified to have a statistically significant level of abundance between the two groups. This will inform pathway analysis and annotation, in order to determine the changes that are occurring. As a final comparison, I also chose to perform the same series of analysis, but with the metabolites that were identified through both differential expression analysis and the linear mixed model. The overlapping metabolites are listed in Table 3.
Metabolite | Metabolite |
---|---|
Dodecanoylcarnitine (C12) | Phosphatidylethanolamine aa C36:1 |
Tetradecanoylcarnitine (C14) | Phosphatidylethanolamine aa C36:2 |
Tetradecadienylcarnitine (C14:2) | Phosphatidylethanolamine aa C36:3 |
Hexadecadienylcarnitine (C16:2) | Sphingomyelin C19:0 |
Phosphatidylcholine aa C36:1 | Testosterone |
Phosphatidylethanolamine aa C26:4 | – |
Figure 11 shows the Principal Component Analyses clustering the patients and controls (Figure 11A) as well as the patients based on survival status (Figure 11B) from differentially expressed metabolites.
The separation of clusters here is poor; this is likely due to the small number of metabolites used for this comparison (only 11).
Figure 12 illustrates the results of Welch’s t-test when comparing the patients and controls for each metabolite, at every time point. The values below 0.01 are listed in the corresponding box.
Figure 13 illustrates the results of the adjusted p-values (q-values) from Figure 12 using the false discovery rate method. The values below 0.01 are listed in the corresponding box.
Figure 14 illustrates the results of Welch’s t-test when comparing the patients who survived and patients who did not for each metabolite, at every time point. The values below 0.01 are listed in the corresponding box.
Figure 15 illustrates the results of the adjusted p-values (q-values) from Figure 14 using the false discovery rate method. The values below 0.01 are listed in the corresponding box.
By examining the adjusted p-values of this comparison, one interesting result came about: there was not a single metabolite that was statistically different between the two survival groups for every single time point. However, 8 metabolites were identified through the linear mixed model. Due to the very small sample sizes, it is difficult to conclude if there are metabolites that stratify patients. This does not mean that it does not exist; in fact, there are many statistical limitations to this study. Individual variability in genetic assembly and metabolic homeostasis has the propensity to preclude deduction of generalized clinical insights, especially when an intended study contains human subjects with other disease comorbidities. For plasma metabolomics investigations, long-term, large-scale clinical studies take years to construct and data acquisition and analysis require significant economical and scientific commitments. Studies using small cohorts provide broader possibility for “data-driven” biomarker candidate discovery. However, a small subject number limits the statistical power for in-depth characterization by using standard computational approaches. Generating temporal metabolomics profiling datasets offers unique sensitivity in response to pathological stress or therapeutic regiments, which enable microscopic identification of biomarker candidates. This is why the statistical threshold was established as p < 0.01.