2025-02-27
Horvath used machine learning to select the methylated sites of the genome (CpG sites) whose methylation levels best predict chronological age. The resulting predictor is called an epigenetic clock.
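The selection is done with elastic net regression, a penalized regression that minimizes \[ \underset{\beta}{\text{minimize}} \sum_{i=1}^{n} \left(y_i - X_i \beta \right)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2 \]
Here \(y_i\) is the age of sample \(i\), \(X_i\) is its row of methylation values across the \(p\) CpG sites, and \(\beta\) is the coefficient vector. The \(\lambda_1\) term is the lasso (L1) penalty, which can shrink coefficients exactly to zero and so selects sites; the \(\lambda_2\) term is the ridge (L2) penalty, which stabilizes the estimates when sites are correlated.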
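To make the three terms of the objective concrete, here is a minimal base-R sketch that evaluates it for a given coefficient vector. The function name `elastic_net_objective` and the tiny data set are made up for illustration and are not part of the lecture code; the function only evaluates the objective, it does not minimize it.

```r
# Hypothetical helper (not from the lecture): evaluate the elastic net
# objective for a given coefficient vector beta.
elastic_net_objective <- function(beta, X, y, lambda1, lambda2) {
  residuals  <- y - as.vector(X %*% beta)   # y_i - X_i beta
  rss        <- sum(residuals^2)            # residual sum of squares
  l1_penalty <- lambda1 * sum(abs(beta))    # lasso term
  l2_penalty <- lambda2 * sum(beta^2)       # ridge term
  rss + l1_penalty + l2_penalty
}

# Made-up example: 3 samples, 2 predictors
X <- matrix(c(1, 0,
              0, 1,
              1, 1), nrow = 3, byrow = TRUE)
y <- c(1, 2, 3)

# beta = (1, 2) reproduces y exactly, so only the penalties contribute
fit_exact <- elastic_net_objective(c(1, 2), X, y, lambda1 = 0.5, lambda2 = 0.1)

# beta = (0, 0) pays no penalty but leaves all of y unexplained
fit_null <- elastic_net_objective(c(0, 0), X, y, lambda1 = 0.5, lambda2 = 0.1)
```

With these numbers `fit_exact` is 2 (0 + 0.5 * 3 + 0.1 * 5) and `fit_null` is 14 (1 + 4 + 9), which shows the trade-off the penalties create between fitting the data and keeping coefficients small.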
# Load necessary package
library(glmnet)
# Set seed for reproducibility
set.seed(42)
# Simulate toy methylation data (10 samples, 5 CpG sites)
CpG_sites <- matrix(runif(10 * 5, 0, 1), nrow = 10, ncol = 5) # 0-1 methylation beta values
colnames(CpG_sites) <- paste0("CpG_", 1:5) # Name CpG sites
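# Simulate chronological age (true age) as a function of two CpG sites plus noise,
# so that the methylation data carry a signal the model can learn
true_age <- 30 + 30 * CpG_sites[, "CpG_1"] + 15 * CpG_sites[, "CpG_2"] + rnorm(10, sd = 2)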
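# Fit Elastic Net Regression model
# alpha = 0.5 weights the lasso (alpha = 1) and ridge (alpha = 0) penalties equally
# With only 10 samples, 10-fold cross-validation leaves one sample per fold,
# so grouped = FALSE is set explicitly (glmnet would otherwise enforce it with a warning)
fit <- cv.glmnet(CpG_sites, true_age, alpha = 0.5, nfolds = 10, grouped = FALSE)
# Extract the coefficients at the lambda with the lowest cross-validated error
best_lambda <- fit$lambda.min
model_coefs <- coef(fit, s = best_lambda)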
# Predict epigenetic age (on the same samples used for fitting, so the agreement is optimistic)
epigenetic_age <- predict(fit, newx = CpG_sites, s = best_lambda)
# Combine results into a data frame
results <- data.frame(Sample = 1:10, Chronological_Age = true_age, Epigenetic_Age = as.numeric(epigenetic_age))
# Load ggplot2 for visualization
library(ggplot2)
# Create scatter plot with an identity (y = x) reference line
ggplot(results, aes(x = Chronological_Age, y = Epigenetic_Age)) +
  geom_point(color = "blue", size = 3) + # One point per sample
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") + # Identity line: epigenetic age equals chronological age
  labs(title = "Epigenetic Age vs. Chronological Age",
       x = "Chronological Age",
       y = "Epigenetic Age") +
  theme_minimal()
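The code above passes a single `alpha` to `cv.glmnet` and lets cross-validation pick `lambda`, whereas the equation uses two separate weights \(\lambda_1\) and \(\lambda_2\). As a sketch of how the two relate, assuming glmnet's documented Gaussian objective (which also includes an unpenalized intercept \(\beta_0\) and standardizes the predictors by default), glmnet minimizes
\[ \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - \beta_0 - X_i \beta \right)^2 + \lambda \left( \alpha \sum_{j=1}^{p} |\beta_j| + \frac{1 - \alpha}{2} \sum_{j=1}^{p} \beta_j^2 \right) \]
Multiplying through by \(2n\) and matching terms with the equation gives
\[ \lambda_1 = 2 n \lambda \alpha, \qquad \lambda_2 = n \lambda (1 - \alpha) \]
so with `alpha = 0.5` the correspondence is \(\lambda_1 = n\lambda\) and \(\lambda_2 = n\lambda / 2\). This holds only up to glmnet's internal standardization, so it is best read as a guide to what `alpha` and `lambda` control rather than an exact conversion.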
Lecture 7