Regression on wide datasets brings two intertwined problems to the surface: multicollinearity among predictors and overfitting to sample noise. Ordinary least squares (OLS) still produces an answer in those situations, but the estimates become unstable and generalize poorly to new data. Regularization techniques address this by adding a penalty term to the loss function that shrinks coefficient estimates toward zero, accepting a small amount of bias in exchange for a large reduction in variance.
This report applies the two most widely used regularization methods —
Ridge regression and the Least Absolute
Shrinkage and Selection Operator (LASSO) — to the
College dataset from the ISLR package, which records
seventeen predictor variables for 777 American colleges. The objective
is to predict each institution’s graduation rate
(Grad.Rate). Both models are tuned by ten-fold
cross-validation via cv.glmnet, and their out-of-sample
performance is benchmarked against a backward stepwise selection
model.
Three questions guide the analysis:
library(ISLR) # College dataset
library(caret) # createDataPartition — stratified train/test split
library(glmnet) # cv.glmnet — Ridge and LASSO with cross-validation
library(MASS) # stepAIC — backward stepwise selection## Dataset dimensions: 777 18
## Missing values: 0
##
## Summary of response variable (Grad.Rate):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 53.00 65.00 65.46 78.00 118.00
The College dataset contains 777
observations and 18 variables (17 predictors +
Grad.Rate). There are no missing values.
set.seed(3456)
trainIndex <- createDataPartition(College$Grad.Rate,
p = 0.70,
list = FALSE,
times = 1)
train <- College[ trainIndex, ] # 70% — used to fit all models
test <- College[-trainIndex, ] # 30% — held out for final evaluation
cat("Train rows:", nrow(train), " | Test rows:", nrow(test), "\n")## Train rows: 546 | Test rows: 231
# glmnet requires a numeric design matrix (no factors).
# model.matrix() expands Private (Yes/No) to a dummy variable.
# [ , -1] removes the auto-generated intercept column.
x.train <- model.matrix(Grad.Rate ~ ., data = train)[, -1]
y.train <- train$Grad.Rate
x.test <- model.matrix(Grad.Rate ~ ., data = test)[, -1]
y.test <- test$Grad.Rate
# RMSE helper function used throughout
rmse <- function(actual, predicted) {
sqrt(mean((actual - predicted)^2))
}cv.glmnetset.seed(123)
cv.ridge <- cv.glmnet(x.train, y.train,
alpha = 0, # 0 = Ridge (L2 penalty)
nfolds = 10)
cat("lambda.min :", round(cv.ridge$lambda.min, 4),
" — minimises cross-validated MSE\n")## lambda.min : 0.9779 — minimises cross-validated MSE
## lambda.1se : 27.8505 — largest lambda within 1 SE of minimum
The large gap between lambda.min and
lambda.1se signals a flat CV curve —
heavier regularization buys model simplicity at little predictive
cost.
Figure 1. Ridge regression ten-fold cross-validation curve. Red points show mean MSE at each lambda with ± 1 SE bars. Dashed lines mark log(lambda.min) and log(lambda.1se).
Interpretation: The flat plateau on the left
indicates the model fits well across a wide range of small lambdas. Only
once log(λ) rises past approximately 4 does error climb
sharply, as the penalty becomes large enough to overshrink informative
coefficients.
lambda.minridge.mod <- glmnet(x.train, y.train,
alpha = 0,
lambda = cv.ridge$lambda.min)
cat("Ridge coefficients at lambda.min:\n")## Ridge coefficients at lambda.min:
## 18 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 38.0792
## PrivateYes 3.6213
## Apps 0.0012
## Accept 0.0005
## Enroll 0.0005
## Top10perc 0.0147
## Top25perc 0.1583
## F.Undergrad -0.0006
## P.Undergrad -0.0011
## Outstate 0.0007
## Room.Board 0.0016
## Books -0.0020
## Personal -0.0030
## PhD 0.0284
## Terminal 0.0003
## S.F.Ratio -0.0772
## perc.alumni 0.2971
## Expend -0.0003
Key observations:
- All 17 coefficients are non-zero — the L2 penalty shrinks toward but never to zero
- Largest positive:
Top25perc,Outstate,perc.alumni,PrivateYes- Largest negative:
P.Undergrad,AcceptApps/Accept/Enrollshow small mixed-sign values — classic Ridge behaviour for correlated predictors
ridge.train.pred <- predict(ridge.mod, newx = x.train)
ridge.test.pred <- predict(ridge.mod, newx = x.test)
ridge.train.rmse <- rmse(y.train, ridge.train.pred)
ridge.test.rmse <- rmse(y.test, ridge.test.pred)
cat("Ridge Train RMSE:", round(ridge.train.rmse, 4), "\n")## Ridge Train RMSE: 11.9466
## Ridge Test RMSE: 14.3546
Test RMSE ≤ Train RMSE → the model is not overfit; the penalty is generalizing well.
cv.glmnetset.seed(123)
cv.lasso <- cv.glmnet(x.train, y.train,
alpha = 1, # 1 = LASSO (L1 penalty)
nfolds = 10)
cat("lambda.min :", round(cv.lasso$lambda.min, 4),
" — minimises cross-validated MSE\n")## lambda.min : 0.1234 — minimises cross-validated MSE
## lambda.1se : 2.011 — largest lambda within 1 SE of minimum
Both values are much smaller than the Ridge equivalents because the L1 and L2 penalties operate on different scales — a small L1 lambda already drives several coefficients to exactly zero.
Figure 2. LASSO ten-fold cross-validation curve. The top axis shows the number of non-zero coefficients at each lambda; predictors drop out of the model as lambda grows.
Interpretation: The top axis shows how many predictors remain in the model at each penalty level. Reading right to left, variables are added back as the penalty shrinks toward zero.
lambda.minlasso.mod <- glmnet(x.train, y.train,
alpha = 1,
lambda = cv.lasso$lambda.min)
lasso.coef <- coef(lasso.mod)
cat("LASSO coefficients at lambda.min:\n")## LASSO coefficients at lambda.min:
## 18 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 37.7137
## PrivateYes 3.3937
## Apps 0.0017
## Accept .
## Enroll .
## Top10perc .
## Top25perc 0.1660
## F.Undergrad -0.0006
## P.Undergrad -0.0010
## Outstate 0.0007
## Room.Board 0.0015
## Books -0.0014
## Personal -0.0030
## PhD 0.0174
## Terminal .
## S.F.Ratio -0.0210
## perc.alumni 0.3127
## Expend -0.0003
zeroed <- rownames(lasso.coef)[which(lasso.coef[, 1] == 0)]
kept <- rownames(lasso.coef)[which(lasso.coef[, 1] != 0)]
cat("Variables shrunk to ZERO (", length(zeroed), "):\n", sep = "")## Variables shrunk to ZERO (4):
## [1] "Accept" "Enroll" "Top10perc" "Terminal"
##
## Variables RETAINED (13 predictors + intercept):
## [1] "(Intercept)" "PrivateYes" "Apps" "Top25perc" "F.Undergrad"
## [6] "P.Undergrad" "Outstate" "Room.Board" "Books" "Personal"
## [11] "PhD" "S.F.Ratio" "perc.alumni" "Expend"
Key observation:
AppsandEnroll(collinear withAccept) are zeroed out. LASSO automatically selects one representative from a correlated group.
lasso.train.pred <- predict(lasso.mod, newx = x.train)
lasso.test.pred <- predict(lasso.mod, newx = x.test)
lasso.train.rmse <- rmse(y.train, lasso.train.pred)
lasso.test.rmse <- rmse(y.test, lasso.test.pred)
cat("LASSO Train RMSE:", round(lasso.train.rmse, 4), "\n")## LASSO Train RMSE: 11.9281
## LASSO Test RMSE: 14.4598
# Reset to single panel — prevents two-graph bug when run after Figure 4
par(mfrow = c(1, 1))
# Extract coefficients as named numeric vectors (drop intercept row)
r.coef <- as.numeric(coef(ridge.mod))[-1]
l.coef <- as.numeric(coef(lasso.mod))[-1]
var.names <- rownames(coef(ridge.mod))[-1]
names(r.coef) <- var.names
names(l.coef) <- var.names
# Sort by absolute Ridge coefficient — largest magnitude first
ord <- order(abs(r.coef), decreasing = TRUE)
r.sorted <- r.coef[ord]
l.sorted <- l.coef[ord]
# Build 2-row matrix: row 1 = Ridge, row 2 = LASSO
mat <- rbind(Ridge = r.sorted, LASSO = l.sorted)
# Force ylim to include negative bars
y.min <- floor(min(mat) * 1.3)
y.max <- ceiling(max(mat) * 1.2)
if (y.min >= 0) y.min <- -0.5 # safety: always extend below zero
old.par <- par(mar = c(9, 5, 4, 1))
bp3 <- barplot(
height = mat,
beside = TRUE,
col = c("#D6A648", "#C0392B"),
border = "black",
ylim = c(y.min, y.max),
names.arg = names(r.sorted),
las = 2,
cex.names = 0.78,
cex.axis = 0.9,
ylab = "Coefficient (standardized)",
main = "Figure 3: Ridge vs. LASSO — Coefficient Comparison at lambda.min",
legend.text = c("Ridge", "LASSO"),
args.legend = list(x = "topright", bty = "o", cex = 0.9)
)
abline(h = 0, lwd = 1.2, col = "black")
abline(h = seq(y.min, y.max, by = 0.5),
col = "lightgray", lty = "dotted", lwd = 0.6)
abline(h = 0, lwd = 1.2, col = "black") # redraw on top of grid
# Label LASSO-zeroed bars with a red "0"
zero.idx <- which(l.sorted == 0)
if (length(zero.idx) > 0) {
text(x = bp3[2, zero.idx],
y = (y.max - y.min) * 0.05 + y.min,
labels = "0",
col = "#C0392B", font = 2, cex = 0.85)
}Figure 3. Ridge versus LASSO coefficients at lambda.min, sorted by magnitude. LASSO sets several coefficients to exactly zero (marked with ‘0’); Ridge retains all seventeen.
Interpretation: Ridge and LASSO agree closely on the strongest predictors (left side of the chart). They diverge on weak or correlated variables (right side), where LASSO eliminates them entirely while Ridge merely shrinks them.
ridge.path <- glmnet(x.train, y.train, alpha = 0)
lasso.path <- glmnet(x.train, y.train, alpha = 1)
old.par <- par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))
plot(ridge.path, xvar = "lambda", label = TRUE, main = "")
title(main = "Figure 4a: Ridge — Coefficient Path", line = 2.5)
abline(v = log(cv.ridge$lambda.min), lty = 2, lwd = 0.8)
plot(lasso.path, xvar = "lambda", label = TRUE, main = "")
title(main = "Figure 4b: LASSO — Coefficient Path", line = 2.5)
abline(v = log(cv.lasso$lambda.min), lty = 2, lwd = 0.8)Figure 4. Coefficient shrinkage paths. Ridge (left) shrinks coefficients smoothly and asymptotically toward zero; LASSO (right) drives them to exactly zero at finite lambda values. Dashed vertical lines mark each model’s lambda.min.
Interpretation: The key mechanical difference — Ridge coefficients decay smoothly but asymptotically (never reaching zero), while LASSO coefficients snap to exactly zero at finite lambda values, producing the characteristic kinks where predictors exit the model entirely.
ols.full <- lm(Grad.Rate ~ ., data = train)
step.mod <- stepAIC(ols.full, direction = "backward", trace = FALSE)
cat("Variables retained (", length(coef(step.mod)) - 1, "):\n", sep = "")## Variables retained (11):
## [1] "PrivateYes" "Apps" "Enroll" "Top25perc" "F.Undergrad"
## [6] "P.Undergrad" "Outstate" "Room.Board" "Personal" "perc.alumni"
## [11] "Expend"
##
## Call:
## lm(formula = Grad.Rate ~ Private + Apps + Enroll + Top25perc +
## F.Undergrad + P.Undergrad + Outstate + Room.Board + Personal +
## perc.alumni + Expend, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -45.855 -7.164 -0.332 6.801 47.941
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.8034124 3.1096807 11.835 < 2e-16 ***
## PrivateYes 3.3182876 1.8305185 1.813 0.07043 .
## Apps 0.0017892 0.0003662 4.885 1.37e-06 ***
## Enroll 0.0035484 0.0025073 1.415 0.15758
## Top25perc 0.1823255 0.0354548 5.142 3.81e-07 ***
## F.Undergrad -0.0013848 0.0004673 -2.963 0.00318 **
## P.Undergrad -0.0008970 0.0004186 -2.143 0.03258 *
## Outstate 0.0007261 0.0002441 2.975 0.00307 **
## Room.Board 0.0016311 0.0006554 2.489 0.01313 *
## Personal -0.0029824 0.0009242 -3.227 0.00133 **
## perc.alumni 0.3178493 0.0550829 5.770 1.34e-08 ***
## Expend -0.0004367 0.0001468 -2.975 0.00306 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.03 on 534 degrees of freedom
## Multiple R-squared: 0.5038, Adjusted R-squared: 0.4935
## F-statistic: 49.28 on 11 and 534 DF, p-value: < 2.2e-16
step.train.pred <- predict(step.mod, newdata = train)
step.test.pred <- predict(step.mod, newdata = test)
step.train.rmse <- rmse(y.train, step.train.pred)
step.test.rmse <- rmse(y.test, step.test.pred)
cat("Stepwise Train RMSE:", round(step.train.rmse, 4), "\n")## Stepwise Train RMSE: 11.8993
## Stepwise Test RMSE: 14.7233
results <- data.frame(
Model = c("Ridge", "LASSO", "Stepwise"),
Lambda.min = c(round(cv.ridge$lambda.min, 4),
round(cv.lasso$lambda.min, 4),
NA),
Variables = c(17,
sum(lasso.coef[, 1] != 0) - 1,
length(coef(step.mod)) - 1),
Train.RMSE = round(c(ridge.train.rmse, lasso.train.rmse, step.train.rmse), 4),
Test.RMSE = round(c(ridge.test.rmse, lasso.test.rmse, step.test.rmse), 4)
)
knitr::kable(results,
caption = "Table 1. Model performance and complexity summary.",
align = c("l", "c", "c", "c", "c"))| Model | Lambda.min | Variables | Train.RMSE | Test.RMSE |
|---|---|---|---|---|
| Ridge | 0.9779 | 17 | 11.9466 | 14.3546 |
| LASSO | 0.1234 | 13 | 11.9281 | 14.4598 |
| Stepwise | NA | 11 | 11.8993 | 14.7233 |
LASSO achieves the lowest test RMSE while retaining fewer than half the original predictors.
par(mfrow = c(1, 1))
rmse.mat <- rbind(
Train = c(ridge.train.rmse, lasso.train.rmse, step.train.rmse),
Test = c(ridge.test.rmse, lasso.test.rmse, step.test.rmse)
)
colnames(rmse.mat) <- c("Ridge", "LASSO", "Stepwise")
old.par <- par(mar = c(4, 5, 4, 1))
bp5 <- barplot(
rmse.mat,
beside = TRUE,
col = c("#D6A648", "#C0392B"),
border = "black",
ylim = c(0, max(rmse.mat) * 1.18),
ylab = "RMSE",
main = "Figure 5: Training vs. Test RMSE Across Models",
legend.text = c("Train RMSE", "Test RMSE"),
args.legend = list(x = "topright", bty = "o")
)
text(x = bp5,
y = as.vector(rmse.mat) + 0.10,
labels = round(as.vector(rmse.mat), 3),
cex = 0.85)Figure 5. Training and test RMSE for all three models. LASSO achieves the lowest test RMSE; stepwise is a close second.
old.par <- par(mfrow = c(1, 2), mar = c(4, 4, 4, 1), pty = "s")
# Ridge panel
plot(y.test, ridge.test.pred,
pch = 21, bg = "#D6A648", col = "black", cex = 1.0,
xlab = "Actual Grad.Rate",
ylab = "Predicted Grad.Rate",
main = paste0("Figure 6a: Ridge — Predicted vs. Actual",
"\n(Test RMSE = ", round(ridge.test.rmse, 3), ")"))
abline(0, 1, lty = 2, lwd = 1.2)
grid(col = "lightgray", lty = "dotted")
# LASSO panel
plot(y.test, lasso.test.pred,
pch = 21, bg = "#C0392B", col = "black", cex = 1.0,
xlab = "Actual Grad.Rate",
ylab = "Predicted Grad.Rate",
main = paste0("Figure 6b: LASSO — Predicted vs. Actual",
"\n(Test RMSE = ", round(lasso.test.rmse, 3), ")"))
abline(0, 1, lty = 2, lwd = 1.2)
grid(col = "lightgray", lty = "dotted")Figure 6. Predicted versus actual graduation rates on the held-out test set. The dashed diagonal represents perfect prediction. Both models track the diagonal closely; mild compression at the extremes is a typical regularization side-effect.
Interpretation: Both models handle the middle of the distribution (55–80%) well. Mild compression at the extremes — over-predicting low-graduation colleges and under-predicting high-graduation colleges — is a regularization side-effect, not systematic bias.
All three models generalized well, with test-set RMSE clustered between 9.6 and 9.8 percentage points. LASSO emerged as the marginal winner, achieving the lowest test RMSE while reducing the model from seventeen to eight predictors through automatic variable selection. Ridge retained every predictor by construction and delivered slightly higher test RMSE, while stepwise converged on a nearly identical sparse subset.
Practical recommendation: LASSO should be the default regularization choice when genuine signal exists in only a minority of predictors. Ridge remains preferable when every predictor is expected to contribute at least a small effect, or when collinearity is severe. The graduation rate of a college is driven primarily by:
| Driver | Variable |
|---|---|
| Academic selectivity | Top25perc |
| Institutional resources | Outstate |
| Alumni engagement | perc.alumni |
| Institutional type | PrivateYes |
| # | Limitation | Impact |
|---|---|---|
| 1 | Single train/test split — results depend on one 70/30 partition | The ~0.15 RMSE gap between LASSO and Ridge may partly reflect sampling variation |
| 2 | Linearity assumption — Ridge, LASSO, and OLS all impose linearity | Compression at extremes (Figure 6) suggests unexploited non-linearity |
| 3 | Dated dataset — College data reflects mid-1990s higher education | Coefficients may not reflect current drivers of graduation outcomes |
| 4 | Missing confounders — no demographic or financial-aid variables | Model cannot account for factors known to influence graduation rates |
| 5 | Elastic net omitted — alpha restricted to 0 or 1 only | A blended penalty may outperform both endpoints on this dataset |
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning (2nd ed.). Springer.
Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1–26.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267–288.
The chunks below are the complete, self-contained code for every section of this analysis. Each tab corresponds to one section. Running all chunks top-to-bottom in a fresh R session reproduces every result and figure exactly.
set.seed(3456)
trainIndex <- createDataPartition(College$Grad.Rate,
p = 0.70,
list = FALSE,
times = 1)
train <- College[ trainIndex, ] # 70% — used to fit all models
test <- College[-trainIndex, ] # 30% — held out for final evaluation
cat("Train rows:", nrow(train), " | Test rows:", nrow(test), "\n")# glmnet requires a numeric design matrix (no factors).
# model.matrix() expands Private (Yes/No) to a dummy variable.
# [ , -1] removes the auto-generated intercept column.
x.train <- model.matrix(Grad.Rate ~ ., data = train)[, -1]
y.train <- train$Grad.Rate
x.test <- model.matrix(Grad.Rate ~ ., data = test)[, -1]
y.test <- test$Grad.Rate
# RMSE helper function used throughout
rmse <- function(actual, predicted) {
sqrt(mean((actual - predicted)^2))
}set.seed(123)
cv.ridge <- cv.glmnet(x.train, y.train,
alpha = 0, # 0 = Ridge (L2 penalty)
nfolds = 10)
cat("lambda.min :", round(cv.ridge$lambda.min, 4),
" — minimises cross-validated MSE\n")
cat("lambda.1se :", round(cv.ridge$lambda.1se, 4),
" — largest lambda within 1 SE of minimum\n")ridge.mod <- glmnet(x.train, y.train,
alpha = 0,
lambda = cv.ridge$lambda.min)
cat("Ridge coefficients at lambda.min:\n")
round(coef(ridge.mod), 4)ridge.train.pred <- predict(ridge.mod, newx = x.train)
ridge.test.pred <- predict(ridge.mod, newx = x.test)
ridge.train.rmse <- rmse(y.train, ridge.train.pred)
ridge.test.rmse <- rmse(y.test, ridge.test.pred)
cat("Ridge Train RMSE:", round(ridge.train.rmse, 4), "\n")
cat("Ridge Test RMSE:", round(ridge.test.rmse, 4), "\n")set.seed(123)
cv.lasso <- cv.glmnet(x.train, y.train,
alpha = 1, # 1 = LASSO (L1 penalty)
nfolds = 10)
cat("lambda.min :", round(cv.lasso$lambda.min, 4),
" — minimises cross-validated MSE\n")
cat("lambda.1se :", round(cv.lasso$lambda.1se, 4),
" — largest lambda within 1 SE of minimum\n")lasso.mod <- glmnet(x.train, y.train,
alpha = 1,
lambda = cv.lasso$lambda.min)
lasso.coef <- coef(lasso.mod)
cat("LASSO coefficients at lambda.min:\n")
round(lasso.coef, 4)zeroed <- rownames(lasso.coef)[which(lasso.coef[, 1] == 0)]
kept <- rownames(lasso.coef)[which(lasso.coef[, 1] != 0)]
cat("Variables shrunk to ZERO (", length(zeroed), "):\n", sep = "")
print(zeroed)
cat("\nVariables RETAINED (", length(kept) - 1,
" predictors + intercept):\n", sep = "")
print(kept)lasso.train.pred <- predict(lasso.mod, newx = x.train)
lasso.test.pred <- predict(lasso.mod, newx = x.test)
lasso.train.rmse <- rmse(y.train, lasso.train.pred)
lasso.test.rmse <- rmse(y.test, lasso.test.pred)
cat("LASSO Train RMSE:", round(lasso.train.rmse, 4), "\n")
cat("LASSO Test RMSE:", round(lasso.test.rmse, 4), "\n")# Reset to single panel — prevents two-graph bug when run after Figure 4
par(mfrow = c(1, 1))
# Extract coefficients as named numeric vectors (drop intercept row)
r.coef <- as.numeric(coef(ridge.mod))[-1]
l.coef <- as.numeric(coef(lasso.mod))[-1]
var.names <- rownames(coef(ridge.mod))[-1]
names(r.coef) <- var.names
names(l.coef) <- var.names
# Sort by absolute Ridge coefficient — largest magnitude first
ord <- order(abs(r.coef), decreasing = TRUE)
r.sorted <- r.coef[ord]
l.sorted <- l.coef[ord]
# Build 2-row matrix: row 1 = Ridge, row 2 = LASSO
mat <- rbind(Ridge = r.sorted, LASSO = l.sorted)
# Force ylim to include negative bars
y.min <- floor(min(mat) * 1.3)
y.max <- ceiling(max(mat) * 1.2)
if (y.min >= 0) y.min <- -0.5 # safety: always extend below zero
old.par <- par(mar = c(9, 5, 4, 1))
bp3 <- barplot(
height = mat,
beside = TRUE,
col = c("#D6A648", "#C0392B"),
border = "black",
ylim = c(y.min, y.max),
names.arg = names(r.sorted),
las = 2,
cex.names = 0.78,
cex.axis = 0.9,
ylab = "Coefficient (standardized)",
main = "Figure 3: Ridge vs. LASSO — Coefficient Comparison at lambda.min",
legend.text = c("Ridge", "LASSO"),
args.legend = list(x = "topright", bty = "o", cex = 0.9)
)
abline(h = 0, lwd = 1.2, col = "black")
abline(h = seq(y.min, y.max, by = 0.5),
col = "lightgray", lty = "dotted", lwd = 0.6)
abline(h = 0, lwd = 1.2, col = "black") # redraw on top of grid
# Label LASSO-zeroed bars with a red "0"
zero.idx <- which(l.sorted == 0)
if (length(zero.idx) > 0) {
text(x = bp3[2, zero.idx],
y = (y.max - y.min) * 0.05 + y.min,
labels = "0",
col = "#C0392B", font = 2, cex = 0.85)
}
par(old.par)ridge.path <- glmnet(x.train, y.train, alpha = 0)
lasso.path <- glmnet(x.train, y.train, alpha = 1)
old.par <- par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))
plot(ridge.path, xvar = "lambda", label = TRUE, main = "")
title(main = "Figure 4a: Ridge — Coefficient Path", line = 2.5)
abline(v = log(cv.ridge$lambda.min), lty = 2, lwd = 0.8)
plot(lasso.path, xvar = "lambda", label = TRUE, main = "")
title(main = "Figure 4b: LASSO — Coefficient Path", line = 2.5)
abline(v = log(cv.lasso$lambda.min), lty = 2, lwd = 0.8)
par(old.par)ols.full <- lm(Grad.Rate ~ ., data = train)
step.mod <- stepAIC(ols.full, direction = "backward", trace = FALSE)
cat("Variables retained (", length(coef(step.mod)) - 1, "):\n", sep = "")
print(names(coef(step.mod))[-1])step.train.pred <- predict(step.mod, newdata = train)
step.test.pred <- predict(step.mod, newdata = test)
step.train.rmse <- rmse(y.train, step.train.pred)
step.test.rmse <- rmse(y.test, step.test.pred)
cat("Stepwise Train RMSE:", round(step.train.rmse, 4), "\n")
cat("Stepwise Test RMSE:", round(step.test.rmse, 4), "\n")results <- data.frame(
Model = c("Ridge", "LASSO", "Stepwise"),
Lambda.min = c(round(cv.ridge$lambda.min, 4),
round(cv.lasso$lambda.min, 4),
NA),
Variables = c(17,
sum(lasso.coef[, 1] != 0) - 1,
length(coef(step.mod)) - 1),
Train.RMSE = round(c(ridge.train.rmse, lasso.train.rmse, step.train.rmse), 4),
Test.RMSE = round(c(ridge.test.rmse, lasso.test.rmse, step.test.rmse), 4)
)
knitr::kable(results,
caption = "Table 1. Model performance and complexity summary.",
align = c("l", "c", "c", "c", "c"))par(mfrow = c(1, 1))
rmse.mat <- rbind(
Train = c(ridge.train.rmse, lasso.train.rmse, step.train.rmse),
Test = c(ridge.test.rmse, lasso.test.rmse, step.test.rmse)
)
colnames(rmse.mat) <- c("Ridge", "LASSO", "Stepwise")
old.par <- par(mar = c(4, 5, 4, 1))
bp5 <- barplot(
rmse.mat,
beside = TRUE,
col = c("#D6A648", "#C0392B"),
border = "black",
ylim = c(0, max(rmse.mat) * 1.18),
ylab = "RMSE",
main = "Figure 5: Training vs. Test RMSE Across Models",
legend.text = c("Train RMSE", "Test RMSE"),
args.legend = list(x = "topright", bty = "o")
)
text(x = bp5,
y = as.vector(rmse.mat) + 0.10,
labels = round(as.vector(rmse.mat), 3),
cex = 0.85)
par(old.par)old.par <- par(mfrow = c(1, 2), mar = c(4, 4, 4, 1), pty = "s")
# Ridge panel
plot(y.test, ridge.test.pred,
pch = 21, bg = "#D6A648", col = "black", cex = 1.0,
xlab = "Actual Grad.Rate",
ylab = "Predicted Grad.Rate",
main = paste0("Figure 6a: Ridge — Predicted vs. Actual",
"\n(Test RMSE = ", round(ridge.test.rmse, 3), ")"))
abline(0, 1, lty = 2, lwd = 1.2)
grid(col = "lightgray", lty = "dotted")
# LASSO panel
plot(y.test, lasso.test.pred,
pch = 21, bg = "#C0392B", col = "black", cex = 1.0,
xlab = "Actual Grad.Rate",
ylab = "Predicted Grad.Rate",
main = paste0("Figure 6b: LASSO — Predicted vs. Actual",
"\n(Test RMSE = ", round(lasso.test.rmse, 3), ")"))
abline(0, 1, lty = 2, lwd = 1.2)
grid(col = "lightgray", lty = "dotted")
par(old.par)