Team Members: Esubalew Milam, Aubrey Cook, Nadia Shchetnikova, Daniel Zhang
Question 1:
NFL = read.csv("NFL Team Scoring.csv")
NFL = NFL[, -1]
summary(NFL)
Year PTS TA PAVG
Min. :2009 Min. :10.90 Min. :0.9375 Min. :5.100
1st Qu.:2009 1st Qu.:18.77 1st Qu.:1.3750 1st Qu.:6.500
Median :2010 Median :22.20 Median :1.6250 Median :7.000
Mean :2010 Mean :21.90 Mean :1.6504 Mean :7.042
3rd Qu.:2011 3rd Qu.:24.60 3rd Qu.:1.8906 3rd Qu.:7.700
Max. :2011 Max. :35.00 Max. :2.5000 Max. :9.300
TO SCK RAVG FD
Min. :0.625 Min. :0.8125 Min. :3.300 Min. :14.10
1st Qu.:1.375 1st Qu.:1.7969 1st Qu.:4.000 1st Qu.:17.48
Median :1.594 Median :2.1875 Median :4.200 Median :18.95
Mean :1.655 Mean :2.2259 Mean :4.231 Mean :19.02
3rd Qu.:1.938 3rd Qu.:2.6875 3rd Qu.:4.400 3rd Qu.:20.70
Max. :2.625 Max. :3.5000 Max. :5.400 Max. :26.00
P20 R20 COMP THRD
Min. :1.562 Min. :0.1875 Min. :49.40 Min. :25.80
1st Qu.:2.500 1st Qu.:0.4375 1st Qu.:57.58 1st Qu.:34.02
Median :2.812 Median :0.6875 Median :60.45 Median :37.65
Mean :3.024 Mean :0.7201 Mean :60.36 Mean :38.33
3rd Qu.:3.578 3rd Qu.:0.8750 3rd Qu.:62.95 3rd Qu.:41.73
Max. :4.500 Max. :1.6875 Max. :71.30 Max. :56.70
PA RA RY PY
Min. :24.60 Min. :20.00 Min. : 80.9 Min. :129.8
1st Qu.:31.20 1st Qu.:25.20 1st Qu.:100.7 1st Qu.:192.3
Median :33.95 Median :27.00 Median :114.5 Median :219.0
Mean :33.67 Mean :27.33 Mean :116.1 Mean :223.2
3rd Qu.:36.02 3rd Qu.:29.10 3rd Qu.:127.0 3rd Qu.:253.9
Max. :42.40 Max. :37.90 Max. :172.3 Max. :334.2
attach(NFL)
After removing the team name column, the NFL Team Scoring dataset contains 96 observations and 16 variables: PTS (points scored per game), which is our response variable, and 15 predictors, including offensive and defensive statistics such as passing yards (PY), rushing yards (RY), turnovers (TO), sacks (SCK), and completion percentage (COMP). Points per game range from 10.9 to 35.0, with a mean of 21.9.
Question 2:
library(glmnet)
Warning: package 'glmnet' was built under R version 4.5.3
Loading required package: Matrix
Loaded glmnet 5.0
x = model.matrix(PTS ~ .^2, NFL)[, -1]
y = PTS
grid = exp(1)^seq(10, -5, length = 100)
set.seed(300)
cv.out.r = cv.glmnet(x, y, lambda = grid, alpha = 0)
plot(cv.out.r)
bestlam.r = cv.out.r$lambda.min
bestlam.r
[1] 9.705835
min.r = which.min(cv.out.r$cvm)
cv.out.r$cvm[min.r]
[1] 4.476375
ridge.final = glmnet(x, y, alpha = 0, lambda = bestlam.r)
ridge.coef = predict(ridge.final, type = "coefficients", s = bestlam.r)
sum(ridge.coef[-1] != 0)
[1] 120
Ridge regression was applied using 10-fold cross-validation across a grid of 100 lambda values. The best lambda was 9.71, producing a minimum CV MSE of 4.48. Because ridge regression never sets coefficients to exactly zero, all 120 predictors (15 main effects plus 105 two-way interactions) remained in the model. The plot shows MSE decreasing as lambda decreases from very large values, then stabilizing near the optimal point.
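As a quick check on the predictor count, the sketch below builds a design matrix with the same .^2 formula style on simulated data. The data frame sim, its 15 random columns, and the response y are hypothetical stand-ins and not the NFL data; only the resulting number of columns is of interest.

```r
# Simulated stand-in data (assumption: 15 numeric predictors, matching
# the number of non-response columns in NFL; the values are random noise)
set.seed(1)
p = 15
sim = as.data.frame(matrix(rnorm(20 * p), ncol = p))
sim$y = rnorm(20)

# .^2 expands to all main effects plus all two-way interactions
x.sim = model.matrix(y ~ .^2, sim)[, -1]   # drop the intercept column
n.cols = ncol(x.sim)
n.expected = p + choose(p, 2)              # 15 main effects + 105 interactions
print(c(n.cols, n.expected))
```

Both numbers should agree with the 120 nonzero coefficients reported for ridge.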
Question 3:
set.seed(300)
cv.out.l = cv.glmnet(x, y, lambda = grid, alpha = 1)
plot(cv.out.l)
bestlam.l = cv.out.l$lambda.min
bestlam.l
[1] 0.07609615
min.l = which.min(cv.out.l$cvm)
cv.out.l$cvm[min.l]
[1] 4.638889
lasso.final = glmnet(x, y, alpha = 1, lambda = bestlam.l)
lasso.coef = predict(lasso.final, type = "coefficients", s = bestlam.l)
sum(lasso.coef[-1] != 0)
[1] 19
LASSO was applied with the same grid and 10-fold CV. The best lambda was 0.076, producing a minimum CV MSE of 4.64. Unlike ridge, LASSO shrinks some coefficients all the way to zero: it reduced the model from 120 predictors to just 19, automatically selecting the most relevant variables. This makes the LASSO model much easier to interpret than the ridge model, though its CV MSE was slightly higher here (4.64 versus 4.48).
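Why LASSO can zero out coefficients while ridge cannot is easiest to see in the textbook special case of an orthonormal design, where each estimator acts on every least-squares coefficient separately. The sketch below covers that simplified case only and is not what glmnet computes on the NFL data; the function names, the input vector z, and the penalty values are made up for illustration.

```r
# Orthonormal-design special case (an assumption made for illustration):
# LASSO soft-thresholds each least-squares coefficient,
# ridge rescales each coefficient by a constant factor.
soft.threshold = function(z, t) sign(z) * pmax(abs(z) - t, 0)
ridge.shrink = function(z, lambda) z / (1 + lambda)

z = c(-3, -0.5, 0.2, 1, 4)            # hypothetical least-squares coefficients
lasso.out = soft.threshold(z, 1)      # coefficients with |z| <= 1 become exactly 0
ridge.out = ridge.shrink(z, 1)        # every coefficient is halved, none reach 0

print(lasso.out)
print(ridge.out)
print(c(lasso.zeros = sum(lasso.out == 0), ridge.zeros = sum(ridge.out == 0)))
```

In this toy input, three of the five coefficients are set to zero by soft-thresholding, while the ridge-style rescaling leaves all five nonzero.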
Question 4:
library(pls)
Warning: package 'pls' was built under R version 4.5.3
Attaching package: 'pls'
The following object is masked from 'package:stats':
loadings
msep.pcr = MSEP(pcr.fit)$val[1, 1, ]
min.comp.pcr = which.min(msep.pcr) - 1
min.mse.pcr = min(msep.pcr)
cat("PCR - Number of components at min MSE:", min.comp.pcr, "\n")
PCR - Number of components at min MSE: 5
cat("PCR - Minimum CV MSE:", min.mse.pcr, "\n")
PCR - Minimum CV MSE: 4.179948
PCR reduces the 120 predictors to a smaller set of uncorrelated principal components. The minimum CV MSE of 4.18 was achieved at just 5 components, meaning that 5 linear combinations of the original predictors captured most of the information useful for predicting PTS. This is the lowest CV MSE of the four methods (ridge, LASSO, PCR, and PLS). The validation plot shows MSE dropping sharply over the first few components and then rising again, which supports 5 as the best choice.
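The idea behind PCR can be sketched by hand with base R: compute principal component scores, confirm they are uncorrelated, and regress the response on the first few of them. This is a simplified illustration on simulated data, not the pcr() fit used for the results above; the matrix X, the response y, and the choice of 3 components are all assumptions.

```r
# Simulated data (assumption): 6 predictors, two of them strongly correlated
set.seed(1)
n = 50
X = matrix(rnorm(n * 6), ncol = 6)
X[, 2] = X[, 1] + 0.1 * rnorm(n)
y = X[, 1] - X[, 3] + rnorm(n)

# Step 1: principal components of the standardized predictors
pc = prcomp(X, center = TRUE, scale. = TRUE)
scores = pc$x[, 1:3]                       # keep the first 3 components

# Step 2: the retained scores are uncorrelated with each other
max.offdiag = max(abs(cor(scores)[upper.tri(diag(3))]))

# Step 3: ordinary least squares on the scores instead of the raw predictors
pcr.sketch = lm(y ~ scores)
n.coef = length(coef(pcr.sketch))          # intercept + 3 component coefficients
print(max.offdiag)
print(n.coef)
```

In practice the number of components is chosen by cross-validation, as in the report, rather than fixed in advance.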
msep.pls = MSEP(pls.fit)$val[1, 1, ]
min.comp.pls = which.min(msep.pls) - 1
min.mse.pls = min(msep.pls)
cat("PLS - Number of components at min MSE:", min.comp.pls, "\n")
PLS - Number of components at min MSE: 2
cat("PLS - Minimum CV MSE:", min.mse.pls, "\n")
PLS - Minimum CV MSE: 4.324006
PLS is similar to PCR but constructs components using both the predictors and the response (PTS), making each component more directly useful for prediction. It reached its minimum CV MSE at 2 components, compared with 5 for PCR, which is consistent with response-guided dimension reduction needing fewer components, although its minimum CV MSE (4.32) was slightly higher than PCR's (4.18). The summary output shows RMSEP (root MSE) declining with added components before leveling off.
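The difference between the two kinds of components can be illustrated with the first PLS direction, which weights each standardized predictor by its inner product with the centered response, whereas the first principal component ignores the response entirely. The sketch below uses simulated data and computes only the first direction by hand; it is not the plsr() fit from the report, and the variable names are hypothetical.

```r
# Simulated data (assumption): only the 4th predictor drives the response
set.seed(2)
n = 60
X = scale(matrix(rnorm(n * 4), ncol = 4))   # standardized predictors
y = 2 * X[, 4] + rnorm(n)
y = y - mean(y)                             # centered response

# First PLS direction: weights proportional to X'y, scaled to unit length
w = drop(crossprod(X, y))
w = w / sqrt(sum(w^2))
z1 = drop(X %*% w)                          # first PLS component

# First principal component, built without looking at y
pc1 = prcomp(X)$x[, 1]

cor.pls = abs(cor(z1, y))
cor.pc = abs(cor(pc1, y))
print(round(w, 3))
print(c(pls = cor.pls, pca = cor.pc))
```

The PLS weights concentrate on the predictor that actually relates to the response, so the first PLS component is at least as correlated with y as the first principal component.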
Across all four methods, PCR achieved the lowest CV MSE (4.18) using only 5 components. Ridge (4.48) and LASSO (4.64) had similar MSEs, but the LASSO model is far more interpretable because it retains only 19 of the 120 predictors. PLS offers a balance: fewer components than PCR (2 versus 5) with a competitive MSE (4.32). For prediction accuracy, PCR wins; for interpretability, LASSO wins.
Question 6:
library(randomForest)
Warning: package 'randomForest' was built under R version 4.5.3
randomForest 4.7-1.2
Type rfNews() to see new features/changes/bug fixes.
Call:
randomForest(formula = PTS ~ ., data = NFL, mtry = 15, ntree = 5000, importance = T)
Type of random forest: regression
Number of trees: 5000
No. of variables tried at each split: 15
Mean of squared residuals: 6.610999
% Var explained: 71.55
plot(bag.fit$mse)
plot(bag.fit$mse[1000:5000])
5,000 trees are not sufficient to reach a plateau; the graph shows high fluctuation even at 5,000 trees.
Call:
randomForest(formula = PTS ~ ., data = NFL, mtry = 5, ntree = 5000, importance = T)
Type of random forest: regression
Number of trees: 5000
No. of variables tried at each split: 5
Mean of squared residuals: 6.108013
% Var explained: 73.71
plot(bag.fit1$mse)
plot(bag.fit1$mse[1000:5000])
5,000 trees are not sufficient to reach a plateau; the graph shows high fluctuation even at 5,000 trees.
The resulting out-of-bag MSE of the model is 6.108, lower than the 6.611 of the previous model with mtry = 15.
bag.fit1$mse[5000]
[1] 6.108013
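The two mtry values above correspond to bagging (all 15 predictors tried at each split) and to the randomForest default for regression, which is max(floor(p/3), 1). The short sketch below checks that arithmetic and also computes the chance that an observation is left out of one bootstrap sample of size n, which is why out-of-bag error can be estimated at all. Only p = 15 and n = 96 come from the report; the variable names are illustrative.

```r
p = 15    # number of predictors in PTS ~ .
n = 96    # number of observations

mtry.bagging = p                       # bagging: every predictor is a split candidate
mtry.default = max(floor(p / 3), 1)    # randomForest's default for regression

# Probability that one observation is absent from a bootstrap sample of size n
oob.frac = (1 - 1 / n)^n

print(c(mtry.bagging = mtry.bagging, mtry.default = mtry.default))
print(oob.frac)
```

The out-of-bag fraction is close to exp(-1), so each tree leaves out roughly a third of the observations, and those are the ones used to compute the MSE that randomForest reports.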
Question 9:
library(gbm)
Warning: package 'gbm' was built under R version 4.5.3
Loaded gbm 2.2.3
This version of gbm is no longer under development. Consider transitioning to gbm3, https://github.com/gbm-developers/gbm3
The y-axis range of this plot makes it difficult to distinguish the minimum-MSE point from other points close to the minimum.
The minimum cross-validation MSE occurs at 4,718 trees and has a value of 5.05, lower than both the bagging (6.61) and random forest (6.11) MSEs.
min.n = which.min(gbm.fit$cv.error)
min.n
[1] 4718
gbm.fit$cv.error[min.n]
[1] 5.04952
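One way to make the minimum easier to see is to restrict the y-axis to values near the minimum and mark the best number of trees. The sketch below does this on a made-up error curve; cv.err here is a hypothetical stand-in and not the report's gbm.fit$cv.error, and the 5% zoom window is an arbitrary choice.

```r
# Hypothetical CV error curve (NOT the values from gbm.fit$cv.error)
trees = 1:5000
cv.err = 5 + 20 * exp(-trees / 300) + 1e-8 * (trees - 4000)^2

best = which.min(cv.err)

# Zoom the y-axis to errors within 5% of the minimum
zoom = range(cv.err[cv.err <= 1.05 * min(cv.err)])
plot(trees, cv.err, type = "l", ylim = zoom,
     xlab = "Number of trees", ylab = "CV error")
abline(v = best, lty = 2)    # dashed line at the minimizing number of trees
```

With the actual fit, the same approach would use gbm.fit$cv.error in place of cv.err.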
Question 10:
summary(gbm.fit, min.n, las=1)
var rel.inf
FD FD 22.8020050
PAVG PAVG 13.2670156
TA TA 10.5385469
PY PY 7.6143880
THRD THRD 7.3123765
COMP COMP 6.6261006
TO TO 5.3845372
RY RY 5.1841922
P20 P20 5.1549510
RA RA 4.2577190
SCK SCK 4.1261189
PA PA 3.3306449
R20 R20 2.0824099
RAVG RAVG 1.4335399
Year Year 0.8854545
var rel.inf
FD FD 23.9004916
PAVG PAVG 13.0800572
TA TA 10.6390082
PY PY 9.2795725
THRD THRD 6.8806183
COMP COMP 6.8023852
TO TO 5.1214795
P20 P20 4.9194088
RY RY 4.3321599
RA RA 4.2624045
SCK SCK 4.0033664
PA PA 2.8855717
R20 R20 1.9881341
RAVG RAVG 1.2167642
Year Year 0.6885778
Call:
lm(formula = PTS ~ FD + R1 + FD:R1, data = NFL2)
Residuals:
Min 1Q Median 3Q Max
-1.30783 -0.83634 -0.00817 0.88128 1.30003
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2009.2699 0.9306 2159.219 <2e-16 ***
FD 0.1875 0.2182 0.859 0.3926
R1 -1.5718 0.8281 -1.898 0.0608 .
FD:R1 0.3916 0.1907 2.054 0.0428 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8065 on 92 degrees of freedom
Multiple R-squared: 0.06493, Adjusted R-squared: 0.03444
F-statistic: 2.13 on 3 and 92 DF, p-value: 0.1018
No terms can be removed from the model. The interaction FD:R1 is significant at the 5% level (p = 0.0428), and by the hierarchical principle its main effects, FD and R1, must stay in the model even though neither is individually significant at that level (p = 0.3926 and p = 0.0608).
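R's formula syntax follows the same principle: the shorthand FD * R1 expands to both main effects plus their interaction. The sketch below only inspects the formulas and fits nothing, so it needs no data; the object names are illustrative.

```r
# The * operator expands to main effects plus the interaction
f.short = PTS ~ FD * R1
f.long = PTS ~ FD + R1 + FD:R1

labels.short = attr(terms(f.short), "term.labels")
labels.long = attr(terms(f.long), "term.labels")

print(labels.short)
print(identical(labels.short, labels.long))
```

Writing the model with * keeps the main effects in automatically whenever the interaction is included.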
Question 16:
mean(lm.fit0$residuals^2)
[1] 0.623379
Ridge: 4.48
LASSO: 4.64
PCR: 4.18
PLS: 4.32
Bagging (mtry=15): 6.61
Random Forest (mtry=5): 6.11
Boosting depth=1: 5.05
Boosting depth=2: 5.58
Q15 model: 0.62
Team project 1 model: 4.15
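The model from Q15 has the lowest MSE in the list, but the comparison is not like for like: its 0.62 is a training MSE (the mean of the squared residuals on the data used to fit the model), and it was computed on scaled data, while the other values are cross-validation or out-of-bag estimates on the original scale. The model from team project 1 is still the best model, and PCR is the best among the newer models.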
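The training MSE from mean(residuals^2) is tied directly to the residual standard error printed by summary(): MSE = RSE^2 * df / n. The sketch below verifies this identity on a simulated fit (the data frame d and its columns are hypothetical) and then applies it to the values printed in the lm output above (RSE 0.8065, 92 degrees of freedom, n = 96).

```r
# Simulated regression (assumption): 96 rows, two predictors plus interaction
set.seed(3)
n = 96
d = data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y = 1 + d$x1 + rnorm(n)
fit = lm(y ~ x1 * x2, data = d)

train.mse = mean(fit$residuals^2)     # same computation as in Question 16
rse = summary(fit)$sigma              # residual standard error
df.res = fit$df.residual              # n minus 4 estimated coefficients
from.rse = rse^2 * df.res / n

# Same identity applied to the values printed in the report
reported = 0.8065^2 * 92 / 96

print(c(train.mse = train.mse, from.rse = from.rse))
print(reported)
```

The reported figures reproduce the 0.62 in the list up to rounding, which confirms that it is an in-sample quantity and not a cross-validated one.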