University of Nevada, Reno (UNR)
The aim of this project is to simulate multivariate data from three underlying models with varying degrees of correlation among the predictors and to observe how well different variable selection methods perform on each dataset for linear regression. We first simulate 50 datasets for each of the first three underlying models from Section 7 of the paper “Regression Shrinkage and Selection via the Lasso” by Tibshirani (1996). We then fit linear regression models with several variable selection and shrinkage methods and evaluate their performance using the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Adjusted \(R^2\) as performance metrics. The following methods are considered. My next goal is to vectorize all for loops and generally reduce redundant variable storage where possible; one possible shape for such a refactor is sketched below.
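As an illustration of that planned refactor, one possible shared simulation helper is sketched here; the function simulate_datasets() and its interface are my own and are not used anywhere in the code that follows.
library(MASS)  # mvrnorm()
# Hypothetical helper collecting the repeated simulation steps in one place.
# Returns a list of data frames, one per simulated dataset.
simulate_datasets <- function(n_sim, n_obs, beta, sigma, rho) {
  p <- length(beta)
  Sigma <- outer(1:p, 1:p, function(i, j) rho^abs(i - j))
  lapply(seq_len(n_sim), function(s) {
    X <- mvrnorm(n = n_obs, mu = rep(0, p), Sigma = Sigma)
    y <- X %*% beta + rnorm(n_obs, mean = 0, sd = sigma)
    dat <- data.frame(Y = as.numeric(y), X)
    colnames(dat) <- c("Y", paste0("X", 1:p))
    dat
  })
}
# Example call matching the setup of the first underlying model:
# data1 <- simulate_datasets(n_sim = 50, n_obs = 20,
#                            beta = c(3, 1.5, 0, 0, 2, 0, 0, 0), sigma = 3, rho = 0.5)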
We use the following performance metrics to evaluate the variable selection methods.
\[ AIC = -2\log(\mathscr{L}(\hat{\theta}_{MLE}\mid y)) + 2k \]
\[ BIC = -2\log(\mathscr{L}(\hat{\theta}_{MLE}\mid y)) + k\log(n) \]
\[ R_{adj}^{2}=1-\frac{RSS/(n-k-1)}{\sum_{i=1}^{n}(y_i-\bar{y})^2/(n-1)} \]
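For reference, these quantities can be computed directly from a fitted lm object and checked against R's built-in AIC() and BIC(); a small self-contained sketch on toy data (not part of the simulation study):
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
fit <- lm(y ~ x1 + x2, data = d)
n <- nrow(d)
k <- length(coef(fit)) + 1            # estimated parameters: coefficients plus the error variance
logL <- as.numeric(logLik(fit))
aic_manual <- -2 * logL + 2 * k       # agrees with AIC(fit)
bic_manual <- -2 * logL + k * log(n)  # agrees with BIC(fit)
rss <- sum(residuals(fit)^2)
tss <- sum((d$y - mean(d$y))^2)
p <- length(coef(fit)) - 1            # number of predictors
adjr2_manual <- 1 - (rss / (n - p - 1)) / (tss / (n - 1))  # agrees with summary(fit)$adj.r.squared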
In this section we simulate data from the three aforementioned underlying models.
Data for the first underlying model consist of 50 simulated datasets with 20 observations each. Following Tibshirani (1996), the model has \(\boldsymbol{\beta}=(3,1.5,0,0,2,0,0,0)^T\), \(\sigma=3\), and the correlation between \(x_i\) and \(x_j\) equal to \(\rho^{|i-j|}\) with \(\rho=0.5\). Thus, we have the model
\[ y = \boldsymbol{\beta}^{T}\boldsymbol{x} + \sigma\boldsymbol{\epsilon} \]
R-Implementation:
# Required packages (assumed to be installed; loaded once here for the entire analysis)
library(MASS)        # mvrnorm() for simulating correlated predictors
library(glmnet)      # LASSO and Ridge regression
library(olsrr)       # all-possible, forward, backward, and stepwise selection
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), add_header_above()
library(tidyr)       # gather()
library(dplyr)       # %>%
library(ggplot2)     # plotting
library(viridis)     # scale_fill_viridis()
library(hrbrthemes)  # theme_ipsum()
# Variable Declarations
n_obs1 <- 20
n_simul1 <- 50
p1 <- 8
corr1 <- 0.5
sigma1 <- 3
beta1 <- c(3, 1.5, 0, 0, 2, 0, 0, 0)
# Simulating The Matrix of Predictors
CovarMat1 <- outer(1:length(beta1), 1:length(beta1), function(i,j){corr1^abs(i-j)})
mean_vect1 <- rep(0,length(beta1))
X_n_simul1 <- list()
for(i in 1:n_simul1){
X_n_simul1[[i]] <- as.matrix(mvrnorm(n = n_obs1,mu = mean_vect1,Sigma = CovarMat1))
}
# Generate Y (a fresh error vector is drawn for each simulated dataset)
y_n_simul1 <- list()
for(i in 1:n_simul1){
epsilon1 <- rnorm(n_obs1, mean = 0, sd = sigma1)
y_n_simul1[[i]] <- X_n_simul1[[i]] %*% beta1 + epsilon1
}
# Create Full Datasets
data1 <- list()
for(i in 1:n_simul1){
data1[[i]] <- data.frame(cbind(y_n_simul1[[i]],X_n_simul1[[i]]))
colnames(data1[[i]]) <- c("Y","X1","X2","X3","X4","X5","X6","X7","X8")
}
Data for the second underlying model also consist of 50 simulated datasets with 20 observations each and the same model structure as model 1, but now with parameter values \(\beta_{j}=0.85\) for every \(j\), \(\sigma=3\), and the correlation between \(x_i\) and \(x_j\) equal to \(\rho^{|i-j|}\) with \(\rho=0.5\).
R-Implementation:
# Variable declarations
n_obs2 <- 20
n_simul2 <-50
p2 <- 8
corr2 <- 0.5
sigma2 <- 3
beta2 <- rep(0.85, 8)
#Simulating Matrix of Predictors
CovarMat2 <- outer(1:length(beta2), 1:length(beta2), function(i,j){corr2^abs(i-j)})
mean_vect2 <- rep(0,length(beta2))
X_n_simul2 <- list()
for(i in 1:n_simul2){
X_n_simul2[[i]] <- as.matrix(mvrnorm(n = n_obs2,mu = mean_vect2,Sigma = CovarMat2))
}
#Generate Y (a fresh error vector per dataset)
y_n_simul2 <- list()
for(i in 1:n_simul2){
epsilon2 <- rnorm(n_obs2, mean = 0, sd = sigma2)
y_n_simul2[[i]] <- X_n_simul2[[i]] %*% beta2 + epsilon2
}
# Create Full Datasets
data2 <- list()
for(i in 1:n_simul2){
data2[[i]] <- data.frame(cbind(y_n_simul2[[i]],X_n_simul2[[i]]))
colnames(data2[[i]]) <- c("Y","X1","X2","X3","X4","X5","X6","X7","X8")
}
Data for the third underlying model also consist of 50 simulated datasets with 20 observations each and the same model structure as model 1, but now with parameter values \(\boldsymbol{\beta}=(5,0,0,0,0,0,0,0)^T\), \(\sigma=2\), and the correlation between \(x_i\) and \(x_j\) equal to \(\rho^{|i-j|}\) with \(\rho=0.5\).
R-Implementation:
# Variable declaration for simulating dataset
n_obs3 <- 20
n_simul3 <-50
p3 <- 8
corr3 <- 0.5
sigma3 <- 2
beta3 <- c(5, 0, 0, 0, 0, 0, 0, 0)
#Simulating matrix of predictors X
CovarMat3 <- outer(1:length(beta3), 1:length(beta3), function(i,j){corr3^abs(i-j)})
mean_vect3 <- rep(0,length(beta3))
X_n_simul3 <- list()
for(i in 1:n_simul3){
X_n_simul3[[i]] <- as.matrix(mvrnorm(n = n_obs3,mu = mean_vect3,Sigma = CovarMat3))
}
#Generate y from X (a fresh error vector per dataset)
y_n_simul3 <- list()
for(i in 1:n_simul3){
epsilon3 <- rnorm(n_obs3, mean = 0, sd = sigma3)
y_n_simul3[[i]] <- X_n_simul3[[i]] %*% beta3 + epsilon3
}
# Create Full Datasets
data3 <- list()
for(i in 1:n_simul3){
data3[[i]] <- data.frame(cbind(y_n_simul3[[i]],X_n_simul3[[i]]))
colnames(data3[[i]]) <- c("Y","X1","X2","X3","X4","X5","X6","X7","X8")
}
In this section, we fit the regression models and measure model performance on each underlying model. For LASSO and Ridge we use the glmnet R package, and for all-possible, forward, backward, and stepwise selection we use the olsrr package.
We first fit the OLS Regression model and measure model performance. It is important to mention that no predictors are discarded for OLS Regression. We have the data \((\boldsymbol{x}^{i},y_i)\) for \(i=1,2,...,N\), where \(\boldsymbol{x}^{i}=(x_{i1},...,x_{ip})^T\), and following James et al. (2013) we estimate \(\beta_{0},\beta_{1},...,\beta_{p}\) to minimize
\[ \text{RSS} = \sum^{n}_{i=1}\left(y_{i}-\beta_{0}-\sum^{p}_{j=1}\beta_{j}x_{ij}\right)^2 \]
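Equivalently, the minimizing coefficients have the closed form \(\hat{\boldsymbol{\beta}}=(X^{T}X)^{-1}X^{T}y\); a short sketch checking this against lm() on the first simulated dataset (a sanity check only, not part of the study):
X <- cbind(1, X_n_simul1[[1]])               # design matrix with an intercept column
y <- y_n_simul1[[1]]
beta_hat <- solve(t(X) %*% X, t(X) %*% y)    # normal equations solution
coef(lm(y_n_simul1[[1]] ~ X_n_simul1[[1]]))  # should agree up to numerical error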
R-Implementation:
#Underlying Model 1
OLS_AIC_Data1 <- vector(length = n_simul1)
OLS_BIC_Data1 <- vector(length = n_simul1)
OLS_R2_Data1 <- vector(length = n_simul1)
for(i in 1:n_simul1){
OLS_fit1 <- lm(y_n_simul1[[i]]~X_n_simul1[[i]])
OLS_AIC_Data1[i] <- AIC(OLS_fit1)
OLS_BIC_Data1[i] <- BIC(OLS_fit1)
OLS_R2_Data1[i] <- summary(OLS_fit1)$adj.r.squared
}
mean_OLS_AIC_Data1 <- mean(OLS_AIC_Data1)
mean_OLS_BIC_Data1 <- mean(OLS_BIC_Data1)
mean_OLS_R2_Data1 <- mean(OLS_R2_Data1)
med_OLS_AIC_Data1 <- median(OLS_AIC_Data1)
med_OLS_BIC_Data1 <- median(OLS_BIC_Data1)
med_OLS_R2_Data1 <- median(OLS_R2_Data1)
OLS_row1 <- cbind("OLS", mean_OLS_AIC_Data1,
mean_OLS_BIC_Data1,
mean_OLS_R2_Data1,
med_OLS_AIC_Data1,
med_OLS_BIC_Data1,
med_OLS_R2_Data1)
#Underlying Model 2
OLS_AIC_Data2 <- vector(length = n_simul2)
OLS_BIC_Data2 <- vector(length = n_simul2)
OLS_R2_Data2 <- vector(length = n_simul2)
for(i in 1:n_simul2){
OLS_fit2 <- lm(y_n_simul2[[i]]~X_n_simul2[[i]])
OLS_AIC_Data2[i] <- AIC(OLS_fit2)
OLS_BIC_Data2[i] <- BIC(OLS_fit2)
OLS_R2_Data2[i] <- summary(OLS_fit2)$adj.r.squared
}
mean_OLS_AIC_Data2 <- mean(OLS_AIC_Data2)
mean_OLS_BIC_Data2 <- mean(OLS_BIC_Data2)
mean_OLS_R2_Data2 <- mean(OLS_R2_Data2)
med_OLS_AIC_Data2 <- median(OLS_AIC_Data2)
med_OLS_BIC_Data2 <- median(OLS_BIC_Data2)
med_OLS_R2_Data2 <- median(OLS_R2_Data2)
OLS_row2 <- cbind("OLS", mean_OLS_AIC_Data2,
mean_OLS_BIC_Data2,
mean_OLS_R2_Data2,
med_OLS_AIC_Data2,
med_OLS_BIC_Data2,
med_OLS_R2_Data2)
#Underlying Model 3
OLS_AIC_Data3 <- vector(length = n_simul3)
OLS_BIC_Data3 <- vector(length = n_simul3)
OLS_R2_Data3 <- vector(length = n_simul3)
for(i in 1:n_simul3){
OLS_fit3 <- lm(y_n_simul3[[i]]~X_n_simul3[[i]])
OLS_AIC_Data3[i] <- AIC(OLS_fit3)
OLS_BIC_Data3[i] <- BIC(OLS_fit3)
OLS_R2_Data3[i] <- summary(OLS_fit3)$adj.r.squared
}
mean_OLS_AIC_Data3 <- mean(OLS_AIC_Data3)
mean_OLS_BIC_Data3 <- mean(OLS_BIC_Data3)
mean_OLS_R2_Data3 <- mean(OLS_R2_Data3)
med_OLS_AIC_Data3 <- median(OLS_AIC_Data3)
med_OLS_BIC_Data3 <- median(OLS_BIC_Data3)
med_OLS_R2_Data3 <- median(OLS_R2_Data3)
OLS_row3 <- cbind("OLS", mean_OLS_AIC_Data3,
mean_OLS_BIC_Data3,
mean_OLS_R2_Data3,
med_OLS_AIC_Data3,
med_OLS_BIC_Data3,
med_OLS_R2_Data3)
We now outline the LASSO technique according to Tibshirani (1996), fit the LASSO Regression model, and measure model performance. We have the data \((\boldsymbol{x}^{i},y_i)\) for \(i=1,2,...,N\), where \(\boldsymbol{x}^{i}=(x_{i1},...,x_{ip})^T\) and the predictors are standardized. Additionally, we define \(\boldsymbol{\hat{\beta}}=(\hat{\beta}_{1},...,\hat{\beta}_{p})^T\) and a tuning parameter \(t\geq0\), so the LASSO estimate \((\boldsymbol{\hat{\alpha}},\boldsymbol{\hat{\beta}})\) is defined as
\[ (\boldsymbol{\hat{\alpha}},\boldsymbol{\hat{\beta}})=\text{arg min}\left\{\sum^{N}_{i=1}(y_i-\alpha-\sum_{j}\beta_{j}x_{ij})^2\right\} \text{ subject to } \sum_{j}|\beta_j|\leq t \]
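The glmnet package used below solves the equivalent penalized (Lagrangian) form of this problem, in which each value of the penalty \(\lambda\geq0\) corresponds to some bound \(t\):
\[ (\hat{\alpha},\boldsymbol{\hat{\beta}})=\text{arg min}\left\{\frac{1}{2N}\sum^{N}_{i=1}\left(y_i-\alpha-\sum_{j}\beta_{j}x_{ij}\right)^2 + \lambda\sum_{j}|\beta_j|\right\} \]
In practice \(\lambda\) is often chosen by cross-validation; a minimal sketch on the first simulated dataset (illustrative only; the results below instead evaluate the metrics at the last, smallest \(\lambda\) on the default glmnet path):
cv_fit <- cv.glmnet(X_n_simul1[[1]], y_n_simul1[[1]], alpha = 1)
cv_fit$lambda.min               # lambda minimizing the cross-validated error
coef(cv_fit, s = "lambda.min")  # coefficients at that lambda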
R-Implementation:
#Underlying Model 1
LASSO_AIC_Data1 <- vector(length = n_simul1)
LASSO_BIC_Data1 <- vector(length = n_simul1)
LASSO_R2_Data1 <- vector(length = n_simul1)
for(i in 1:n_simul1){
lasso_fit1 = glmnet(X_n_simul1[[i]], y_n_simul1[[i]], alpha = 1, family="gaussian")
tLL <- lasso_fit1$nulldev - deviance(lasso_fit1)
k <- lasso_fit1$df
n <- lasso_fit1$nobs
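# Note on the metrics below (my reading of this chunk): tLL is twice the log-likelihood
# gain over the null model and k is the number of nonzero coefficients at each lambda,
# so the AIC value includes the small-sample (AICc) correction and, like the BIC value,
# is deviance-based rather than on the absolute scale of stats::AIC()/BIC() used for the lm fits.
# dev.ratio is the fraction of deviance explained (the unadjusted R^2 for Gaussian models),
# and tail(..., n = 1) evaluates each metric at the last (smallest) lambda on the path.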
LASSO_AIC_Data1[i] <- tail(-tLL+2*k+2*k*(k+1)/(n-k-1),n=1)
LASSO_BIC_Data1[i] <- tail(log(n)*k - tLL,n=1)
LASSO_R2_Data1[i] <- tail(lasso_fit1$dev.ratio,n=1)
}
mean_LASSO_AIC_Data1 <- mean(LASSO_AIC_Data1)
mean_LASSO_BIC_Data1 <- mean(LASSO_BIC_Data1)
mean_LASSO_R2_Data1 <- mean(LASSO_R2_Data1)
med_LASSO_AIC_Data1 <- median(LASSO_AIC_Data1)
med_LASSO_BIC_Data1 <- median(LASSO_BIC_Data1)
med_LASSO_R2_Data1 <- median(LASSO_R2_Data1)
LASSO_row1 <- cbind("LASSO", mean_LASSO_AIC_Data1,
mean_LASSO_BIC_Data1,
mean_LASSO_R2_Data1,
med_LASSO_AIC_Data1,
med_LASSO_BIC_Data1,
med_LASSO_R2_Data1)
#Underlying Model 2
LASSO_AIC_Data2 <- vector(length = n_simul2)
LASSO_BIC_Data2 <- vector(length = n_simul2)
LASSO_R2_Data2 <- vector(length = n_simul2)
for(i in 1:n_simul2){
lasso_fit2 = glmnet(X_n_simul2[[i]], y_n_simul2[[i]], alpha = 1, family="gaussian")
tLL <- lasso_fit2$nulldev - deviance(lasso_fit2)
k <- lasso_fit2$df
n <- lasso_fit2$nobs
LASSO_AIC_Data2[i] <- tail(-tLL+2*k+2*k*(k+1)/(n-k-1),n=1)
LASSO_BIC_Data2[i] <- tail(log(n)*k - tLL,n=1)
LASSO_R2_Data2[i] <- tail(lasso_fit2$dev.ratio,n=1)
}
mean_LASSO_AIC_Data2 <- mean(LASSO_AIC_Data2)
mean_LASSO_BIC_Data2 <- mean(LASSO_BIC_Data2)
mean_LASSO_R2_Data2 <- mean(LASSO_R2_Data2)
med_LASSO_AIC_Data2 <- median(LASSO_AIC_Data2)
med_LASSO_BIC_Data2 <- median(LASSO_BIC_Data2)
med_LASSO_R2_Data2 <- median(LASSO_R2_Data2)
LASSO_row2 <- cbind("LASSO", mean_LASSO_AIC_Data2,
mean_LASSO_BIC_Data2,
mean_LASSO_R2_Data2,
med_LASSO_AIC_Data2,
med_LASSO_BIC_Data2,
med_LASSO_R2_Data2)
#Underlying Model 3
LASSO_AIC_Data3 <- vector(length = n_simul3)
LASSO_BIC_Data3 <- vector(length = n_simul3)
LASSO_R2_Data3 <- vector(length = n_simul3)
for(i in 1:n_simul3){
lasso_fit3 = glmnet(X_n_simul3[[i]], y_n_simul3[[i]], alpha = 1, family="gaussian")
tLL <- lasso_fit3$nulldev - deviance(lasso_fit3)
k <- lasso_fit3$df
n <- lasso_fit3$nobs
LASSO_AIC_Data3[i] <- tail(-tLL+2*k+2*k*(k+1)/(n-k-1),n=1)
LASSO_BIC_Data3[i] <- tail(log(n)*k - tLL,n=1)
LASSO_R2_Data3[i] <- tail(lasso_fit3$dev.ratio,n=1)
}
mean_LASSO_AIC_Data3 <- mean(LASSO_AIC_Data3)
mean_LASSO_BIC_Data3 <- mean(LASSO_BIC_Data3)
mean_LASSO_R2_Data3 <- mean(LASSO_R2_Data3)
med_LASSO_AIC_Data3 <- median(LASSO_AIC_Data3)
med_LASSO_BIC_Data3 <- median(LASSO_BIC_Data3)
med_LASSO_R2_Data3 <- median(LASSO_R2_Data3)
LASSO_row3 <- cbind("LASSO", mean_LASSO_AIC_Data3,
mean_LASSO_BIC_Data3,
mean_LASSO_R2_Data3,
med_LASSO_AIC_Data3,
med_LASSO_BIC_Data3,
med_LASSO_R2_Data3)
We now outline the Ridge Regression technique according to James et al. (2013), fit the Ridge Regression model, and measure model performance. Ridge Regression has a setup similar to OLS, although the quantity to be minimized differs from those of OLS and LASSO Regression. We define a tuning parameter \(\lambda \geq 0\) and estimate the regression coefficients \(\boldsymbol{\hat{\beta}}\) to minimize
\[ \sum_{i=1}^{n}\left(y_{i}-\beta_{0}-\sum_{j=1}^{p}\beta_{j}x_{ij}\right)^2 + \lambda\sum^{p}_{j=1}\beta^{2}_{j} \]
R-Implementation:
#Underlying Model 1
Rid_AIC_Data1 <- vector(length = n_simul1)
Rid_BIC_Data1 <- vector(length = n_simul1)
Rid_R2_Data1 <- vector(length = n_simul1)
for(i in 1:n_simul1){
Rid_fit1 = glmnet(X_n_simul1[[i]], y_n_simul1[[i]], alpha = 0, family="gaussian")
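# Metrics below are computed as in the LASSO chunks (deviance-based AICc/BIC and dev.ratio
# at the last lambda); note that glmnet's df counts nonzero coefficients rather than the
# effective degrees of freedom, so for ridge it is essentially the full number of predictors.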
tLL <- Rid_fit1$nulldev - deviance(Rid_fit1)
k <- Rid_fit1$df
n <- Rid_fit1$nobs
Rid_AIC_Data1[i] <- tail(-tLL+2*k+2*k*(k+1)/(n-k-1),n=1)
Rid_BIC_Data1[i] <- tail(log(n)*k - tLL,n=1)
Rid_R2_Data1[i] <- tail(Rid_fit1$dev.ratio,n=1)
}
mean_Rid_AIC_Data1 <- mean(Rid_AIC_Data1)
mean_Rid_BIC_Data1 <- mean(Rid_BIC_Data1)
mean_Rid_R2_Data1 <- mean(Rid_R2_Data1)
med_Rid_AIC_Data1 <- median(Rid_AIC_Data1)
med_Rid_BIC_Data1 <- median(Rid_BIC_Data1)
med_Rid_R2_Data1 <- median(Rid_R2_Data1)
Rid_row1 <- cbind("Ridge", mean_Rid_AIC_Data1,
mean_Rid_BIC_Data1,
mean_Rid_R2_Data1,
med_Rid_AIC_Data1,
med_Rid_BIC_Data1,
med_Rid_R2_Data1)
#Underlying Model 2
Rid_AIC_Data2 <- vector(length = n_simul2)
Rid_BIC_Data2 <- vector(length = n_simul2)
Rid_R2_Data2 <- vector(length = n_simul2)
for(i in 1:n_simul2){
Rid_fit2 = glmnet(X_n_simul2[[i]], y_n_simul2[[i]], alpha = 0, family="gaussian")
tLL <- Rid_fit2$nulldev - deviance(Rid_fit2)
k <- Rid_fit2$df
n <- Rid_fit2$nobs
Rid_AIC_Data2[i] <- tail(-tLL+2*k+2*k*(k+1)/(n-k-1),n=1)
Rid_BIC_Data2[i] <- tail(log(n)*k - tLL,n=1)
Rid_R2_Data2[i] <- tail(Rid_fit2$dev.ratio,n=1)
}
mean_Rid_AIC_Data2 <- mean(Rid_AIC_Data2)
mean_Rid_BIC_Data2 <- mean(Rid_BIC_Data2)
mean_Rid_R2_Data2 <- mean(Rid_R2_Data2)
med_Rid_AIC_Data2 <- median(Rid_AIC_Data2)
med_Rid_BIC_Data2 <- median(Rid_BIC_Data2)
med_Rid_R2_Data2 <- median(Rid_R2_Data2)
Rid_row2 <- cbind("Ridge", mean_Rid_AIC_Data2,
mean_Rid_BIC_Data2,
mean_Rid_R2_Data2,
med_Rid_AIC_Data2,
med_Rid_BIC_Data2,
med_Rid_R2_Data2)
#Underlying Model 3
Rid_AIC_Data3 <- vector(length = n_simul3)
Rid_BIC_Data3 <- vector(length = n_simul3)
Rid_R2_Data3 <- vector(length = n_simul3)
for(i in 1:n_simul3){
Rid_fit3 = glmnet(X_n_simul3[[i]], y_n_simul3[[i]], alpha = 0, family="gaussian")
tLL <- Rid_fit3$nulldev - deviance(Rid_fit3)
k <- Rid_fit3$df
n <- Rid_fit3$nobs
Rid_AIC_Data3[i] <- tail(-tLL+2*k+2*k*(k+1)/(n-k-1),n=1)
Rid_BIC_Data3[i] <- tail(log(n)*k - tLL,n=1)
Rid_R2_Data3[i] <- tail(Rid_fit3$dev.ratio,n=1)
}
mean_Rid_AIC_Data3 <- mean(Rid_AIC_Data3)
mean_Rid_BIC_Data3 <- mean(Rid_BIC_Data3)
mean_Rid_R2_Data3 <- mean(Rid_R2_Data3)
med_Rid_AIC_Data3 <- median(Rid_AIC_Data3)
med_Rid_BIC_Data3 <- median(Rid_BIC_Data3)
med_Rid_R2_Data3 <- median(Rid_R2_Data3)
Rid_row3 <- cbind("Ridge", mean_Rid_AIC_Data3,
mean_Rid_BIC_Data3,
mean_Rid_R2_Data3,
med_Rid_AIC_Data3,
med_Rid_BIC_Data3,
med_Rid_R2_Data3)
We now outline the All Possible Regression technique according to Hocking (1976), fit all possible regression models, and measure model performance. All Possible Regression is a computationally intensive variable selection technique in which the number of predictors \(t\) cannot be too large, since the number of candidate models is \(2^t\). All \(2^t\) possible models are fit and the model with the best performance is selected. It is important to mention that Hocking (1976) emphasizes approaches and modifications to All Possible Regression that streamline the evaluation of all possible subsets, which we do not consider further in this project.
R-Implementation:
# Underlying Model 1
All_AIC_Data1 <- vector(length = n_simul1)
All_BIC_Data1 <- vector(length = n_simul1)
All_R2_Data1 <- vector(length = n_simul1)
for(i in 1:n_simul1){
sig1 <- lm(Y~.,data=data1[[i]])
All_fit1 <- ols_step_all_possible(sig1)
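# Note: the minimum-AIC, minimum-BIC, and maximum-adjusted-R^2 subsets are selected
# separately below and need not correspond to the same model.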
All_AIC_Data1[i] <- min(All_fit1$aic)
All_BIC_Data1[i] <- min(All_fit1$sbc)
All_R2_Data1[i] <- max(All_fit1$adjr)
}
mean_All_AIC_Data1 <- mean(All_AIC_Data1)
mean_All_BIC_Data1 <- mean(All_BIC_Data1)
mean_All_R2_Data1 <- mean(All_R2_Data1)
med_All_AIC_Data1 <- median(All_AIC_Data1)
med_All_BIC_Data1 <- median(All_BIC_Data1)
med_All_R2_Data1 <- median(All_R2_Data1)
All_row1 <- cbind("All-Possible Regression", mean_All_AIC_Data1,
mean_All_BIC_Data1,
mean_All_R2_Data1,
med_All_AIC_Data1,
med_All_BIC_Data1,
med_All_R2_Data1)
# Underlying Model 2
All_AIC_Data2 <- vector(length = n_simul2)
All_BIC_Data2 <- vector(length = n_simul2)
All_R2_Data2 <- vector(length = n_simul2)
for(i in 1:n_simul2){
sig2 <- lm(Y~.,data=data2[[i]])
All_fit2 <- ols_step_all_possible(sig2)
All_AIC_Data2[i] <- min(All_fit2$aic)
All_BIC_Data2[i] <- min(All_fit2$sbc)
All_R2_Data2[i] <- max(All_fit2$adjr)
}
mean_All_AIC_Data2 <- mean(All_AIC_Data2)
mean_All_BIC_Data2 <- mean(All_BIC_Data2)
mean_All_R2_Data2 <- mean(All_R2_Data2)
med_All_AIC_Data2 <- median(All_AIC_Data2)
med_All_BIC_Data2 <- median(All_BIC_Data2)
med_All_R2_Data2 <- median(All_R2_Data2)
All_row2 <- cbind("All-Possible Regression", mean_All_AIC_Data2,
mean_All_BIC_Data2,
mean_All_R2_Data2,
med_All_AIC_Data2,
med_All_BIC_Data2,
med_All_R2_Data2)
# Underlying Model 3
All_AIC_Data3 <- vector(length = n_simul3)
All_BIC_Data3 <- vector(length = n_simul3)
All_R2_Data3 <- vector(length = n_simul3)
for(i in 1:n_simul3){
sig3 <- lm(Y~.,data=data3[[i]])
All_fit3 <- ols_step_all_possible(sig3)
All_AIC_Data3[i] <- min(All_fit3$aic)
All_BIC_Data3[i] <- min(All_fit3$sbc)
All_R2_Data3[i] <- max(All_fit3$adjr)
}
mean_All_AIC_Data3 <- mean(All_AIC_Data3)
mean_All_BIC_Data3 <- mean(All_BIC_Data3)
mean_All_R2_Data3 <- mean(All_R2_Data3)
med_All_AIC_Data3 <- median(All_AIC_Data3)
med_All_BIC_Data3 <- median(All_BIC_Data3)
med_All_R2_Data3 <- median(All_R2_Data3)
All_row3 <- cbind("All-Possible Regression", mean_All_AIC_Data3,
mean_All_BIC_Data3,
mean_All_R2_Data3,
med_All_AIC_Data3,
med_All_BIC_Data3,
med_All_R2_Data3)
We now outline the Forward Selection technique, fit forward selection regression models with a cutoff p-value of \(0.5\) for the \(F\)-statistic outlined below, and measure model performance. According to Hocking (1976), the Forward Selection technique starts with no variables and adds one variable at a time to the equation until either all variables are in or a criterion to stop the technique is satisfied. Furthermore, the variable considered for inclusion at any step is the one yielding the largest F-ratio among the candidates. Thus, variable \(i\) is added to the \(p\)-term equation if
\[ F_{i} = \underset{i}{\max}\left(\frac{RSS_{p} - RSS_{p+i}}{\hat{\sigma}_{p+i}^2}\right) > F_{in} \]
R-Implementation:
#Underlying Model 1
For_AIC_Data1 <- vector(length = n_simul1)
For_BIC_Data1 <- vector(length = n_simul1)
For_R2_Data1 <- vector(length = n_simul1)
for(i in 1:n_simul1){
sig1 <- lm(Y~.,data=data1[[i]])
For_fit1 <- ols_step_forward_p(sig1, penter = 0.5)
For_AIC_Data1[i] <- min(For_fit1$aic)
For_BIC_Data1[i] <- min(For_fit1$sbc)
For_R2_Data1[i] <- max(For_fit1$adjr)
}
mean_For_AIC_Data1 <- mean(For_AIC_Data1)
mean_For_BIC_Data1 <- mean(For_BIC_Data1)
mean_For_R2_Data1 <- mean(For_R2_Data1)
med_For_AIC_Data1 <- median(For_AIC_Data1)
med_For_BIC_Data1 <- median(For_BIC_Data1)
med_For_R2_Data1 <- median(For_R2_Data1)
For_row1 <- cbind("Forward Selection", mean_For_AIC_Data1,
mean_For_BIC_Data1,
mean_For_R2_Data1,
med_For_AIC_Data1,
med_For_BIC_Data1,
med_For_R2_Data1)
#Underlying Model 2
For_AIC_Data2 <- vector(length = n_simul2)
For_BIC_Data2 <- vector(length = n_simul2)
For_R2_Data2 <- vector(length = n_simul2)
for(i in 1:n_simul2){
sig2 <- lm(Y~.,data=data2[[i]])
For_fit2 <- ols_step_forward_p(sig2, penter = 0.5)
For_AIC_Data2[i] <- min(For_fit2$aic)
For_BIC_Data2[i] <- min(For_fit2$sbc)
For_R2_Data2[i] <- max(For_fit2$adjr)
}
mean_For_AIC_Data2 <- mean(For_AIC_Data2)
mean_For_BIC_Data2 <- mean(For_BIC_Data2)
mean_For_R2_Data2 <- mean(For_R2_Data2)
med_For_AIC_Data2 <- median(For_AIC_Data2)
med_For_BIC_Data2 <- median(For_BIC_Data2)
med_For_R2_Data2 <- median(For_R2_Data2)
For_row2 <- cbind("Forward Selection", mean_For_AIC_Data2,
mean_For_BIC_Data2,
mean_For_R2_Data2,
med_For_AIC_Data2,
med_For_BIC_Data2,
med_For_R2_Data2)
#Underlying Model 3
For_AIC_Data3 <- vector(length = n_simul3)
For_BIC_Data3 <- vector(length = n_simul3)
For_R2_Data3 <- vector(length = n_simul3)
for(i in 1:n_simul3){
sig3 <- lm(Y~.,data=data3[[i]])
For_fit3 <- ols_step_forward_p(sig3, penter = 0.5)
For_AIC_Data3[i] <- min(For_fit3$aic)
For_BIC_Data3[i] <- min(For_fit3$sbc)
For_R2_Data3[i] <- max(For_fit3$adjr)
}
mean_For_AIC_Data3 <- mean(For_AIC_Data3)
mean_For_BIC_Data3 <- mean(For_BIC_Data3)
mean_For_R2_Data3 <- mean(For_R2_Data3)
med_For_AIC_Data3 <- median(For_AIC_Data3)
med_For_BIC_Data3 <- median(For_BIC_Data3)
med_For_R2_Data3 <- median(For_R2_Data3)
For_row3 <- cbind("Forward Selection", mean_For_AIC_Data3,
mean_For_BIC_Data3,
mean_For_R2_Data3,
med_For_AIC_Data3,
med_For_BIC_Data3,
med_For_R2_Data3)
We now outline the Backward Elimination technique, fit backward elimination models with a cutoff p-value of \(0.1\) for the \(F\)-statistic outlined below, and measure model performance. According to Hocking (1976), the Backward Elimination technique starts with all variables and eliminates one variable at a time from the equation until either all variables are out or a criterion to stop the technique is satisfied. Furthermore, the variable considered for exclusion at any step is the one yielding the smallest F-ratio among the candidates, provided that F-ratio does not exceed a specified value. Thus, variable \(i\) is removed from the \(p\)-term equation if
\[ F_{i} = \underset{i}{\min}\left(\frac{RSS_{p-i} - RSS_{p}}{\hat{\sigma}_{p}^2}\right) < F_{out} \]
R-Implementation:
#Underlying Model 1
Back_AIC_Data1 <- vector(length = n_simul1)
Back_BIC_Data1 <- vector(length = n_simul1)
Back_R2_Data1 <- vector(length = n_simul1)
for(i in 1:n_simul1){
sig1 <- lm(Y~.,data=data1[[i]])
Back_fit1 <- ols_step_backward_p(sig1, prem = 0.1)
Back_AIC_Data1[i] <- min(Back_fit1$aic)
Back_BIC_Data1[i] <- min(Back_fit1$sbc)
Back_R2_Data1[i] <- max(Back_fit1$adjr)
}
mean_Back_AIC_Data1 <- mean(Back_AIC_Data1)
mean_Back_BIC_Data1 <- mean(Back_BIC_Data1)
mean_Back_R2_Data1 <- mean(Back_R2_Data1)
med_Back_AIC_Data1 <- median(Back_AIC_Data1)
med_Back_BIC_Data1 <- median(Back_BIC_Data1)
med_Back_R2_Data1 <- median(Back_R2_Data1)
Back_row1 <- cbind("Backward Selection", mean_Back_AIC_Data1,
mean_Back_BIC_Data1,
mean_Back_R2_Data1,
med_Back_AIC_Data1,
med_Back_BIC_Data1,
med_Back_R2_Data1)
#Underlying Model 2
Back_AIC_Data2 <- vector(length = n_simul2)
Back_BIC_Data2 <- vector(length = n_simul2)
Back_R2_Data2 <- vector(length = n_simul2)
for(i in 1:n_simul2){
sig2 <- lm(Y~.,data=data2[[i]])
Back_fit2 <- ols_step_backward_p(sig2, prem = 0.1)
Back_AIC_Data2[i] <- min(Back_fit2$aic)
Back_BIC_Data2[i] <- min(Back_fit2$sbc)
Back_R2_Data2[i] <- max(Back_fit2$adjr)
}
mean_Back_AIC_Data2 <- mean(Back_AIC_Data2)
mean_Back_BIC_Data2 <- mean(Back_BIC_Data2)
mean_Back_R2_Data2 <- mean(Back_R2_Data2)
med_Back_AIC_Data2 <- median(Back_AIC_Data2)
med_Back_BIC_Data2 <- median(Back_BIC_Data2)
med_Back_R2_Data2 <- median(Back_R2_Data2)
Back_row2 <- cbind("Backward Selection", mean_Back_AIC_Data2,
mean_Back_BIC_Data2,
mean_Back_R2_Data2,
med_Back_AIC_Data2,
med_Back_BIC_Data2,
med_Back_R2_Data2)
#Underlying Model 3
Back_AIC_Data3 <- vector(length = n_simul3)
Back_BIC_Data3 <- vector(length = n_simul3)
Back_R2_Data3 <- vector(length = n_simul3)
for(i in 1:n_simul3){
sig3 <- lm(Y~.,data=data3[[i]])
Back_fit3 <- ols_step_backward_p(sig3, prem = 0.1)
Back_AIC_Data3[i] <- min(Back_fit3$aic)
Back_BIC_Data3[i] <- min(Back_fit3$sbc)
Back_R2_Data3[i] <- max(Back_fit3$adjr)
}
mean_Back_AIC_Data3 <- mean(Back_AIC_Data3)
mean_Back_BIC_Data3 <- mean(Back_BIC_Data3)
mean_Back_R2_Data3 <- mean(Back_R2_Data3)
med_Back_AIC_Data3 <- median(Back_AIC_Data3)
med_Back_BIC_Data3 <- median(Back_BIC_Data3)
med_Back_R2_Data3 <- median(Back_R2_Data3)
Back_row3 <- cbind("Backward Selection", mean_Back_AIC_Data3,
mean_Back_BIC_Data3,
mean_Back_R2_Data3,
med_Back_AIC_Data3,
med_Back_BIC_Data3,
med_Back_R2_Data3)
We now outline the Stepwise Selection technique, fit stepwise selection regression models with an entry cutoff p-value of \(0.15\) and a removal cutoff p-value of \(0.15\), and measure model performance. According to Hocking (1976), the Stepwise Selection technique combines Forward Selection and Backward Elimination: variables are added as in Forward Selection, but after each addition any variable already in the equation may be removed as in Backward Elimination.
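For intuition, the entry and removal F-ratios used by the forward, backward, and stepwise procedures are ordinary partial F-statistics for nested models; a minimal sketch on one simulated dataset (the choice of X1 and X2 is purely illustrative; the olsrr call below automates this search):
# Partial F-test for adding X2 to a model that already contains X1
fit_small <- lm(Y ~ X1, data = data1[[1]])
fit_large <- lm(Y ~ X1 + X2, data = data1[[1]])
anova(fit_small, fit_large)  # F = (RSS_p - RSS_{p+i}) / sigma_hat^2_{p+i}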
R-Implementation:
#Underlying Model 1
Step_AIC_Data1 <- vector(length = n_simul1)
Step_BIC_Data1 <- vector(length = n_simul1)
Step_R2_Data1 <- vector(length = n_simul1)
for(i in 1:n_simul1){
sig1 <- lm(Y~.,data=data1[[i]])
Step_fit1 <- ols_step_both_p(sig1, pent = 0.15, prem = 0.15)
Step_AIC_Data1[i] <- min(Step_fit1$aic)
Step_BIC_Data1[i] <- min(Step_fit1$sbc)
Step_R2_Data1[i] <- max(Step_fit1$adjr)
}
mean_Step_AIC_Data1 <- mean(Step_AIC_Data1)
mean_Step_BIC_Data1 <- mean(Step_BIC_Data1)
mean_Step_R2_Data1 <- mean(Step_R2_Data1)
med_Step_AIC_Data1 <- median(Step_AIC_Data1)
med_Step_BIC_Data1 <- median(Step_BIC_Data1)
med_Step_R2_Data1 <- median(Step_R2_Data1)
Step_row1 <- cbind("Stepwise Selection", mean_Step_AIC_Data1,
mean_Step_BIC_Data1,
mean_Step_R2_Data1,
med_Step_AIC_Data1,
med_Step_BIC_Data1,
med_Step_R2_Data1)
#Underlying Model 2
Step_AIC_Data2 <- vector(length = n_simul2)
Step_BIC_Data2 <- vector(length = n_simul2)
Step_R2_Data2 <- vector(length = n_simul2)
for(i in 1:n_simul2){
sig2 <- lm(Y~.,data=data2[[i]])
Step_fit2 <- ols_step_both_p(sig2, pent = 0.15, prem = 0.15)
Step_AIC_Data2[i] <- min(Step_fit2$aic)
Step_BIC_Data2[i] <- min(Step_fit2$sbc)
Step_R2_Data2[i] <- max(Step_fit2$adjr)
}
mean_Step_AIC_Data2 <- mean(Step_AIC_Data2)
mean_Step_BIC_Data2 <- mean(Step_BIC_Data2)
mean_Step_R2_Data2 <- mean(Step_R2_Data2)
med_Step_AIC_Data2 <- median(Step_AIC_Data2)
med_Step_BIC_Data2 <- median(Step_BIC_Data2)
med_Step_R2_Data2 <- median(Step_R2_Data2)
Step_row2 <- cbind("Stepwise Selection", mean_Step_AIC_Data2,
mean_Step_BIC_Data2,
mean_Step_R2_Data2,
med_Step_AIC_Data2,
med_Step_BIC_Data2,
med_Step_R2_Data2)
#Underlying Model 3
Step_AIC_Data3 <- vector(length = n_simul3)
Step_BIC_Data3 <- vector(length = n_simul3)
Step_R2_Data3 <- vector(length = n_simul3)
for(i in 1:n_simul3){
sig3 <- lm(Y~.,data=data3[[i]])
Step_fit3 <- ols_step_both_p(sig3, pent = 0.15, prem = 0.15)
Step_AIC_Data3[i] <- min(Step_fit3$aic)
Step_BIC_Data3[i] <- min(Step_fit3$sbc)
Step_R2_Data3[i] <- max(Step_fit3$adjr)
}
mean_Step_AIC_Data3 <- mean(Step_AIC_Data3)
mean_Step_BIC_Data3 <- mean(Step_BIC_Data3)
mean_Step_R2_Data3 <- mean(Step_R2_Data3)
med_Step_AIC_Data3 <- median(Step_AIC_Data3)
med_Step_BIC_Data3 <- median(Step_BIC_Data3)
med_Step_R2_Data3 <- median(Step_R2_Data3)
Step_row3 <- cbind("Stepwise Selection", mean_Step_AIC_Data3,
mean_Step_BIC_Data3,
mean_Step_R2_Data3,
med_Step_AIC_Data3,
med_Step_BIC_Data3,
med_Step_R2_Data3)
Results Tables:
Table_Data1 <- data.frame(rbind(OLS_row1,LASSO_row1,Rid_row1,All_row1
,For_row1,Back_row1,Step_row1))
colnames(Table_Data1) <- c("Method","Mean AIC","Mean BIC","Mean Adj. R-Squared","Median AIC","Median BIC","Median Adj. R-Squared")
Table_Data2 <- data.frame(rbind(OLS_row2,LASSO_row2,Rid_row2,All_row2
,For_row2,Back_row2,Step_row2))
colnames(Table_Data2) <- c("Method","Mean AIC","Mean BIC","Mean Adj. R-Squared","Median AIC","Median BIC","Median Adj. R-Squared")
Table_Data3 <- data.frame(rbind(OLS_row3,LASSO_row3,Rid_row3,All_row3
,For_row3,Back_row3,Step_row3))
colnames(Table_Data3) <- c("Method","Mean AIC","Mean BIC","Mean Adj. R-Squared","Median AIC","Median BIC","Median Adj. R-Squared")
Table_Data1[, c(2:ncol(Table_Data1))] <- sapply(Table_Data1[, c(2:ncol(Table_Data1))], as.numeric)
Table_Data1[, c(2:ncol(Table_Data1))] <- round(Table_Data1[, c(2:ncol(Table_Data1))],2)
Table_Data2[, c(2:ncol(Table_Data2))] <- sapply(Table_Data2[, c(2:ncol(Table_Data2))], as.numeric)
Table_Data2[, c(2:ncol(Table_Data2))] <- round(Table_Data2[, c(2:ncol(Table_Data2))],2)
Table_Data3[, c(2:ncol(Table_Data3))] <- sapply(Table_Data3[, c(2:ncol(Table_Data3))], as.numeric)
Table_Data3[, c(2:ncol(Table_Data3))] <- round(Table_Data3[, c(2:ncol(Table_Data3))],2)
Table_Full <- cbind(Table_Data1, Table_Data2[,-1], Table_Data3[,-1])
Table_Full[, c(2:ncol(Table_Full))] <- sapply(Table_Full[, c(2:ncol(Table_Full))], as.numeric)
Table_Full[, c(2:ncol(Table_Full))] <- round(Table_Full[, c(2:ncol(Table_Full))],2)
kable(Table_Data1) %>%
kable_styling("striped") %>%
add_header_above(c(" " = 1, "Underlying Model 1" = 6))
kable(Table_Data2) %>%
kable_styling("striped") %>%
add_header_above(c(" " = 1, "Underlying Model 2" = 6))
kable(Table_Data3) %>%
kable_styling("striped") %>%
add_header_above(c(" " = 1, "Underlying Model 3" = 6))
Underlying Model 1:

| Method | Mean AIC | Mean BIC | Mean Adj. R-Squared | Median AIC | Median BIC | Median Adj. R-Squared |
|---|---|---|---|---|---|---|
| OLS | 107.52 | 117.48 | 0.70 | 109.41 | 119.37 | 0.71 |
| LASSO | -471.70 | -476.46 | 0.82 | -441.02 | -446.14 | 0.83 |
| Ridge | -466.06 | -471.19 | 0.82 | -436.67 | -441.80 | 0.83 |
| All-Possible Regression | 102.26 | 107.93 | 0.75 | 102.66 | 108.30 | 0.76 |
| Forward Selection | 102.35 | 108.10 | 0.75 | 102.66 | 108.50 | 0.76 |
| Backward Selection | 102.32 | 108.09 | 0.75 | 102.66 | 108.33 | 0.76 |
| Stepwise Selection | 103.18 | 108.20 | 0.72 | 103.70 | 108.30 | 0.73 |
Underlying Model 2:

| Method | Mean AIC | Mean BIC | Mean Adj. R-Squared | Median AIC | Median BIC | Median Adj. R-Squared |
|---|---|---|---|---|---|---|
| OLS | 102.19 | 112.15 | 0.67 | 104.08 | 114.04 | 0.68 |
| LASSO | -331.91 | -336.64 | 0.81 | -299.24 | -304.37 | 0.82 |
| Ridge | -329.21 | -334.34 | 0.80 | -297.19 | -302.31 | 0.81 |
| All-Possible Regression | 96.73 | 102.50 | 0.73 | 98.26 | 103.62 | 0.73 |
| Forward Selection | 97.28 | 103.30 | 0.72 | 98.59 | 104.13 | 0.72 |
| Backward Selection | 96.82 | 102.72 | 0.73 | 98.27 | 103.75 | 0.73 |
| Stepwise Selection | 97.99 | 103.24 | 0.70 | 98.83 | 103.90 | 0.71 |
Underlying Model 3:

| Method | Mean AIC | Mean BIC | Mean Adj. R-Squared | Median AIC | Median BIC | Median Adj. R-Squared |
|---|---|---|---|---|---|---|
| OLS | 84.53 | 94.49 | 0.89 | 86.26 | 96.21 | 0.90 |
| LASSO | -480.89 | -485.85 | 0.94 | -449.17 | -454.30 | 0.94 |
| Ridge | -473.97 | -479.09 | 0.93 | -442.46 | -447.58 | 0.93 |
| All-Possible Regression | 77.63 | 82.14 | 0.91 | 78.87 | 83.43 | 0.92 |
| Forward Selection | 77.70 | 82.22 | 0.91 | 78.87 | 83.43 | 0.92 |
| Backward Selection | 77.73 | 82.29 | 0.91 | 79.05 | 83.59 | 0.92 |
| Stepwise Selection | 78.31 | 82.23 | 0.91 | 79.25 | 83.43 | 0.91 |
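The boxplots below summarize the distributions of AIC, BIC, and Adjusted \(R^2\) across all simulated datasets, first pooled over the three underlying models for each method and later grouped by underlying model.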
AIC_df <- data.frame(
OLS = c(OLS_AIC_Data1, OLS_AIC_Data2, OLS_AIC_Data3),
LASSO = c(LASSO_AIC_Data1, LASSO_AIC_Data2, LASSO_AIC_Data3),
Ridge = c(Rid_AIC_Data1, Rid_AIC_Data2, Rid_AIC_Data3),
`All Possible Regression` = c(unlist(All_AIC_Data1), unlist(All_AIC_Data2), unlist(All_AIC_Data3)),
`Forward Selection` = c(unlist(For_AIC_Data1), unlist(For_AIC_Data2), unlist(For_AIC_Data3)),
`Backward Elimination` = c(unlist(Back_AIC_Data1), unlist(Back_AIC_Data2), unlist(Back_AIC_Data3)),
`Stepwise Selection` = c(unlist(Step_AIC_Data1), unlist(Step_AIC_Data2),unlist(Step_AIC_Data3))
)
gathered_data <- gather(AIC_df, key = "key", value = "value")
colnames(gathered_data) <- c("key","value")
gathered_data %>%
ggplot( aes(reorder(key, value, FUN = median, .desc =TRUE), y=value, fill=key)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.5) +
geom_jitter(color="black", size=0.4, alpha=0.8) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)) +
coord_flip() +
labs(title = "AIC Distribution Over All Models",
x="Variable Selection Technique",
y="AIC")
BIC_df <- data.frame(
OLS = c(OLS_BIC_Data1, OLS_BIC_Data2, OLS_BIC_Data3),
LASSO = c(LASSO_BIC_Data1, LASSO_BIC_Data2, LASSO_BIC_Data3),
Ridge = c(Rid_BIC_Data1, Rid_BIC_Data2, Rid_BIC_Data3),
`All Possible Regression` = c(unlist(All_BIC_Data1), unlist(All_BIC_Data2), unlist(All_BIC_Data3)),
`Forward Selection` = c(unlist(For_BIC_Data1), unlist(For_BIC_Data2), unlist(For_BIC_Data3)),
`Backward Elimination` = c(unlist(Back_BIC_Data1), unlist(Back_BIC_Data2), unlist(Back_BIC_Data3)),
`Stepwise Selection` = c(unlist(Step_BIC_Data1), unlist(Step_BIC_Data2),unlist(Step_BIC_Data3))
)
gathered_data <- gather(BIC_df, key = "key", value = "value")
colnames(gathered_data) <- c("key","value")
gathered_data %>%
ggplot( aes(reorder(key, value, FUN = median, .desc =TRUE), y=value, fill=key)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.5) +
geom_jitter(color="black", size=0.4, alpha=0.8) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)) +
coord_flip() +
labs(title = "BIC Distribution Over All Models",
x="Variable Selection Technique",
y="BIC")
R2_df <- data.frame(
OLS = c(OLS_R2_Data1, OLS_R2_Data2, OLS_R2_Data3),
LASSO = c(LASSO_R2_Data1, LASSO_R2_Data2, LASSO_R2_Data3),
Ridge = c(Rid_R2_Data1, Rid_R2_Data2, Rid_R2_Data3),
`All Possible Regression` = c(unlist(All_R2_Data1), unlist(All_R2_Data2), unlist(All_R2_Data3)),
`Forward Selection` = c(unlist(For_R2_Data1), unlist(For_R2_Data2), unlist(For_R2_Data3)),
`Backward Elimination` = c(unlist(Back_R2_Data1), unlist(Back_R2_Data2), unlist(Back_R2_Data3)),
`Stepwise Selection` = c(unlist(Step_R2_Data1), unlist(Step_R2_Data2),unlist(Step_R2_Data3))
)
gathered_data <- gather(R2_df, key = "key", value = "value")
colnames(gathered_data) <- c("key","value")
gathered_data %>%
ggplot( aes(reorder(key, value, FUN = median, .desc =TRUE), y=value, fill=key)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.5) +
geom_jitter(color="black", size=0.4, alpha=0.8) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)) +
coord_flip() +
labs(title ="Adjusted R-Squared Distribution Over All Models",
x="Variable Selection Technique",
y=expression(paste('Adjusted',R^2)))
Table_Full_New <- rbind(Table_Full[,c(1,2,3,5,6)],Table_Full[,c(1,8,9,11,12)],Table_Full[,c(1,14,15,17,18)])
gathered_Full <- gather(Table_Full_New , key = "key", value = "value",-Method)
colnames(gathered_Full) <- c("method","metric","value")
ggplot(data = gathered_Full , aes(x=method, y=value)) +
geom_boxplot(aes(fill=metric)) +
facet_wrap( ~ method, scales="free") +
theme_light()
Table_Full_New <- rbind(Table_Full[,c(1,4,7)],Table_Full[,c(1,10,13)],Table_Full[,c(1,16,19)])
gathered_Full <- gather(Table_Full_New , key = "key", value = "value",-Method)
colnames(gathered_Full) <- c("method","metric","value")
ggplot(data = gathered_Full , aes(x=method, y=value)) +
geom_boxplot(aes(fill=metric)) +
facet_wrap( ~ method, scales="free") +
theme_light()
AIC_ByMod_df <- data.frame(
'Model-1' = c(OLS_AIC_Data1, LASSO_AIC_Data1, Rid_AIC_Data1,unlist(All_AIC_Data1),unlist(Back_AIC_Data1),unlist(Step_AIC_Data1)),
'Model-2' = c(OLS_AIC_Data2, LASSO_AIC_Data2, Rid_AIC_Data2,unlist(All_AIC_Data2),unlist(Back_AIC_Data2),unlist(Step_AIC_Data2)),
'Model-3' = c(OLS_AIC_Data3, LASSO_AIC_Data3, Rid_AIC_Data3,unlist(All_AIC_Data3),unlist(Back_AIC_Data3),unlist(Step_AIC_Data3))
)
BIC_ByMod_df <- data.frame(
'Model-1' = c(OLS_BIC_Data1, LASSO_BIC_Data1, Rid_BIC_Data1,unlist(All_BIC_Data1),unlist(Back_BIC_Data1),unlist(Step_BIC_Data1)),
'Model-2' = c(OLS_BIC_Data2, LASSO_BIC_Data2, Rid_BIC_Data2,unlist(All_BIC_Data2),unlist(Back_BIC_Data2),unlist(Step_BIC_Data2)),
'Model-3' = c(OLS_BIC_Data3, LASSO_BIC_Data3, Rid_BIC_Data3,unlist(All_BIC_Data3),unlist(Back_BIC_Data3),unlist(Step_BIC_Data3))
)
R2_ByMod_df <- data.frame(
'Model-1' = c(OLS_R2_Data1, LASSO_R2_Data1, Rid_R2_Data1,unlist(All_R2_Data1),unlist(Back_R2_Data1),unlist(Step_R2_Data1)),
'Model-2' = c(OLS_R2_Data2, LASSO_R2_Data2, Rid_R2_Data2,unlist(All_R2_Data2),unlist(Back_R2_Data2),unlist(Step_R2_Data2)),
'Model-3' = c(OLS_R2_Data3, LASSO_R2_Data3, Rid_R2_Data3,unlist(All_R2_Data3),unlist(Back_R2_Data3),unlist(Step_R2_Data3))
)
AIC_ByMod_gather <- gather(AIC_ByMod_df, key = "key", value = "value")
colnames(AIC_ByMod_gather) <- c("key","value")
BIC_ByMod_gather <- gather(BIC_ByMod_df, key = "key", value = "value")
colnames(BIC_ByMod_gather) <- c("key","value")
R2_ByMod_gather <- gather(R2_ByMod_df, key = "key", value = "value")
colnames(R2_ByMod_gather) <- c("key","value")
AIC_ByMod_gather %>%
ggplot( aes(reorder(key, value, FUN = median, .desc =TRUE), y=value, fill=key)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.5) +
geom_jitter(color="black", size=0.4, alpha=0.8) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)) +
coord_flip() +
labs(title = "AIC Distribution By Model",
x="Variable Selection Technique",
y="AIC")
BIC_ByMod_gather %>%
ggplot( aes(reorder(key, value, FUN = median, .desc =TRUE), y=value, fill=key)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.5) +
geom_jitter(color="black", size=0.4, alpha=0.8) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)) +
coord_flip() +
labs(title = "BIC Distribution By Model",
x="Variable Selection Technique",
y="BIC")
R2_ByMod_gather %>%
ggplot( aes(reorder(key, value, FUN = median, .desc =TRUE), y=value, fill=key)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.5) +
geom_jitter(color="black", size=0.4, alpha=0.8) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)) +
coord_flip() +
labs(title = "R-Squared Distribution By Model",
x="Variable Selection Technique",
y=expression(paste(R^2)))
The observation that LASSO and Ridge outperform the stepwise methods (and OLS), and that LASSO outperforms Ridge, is in line with the recommendations of Hastie, Tibshirani, and Friedman (2009). Furthermore, there are more recent variable selection methods that may outperform LASSO and Ridge and could be considered in a future project, such as the mixed integer optimization approach to best subset selection outlined by Bertsimas, King, and Mazumder (2015).
I would like to thank Professor Mihye Ahn for recommending this personal project as a supplement to her regression and linear models course.
Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society (Series B), 58, 267-288.
Hocking, R. (1976). A Biometrics Invited Paper. The Analysis and Selection of Variables in Linear Regression. Biometrics, 32, 1-49.
Portet S. (2020). A primer on model selection using the Akaike Information Criterion. Infectious Disease Modelling, 5, 111–128.
Dziak, J. J., Coffman, D. L., Lanza, S. T., Li, R., & Jermiin, L. S. (2020). Sensitivity and specificity of information criteria. Briefings in bioinformatics, 21(2), 553–565.
Bertsimas, D., King, A., & Mazumder, R. (2015). Best Subset Selection via a Modern Optimization Lens. arXiv: Methodology.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction. New York: Springer.
James, G., Witten, D., Hastie, T. & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer.