So far, we’ve learned how to create multiple linear regression models and compare them to each other using the general linear F-test to decide which one to keep.
Analysis of Variance Table
Model 1: TuG ~ Age
Model 2: TuG ~ Age + Weight + MoCA + Fear_falling
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1     42 248.39
2     39 211.36  3    37.029 2.2775 0.09475 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
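With \(p = 0.095\), the full model improves on the reduced model at the 0.10 significance level but not at the conventional 0.05 level, so the evidence for keeping the full model is modest. Even if we do keep it, can we be certain that the full model is the best model to use in this case?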
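For reference, a comparison like the one at the top of this section comes from passing the reduced and full models to anova(). The sketch below uses the built-in mtcars data as a stand-in, because the TuG dataset is not reproduced here; the variables mpg, wt, hp, and qsec are assumptions chosen only for illustration.

```r
# Stand-in data: mtcars ships with R. The TuG data is not assumed available.
reduced <- lm(mpg ~ wt, data = mtcars)
full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# General linear F-test: does adding hp and qsec reduce the residual sum of
# squares by more than we would expect from chance alone?
f_test <- anova(reduced, full)
print(f_test)

# The p-value sits in the second row of the "Pr(>F)" column.
p_value <- f_test[["Pr(>F)"]][2]
p_value
```

A small p-value favors the full model; a large one suggests the extra predictors are not earning their place.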
Model Selection Approaches
One way to check whether we have the best model is to compare a metric (\(p\), \(r^2\), etc.) across all combinations of predictors and see which combination scores best. For example, if we had two predictors, \(x_1\) and \(x_2\), these are the models that we could compare:
No predictors
\(x_1\) only
\(x_2\) only
\(x_1\) and \(x_2\)
With only four possible models, it would be easy to compare all of them by hand. But what if we had three predictors? Four? Even more?
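Each predictor is either in a model or out of it, so \(k\) predictors give \(2^k\) candidate models. The sketch below shows how quickly that number grows and lists the candidate subsets with combn(). The predictor names are borrowed from the example above purely as labels; no data is needed.

```r
# Number of candidate models for k = 1..10 predictors: 2^k
k <- 1:10
n_models <- 2^k
print(data.frame(predictors = k, models = n_models))

# List every non-empty subset of a small predictor set.
predictors <- c("Age", "Weight", "MoCA", "Fear_falling")
subsets <- unlist(
  lapply(seq_along(predictors),
         function(m) combn(predictors, m, simplify = FALSE)),
  recursive = FALSE
)

# 2^4 - 1 non-empty subsets; the intercept-only model brings the total to 2^4.
length(subsets)
```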
olsrr
For this section, we will primarily be working with functions from the olsrr package. Install it once with install.packages("olsrr"), then load it:
library(olsrr)
All Possible Regression
Fortunately, R has a way to automatically compare every possible model for us. This is called all possible regression and can be done using the ols_step_all_possible() function.
This lets us automatically generate \(r^2\), \(r^2_{adj}\), and Mallows's \(C_p\), among other criteria, for every combination of predictors, ordered from the fewest predictors to the most.
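To make it concrete what such a table contains, here is a minimal base-R sketch that fits every subset and computes the same three quantities by hand. It is not how olsrr is implemented; the mtcars data and the four predictors are stand-ins assumed for illustration.

```r
# Stand-in data and predictors (assumptions; not the TuG data).
dat <- mtcars
response <- "mpg"
predictors <- c("wt", "hp", "qsec", "drat")

full_fit <- lm(reformulate(predictors, response), data = dat)
n <- nrow(dat)
mse_full <- sum(resid(full_fit)^2) / df.residual(full_fit)

# Every non-empty subset of the predictors, smallest subsets first.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(m) combn(predictors, m, simplify = FALSE)),
  recursive = FALSE
)

all_possible <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response), data = dat)
  s <- summary(fit)
  p <- length(coef(fit))               # parameters, intercept included
  rss <- sum(resid(fit)^2)
  data.frame(
    n_pred = length(vars),
    predictors = paste(vars, collapse = " "),
    rsquare = s$r.squared,
    adjr = s$adj.r.squared,
    cp = rss / mse_full - n + 2 * p    # Mallows's C_p
  )
}))
print(all_possible)
```

For the full model, \(C_p\) always equals the number of parameters, which is why the four-predictor model in this tutorial reports a \(C_p\) of exactly 5.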
Choosing a Criterion
Let’s use what we just learned about sorting a data frame to find the combination of predictors with the highest adjusted \(r^2\).
result <- ols_step_all_possible(model_f)$result
result[order(result$adjr, decreasing = TRUE), c("predictors", "adjr")]
predictors adjr
7 Age Fear_falling 0.32064670
13 Age MoCA Fear_falling 0.31720741
12 Age Weight Fear_falling 0.31489686
15 Age Weight MoCA Fear_falling 0.31420767
1 Age 0.25163122
6 Age MoCA 0.24840426
11 Age Weight MoCA 0.24746266
5 Age Weight 0.24741500
10 MoCA Fear_falling 0.18316801
14 Weight MoCA Fear_falling 0.16817562
4 Fear_falling 0.13872573
9 Weight Fear_falling 0.11886454
3 MoCA 0.06148997
8 Weight MoCA 0.04549678
2 Weight -0.02234858
We could also sort by other criteria, such as the Akaike information criterion (AIC), where lower values indicate a better model:
result <- ols_step_all_possible(model_f)$result
result[order(result$aic), c("predictors", "aic")]
predictors aic
7 Age Fear_falling 203.7052
13 Age MoCA Fear_falling 204.8409
12 Age Weight Fear_falling 204.9895
15 Age Weight MoCA Fear_falling 205.9198
1 Age 207.0227
6 Age MoCA 208.1517
5 Age Weight 208.2096
11 Age Weight MoCA 209.1203
10 MoCA Fear_falling 211.8140
4 Fear_falling 213.2054
14 Weight MoCA Fear_falling 213.5278
9 Weight Fear_falling 215.1483
3 MoCA 216.9842
8 Weight MoCA 218.6674
2 Weight 220.7490
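As a reminder of what is being sorted: for a linear model, R's AIC() is \(-2\) times the maximized log-likelihood plus 2 times the number of estimated parameters (the coefficients plus the error variance). The sketch below checks that formula by hand on a stand-in model; mtcars and its variables are assumptions for illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model
n <- nobs(fit)
rss <- sum(resid(fit)^2)
k <- length(coef(fit)) + 1                # coefficients plus the error variance

# Gaussian log-likelihood evaluated at the maximum-likelihood estimates
log_lik <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)
aic_by_hand <- -2 * log_lik + 2 * k

c(by_hand = aic_by_hand, builtin = AIC(fit))
```

The first term rewards fit and the second penalizes size, so a predictor lowers AIC only when it reduces the residual sum of squares enough to pay for itself.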
Best Subset Regression
All possible regression gives us a lot of options, but often we only care about the best model for each number of predictors. olsrr has another function that does just this:
ols_step_best_subset(model_f)
Best Subsets Regression
-------------------------------------------
Model Index Predictors
-------------------------------------------
1 Age
2 Age Fear_falling
3 Age MoCA Fear_falling
4 Age Weight MoCA Fear_falling
-------------------------------------------
Subsets Regression Summary
--------------------------------------------------------------------------------------------------------------------------------
                         Adj.        Pred
Model    R-Square    R-Square    R-Square      C(p)         AIC       SBIC         SBC        MSEP       FPE       HSP       APC
--------------------------------------------------------------------------------------------------------------------------------
    1      0.2690      0.2516        0.18    5.8324    207.0227    81.9930    212.3752    260.2336    6.1829    0.1442    0.8006
    2      0.3522      0.3206      0.2197    2.6150    203.7052    79.3247    210.8419    236.3752    5.7347    0.1342    0.7425
    3      0.3648      0.3172      0.1718    3.8250    204.8409    80.7910    213.7618    237.7204    5.8864    0.1384    0.7622
    4      0.3780      0.3142       0.113    5.0000    205.9198    82.3024    216.6249    238.9219    6.0354    0.1426    0.7815
--------------------------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria
SBIC: Sawa's Bayesian Information Criteria
SBC: Schwarz Bayesian Criteria
MSEP: Estimated error of prediction, assuming multivariate normality
FPE: Final Prediction Error
HSP: Hocking's Sp
APC: Amemiya Prediction Criteria
Although this output is fairly readable as is, we can also sort by whichever criterion we want:
result <- ols_step_best_subset(model_f)$metrics
result[order(result$adjr), c("predictors", "adjr")]
predictors adjr
1 Age 0.2516312
4 Age Weight MoCA Fear_falling 0.3142077
3 Age MoCA Fear_falling 0.3172074
2 Age Fear_falling 0.3206467
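The idea behind best subset regression can be sketched in base R: for each model size, fit every model of that size and keep the best one. Within a fixed size, \(r^2\), adjusted \(r^2\), \(C_p\), and AIC all rank models in the same order, so the sketch uses \(r^2\). This illustrates the idea and is not olsrr's implementation; mtcars and the four predictors are stand-ins.

```r
# Stand-in predictors (assumptions; not the TuG data).
predictors <- c("wt", "hp", "qsec", "drat")

# For one subset size m, fit every model of that size and keep the one
# with the highest R-squared (equivalently, the lowest RSS).
best_of_size <- function(m) {
  candidates <- combn(predictors, m, simplify = FALSE)
  fits <- lapply(candidates, function(vars) {
    lm(reformulate(vars, "mpg"), data = mtcars)
  })
  r2 <- vapply(fits, function(f) summary(f)$r.squared, numeric(1))
  best <- which.max(r2)
  data.frame(
    n_pred = m,
    predictors = paste(candidates[[best]], collapse = " "),
    rsquare = r2[best],
    adjr = summary(fits[[best]])$adj.r.squared
  )
}

best_subsets <- do.call(rbind, lapply(seq_along(predictors), best_of_size))
print(best_subsets)

# Models of different sizes are then compared on a penalized criterion.
best_subsets[which.max(best_subsets$adjr), ]
```

Plain \(r^2\) can never decrease as predictors are added, which is why the final comparison across sizes relies on a criterion that penalizes model size.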
Automatic Model Selection
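All possible regression and best subset regression work well when you only have a few predictors, but what if your dataset has a much larger number of predictors? Fitting every combination quickly becomes impractical, so we need methods that search only some of the candidate models.
Forward Selection
Forward selection is a stepwise method that builds a model by starting with no predictors. At each step, the algorithm evaluates every remaining candidate and adds the one that most improves the model, judged by a criterion such as the lowest p-value or the greatest increase in adjusted \(r^2\). The process continues until none of the remaining variables improves the model enough to be added.
The code below runs p-value-based forward selection to choose predictors of TuG. Judging by the output, model here is fit with a larger pool of candidate predictors than the four in model_f. In the stepwise summary, each row names the variable that was added at that step.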
ols_step_forward_p(model)
Stepwise Summary
-----------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-----------------------------------------------------------------------------
0 Base Model 218.812 222.380 92.302 0.00000 0.00000
1 Balance 192.134 197.486 66.811 0.47888 0.46647
2 Sway 189.778 196.915 64.819 0.52800 0.50497
3 Age 187.654 196.575 63.425 0.57022 0.53799
4 Weight 186.840 197.545 63.454 0.59685 0.55550
5 Stability 186.398 198.888 64.059 0.61861 0.56843
6 Concern_falling 186.965 201.238 65.589 0.63084 0.57097
-----------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.794 RMSE 1.689
R-Squared 0.631 MSE 2.851
Adj. R-Squared 0.571 Coef. Var 17.252
Pred R-Squared 0.337 AIC 186.965
MAE 1.396 SBC 201.238
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
-------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
-------------------------------------------------------------------
Regression 214.368 6 35.728 10.538 0.0000
Residual 125.446 37 3.390
Total 339.814 43
-------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------
(Intercept) 27.774 9.958 2.789 0.008 7.598 47.951
Balance -0.464 0.133 -0.442 -3.484 0.001 -0.734 -0.194
Sway -0.611 0.230 -0.295 -2.657 0.012 -1.078 -0.145
Age 0.103 0.048 0.255 2.134 0.040 0.005 0.201
Weight 0.038 0.021 0.209 1.840 0.074 -0.004 0.080
Stability -0.021 0.014 -0.162 -1.488 0.145 -0.049 0.008
Concern_falling 0.074 0.067 0.119 1.107 0.275 -0.062 0.210
--------------------------------------------------------------------------------------------
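The forward selection loop can be sketched in base R with add1(), which tests each candidate against the current model. This is a minimal illustration, not olsrr's implementation: the mtcars data, the response mpg, and the 0.05 entry threshold are all assumptions made for the sketch.

```r
# Stand-in data (assumption): predict mpg from the other mtcars columns.
dat <- mtcars
candidates <- setdiff(names(dat), "mpg")
upper <- reformulate(candidates)       # ~ cyl + disp + ... + carb
p_enter <- 0.05                        # entry threshold chosen for this sketch
selected <- character(0)

fit <- lm(mpg ~ 1, data = dat)         # start with no predictors
while (length(selected) < length(candidates)) {
  # F-test for adding each remaining candidate to the current model.
  # The first row is the "no change" baseline, so drop it.
  trial <- add1(fit, scope = upper, test = "F")[-1, , drop = FALSE]

  best <- which.min(trial[["Pr(>F)"]])
  if (trial[["Pr(>F)"]][best] >= p_enter) break   # no candidate qualifies

  # Add the candidate with the smallest p-value and continue.
  selected <- c(selected, rownames(trial)[best])
  fit <- update(fit, as.formula(paste(". ~ . +", rownames(trial)[best])))
}

selected
formula(fit)
```

A stricter or looser threshold changes where the loop stops, which helps explain why a final model can still contain predictors whose p-values are above 0.05.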
Backward Elimination
Backward elimination starts with a full model that includes all candidate predictors. At each step, the variable that contributes the least (typically the one with the highest p-value) is removed. The process repeats until every remaining variable meets the threshold for staying in the model. In the stepwise summary below, each row names the variable that was removed at that step.
ols_step_backward_p(model)
Stepwise Summary
----------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
----------------------------------------------------------------------------
0 Full Model 194.752 223.299 84.867 0.69370 0.54583
1 Falls 192.752 219.515 81.833 0.69369 0.56096
2 PPID 190.974 215.952 78.837 0.69215 0.57298
3 Height 189.282 212.476 75.895 0.68998 0.58342
4 MoCA 188.276 209.686 73.219 0.68290 0.58681
5 Conc_Mvmt_Proc 187.173 206.799 70.646 0.67637 0.59070
----------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.822 RMSE 1.581
R-Squared 0.676 MSE 2.499
Adj. R-Squared 0.591 Coef. Var 16.851
Pred R-Squared 0.336 AIC 187.173
MAE 1.305 SBC 206.799
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
------------------------------------------------------------------
Regression 229.840 9 25.538 7.895 0.0000
Residual 109.974 34 3.235
Total 339.814 43
------------------------------------------------------------------
Parameter Estimates
------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
------------------------------------------------------------------------------------------------
(Intercept) 11.065 12.735 0.869 0.391 -14.815 36.944
Age 0.147 0.052 0.364 2.808 0.008 0.041 0.253
Sex 1.087 0.876 0.178 1.240 0.223 -0.694 2.867
Weight 0.052 0.025 0.283 2.058 0.047 0.001 0.102
Balance -0.329 0.145 -0.313 -2.272 0.030 -0.623 -0.035
Concern_falling 0.132 0.076 0.211 1.730 0.093 -0.023 0.287
Fear_falling 0.027 0.016 0.238 1.682 0.102 -0.006 0.060
Balance_confidence 0.057 0.028 0.480 2.035 0.050 0.000 0.113
Sway -0.459 0.235 -0.222 -1.951 0.059 -0.938 0.019
Stability -0.051 0.023 -0.398 -2.262 0.030 -0.097 -0.005
------------------------------------------------------------------------------------------------
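Backward elimination can be sketched the same way with drop1(), which tests the effect of removing each term from the current model. As before, this is only an illustration under stated assumptions: mtcars stands in for the TuG data, and the 0.05 removal threshold is chosen for the sketch.

```r
# Stand-in data (assumption): start from mpg regressed on all other columns.
dat <- mtcars
candidates <- setdiff(names(dat), "mpg")
p_remove <- 0.05                       # removal threshold chosen for this sketch
removed <- character(0)

fit <- lm(reformulate(candidates, "mpg"), data = dat)   # full model
while (length(attr(terms(fit), "term.labels")) > 0) {
  # F-test for dropping each term from the current model.
  # The first row is the "no change" baseline, so drop it.
  trial <- drop1(fit, test = "F")[-1, , drop = FALSE]

  worst <- which.max(trial[["Pr(>F)"]])
  if (trial[["Pr(>F)"]][worst] <= p_remove) break       # everything left qualifies

  # Remove the term with the largest p-value and refit.
  removed <- c(removed, rownames(trial)[worst])
  fit <- update(fit, as.formula(paste(". ~ . -", rownames(trial)[worst])))
}

removed
formula(fit)
```

Forward selection and backward elimination search in opposite directions, so they are not guaranteed to arrive at the same final model.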
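Understanding Forward and Backward Selection
As is, the output may be a bit hard to understand. Let’s visualize both of these methods to see what they are giving us. Note that the plots use the AIC-based versions of each method, ols_step_forward_aic() and ols_step_backward_aic(), rather than the p-value versions run above.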
First, forward selection:
plot(ols_step_forward_aic(model))
Backward selection:
plot(ols_step_backward_aic(model))
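The quantity traced in AIC-based stepwise selection can also be tabulated with base R's step(), which adds or removes one term at a time according to AIC. This sketch is a stand-in for, not a reproduction of, the olsrr calls: it assumes the mtcars data, and step() reports AIC up to an additive constant, so its values will not match AIC() exactly.

```r
dat <- mtcars                                   # stand-in data (assumption)
null_fit <- lm(mpg ~ 1, data = dat)
full_fit <- lm(reformulate(setdiff(names(dat), "mpg"), "mpg"), data = dat)

# Forward: at each step, add the term that lowers AIC the most.
fwd <- step(null_fit, scope = formula(full_fit),
            direction = "forward", trace = 0)

# Backward: at each step, remove the term whose removal lowers AIC the most.
bwd <- step(full_fit, direction = "backward", trace = 0)

# The $anova component records the term changed and the AIC after each step.
fwd$anova[, c("Step", "AIC")]
bwd$anova[, c("Step", "AIC")]
```

In both directions the AIC falls at every recorded step, and the search stops as soon as no single addition or removal lowers it further.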
Drawbacks
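We show you forward selection and backward elimination so that you have an idea of what is used in the field.
However, these methods have fallen out of favor in recent years. Because the same data are used both to choose the predictors and to test them, the p-values of the final model tend to look better than they should. Stepwise methods are therefore suited to exploratory data analysis, yet they are often misused for confirmatory data analysis.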
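A small simulation illustrates the concern. The sketch below generates a response and 20 candidate predictors that are all pure noise, then records the smallest p-value among them, which is the p-value of the variable that the first step of forward selection would pick. The sample size, the number of predictors, the seed, and the number of simulations are arbitrary choices made for illustration.

```r
set.seed(1)            # arbitrary seed so the run is repeatable
n <- 50                # observations per simulated dataset
k <- 20                # candidate predictors, all pure noise
n_sims <- 500

# For one dataset of pure noise, return the smallest p-value among the k
# single-predictor regressions.
smallest_p <- function() {
  x <- matrix(rnorm(n * k), nrow = n)
  y <- rnorm(n)                        # unrelated to every column of x
  r <- cor(x, y)                       # k correlations with the response
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  min(2 * pt(-abs(t_stat), df = n - 2))
}

p_min <- replicate(n_sims, smallest_p())

# Share of noise-only datasets in which the selected variable looks
# "significant" at the 0.05 level, next to the value expected if the
# k tests were independent.
mean(p_min < 0.05)
1 - 0.95^k
```

Even though no predictor has any real relationship with the response, the selected one frequently passes a 0.05 test, so significance that results from a search needs independent confirmation.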