Species | TotalSleep | BodyWt | LNBodyWt | BrainWt | LNBrainWt | LifeSpan | LNLifeSpan | Gestation | PredF | ExposF | DangrF |
---|---|---|---|---|---|---|---|---|---|---|---|
Africangiantpouchedrat | 8.3 | 1.00 | 0.00 | 6.6 | 1.89 | 4.5 | 1.50 | 42 | 3 | 1 | 3 |
Americanopossum | 19.4 | 1.70 | 0.53 | 6.3 | 1.84 | 5.0 | 1.61 | 12 | 2 | 1 | 1 |
ArcticFox | 12.5 | 3.39 | 1.22 | 44.5 | 3.80 | 14.0 | 2.64 | 60 | 1 | 1 | 1 |
Baboon | 9.8 | 10.55 | 2.36 | 179.5 | 5.19 | 27.0 | 3.30 | 180 | 4 | 4 | 4 |
Bigbrownbat | 19.7 | 0.02 | -3.77 | 0.3 | -1.20 | 19.0 | 2.94 | 35 | 1 | 1 | 1 |
Braziliantapir | 6.2 | 160.00 | 5.08 | 169.0 | 5.13 | 30.4 | 3.41 | 392 | 4 | 5 | 4 |
Cat | 14.5 | 3.30 | 1.19 | 25.6 | 3.24 | 28.0 | 3.33 | 63 | 1 | 2 | 1 |
Chimpanzee | 9.7 | 52.16 | 3.95 | 440.0 | 6.09 | 50.0 | 3.91 | 230 | 1 | 1 | 1 |
Chinchilla | 12.5 | 0.43 | -0.86 | 6.4 | 1.86 | 7.0 | 1.95 | 112 | 5 | 4 | 4 |
Cow | 3.9 | 465.00 | 6.14 | 423.0 | 6.05 | 30.0 | 3.40 | 281 | 5 | 5 | 5 |
BUA 345 - Lecture 17
More about Model Selection
Housekeeping
Upcoming Dates
HW 7 is available and is due on Wednesday, 3/19.
- Demo videos were posted last week.
HW 8 (Parts 1 and 2) is posted and is due on Wednesday, 3/26.
Part 1 of HW 8 can be completed after today’s lecture.
Part 2 of HW 8 pertains to Thursday’s lecture on Logistic Regression
Quiz 2 will be on 4/1/2025 in the classroom.
Date has changed and syllabus has been updated.
Practice Questions will be posted this weekend.
More Housekeeping
This Week’s Plan
Today (Tuesday 3/18)
Quick review of model selection concepts
Best Subsets method
measures of model fit
Thursday 3/20
Logistic Regression of binary response data
In-class Polling (Session ID: bua345s25)
Lecture 17 In-class Exercises - Q1
Review Question from Week 8 and HW 7.
If two predictor variables (X variables) in a model have a correlation of 0.85, what do you conclude?
Review of Animals Data
Question: What factors affect a mammal’s sleep duration?
Animals Data Notes:
Population was limited to animals under 1000 pounds (two elephant species excluded).
Natural log (LN) transformed variables were added to original data.
Observations with missing values are removed below
Working dataset has 49 observations (49 different species)
Animals Data
Animals Data Dictionary - Description of Variables
Intuitively, there is likely to be redundancy among Predation, Exposure, and Danger.
Variable | Type | Description |
---|---|---|
Species | Nominal | Name of Species |
TotalSleep | Quantitative | Total Sleep |
BodyWt | Quantitative | Average Body Weight in kilograms |
LNBodyWt | Quantitative | Natural Log of Body Weight |
BrainWt | Quantitative | Average Brain Weight in grams |
LNBrainWt | Quantitative | Natural Log of Brain Weight |
LifeSpan | Quantitative | Maximum Life Span in years |
LNLifeSpan | Quantitative | Natural Log of Life Span |
Gestation | Quantitative | Gestation Time in days |
PredF | Ordinal | Predation Index (1=least likely to be prey) |
ExposF | Ordinal | Sleep Exposure Index (1=least exposed) |
DangrF | Ordinal | Overall Danger Index (1=least danger from other animals) |
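The model output later in this lecture lists the ordinal indexes as terms such as PredF2 through PredF5. That is the pattern R produces when a variable is treated as a factor with level 1 as the reference category. A minimal sketch of that expansion, using the ten PredF values from the data preview; the name `pred_f` is hypothetical and chosen only for illustration:

```r
# PredF values for the ten species shown in the data preview
pred_f <- factor(c(3, 2, 1, 4, 1, 4, 1, 1, 5, 5))

# model.matrix() shows the indicator (dummy) columns lm() would build for a factor
dummy_cols <- model.matrix(~ pred_f)

colnames(dummy_cols)   # intercept plus one column per non-reference level
head(dummy_cols, 3)    # the third species has PredF = 1, so all indicators are 0
```

Each coefficient on an indicator column is then read as a difference from the reference level (PredF = 1).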
Multicollinearity Concerns in Animals Dataset
LNBodyWt and LNBrainWt (R = 0.95):
- These two predictors cannot both be in the final model.
LNBrainWt and LNLifeSpan (R = 0.79):
- These two predictors ideally should not both be in the final model.
Predation (PredF) and Danger (DangrF) (R = 0.95):
- These two predictors cannot both be in the final model.
Exposure (ExposF) and Danger (DangrF) (R = 0.78):
- These two predictors ideally should not both be in the final model.
NOTE: Students should know the commands for creating a correlation matrix with rounded values.
- See HW 7 and next two slides
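As a rough illustration of a rounded correlation matrix, the sketch below applies `cor()` and `round()` to the ten species from the data preview. The data frame name `animals_preview` is an assumption, the exact commands in HW 7 may differ, and because only ten of the 49 species are used the values will not match the full-data matrix.

```r
# First ten species from the data preview; the full dataset has 49 rows
animals_preview <- data.frame(
  TotalSleep = c(8.3, 19.4, 12.5, 9.8, 19.7, 6.2, 14.5, 9.7, 12.5, 3.9),
  LNBodyWt   = c(0.00, 0.53, 1.22, 2.36, -3.77, 5.08, 1.19, 3.95, -0.86, 6.14),
  LNBrainWt  = c(1.89, 1.84, 3.80, 5.19, -1.20, 5.13, 3.24, 6.09, 1.86, 6.05),
  LNLifeSpan = c(1.50, 1.61, 2.64, 3.30, 2.94, 3.41, 3.33, 3.91, 1.95, 3.40)
)

# cor() builds the pairwise correlation matrix; round() keeps two decimals
cor_mat <- round(cor(animals_preview), 2)
cor_mat
```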
Correlation Matrix of Quantitative Animal Variables
Variable | TotalSleep | LNBodyWt | LNBrainWt | LNLifeSpan |
---|---|---|---|---|
TotalSleep | 1.00 | -0.56 | -0.57 | -0.37 |
LNBodyWt | -0.56 | 1.00 | 0.95 | 0.71 |
LNBrainWt | -0.57 | 0.95 | 1.00 | 0.79 |
LNLifeSpan | -0.37 | 0.71 | 0.79 | 1.00 |
Correlation Matrix of ordinal Variables
Backwards Elimination - Animal Data Final Model
Model Summary
---------------------------------------------------------------
R 0.857 RMSE 2.329
R-Squared 0.734 MSE 5.423
Adj. R-Squared 0.655 Coef. Var 25.223
Pred R-Squared 0.547 AIC 247.894
MAE 1.857 SBC 272.488
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
------------------------------------------------------------------
Sum of Squares DF Mean Square F Sig.
------------------------------------------------------------------
Regression 734.163 11 66.742 9.294 0.0000
Residual 265.708 37 7.181
Total 999.871 48
------------------------------------------------------------------
Parameter Estimates
-----------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-----------------------------------------------------------------------------------------------
(Intercept) 6.751 3.305 2.043 0.048 0.054 13.448
LNBodyWt -0.698 0.244 -0.442 -2.859 0.007 -1.192 -0.203
LNLifeSpan 2.855 1.133 0.591 2.519 0.016 0.559 5.151
Gestation -0.020 0.006 -0.447 -3.285 0.002 -0.032 -0.008
PredF2 13.998 4.132 0.041 3.388 0.002 5.626 22.369
PredF3 11.883 5.514 -0.494 2.155 0.038 0.711 23.056
PredF4 2.654 4.102 0.021 0.647 0.522 -5.658 10.966
PredF5 -0.782 4.262 -0.316 -0.183 0.855 -9.418 7.855
LNLifeSpan:PredF2 -5.367 1.478 -0.471 -3.632 0.001 -8.361 -2.373
LNLifeSpan:PredF3 -7.390 3.141 -0.588 -2.352 0.024 -13.755 -1.025
LNLifeSpan:PredF4 -0.941 1.356 -0.083 -0.694 0.492 -3.689 1.807
LNLifeSpan:PredF5 -1.043 1.446 -0.091 -0.721 0.475 -3.973 1.887
-----------------------------------------------------------------------------------------------
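The summary statistics above can be reproduced from the ANOVA table by hand. The sketch below is only a check of the arithmetic, using the numbers printed in the output; the variable names are chosen for illustration.

```r
# Values copied from the ANOVA table of the Backward Elimination final model
ss_regression <- 734.163
ss_residual   <- 265.708
ss_total      <- 999.871
df_regression <- 11
df_residual   <- 37
df_total      <- 48
n_obs         <- df_total + 1   # 49 species

r_squared     <- ss_regression / ss_total
adj_r_squared <- 1 - (ss_residual / df_residual) / (ss_total / df_total)
f_statistic   <- (ss_regression / df_regression) / (ss_residual / df_residual)

# The Model Summary's MSE and RMSE match SS residual divided by n (not by residual DF)
mse_summary  <- ss_residual / n_obs
rmse_summary <- sqrt(mse_summary)

round(c(R2 = r_squared, AdjR2 = adj_r_squared, F = f_statistic,
        MSE = mse_summary, RMSE = rmse_summary), 3)
```

Note that the Mean Square for Residual in the ANOVA table (7.181) divides by the residual degrees of freedom, which is why it differs from the MSE of 5.423 in the Model Summary.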
Model Selection Methods
Recall that in Multiple Linear Regression (MLR) the goal is to choose the simplest model that is still accurate, i.e., the ‘BEST’ set of independent variables.
How do we decide which variables should be in our model?
There are many methods:
We’ve discussed Backward Elimination, which can also be done manually in any software (not recommended).
Backward Elimination starts with all potential terms (including potential interaction terms) in the model and removes the least significant term at each step.
- This is referred to as starting with a full or saturated model.
Forward Selection: By default, this procedure starts with an empty model and adds the most significant term at each step until there are no more useful terms to add.
- Forward selection also needs to know what terms are in the full model.
Stepwise Selection: By default, this procedure starts with an empty model and then adds or removes a term at each step.
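The three search directions can be sketched with base R's `step()` function. This is a stand-in, not the lecture's method: `step()` adds and drops terms by AIC rather than by p-values, and the built-in `mtcars` data with four candidate predictors is an assumption made only so the example is self-contained.

```r
# Stand-in data: built-in mtcars, response mpg, four candidate predictors (assumption)
full_model  <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)
empty_model <- lm(mpg ~ 1, data = mtcars)

# Forward and stepwise searches must be told which terms the full model contains
search_scope <- list(lower = ~ 1, upper = ~ wt + hp + disp + qsec)

backward_fit <- step(full_model, direction = "backward", trace = 0)
forward_fit  <- step(empty_model, scope = search_scope, direction = "forward", trace = 0)
stepwise_fit <- step(empty_model, scope = search_scope, direction = "both", trace = 0)

# Compare the formulas each search direction ends with
formula(backward_fit)
formula(forward_fit)
formula(stepwise_fit)
```

Comparing the three final formulas shows where the procedures agree and disagree, which is the same comparison made with the p-value based procedures in this lecture.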
Lecture 17 In-class Exercises - Q2
Which model selection method is characterized by starting with NO (0) terms in the model and then adding terms one at a time until none of the remaining terms is significant?
A. Backward Elimination
B. Stepwise Selection
C. Forward Selection
D. Adjusted \(R^2\)
Steps for Model Selection Using Multiple Methods
1. Examine the matrix of scatterplots and histograms to determine whether any transformations are needed to linearize relationships between continuous predictors and the response variable.
- Also look at the correlation matrix to check whether there are pairs of variables to be concerned about.
2. Create a ‘saturated’ model with all potential predictor variables and interaction terms (Subjective!).
3. Use Backward Elimination, Forward Selection, and Stepwise Selection to find preliminary candidate models. (These are automated procedures!)
- Carefully examine results to see where these candidate models agree and disagree.
Steps for Model Selection Using Multiple Methods Cont’d
4. Examine predictors in preliminary candidate models to confirm they are not too highly correlated with each other.
- If two predictor variables in any model have a correlation of 0.8 or greater (in absolute value), drop one of them.
5. Rerun model selection methods if a candidate model is substantially changed (not always needed).
6. Compare model fit statistics for the final candidate models from all three methods.
7. Decide on final candidate and make final modifications, if needed.
8. Interpret final model and use for estimation.
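The correlation check on candidate predictors can be automated. The sketch below uses the quantitative correlation matrix shown earlier in this lecture; `flag_high_cor()` is a hypothetical helper written for this illustration, not a function from any package.

```r
# Correlation matrix of the quantitative animal variables, copied from this lecture
vars <- c("TotalSleep", "LNBodyWt", "LNBrainWt", "LNLifeSpan")
cor_mat <- matrix(
  c( 1.00, -0.56, -0.57, -0.37,
    -0.56,  1.00,  0.95,  0.71,
    -0.57,  0.95,  1.00,  0.79,
    -0.37,  0.71,  0.79,  1.00),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Hypothetical helper: list predictor pairs with |r| at or above a cutoff
flag_high_cor <- function(cor_mat, response, cutoff = 0.8) {
  predictors <- setdiff(colnames(cor_mat), response)
  pairs <- t(combn(predictors, 2))   # one row per pair of predictors
  r <- cor_mat[pairs]                # look up each pair's correlation by name
  keep <- abs(r) >= cutoff
  data.frame(var1 = pairs[keep, 1], var2 = pairs[keep, 2], r = r[keep],
             stringsAsFactors = FALSE)
}

flagged <- flag_high_cor(cor_mat, response = "TotalSleep")
flagged
```

With the 0.8 cutoff only LNBodyWt and LNBrainWt are flagged; LNBrainWt and LNLifeSpan (0.79) fall just below it, which matches the "ideally should not both be in the final model" wording used for that pair.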
Forward Selection of Animals Data
Full Model:
Forward Model Selection
Stepwise Summary
-------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------------
0 Base Model 290.830 294.614 147.782 0.00000 0.00000
1 Gestation 268.856 274.532 123.817 0.38693 0.37388
2 DangrF 251.692 264.935 98.670 0.63316 0.59050
3 LNBrainWt 248.061 263.196 93.052 0.67298 0.62626
4 PredF 241.628 264.330 78.645 0.75641 0.69231
5 LNLifeSpan 233.996 258.589 69.041 0.79989 0.74039
6 LNLifeSpan:PredF 228.314 260.475 55.409 0.84864 0.77984
7 LNBodyWt 228.450 262.503 53.568 0.85429 0.78143
8 ExposF 229.245 270.865 46.411 0.87421 0.78437
-------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.935 RMSE 1.602
R-Squared 0.874 MSE 2.567
Adj. R-Squared 0.784 Coef. Var 19.948
Pred R-Squared -Inf AIC 229.245
MAE 1.168 SBC 270.865
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
------------------------------------------------------------------
Sum of Squares DF Mean Square F Sig.
------------------------------------------------------------------
Regression 874.101 20 43.705 9.73 0.0000
Residual 125.769 28 4.492
Total 999.871 48
------------------------------------------------------------------
Parameter Estimates
------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
------------------------------------------------------------------------------------------------
(Intercept) 6.184 2.897 2.135 0.042 0.250 12.118
Gestation -0.020 0.007 -0.454 -2.789 0.009 -0.035 -0.005
DangrF2 -6.733 1.877 -0.641 -3.587 0.001 -10.578 -2.888
DangrF3 -8.462 3.579 -0.655 -2.364 0.025 -15.793 -1.130
DangrF4 -8.780 4.650 -0.718 -1.888 0.069 -18.305 0.745
DangrF5 -20.146 6.095 -1.561 -3.305 0.003 -32.632 -7.661
LNBrainWt -0.180 0.684 -0.092 -0.264 0.794 -1.582 1.221
PredF2 14.954 3.672 1.462 4.072 0.000 7.431 22.477
PredF3 16.956 5.583 1.230 3.037 0.005 5.520 28.393
PredF4 11.583 5.230 0.897 2.215 0.035 0.871 22.295
PredF5 0.598 6.292 0.055 0.095 0.925 -12.290 13.486
LNLifeSpan 3.218 0.937 0.666 3.433 0.002 1.298 5.138
LNBodyWt -0.803 0.511 -0.508 -1.572 0.127 -1.848 0.243
ExposF2 -0.082 1.180 -0.008 -0.070 0.945 -2.499 2.335
ExposF3 0.481 1.723 0.029 0.279 0.782 -3.049 4.011
ExposF4 3.183 1.854 0.213 1.716 0.097 -0.615 6.981
ExposF5 4.951 4.042 0.405 1.225 0.231 -3.328 13.231
PredF2:LNLifeSpan -3.401 1.455 -0.810 -2.337 0.027 -6.381 -0.420
PredF3:LNLifeSpan -5.334 4.249 -0.603 -1.255 0.220 -14.037 3.370
PredF4:LNLifeSpan -1.707 1.767 -0.373 -0.966 0.342 -5.327 1.913
PredF5:LNLifeSpan 3.238 2.070 0.856 1.565 0.129 -1.002 7.478
------------------------------------------------------------------------------------------------
Final Forward (and Stepwise) Selection Model
Drop DangrF due to multicollinearity with PredF
Drop LNBrainWt due to multicollinearity with LNBodyWt
Leave in ExposF(?) and compare to Backward Elimination Model
Stepwise Selection arrived at the same model as Forward Selection.
Model Summary
---------------------------------------------------------------
R 0.882 RMSE 2.131
R-Squared 0.777 MSE 4.543
Adj. R-Squared 0.676 Coef. Var 24.445
Pred R-Squared 0.407 AIC 247.220
MAE 1.729 SBC 279.381
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
------------------------------------------------------------------
Sum of Squares DF Mean Square F Sig.
------------------------------------------------------------------
Regression 777.272 15 51.818 7.682 0.0000
Residual 222.599 33 6.745
Total 999.871 48
------------------------------------------------------------------
Parameter Estimates
-----------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-----------------------------------------------------------------------------------------------
(Intercept) 7.151 3.217 2.223 0.033 0.605 13.696
LNBodyWt -0.796 0.251 -0.504 -3.167 0.003 -1.307 -0.285
LNLifeSpan 2.604 1.109 0.539 2.349 0.025 0.348 4.860
Gestation -0.014 0.007 -0.313 -2.102 0.043 -0.027 0.000
ExposF2 -2.416 1.272 -0.223 -1.899 0.066 -5.004 0.173
ExposF3 -1.237 1.905 -0.075 -0.649 0.521 -5.113 2.640
ExposF4 1.096 2.009 0.073 0.545 0.589 -2.991 5.182
ExposF5 -2.379 2.864 -0.195 -0.831 0.412 -8.206 3.448
PredF2 12.917 4.095 0.173 3.154 0.003 4.585 21.249
PredF3 14.428 5.566 -0.597 2.592 0.014 3.103 25.752
PredF4 1.813 4.012 -0.019 0.452 0.654 -6.349 9.974
PredF5 -1.068 4.590 -0.206 -0.233 0.817 -10.407 8.270
LNLifeSpan:PredF2 -4.405 1.530 -0.387 -2.880 0.007 -7.518 -1.293
LNLifeSpan:PredF3 -8.959 3.193 -0.712 -2.806 0.008 -15.454 -2.463
LNLifeSpan:PredF4 -0.814 1.418 -0.072 -0.574 0.570 -3.699 2.071
LNLifeSpan:PredF5 -0.458 1.795 -0.040 -0.255 0.800 -4.110 3.195
-----------------------------------------------------------------------------------------------
Comparing Model Results
Comparison Measures:
Adj. \(R^2\): Higher value indicates better model fit
C(p): Lower value indicates better model fit (also referred to as Mallows’ C(p)).
AIC: Lower value indicates better model fit (Akaike Information Criterion).
RMSE: Lower value indicates better model fit (Root Mean Square Error).
Decision is debatable, but it seems worthwhile to include ExposF (Exposure).
Same data and models are covered in HW 8 - Part 1.
Method | Adjusted_R2 | Mallows_Cp | AIC | RMSE |
---|---|---|---|---|
Backward Elimination | 0.655 | 23.527 | 247.894 | 2.329 |
Forward/Stepwise Selection | 0.676 | 23.654 | 247.220 | 2.131 |
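To see which candidate each measure favors, the comparison table can be entered as a data frame and checked one measure at a time. This is a small sketch using only the values in the table above; the object names are illustrative.

```r
fit_stats <- data.frame(
  Method      = c("Backward Elimination", "Forward/Stepwise Selection"),
  Adjusted_R2 = c(0.655, 0.676),
  Mallows_Cp  = c(23.527, 23.654),
  AIC         = c(247.894, 247.220),
  RMSE        = c(2.329, 2.131),
  stringsAsFactors = FALSE
)

# Higher is better for adjusted R-squared; lower is better for the other three
preferred <- c(
  Adjusted_R2 = fit_stats$Method[which.max(fit_stats$Adjusted_R2)],
  Mallows_Cp  = fit_stats$Method[which.min(fit_stats$Mallows_Cp)],
  AIC         = fit_stats$Method[which.min(fit_stats$AIC)],
  RMSE        = fit_stats$Method[which.min(fit_stats$RMSE)]
)
preferred
```

The Forward/Stepwise model is favored on Adjusted \(R^2\), AIC, and RMSE, while Backward Elimination is favored only on C(p), and by a small margin (23.527 vs. 23.654).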
Model Validation
How good is our model?
There are many ways to examine model fit.
Here are two straightforward ways:
- Check correlation between observed and estimated values
- Plot a scatterplot of observed and estimated values
Model Validation Plot (R = 0.88)
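Both validation checks take only a few lines. The sketch below uses a stand-in model on the built-in `mtcars` data (an assumption, since the Animals data file is not included here); the same steps would apply to the final Animals model.

```r
# Stand-in model on built-in data (assumption; the lecture model uses the Animals data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

observed  <- mtcars$mpg
estimated <- fitted(fit)

# Check 1: correlation between observed and estimated values
validation_r <- cor(observed, estimated)
round(validation_r, 2)

# Check 2: scatterplot; points near the dashed 45-degree line are well estimated
plot(estimated, observed, xlab = "Estimated mpg", ylab = "Observed mpg",
     main = paste0("Model Validation Plot (R = ", round(validation_r, 2), ")"))
abline(a = 0, b = 1, lty = 2)
```

For a linear regression with an intercept, this correlation equals the R reported in the Model Summary, which is why the plot title (R = 0.88) agrees with the R of 0.882 for the final Forward/Stepwise model.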
Wine Data - Model Selection Example
Can we determine what factors affect wine quality even if we KNOW NOTHING about wine cultivation and chemistry?
Maybe!
Since we have no prior knowledge, we start with a straightforward full model with all available predictors and no interactions.
- In practice, a consultant would be working with a wine expert to carefully determine a saturated model that includes all possible interactions.
Import Wine Data
Notice that all variables are numeric (`<dbl>` stands for double, i.e., a decimal value).
Wine_Quality | Fixed_Acidity | Volatile_Acidity | Citric_Acidity | Residual_Sugar | Chlorides | Free_Sulphur_Dioxide | Total_Sulphur_Dioxide | Ph | Sulfate | Alcohol |
---|---|---|---|---|---|---|---|---|---|---|
5 | 9.3 | 0.48 | 0.29 | 2.1 | 0.127 | 6 | 16 | 3.22 | 0.72 | 11.2 |
6 | 9.1 | 0.22 | 0.24 | 2.1 | 0.078 | 1 | 28 | 3.41 | 0.87 | 10.3 |
7 | 7.9 | 0.34 | 0.36 | 1.9 | 0.065 | 5 | 10 | 3.27 | 0.54 | 11.2 |
5 | 7.2 | 1.00 | 0.00 | 3.0 | 0.102 | 7 | 16 | 3.43 | 0.46 | 10.0 |
7 | 11.9 | 0.43 | 0.66 | 3.1 | 0.109 | 10 | 23 | 3.15 | 0.85 | 10.4 |
5 | 7.2 | 0.49 | 0.24 | 2.2 | 0.070 | 5 | 36 | 3.33 | 0.48 | 9.4 |
Examine Correlation Matrix for Multicollinearity
Variable | Wine_Quality | Fixed_Acidity | Volatile_Acidity | Citric_Acidity | Residual_Sugar | Chlorides | Free_Sulphur_Dioxide | Total_Sulphur_Dioxide | Ph | Sulfate | Alcohol |
---|---|---|---|---|---|---|---|---|---|---|---|
Wine_Quality | 1.00 | 0.11 | -0.39 | 0.22 | 0.04 | -0.10 | 0.01 | -0.08 | -0.06 | 0.21 | 0.45 |
Fixed_Acidity | 0.11 | 1.00 | -0.23 | 0.68 | 0.20 | 0.12 | -0.18 | -0.13 | -0.70 | 0.19 | -0.08 |
Volatile_Acidity | -0.39 | -0.23 | 1.00 | -0.52 | -0.01 | 0.04 | -0.05 | 0.05 | 0.19 | -0.24 | -0.17 |
Citric_Acidity | 0.22 | 0.68 | -0.52 | 1.00 | 0.16 | 0.21 | -0.07 | 0.06 | -0.55 | 0.27 | 0.10 |
Residual_Sugar | 0.04 | 0.20 | -0.01 | 0.16 | 1.00 | 0.05 | 0.18 | 0.18 | -0.14 | -0.01 | 0.07 |
Chlorides | -0.10 | 0.12 | 0.04 | 0.21 | 0.05 | 1.00 | -0.04 | 0.00 | -0.26 | 0.35 | -0.21 |
Free_Sulphur_Dioxide | 0.01 | -0.18 | -0.05 | -0.07 | 0.18 | -0.04 | 1.00 | 0.65 | 0.08 | 0.00 | -0.03 |
Total_Sulphur_Dioxide | -0.08 | -0.13 | 0.05 | 0.06 | 0.18 | 0.00 | 0.65 | 1.00 | -0.07 | 0.08 | -0.08 |
Ph | -0.06 | -0.70 | 0.19 | -0.55 | -0.14 | -0.26 | 0.08 | -0.07 | 1.00 | -0.24 | 0.21 |
Sulfate | 0.21 | 0.19 | -0.24 | 0.27 | -0.01 | 0.35 | 0.00 | 0.08 | -0.24 | 1.00 | 0.05 |
Alcohol | 0.45 | -0.08 | -0.17 | 0.10 | 0.07 | -0.21 | -0.03 | -0.08 | 0.21 | 0.05 | 1.00 |
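Largest positive correlation between two predictors (Fixed_Acidity and Citric_Acidity): 0.68
Most negative correlation between two predictors (Fixed_Acidity and Ph): -0.70
Neither reaches the 0.8 cutoff, so no predictor needs to be dropped for multicollinearity.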
Model Selection
We specify a full model using an easy shortcut:
If all variables are included, you can use `.` instead of listing them all.
This model specification is also used in HW 7.
Then we run three model selection procedures:
- Backward Elimination (BE)
- Forward Selection (FS)
- Stepwise Selection (SS)
```{r specify full model, echo=TRUE}
library(olsrr)  # provides the ols_step_*_p() selection functions

# `wine` is the data frame imported above; `.` means "all other columns"
wine_full <- lm(Wine_Quality ~ ., data = wine)                 # specify full model
wine_BE <- ols_step_backward_p(wine_full, progress = FALSE)    # backward elimination
wine_FS <- ols_step_forward_p(wine_full, progress = FALSE)     # forward selection
wine_SS <- ols_step_both_p(wine_full, progress = FALSE)        # stepwise selection
```
Comparing Model Results
Look at the LAST step for each method to determine which method results in the best fit.
Comparison Measures:
Adj. \(R^2\): Higher value indicates better model fit
C(p): Lower value indicates better model fit (also referred to as Mallows’ C(p)).
AIC: Lower value indicates better model fit (Akaike Information Criterion).
RMSE: Lower value indicates better model fit (Root Mean Square Error).
By comparing these measures and accounting for our understanding of these procedures, we can determine that TWO of these methods arrived at the same model.
Lecture 17 In-class Exercises - Q3
Session ID: bua345s25
Which two model selection methods arrived at the same model for the wine data?
- On the next few slides I will show pairs of stepwise summaries so you can compare them.
Backwards Elimination and Forward Selection
Backward Elimination
Forward Selection
Backwards Elimination and Stepwise Selection
Backward Elimination
Stepwise Selection
Forward Selection and Stepwise Selection
Forward Selection
Stepwise Selection
Wine Model Validation Plot (R = 0.58)
Best Subsets
Another model selection method is ‘Best Subsets’
- Output shows ‘Best’ one variable model, ‘Best’ two variable model, ‘Best’ three variable model, etc.
Each ‘Best’ model is determined by multiple Fit Statistics.
This method then examines which of these candidates is the overall best by comparing their fit statistics.
If we are fortunate, the optimal choice from Best Subsets matches a model above.
- In this case (and HW 8) we are fortunate.
NOTE: The `ols_step_best_subset` command is VERY slow. You do not need to rerun it. Output is provided.
Some of the Best Subsets Plots
Reading Best Subsets Output
Tabular Output
Bottom table shows which model performs best, based on all of the fit statistics.
For example, if Model 3 (the three-variable model) were the best, it would have the HIGHEST Adjusted \(R^2\), Lowest C(p), and Lowest AIC.
- We can see from the bottom table that Model 3 is not the best.
Model 7 IS the best because it does have the HIGHEST Adjusted \(R^2\), Lowest C(p), and Lowest AIC.
Top table lists the variables in each of the ‘Best’ models.
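The idea behind the two tables can be sketched in base R by fitting every subset of a small candidate set. This is an illustration only: it uses the built-in `mtcars` data with four candidate predictors (an assumption), picks the best model of each size by adjusted \(R^2\), and is not a reproduction of how `ols_step_best_subset` works internally.

```r
# Stand-in data: mtcars with four candidate predictors (assumption)
dat        <- mtcars
response   <- "mpg"
candidates <- c("wt", "hp", "disp", "qsec")
n          <- nrow(dat)

# Mallows' C(p) needs the residual mean square of the full model
full_fit <- lm(reformulate(candidates, response), data = dat)
mse_full <- sum(resid(full_fit)^2) / df.residual(full_fit)

# Fit every non-empty subset of the candidates and record its fit statistics
all_fits <- do.call(rbind, lapply(seq_along(candidates), function(k) {
  subsets <- combn(candidates, k, simplify = FALSE)
  do.call(rbind, lapply(subsets, function(vars) {
    fit <- lm(reformulate(vars, response), data = dat)
    sse <- sum(resid(fit)^2)
    p   <- length(vars) + 1                      # coefficients, including the intercept
    data.frame(
      size       = k,
      predictors = paste(vars, collapse = " + "),
      adj_r2     = summary(fit)$adj.r.squared,
      cp         = sse / mse_full - n + 2 * p,   # Mallows' C(p)
      aic        = AIC(fit),
      stringsAsFactors = FALSE
    )
  }))
}))

# 'Best' model of each size: highest adjusted R-squared among models of that size
best_by_size <- do.call(rbind, lapply(split(all_fits, all_fits$size), function(d) {
  d[which.max(d$adj_r2), ]
}))
best_by_size
```

The `predictors` column plays the role of the top table and the fit-statistic columns play the role of the bottom table: the overall choice is the row with the highest adjusted \(R^2\) and the lowest C(p) and AIC. Fitting every subset is also why the real command is slow when there are many predictors.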
Wine Best Subset Output
Preview of HW 8 - Part 1
Review model comparisons for Animal Data from first part of lecture.
Compare the optimal best subset model (Model 7) to the model found by both Backward Elimination and Forward Selection.
The goal is to determine to what extent they agree.
- Spoiler: They are in complete agreement which indicates that we have consensus on the model for these data.
Reminder of Upcoming Dates
Today’s Lecture (3/18) is the third and final lecture on model and variable selection.
HW 7 is due tomorrow, Wed., 3/19.
HW 8 is now posted and is due Wednesday, 3/26
Part 1 pertains to Lectures 15-17
Part 2 pertains to Lecture 18
Quiz 2 is on Tuesday, April 1st, in the classroom
- Practice Questions will be posted this weekend.
Key Points from this Week
Regression modeling can be overwhelming because of all of the possible options.
Automating part of the variable selection process is helpful.
Trying different methods and comparing results is strongly recommended.
Results from Automated processes are preliminary models that can (and should) be tinkered with.
Once we have a final model we can add regression estimates and residuals to the dataset.
Methods Covered: Backwards Elimination, Forward Selection, Stepwise Selection, Best Subsets
- Compare results from multiple methods
To submit an Engagement Question or Comment about material from Lecture 17: Submit it by midnight today (day of lecture).
Comments about Model Selection Methods
Common Practice: Try multiple methods to develop preliminary final model and then tweak as needed.
Steps for model selection using multiple methods are similar to the steps for Backward Elimination (Week 8 Lectures)
Not all steps are ALWAYS required. It depends on how complex the data are.
In the following example, we only need to do part of Step 1 plus Steps 2, 3, and 6.
For Step 1, we only need to examine correlations.
In this case, Step 7 will be apparent.
We can add model estimates to data for future interpretation (Step 8)
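Adding model estimates and residuals to the data takes one line each. A minimal sketch, again using a stand-in model on the built-in `mtcars` data (an assumption); the column names `estimate` and `residual` are illustrative.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model (assumption)

cars_out <- mtcars
cars_out$estimate <- fitted(fit)      # model estimate for each observation
cars_out$residual <- residuals(fit)   # observed value minus estimate

head(cars_out[, c("mpg", "estimate", "residual")])
```

Each observed value equals its estimate plus its residual, so the two new columns together describe how well the model fits each row.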