This model can be used to find model estimates and residuals for all animals.
We will also do these calculations in an Excel spreadsheet to clarify how each model component contributes to an estimate.
We export the data for three species to examine how the model works.
| Species | TotalSleep | LNBodyWt | LNLifeSpan | Gestation | PredF |
|---|---|---|---|---|---|
| ArcticFox | 12.5 | 1.22 | 2.64 | 60 | 1 |
| Baboon | 9.8 | 2.36 | 3.30 | 180 | 4 |
| Donkey | 3.1 | 5.23 | 3.69 | 365 | 5 |
Using a Model to Find Estimates
Model coefficients for calculations can be extracted and exported to Excel.
We create a two-column dataset listing each model component and its beta coefficient.
That dataset is exported as a .csv file for an in-class exercise.
| model_term | beta |
|---|---|
| (Intercept) | 6.7512 |
| LNBodyWt | -0.6976 |
| LNLifeSpan | 2.8550 |
| Gestation | -0.0198 |
| PredF2 | 13.9979 |
| PredF3 | 11.8834 |
| PredF4 | 2.6536 |
| PredF5 | -0.7817 |
| LNLifeSpan:PredF2 | -5.3668 |
| LNLifeSpan:PredF3 | -7.3900 |
| LNLifeSpan:PredF4 | -0.9409 |
| LNLifeSpan:PredF5 | -1.0427 |
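Written out as a single equation, the coefficient table corresponds to the expression below. This is a sketch that assumes R's default dummy coding, with PredF level 1 as the reference category (inferred from the absence of a PredF1 row). The symbol \(D_k\) is notation introduced here, not part of the model output: it equals 1 when PredF = k and 0 otherwise.

\[
\begin{aligned}
\widehat{\text{TotalSleep}} ={}& 6.7512 - 0.6976\,\text{LNBodyWt} + 2.8550\,\text{LNLifeSpan} - 0.0198\,\text{Gestation} \\
&+ 13.9979\,D_2 + 11.8834\,D_3 + 2.6536\,D_4 - 0.7817\,D_5 \\
&- 5.3668\,(\text{LNLifeSpan}\cdot D_2) - 7.3900\,(\text{LNLifeSpan}\cdot D_3) \\
&- 0.9409\,(\text{LNLifeSpan}\cdot D_4) - 1.0427\,(\text{LNLifeSpan}\cdot D_5)
\end{aligned}
\]

For an animal with PredF = 1, every \(D_k\) is 0, so only the first line contributes. For any other PredF level, exactly one dummy term and one interaction term are switched on.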
💥 Lecture 16 In-class Exercises - Q2-Q3 💥
Session ID: bua345s25
Use the provided worksheet to answer these questions:
Question 2. What is the regression estimate of total sleep for ‘Donkey’?
Question 3. What is the regression estimate of total sleep for ‘Arctic Fox’ (ArcticFox)?
At Home Practice:
Complete the worksheet for ‘Baboon’ at home.
At least one question on Quiz 2 may include an Excel worksheet like this one, where you must do the calculation correctly using the model coefficients and the x values from the data.
You can use R, but code to add estimates to the dataset will not be provided.
This exercise is about understanding the model estimation process.
Using a Model to Find Estimates in R
Model estimates can be calculated in R.
The Excel worksheet demonstrates how those estimates are calculated.
You may see an estimate question based on a complex model on Quiz 2.
| Species | TotalSleep | Est_TotalSleep | Resid | LNBodyWt | LNLifeSpan | Gestation | PredF |
|---|---|---|---|---|---|---|---|
| Africangiantpouchedrat | 8.3 | 11.00 | -2.70 | 0.00 | 1.50 | 42 | 3 |
| Americanopossum | 19.4 | 16.10 | 3.30 | 0.53 | 1.61 | 12 | 2 |
| ArcticFox | 12.5 | 12.25 | 0.25 | 1.22 | 2.64 | 60 | 1 |
| Baboon | 9.8 | 10.51 | -0.71 | 2.36 | 3.30 | 180 | 4 |
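As a check on the arithmetic, here is a minimal R sketch that rebuilds the Baboon row of the table above by hand from the exported coefficients. It assumes PredF level 1 is the reference category, so a PredF = 4 animal switches on only the PredF4 and LNLifeSpan:PredF4 terms. The object names `beta` and `baboon` are illustrative and are not taken from the course files.

```r
# Coefficients that apply to a PredF = 4 animal, copied from the beta table.
# The PredF2, PredF3 and PredF5 terms are multiplied by 0 and are left out.
beta <- c(
  "(Intercept)"       =  6.7512,
  "LNBodyWt"          = -0.6976,
  "LNLifeSpan"        =  2.8550,
  "Gestation"         = -0.0198,
  "PredF4"            =  2.6536,
  "LNLifeSpan:PredF4" = -0.9409
)

# Baboon's x values; the intercept and the PredF4 dummy both equal 1
baboon <- c(
  "(Intercept)"       = 1,
  "LNBodyWt"          = 2.36,
  "LNLifeSpan"        = 3.30,
  "Gestation"         = 180,
  "PredF4"            = 1,
  "LNLifeSpan:PredF4" = 3.30 * 1
)

# Estimate = sum of (coefficient * x value); residual = observed - estimate
est   <- sum(beta * baboon[names(beta)])
resid <- 9.8 - est

round(c(Est_TotalSleep = est, Resid = resid), 2)
```

Rounded to two decimals, this gives the 10.51 and -0.71 shown for Baboon in the table. The same pattern works for the other species once the dummy and interaction entries are changed to match their PredF level.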
Model Validation
How good is our model?
There are many ways to examine model fit.
Here are two straightforward ways:
Check correlation between observed and estimated values
Plot a scatterplot of observed and estimated values
Model Validation Plot (R = 0.86)
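Both checks can be sketched in a few lines of base R. The example below uses only the four animals whose observed and estimated values are listed earlier in this lecture, so the correlation it prints is for that tiny subset and will not match the full-data value of 0.86. The data frame name `sleep_check` is illustrative.

```r
# Observed and estimated TotalSleep for four animals from the lecture table
# (an illustrative subset, not the full dataset)
sleep_check <- data.frame(
  Species        = c("Africangiantpouchedrat", "Americanopossum",
                     "ArcticFox", "Baboon"),
  TotalSleep     = c(8.3, 19.4, 12.5, 9.8),
  Est_TotalSleep = c(11.00, 16.10, 12.25, 10.51)
)

# 1. Correlation between observed and estimated values
r <- cor(sleep_check$TotalSleep, sleep_check$Est_TotalSleep)
round(r, 2)

# 2. Scatterplot of observed vs. estimated values.
#    Points close to the dashed y = x line indicate a good fit.
plot(sleep_check$Est_TotalSleep, sleep_check$TotalSleep,
     xlab = "Estimated TotalSleep", ylab = "Observed TotalSleep",
     main = paste0("Model Validation Plot (R = ", round(r, 2), ")"))
abline(a = 0, b = 1, lty = 2)
```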
HW 7 Demo - Questions 1 - 11
Demo videos will be posted over break.
Read instructions in R project which correspond to Blackboard HW Assignment 7.
Run the Setup and import and examine the data.
Examine the correlation matrix of the X variables.
Remove the # symbols before the incomplete R code and replace the blanks (____) with the correct commands to calculate the correlation matrix with values rounded to 2 decimal places.
Run the line or the whole code chunk to view the correlation matrix, which is large.
Helpful tip: On the `Visual` tab of the R Markdown options, change Editor content width to 1500.
Remove # from the two lines of code at the bottom and run these lines to find largest positive and negative correlations in the matrix.
Answer Questions 1 - 2 based on the correlation matrix and min/max output.
HW 7 Demo - Questions 3 - 6
Run next chunk of code to Specify full model and Do Backward Elimination:
Create the full model with all variables and no interactions.
Run the Backward Elimination.
Answer questions 3 - 6 based on the Backward Elimination model output
HW 7 Demo - Questions 7 - 11
Run the next code chunk to save the final model as final_ames_model.
Complete the code in the next chunk, Import New Data and Add Predictions, then run it to add model estimates and residuals to a small new dataset of two new houses.
It is helpful to run the lines in this code block one at a time.
Run the first command, which begins new_houses <- read_csv(..., to import a small new dataset with 2 observations.
Run the command that begins (new_houses <- new_houses |> mutate(Est_Price... to add Est_Price, the regression estimates, to this dataset.
HW 7 Demo - Questions 7 - 11 Continued
Remove # before the following three lines to complete them:
#(new_houses <- new_houses |>
# mutate(Resid = ____ - ____ |> round()) |>
# relocate(Est_Price, Resid, .after=Price))
In the line with the blanks, you are calculating residuals as Price minus Estimated Price (Resid = Price - Est_Price).
The next line relocates Est_Price and Resid to the left side of the dataset, after Price.
Answer Questions 7 - 11 based on this output.
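The general pattern of adding estimates and residuals to a small new dataset can be sketched with R's built-in `mtcars` data. Everything in this example is a stand-in chosen for illustration: the model, the two made-up cars, and the column names `Est_mpg` and `Resid` are not the HW 7 model or data, and the sketch uses base R `predict()` rather than the homework's tidyverse code.

```r
# Stand-in model fit on built-in data (not the Ames housing model)
model <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical new observations with a known response value
new_cars <- data.frame(
  car = c("A", "B"),
  mpg = c(21.0, 15.5),
  wt  = c(2.8, 3.9),
  hp  = c(110, 200)
)

# Estimate = model prediction for each new row
new_cars$Est_mpg <- round(predict(model, newdata = new_cars), 2)

# Residual = observed minus estimated
new_cars$Resid <- round(new_cars$mpg - new_cars$Est_mpg, 2)

# Put the estimate and residual right after the observed value
new_cars[, c("car", "mpg", "Est_mpg", "Resid", "wt", "hp")]
```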
Model Selection Methods
Recall that in Multiple Linear Regression (MLR) the goal is to choose the simplest, most accurate model, i.e. the ‘BEST’ set of independent variables.
How do we decide which variables should be in our model?
There are many methods:
We’ve discussed Backward Elimination, which can also be done manually in any software (not recommended).
Description of Other Model Selection Methods
Backward Elimination starts with all potential terms (including potential interaction terms) in the model and removes the least significant term at each step.
This is referred to as starting with a full or saturated model.
Forward Selection: By default, this procedure starts with an empty model and adds the most significant term at each step until there are no more useful terms to add.
Forward selection also needs to know what terms are in the full model.
Stepwise Selection: By default, this procedure starts with an empty model and then adds or removes a term at each step.
Common Practice: Try multiple methods to develop a preliminary final model and then tweak it as needed.
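The three procedures described above can be sketched with base R's `step()` function. Two caveats: `step()` adds and drops terms based on AIC, which may differ from the significance-based criterion and software used in this course, and the data here are the built-in `mtcars`, used only as a stand-in. The object names are illustrative.

```r
# Full model (all candidate terms) and empty model (intercept only)
full  <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)
empty <- lm(mpg ~ 1, data = mtcars)

# The scope tells forward and stepwise selection which terms are allowed
search_scope <- list(lower = formula(empty), upper = formula(full))

# Backward elimination: start with the full model and drop terms
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: start with the empty model and add terms
forward <- step(empty, scope = search_scope,
                direction = "forward", trace = 0)

# Stepwise selection: start empty, then add or remove a term at each step
stepwise <- step(empty, scope = search_scope,
                 direction = "both", trace = 0)

# Compare the final model chosen by each method
formula(backward)
formula(forward)
formula(stepwise)
```

Setting `trace = 0` hides the step-by-step log; removing it prints each step so the order in which terms enter or leave can be inspected.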
Notes about Model Selection
Using Multiple Methods
The steps for other methods are similar to the steps for Backward Elimination.
Not all steps are ALWAYS required. It depends on how complex the data are.
In the following example, we only need to do part of Step 1 plus Steps 2, 3, and 6.
For Step 1, we only need to examine correlations.
In this case, Step 7 will be apparent.
We can add model estimates to the data for future interpretation (Step 8).
Steps for Model Selection Using Multiple Methods
1. Examine the matrix of scatterplots and histograms and determine if any transformations are needed to linearize relationships between continuous predictors and the response variable. Also look at the correlation matrix to check if there are pairs of variables to be concerned about.
2. Create a ‘saturated’ model with all potential predictor variables and interaction terms (Subjective!).
3. Use Backward Elimination, Forward Selection, and Stepwise Selection to find preliminary candidate models. (These are automated procedures!) Carefully examine results to see where these candidate models agree and disagree.
Steps for Model Selection Cont’d
4. Examine predictors in preliminary candidate models to confirm they are not too highly correlated with each other. If two predictor variables in any model have a correlation of 0.8 or greater, drop one of them.
5. Rerun model selection methods if a candidate model is substantially changed (not always needed).
6. Compare model fit statistics for the final candidate model from each of the three methods.
7. Decide on a final candidate and make final modifications, if needed.
8. Interpret the final model.
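The correlation check between predictors, using the 0.8 rule stated above, can be sketched in base R as follows. The example uses built-in `mtcars` columns as stand-in predictors. Flagging on the absolute value, so that strong negative correlations are caught as well, is an assumption made here rather than something stated in the lecture.

```r
# Stand-in predictors from built-in data (not the course data)
x <- mtcars[, c("wt", "hp", "disp", "qsec", "drat")]

# Correlation matrix rounded to 2 decimal places
cor_mat <- round(cor(x), 2)

# Blank out the diagonal and upper triangle so each pair appears once
cor_mat[upper.tri(cor_mat, diag = TRUE)] <- NA

# Row/column positions of pairs with |r| >= 0.8 (NA entries are ignored)
idx <- which(abs(cor_mat) >= 0.8, arr.ind = TRUE)

# List the flagged pairs; one variable from each pair is a candidate to drop
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
high_pairs
```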
Wine Data - Model Selection Example
Can we determine what factors affect wine quality even if we KNOW NOTHING about wine cultivation and chemistry?
Maybe!
Since we have no prior knowledge, we start with a straightforward full model with all available predictors and no interactions.
In practice, a consultant would be working with a wine expert to carefully determine a saturated model that includes all possible interactions.
Import Wine Data
Notice that all variables are numeric (<dbl> stands for double, a numeric type that can hold decimal values).
Look at the LAST step for each method to determine which method results in the best fit.
Comparison Measures:
Adj.\(R^2\): Higher value indicates better model fit
C(p): Lower value indicates better model fit (also referred to as Mallows’ C(p)).
AIC: Lower value indicates better model fit (Akaike Information Criterion).
RMSE: Lower value indicates better model fit (Root Mean Square Error).
By comparing these measures and accounting for our understanding of these procedures, we can determine that TWO of these methods arrived at the same model.
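To make the four comparison measures concrete, here is a minimal base R sketch that computes them for two models fit on the built-in `mtcars` data. The helper `fit_stats()` is hypothetical and written for this example. Its definitions (for instance, dividing by n in RMSE, or the constant terms inside AIC) may differ from those used by the software that produced the lecture output, so the values are useful for ranking models within one tool rather than for matching numbers across tools.

```r
# Stand-in full model and a smaller candidate model
full <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)
cand <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: fit statistics for one model relative to the full model
fit_stats <- function(model, full_model) {
  n        <- nobs(model)
  p        <- length(coef(model))       # coefficients, including intercept
  sse      <- sum(residuals(model)^2)   # sum of squared residuals
  mse_full <- sum(residuals(full_model)^2) / df.residual(full_model)
  c(
    adj_r2 = summary(model)$adj.r.squared,  # higher is better
    cp     = sse / mse_full - n + 2 * p,    # Mallows' C(p), lower is better
    aic    = AIC(model),                    # lower is better
    rmse   = sqrt(sse / n)                  # lower is better
  )
}

stats_tbl <- rbind(
  full = fit_stats(full, full),
  cand = fit_stats(cand, full)
)
round(stats_tbl, 3)
```

By construction, C(p) for the full model equals its number of coefficients, which gives a quick sanity check on the calculation.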
💥 Lecture 16 In-class Exercises - Q4 💥
Session ID: bua345s25
Which two model selection methods arrived at the same model for the wine data?
On the next few slides I will show pairs of stepwise summaries so you can compare them.
Backward Elimination and Forward Selection
Backward Elimination
Forward Selection
Backward Elimination and Stepwise Selection
Backward Elimination
Stepwise Selection
Forward Selection and Stepwise Selection
Forward Selection
Stepwise Selection
Wine Model Validation Plot (R = 0.58)
Key Points from this Week
Regression modeling can be overwhelming.
Automating part of the variable selection process is helpful.
Try different methods and compare results.
Results from automated processes are preliminary.
Model estimates and residuals can be added to a dataset.
Demonstrated in HW 7.
HW 6 due on Wed. 3/5 (Grace Period extended until 3/7).
HW 7 is posted and is due on Wed. 3/19
Date of Quiz 2 has been changed to Tuesday, 4/1.
To submit an Engagement Question or Comment about material from Lecture 16: Submit it by midnight today (day of lecture).