Model Summary
--------------------------------------------------------------------------
R                        0.772      RMSE               45426.628
R-Squared                0.596      MSE           2063578544.951
Adj. R-Squared           0.594      Coef. Var             27.670
Pred R-Squared           0.579      AIC                 4863.117
MAE                  31692.288      SBC                 4873.012
--------------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
------------------------------------------------------------------------------------
                          Sum of
                         Squares    DF         Mean Square         F     Sig.
------------------------------------------------------------------------------------
Regression      609852999259.857     1    609852999259.857   292.576   0.0000
Residual        412715708990.143   198      2084422772.677
Total          1022568708250.000   199
------------------------------------------------------------------------------------
Parameter Estimates
-------------------------------------------------------------------------------------------------
model               Beta  Std. Error  Std. Beta        t     Sig        lower       upper
-------------------------------------------------------------------------------------------------
(Intercept)    16505.199    9262.237               1.782   0.076    -1760.095   34770.493
Living_Area       82.588       4.828      0.772   17.105   0.000       73.066      92.110
-------------------------------------------------------------------------------------------------
MAS 261 - Lecture 27
Multiple Linear Regression
Housekeeping
Today’s plan
Review of Simple Linear Regression (SLR) Concepts from Lecture 25
Interpreting Regression Model Output
Understanding the Hypotheses
Drawing conclusions
Answering estimation questions
Introduction to Multiple Linear Regression (MLR)
How to add variables to a model
Interpreting a model with more than one X variable
Examining MLR Model Output
Understanding hypotheses being tested
Answering estimation questions
More Housekeeping and Upcoming Dates
HW 8 is due on Thursday, 12/5.
- Demo Videos were posted on Sunday 12/1.
In-person Final Exam is on 12/16/24 at 5:15 PM
- Timed Remote option will be available at 8:30 PM on 12/16 and must be completed before 10:00 PM on 12/17.
I will hold a Q&A Review for all material on Wednesday, 12/11.
- Time and Location TBD.
R and RStudio
In this course we use R and RStudio to understand statistical concepts.
You can access R and RStudio through Posit Cloud.
- Sign up for a Free Posit Cloud Account
I post R/RStudio files on Posit Cloud that you can access in provided links.
I also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
I demo how to download completed work so that you can use this allotment efficiently.
For those who want to go further with R/RStudio:
- I have added a new page to the MAS 261 website, Installing R and RStudio
Lecture 27 In-class Exercise - Q1
Below is the model output for a regression model relating the size of the living area of a house to its selling price.
What is the estimated selling price of a 2000 sq. ft. house, based on this model?
Round your answer to a whole dollar amount.
Lecture 27 In-class Exercise - Q1 cont’d
Focus on the Parameter Estimates table to answer this question:
Regression Output Interpretation in MAS 261
MAS 261 Skills:
Interpreting the Parameter Estimates table: the Beta (model coefficients) and Sig (P-values) columns
Model: \[ Est. Selling Price = 16505.199 + 82.588\times Living Area \]
Limitations of Simple Linear Regression
Simple Linear Regression - One X variable
In this case, X is the size of the living area.
This model says that regardless of other factors:
a 2500 sq. ft. house has an estimated selling price of 222975 dollars.
The model ignores number of bathrooms, age of house, etc.
These factors may also be helpful in explaining selling price.
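The 222975 figure comes from plugging 2500 into the fitted SLR equation. A minimal R sketch of that arithmetic, assuming the rounded coefficients printed in the Parameter Estimates table (the helper name `predict_slr` is hypothetical and not part of the course files):

```r
# Rounded SLR coefficients copied from the Parameter Estimates table.
b0 <- 16505.199   # (Intercept)
b1 <- 82.588      # Living_Area slope, in dollars per square foot

# Hypothetical helper: estimated selling price for a living area in sq. ft.
predict_slr <- function(living_area) {
  b0 + b1 * living_area
}

est_2500 <- predict_slr(2500)
round(est_2500)   # 222975
```

The same helper can be reused for other living areas, for example `predict_slr(2000)`, as long as the value is within the range of the data used to fit the model.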
Correlation between Living Area and Selling price is 0.77
This is a strong correlation, but maybe we can explain more of the variability in the data.
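With a single X variable, R-Squared is the square of the correlation, so a correlation of 0.77 still leaves roughly 40% of the variability in selling price unexplained. A quick check in R, assuming the rounded value of R from the SLR Model Summary:

```r
r <- 0.772                    # R from the SLR Model Summary
r_squared <- r^2              # share of variability explained by Living_Area
unexplained <- 1 - r_squared  # share left for other variables to explain

round(r_squared, 3)     # 0.596, matching R-Squared in the output
round(unexplained, 3)   # 0.404
```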
SLR vs. MLR
Transitioning from SLR to MLR is Straightforward
In R and most software, adding a variable to our model is as simple as addition.
The challenge is interpretation because we can no longer visualize the model.
There are 3-D visualization tools in R, BUT they are not always helpful.
Instead, I recommend extending the SLR model output interpretation to the new variables in the model.
On the next slide, we’ll add number of bathrooms.
- Spoiler: Number of bathrooms is a huge deal when buying a house.
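As a sketch of what "as simple as addition" can look like, the example below fits an SLR and then an MLR with base R's `lm()`. The lecture's data set is not reproduced here, so the `houses` data frame is simulated and purely hypothetical, and the column name `Selling_Price` is an assumption (`Living_Area` and `Bathrooms` match the names in the model output). The coefficients it produces will not match the lecture output.

```r
# Hypothetical, simulated data: NOT the data set used in the lecture.
set.seed(261)
n <- 200
houses <- data.frame(
  Living_Area = round(runif(n, min = 1000, max = 3500)),
  Bathrooms   = sample(1:4, n, replace = TRUE)
)
houses$Selling_Price <- 10000 + 60 * houses$Living_Area +
  35000 * houses$Bathrooms + rnorm(n, sd = 40000)

# SLR: one X variable.
slr <- lm(Selling_Price ~ Living_Area, data = houses)

# MLR: add a second X variable with "+" in the model formula.
mlr <- lm(Selling_Price ~ Living_Area + Bathrooms, data = houses)

# Equivalent: update the SLR formula by adding a term.
mlr_alt <- update(slr, . ~ . + Bathrooms)

coef(slr)
coef(mlr)
```

Further variables are added the same way, with one more `+` term in the formula per variable.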
MLR Model with two X variables
Model Summary
--------------------------------------------------------------------------
R                        0.815      RMSE               41412.317
R-Squared                0.665      MSE           1714980011.473
Adj. R-Squared           0.661      Coef. Var             25.289
Pred R-Squared           0.640      AIC                 4828.109
MAE                  30629.922      SBC                 4841.302
--------------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
------------------------------------------------------------------------------------
                          Sum of
                         Squares    DF         Mean Square         F     Sig.
------------------------------------------------------------------------------------
Regression      679572705955.336     2    339786352977.668   195.157   0.0000
Residual        342996002294.664   197      1741096458.349
Total          1022568708250.000   199
------------------------------------------------------------------------------------
Parameter Estimates
---------------------------------------------------------------------------------------------------
model               Beta  Std. Error  Std. Beta        t     Sig        lower       upper
---------------------------------------------------------------------------------------------------
(Intercept)   -11553.295    9556.111              -1.209   0.228   -30398.701    7292.110
Living_Area       58.047       5.875      0.543    9.881   0.000       46.462      69.633
Bathrooms      38141.447    6027.411      0.348    6.328   0.000    26254.916   50027.977
---------------------------------------------------------------------------------------------------
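One way to see whether adding Bathrooms explains more of the variability is to compare R-Squared for the one-variable and two-variable models. The sketch below recomputes R-Squared as the Regression sum of squares divided by the Total sum of squares, using the values printed in the two ANOVA tables (variable names are illustrative):

```r
ss_total   <- 1022568708250.000   # Total sum of squares (same in both models)
ss_reg_slr <- 609852999259.857    # Regression SS, Living_Area only
ss_reg_mlr <- 679572705955.336    # Regression SS, Living_Area + Bathrooms

r2_slr <- ss_reg_slr / ss_total
r2_mlr <- ss_reg_mlr / ss_total

round(r2_slr, 3)   # 0.596
round(r2_mlr, 3)   # 0.665
```

Both values agree with the R-Squared entries in the corresponding Model Summary tables.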
A closer look at the Parameter Estimates
Interpreting the new Model
Model: \[ Est. Selling Price = -11553.295 + 58.047\times Living Area + 38141.447 \times Bathrooms \]
Interpretation:
If number of bathrooms remains unchanged, each additional square foot is estimated to raise the selling price by about 58 dollars.
If living area remains unchanged, each additional bathroom will raise the estimated selling price by about 38 THOUSAND dollars.
Lecture 27 In-class Exercise - Q2
Based on this model, if a house is renovated to increase the square footage by 1000 square feet and two bathrooms are added, what would be the estimated change in price?
Round your answer to a whole dollar amount.
Model: \[ Est. Selling Price = -11553.295 + 58.047\times Living Area + 38141.447 \times Bathrooms \]
To answer this question, exclude the intercept because we are only interested in the change in price, not the price itself.
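A minimal R sketch of this calculation, assuming the rounded coefficients from the two-variable model. It also checks that a before/after comparison gives the same change, because the intercept appears in both prices and cancels. The starting house (1800 sq. ft., 2 bathrooms) and the helper name `price` are hypothetical.

```r
# Rounded coefficients from the two-variable model.
b0     <- -11553.295   # (Intercept)
b_area <- 58.047       # Living_Area
b_bath <- 38141.447    # Bathrooms

# Change in estimated price: only the slopes are involved.
change <- b_area * 1000 + b_bath * 2
round(change)   # 134330

# Before/after comparison for a hypothetical starting house.
price  <- function(area, baths) b0 + b_area * area + b_bath * baths
before <- price(1800, 2)
after  <- price(1800 + 1000, 2 + 2)
round(after - before)   # 134330, the intercept has cancelled
```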
Adding ANOTHER Term to our MLR
Next, we add age of the house to the model:
Model Summary
--------------------------------------------------------------------------
R                        0.821      RMSE               40864.224
R-Squared                0.673      MSE           1669884825.573
Adj. R-Squared           0.668      Coef. Var             25.018
Pred R-Squared           0.641      AIC                 4824.780
MAE                  30119.407      SBC                 4841.271
--------------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
------------------------------------------------------------------------------------
                          Sum of
                         Squares    DF         Mean Square         F     Sig.
------------------------------------------------------------------------------------
Regression      688591743135.442     3    229530581045.147   134.704   0.0000
Residual        333976965114.558   196      1703964107.727
Total          1022568708250.000   199
------------------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------------
model               Beta  Std. Error  Std. Beta        t     Sig        lower       upper
--------------------------------------------------------------------------------------------------
(Intercept)     5775.299   12087.330               0.478   0.633   -18062.622   29613.220
Living_Area       60.614       5.918      0.567   10.243   0.000       48.943      72.285
Bathrooms      30089.928    6913.944      0.274    4.352   0.000    16454.654   43725.201
House_Age       -235.721     102.458     -0.112   -2.301   0.022     -437.783     -33.658
--------------------------------------------------------------------------------------------------
Examining the new model
Hopefully, the interpretation will seem redundant at this point…
The New Model
Model: \[ Est. Selling Price = 5775.299 + 60.614\times Living Area + 30089.928 \times Bathrooms - 235.721\times House Age \]
Interpretation:
If number of bathrooms and age of the house remain unchanged, each additional square foot is estimated to raise the selling price by about 61 dollars.
If living area and age of the house remain unchanged, each additional bathroom will raise the estimated selling price by about 30 THOUSAND dollars.
If living area and number of bathrooms remain unchanged, each additional year will LOWER the estimated selling price by about 236 dollars.
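To see where "each additional year lowers the estimate by about 236 dollars" comes from, compare two houses with the same living area and number of bathrooms whose ages differ by one year. In the worked equation below, \(LA\), \(B\), and \(Age\) are shorthand introduced here for Living Area, Bathrooms, and House Age. Every term except the age term cancels:

\[
\begin{aligned}
Est. Price(LA, B, Age + 1) - Est. Price(LA, B, Age) &= -235.721\times (Age + 1) - (-235.721\times Age) \\
&= -235.721
\end{aligned}
\]

The same cancellation, applied to the other variables, gives the Living Area and Bathrooms interpretations.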
Lecture 27 In-class Exercise - Q3
What is the estimated price of a house that is 2500 square feet with 4 bathrooms and is 20 years old?
Round your answer to a whole dollar amount.
Model:
\[ Est. Selling Price = 5775.299 + 60.614\times Living Area + 30089.928 \times Bathrooms - 235.721\times House Age \]
(Intercept) Living_Area   Bathrooms   House_Age
   5775.299      60.614   30089.928    -235.721
In this case, the calculation INCLUDES the intercept because we want to estimate the price of a house, not the change in price.
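A minimal R sketch of this calculation, assuming the rounded coefficients shown above are stored in a named vector (the names `coefs` and `house` are illustrative):

```r
# Rounded coefficients from the three-variable model.
coefs <- c("(Intercept)" = 5775.299, Living_Area = 60.614,
           Bathrooms = 30089.928, House_Age = -235.721)

# House to estimate: the leading 1 multiplies the intercept,
# followed by 2500 sq. ft., 4 bathrooms, 20 years old.
house <- c(1, 2500, 4, 20)

est_price <- sum(coefs * house)
round(est_price)   # 272956
```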
More about SLR and MLR Models in BUA 345
This introduction to MLR is meant to be exactly that.
Ideally, this gives you a better understanding of some of the Parameter Estimates output.
In BUA 345 we spend about 2/3 of the semester on modeling.
This section begins with a short review of SLR and MLR models and then continues with the following topics:
Using the p-values and model fit statistics to decide between models
Which variables should we keep in the model?
How much information does each variable add to the model?
Are there interactions between the variables that should be added to the model?
More time is also spent on model diagnostics and writing the model code.
Key Points from Today
MLR models are a logical and straightforward extension of SLR models
Visualizing MLR models isn’t typically feasible
Interpretation is similar, but not identical, to that of SLR models.
As with SLR models, the model is only valid for the range of X values used to create it.
Today’s model would not apply to a 10000 square foot house with 8 bathrooms.
Regression model output includes hypothesis tests of each model coefficient.
For SLR and MLR, the hypothesis test of \(\beta_{i}\) is an indication of whether that variable is useful to the model.
In BUA 345, we will use these hypothesis tests and other measures of model fit to determine which variables to include in our models.
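The t, Sig, lower, and upper columns of the Parameter Estimates table can be reproduced from the Beta and Std. Error columns. The sketch below does this for House_Age in the three-variable model, assuming a two-sided test of \(\beta_{i} = 0\) and a 95% confidence interval, with the residual degrees of freedom (196) taken from the ANOVA table. The inputs are the rounded values printed in the output, so the results agree with the table only up to rounding.

```r
beta <- -235.721   # Beta for House_Age
se   <- 102.458    # Std. Error for House_Age
df   <- 196        # Residual DF from the ANOVA table

t_stat  <- beta / se                               # test statistic
p_value <- 2 * pt(-abs(t_stat), df)                # two-sided P-value
ci      <- beta + c(-1, 1) * qt(0.975, df) * se    # 95% confidence interval

round(t_stat, 3)    # table reports t = -2.301
round(p_value, 3)   # table reports Sig = 0.022
round(ci, 2)        # table reports lower = -437.783, upper = -33.658
```

A small Sig value, as here, indicates that House_Age is useful to the model even after Living_Area and Bathrooms are included.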
To submit an Engagement Question or Comment about material from Lecture 27: Submit it by midnight today (day of lecture).