MAS 261 - Lecture 27

Multiple Linear Regression

Author

Penelope Pooler Eisenbies

Published

December 3, 2024

Housekeeping

  • Today’s plan

    • Review of Simple Linear Regression (SLR) Concepts from Lecture 25

      • Interpreting Regression Model Output

      • Understanding the Hypotheses

      • Drawing conclusions

      • Answering estimation questions

    • Introduction to Multiple Linear Regression (MLR)

      • How to add variables to a model

      • Interpreting a model with more than one X variable

      • Examining MLR Model Output

      • Understanding hypotheses being tested

      • Answering estimation questions

More Housekeeping and Upcoming Dates

  • HW 8 is due on Thursday, 12/5.

    • Demo videos were posted on Sunday, 12/1.
  • In-person Final Exam is on 12/16/24 at 5:15 PM

    • Timed Remote option will be available at 8:30 PM on 12/16 and must be completed before 10:00 PM on 12/17.
  • I will hold a Q&A Review for all material on Wednesday, 12/11.

    • Time and Location TBD.

R and RStudio

  • In this course we use R and RStudio to understand statistical concepts.

  • You can access R and RStudio through Posit Cloud.

  • I post R/RStudio files on Posit Cloud that you can access in provided links.

  • I also provide demo videos that show how to access files and complete exercises.

  • NOTE: The free Posit Cloud account is limited to 25 hours per month.

    • I demo how to download completed work so that you can use this allotment efficiently.

    • For those who want to go further with R/RStudio:

Lecture 27 In-class Exercise - Q1

Below is the model output for a regression model relating the size of the living area of a house to its selling price.

What is the estimated selling price of a 2000 sq. ft. house, based on this model?

Round your answer to a whole dollar amount.

```{r echo=T}
(house_mod1 <- ols_regress(Price ~ Living_Area, data=real_estate))
```
                              Model Summary                                
--------------------------------------------------------------------------
R                           0.772       RMSE                    45426.628 
R-Squared                   0.596       MSE                2063578544.951 
Adj. R-Squared              0.594       Coef. Var                  27.670 
Pred R-Squared              0.579       AIC                      4863.117 
MAE                     31692.288       SBC                      4873.012 
--------------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                       ANOVA                                         
------------------------------------------------------------------------------------
                         Sum of                                                     
                        Squares         DF         Mean Square       F         Sig. 
------------------------------------------------------------------------------------
Regression     609852999259.857          1    609852999259.857    292.576    0.0000 
Residual       412715708990.143        198      2084422772.677                      
Total         1022568708250.000        199                                          
------------------------------------------------------------------------------------

                                       Parameter Estimates                                        
-------------------------------------------------------------------------------------------------
      model         Beta    Std. Error    Std. Beta      t        Sig         lower        upper 
-------------------------------------------------------------------------------------------------
(Intercept)    16505.199      9262.237                  1.782    0.076    -1760.095    34770.493 
Living_Area       82.588         4.828        0.772    17.105    0.000       73.066       92.110 
-------------------------------------------------------------------------------------------------

Lecture 27 In-class Exercise - Q1 cont’d

Focus on the Parameter Estimates table to answer this question:

```{r echo=T}
house_mod1$betas |> round(3)
```
(Intercept) Living_Area 
  16505.199      82.588 

Regression Output Interpretation in MAS 261

MAS 261 Skills:

  • Interpreting Parameter Estimates Beta (model coefficients) and Sig (P-values) columns

  • \(\text{Est. Selling Price} = 16505.199 + 82.588 \times \text{Living Area}\)
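As a minimal sketch of how this equation turns into an estimate, the calculation can be done in base R by plugging a living area into the fitted line. The coefficients are copied from the Parameter Estimates table; `est_price_slr` is a hypothetical helper name used only for illustration and is not part of `olsrr`.

```r
# Coefficients copied from the Parameter Estimates table
b0 <- 16505.199   # (Intercept)
b1 <- 82.588      # Living_Area

# Hypothetical helper: estimated selling price for a given living area (sq. ft.)
est_price_slr <- function(living_area) {
  b0 + b1 * living_area
}

price_2000 <- est_price_slr(2000)
round(price_2000)   # estimate for a 2000 sq. ft. house: 181681
```

The same helper works for any living area inside the range of the data, for example `est_price_slr(2500)`.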

Limitations of Simple Linear Regression

Simple Linear Regression - One X variable

In this case, X is the size of the living area.

This model says that regardless of other factors:

  • a 2500 sq. ft. house has an estimated selling price of about 222,975 dollars.

  • The model ignores number of bathrooms, age of house, etc.

  • These factors may also be helpful in explaining selling price.


  • Correlation between Living Area and Selling price is 0.77

  • This is a strong correlation, but maybe we can explain more of the variability in the data.
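To see how much variability is currently explained, recall that in simple linear regression the R-Squared value is the square of the correlation between X and Y. A small sketch using the rounded R value from the Model Summary:

```r
r <- 0.772            # R from the Model Summary (rounded correlation)
r_squared <- r^2      # share of variability in price explained by living area
round(r_squared, 3)   # 0.596, matching the reported R-Squared
```

Living area alone explains roughly 60% of the variability in selling price, which leaves about 40% that other variables might help explain.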

SLR vs. MLR

Transitioning from SLR to MLR is Straightforward

  • In R and most software, adding a variable to our model is as simple as addition.

  • The challenge is interpretation because we can no longer visualize the model.

  • There are 3-D visualization tools in R, BUT they are not always helpful.

  • Instead, I recommend extending the SLR model output interpretation to the new variables in the model.

  • On the next slide we’ll add number of bathrooms.

    • Spoiler: Number of bathrooms is a huge deal when buying a house.
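The "as simple as addition" point can be sketched with a toy example. Everything here is an assumption made for illustration: `toy_homes` is simulated data (not the `real_estate` data used in lecture), and base R's `lm()` stands in for `ols_regress()` so the block runs without extra packages.

```r
set.seed(261)   # make the simulated data reproducible

# Simulated (hypothetical) housing data, NOT the lecture's real_estate data
n <- 50
toy_homes <- data.frame(
  Living_Area = round(runif(n, min = 800, max = 3500)),
  Bathrooms   = sample(1:4, size = n, replace = TRUE)
)
toy_homes$Price <- 10000 + 60 * toy_homes$Living_Area +
  35000 * toy_homes$Bathrooms + rnorm(n, sd = 40000)

# SLR: one X variable
slr_fit <- lm(Price ~ Living_Area, data = toy_homes)

# MLR: the second X variable is added to the formula with `+`
mlr_fit <- lm(Price ~ Living_Area + Bathrooms, data = toy_homes)

coef(slr_fit)
coef(mlr_fit)
```

Each further predictor is appended to the formula with another `+`. The fitted numbers from this toy data will not match the lecture output.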

MLR Model with two X variables

```{r echo=T}
(house_mod2 <- ols_regress(Price ~ Living_Area + Bathrooms, data=real_estate))
```
                              Model Summary                                
--------------------------------------------------------------------------
R                           0.815       RMSE                    41412.317 
R-Squared                   0.665       MSE                1714980011.473 
Adj. R-Squared              0.661       Coef. Var                  25.289 
Pred R-Squared              0.640       AIC                      4828.109 
MAE                     30629.922       SBC                      4841.302 
--------------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                       ANOVA                                         
------------------------------------------------------------------------------------
                         Sum of                                                     
                        Squares         DF         Mean Square       F         Sig. 
------------------------------------------------------------------------------------
Regression     679572705955.336          2    339786352977.668    195.157    0.0000 
Residual       342996002294.664        197      1741096458.349                      
Total         1022568708250.000        199                                          
------------------------------------------------------------------------------------

                                        Parameter Estimates                                         
---------------------------------------------------------------------------------------------------
      model          Beta    Std. Error    Std. Beta      t        Sig          lower        upper 
---------------------------------------------------------------------------------------------------
(Intercept)    -11553.295      9556.111                 -1.209    0.228    -30398.701     7292.110 
Living_Area        58.047         5.875        0.543     9.881    0.000        46.462       69.633 
  Bathrooms     38141.447      6027.411        0.348     6.328    0.000     26254.916    50027.977 
---------------------------------------------------------------------------------------------------

A closer look at the Parameter Estimates

Interpreting the new Model

Model: \[ \text{Est. Selling Price} = -11553.295 + 58.047 \times \text{Living Area} + 38141.447 \times \text{Bathrooms} \]

Interpretation:

  • If number of bathrooms remains unchanged, each additional square foot is estimated to raise the selling price by about 58 dollars.

  • If living area remains unchanged, each additional bathroom will raise the estimated selling price by about 38 THOUSAND dollars.
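One way to check the "remains unchanged" interpretation is to compare two hypothetical houses that differ in only one variable. The sketch below uses the coefficients from the output; `est_price_2x` is an illustrative helper name, and the 2000 sq. ft. houses are made-up examples.

```r
# Coefficients copied from the two-variable model output
b <- c(intercept = -11553.295, Living_Area = 58.047, Bathrooms = 38141.447)

# Hypothetical helper: estimated price from living area and number of bathrooms
est_price_2x <- function(living_area, bathrooms) {
  b[["intercept"]] + b[["Living_Area"]] * living_area + b[["Bathrooms"]] * bathrooms
}

# Same living area, one extra bathroom
diff_bath <- est_price_2x(2000, 3) - est_price_2x(2000, 2)

# Same number of bathrooms, one extra square foot
diff_area <- est_price_2x(2001, 2) - est_price_2x(2000, 2)

round(diff_bath, 3)   # equals the Bathrooms coefficient
round(diff_area, 3)   # equals the Living_Area coefficient
```

The differences reproduce the coefficients because the intercept and the unchanged variable cancel out when the two estimates are subtracted.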

Lecture 27 In-class Exercise - Q2

Based on this model, if a house is renovated to increase the square footage by 1000 square feet and two bathrooms are added, what would be the estimated change in price?

Round your answer to a whole dollar amount.

Model: \[ \text{Est. Selling Price} = -11553.295 + 58.047 \times \text{Living Area} + 38141.447 \times \text{Bathrooms} \]

To answer this question, exclude the intercept because we are only interested in the change in price, not the price itself.

```{r echo=T}
house_mod2$betas |> round(3)
```
(Intercept) Living_Area   Bathrooms 
 -11553.295      58.047   38141.447 
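A minimal sketch of the change-in-price calculation, assuming the coefficients are typed in by hand from the output above (the vector names `slopes` and `changes` are illustrative). Only the slopes are used, since the intercept cancels when comparing the house before and after the renovation.

```r
# Slopes copied from the two-variable model (intercept deliberately excluded)
slopes  <- c(Living_Area = 58.047, Bathrooms = 38141.447)

# Renovation: +1000 sq. ft. and +2 bathrooms
changes <- c(Living_Area = 1000, Bathrooms = 2)

# Each change is multiplied by its slope, then the pieces are summed
price_change <- sum(slopes * changes)
round(price_change)   # 134330
```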

Adding ANOTHER Term to our MLR

Next, we add age of the house to the model:

```{r echo=T}
(house_mod3 <- ols_regress(Price ~ Living_Area + Bathrooms + House_Age, data=real_estate))
```
                              Model Summary                                
--------------------------------------------------------------------------
R                           0.821       RMSE                    40864.224 
R-Squared                   0.673       MSE                1669884825.573 
Adj. R-Squared              0.668       Coef. Var                  25.018 
Pred R-Squared              0.641       AIC                      4824.780 
MAE                     30119.407       SBC                      4841.271 
--------------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                       ANOVA                                         
------------------------------------------------------------------------------------
                         Sum of                                                     
                        Squares         DF         Mean Square       F         Sig. 
------------------------------------------------------------------------------------
Regression     688591743135.442          3    229530581045.147    134.704    0.0000 
Residual       333976965114.558        196      1703964107.727                      
Total         1022568708250.000        199                                          
------------------------------------------------------------------------------------

                                       Parameter Estimates                                         
--------------------------------------------------------------------------------------------------
      model         Beta    Std. Error    Std. Beta      t        Sig          lower        upper 
--------------------------------------------------------------------------------------------------
(Intercept)     5775.299     12087.330                  0.478    0.633    -18062.622    29613.220 
Living_Area       60.614         5.918        0.567    10.243    0.000        48.943       72.285 
  Bathrooms    30089.928      6913.944        0.274     4.352    0.000     16454.654    43725.201 
  House_Age     -235.721       102.458       -0.112    -2.301    0.022      -437.783      -33.658 
--------------------------------------------------------------------------------------------------

Examining the new model

Hopefully, the interpretation will seem redundant at this point…

The New Model

Model: \[ \text{Est. Selling Price} = 5775.299 + 60.614 \times \text{Living Area} + 30089.928 \times \text{Bathrooms} - 235.721 \times \text{House Age} \]

Interpretation:

  • If number of bathrooms and age of the house remain unchanged, each additional square foot is estimated to raise the selling price by about 61 dollars.

  • If living area and age of the house remain unchanged, each additional bathroom will raise the estimated selling price by about 30 THOUSAND dollars.

  • If living area and number of bathrooms remain unchanged, each additional year will LOWER the estimated selling price by about 236 dollars.

Lecture 27 In-class Exercise - Q3

What is the estimated price of a house that is 2500 square feet with 4 bathrooms and is 20 years old?

Round your answer to a whole dollar amount.

Model:

\[ \text{Est. Selling Price} = 5775.299 + 60.614 \times \text{Living Area} + 30089.928 \times \text{Bathrooms} - 235.721 \times \text{House Age} \]

```{r echo=T}
house_mod3$betas |> round(3)
```
(Intercept) Living_Area   Bathrooms   House_Age 
   5775.299      60.614   30089.928    -235.721 

In this case, the calculation INCLUDES the intercept because we want to estimate the price of a house, not the change in price.
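A sketch of the full calculation, again assuming the coefficients are typed in by hand from the output (the vector names `betas` and `house` are illustrative). Giving the intercept a value of 1 in the `house` vector is one simple way to make sure it is included.

```r
# Coefficients copied from the three-variable model output
betas <- c(`(Intercept)` = 5775.299, Living_Area = 60.614,
           Bathrooms = 30089.928, House_Age = -235.721)

# The house of interest; the 1 lines up with the intercept
house <- c(`(Intercept)` = 1, Living_Area = 2500,
           Bathrooms = 4, House_Age = 20)

# Multiply each coefficient by its value and sum the terms
est_price <- sum(betas * house)
round(est_price)   # 272956
```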

More about SLR and MLR Models in BUA 345

This introduction to MLR is meant to be exactly that.

Ideally this gives you a better understanding of some of the Parameter Estimates output.

  • In BUA 345 we spend about 2/3 of the semester on modeling.

  • This section begins with a short review of SLR and MLR models and then continues with the following topics:

  • Using the p-values and model fit statistics to decide between models

    • Which variables should we keep in the model?

    • How much information does each variable add to the model?

    • Are there interactions between the variables that should be added to the model?

  • More time is also spent on model diagnostics and writing the model code.

Key Points from Today

  • MLR models are a logical and straightforward extension of SLR models

  • Visualizing MLR models isn’t typically feasible

  • Interpretation is similar, but not identical, to that of SLR models.

    • As with SLR models, the model is only valid for the range of X values used to create it.

    • Today’s model would not apply to a 10000 square foot house with 8 bathrooms.

  • Regression model output includes hypothesis tests of each model coefficient.

    • For SLR and MLR, the hypothesis test of \(\beta_{i}\) is an indication of whether that variable is useful to the model.

    • In BUA 345, we will use these hypothesis tests and other measures of model fit to determine which variables to include in our models.
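To connect the hypothesis tests to the output, the t and Sig values for a coefficient can be reproduced by hand. Each test has null hypothesis \(\beta_{i} = 0\) (the variable adds nothing once the others are in the model). The sketch below uses the House_Age row from the three-variable model; the inputs are rounded values copied from the output, so results may differ slightly in later decimals.

```r
# House_Age row from the three-variable model's Parameter Estimates table
beta_hat <- -235.721   # Beta
se       <- 102.458    # Std. Error
df_resid <- 196        # Residual DF from the ANOVA table (200 houses, 4 coefficients)

# Test statistic: how many standard errors the estimate is from zero
t_stat <- beta_hat / se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

round(t_stat, 3)    # -2.301, matching the t column
round(p_value, 3)   # compare with the Sig column
```

A p-value below 0.05 suggests that age of the house is useful to the model even after living area and number of bathrooms are included.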

To submit an Engagement Question or Comment about material from Lecture 27: Submit it by midnight today (day of lecture).