Housekeeping

  • Today’s plan 📋

    • Review of Simple Linear Regression (SLR) Concepts from Lecture 25

      • Interpreting Regression Model Output

      • Understanding the Hypotheses

      • Drawing conclusions

      • Answering estimation questions

    • Introduction to Multiple Linear Regression (MLR)

      • How to add variables to a model

      • Interpreting a model with more than one X variable

      • Examining MLR Model Output

      • Understanding hypotheses being tested

      • Answering estimation questions

More Housekeeping and Upcoming Dates

  • HW 8 is due on Thursday, 12/5.

    • Demo Videos were posted on Sunday 12/1.
  • In-person Final Exam is on 12/16/24 at 5:15 PM

    • Timed Remote option will be available at 8:30 PM on 12/16 and must be completed before 10:00 PM on 12/17.
  • I will hold a Q&A Review for all material on Wednesday, 12/11.

    • Time and Location TBD.

R and RStudio

  • In this course we use R and RStudio to understand statistical concepts.

  • You can access R and RStudio through Posit Cloud.

  • I post R/RStudio files on Posit Cloud that you can access in provided links.

  • I also provide demo videos that show how to access files and complete exercises.

  • NOTE: The free Posit Cloud account is limited to 25 hours per month.

    • I demo how to download completed work so that you can use this allotment efficiently.

    • For those who want to go further with R/RStudio:

💥 Lecture 27 In-class Exercise - Q1 💥

Below is the model output for a regression model relating the size of the living area of a house to its selling price.

What is the estimated selling price of a 2000 sq. ft. house, based on this model?

Round your answer to a whole dollar amount.

(house_mod1 <- ols_regress(Price ~ Living_Area, data=real_estate))
                              Model Summary                                
--------------------------------------------------------------------------
R                           0.772       RMSE                    45426.628 
R-Squared                   0.596       MSE                2063578544.951 
Adj. R-Squared              0.594       Coef. Var                  27.670 
Pred R-Squared              0.579       AIC                      4863.117 
MAE                     31692.288       SBC                      4873.012 
--------------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                       ANOVA                                         
------------------------------------------------------------------------------------
                         Sum of                                                     
                        Squares         DF         Mean Square       F         Sig. 
------------------------------------------------------------------------------------
Regression     609852999259.857          1    609852999259.857    292.576    0.0000 
Residual       412715708990.143        198      2084422772.677                      
Total         1022568708250.000        199                                          
------------------------------------------------------------------------------------

                                       Parameter Estimates                                        
-------------------------------------------------------------------------------------------------
      model         Beta    Std. Error    Std. Beta      t        Sig         lower        upper 
-------------------------------------------------------------------------------------------------
(Intercept)    16505.199      9262.237                  1.782    0.076    -1760.095    34770.493 
Living_Area       82.588         4.828        0.772    17.105    0.000       73.066       92.110 
-------------------------------------------------------------------------------------------------

💥 Lecture 27 In-class Exercise - Q1 cont’d 💥

Focus on the Parameter Estimates table to answer this question:

house_mod1$betas |> round(3)
(Intercept) Living_Area 
  16505.199      82.588 

Regression Output Interpretation in MAS 261

MAS 261 Skills:

  • Interpreting Parameter Estimates Beta (model coefficients) and Sig (P-values) columns

  • \(Est. Selling Price = 16505.199 + 82.588\times Living Area\)

Limitations of Simple Linear Regression

Simple Linear Regression - One X variable

In this case, X is the size of the living area.

This model says that regardless of other factors:

  • a 2500 sq. ft. house has an estimated selling price of $222,975.

  • The model ignores number of bathrooms, age of house, etc.

  • These factors may also be helpful in explaining selling price.
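The 2500 sq. ft. figure can be checked by plugging the living area into the fitted equation. Below is a minimal sketch in base R, with the coefficients typed in from the Parameter Estimates table rather than pulled from the model object. The names `b0`, `b1`, and `est_price_slr` are made up for illustration and are not part of the course files.

```r
# Coefficients copied from the Parameter Estimates table of house_mod1
b0 <- 16505.199   # (Intercept)
b1 <- 82.588      # Living_Area slope, in dollars per sq. ft.

# Hypothetical helper: estimated selling price for a given living area
est_price_slr <- function(living_area) {
  b0 + b1 * living_area
}

price_2500 <- est_price_slr(2500)
round(price_2500)   # 222975
```

The only input is living area, so two 2500 sq. ft. houses get the same estimate no matter how they differ otherwise.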


  • Correlation between Living Area and Selling price is 0.77

  • This is a strong correlation, but maybe we can explain more of the variability in the data.
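In a model with one X variable, R-Squared is the square of the correlation. That is how the 0.77 correlation connects to the R-Squared of about 0.596 in the Model Summary. A quick check using the R value printed in the output (the variable names are illustrative):

```r
r <- 0.772            # R from the Model Summary of house_mod1
r_squared <- r^2      # share of the variability in Price explained by Living_Area
round(r_squared, 3)   # 0.596
```

Roughly 40% of the variability in selling price is left unexplained by living area alone, which is the motivation for adding more variables.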

SLR vs. MLR

Transitioning from SLR to MLR is Straightforward

  • In R and most software, adding a variable to our model is as simple as addition.

  • The challenge is interpretation because we can no longer visualize the model.

  • There are 3-D visualization tools in R, BUT they are not always helpful.

  • Instead, I recommend extending the SLR model output interpretation to the new variables in the model.

  • On the next slide we’ll add number of bathrooms.

    • Spoiler: Number of bathrooms is a huge deal when buying a house.
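To make "as simple as addition" concrete, here is a small sketch that works with the model formula on its own, without any data. It uses base R's `update()` to add `Bathrooms` to the right-hand side. The object names `slr_formula` and `mlr_formula` are hypothetical.

```r
# Formula for the one-variable model used so far
slr_formula <- Price ~ Living_Area

# Adding a variable is a "+" on the right-hand side of the formula
mlr_formula <- update(slr_formula, . ~ . + Bathrooms)

mlr_formula                               # Price ~ Living_Area + Bathrooms
attr(terms(mlr_formula), "term.labels")   # "Living_Area" "Bathrooms"
```

Typing `Price ~ Living_Area + Bathrooms` directly does the same thing. `update()` is used here only to show that the new formula is the old one plus one more term.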

MLR Model with two X variables

(house_mod2 <- ols_regress(Price ~ Living_Area + Bathrooms, data=real_estate))
                              Model Summary                                
--------------------------------------------------------------------------
R                           0.815       RMSE                    41412.317 
R-Squared                   0.665       MSE                1714980011.473 
Adj. R-Squared              0.661       Coef. Var                  25.289 
Pred R-Squared              0.640       AIC                      4828.109 
MAE                     30629.922       SBC                      4841.302 
--------------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                       ANOVA                                         
------------------------------------------------------------------------------------
                         Sum of                                                     
                        Squares         DF         Mean Square       F         Sig. 
------------------------------------------------------------------------------------
Regression     679572705955.336          2    339786352977.668    195.157    0.0000 
Residual       342996002294.664        197      1741096458.349                      
Total         1022568708250.000        199                                          
------------------------------------------------------------------------------------

                                        Parameter Estimates                                         
---------------------------------------------------------------------------------------------------
      model          Beta    Std. Error    Std. Beta      t        Sig          lower        upper 
---------------------------------------------------------------------------------------------------
(Intercept)    -11553.295      9556.111                 -1.209    0.228    -30398.701     7292.110 
Living_Area        58.047         5.875        0.543     9.881    0.000        46.462       69.633 
  Bathrooms     38141.447      6027.411        0.348     6.328    0.000     26254.916    50027.977 
---------------------------------------------------------------------------------------------------

A closer look at the Parameter Estimates

Interpreting the new Model

Model: \[ Est. Selling Price = -11553.295 + 58.047\times Living Area + 38141.447 \times Bathrooms \]

Interpretation:

  • If number of bathrooms remains unchanged, each additional square foot is estimated to raise the selling price by about 58 dollars.

  • If living area remains unchanged, each additional bathroom will raise the estimated selling price by about 38 THOUSAND dollars.
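The "remains unchanged" wording can be checked numerically: change one variable by one unit, hold the other fixed, and compare the two estimates. This is a sketch with the coefficients typed in from the output. The helper `est_price_mlr2` and the baseline house (2000 sq. ft., 2 bathrooms) are arbitrary choices for illustration.

```r
# Coefficients copied from the Parameter Estimates table of house_mod2
b <- c(intercept = -11553.295, living_area = 58.047, bathrooms = 38141.447)

# Hypothetical helper: estimated price from the two-variable model
est_price_mlr2 <- function(living_area, bathrooms) {
  unname(b["intercept"] + b["living_area"] * living_area +
           b["bathrooms"] * bathrooms)
}

# One more square foot, bathrooms held at 2
sqft_effect <- est_price_mlr2(2001, 2) - est_price_mlr2(2000, 2)
# One more bathroom, living area held at 2000 sq. ft.
bath_effect <- est_price_mlr2(2000, 3) - est_price_mlr2(2000, 2)

c(sqft_effect, bath_effect)   # approximately 58.047 and 38141.447
```

Any other baseline gives the same two differences, because the model is linear in both variables.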

💥 Lecture 27 In-class Exercise - Q2 💥

Based on this model, if a house is renovated to increase the square footage by 1000 square feet and two bathrooms are added, what would be the estimated change in price?

Round your answer to a whole dollar amount.

Model: \[ Est. Selling Price = -11553.295 + 58.047\times Living Area + 38141.447 \times Bathrooms \]

To answer this question, exclude the intercept because we are only interested in the change in price, not the price itself.

house_mod2$betas |> round(3)
(Intercept) Living_Area   Bathrooms 
 -11553.295      58.047   38141.447 
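The intercept drops out because it appears in both the "before" and "after" estimates and cancels when one is subtracted from the other. A minimal sketch of the resulting calculation, with the slopes typed in from the output above (the vector names `slopes` and `changes` are illustrative):

```r
# Slopes copied from house_mod2 (the intercept is deliberately left out)
slopes  <- c(Living_Area = 58.047, Bathrooms = 38141.447)

# Planned renovation: +1000 sq. ft. and +2 bathrooms
changes <- c(Living_Area = 1000, Bathrooms = 2)

# Estimated change in price = sum of (slope x change) over the variables
price_change <- sum(slopes * changes)
round(price_change)   # 134330
```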

Adding ANOTHER Term to our MLR

Next, we add age of the house to the model:

(house_mod3 <- ols_regress(Price ~ Living_Area + Bathrooms + House_Age, data=real_estate))
                              Model Summary                                
--------------------------------------------------------------------------
R                           0.821       RMSE                    40864.224 
R-Squared                   0.673       MSE                1669884825.573 
Adj. R-Squared              0.668       Coef. Var                  25.018 
Pred R-Squared              0.641       AIC                      4824.780 
MAE                     30119.407       SBC                      4841.271 
--------------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                       ANOVA                                         
------------------------------------------------------------------------------------
                         Sum of                                                     
                        Squares         DF         Mean Square       F         Sig. 
------------------------------------------------------------------------------------
Regression     688591743135.442          3    229530581045.147    134.704    0.0000 
Residual       333976965114.558        196      1703964107.727                      
Total         1022568708250.000        199                                          
------------------------------------------------------------------------------------

                                       Parameter Estimates                                         
--------------------------------------------------------------------------------------------------
      model         Beta    Std. Error    Std. Beta      t        Sig          lower        upper 
--------------------------------------------------------------------------------------------------
(Intercept)     5775.299     12087.330                  0.478    0.633    -18062.622    29613.220 
Living_Area       60.614         5.918        0.567    10.243    0.000        48.943       72.285 
  Bathrooms    30089.928      6913.944        0.274     4.352    0.000     16454.654    43725.201 
  House_Age     -235.721       102.458       -0.112    -2.301    0.022      -437.783      -33.658 
--------------------------------------------------------------------------------------------------

Examining the new model

Hopefully, the interpretation will seem redundant at this point…

The New Model

Model: \[ Est. Selling Price = 5775.299 + 60.614\times Living Area + 30089.928 \times Bathrooms - 235.721\times House Age \]

Interpretation:

  • If number of bathrooms and age of the house remain unchanged, each additional square foot is estimated to raise the selling price by about 61 dollars.

  • If living area and age of the house remain unchanged, each additional bathroom will raise the estimated selling price by about 30 THOUSAND dollars.

  • If living area and number of bathrooms remain unchanged, each additional year will LOWER the estimated selling price by about 236 dollars.

💥 Lecture 27 In-class Exercise - Q3 💥

What is the estimated price of a house that is 2500 square feet with 4 bathrooms and is 20 years old?

Round your answer to a whole dollar amount.

Model:

\[ Est. Selling Price = 5775.299 + 60.614\times Living Area + 30089.928 \times Bathrooms - 235.721\times House Age \]

house_mod3$betas |> round(3)
(Intercept) Living_Area   Bathrooms   House_Age 
   5775.299      60.614   30089.928    -235.721 

In this case, the calculation INCLUDES the intercept because we want to estimate the price of a house, not the change in price.
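One way to set up this calculation is to pair each coefficient with the matching house characteristic and use a 1 for the intercept. This is a sketch only. The coefficients are typed in from the output above, and the vector names are made up for illustration.

```r
# Coefficients copied from house_mod3, in the order shown in the output
coefs <- c(intercept = 5775.299, Living_Area = 60.614,
           Bathrooms = 30089.928, House_Age = -235.721)

# House described in the question; the leading 1 multiplies the intercept
house <- c(intercept = 1, Living_Area = 2500, Bathrooms = 4, House_Age = 20)

est_price <- sum(coefs * house)
round(est_price)   # 272956
```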

More about SLR and MLR Models in BUA 345

This introduction to MLR is meant to be exactly that.

Ideally this gives you a better understanding of some of the Parameter Estimates output.

  • In BUA 345 we spend about 2/3 of the semester on modeling.

  • This section begins with a short review of SLR and MLR models and then continues with the following topics:

  • Using the p-values and model fit statistics to decide between models

    • Which variables should we keep in the model?

    • How much information does each variable add to the model?

    • Are there interactions between the variables that should be added to the model?

  • More time is also spent on model diagnostics and writing the model code.
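As a small preview of comparing models with fit statistics, the R-Squared and Adj. R-Squared values for the three models in this lecture can be rebuilt from their ANOVA tables. The sketch below assumes the standard formulas R-Squared = SS Regression / SS Total and Adj. R-Squared = 1 - (1 - R-Squared)(n - 1)/(n - p - 1), where p is the number of X variables. The sample size n = 200 is inferred from the Total DF of 199, and the object names are illustrative.

```r
n   <- 200                           # Total DF of 199 implies 200 houses
sst <- 1022568708250.000             # Total sum of squares (same in all three tables)
ssr <- c(mod1 = 609852999259.857,    # Regression sums of squares
         mod2 = 679572705955.336,
         mod3 = 688591743135.442)
p   <- c(mod1 = 1, mod2 = 2, mod3 = 3)   # number of X variables in each model

r_sq     <- ssr / sst
adj_r_sq <- 1 - (1 - r_sq) * (n - 1) / (n - p - 1)

round(rbind(r_sq, adj_r_sq), 3)
# r_sq:     0.596 0.665 0.673
# adj_r_sq: 0.594 0.661 0.668
```

R-Squared cannot decrease when a variable is added, while Adj. R-Squared includes a penalty for each added variable. That penalty is one reason it is commonly used when deciding between models.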