MAS 261 - Lecture 28

Introduction to Linear Transformations

Author

Penelope Pooler Eisenbies

Published

December 5, 2024

Housekeeping

  • Today’s plan

    • Review of Multiple Linear Regression (MLR) Concepts from Lecture 27

      • Interpreting Regression Model Output

      • Understanding the Hypotheses

      • Drawing conclusions

      • Answering estimation questions

    • Introduction to Transformations

      • Linear Regression Model Assumptions

      • How Transformations Help

      • Log (LN) Transformation of X or Y

    • Course Evaluations

More Housekeeping and Upcoming Dates

  • HW 8 is due today.

  • HW 9 is posted and is due Wednesday, 12/11.

    • I will post HW 9 demo videos this weekend.
  • In-person Final Exam is on 12/16/24 at 5:15 PM

    • Timed Remote option will be available at 8:30 PM on 12/16 and must be completed before 10:00 PM on 12/17.
  • I will hold a Q&A Review for all material on Wednesday, 12/11.

    • Time and Location TBD.

R and RStudio

  • In this course we use R and RStudio to understand statistical concepts.

  • You can access R and RStudio through Posit Cloud.

  • I post R/RStudio files on Posit Cloud that you can access in provided links.

  • I also provide demo videos that show how to access files and complete exercises.

  • NOTE: The free Posit Cloud account is limited to 25 hours per month.

    • I demo how to download completed work so that you can use this allotment efficiently.

    • For those who want to go further with R/RStudio:

Lecture 28 In-class Exercises - Q1


Below is the final model we arrived at in Lecture 26.


What is the estimated price of a house that is 3000 square feet, has 4 bathrooms, and is 30 years old?


Round your answer to a whole dollar amount.

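```{r echo=TRUE}
# ols_regress() comes from the olsrr package
library(olsrr)
# Fit the multiple linear regression and show the rounded coefficients
house_mod3 <- ols_regress(Price ~ Living_Area + Bathrooms + House_Age, data = real_estate)
house_mod3$betas |> round(3)
```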
(Intercept) Living_Area   Bathrooms   House_Age 
   5775.299      60.614   30089.928    -235.721 

Lecture 28 In-class Exercises - Q2


What is the CHANGE in estimated price we can expect for a house that has aged 10 years and has had two bathrooms and 1500 square feet added during a renovation?

Round your answer to a whole dollar amount.


```{r echo=TRUE}
# Same fitted model as in Q1: show the rounded coefficients again
house_mod3$betas |> round(3)
```
(Intercept) Living_Area   Bathrooms   House_Age 
   5775.299      60.614   30089.928    -235.721 

Review of MLR Model and Interpretation

Interpreting Coefficients

Model: \[ \text{Est. Selling Price} = 5775.299 + 60.614 \times \text{Living Area} + 30089.928 \times \text{Bathrooms} - 235.721 \times \text{House Age} \]

Interpretation:

  • If number of bathrooms and age of the house remain unchanged, each additional square foot is estimated to raise the selling price by about 61 dollars.

  • If living area and age of the house remain unchanged, each additional bathroom will raise the estimated selling price by about 30 THOUSAND dollars.

  • If living area and number of bathrooms remain unchanged, each additional year will LOWER the estimated selling price by about 236 dollars.
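
As a minimal sketch of how these coefficients turn into an estimate, the R snippet below plugs a hypothetical house into the model and then computes a hypothetical change in estimated price. The house values (2000 square feet, 2 bathrooms, 20 years old), the change scenario, and the variable names are illustrative assumptions, not values from the lecture data.

```r
# Coefficients copied from the rounded model output shown in this lecture
b <- c(intercept = 5775.299, living_area = 60.614,
       bathrooms = 30089.928, house_age = -235.721)

# Hypothetical house (illustrative values, not from the real_estate data)
sqft  <- 2000
baths <- 2
age   <- 20

# Estimated price: intercept plus each coefficient times its X value
est_price <- b[["intercept"]] + b[["living_area"]] * sqft +
  b[["bathrooms"]] * baths + b[["house_age"]] * age
round(est_price)

# Hypothetical change: 500 more square feet, 1 more bathroom, 5 more years.
# The intercept cancels because the same house is compared before and after.
change <- b[["living_area"]] * 500 + b[["bathrooms"]] * 1 +
  b[["house_age"]] * 5
round(change)
```

For a CHANGE question, only the coefficients multiplied by the changes in each X are needed, so the intercept never enters the calculation.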

Simple Linear Regression and Model Assumptions

There are TWO primary assumptions of SLR:

  1. There is a linear (straight line) relationship between the dependent variable (Y) and the independent variable (X).

    • We can check this visually (this lecture) and with statistical diagnostics.
  2. At each value of X, the POPULATION of Y values is normally distributed.

    • This can be verified, to some extent, with model diagnostics (not in MAS 261).

Curvilinear Data and What to Do

  • These data show the years to maturity and yield (%) of 40 Corporate Bonds.

  • We want to predict yield based on the years to maturity of a corporate bond.

  • Do these data adhere to the assumption of linearity?

Evaluating this Curvilinear Relationship

  • Slope is positive but not constant.

  • Yield appears to level off for longer maturity periods.

  • Relationship appears CURVILINEAR, not linear.

  • Assumption of a linear relationship is NOT valid.

One Possible Solution

What do we do if the data do not meet the linearity assumption?

  • Data can be transformed.

  • There are many, many transformation options.

  • We will discuss just a couple of transformations.

  • MAS 261 students are NOT expected to know what transformation is needed.

  • In this case, LN(X) works well.

    • In R, the log command computes LN (the natural log).
  • Correct Model:

    • \(\hat{Y} = 0.8279 + 1.5626\times LN(X)\)
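
The sketch below shows one way a model of this form could be fit in R by transforming X inside the model formula. It uses base R's `lm()` rather than the course's `ols_regress()`, and the data are synthetic and noise-free, generated directly from the equation above. They are NOT the 40 corporate bonds from the plot, and the names `years`, `yield`, and `bond_mod` are assumptions made for illustration.

```r
# Synthetic, noise-free data generated from the lecture's equation
# (an illustration only, NOT the 40 corporate bonds in the plot)
years <- 1:30
yield <- 0.8279 + 1.5626 * log(years)   # log() is the natural log in R

# Regress Y on LN(X) by applying log() to X inside the formula
bond_mod <- lm(yield ~ log(years))

# Because the data have no noise, the fit recovers the two coefficients
coef(bond_mod) |> round(4)
```

With real data the points would scatter around the curve, so the fitted coefficients would be estimates rather than exact values.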

Interpreting a Model with LN(X)

  • If LN(X) is used to create the model, then

    • we use LN(X), which is log(x) in R, to calculate model estimates.
  • To use this model to estimate the yield of a bond that matures in 20 years,

    • plug in log(20).
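
The same plug-in step can be sketched in R for a different maturity. The 10-year maturity and the variable names below are hypothetical choices for illustration. The first two lines simply confirm that `log()` in R is the natural log.

```r
# log() in R is the natural log: LN(e) = 1 and LN(1) = 0
log(exp(1))
log(1)

# Hypothetical bond that matures in 10 years (illustrative value)
years <- 10

# Plug LN(X) into the model, not X itself
est_yield <- 0.8279 + 1.5626 * log(years)
round(est_yield, 1)
```

A common mistake is to plug in the raw number of years. The model was built on LN(X), so the log must be taken first.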

Lecture 28 In-class Exercises - Q3


What is the estimated yield of a bond that matures in 20 years?


Round percentage to one decimal place and do not include percent symbol.


Recall model:

\(\hat{Y} = 0.8279 + 1.5626\times LN(X)\)

Lecture 28 In-class Exercises - Q4

If LN of years to maturity, LN(X), results in the best model, how do we interpret the intercept, \(b_0\)?


HINTS:

  1. \(b_0\) is the value of Y when our NEW X, LN of years to maturity, equals 0.

  2. LN(1) = 0, so when years to maturity = 1 (X = 1), then LN(X) = 0.


A. \(b_0\) has no real world interpretation.

B. \(b_0\) is the yield (Y) when the bond is first issued (X = 0).

C. \(b_0\) is the yield when the bond matures in one year (X = 1).

D. \(b_0\) is the change in yield that happens in one year.

Another Common Linear Transformation

Suppose you are the manager of a motorcycle store.

You want to predict the selling price of motorcycles based on ‘wheelbase (in inches)’.

For this purpose, you collect data from 86 motorcycle models.

LN(Y) Transformation

  • Non-linear relationship between X and Y is apparent.

  • A linear regression of Y on X will not fit the raw data well.

  • Transformation of X and/or Y may linearize the relationship.

  • For concave up non-linearity where Y > 0 for all values, we use LN(Y).

  • Model:

    • \(LN(\hat{Y}) = 3.8361 + 0.086\times X\)
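
Because this model estimates LN(Y) rather than Y, an estimate has to be back-transformed with `exp()` to get dollars. The sketch below illustrates this for a hypothetical 60-inch wheelbase. That value and the variable names are assumptions made for illustration.

```r
# Hypothetical motorcycle with a 60-inch wheelbase (illustrative value)
wheelbase <- 60

# Step 1: the model gives the estimate on the LN scale
ln_price <- 3.8361 + 0.086 * wheelbase

# Step 2: back-transform with exp() to get the estimate in dollars
est_price <- exp(ln_price)
round(est_price)

# On the dollar scale, each additional inch MULTIPLIES the estimated price
# by exp(0.086), rather than adding a fixed dollar amount
per_inch <- exp(0.086)
round(per_inch, 3)
```

The multiplicative factor is why LN(Y) suits concave up data: equal steps in X produce growing dollar increases in the estimated Y.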

Lecture 28 In-class Exercises - Q5

The wheelbase (X) of a motorcycle is 50 inches.

The estimated regression equation is:

\[LN(\hat{Y}) = 3.8361 + 0.086\times X\]


What is the estimated selling price (\(\hat{Y}\)) of the motorcycle?

Round your answer to the closest whole dollar.


NOTE: Use the exp command to back-transform the estimate, \(LN(\hat{Y})\), to find the selling price in dollars, \(\hat{Y}\).

Summary and Helpful Tips

Key Points from Today

  • Two Essential Assumptions for Simple Linear Regression (SLR) and Multiple Linear Regression (MLR):

    1. There is a straight line relationship between Y and each X in the model.

    2. At each value of X, Y is normally distributed.

  • In MAS 261, we focus on Assumption 1 for SLR.

    • Evaluating the relationship visually

    • Linearizing the relationship using LN(X), log(x) in R, or LN(Y), log(y) in R

  • There are many transformation options, but we cover only these two, which are the most common for data with values > 0.

To submit an Engagement Question or Comment about material from Lecture 28: Submit it by midnight today (day of lecture).