Lecture 27 - Introduction to Linear Transformations

Penelope Pooler Eisenbies
MAS 261

2023-12-05

Housekeeping

  • Today’s plan 📋

    • Review of Multiple Linear Regression (MLR) Concepts from Lecture 26

      • Interpreting Regression Model Output

      • Understanding the Hypotheses

      • Drawing conclusions

      • Answering estimation questions

    • Introduction to Transformations

      • Linear Regression Model Assumptions

      • How Transformations Help

      • Log (LN) Transformation of X or Y

    • Course Evaluations

Review: R and RStudio 🪄

  • Review: You have two options to facilitate your introduction to R and RStudio:

  • If you are comfortable with coding: Start with Option 1, but still sign up for a Posit Cloud account.

    • We will use Posit Cloud for Quizzes.
  • If you are nervous about coding: Choose Option 2.

  • For both options: I can help with download/install issues during office hours.

  • What I do: I maintain a Posit Cloud account for helping students, but I do most of my work on my laptop.

  • NOTE: We will use R and RStudio in class during MOST lectures

    • You can use either Posit Cloud or your laptop.

Upcoming Dates

  • Including today there are four more lectures

    • Four more opportunities to submit engagement questions

    • I rearranged the syllabus (slightly) to put today’s topic first.

  • HW 8 is posted and covers

    • Portfolio calculations

    • Simple Linear Regression

    • HW 8 is due on 12/6 (2-day grace period).

  • HW 9 will be posted on 12/7 and will cover

    • SLR, MLR, and Log Transformations.

    • HW 9 is due on 12/13 (2-day grace period).

  • Final Exam is on 12/19/23

💥 Lecture 26 In-class Exercises - Q1 💥


Below is the final model we arrived at in Lecture 26.


What is the estimated price of a house that is 3000 square feet, has 4 bathrooms, and is 30 years old?


Round your answer to a whole dollar amount.

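library(olsrr)  # provides ols_regress()
# Fit the MLR model from Lecture 26 and print the coefficients rounded to 3 decimals
house_mod3 <- ols_regress(Price ~ Living_Area + Bathrooms + House_Age, data=real_estate)
house_mod3$betas |> round(3)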
(Intercept) Living_Area   Bathrooms   House_Age 
   5775.299      60.614   30089.928    -235.721 
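
One way to set up this calculation is sketched below. It assumes the rounded coefficients printed above are typed in by hand rather than pulled from `house_mod3`; the names `b`, `house`, and `est_price` are illustrative and not part of the lecture code.

```r
# Rounded coefficients copied from the model output above
b <- c(Intercept = 5775.299, Living_Area = 60.614,
       Bathrooms = 30089.928, House_Age = -235.721)

# House described in the question: 3000 sq ft, 4 bathrooms, 30 years old
# (the leading 1 multiplies the intercept)
house <- c(1, 3000, 4, 30)

# Estimated price = sum of each coefficient times its value
est_price <- sum(b * house)
round(est_price)
```

Because the coefficients are rounded to three decimals, the result may differ by a few cents from an estimate based on the unrounded model.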

💥 Lecture 26 In-class Exercises - Q2 💥


What is the CHANGE in price we can expect for a house that has aged 10 years and has had two bathrooms and 1500 square feet added in renovations?

Round your answer to a whole dollar amount.


house_mod3$betas |> round(3)
(Intercept) Living_Area   Bathrooms   House_Age 
   5775.299      60.614   30089.928    -235.721 
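
For a CHANGE in price, the intercept cancels out, so only the slopes and the change in each variable are needed. A minimal sketch, again assuming the rounded coefficients are entered by hand (the names `b`, `delta`, and `price_change` are illustrative):

```r
# Slopes only: the intercept does not affect a change in price
b <- c(Living_Area = 60.614, Bathrooms = 30089.928, House_Age = -235.721)

# Changes described in the question
delta <- c(Living_Area = 1500, Bathrooms = 2, House_Age = 10)

# Change in estimated price = sum of each slope times its change
price_change <- sum(b * delta[names(b)])
round(price_change)
```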

Review of MLR Model and Interpretation

Interpreting Coefficients

Model: \[ \text{Est. Selling Price} = 5775.299 + 60.614 \times \text{Living Area} + 30089.928 \times \text{Bathrooms} - 235.721 \times \text{House Age} \]

Interpretation:

  • If number of bathrooms and age of the house remain unchanged, each additional square foot is estimated to raise the selling price by about 61 dollars.

  • If living area and age of the house remain unchanged, each additional bathroom will raise the estimated selling price by about 30 THOUSAND dollars.

  • If living area and number of bathrooms remain unchanged, each additional year of age will LOWER the estimated selling price by about 236 dollars.

Simple Linear Regression and Model Assumptions

There are TWO primary assumptions of SLR:

  1. There is a linear (straight line) relationship between the dependent variable (Y) and the independent variable (X).

    • We can test this visually (this lecture) and with statistical diagnostics
  2. At each value of X, the POPULATION of Y values is normally distributed.

    • This can be verified, to some extent, with model diagnostics (not in MAS 261)

Curvilinear Data and What to Do

  • These data show the years to maturity and yield (%) of 40 Corporate Bonds.

  • We want to predict yield based on the years to maturity of a corporate bond.

  • Do these data adhere to the assumption of linearity?

Evaluating this Curvilinear Relationship

  • Slope is positive but not consistent.

  • Yield appears to level off for longer maturity periods.

  • Relationship appears CURVILINEAR, not linear.

  • Assumption of Linear Relationship is NOT valid.

One Possible Solution

What do we do if data do not meet the linearity assumption?

  • Data can be transformed.

  • Many many many transformation options.

  • We will discuss just a couple of transformations

  • MAS 261 students are NOT expected to know what transformation is needed.

  • In this case, LN(X) works well

    • In R, the log() function computes LN (the natural log) by default
  • Correct Model:

    • \(\hat{Y} = 0.8279 + 1.5626\times LN(X)\)
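
The effect of the LN(X) transformation can be sketched with SIMULATED data. The bond data from the lecture are not reproduced here, so the values below are generated from a log-shaped curve plus random noise purely for illustration; the names `bonds_sim`, `mod_raw`, and `mod_ln` are hypothetical. R-squared is used only as a rough numeric companion to the visual check.

```r
set.seed(261)

# SIMULATED data (not the 40 bonds from the lecture): yields generated
# from a log-shaped curve plus random noise
years <- 1:40
yield <- 0.8279 + 1.5626 * log(years) + rnorm(length(years), sd = 0.3)
bonds_sim <- data.frame(years, yield)

# Straight-line model on raw X versus the same model on LN(X)
mod_raw <- lm(yield ~ years, data = bonds_sim)
mod_ln  <- lm(yield ~ log(years), data = bonds_sim)

# Compare how much of the variation in yield each model explains
r2_raw <- summary(mod_raw)$r.squared
r2_ln  <- summary(mod_ln)$r.squared
round(c(raw_X = r2_raw, LN_X = r2_ln), 3)

# Visual check: the curved pattern straightens when X is replaced by LN(X)
par(mfrow = c(1, 2))
plot(yield ~ years, data = bonds_sim, main = "Yield vs. Years")
plot(yield ~ log(years), data = bonds_sim, main = "Yield vs. LN(Years)")
```

With data shaped like this, the left plot levels off while the right plot is close to a straight line, which is the pattern the linearity assumption requires.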

Using a Model with LN(X) for Interpretation

  • If LN(X) is used to create the model, then

    • we use LN(X), which is log(x) in R, to calculate model estimates.
  • To use this model to estimate yield of a bond that matures in 20 years,

    • plug in log(20)
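
The plug-in step can be sketched in R as follows, using the coefficients from the LN(X) model above. The helper name `est_yield` is illustrative and not part of the lecture code.

```r
# Coefficients from the lecture's LN(X) model
b0 <- 0.8279
b1 <- 1.5626

# Illustrative helper: estimated yield (%) for a bond maturing in `years` years
# log() is the natural log in R, which matches LN
est_yield <- function(years) b0 + b1 * log(years)

est_yield(20)
```

Note that `log10(20)` would give a different (incorrect) estimate here, because the model was built with the natural log.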

💥 Lecture 26 In-class Exercises - Q3 💥


What is the estimated yield of a bond that matures in 20 years?


Round percentage to one decimal place and do not include percent symbol.


Recall model:

\(\hat{Y} = 0.8279 + 1.5626\times LN(X)\)

💥 Lecture 26 In-class Exercises - Q4 💥

If LN of years to maturity, LN(X), results in the best model, how do we interpret the intercept, \(b_0\)?


HINTS:

  1. \(b_0\) is the value of Y when our NEW X, LN of years to maturity, equals 0.

  2. LN(1) = 0, so when years to maturity = 1 (X = 1), then LN(X) = 0


A. \(b_0\) has no real world interpretation.

B. \(b_0\) is the yield (Y) when the bond is first issued (X = 0)

C. \(b_0\) is the yield when the bond matures in one year (X = 1)

D. \(b_0\) is the change in yield that happens in one year.
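
The two hints can be checked directly in R. This is a small sketch using the model coefficients from the lecture; the name `yield_at_1` is illustrative.

```r
log(1)   # 0, so the slope term drops out when X = 1

# Estimated yield when years to maturity X = 1
yield_at_1 <- 0.8279 + 1.5626 * log(1)
yield_at_1   # equals the intercept, 0.8279

log(0)   # -Inf: the model cannot be evaluated at X = 0
```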

Another Common Linear Transformation

Suppose you are a manager of a motorcycle store.

You want to predict the selling price of motorcycles based on ‘wheelbase (in inches)’.

For this purpose, you collect data from 86 motorcycle models.

LN(Y) Transformation

  • Non-linear relationship between X and Y is apparent.

  • Linear regression between X and Y will not work on raw data.

  • Transformation of X and/or Y may linearize relationship.

  • For concave up non-linearity where Y > 0 for all values, we use LN(Y)

  • Model:

    • \(LN(\hat{Y}) = 3.8361 + 0.086\times X\)

💥 Lecture 26 In-class Exercises - Q5 💥

The wheelbase (X) of a motorcycle is 50 inches.

The estimated regression equation is:

\[LN(\hat{Y}) = 3.8361 + 0.086\times X\]


What is the estimated selling price (Y) of the motorcycle?

Round your answer to the closest whole dollar.


NOTE: Use the exp() function in R to back-transform the estimate, \(LN(\hat{Y})\), to find the selling price in dollars, \(\hat{Y}\).
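
The back-transformation can be sketched in two steps: first compute the estimate on the LN(Y) scale, then apply `exp()` to return to dollars. The variable names below are illustrative; the coefficients are the ones in the equation above.

```r
# Coefficients from the lecture's LN(Y) model
b0 <- 3.8361
b1 <- 0.086
wheelbase <- 50   # inches

# Step 1: estimate on the LN(Y) scale (not yet in dollars)
ln_price <- b0 + b1 * wheelbase

# Step 2: back-transform with exp() to get the estimate in dollars
est_price <- exp(ln_price)
round(est_price)
```

A common mistake is to report the Step 1 value as the price; it is the natural log of the price, so it must be passed through `exp()` first.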

Key Points from Today

  • Two Essential Assumptions for Simple Linear Regression (SLR) and Multiple Linear Regression (MLR):

    1. There is a straight line relationship between Y and each X in the model.

    2. At each value of X, Y is normally distributed.

  • In MAS 261, we focus on Assumption 1 for SLR

    • Evaluating relationship visually

    • Linearizing relationship using LN(X), log(x), or LN(Y), log(y).

  • There are many transformation options, but we cover only these two, which are the most common for data with values > 0.

  • In Lecture 28 (and HW 9), we will cover how to create a model with a transformation and use the model for estimation.

To submit an Engagement Question or Comment about material from Lecture 27: Submit by midnight today (day of lecture). Click on the link under Lecture 27.