---
title: "MAS 261 - Lecture 27"
subtitle: "Introduction to Linear Transformations"
author: "Penelope Pooler Eisenbies"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced
---
## Housekeeping
```{r setup, echo=FALSE, warning=F, message=F, include=F}
#| include: false
# set default options for all R chunks
knitr::opts_chunk$set(echo=F)
# suppress scientific notation
options(scipen=100)
# install helper package that loads and installs other packages, if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
pacman::p_load(pacman,tidyverse, magrittr, olsrr, shadowtext, mapproj, knitr, kableExtra,
countrycode, usdata, maps, RColorBrewer, gridExtra, ggthemes, gt,
mosaicData, epiDisplay, vistributions, psych, tidyquant, dygraphs)
# verify packages
# p_loaded()
```
- Today's plan :clipboard:
- Review of Multiple Linear Regression (MLR) Concepts from Lecture 26
- Interpreting Regression Model Output
- Understanding the Hypotheses
- Drawing conclusions
- Answering estimation questions
- Introduction to Transformations
- Linear Regression Model Assumptions
- How Transformations Help
- Log (LN) Transformation of X or Y
- Course Evaluations
- I will step out for five minutes at the end of class.
- [Please use this link.](http://coursefeedback.syr.edu/){target="_blank"}
##
### More Housekeeping and Upcoming Dates
- HW 8 is due on Thursday, 12/4 (One day grace period)
- HW 9 is posted and is due Wednesday, 12/10.
- Demo Videos are Posted.
- In-person Final Exam is on 12/12/24 at 5:15 PM
- On Thursday, 12/4, we will review the material covered after Quiz 2 using the posted practice questions.
- On Tuesday, 12/9, there will be an in-class Q&A Review of all material from the whole semester.
- ***Come with questions!***
## R and RStudio
- In this course we will use R and RStudio to understand statistical concepts.
- You will access R and RStudio through **Posit Cloud**.
- Sign up for a [Free Posit Cloud Account](https://posit.cloud/plans/free){target="_blank"}
- I will post R/RStudio files on Posit Cloud that you can access through the provided links.
- I will also provide demo videos that show how to access files and complete exercises.
- NOTE: The free Posit Cloud account is limited to 25 hours per month.
- For those who want to go further with R/RStudio:
- If you are interested in downloading R and RStudio to your own computer, I can guide you through the process.
- The software is completely free, but it does have to be updated a couple of times each year.
##
### Lecture 27 In-class Exercise - Q1
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
<br>
Below is the final model we arrived at in Lecture 26.
<br>
**What is the estimated price of a house that is 3000 square feet, has 4 bathrooms, and is 30 years old?**
<br>
Round your answer to a whole dollar amount.
```{r}
real_estate <- read_csv("data/Real_Estate.csv", show_col_types = F)
```
```{r echo=T}
house_mod3 <- ols_regress(Price ~ Living_Area + Bathrooms + House_Age, data=real_estate)
house_mod3$betas |> round(3)
```
##
### Lecture 27 In-class Exercises - Q2
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
<br>
**What is the CHANGE in price we can expect for a house that has aged 10 years and has had two bathrooms and 1500 square feet added during a renovation?**
Round your answer to a whole dollar amount.
<br>
```{r echo=T}
house_mod3$betas |> round(3)
```
## Review of MLR Model and Interpretation
```{r}
knitr::include_graphics("img/MLR3_Output_Interpretation.png", dpi=100)
```
## Interpreting Coefficients
Model: $$ \text{Est. Selling Price} = 5775.299 + 60.614\times \text{Living Area} + 30089.928 \times \text{Bathrooms} - 235.721\times \text{House Age} $$
Interpretation:
- If number of bathrooms and age of the house remain unchanged, each additional square foot is estimated to raise the selling price by about 61 dollars.
- If living area and age of the house remain unchanged, each additional bathroom will raise the estimated selling price by about 30 THOUSAND dollars.
- If living area and number of bathrooms remain unchanged, each additional year will **LOWER** the estimated selling price by about 236 dollars.
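As a worked illustration of these interpretations, the sketch below plugs a hypothetical house (2,000 square feet, 2 bathrooms, 10 years old) into the model above. The house values and variable names are made up for illustration; only the coefficients come from the model.

```r
# Coefficients copied from the model equation above
b0     <- 5775.299    # intercept
b_area <- 60.614      # dollars per additional square foot
b_bath <- 30089.928   # dollars per additional bathroom
b_age  <- -235.721    # dollars per additional year of age

# Hypothetical house (values chosen only for illustration)
living_area <- 2000
bathrooms   <- 2
house_age   <- 10

# Estimated selling price for this house
est_price <- b0 + b_area * living_area + b_bath * bathrooms + b_age * house_age
round(est_price)

# Adding one bathroom, with living area and age held fixed,
# changes the estimate by exactly the bathroom coefficient
est_price_3bath <- b0 + b_area * living_area + b_bath * (bathrooms + 1) + b_age * house_age
round(est_price_3bath - est_price)
```

The estimate comes to about 184,826 dollars, and the difference between the two estimates is about 30,090 dollars, which matches the "each additional bathroom" interpretation.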
##
### Simple Linear Regression and Model Assumptions
::::: columns
::: {.column width="55%"}
There are TWO primary assumptions of SLR:
1. There is a linear (straight line) relationship between the dependent variable (Y) and the independent variable (X).
- We can test this visually (this lecture) and with statistical diagnostics.
2. At each value of X, the POPULATION of Y values is normally distributed.
- This can be verified, to some extent, with model diagnostics (not in MAS 261).
:::
::: {.column width="45%"}
```{r}
knitr::include_graphics("img/slr_assumptions.png", dpi=100)
```
:::
:::::
##
```{r}
knitr::include_graphics("img/Review_Log_LN.png", dpi=100)
```
## Curvilinear Data and What to Do
::::: columns
::: {.column width="50%"}
- These data show the years to maturity and yield (%) of 40 Corporate Bonds.
- We want to predict yield based on the years to maturity of a corporate bond.
- Do these data adhere to the assumption of linearity?
:::
::: {.column width="50%"}
```{r}
knitr::include_graphics("img/corp_bonds_untransformed.png", dpi=100)
```
:::
:::::
## Evaluating this Curvilinear Relationship
::::: columns
::: {.column width="55%"}
- Slope is positive but not consistent.
- Yield appears to level off for longer maturity periods.
- Relationship appears CURVILINEAR, not linear.
- Assumption of Linear Relationship is NOT valid.
:::
::: {.column width="45%"}
```{r}
knitr::include_graphics("img/corp_bonds_untransformed_reg.png", dpi=100)
```
:::
:::::
##
:::::: columns
::: {.column width="50%"}
### One Possible Solution
What do we do if the data do not meet the linearity assumption?
- Data can be transformed.
- Many many many transformation options.
- We will discuss just a couple of transformations.
- **MAS 261 students are NOT expected to know what transformation is needed.**
- In this case, LN(X) works well.
- In R, the `log` command computes LN (natural log).
- Correct Model:
- $\hat{Y} = 0.8279 + 1.5626\times LN(X)$
:::
:::: {.column width="50%"}
```{r}
knitr::include_graphics("img/corp_bonds_untransformed_reg.png", dpi=100)
```
::: fragment
```{r}
knitr::include_graphics("img/corp_bonds_transformed_reg.png", dpi=100)
```
:::
::::
::::::
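A minimal sketch of how an LN(X) model could be fit in R is shown here. It uses simulated data, not the corporate bond data from lecture, and base R's `lm()` rather than the `ols_regress()` function used earlier; the variable names and the values used to simulate the data are assumptions made for illustration.

```r
# Simulated data for illustration only (NOT the corporate bond data from lecture)
set.seed(261)
years <- seq(1, 30, length.out = 40)                   # hypothetical years to maturity
yield <- 0.8 + 1.5 * log(years) + rnorm(40, sd = 0.2)  # curved pattern plus noise

# Model 1: straight line in X
fit_raw <- lm(yield ~ years)

# Model 2: straight line in LN(X); log() in R is the natural log
fit_ln <- lm(yield ~ log(years))

# Compare how much of the variation in Y each model explains
summary(fit_raw)$r.squared
summary(fit_ln)$r.squared

# Estimated intercept and slope for the LN(X) model
coef(fit_ln) |> round(3)
```

Because the simulated relationship is curved in X but straight in LN(X), the second model should explain more of the variation in Y.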
##
### Interpreting a Model with LN(X)
::::: columns
::: {.column width="50%"}
- If LN(X) is used to create the model, then
- we use LN(X), `log(x)` in R, to calculate model estimates.
- To use this model to estimate the yield of a bond that matures in 20 years,
- plug in `log(20)`.
:::
::: {.column width="50%"}
```{r}
knitr::include_graphics("img/corp_bonds_transformed_reg.png", dpi=100)
```
:::
:::::
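The same plug-in step can be sketched for a different bond. The 10-year maturity below is a hypothetical value chosen for illustration; the coefficients are the ones from the LN(X) model above.

```r
# Coefficients from the LN(X) model shown in lecture
b0 <- 0.8279
b1 <- 1.5626

# Hypothetical bond maturing in 10 years (value chosen for illustration)
years <- 10

# log() is the natural log (LN); log10() is base 10 and would be incorrect here
est_yield <- b0 + b1 * log(years)
round(est_yield, 1)
```

The estimated yield for this hypothetical bond is about 4.4 (percent).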
##
### Lecture 27 In-class Exercises - Q3
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
::::: columns
::: {.column width="50%"}
<br>
**What is the estimated yield of a bond that matures in 20 years?**
<br>
Round percentage to one decimal place and do not include percent symbol.
<br>
Recall model:
$\hat{Y} = 0.8279 + 1.5626\times LN(X)$
:::
::: {.column width="50%"}
```{r}
knitr::include_graphics("img/corp_bonds_transformed_reg.png", dpi=100)
```
:::
:::::
##
### Lecture 27 In-class Exercises - Q4
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
**If LN of years to maturity, LN(X), results in the best model, how do we interpret the intercept,** $b_0$?
<br>
HINTS:
1. $b_0$ is the value of Y, when our NEW X, LN of years to maturity equals 0.
2. LN(1) = 0, so when years to maturity = 1 (X = 1), then LN(X) = 0
<br>
**`A.`** $b_0$ has no real world interpretation.
**`B.`** $b_0$ is the yield (Y) when the bond is first issued (X = 0)
**`C.`** $b_0$ is the yield when the bond matures in one year (X = 1)
**`D.`** $b_0$ is the change in yield that happens in one year.
##
```{r}
knitr::include_graphics("img/ln_x_model_interpretation.png")
```
## Another Common Linear Transformation
::::: columns
::: {.column width="50%"}
Suppose you are the manager of a motorcycle store.
You want to predict the selling price of motorcycles based on 'wheelbase' (in inches).
For this purpose, you collect data from 86 motorcycle models.
:::
::: {.column width="50%"}
:::
:::::
##
:::::: columns
::: {.column width="50%"}
### LN(Y) Transformation
- Non-linear relationship between X and Y is apparent.
- Linear regression between X and Y will not work on raw data.
- Transformation of X and/or Y may linearize relationship.
- For concave up non-linearity where Y \> 0 for all values, we use LN(Y)
- Model:
- $LN(\hat{Y}) = 3.8361 + 0.086\times X$
:::
:::: {.column width="50%"}
```{r}
knitr::include_graphics("img/motorcycle_untr_plot.png", dpi=100)
```
::: fragment
```{r}
knitr::include_graphics("img/motorcycle_tr_plot.png", dpi=100)
```
:::
::::
::::::
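With an LN(Y) model, the estimate comes out on the LN scale and has to be back-transformed to dollars. The sketch below walks through both steps for a hypothetical 60-inch wheelbase (a value chosen for illustration), using the coefficients from the model above.

```r
# Coefficients from the LN(Y) model shown in lecture
b0 <- 3.8361
b1 <- 0.086

# Hypothetical motorcycle with a 60-inch wheelbase (value chosen for illustration)
wheelbase <- 60

# Step 1: the model's estimate is on the LN scale
ln_price <- b0 + b1 * wheelbase

# Step 2: exp() undoes the natural log, returning the estimate to dollars
est_price <- exp(ln_price)
round(est_price)
```

Here the LN-scale estimate is 8.9961, which back-transforms to a little over 8,000 dollars.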
##
### Lecture 27 In-class Exercises - Q5
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
::::: columns
::: {.column width="55%"}
The wheelbase (X) of a motorcycle is 50 inches.
The estimated regression equation is:
$$LN(\hat{Y}) = 3.8361 + 0.086\times X$$
<br>
**What is the estimated selling price (Y) of the motorcycle?**
Round your answer to the closest whole dollar.
<br>
**NOTE: Use the `exp` command to back-transform the estimate,** $LN(\hat{Y})$, to find the selling price in dollars, $\hat{Y}$.
:::
::: {.column width="45%"}
```{r}
knitr::include_graphics("img/motorcycle_tr_plot.png", dpi=100)
```
:::
:::::
##
### Summary and Helpful Tips
```{r}
knitr::include_graphics("img/lin_trans_smry.png")
```
##
### Key Points from Today
- Two Essential Assumptions for Simple Linear Regression (SLR) and Multiple Linear Regression (MLR):
1. There is a straight line relationship between Y and each X in the model.
2. At each value of X, Y is normally distributed.
- In MAS 261, we focus on Assumption 1 for SLR
- Evaluating relationship visually
- Linearizing relationship using LN(X), `log(x)`, or LN(Y), `log(y)`.
- There are many transformation options, but we cover only these two, which are the most common for data with values greater than 0.
::: fragment
**To submit an Engagement Question or Comment about material from Lecture 27:** Submit it by midnight today (day of lecture).
:::