2023-11-27
Today’s plan 📋
Comments about Quiz 2 and R
Introduction to Simple Linear Regression
Function vs. Model
Examining Real Data
Creating a Model
Interpreting a Regression Model
Review: You have two options to facilitate your introduction to R and RStudio:
If you are comfortable with coding: Start with Option 1, but still sign up for a Posit Cloud account.
If you are nervous about coding: Choose Option 2.
For both options: I can help with download/install issues during office hours.
What I do: I maintain a Posit Cloud account for helping students but I do most of my work on my laptop.
NOTE: We will use R and RStudio in class during MOST lectures
There are a few make-up tests that will be completed today.
After tests and solutions are posted:
Please go through your test carefully.
If you missed a question due to a typo, please let me know.
I would be happy to go through any questions you missed with you.
HW 8 and HW 9 will be assigned after break
Final Exam is on 12/19/23
Import the data and find the average rate of return (expected value) and volatility for a portfolio that invests 75% in Starbucks (SBUX) and 25% in Nestle (NSRGY).
Use stock adjusted close data from 1/1/23 to 11/1/23.
Question 1: What is the average rate of return or expected value of this coffee portfolio? Round answer to two decimal places.
Question 2: What is the volatility of this coffee portfolio? Round answer to two decimal places.
NOTE: The final exam will include questions like this.
Average Rate of Return questions ask for a weighted average and could include three or more stocks.
Volatility questions require calculating covariances and variances and will only include two stocks, at most.
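The two calculations described above can be sketched in a few lines. The course itself works in R; Python is used here only as an illustration. The function names are hypothetical, and the numeric inputs are made-up placeholders, not actual SBUX or NSRGY results.

```python
import math

def portfolio_return(weights, mean_returns):
    """Weighted average of the individual mean returns."""
    return sum(w * r for w, r in zip(weights, mean_returns))

def two_stock_volatility(w1, w2, var1, var2, cov12):
    """Standard deviation of a two-stock portfolio.

    Portfolio variance = w1^2*var1 + w2^2*var2 + 2*w1*w2*cov12.
    """
    variance = w1**2 * var1 + w2**2 * var2 + 2 * w1 * w2 * cov12
    return math.sqrt(variance)

# Hypothetical inputs for illustration only (NOT real stock data)
weights = [0.75, 0.25]
mean_returns = [0.04, 0.02]
var1, var2, cov12 = 0.0016, 0.0009, 0.0006

expected = portfolio_return(weights, mean_returns)
volatility = two_stock_volatility(weights[0], weights[1], var1, var2, cov12)
print(round(expected, 4), round(volatility, 4))
```

The weighted average extends to three or more stocks by adding entries to both lists, while the volatility function is written for exactly two stocks, matching the scope stated above.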
In high school algebra, the concept of a function, \(y=f(x)\), is covered.
For example, a function that most people recall from high school is
\[y=x^2\]
How does this function appear?
Every point is exactly on the line
No points are above or below the line
BOTH the points and the line were generated with the same function
\[y = mx + b\]
m is the slope of the line
b is the y-intercept
Examples:
Positive slope: \(y = 2x + 3\)
Negative slope: \(y = -3x + 7\)
Notice the Y axis in each plot.
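A minimal sketch of the two example lines, assuming a hypothetical helper named `line` and an arbitrary set of x values. It shows that every 1-unit step in x changes y by exactly the slope m, with no deviation. (Python is used here for illustration; the course uses R.)

```python
def line(x, m, b):
    """Evaluate the linear function y = m*x + b."""
    return m * x + b

xs = [0, 1, 2, 3]                         # arbitrary x values
positive = [line(x, 2, 3) for x in xs]    # y = 2x + 3
negative = [line(x, -3, 7) for x in xs]   # y = -3x + 7

# Change in y for each 1-unit step in x
pos_steps = [b - a for a, b in zip(positive, positive[1:])]
neg_steps = [b - a for a, b in zip(negative, negative[1:])]

print(positive, pos_steps)
print(negative, neg_steps)
```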
Favorite Quote attributed to George Box:
“All models are wrong, but some are useful.”
Common student query:
If all models are wrong, why do we bother modeling?
Models are considered ‘wrong’ because they simplify the ‘messiness’ of the real world to a mathematical relationship.
Models can’t (and shouldn’t) include all the noise of real world data
No. of Bedrooms helps explain selling price
MANY other factors affect selling price
Location
Size
Age
Mileage helps explain resale price
MANY other factors affect resale price
Model
Maintenance and Climate
Years of Education helps explain income
Many other factors do too:
Major
College
Employer
So what do we do about all this noise?
As Box would say, we “worry selectively”
A strong relationship is still useful and informative
In a later lecture, we will talk about adding more variables to a model.
To make Russian Tea Cake Cookies, you need 6 tablespoons of powdered sugar to make 3 dozen cookies.
Here is the full recipe.
Here is the equation (y-intercept = 0):
\(y = 6x\)
Is this a function or a model?
The plot and model show the relationship between height and mass for all Star Wars characters for whom data were available.
Question 4: Is the relationship shown here a model or a function?
Follow up Question (not on Point Solutions): What is a good way to determine this?
True Population Model
\[y_{i} = \beta_{0} + \beta_{1}x_{i} + e_{i}\]
\(\beta_{0}\) is the y-intercept
\(\beta_{1}\) is the slope
\(e\) is the unexplained variability in Y
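To see what the error term adds, here is a small simulation sketch. It assumes hypothetical population parameters and normally distributed errors, neither of which comes from the lecture, and it uses Python for illustration rather than the course's R.

```python
import random

def simulate(xs, beta0, beta1, noise_sd, seed=1):
    """Generate y_i = beta0 + beta1 * x_i + e_i, with e_i drawn from a
    normal distribution (an assumption made for this sketch)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0, noise_sd) for x in xs]

xs = list(range(1, 11))
beta0, beta1 = 5.0, 2.0   # hypothetical population parameters

exact = simulate(xs, beta0, beta1, noise_sd=0.0)   # no noise: every point on the line
noisy = simulate(xs, beta0, beta1, noise_sd=3.0)   # noise: points scatter around the line

# Deviation of each point from the line beta0 + beta1 * x
exact_dev = [y - (beta0 + beta1 * x) for x, y in zip(xs, exact)]
noisy_dev = [y - (beta0 + beta1 * x) for x, y in zip(xs, noisy)]

print(exact_dev)
print([round(d, 2) for d in noisy_dev])
```

With the noise set to zero, the deviations are all zero, which is the function case. With noise, the points fall above and below the line, which is the situation a regression model describes.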
Estimated Sample Data Model
\[\hat{y} = b_{0} + b_{1}x\]
\(\hat{y}\) is the model estimate of y from x
\(b_{0}\) is the model estimate of the y-intercept
\(b_{1}\) is the model estimate of the slope
Each \(e_{i}\) is a residual.
Residual = observed y - regression estimate of y
\(e_{i} = y_{i} - \hat{y}_{i}\)
Software estimates the model with the smallest sum of squared residuals.
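The least-squares idea above can be sketched with the standard closed-form estimates for simple linear regression, \(b_{1} = S_{xy}/S_{xx}\) and \(b_{0} = \bar{y} - b_{1}\bar{x}\). The function names and the five data points are made up for illustration. In R this is handled by `lm()`; the Python below only shows the arithmetic.

```python
def fit_slr(xs, ys):
    """Least-squares estimates (b0, b1) for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_sq_residuals(xs, ys, b0, b1):
    """Sum of squared residuals e_i = y_i - y-hat_i for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_slr(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
best = sum_sq_residuals(xs, ys, b0, b1)

# Any other line gives a larger sum of squared residuals
other = sum_sq_residuals(xs, ys, b0 + 0.5, b1 - 0.1)
print(round(b0, 4), round(b1, 4), best < other)
```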
Function of a Line
\[y = mx + b\]
Exact, precise mathematical relationship with NO NOISE
Regression Model Equation
\[\hat{y} = b_{0} + b_{1}x\]
Estimated line that is simultaneously as close as possible to all observations.
\[\hat{y} = b_{0} + b_{1}x\]
\(\hat{y}\) is the regression estimate of y
\(b_{0}\) is the estimated value of y when x = 0
\(b_{1}\) is the estimated change in y associated with a 1-unit change in x
NOTE:
A model is only valid for the range of X values used to estimate it.
Using a model to estimate a value outside of this range is called extrapolation, and the estimate will be invalid.
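One way to guard against extrapolation is to check x against the fitted range before estimating. This is a sketch only: the function name, the coefficients, and the range below are all hypothetical and do not come from any model in this lecture.

```python
def predict_in_range(x, b0, b1, x_min, x_max):
    """Return b0 + b1 * x, or raise an error if x is outside [x_min, x_max]."""
    if x < x_min or x > x_max:
        raise ValueError(
            f"x = {x} is outside the fitted range [{x_min}, {x_max}]: extrapolation"
        )
    return b0 + b1 * x

# Hypothetical model and range, for illustration only
b0, b1 = 10.0, 0.5
x_min, x_max = 20, 80

inside = predict_in_range(50, b0, b1, x_min, x_max)   # within range: estimate returned

try:
    predict_in_range(120, b0, b1, x_min, x_max)       # outside range: rejected
    flagged = False
except ValueError as err:
    flagged = True
    print(err)
```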
Regression Model:
\[\hat{y} = 33.6841 - 0.022417x\]
Question 5. Based on this model, if Horsepower (x) is increased by 1, what is the change in Highway MPG?
Question 6. Based on this model, if Horsepower (x) is increased by 20 (which is more realistic), what is the change in Highway MPG?
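The slope interpretation can be checked numerically with the model's own coefficients. In this sketch the function names and the starting horsepower of 200 are hypothetical choices. The change in estimated MPG does not depend on the starting value, only on the size of the increase.

```python
B0 = 33.6841      # intercept from the regression model above
B1 = -0.022417    # slope from the regression model above

def estimated_mpg(horsepower):
    """Estimated Highway MPG from the regression model."""
    return B0 + B1 * horsepower

def change_in_mpg(hp_increase):
    """Change in estimated Highway MPG for a given increase in horsepower."""
    return B1 * hp_increase

base = 200  # hypothetical starting horsepower, for illustration only
diff_1 = estimated_mpg(base + 1) - estimated_mpg(base)
diff_20 = estimated_mpg(base + 20) - estimated_mpg(base)

print(round(diff_1, 6), round(change_in_mpg(1), 6))
print(round(diff_20, 5), round(change_in_mpg(20), 5))
```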
Regression Model:
\[\hat{y} = 33.6841 - 0.022417x\]
Question 7. If HP is 600, what is the estimated Highway MPG?
Question 8. What is the residual for the 2016 Aston Martin Vantage?
Simple linear regression (SLR) models are similar in format to the function of a line.
The interpretation is very different because SLR models are simplifications of the real world.
Box said, “All models are wrong, but some are useful.”
This refers to the inherent simplification of modeling that leaves out the noise of the real world.
Despite this simplification, models provide valuable insight.
A model is only valid for the range of data used to create it.
To submit an Engagement Question or Comment about material from Lecture 24: Submit by midnight today (day of lecture). Click on the link next to the ❓ under Lecture 24.