Housekeeping

  • Today’s plan 📋

    • Comments about Quiz 2 and R

    • Introduction to Simple Linear Regression

      • Function vs. Model

      • Examining Real Data

      • Creating a Model

      • Interpreting a Regression Model

Upcoming Dates

  • I will check and recheck solutions and post grades on Monday or Tuesday.

  • After tests and solutions are posted:

    • Please go through your test carefully

      • If you missed a question due to a typo, please let me know.

      • I would be happy to go through any questions you missed with you.

  • HW 8 will be posted on Tuesday (11/20)

  • There will be no lecture on Thursday 11/22.

  • In-person Final Exam is on 12/16/24 at 5:15 PM

    • Timed Remote option will be available at 8:30 PM on 12/16 and must be completed before 10:00 PM on 12/17.

R and RStudio

  • In this course we will use R and RStudio to understand statistical concepts.

  • You will access R and RStudio through Posit Cloud.

  • I will post R/RStudio files on Posit Cloud that you can access in provided links.

  • I will also provide demo videos that show how to access files and complete exercises.

  • NOTE: The free Posit Cloud account is limited to 25 hours per month.

    • I demo how to download completed work so that you can use this allotment efficiently.

    • For those who want to go further with R/RStudio:

💥 Lecture 24 In-class Exercises - Q1-Q2 💥

Import the data and find the average rate of return (expected value) and volatility for a portfolio that invests 75% in Starbucks (SBUX) and 25% in Nestle (NSRGY).

Use stock adjusted close data from 1/1/24 to 11/1/24.

library(quantmod)  # provides getSymbols()
getSymbols("SBUX", from = "2024-01-01", to = "2024-11-01")
getSymbols("NSRGY", from = "2024-01-01", to = "2024-11-01")


Question 1: What is the average rate of return or expected value of this coffee portfolio? Round answer to two decimal places.

Question 2: What is the volatility of this coffee portfolio? Round answer to two decimal places.

NOTE: The final exam will include questions like this.

  • Average Rate of Return questions ask for a weighted average and could include three or more stocks.

  • Volatility questions require calculating covariances and variances and will only include two stocks, at most.
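
The two calculations described in this note can be sketched in R as follows. This is a minimal illustration, not the solution to Q1-Q2: the return vectors `r_a` and `r_b` and the weights `w_a` and `w_b` are made-up values, and the sketch assumes volatility is measured as the standard deviation of portfolio returns.

```r
# Hypothetical periodic returns (in %) for two stocks; NOT real market data.
r_a <- c(1.2, -0.5, 0.8, 2.1, -1.0)
r_b <- c(0.4, 0.3, -0.2, 0.9, 0.1)

# Hypothetical portfolio weights (they must sum to 1).
w_a <- 0.60
w_b <- 0.40

# Expected return: weighted average of the individual mean returns.
port_return <- w_a * mean(r_a) + w_b * mean(r_b)

# Two-stock portfolio variance:
#   w_a^2 * Var(A) + w_b^2 * Var(B) + 2 * w_a * w_b * Cov(A, B)
port_var <- w_a^2 * var(r_a) + w_b^2 * var(r_b) +
  2 * w_a * w_b * cov(r_a, r_b)

# Volatility: square root of the portfolio variance.
port_vol <- sqrt(port_var)

round(c(return = port_return, volatility = port_vol), 2)
```

With three or more stocks, the weighted average simply gains one term per stock, while the variance formula gains a covariance term for every pair of stocks.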

Models vs. Functions

High school algebra covers the concept of a function, \(y=f(x)\).

For example, a function that most people recall from high school is

\[y=x^2\]

How does this function appear?

Functions are Mathematical relationships

  • Every point is exactly on the line

  • No points are above or below the line

  • BOTH the points and the line were generated with the same function

Function of a LINE

  • While covering functions, a common topic is the function of a line

\[y = mx + b\]

  • m is the slope of the line

  • b is the y-intercept


  • Examples:

    • Positive slope: \(y = 2x + 3\)

    • Negative slope: \(y = -3x + 7\)

    • Y axis range is the same on both plots.

Models ARE NOT Functions

Favorite Quote attributed to George Box:

“All models are wrong, but some are useful.”


Common student query:

If all models are wrong, why do we bother modeling?

Models are considered ‘wrong’ because they simplify the ‘messiness’ of the real world to a mathematical relationship.

Models can’t (and shouldn’t) include all the noise of real world data

  • BUT models are still useful in understanding how variables are related to each other.
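
The contrast between a function and a model can be simulated in R. The sketch below is illustrative only: the line \(y = 2x + 3\), the noise level (`sd = 10`), and the seed are arbitrary choices, not values from the lecture data.

```r
set.seed(24)  # arbitrary seed so the simulated noise is reproducible

x <- 1:50

# Function: every y value is determined exactly by x.
y_function <- 2 * x + 3

# "Real world" data: the same relationship plus random noise (hypothetical sd).
y_noisy <- 2 * x + 3 + rnorm(length(x), mean = 0, sd = 10)

# A model fit to the noisy data estimates a line close to, but not exactly,
# the one that generated the data.
noisy_mod <- lm(y_noisy ~ x)
coef(noisy_mod)

# Points from the function sit exactly on the line ...
all(y_function - (2 * x + 3) == 0)

# ... while the noisy points fall above and below the fitted line.
any(residuals(noisy_mod) > 0) && any(residuals(noisy_mod) < 0)
```

The fitted intercept and slope land near 3 and 2 without matching them exactly, which is the sense in which the model is "wrong" but still useful.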

Examples of Models of Noisy Data

  • No. of Bedrooms helps explain selling price

  • MANY other factors affect selling price

    • Location

    • Size

    • Age

  • Mileage helps explain resale price

  • MANY other factors affect resale price

    • Model

    • Maintenance and Climate

One More Example

  • Years of Education helps explain income

  • Many other factors do too:

    • Major

    • College

    • Employer

  • So what do we do about all this noise?

    • As Box would say, we “worry selectively”.

    • A strong relationship is still useful and informative.

    • In a later lecture we will talk about adding more variables to a model.

💥 Lecture 24 In-class Exercises - Q3 💥

To make Russian Tea Cake Cookies, you need 6 tablespoons of powdered sugar to make 3 dozen cookies.

Here is the full recipe.


Here is the equation (y-intercept = 0):

\(y = 6x\)


Is this a function or a model?

💥 Lecture 24 In-class Exercises - Q4 💥

The scatterplot and line show the relationship between height and mass for all Star Wars characters for whom data were available.


Question 4: Is the relationship shown here a model or a function?


Follow up Question (not on Point Solutions):

What is a good way to determine this?

Simple Linear Regression Model

True Population Model

\[y_{i} = \beta_{0} + \beta_{1}x_{i} + e_{i}\]

  • \(\beta_{0}\) is the y-intercept

  • \(\beta_{1}\) is the slope

  • \(e_{i}\) is the unexplained variability in y

Estimated Sample Data Model

\[\hat{y} = b_{0} + b_{1}x\]

  • \(\hat{y}\) is model estimate of y from x

  • \(b_{0}\) is model estimate of y-intercept

  • \(b_{1}\) is model estimate of slope

  • Each \(e_{i}\) is a residual.

    • y obs. - reg. estimate of y

    • \(e_{i} = y_{i} - \hat{y}_{i}\)

  • Software estimates model with smallest sum of all squared residuals

    • minimizes \(\sum_{i=1}^ne_{i}^2\)
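
The least-squares idea can be checked by hand in R. This sketch uses the built-in `mtcars` data as a stand-in (it is not the data set used in this lecture) and the standard closed-form formulas for simple linear regression, \(b_{1} = \text{cov}(x, y) / \text{var}(x)\) and \(b_{0} = \bar{y} - b_{1}\bar{x}\).

```r
# Built-in stand-in data: mpg (miles per gallon) vs. hp (horsepower).
x <- mtcars$hp
y <- mtcars$mpg

# Least-squares estimates from the closed-form formulas.
b1 <- cov(x, y) / var(x)      # slope
b0 <- mean(y) - b1 * mean(x)  # y-intercept

# Fitted values and residuals: e_i = y_i - y_hat_i.
y_hat <- b0 + b1 * x
e <- y - y_hat
sse <- sum(e^2)               # the sum of squared residuals being minimized

# lm() should report the same two coefficients.
mod <- lm(mpg ~ hp, data = mtcars)
rbind(by_hand = c(b0, b1), from_lm = coef(mod))

# Nudging the slope away from b1 makes the sum of squared residuals larger.
sse_other <- sum((y - (b0 + (b1 + 0.01) * x))^2)
sse_other > sse
```

Any other choice of intercept and slope gives a larger sum of squared residuals, which is what "smallest sum of all squared residuals" means in practice.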

Function of a Line vs. Regression Model

Function of a Line

\[y = mx + b\]

Exact mathematical relationship with NO NOISE.

Regression Model Equation

\[\hat{y} = b_{0} + b_{1}x\]

Estimated line that is simultaneously as close as possible to all observations.

Interpreting a Regression Model

\[\hat{y} = b_{0} + b_{1}x\]

  • \(\hat{y}\) is regression est. of y

  • \(b_{0}\) is the estimated value of y when x = 0

    • NOT always meaningful

  • \(b_{1}\) is the estimated change in y for a 1-unit increase in x

    • unit depends on data

  • NOTE:

    • Model is only valid for the range of X values used to estimate it.

    • Using a model to estimate a value outside of this range is called extrapolation, and such an estimate is not valid.
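
One way to guard against extrapolation is to compare a new x value with the range of x values used to fit the model. The sketch below is an assumption-laden illustration: it uses the built-in `mtcars` data as a stand-in, and `predict_in_range()` is a hypothetical helper written for this example, not a function from R or any package.

```r
# Stand-in model fit on the built-in mtcars data.
mod <- lm(mpg ~ hp, data = mtcars)

# Range of x values used to estimate the model.
hp_range <- range(mtcars$hp)

# Hypothetical helper: return an estimate only when x is inside the range.
predict_in_range <- function(model, new_hp, x_range) {
  if (new_hp < x_range[1] || new_hp > x_range[2]) {
    warning("hp = ", new_hp, " is outside the observed range (extrapolation).")
    return(NA_real_)
  }
  unname(predict(model, newdata = data.frame(hp = new_hp)))
}

# Inside the observed range: returns a regression estimate.
predict_in_range(mod, 150, hp_range)

# Outside the observed range: warns and returns NA instead of an estimate.
suppressWarnings(predict_in_range(mod, 1000, hp_range))
```

Note that `predict()` by itself will return a number for any x value, so the range check has to come from the analyst.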


Specifying the Model in R

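# Fit a simple linear regression of highway MPG (mpg_h) on horsepower (hp)
hp_mod <- lm(mpg_h ~ hp, data=gt_cars)
# Show the estimated intercept (b0) and slope (b1)
hp_mod$coefficients
(Intercept)          hp 
33.86410831 -0.02241685 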

\[\hat{y} = 33.8641 - 0.022417x\]
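
The same workflow (fit, read off the coefficients, estimate, compute a residual) can be practiced on data that ships with R. This sketch uses the built-in `mtcars` data as a stand-in for `gt_cars`, so its coefficients differ from the ones above; the car and the horsepower values are arbitrary choices for illustration.

```r
# Stand-in model: mpg on hp, using the built-in mtcars data.
mod <- lm(mpg ~ hp, data = mtcars)
b <- coef(mod)  # b[1] is the intercept (b0), b[2] is the slope (b1)

# Estimated change in mpg for a 1-unit and a 10-unit increase in hp.
change_1  <- unname(b[2]) * 1
change_10 <- unname(b[2]) * 10

# Estimated mpg for a car with 110 hp: y_hat = b0 + b1 * x.
est_110 <- unname(b[1] + b[2] * 110)

# Residual for one observed car: observed mpg minus estimated mpg.
car <- "Mazda RX4"
resid_car <- mtcars[car, "mpg"] -
  unname(predict(mod, newdata = mtcars[car, ]))

c(change_1 = change_1, change_10 = change_10,
  est_110 = est_110, resid_car = resid_car)
```

A positive residual means the car's observed mpg is above the regression line, and a negative residual means it is below.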

💥 Lecture 24 In-class Exercises - Q5-Q6 💥

Regression Model:

\[\hat{y} = 33.8641 - 0.022417x\]

Question 5. Based on this model, if Horsepower (x) is increased by 1, what is the change in Highway MPG?

  • Round answer to six decimal places



Question 6. Based on this model, if Horsepower (x) is increased by 20 (which is more realistic), what is the change in Highway MPG?

  • Round answer to 3 decimal places.

💥 Lecture 24 In-class Exercises - Q7-Q8 💥

Regression Model:

\[\hat{y} = 33.8641 - 0.022417x\]


Question 7. If HP is 600, what is the estimated Highway MPG?


Question 8. What is the residual for the 2016 Aston Martin Vantage?


  • Follow up Question (not on Point Solutions): Does the intercept have a real-world interpretation in this model?