What Is Simple Linear Regression?

  • Simple linear regression studies the relationship between one predictor and one response.
  • It helps us understand how changes in one variable are associated with changes in another.
  • In this presentation:
    • Predictor: vehicle weight (wt)
    • Response: fuel efficiency (mpg)
  • The goal is to fit the straight line that best describes the data.

The Simple Linear Regression Model

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

Where:

  • \(Y\) = response variable, or the value we want to predict
  • \(X\) = predictor variable
  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope
  • \(\varepsilon\) = random error

For this example:

\[ \text{mpg} = \beta_0 + \beta_1 \, \text{wt} + \varepsilon \]
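In practice \(\beta_0\) and \(\beta_1\) are unknown and are estimated from the data, usually by least squares. As a minimal sketch (the names `x`, `y`, `b0_hat`, `b1_hat`, `fit`, and `hand` are illustrative and not part of the original slides), the closed-form least-squares estimates can be computed by hand and compared with R's `lm()`:

```r
# Predictor and response from the built-in mtcars data
x <- mtcars$wt
y <- mtcars$mpg

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Least-squares intercept: the fitted line passes through (mean(x), mean(y))
b0_hat <- mean(y) - b1_hat * mean(x)

# Compare the hand-computed estimates with those from lm()
fit <- lm(mpg ~ wt, data = mtcars)
hand <- c(b0_hat, b1_hat)
print(hand)
print(unname(coef(fit)))
```

The two printed vectors should agree up to floating-point rounding, since `lm()` solves the same least-squares problem.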

Dataset and Example

  • I will use the built-in R dataset mtcars
  • This dataset contains measurements for 32 automobiles (1973–74 models)
  • For this example, I focus on:
    • wt = vehicle weight (in 1000 lbs)
    • mpg = miles per gallon
  • Question:
    • Can vehicle weight help predict fuel efficiency?
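Before fitting anything, it can help to look at the two variables directly. The following is a small sketch, assuming only base R; the names `cars` and `r` are illustrative:

```r
# Keep only the two columns used in this example
cars <- mtcars[, c("wt", "mpg")]

# First few rows and basic numeric summaries
print(head(cars))
print(summary(cars))

# Sample correlation between weight and fuel efficiency
r <- cor(cars$wt, cars$mpg)
print(r)
```

A correlation well below zero would suggest that heavier cars tend to have lower mpg, which is what the regression then quantifies.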

Weight vs. Fuel Efficiency

Predicted Values and Residuals

\[ \hat{y} = b_0 + b_1x \]

\[ e_i = y_i - \hat{y}_i \]

Where:

  • \(\hat{y}\) = predicted value from the regression line
  • \(b_0\) = estimated intercept
  • \(b_1\) = estimated slope
  • \(e_i\) = residual, or prediction error
  • \(y_i\) = actual observed value

For this example:

  • \(\hat{y}\) is the predicted mpg
  • The residual shows how far the actual mpg is from the predicted mpg
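The definitions above can be checked numerically. This is a minimal sketch using base R's `fitted()` and `residuals()`; the names `fit`, `y_hat`, and `e` are illustrative:

```r
# Fit the simple linear regression of mpg on wt
fit <- lm(mpg ~ wt, data = mtcars)

# Predicted values: y_hat = b0 + b1 * x for each car
y_hat <- fitted(fit)

# Residuals computed from the definition: e_i = y_i - y_hat_i
e <- mtcars$mpg - y_hat

# These should match the residuals stored in the model object
print(all.equal(unname(e), unname(residuals(fit))))

# With an intercept in the model, least-squares residuals sum to (nearly) zero
print(sum(e))
```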

Residual Plot: Checking Model Fit

  • Residuals are the differences between actual and predicted mpg values
  • Most residuals are scattered around 0, which is a good sign
  • There is no strong systematic pattern, so a linear model looks reasonable here
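A residual plot of this kind could be drawn as follows. This is a sketch, assuming the ggplot2 package is installed; the names `fit`, `res_df`, and `p_resid`, and the plot labels, are illustrative and not taken from the original slides:

```r
library(ggplot2)

# Fit the model and collect fitted values and residuals in one data frame
fit <- lm(mpg ~ wt, data = mtcars)
res_df <- data.frame(
  fitted   = fitted(fit),
  residual = residuals(fit)
)

# Residuals against fitted values, with a dashed reference line at 0
p_resid <- ggplot(res_df, aes(x = fitted, y = residual)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted Values",
    x = "Fitted mpg",
    y = "Residual"
  )

print(p_resid)
```

Curvature or a funnel shape in such a plot would hint that a straight line or constant error variance is not adequate.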

Interactive Scatterplot with Plotly
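As a rough sketch of one way an interactive scatterplot like this could be built, assuming the plotly package is installed (the object name `p_int`, the hover text, and the axis titles are illustrative choices, not taken from the original slides):

```r
library(plotly)

# Interactive scatterplot of mpg against wt; hovering shows the car name
p_int <- plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  type = "scatter",
  mode = "markers",
  text = rownames(mtcars)
)

# Add a title and axis labels
p_int <- layout(
  p_int,
  title = "Fuel Efficiency vs Vehicle Weight",
  xaxis = list(title = "Weight (1000 lbs)"),
  yaxis = list(title = "Miles Per Gallon")
)

p_int
```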

R Code for the Regression Model

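# Load ggplot2 for plotting
library(ggplot2)

# Fit the simple linear regression of mpg on wt
model <- lm(mpg ~ wt, data = mtcars)

# Scatterplot with the fitted least-squares line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +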
  labs(
    title = "Fuel Efficiency vs Vehicle Weight",
    x = "Weight (1000 lbs)",
    y = "Miles Per Gallon"
  )
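Once the model is fitted, it can be used to answer the original question about prediction. The following is a minimal sketch using base R only; the names `fit`, `r2`, `new_car`, and `pred`, and the choice of a 3000 lb car, are illustrative:

```r
# Fit the model (repeated here so the example is self-contained)
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept (b0) and slope (b1)
print(coef(fit))

# Proportion of the variation in mpg explained by wt
r2 <- summary(fit)$r.squared
print(r2)

# Predicted mpg for a hypothetical car weighing 3000 lbs (wt = 3),
# with a 95% prediction interval
new_car <- data.frame(wt = 3)
pred <- predict(fit, newdata = new_car, interval = "prediction")
print(pred)
```

On the standard mtcars data the fitted slope is about -5.3, meaning each additional 1000 lbs of weight is associated with roughly 5.3 fewer miles per gallon.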