Simple Linear Regression

2025-11-13

Simple Linear Regression: milesPerGallon & horsePower

Using the built-in mtcars dataset in R
Goal: model milesPerGallon (mpg) as a function of horsePower (hp)

What is simple linear regression?

We model a response variable \(Y\) using one predictor \(X\)
The model has the form

\[ Y = \beta_0 + \beta_1 X + \varepsilon, \]

where:

\(\beta_0\): intercept
\(\beta_1\): slope
\(\varepsilon\): random error term
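To make the roles of the three terms concrete, here is a minimal simulation sketch. The "true" intercept (30), slope (-0.07), noise standard deviation (3), and the range of x are arbitrary values chosen for illustration; they are not taken from these slides.

```r
# Simulate data from Y = beta_0 + beta_1 * X + epsilon
# (all numeric values below are illustrative assumptions)
set.seed(1)
n     <- 200
beta0 <- 30                               # intercept
beta1 <- -0.07                            # slope
x     <- runif(n, min = 50, max = 350)    # predictor
eps   <- rnorm(n, mean = 0, sd = 3)       # random error term
y     <- beta0 + beta1 * x + eps          # response

# Fit the model and compare the estimates with the values used above
sim_fit <- lm(y ~ x)
coef(sim_fit)   # estimates should land near 30 and -0.07
```

Because of the error term, the estimates will be close to, but not exactly equal to, the values used to generate the data.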

Example Data: `mtcars` Dataset

32 observations (cars, 1973-74 models) on 11 variables, from the 1974 Motor Trend US magazine
Variables used here:
- mpg: miles per gallon (fuel efficiency)
- hp: gross horsepower

We will model:

\[ \text{mpg} = \beta_0 + \beta_1 \cdot \text{hp} + \varepsilon. \]
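Before modelling, it can help to confirm the size of the dataset and the range of the two variables involved. A short sketch using only base R:

```r
# mtcars ships with base R (package 'datasets'), so nothing needs to be installed
dim(mtcars)                         # 32 rows (cars), 11 columns (variables)
summary(mtcars[, c("mpg", "hp")])   # range and quartiles of the two variables we use
```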

Exploring the data

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Exploring the Relationship (ggplot #1)

We want to see how milesPerGallon (mpg) changes with horsePower (hp)
Next slide: scatter plot of mpg vs hp

Scatter Plot of milesPerGallon vs horsePower

library(ggplot2)  # provides ggplot(), geom_point(), labs()

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(
    title = "Scatter Plot of milesPerGallon vs horsePower",
    x = "horsePower (hp)",
    y = "milesPerGallon (mpg)"
  )

Fitting the Linear Regression Model (R code)

We fit the model

\[ \text{mpg} = \beta_0 + \beta_1 \cdot \text{hp} + \varepsilon \]

in R using lm():

model <- lm(mpg ~ hp, data = mtcars)
coef(summary(model))  # smaller table than full summary()

##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 30.09886054  1.6339210 18.421246 6.642736e-18
## hp          -0.06822828  0.0101193 -6.742389 1.787835e-07
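The estimates in this table can be reproduced by hand, which shows what lm() computes in the one-predictor case. The sketch below uses the standard least-squares formulas \(\hat{\beta}_1 = \mathrm{cov}(x, y) / \mathrm{var}(x)\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\); the names x, y, b0 and b1 are introduced here for illustration only.

```r
x <- mtcars$hp
y <- mtcars$mpg

b1 <- cov(x, y) / var(x)         # slope: covariance over variance of the predictor
b0 <- mean(y) - b1 * mean(x)     # intercept: line passes through (mean(x), mean(y))

c(intercept = b0, slope = b1)

# Compare with the coefficients returned by lm()
all.equal(unname(coef(lm(mpg ~ hp, data = mtcars))), c(b0, b1))   # TRUE
```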

Interpreting the Coefficients

Suppose the fitted model is:

\[ \hat{\text{mpg}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{hp}. \]

\(\hat{\beta}_1\): estimated change in milesPerGallon for each additional unit of horsePower
- From the table above, \(\hat{\beta}_1 \approx -0.068\): for each extra 1 hp, predicted mpg decreases by about 0.068.
\(\hat{\beta}_0\): predicted milesPerGallon when horsePower = 0 (here about 30.1)
- Often not meaningful in practice (no car has 0 hp), but needed for the equation.

We can also make predictions for a car with horsepower x:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x. \]
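As a sketch of how this prediction might be computed in R, predict() can be given a data frame whose column name matches the predictor. The 150 hp car below is hypothetical, chosen only for illustration.

```r
model <- lm(mpg ~ hp, data = mtcars)

new_car <- data.frame(hp = 150)      # hypothetical car, not a row of mtcars
predict(model, newdata = new_car)    # about 19.86 mpg

# Same number from the fitted equation directly
unname(coef(model)[1] + coef(model)[2] * 150)
```

Predictions are most trustworthy inside the range of horsepower seen in the data (52 to 335 hp in mtcars).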

Residuals vs Fitted Values (ggplot #2)

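Residuals are the vertical distances from each point to the regression line (observed mpg minus fitted mpg)
A plot of residuals vs fitted values helps check model assumptions: points should scatter evenly around 0, with no curve or funnel shape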

Residuals vs Fitted Values (Plot)

mtcars$fit   <- fitted(model)
mtcars$resid <- resid(model)

ggplot(mtcars, aes(x = fit, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted",
    x = "Fitted milesPerGallon",
    y = "Residuals"
  )
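Alongside the plot, a few numeric checks on the residuals can be useful. The sketch below relies on two properties that hold by construction for any least-squares fit with an intercept: the residuals sum to zero and are uncorrelated with the fitted values. These checks confirm that the fit was computed correctly; they do not confirm that a straight line is appropriate, which is what the pattern in the plot is for. The names r and f are introduced here for illustration.

```r
model <- lm(mpg ~ hp, data = mtcars)
r <- resid(model)    # observed mpg minus fitted mpg
f <- fitted(model)   # mpg predicted by the line

sum(r)       # essentially 0 (floating-point noise)
cor(f, r)    # essentially 0 as well

# Cars with the largest absolute residuals: the points the line fits worst
head(sort(abs(r), decreasing = TRUE), 3)
```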

Interactive Plot with plotly

Same relationship (mpg vs hp), but interactive
You can hover over points to see exact values

Interactive milesPerGallon vs horsePower (plotly)

library(plotly)  # provides ggplotly(); also attaches ggplot2

p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Interactive milesPerGallon vs horsePower",
    x = "horsePower (hp)",
    y = "milesPerGallon (mpg)"
  )

ggplotly(p)

## `geom_smooth()` using formula = 'y ~ x'

Conclusion

We used simple linear regression to model mpg as a function of hp
Scatter plot shows a negative relationship between horsepower and fuel efficiency
The slope \(\hat{\beta}_1\) tells us how mpg changes when horsepower increases by 1 unit
Residual plot helps check whether the linear model is reasonable
Interactive plotly visualization allows us to explore individual points more easily