- Using the built-in mtcars dataset in R
- Goal: model milesPerGallon (mpg) as a function of horsePower (hp)
2025-11-13
\[ Y = \beta_0 + \beta_1 X + \varepsilon, \]
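where:
- \(Y\) is the response variable
- \(X\) is the predictor
- \(\beta_0\) is the intercept and \(\beta_1\) is the slope
- \(\varepsilon\) is the random error term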
mtcars Dataset
We will model:
\[ \text{mpg} = \beta_0 + \beta_1 \cdot \text{hp} + \varepsilon. \]
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
- How milesPerGallon (mpg) changes with horsePower (hp)
mpg vs hp
library(ggplot2)  # provides ggplot(), geom_point(), labs()
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(
    title = "Scatter Plot of milesPerGallon vs horsePower",
    x = "horsePower (hp)",
    y = "milesPerGallon (mpg)"
  )
We fit the model
\[ \text{mpg} = \beta_0 + \beta_1 \cdot \text{hp} + \varepsilon \]
in R using lm():
model <- lm(mpg ~ hp, data = mtcars)
coef(summary(model))  # smaller table than full summary()
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 30.09886054  1.6339210 18.421246 6.642736e-18
## hp          -0.06822828  0.0101193 -6.742389 1.787835e-07
Suppose the fitted model is:
\[ \hat{\text{mpg}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{hp}. \]
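As a worked instance, plugging in the estimates from the coefficient table (rounded here for readability) gives roughly:

\[ \hat{\text{mpg}} \approx 30.10 - 0.068 \cdot \text{hp}. \]

Read this way, each additional unit of horsepower is associated with a predicted decrease of about 0.068 mpg, so a car with 100 more horsepower is predicted to get about 6.8 fewer miles per gallon. This is an association in the data, not necessarily a causal effect.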
We can also make predictions for a car with horsepower x:
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x. \]
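The prediction formula above can be sketched in R as follows. This is a minimal illustration, not part of the original analysis: the value hp = 150 and the names new_car, pred, and by_hand are assumptions chosen for the example.

```r
# Refit the model so this block is self-contained
model <- lm(mpg ~ hp, data = mtcars)

# Hypothetical new car with 150 horsepower (illustrative value)
new_car <- data.frame(hp = 150)

# Point prediction from the fitted line via predict()
pred <- unname(predict(model, newdata = new_car))

# The same prediction computed by hand: beta0_hat + beta1_hat * x
b <- coef(model)
by_hand <- b[["(Intercept)"]] + b[["hp"]] * 150

print(pred)
print(by_hand)
```

Both values should agree, at about 19.86 mpg given the estimates in the coefficient table. Using predict() with newdata scales more easily to several cars at once, since new_car can hold any number of rows.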
mtcars$fit <- fitted(model)
mtcars$resid <- resid(model)
ggplot(mtcars, aes(x = fit, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted",
    x = "Fitted milesPerGallon",
    y = "Residuals"
  )
The same plot (mpg vs hp), but interactive
library(ggplot2)
library(plotly)  # provides ggplotly()
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Interactive milesPerGallon vs horsePower",
    x = "horsePower (hp)",
    y = "milesPerGallon (mpg)"
  )
ggplotly(p)
## `geom_smooth()` using formula = 'y ~ x'