- Regression models the relationship between one or more predictors and an outcome.
- Example: Predict fuel efficiency (mpg) from engine size (liters) and optionally vehicle weight.
2025-09-16
We assume a linear model: \[ Y = \beta_0 + \beta_1 X + \epsilon, \] where \(Y\) is mpg, \(X\) is engine size, \(\beta_0\) is the intercept, \(\beta_1\) the slope, and \(\epsilon\) the random error.
library(ggplot2)
library(plotly)

set.seed(123)
n <- 50
engine_size <- runif(n, 1.0, 5.0)
weight <- runif(n, 2000, 4000)
mpg <- 40 - 3*engine_size - 0.002*weight + rnorm(n, 0, 2)
df <- data.frame(engine_size, weight, mpg)

# Fit models
fit1 <- lm(mpg ~ engine_size, data = df)
fit2 <- lm(mpg ~ engine_size + weight, data = df)
ggplot(df, aes(engine_size, mpg)) +
  geom_point() +
  labs(title = "Engine Size vs MPG", x = "Engine Size (L)", y = "MPG")
ggplot(df, aes(engine_size, mpg)) +
  geom_point(alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Linear Regression Fit", x = "Engine Size (L)", y = "MPG")
## `geom_smooth()` using formula = 'y ~ x'
coef(fit1)
## (Intercept) engine_size
##   35.031109   -3.209846
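The coefficients returned by `coef(fit1)` can be cross-checked against the closed-form OLS formulas for a single predictor, \(\hat{\beta}_1 = \operatorname{cov}(X, Y) / \operatorname{var}(X)\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). The following is a minimal sketch, assuming the data are regenerated with the same seed and calls as the setup code; the names `slope_hat`, `intercept_hat`, and `manual` are introduced here for illustration only.

```r
# Recreate the simulated data (same seed and call order as the setup code)
set.seed(123)
n <- 50
engine_size <- runif(n, 1.0, 5.0)
weight <- runif(n, 2000, 4000)
mpg <- 40 - 3 * engine_size - 0.002 * weight + rnorm(n, 0, 2)
df <- data.frame(engine_size, weight, mpg)

# Closed-form OLS estimates for simple linear regression
slope_hat <- cov(df$engine_size, df$mpg) / var(df$engine_size)
intercept_hat <- mean(df$mpg) - slope_hat * mean(df$engine_size)

# Name the values the same way lm() does so they can be compared directly
manual <- c("(Intercept)" = intercept_hat, engine_size = slope_hat)
fit1 <- lm(mpg ~ engine_size, data = df)

# TRUE when the hand-computed values agree with lm() up to numerical tolerance
all.equal(manual, coef(fit1))
```

`lm()` solves the same least-squares problem through a QR decomposition rather than these formulas, so the two results should agree only up to floating-point tolerance, which is why `all.equal()` is used instead of `==`.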
summary(fit1)$r.squared
## [1] 0.7647601
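To make the meaning of \(R^2\) concrete, it can be recomputed as one minus the ratio of the residual sum of squares to the total sum of squares. This is a sketch under the assumption that the data are regenerated as in the setup code; `rss`, `tss`, and `r2_manual` are illustrative names, not part of the original slides.

```r
# Recreate the simulated data (same seed and call order as the setup code)
set.seed(123)
n <- 50
engine_size <- runif(n, 1.0, 5.0)
weight <- runif(n, 2000, 4000)
mpg <- 40 - 3 * engine_size - 0.002 * weight + rnorm(n, 0, 2)
df <- data.frame(engine_size, weight, mpg)

fit1 <- lm(mpg ~ engine_size, data = df)

# Residual sum of squares: variation left unexplained by the fitted line
rss <- sum(residuals(fit1)^2)
# Total sum of squares: variation of mpg around its mean
tss <- sum((df$mpg - mean(df$mpg))^2)

r2_manual <- 1 - rss / tss

# Agrees with the value reported by summary()
all.equal(r2_manual, summary(fit1)$r.squared)

# With one predictor and an intercept, R^2 equals the squared correlation
all.equal(r2_manual, cor(df$engine_size, df$mpg)^2)
```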
plot_ly(
  df,
  x = ~engine_size, y = ~weight, z = ~mpg,
  type = "scatter3d", mode = "markers"
)
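The 3D scatter shows mpg against both engine size and weight, which is the relationship the two-predictor model `fit2` from the setup code describes. As a rough sketch of how the two fits might be compared (assuming the same simulated data; the comparison itself is an addition for illustration), one can look at the coefficients, the adjusted \(R^2\), and a nested-model F-test:

```r
# Recreate the simulated data (same seed and call order as the setup code)
set.seed(123)
n <- 50
engine_size <- runif(n, 1.0, 5.0)
weight <- runif(n, 2000, 4000)
mpg <- 40 - 3 * engine_size - 0.002 * weight + rnorm(n, 0, 2)
df <- data.frame(engine_size, weight, mpg)

fit1 <- lm(mpg ~ engine_size, data = df)           # one predictor
fit2 <- lm(mpg ~ engine_size + weight, data = df)  # two predictors

# Estimated intercept and the two slopes
coef(fit2)

# Adjusted R^2 penalizes the extra predictor, so it is a fairer comparison
c(simple   = summary(fit1)$adj.r.squared,
  multiple = summary(fit2)$adj.r.squared)

# F-test for whether adding weight significantly reduces residual variance
anova(fit1, fit2)
```

Because the data were simulated with a weight effect of -0.002 mpg per unit of weight, the `weight` coefficient in `fit2` would be expected to land near that value, up to sampling noise.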
Let \(\hat{\beta}_0\) and \(\hat{\beta}_1\) be the OLS estimates. The fitted line is: \[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X. \] For this dataset (rounded), \[ \hat{Y} \approx 35.03 - 3.21\,X. \]
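The fitted line can be used to predict mpg for engine sizes that are not in the data. The sketch below assumes the same simulated data as the setup code; the engine sizes in `new_cars` are hypothetical values chosen for illustration.

```r
# Recreate the simulated data (same seed and call order as the setup code)
set.seed(123)
n <- 50
engine_size <- runif(n, 1.0, 5.0)
weight <- runif(n, 2000, 4000)
mpg <- 40 - 3 * engine_size - 0.002 * weight + rnorm(n, 0, 2)
df <- data.frame(engine_size, weight, mpg)

fit1 <- lm(mpg ~ engine_size, data = df)

# Hypothetical engine sizes (liters), all inside the simulated 1.0-5.0 range
new_cars <- data.frame(engine_size = c(1.5, 2.5, 4.0))

# Predictions from the fitted model
pred <- predict(fit1, newdata = new_cars)

# The same predictions computed directly from the fitted-line formula
manual <- coef(fit1)[["(Intercept)"]] +
  coef(fit1)[["engine_size"]] * new_cars$engine_size
all.equal(unname(pred), manual)

# Point predictions with 95% confidence intervals for the mean response
predict(fit1, newdata = new_cars, interval = "confidence")
```

Predictions are most trustworthy inside the range of the observed predictor; extrapolating the line well beyond 1.0 to 5.0 liters relies on the linear form holding there too.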
# Fit and plot
fit1 <- lm(mpg ~ engine_size, data = df)
ggplot(df, aes(engine_size, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# 3D plot
library(plotly)
plot_ly(df, x = ~engine_size, y = ~weight, z = ~mpg, type = "scatter3d", mode = "markers")