- Quick intro to Simple Linear Regression
- Data example: car weight vs. fuel efficiency
- Fit the model, test the slope, check assumptions
- Plots and key takeaways
We use R's built-in `mtcars` dataset (32 cars).

We model

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

where:

- \(Y\): response (mpg)
- \(X\): predictor (weight, in 1000 lbs)
- \(\varepsilon\): random error
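A quick simulation shows what the model equation means. Pick values for \(\beta_0\), \(\beta_1\) and the error spread, generate data from \(Y = \beta_0 + \beta_1 X + \varepsilon\), and check that `lm()` recovers the parameters. This is a minimal sketch: the seed, sample size, \(\beta_0 = 37\), \(\beta_1 = -5\) and \(\sigma = 3\) are hypothetical choices for illustration, not estimates from `mtcars`.

```r
# Simulate from Y = beta0 + beta1 * X + epsilon using hypothetical parameters,
# then check that least squares approximately recovers them.
set.seed(42)                              # arbitrary seed (assumption)
n     <- 200                              # hypothetical sample size
beta0 <- 37                               # hypothetical intercept
beta1 <- -5                               # hypothetical slope
x     <- runif(n, min = 1.5, max = 5.5)   # illustrative weights, in 1000 lbs
eps   <- rnorm(n, mean = 0, sd = 3)       # errors assumed normal with sd 3
y     <- beta0 + beta1 * x + eps          # responses generated by the model

sim_fit <- lm(y ~ x)
coef(sim_fit)   # estimates should land close to 37 and -5
```

With more data or a smaller \(\sigma\), the estimates concentrate more tightly around the true values.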
    library(tidyverse)
    library(broom)
    library(plotly)

    data(mtcars)
    df <- mtcars %>% select(mpg, wt, hp)
    head(df)
    ##                    mpg    wt  hp
    ## Mazda RX4         21.0 2.620 110
    ## Mazda RX4 Wag     21.0 2.875 110
    ## Datsun 710        22.8 2.320  93
    ## Hornet 4 Drive    21.4 3.215 110
    ## Hornet Sportabout 18.7 3.440 175
    ## Valiant           18.1 3.460 105
    # Fit mpg = beta0 + beta1 * wt + error by ordinary least squares
    fit <- lm(mpg ~ wt, data = df)
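For simple linear regression the least-squares estimates have a closed form, \(\hat\beta_1 = \sum (x_i - \bar x)(y_i - \bar y) / \sum (x_i - \bar x)^2\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). The sketch below computes them by hand in base R and compares them with `coef()` from `lm()`; the `manual_*` names are just illustrative labels.

```r
# Closed-form OLS estimates for mpg ~ wt, checked against lm()
data(mtcars)
x <- mtcars$wt    # predictor: weight (1000 lbs)
y <- mtcars$mpg   # response: miles per gallon

beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

fit <- lm(mpg ~ wt, data = mtcars)
c(manual_intercept = beta0_hat, manual_slope = beta1_hat)
coef(fit)   # should match the manual values up to floating-point error
```

`lm()` actually solves the problem via a QR decomposition, which generalizes to many predictors, but for one predictor it gives the same numbers as the formulas above.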
    # Scatter plot with the fitted line and its 95% confidence band
    ggplot(df, aes(x = wt, y = mpg)) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
      labs(title = "Fuel Efficiency vs Weight",
           x = "Weight (1000 lbs)", y = "MPG")
    ##      Estimate    Std. Error       t value      Pr(>|t|)
    ## -5.344472e+00  5.591010e-01 -9.559044e+00  1.293959e-10
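One way to pull out the slope row shown above is to index the coefficient matrix returned by `summary()`; this is a sketch, and the exact call used to produce the output above is not shown in the slides. The block also reads off \(R^2\), the share of variance in mpg explained by weight. If broom is loaded, `broom::tidy(fit)` returns the same estimates as a data frame.

```r
# Extract the coefficient table and R^2 from a fitted lm object
fit <- lm(mpg ~ wt, data = mtcars)

coef_table <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
slope_row  <- coef_table["wt", ]   # the row for the weight coefficient
slope_row

r2 <- summary(fit)$r.squared       # proportion of variance in mpg explained by wt
r2
```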
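Test whether the slope differs from 0:

\[ H_0: \beta_1 = 0, \quad H_a: \beta_1 \neq 0 \]

using the t-statistic

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} \]

Under \(H_0\), \(t\) follows a \(t\)-distribution with \(n - 2\) degrees of freedom (30 for the 32 cars in `mtcars`). Here \(t = -5.344 / 0.559 \approx -9.56\) and \(p \approx 1.3 \times 10^{-10}\), so we reject \(H_0\): on average, each additional 1000 lbs of weight is associated with about 5.3 fewer miles per gallon.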
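The test statistic and two-sided p-value can be reproduced by hand from the estimate and its standard error, using \(n - 2\) residual degrees of freedom. This sketch uses base R only; `confint()` adds a 95% confidence interval for the slope, which excludes 0 exactly when the two-sided test rejects at the 5% level.

```r
# Manual t-test for the slope of mpg ~ wt
fit <- lm(mpg ~ wt, data = mtcars)
est <- coef(summary(fit))["wt", "Estimate"]
se  <- coef(summary(fit))["wt", "Std. Error"]

t_stat <- est / se                  # t = beta1_hat / SE(beta1_hat)
df_res <- df.residual(fit)          # n - 2 residual degrees of freedom
p_val  <- 2 * pt(abs(t_stat), df = df_res, lower.tail = FALSE)  # two-sided p-value

c(t = t_stat, df = df_res, p = p_val)

ci <- confint(fit, "wt", level = 0.95)   # 95% confidence interval for beta1
ci
```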
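    # Residuals vs fitted values: look for curvature (non-linearity)
    # or a funnel shape (non-constant variance)
    aug <- augment(fit)
    ggplot(aug, aes(.fitted, .resid)) +
      geom_hline(yintercept = 0, linetype = "dashed") +
      geom_point() +
      labs(title = "Residuals vs Fitted", x = "Fitted", y = "Residuals")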
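The residual plot checks linearity and constant variance; normality of the errors can be checked as well. A minimal base-R sketch: a Shapiro-Wilk test on the residuals (a small p-value suggests a departure from normality) plus the four standard `plot.lm` diagnostic panels. Treat the test as a rough guide, since with only 32 observations it has limited power.

```r
# Additional assumption checks for mpg ~ wt using base R
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

sw <- shapiro.test(res)   # Shapiro-Wilk normality test on the residuals
sw$p.value

sum(res)                  # OLS residuals with an intercept sum to ~0

# Residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
old <- par(mfrow = c(2, 2))
plot(fit)
par(old)                  # restore the previous plotting layout
```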
    # Interactive 3D view of mpg against weight and horsepower
    plot_ly(df, x = ~wt, y = ~hp, z = ~mpg,
            type = "scatter3d", mode = "markers") %>%
      layout(scene = list(
        xaxis = list(title = "Weight"),
        yaxis = list(title = "Horsepower"),
        zaxis = list(title = "MPG")
      ))