- What simple linear regression is
- Model assumptions
- How we find the line (OLS)
- Visualizing fit & residuals
- Testing if the slope matters
- Key takeaways
We assume a straight-line relationship between a predictor \(x\) and a response \(y\): \[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
Where:
- \(\beta_0\): intercept (expected value of \(y\) when \(x = 0\))
- \(\beta_1\): slope (expected change in \(y\) per one-unit increase in \(x\))
- \(\varepsilon_i\): random error (variation in \(y\) not explained by \(x\))
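The best-fit line is the one that minimizes the sum of squared differences between the actual and predicted values: \[ \min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \]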
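For a single predictor, this minimization has a well-known closed-form solution. The following is a minimal sketch in R, assuming the built-in `mtcars` data with weight as \(x\) and MPG as \(y\); the names `b0_hat` and `b1_hat` are illustrative and not part of the original slides.

```r
# Closed-form OLS estimates for simple linear regression,
# using the built-in mtcars data (x = weight, y = miles per gallon).
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by the sum of squared x-deviations.
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: chosen so the line passes through the point of means.
b0_hat <- mean(y) - b1_hat * mean(x)

# Compare the hand-computed values with what lm() reports.
rbind(by_hand = c(b0_hat, b1_hat),
      lm      = coef(lm(mpg ~ wt, data = mtcars)))
```

The two rows should agree, since `lm()` solves the same least-squares problem.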
For linear regression to be valid, we assume:
If these assumptions do not hold, the estimates, standard errors, and p-values may be unreliable.
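library(tidyverse)  # loads dplyr (for %>% and transmute) and ggplot2
library(ggplot2)

# Keep only the response (mpg) and the predictor (wt)
data(mtcars)
df <- mtcars %>% transmute(mpg = mpg, wt = wt)

# Fit the simple linear regression of MPG on weight
fit <- lm(mpg ~ wt, data = df)
summary(fit)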
##
## Call:
## lm(formula = mpg ~ wt, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
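The \(R^2\) value is the proportion of the variation in \(y\) that is explained by \(x\): \[ R^2 = 1 - \frac{SSE}{SST} \] where \(SSE\) is the sum of squared residuals and \(SST\) is the total sum of squares of \(y\) about its mean.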
Here, R² ≈ 0.75: about 75% of the variation in MPG is explained by car weight.
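To connect the formula to the reported number, R² can be recomputed from its two ingredients. This is a small sketch that refits the same model on `mtcars`; the names `sse`, `sst`, and `r2` are illustrative.

```r
# Recompute R-squared from its definition: R^2 = 1 - SSE / SST.
fit <- lm(mpg ~ wt, data = mtcars)

# SSE: sum of squared residuals (variation left unexplained by the line).
sse <- sum(residuals(fit)^2)

# SST: total sum of squares of the response about its mean.
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

r2 <- 1 - sse / sst
r2                      # by hand
summary(fit)$r.squared  # as reported by summary()
```

Both values should match the "Multiple R-squared" line of the summary output.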
We test whether \(\beta_1 = 0\): \[ H_0: \beta_1 = 0 \quad \text{vs.} \quad H_a: \beta_1 \neq 0 \]
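If the p-value is below 0.05, the slope is statistically significant at the 5% level. Here the p-value for wt is 1.29e-10, so a slope this far from zero would be very unlikely if weight and MPG were unrelated. This is evidence of a linear association between weight and MPG, not proof that weight causes the difference.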
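The t value and p-value in the coefficient table can be reproduced by hand, which makes the test less of a black box. The sketch below assumes the same `mtcars` model; the variable names are illustrative.

```r
# Reproduce the slope t-test from the coefficient table.
fit   <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(summary(fit))

b1    <- coefs["wt", "Estimate"]
se_b1 <- coefs["wt", "Std. Error"]

# Test statistic under H0: beta_1 = 0.
t_stat <- b1 / se_b1

# Residual degrees of freedom: n - 2 for simple linear regression.
df_resid <- df.residual(fit)

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

c(t = t_stat, df = df_resid, p = p_value)

# A 95% confidence interval for the slope gives the same conclusion:
# the slope is significant at the 5% level when the interval excludes 0.
confint(fit, "wt", level = 0.95)
```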
These quick checks help confirm whether our model is adequate.
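The original plots are not reproduced in this text, so the following is only one plausible set of checks, sketched in base R: the fitted line over the data, residuals against fitted values, and a normal Q-Q plot of the residuals. The choice of these three panels is an assumption, not necessarily what the slides showed.

```r
# Visual checks for the mtcars model (an assumed, commonly used set).
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

op <- par(mfrow = c(1, 3))  # three panels side by side

# 1. Data with the fitted line: does a straight line look reasonable?
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "MPG",
     main = "Fitted line")
abline(fit)

# 2. Residuals vs fitted: look for curvature or a funnel shape.
plot(fitted(fit), res, xlab = "Fitted MPG", ylab = "Residual",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# 3. Normal Q-Q plot: points near the line suggest roughly normal errors.
qqnorm(res)
qqline(res)

par(op)  # restore the previous plotting layout
```

Base R's `plot(fit)` produces a similar set of diagnostic plots in one call.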