- Topic: Simple Linear Regression
- What you’ll see: intuition, math, examples, plots (ggplot + plotly), R code
Simple linear regression models a response variable \(Y\) as a linear function of a single predictor \(X\) plus random error:
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \varepsilon_i \sim N(0,\sigma^2) \]
(assumptions shown on the next slide)
The ordinary least squares estimates minimize the residual sum of squares:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. \]
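Before touching real data, a minimal sketch that simulates from the model and applies these closed-form estimates; the true values \(\beta_0 = 2\), \(\beta_1 = 0.5\), \(\sigma = 1\) are illustrative choices, not from the slides:

# Simulate n = 50 points from Y = 2 + 0.5*X + eps, eps ~ N(0, 1)
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)

# Closed-form OLS estimates from the formulas above
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)   # should be near the true (2, 0.5)
coef(lm(y ~ x))       # lm() returns the same estimates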
We’ll use the built-in `mtcars` dataset to predict `mpg` from `wt` (weight).
# Fit simple linear regression: mpg ~ wt
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
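In `mtcars`, `wt` is measured in units of 1000 lb, so the fitted slope says each extra 1000 lb of weight is associated with roughly 5.3 mpg lower fuel economy. A few base-R accessors for reusing pieces of the fit (nothing beyond what summary() already computed):

coef(fit)               # beta0-hat and beta1-hat
summary(fit)$sigma      # residual standard error, sigma-hat
summary(fit)$r.squared  # proportion of variance explained
head(residuals(fit))    # first few residuals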
Under the standard assumptions, a \(100(1-\alpha)\%\) confidence interval for the slope is
\[ \hat{\beta}_1 \pm t_{n-2,\,1-\alpha/2} \cdot SE(\hat{\beta}_1), \]
where \(SE(\hat{\beta}_1)=\sqrt{\widehat{\sigma}^2/\sum_{i=1}^n (x_i-\bar{x})^2}\) and \(\widehat{\sigma}^2 = RSS/(n-2)\).
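As a sanity check, a short sketch that computes this interval by hand (a 95% level is assumed) and compares it with confint():

alpha <- 0.05
se_b1 <- summary(fit)$coefficients["wt", "Std. Error"]
coef(fit)["wt"] + c(-1, 1) * qt(1 - alpha / 2, df = df.residual(fit)) * se_b1
confint(fit, "wt", level = 0.95)  # same interval from base R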
new <- data.frame(wt = c(2.0, 3.0))
predict(fit, new, interval='confidence') # mean prediction
##        fit      lwr      upr
## 1 26.59618 24.82389 28.36848
## 2 21.25171 20.12444 22.37899
predict(fit, new, interval='prediction') # individual prediction
##        fit      lwr      upr
## 1 26.59618 20.12811 33.06425
## 2 21.25171 14.92987 27.57355
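The prediction intervals are wider than the confidence intervals because they add the variance \(\sigma^2\) of a new observation to the uncertainty in the estimated mean. A hand computation at `wt = 3.0`, using the usual standard error \(\widehat{\sigma}\sqrt{1 + 1/n + (x_0-\bar{x})^2/\sum_{i=1}^n (x_i-\bar{x})^2}\) (drop the leading 1 inside the square root to recover the confidence interval):

x  <- mtcars$wt
n  <- nrow(mtcars)
x0 <- 3.0
se_pred <- summary(fit)$sigma *
  sqrt(1 + 1/n + (x0 - mean(x))^2 / sum((x - mean(x))^2))
yhat <- predict(fit, data.frame(wt = x0))
yhat + c(-1, 1) * qt(0.975, df = n - 2) * se_pred  # matches row 2 above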