What is Linear Regression?

Linear regression models a quantitative response variable as a linear function of one or more predictors, plus a random error term.

We use the mtcars dataset:

  • Response: mpg (miles per gallon)
  • Predictors: wt (weight, in 1000 lbs), hp (gross horsepower)

The Regression Model (Math)

The multiple linear regression model:

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i \]

For our example:

\[ \text{mpg}_i = \beta_0 + \beta_1 \, \text{wt}_i + \beta_2 \, \text{hp}_i + \varepsilon_i \]

Estimation and Hypothesis Testing (Math)

Least squares estimator:

\[ \hat{\beta} = (X^T X)^{-1} X^T y \]
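
To make the formula concrete, here is a minimal sketch that evaluates it directly on mtcars and compares the result with lm(). The names X, y, beta_hat, and fit are illustrative, not from the original slides. The sketch solves the normal equations instead of forming the inverse explicitly, which gives the same estimate and is numerically safer.

```r
# Design matrix X: a column of ones (intercept) plus wt and hp
X <- model.matrix(mpg ~ wt + hp, data = mtcars)
y <- mtcars$mpg

# Solve (X'X) beta = X'y, equivalent to beta = (X'X)^{-1} X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Compare with the coefficients that lm() reports
fit <- lm(mpg ~ wt + hp, data = mtcars)
cbind(by_hand = drop(beta_hat), lm = coef(fit))
```

The two columns should agree up to floating-point error. lm() itself uses a QR decomposition and does not solve the normal equations directly.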

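Test statistic for \(H_0: \beta_j = 0\). Under \(H_0\) and the usual normal-error assumptions, it follows a t distribution with \(n - p - 1\) degrees of freedom, where \(p\) is the number of predictors: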

\[ t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \]
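
As a rough sketch of where these numbers come from, the standard errors, t statistics, and two-sided p-values can be rebuilt from the fitted model. The variable names are illustrative. The sketch assumes the usual estimate of the error variance, the residual sum of squares divided by the residual degrees of freedom.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)

n <- nrow(X)           # number of observations
k <- ncol(X)           # estimated coefficients, including the intercept
df_resid <- n - k      # residual degrees of freedom

# Estimated error variance and the standard error of each coefficient
sigma2_hat <- sum(residuals(fit)^2) / df_resid
se <- sqrt(diag(sigma2_hat * solve(crossprod(X))))

# t statistic and two-sided p-value for H0: beta_j = 0
t_stat <- coef(fit) / se
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

cbind(estimate = coef(fit), std.error = se, statistic = t_stat, p.value = p_value)
```

The resulting matrix should match the coefficient table from summary(fit).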

MPG vs Weight (ggplot #1)

Residuals vs Fitted (ggplot #2)
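
The figure itself is not reproduced here. One way such a plot might be drawn is sketched below, assuming it shows the residuals of the mpg ~ wt + hp model against its fitted values. The names diag_df and p_resid are illustrative.

```r
library(ggplot2)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# Collect fitted values and residuals in one data frame for plotting
diag_df <- data.frame(fitted = fitted(fit), resid = residuals(fit))

# Residuals should scatter around the dashed zero line with no clear pattern
p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted values", y = "Residuals")
p_resid
```

Curvature or a funnel shape in this plot would suggest that the linearity or constant-variance assumption is questionable.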

Regression Output

## # A tibble: 3 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  37.2      1.60        23.3  2.57e-20
## 2 wt           -3.88     0.633       -6.13 1.12e- 6
## 3 hp           -0.0318   0.00903     -3.52 1.45e- 3
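
Reading the table: holding hp fixed, each additional 1000 lbs of weight is associated with about 3.88 fewer miles per gallon. Holding wt fixed, each additional unit of horsepower is associated with about 0.0318 fewer. As a small worked example, the sketch below predicts mpg for a hypothetical car (wt = 3, hp = 110; these values are chosen for illustration and are not from the slides) and checks the result against the rounded estimates above.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical car: 3,000 lbs and 110 horsepower
new_car <- data.frame(wt = 3, hp = 110)
pred_mpg <- predict(fit, newdata = new_car)

# Same calculation by hand, using the rounded estimates from the table
by_hand <- 37.2 - 3.88 * 3 - 0.0318 * 110

c(predict = unname(pred_mpg), by_hand = by_hand)
```

Both values come out near 22 mpg. The small gap between them is due to rounding in the displayed estimates.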

3D Plot (plotly)

Example R Code (Displayed)

# Fit a multiple regression model
fit_multiple <- lm(mpg ~ wt + hp, data = mtcars)

# View summary statistics
summary(fit_multiple)

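# Create a scatter plot of mpg against wt with a fitted least-squares line
# (ggplot2 must be loaded first)
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm")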

Key Takeaways

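  • Linear regression estimates the coefficients by least squares, minimizing the sum of squared residuals.
  • Each coefficient measures the expected change in the response per one-unit increase in its predictor, holding the other predictors fixed.
  • t-tests and their p-values assess whether each coefficient differs significantly from zero.
  • Plots such as the scatter plot and residuals vs fitted help check model assumptions and interpret the fit.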