What Is Multiple Linear Regression?

Multiple linear regression models a response variable \(Y\) as a linear function of two or more predictors \(X_1, X_2, \dots, X_k\).

It extends simple linear regression (one predictor) by letting several variables jointly explain the outcome.

  • We will use the built-in mtcars dataset.
  • Response: miles per gallon (mpg)
  • Predictors: weight (wt, in 1000 lb) and gross horsepower (hp)

The goal: understand how fuel efficiency depends on a car’s weight and power simultaneously.
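
Before fitting anything, it can help to look at the three columns involved. The following is a minimal sketch; the object name `cars3` is ours and does not appear in the original slides.

```r
# mtcars ships with base R (datasets package), so no import is needed.
cars3 <- mtcars[, c("mpg", "wt", "hp")]

head(cars3)      # first few cars
summary(cars3)   # range and quartiles of each variable
cor(cars3)       # pairwise correlations between response and predictors
```

The correlation matrix also shows how strongly wt and hp are related to each other, which matters when interpreting their coefficients jointly.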

The Model

For \(k\) predictors, the population model is

\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i \]

where the errors are assumed independent with

\[ \varepsilon_i \sim \mathcal{N}(0, \sigma^2). \]

In our two-predictor case:

\[ \text{mpg}_i = \beta_0 + \beta_1\,\text{wt}_i + \beta_2\,\text{hp}_i + \varepsilon_i. \]

Estimating the Coefficients

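The coefficients are estimated by least squares, minimizing the residual sum of squares. In matrix form, with design matrix \(\mathbf{X}\) (a column of ones for the intercept plus one column per predictor) and response vector \(\mathbf{y}\), and assuming \(\mathbf{X}\) has full column rank so that \(\mathbf{X}^{\top}\mathbf{X}\) is invertible: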

\[ \hat{\boldsymbol{\beta}} = \left( \mathbf{X}^{\top}\mathbf{X} \right)^{-1} \mathbf{X}^{\top}\mathbf{y}. \]
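
The closed-form estimator can be checked numerically against lm(). This is a sketch under the assumption that the design matrix is built with model.matrix(); the names `X`, `y`, `beta_hat`, and `fit` are ours.

```r
# Design matrix: a column of ones (intercept) plus wt and hp.
X <- model.matrix(~ wt + hp, data = mtcars)
y <- mtcars$mpg

# Solve (X'X) beta = X'y directly rather than forming the inverse explicitly.
beta_hat <- solve(crossprod(X), crossprod(X, y))
beta_hat

# lm() reaches the same estimates via a QR decomposition.
fit <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(as.numeric(beta_hat), unname(coef(fit)))
```

Solving the linear system with solve(A, b) is generally preferred to computing the inverse and multiplying, since it is cheaper and numerically more stable.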

Each \(\hat{\beta}_j\) estimates the expected change in \(Y\) for a one-unit increase in \(X_j\), holding the other predictors fixed:

\[ \hat{\beta}_j = \frac{\partial \,\widehat{\mathbb{E}}[Y]}{\partial X_j}. \]
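
The "holding the other predictors fixed" reading can be illustrated by predicting for two hypothetical cars that differ only in weight. The values 3, 4, and 150 below are arbitrary choices for illustration, and the names `two_cars` and `pred` are ours.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cars: same horsepower, weights one unit apart.
two_cars <- data.frame(wt = c(3, 4), hp = c(150, 150))
pred <- predict(fit, newdata = two_cars)

# The gap between the two predictions is the wt coefficient.
diff(pred)
coef(fit)["wt"]
```

Because the model is linear, the same gap appears for any pair of weights one unit apart and any fixed horsepower.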

Fitting the Model in R

model <- lm(mpg ~ wt + hp, data = mtcars)
coef(model)
## (Intercept)          wt          hp 
## 37.22727012 -3.87783074 -0.03177295

The fitted regression equation is

\[ \widehat{\text{mpg}} = \hat{\beta}_0 + \hat{\beta}_1\,\text{wt} + \hat{\beta}_2\,\text{hp} \approx 37.23 - 3.88\,\text{wt} - 0.032\,\text{hp}. \]

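Both weight and horsepower carry negative coefficients: heavier and more powerful cars tend to consume more fuel, lowering mpg. With wt measured in units of 1000 lb, each additional 1000 lb is associated with about 3.9 fewer mpg at fixed horsepower, and each additional horsepower with about 0.03 fewer mpg at fixed weight.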

Exploring the Data (ggplot #1)

Heavier cars get fewer miles per gallon, and the darkest (highest-power) points sit lowest.
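
The plot itself is not reproduced in this text. A sketch of how such a figure might be drawn with ggplot2 follows; the grey-to-black color scale is an assumption, chosen so that darker points mean higher horsepower as the description says, and the object name `p_explore` is ours.

```r
library(ggplot2)

# Weight on the x-axis, mpg on the y-axis, horsepower mapped to color.
p_explore <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point(size = 3) +
  # Assumed scale: light grey for low hp, black for high hp.
  scale_color_gradient(low = "grey80", high = "black") +
  labs(x = "Weight (1000 lb)", y = "Miles per gallon", color = "Horsepower")

p_explore
```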

Checking the Fit (ggplot #2)

Residuals scattered randomly around zero suggest the linear model is reasonable.
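
The diagnostic plot is likewise not reproduced here. One common way to build a residuals-versus-fitted plot is sketched below; the names `diag_df` and `p_resid` are ours, and the original slide may have used different styling.

```r
library(ggplot2)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# One row per car: fitted value and residual (observed minus fitted).
diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  # Reference line at zero: residuals should scatter evenly around it.
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted mpg", y = "Residual")

p_resid
```

A visible curve or a funnel shape in this plot would point to a missing nonlinear term or non-constant error variance.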

The Regression Plane (3D plotly)
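
The interactive figure does not survive in this text. A sketch of one way to draw the fitted plane over the data with plotly follows; the 30-point grid resolution, the opacity, and the names `wt_grid`, `hp_grid`, `grid`, `z`, and `fig` are all assumptions of ours.

```r
library(plotly)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# Evaluate the fitted model on a regular grid of (wt, hp) values.
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 30)
hp_grid <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 30)
grid <- expand.grid(wt = wt_grid, hp = hp_grid)   # wt varies fastest
grid$mpg_hat <- predict(fit, newdata = grid)

# A surface trace expects z[i, j] to match y[i] (hp) and x[j] (wt),
# so fill the matrix row by row.
z <- matrix(grid$mpg_hat, nrow = length(hp_grid), ncol = length(wt_grid),
            byrow = TRUE)

fig <- plot_ly() %>%
  add_trace(data = mtcars, x = ~wt, y = ~hp, z = ~mpg,
            type = "scatter3d", mode = "markers", name = "cars") %>%
  add_surface(x = wt_grid, y = hp_grid, z = z,
              opacity = 0.6, showscale = FALSE, name = "fitted plane") %>%
  layout(scene = list(xaxis = list(title = "wt"),
                      yaxis = list(title = "hp"),
                      zaxis = list(title = "mpg")))

fig
```

Points above the surface are cars with positive residuals, and points below it have negative residuals.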

Summary

  • Multiple linear regression explains a response using several predictors at once.
  • Coefficients are found via least squares: \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}\).
  • For mtcars, higher weight and higher horsepower are each associated with lower predicted mpg, holding the other fixed.
  • The fitted model is a plane in 3D space, visualized above with plotly.
  • Residual diagnostics support the linear assumption.

Thanks for watching!