Multiple regression is a widely used method due to its relatively straightforward nature and its power in fitting linear relationships. The concepts explored in a previous post on simple regression carry over to multiple regression, as simple regression is a special case of the linear regression model (Winner, 2009, p. 142).
In this example, multiple regression with two predictor variables will be explored using the delivery dataset from the robustbase package. The calculations behind multiple regression will also be worked through with matrix algebra to verify the results and gain a deeper understanding of how the model is fit. Models with three or more predictor variables are much more tedious to calculate by hand and are best left to a computer; however, the case of two predictor variables is relatively straightforward and provides a view into how a multiple regression model is fit.
A regression model with two predictor variables \(x_1\) and \(x_2\) has the form:
\[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i \]
The formulation of the model will be examined more closely later in the post.
First, load the packages that will be needed along with the delivery dataset. The gridExtra package is handy for combining plots created by ggplot2.
library(ggplot2)
library(gridExtra)
library(robustbase)
data(delivery)
According to the dataset’s documentation, “The aim is to explain the time required to service a vending machine (Y) using the number of products stocked (X1) and the distance walked by the route driver (X2).”
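Before plotting, a quick look at the data is helpful (a minimal check, assuming the packages and dataset loaded above):
str(delivery)   # 25 observations of n.prod, distance and delTime
head(delivery)  # first few rows of the data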
Plot two scatterplots of the predictor variables with the response variable.
del.nprod <- ggplot(delivery, aes(n.prod, delTime)) +
  geom_point(color='#2980B9', size = 3)
del.dist <- ggplot(delivery, aes(distance, delTime)) +
  geom_point(color='#2980B9', size = 3) + labs(y=NULL)
grid.arrange(del.nprod, del.dist, ncol = 2)
There does appear to be a linear relationship between the total delivery time and both the number of products stocked and the distance walked. This relationship makes intuitive sense, as one would assume a delivery takes longer as the number of products stocked or the distance walked to the site increases. The observation in the far top right of both scatterplots is potentially an outlier, but treating it is outside the intended scope of this example, so it will be left as-is.
The multiple regression model is fitted with the lm() function. The . in the formula tells the function to use all of the variables in the dataset other than the response variable to the left of the ~ sign. The same model would be fit if the call were lm(delTime ~ n.prod + distance, data = delivery).
del.lm <- lm(delTime ~ ., data = delivery)
summary(del.lm)
##
## Call:
## lm(formula = delTime ~ ., data = delivery)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7880 -0.6629 0.4364 1.1566 7.4197
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.341231 1.096730 2.135 0.044170 *
## n.prod 1.615907 0.170735 9.464 3.25e-09 ***
## distance 0.014385 0.003613 3.981 0.000631 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.259 on 22 degrees of freedom
## Multiple R-squared: 0.9596, Adjusted R-squared: 0.9559
## F-statistic: 261.2 on 2 and 22 DF, p-value: 4.687e-16
The summary() function shows both predictor coefficients are significant, and thus there is indeed a linear relationship between the total delivery time and the two predictor variables. The coefficients are interpreted as the change in the response \(y\) per unit increase in \(x_1\) when \(x_2\) is held constant, and vice versa. For example, according to the model, the mean delivery time increases by 1.62 units for each additional product stocked while the distance walked remains constant. Due to this interpretation, the coefficients are also called partial regression coefficients. Please note that this is an example analysis; since the variables are measured in different units, it is often recommended to scale them before fitting a model.
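As a rough sketch of that recommendation (not part of the original analysis; the object name del.lm.scaled is made up here), the two predictors could be standardized with scale() before refitting, so the coefficients are expressed per standard deviation of each predictor rather than per raw unit:
delivery.scaled <- delivery
# Standardize only the predictor columns, leaving the response in its original units
delivery.scaled[, c('n.prod', 'distance')] <- scale(delivery.scaled[, c('n.prod', 'distance')])
del.lm.scaled <- lm(delTime ~ ., data = delivery.scaled)
coef(del.lm.scaled)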
Multiple regression is usually performed using matrix algebra. As mentioned previously, the linear regression model with two predictor variables is defined as:
\[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i \]
This is also called a first-order regression model, as the predictor variables enter the model linearly. The model can be extended to the general case with \(p - 1\) predictor variables, so that \(p\) parameters (including the intercept) are estimated.
\[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_{p-1} x_{i,p-1} + \epsilon_i \]
Which can also be written as:
\[ Y_i = \beta_0 + \sum^{p-1}_{k=1} \beta_k x_{ik} + \epsilon_i \]
In matrix terms, the multiple regression model can be expressed as:
\(\underset{n \times 1}{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}\)
\(\underset{n \times p}{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1,p-1} \\ 1 & x_{21} & x_{22} & \dots & x_{2,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{n,p-1} \end{bmatrix}\)
\(\underset{p \times 1}{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}\)
\(\underset{n \times 1}{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}\)
Which can also be written as:
\[ Y = X\beta + \epsilon \]
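As a quick illustration (a sketch using the model fitted earlier), the design matrix \(X\) for the delivery data can be pulled straight from the lm fit with model.matrix(), which prepends the column of ones automatically:
X <- model.matrix(del.lm)  # n x p design matrix: intercept column plus the two predictors
head(X)
dim(X)  # 25 rows (observations) and 3 columns (parameters)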
As in the case of simple linear regression, the goal of least squares estimation is to find values for the parameters \(\beta_0, \beta_1, \dots, \beta_{p-1}\) that minimize the sum of squared deviations. This quantity will be denoted \(Q\) and is defined as:
\[ Q = \sum^n_{i=1} (Y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_{p-1} x_{i,p-1})^2 \]
The vector (a matrix with one column) of estimated parameters is denoted:
\(\underset{p \times 1}{b} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{bmatrix}\)
Thus the least squares equations for the multiple regression model can be represented as:
\[ X'Xb = X'Y \]
With estimators:
\[ \underset{p \times 1}{b} = \underset{p \times p}{(X'X)^{-1}} \underset{p \times 1}{X'Y} \]
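Continuing the sketch from above, the estimator can be computed directly and compared with the coefficients reported by lm(). Using solve(A, b) solves the normal equations without forming the inverse of \(X'X\) explicitly:
y <- delivery$delTime
b <- solve(crossprod(X), crossprod(X, y))  # solves X'Xb = X'Y
drop(b)
coef(del.lm)  # should match the matrix-algebra estimates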
The fitted response can also be represented as a vector denoted \(\hat{Y}\) with values \(\hat{Y}_i\) and residuals \(\epsilon_i = Y_i - \hat{Y}_i\).
\(\underset{n \times 1}{\hat{Y}} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix}\)
\(\underset{n \times 1}{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}\)
The fitted values can also be written as:
\[ \underset{n \times 1}{\hat{Y}} = Xb \]
With residuals:
\[ \underset{n \times 1}{\epsilon} = Y - \hat{Y} = Y - Xb \]
The hat matrix is defined as:
\[ H = X(X'X)^{-1} X' \]
Therefore, the fitted values vector \(\hat{Y}\) can be expressed in terms of the hat matrix:
\[ \underset{n \times 1}{\hat{Y}} = HY \]
With the vector of residuals then:
\[ \underset{n \times 1}{\epsilon} = (I - H)Y \]
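These hat matrix relationships can be verified numerically as well (still a sketch; the objects X, y, and del.lm come from earlier in the post):
H <- X %*% solve(crossprod(X)) %*% t(X)  # hat matrix H = X(X'X)^{-1}X'
y.hat <- H %*% y                         # fitted values HY
e <- (diag(nrow(X)) - H) %*% y           # residuals (I - H)Y
all.equal(as.vector(y.hat), unname(fitted(del.lm)))
all.equal(as.vector(e), unname(residuals(del.lm)))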
Briefly touching on ANOVA in the multiple regression setting, the sum of squares and mean sum of squares from each source of variation are defined in matrix terms as:
\[ SSE = Y'Y - b'X'Y = Y'(I - H)Y \] \[ SST = Y'Y - \frac{\left(\sum Y_i\right)^2}{n} \] \[ SSR = SST - SSE \]
\[ MSE = \frac{SSE}{n - p} \] \[ MSR = \frac{SSR}{p - 1} \]
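As a quick check of these definitions (again continuing the sketch above, with \(p = 3\) parameters for the delivery model), the sums of squares can be computed from the matrices and compared with anova(del.lm), where the first two rows of the Sum Sq column add up to SSR and the Residuals row is SSE:
n <- nrow(delivery)
p <- length(coef(del.lm))  # number of parameters: intercept plus two slopes
SSE <- drop(crossprod(y) - crossprod(b, crossprod(X, y)))  # Y'Y - b'X'Y
SST <- drop(crossprod(y) - sum(y)^2 / n)
SSR <- SST - SSE
MSE <- SSE / (n - p)
MSR <- SSR / (p - 1)
anova(del.lm)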
The F-statistic for testing whether a regression relation exists between the response variable and the set of predictor variables is the same as in simple linear regression:
\[ F = \frac{MSR}{MSE} \]
The null hypothesis of no regression relation between the response and the predictor variables is rejected if the computed \(F\) statistic exceeds the critical \(F\) value at a given significance level with \(p - 1\) and \(n - p\) degrees of freedom.
\[ F > F_{\alpha, p - 1, n - p} \]
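Continuing the sketch, the observed F statistic can be compared with both the critical value from qf() and the p-value from pf():
F.stat <- MSR / MSE
F.stat
qf(0.95, df1 = p - 1, df2 = n - p)  # critical F value at alpha = 0.05
pf(F.stat, df1 = p - 1, df2 = n - p, lower.tail = FALSE)  # p-value of the observed F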
In the multiple regression setting, \(R^2\) is designated the coefficient of multiple determination and is defined as:
\[ R^2 = 1 - \frac{SSE}{SST} \]
As in the case of simple linear regression, \(R^2\) measures the proportionate reduction in the total variation of the response variable associated with the predictor variables in the model. \(R^2\) can only increase as more predictor variables are added to the model. \(R_a^2\), the adjusted coefficient of multiple determination, corrects for this artificial inflation by dividing each sum of squares by its degrees of freedom.
\[ R_a^2 = 1 - \left(\frac{n - 1}{n - p}\right)\frac{SSE}{SST} \]
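Both quantities can be computed from the sums of squares above and compared with the values stored in summary(del.lm) (still part of the same sketch):
R2 <- 1 - SSE / SST
R2.adj <- 1 - ((n - 1) / (n - p)) * SSE / SST
c(R2, R2.adj)
c(summary(del.lm)$r.squared, summary(del.lm)$adj.r.squared)  # should agree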
Phew! Apologies for all of the definitions. A good portion of the above should be familiar from the simple linear regression setting. To summarize, a first-order regression model with two predictor variables appears in matrix form as:
\(Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}\)
\(X = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} \end{bmatrix}\)
\(\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}\)
The following steps give a quick summary of how the regression parameters are calculated for a two-predictor-variable model. The parameter estimates are found by solving the normal equations:
\[ b = (X'X)^{-1} X'Y \]
The fitted response values are thus:
\[ \hat{Y} = Xb \]
The following function implements the above definitions and equations to fit a first order multiple regression model with two predictor variables.
two.predictor.regression <- function(x, y) {
  y <- as.matrix(y)
  if (ncol(y) > 1) {
    stop('y must be a vector')
  }
  x <- data.frame(x)
  if (ncol(x) != 2) {
    stop('x must have only two predictor variables')
  }
  x$intercept <- 1 # Add the intercept term to the X matrix
  col <- grep('intercept', names(x))
  x <- as.matrix(x[, c(col, (1:ncol(x))[-col])]) # Put the intercept term first in the X matrix
  n <- length(y)
  if (n != nrow(x)) {
    stop('x and y must be the same length')
  }
  # Number of estimated parameters. With an intercept and two predictor variables
  # this is always 3, giving n - 3 residual degrees of freedom
  df <- 3
  x.crossprod <- crossprod(x)     # Calculate the crossproduct X'X
  y.crossprod <- crossprod(y)     # Calculate the crossproduct Y'Y
  xy.crossprod <- crossprod(x, y) # Crossproduct X'Y
  x.inv <- solve(x.crossprod)     # Find the inverse of X'X
  b <- x.inv %*% xy.crossprod     # Get the b parameters by multiplying the inverse of X'X and X'Y
  fitted.y <- x %*% b             # Find the fitted response values
  resid.mat <- y - fitted.y       # Find the residual values
  min.resid <- min(resid.mat)
  max.resid <- max(resid.mat)
  med.resid <- median(resid.mat)
  Q1.resid <- quantile(resid.mat, .25)
  Q3.resid <- quantile(resid.mat, .75)
  resid.df <- data.frame(round(cbind(min.resid, Q1.resid, med.resid, Q3.resid, max.resid), 3), row.names = '')
  colnames(resid.df) <- c('Min', '1Q', 'Median', '3Q', 'Max')
  sse <- y.crossprod - crossprod(b, xy.crossprod) # SSE
  mse <- sse / (n - df)                           # MSE
  sst <- y.crossprod - sum(y)^2 / n               # SST
  ssr <- sst - sse                                # SSR
  msr <- ssr / 2                                  # MSR
  f.stat <- msr / mse                             # F-statistic
  p.val <- pf(f.stat, df1 = 2, df2 = n - df, lower.tail = FALSE) # p-value of the F-statistic
  R2 <- 1 - sse / sst                             # R-squared
  R2.adj <- 1 - ((n - 1) / (n - df)) * sse / sst  # Adjusted R-squared
  s2b <- mse[1] * x.inv # Estimated variance-covariance matrix of the regression parameters
  std.err <- matrix(sqrt(diag(s2b)), nrow = 3, ncol = 1) # Parameter standard errors from the diagonal
  t.values <- b / std.err # t statistics of the regression parameters
  p.values <- 2 * pt(abs(t.values), df = n - df, lower.tail = FALSE)
  res.std.err <- sqrt(mse)
  # The following collects and returns all of the results found above in a list
  coef <- data.frame(cbind(round(b, 3), round(std.err, 3), round(t.values, 3), format(p.values, digits = 3)))
  colnames(coef) <- c('Estimate', 'Std. Error', 't value', 'Pr(>|t|)')
  resids <- paste('Residual standard error:', round(res.std.err, 3), 'on', n - df, 'degrees of freedom')
  R.squared <- paste('Multiple R-squared:', round(R2, 4), 'Adjusted R-squared:', round(R2.adj, 4))
  f.stat <- paste('F-statistic:', round(f.stat, 1), 'on 2 and', n - df, 'DF')
  p <- paste('p-value:', format(p.val, digits = 4, scientific = TRUE))
  res <- list('Residuals' = resid.df, 'Coefficients' = coef, resids, R.squared, f.stat, p)
  return(res)
}
two.predictor.regression(delivery[,1:2], delivery[,3])
## $Residuals
## Min 1Q Median 3Q Max
## -5.788 -0.663 0.436 1.157 7.42
##
## $Coefficients
## Estimate Std. Error t value Pr(>|t|)
## intercept 2.341 1.097 2.135 4.42e-02
## n.prod 1.616 0.171 9.464 3.25e-09
## distance 0.014 0.004 3.981 6.31e-04
##
## [[3]]
## [1] "Residual standard error: 3.259 on 22 degrees of freedom"
##
## [[4]]
## [1] "Multiple R-squared: 0.9596 Adjusted R-squared: 0.9559"
##
## [[5]]
## [1] "F-statistic: 261.2 on 2 and 22 DF"
##
## [[6]]
## [1] "p-value: 4.687e-16"
summary(del.lm)
##
## Call:
## lm(formula = delTime ~ ., data = delivery)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7880 -0.6629 0.4364 1.1566 7.4197
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.341231 1.096730 2.135 0.044170 *
## n.prod 1.615907 0.170735 9.464 3.25e-09 ***
## distance 0.014385 0.003613 3.981 0.000631 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.259 on 22 degrees of freedom
## Multiple R-squared: 0.9596, Adjusted R-squared: 0.9559
## F-statistic: 261.2 on 2 and 22 DF, p-value: 4.687e-16
The function outputs the same results as the lm() function used earlier.
This post examined multiple regression in the two predictor variable setting in order to gain a deeper understanding of the method. Multiple regression is such a frequently used tool that it pays to have a fundamental knowledge of the reasoning and assumptions behind it. Fortunately, many of the concepts from simple linear regression carry over to multiple regression.
Kutner, M. H., Nachtsheim, C. J., Neter, J., Li, W., & Wasserman, W. (2004). Applied linear statistical models (5th ed.). Boston, MA: McGraw-Hill Higher Education.
5.4 - A matrix formulation of the multiple regression model. (2016). Retrieved from Regression Methods, https://onlinecourses.science.psu.edu/stat501/node/382
Brannick, M. (n.d.). Regression with two independent variables. Retrieved from http://faculty.cas.usf.edu/mbrannick/regression/Reg2IV.html
Winner, L. (2009). Applied Statistical Methods. University of Florida.