As an undergraduate, I found regression to be a fuzzy concept. I completed a year's worth of psychology statistics, but in psychology ANOVA is king, and regression is reduced to five pages in a chapter shared with correlation. Thus, the computations surrounding regression were very much a black box to me. Encouraged by my adviser, I decided to take a class titled "Regression for Social and Behavioral Research" to find out what all the fuss surrounding regression was about. During the first day of class we went over the syllabus:
I went into shock, tuned out for the rest of class, and missed learning more funny-sounding new words like "probit" and "logit". After the first day, I was 9000 percent set on dropping the class, but to my dismay "walking for fitness" was already full, so I was stuck in a class where I would spend the first month or so doing matrix algebra. Little did I know, this class would be one of the best (if not the best) classes I completed as an undergraduate, and it spurred my interest in data analysis and statistics. My goal with this tutorial is simply to share some of the things I learned along the way and to unpack what is happening under the hood when you hit ctrl+enter on a line of code containing the function "lm". I'm assuming the reader has a working knowledge of the basic concepts surrounding regression, so I won't go over those in detail. The point of this post isn't to spell all this out in super math-y terms, but to provide a hands-on, applied example of the concepts. Throughout, I'll reference a few important equations, with links to the proofs behind them for the crazed.
Let's start by loading the "mtcars" dataset, which comes built into R and will serve as our data throughout. We can see mtcars is composed of various information about cars, including measures such as mpg (miles per gallon), hp (horsepower), cyl (cylinders), etc.
data("mtcars")
head(mtcars,5)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Let's select three of these variables: mpg, hp, and disp (displacement). Here, we will be predicting mpg (y) from hp (x1) and disp (x2). A quick glance at the pairs plot below suggests that mpg is inversely related to both hp and disp.
library(GGally)
y<-mtcars[,"mpg"]
x<-mtcars[,c("hp","disp")]
ggpairs(cbind(y,x))
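As a quick numeric check (a small addition on my part, nothing lm requires), cor() confirms the direction of those relationships; both correlations with mpg come out negative, in line with the plot.
cor(mtcars[,c("mpg","hp","disp")])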
The next step is to finish our x matrix, which currently consists of our two predictor variables, hp and disp. To do this, we simply need to add a column of ones (x0) corresponding to the intercept term (b0). We can accomplish this by padding the current x matrix with a vector of ones.
x0<-rep(1,nrow(x))
x<-as.matrix(cbind(x0,x))
head(x,5)
## x0 hp disp
## Mazda RX4 1 110 160
## Mazda RX4 Wag 1 110 160
## Datsun 710 1 93 108
## Hornet 4 Drive 1 110 258
## Hornet Sportabout 1 175 360
Now, the fun begins. We'll start by finding our beta coefficients. In order to do this, we need to take the inverse of X transpose X and multiply that by X transpose Y. Put more readably: \(b = (X^TX)^{-1}X^TY\). Below, we'll begin by creating our \(X^TX\) and \(X^TY\) matrices.
XtX<-t(x)%*%x
XtY<-t(x)%*%y
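As an aside, base R's crossprod() builds these same matrices in one step, avoiding the explicit transpose; it's just an equivalent (and slightly more efficient) way to write the two lines above.
# crossprod(x) is t(x) %*% x; crossprod(x, y) is t(x) %*% y
all.equal(crossprod(x), XtX)
all.equal(crossprod(x,y), XtY)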
It turns out the X'X matrix has some neat properties. On the diagonal you'll find the sums of squares of each column, and on the off-diagonals you'll find the cross-products between pairs of columns. As an example, we will calculate the sum of squares for hp and the cross-product between hp and disp. You can see these values also appear in the appropriate spots in the X'X matrix.
print(XtX)
## x0 hp disp
## x0 32.0 4694 7383.1
## hp 4694.0 834278 1291364.4
## disp 7383.1 1291364 2179627.5
sum(mtcars$hp*mtcars$hp)
## [1] 834278
sum(mtcars$hp*mtcars$disp)
## [1] 1291364
The X'Y matrix will take a bit of a different form, as it will always be a vector with one entry per column of the design matrix (one per predictor, plus one for the intercept), but you can still find the cross-products in it, as seen below with respect to mpg and hp.
print(XtY)
## [,1]
## x0 642.9
## hp 84362.7
## disp 128705.1
sum(mtcars$hp*mtcars$mpg)
## [1] 84362.7
Alright, time to calculate some regression coefficients. As mentioned earlier, \(b = (X^TX)^{-1}X^TY\). At this point, we have everything ready to plug into that formula, so we will go ahead and do so. Just to ensure that I'm not lying to you, we will compare our resulting calculation to that of lm.
betas<-solve(XtX)%*%XtY
betas
## [,1]
## x0 30.73590425
## hp -0.02484008
## disp -0.03034628
summary(lm(mpg ~ hp + disp,data=mtcars))$coefficients[1:3]
## [1] 30.73590425 -0.02484008 -0.03034628
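One quick design note, as an aside: explicitly inverting \(X^TX\) is fine for a small, well-conditioned problem like this one, but handing both matrices to solve() solves the normal equations directly and is the numerically safer habit.
# same coefficients, without forming the explicit inverse
all.equal(as.numeric(solve(XtX,XtY)), as.numeric(betas))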
Next, we are going to compute something a little funky known as the hat matrix, or projection matrix. This matrix maps the vector of our response variable (mpg) to the vector of predicted values. Its diagonal entries, known as leverages, describe how much influence each observed response value has on its own fitted value. To find the hat matrix we use the following formula: \(H = X(X^TX)^{-1}X^T\). Once we have calculated the hat matrix, we multiply it by y to find our predicted values. We will do so, and compare our results to lm, below:
hat_matrix<- x%*%solve(XtX)%*%t(x)
predicted<-hat_matrix%*%y
head(predicted,5)
## [,1]
## Mazda RX4 23.14809
## Mazda RX4 Wag 23.14809
## Datsun 710 25.14838
## Hornet 4 Drive 20.17416
## Hornet Sportabout 15.46423
lm(mpg ~ hp + disp,data=mtcars)$fitted.values[1:5]
## Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## 23.14809 23.14809 25.14838 20.17416
## Hornet Sportabout
## 15.46423
all.equal(as.numeric(predicted),as.numeric(lm(mpg ~ hp + disp,data=mtcars)$fitted.values))
## [1] TRUE
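As a bonus (my own addition, not part of the original comparison), the leverages mentioned above really do sit on the diagonal of our hat matrix, and we can check them against the built-in hatvalues() function:
# leverages: the diagonal of the hat matrix
leverages<-diag(hat_matrix)
all.equal(as.numeric(leverages), as.numeric(hatvalues(lm(mpg ~ hp + disp,data=mtcars))))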
Now that we have our predicted values, we can easily find our residuals and again show that they are equal to the output produced by "lm":
residuals<-(y-predicted)
residuals[1:5]
## [1] -2.148091 -2.148091 -2.348379 1.225844 3.235770
lm(mpg ~ hp + disp,data=mtcars)$residuals[1:5]
## Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## -2.148091 -2.148091 -2.348379 1.225844
## Hornet Sportabout
## 3.235770
all.equal(as.numeric(residuals),as.numeric(lm(mpg ~ hp + disp,data=mtcars)$residuals))
## [1] TRUE
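A small aside: the residuals can equivalently be written as \((I - H)y\), where \(I - H\) is sometimes called the residual-maker matrix. A minimal sketch:
# M = I - H sends y straight to the residuals
M<-diag(nrow(x))-hat_matrix
all.equal(as.numeric(M%*%y), as.numeric(residuals))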
We can now use the residuals we've calculated to help us find the standard errors of our beta estimates. We can accomplish this by first finding the variance-covariance matrix of the regression coefficients. The formula to do so is: \(var(\hat{\beta}) = \sigma^2(X^TX)^{-1}\). To proceed we need to estimate \(\sigma^2\). To find \(\hat{\sigma}^2\) we can use the following formula: \(\hat{\sigma}^2 = \dfrac{1}{N-p}(r^Tr)\), where r is our vector of residuals, N is the number of observations, and p is the number of columns in the design matrix (intercept included). Once we have an estimate of the variance, we can plug it into the first formula to obtain our variance-covariance matrix. From there, we take the square root of its diagonal to obtain our standard errors. As usual, we will compare our calculations to "lm".
var_est <- (1/(nrow(x)-ncol(x)))*t(residuals)%*%residuals
ses<-sqrt(diag(as.numeric(var_est)*solve(XtX)))
print(ses)
## x0 hp disp
## 1.331566129 0.013385499 0.007404856
summary(lm(mpg ~ hp + disp,data=mtcars))$coefficients[4:6]
## [1] 1.331566129 0.013385499 0.007404856
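The intermediate variance-covariance matrix is worth a look in its own right; it should match what vcov() reports for the fitted model, up to the dimension names:
# full variance-covariance matrix of the coefficient estimates
vcov_est<-as.numeric(var_est)*solve(XtX)
all.equal(unname(vcov_est), unname(vcov(lm(mpg ~ hp + disp,data=mtcars))))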
Now, we can finish our calculations by obtaining t- and p-values. We can even calculate a few other things lm gives you, such as \(R^2\) and adjusted \(R^2\).
tvals<-betas/ses
print(tvals)
## [,1]
## x0 23.082522
## hp -1.855746
## disp -4.098159
pvals<-sapply(tvals, function (tval) 2*pt(abs(tval), df = nrow(x)-ncol(x), lower.tail = FALSE))
print(pvals)
## [1] 3.262507e-20 7.367905e-02 3.062678e-04
r2<-sum((predicted-mean(y))^2)/sum((y-mean(y))^2)
print(r2)
## [1] 0.7482402
adjusted_r2<-1-(((1-r2)*(nrow(x)-1))/(nrow(x)-(ncol(x)-1)-1))
print(adjusted_r2)
## [1] 0.7308774
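While we're at it, the overall F-statistic that lm reports can be recovered from \(R^2\) alone via \(F = \dfrac{R^2/(p-1)}{(1-R^2)/(N-p)}\), with \(p-1\) and \(N-p\) degrees of freedom; this should reproduce the 43.09 on 2 and 29 DF shown in the summary output below.
# overall F-statistic and its p-value
df1<-ncol(x)-1        # numerator df: number of predictors
df2<-nrow(x)-ncol(x)  # denominator df: residual df
f_stat<-(r2/df1)/((1-r2)/df2)
f_stat
pf(f_stat, df1, df2, lower.tail=FALSE)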
Finally, we can combine everything in one spot and compare it to the summary output of lm:
round(data.frame(b=betas,se=ses,t_value = tvals,p_value=pvals,r2=c(r2,rep(NA,2)),adj_r2=c(adjusted_r2,rep(NA,2))),6)
## b se t_value p_value r2 adj_r2
## x0 30.735904 1.331566 23.082522 0.000000 0.74824 0.730877
## hp -0.024840 0.013385 -1.855746 0.073679 NA NA
## disp -0.030346 0.007405 -4.098159 0.000306 NA NA
summary(lm(mpg ~ hp+disp,data=mtcars))
##
## Call:
## lm(formula = mpg ~ hp + disp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7945 -2.3036 -0.8246 1.8582 6.9363
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.735904 1.331566 23.083 < 2e-16 ***
## hp -0.024840 0.013385 -1.856 0.073679 .
## disp -0.030346 0.007405 -4.098 0.000306 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.127 on 29 degrees of freedom
## Multiple R-squared: 0.7482, Adjusted R-squared: 0.7309
## F-statistic: 43.09 on 2 and 29 DF, p-value: 2.062e-09
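One more check before moving on (my own addition): 95% confidence intervals fall out of the pieces we already have, as \(b \pm t_{0.975,\,N-p} \cdot se\), and should agree with confint().
# 95% CIs: estimate plus/minus the critical t value times the standard error
t_crit<-qt(0.975, df = nrow(x)-ncol(x))
cbind(betas - t_crit*ses, betas + t_crit*ses)
confint(lm(mpg ~ hp + disp,data=mtcars))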
What about incorporating an interaction? Doing so is pretty simple: we just need to create the interaction term in our design matrix and repeat the steps above. To make this a little cleaner, let's create a function that takes a y vector and a design matrix as input and returns something similar to the summary statement from lm.
my_lm<-function(y,x){
  # coefficients via the normal equations: b = (X'X)^-1 X'Y
  XtX<-t(x)%*%x
  XtY<-t(x)%*%y
  betas<-solve(XtX)%*%XtY
  # hat matrix maps the observed y onto the fitted values
  hat_matrix<- x%*%solve(XtX)%*%t(x)
  predicted<-hat_matrix%*%y
  residuals<-(y-predicted)
  # sigma^2 estimate, then standard errors from the variance-covariance matrix
  var_est <- (1/(nrow(x)-ncol(x)))*t(residuals)%*%residuals
  ses<-sqrt(diag(as.numeric(var_est)*solve(XtX)))
  # t-statistics, two-tailed p-values, R^2, and adjusted R^2
  tvals<-betas/ses
  pvals<-sapply(tvals, function (tval) 2*pt(abs(tval), df = nrow(x)-ncol(x), lower.tail = FALSE))
  r2<-sum((predicted-mean(y))^2)/sum((y-mean(y))^2)
  adjusted_r2<-1-(((1-r2)*(nrow(x)-1))/(nrow(x)-(ncol(x)-1)-1))
  return(round(data.frame(b=betas,se=ses,t_value = tvals,p_value=pvals,r2=c(r2,rep(NA,ncol(x)-1)),adj_r2=c(adjusted_r2,rep(NA,ncol(x)-1))),6))
}
# design matrix: intercept, hp, disp, and the hp*disp interaction term
x<-cbind(rep(1,nrow(mtcars)),mtcars[,c("hp","disp")],mtcars[,"hp"]*mtcars[,"disp"])
x<-as.matrix(x)
y<-mtcars[,"mpg"]
my_lm(y,x)
## b se t_value p_value
## rep(1, nrow(mtcars)) 39.674263 2.914172 13.614248 0.000000
## hp -0.097894 0.024745 -3.956175 0.000473
## disp -0.073373 0.014387 -5.100113 0.000021
## mtcars[, "hp"] * mtcars[, "disp"] 0.000290 0.000087 3.336151 0.002407
## r2 adj_r2
## rep(1, nrow(mtcars)) 0.819849 0.800548
## hp NA NA
## disp NA NA
## mtcars[, "hp"] * mtcars[, "disp"] NA NA
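For completeness, the same interaction model can be fit through lm's formula interface, where hp * disp expands to hp + disp + hp:disp; its coefficient table should line up with the my_lm output above.
summary(lm(mpg ~ hp * disp,data=mtcars))$coefficients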