Collinearity and Multiple Linear Regression

This is a short demo showing that collinear explanatory variables can make the test of an individual coefficient less significant. Collinearity can also destabilize the coefficient estimates themselves, but that is not explored here.

## Generate data
set.seed(101)
x1 <- rnorm(100)
set.seed(1002)
x2 <- x1 + rnorm(100, 2, 0.5)
set.seed(10003)
x3 <- x1 + rnorm(100, -2, 0.4)
set.seed(100004)
y <- sapply(x1, function(z) {
    rnorm(1, mean = z, sd = 0.3)
})

## Merge the data
data <- data.frame(y, x1, x2, x3)

## Load graphic library
library(car)

## Loading required package: MASS

## Loading required package: nnet


## Make a scatterplot matrix of the data
scatterplotMatrix(~y + x1 + x2 + x3, data = data, smooth = FALSE, col = c("blue", 
    "", "orange"), diag = "none")

[Figure: scatterplot matrix of y, x1, x2 and x3]


## Simple Linear Regression
f1 <- lm(y ~ x1, data = data)
summary(f1)

## 
## Call:
## lm(formula = y ~ x1, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7144 -0.1771 -0.0325  0.1934  0.6298 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.0114     0.0296    0.39      0.7    
## x1            1.0395     0.0319   32.62   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.296 on 98 degrees of freedom
## Multiple R-squared: 0.916,   Adjusted R-squared: 0.915 
## F-statistic: 1.06e+03 on 1 and 98 DF,  p-value: <2e-16


## Multiple linear regression
f2 <- lm(y ~ x1 + x2 + x3, data = data)
summary(f2)

## 
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7212 -0.1858 -0.0262  0.2089  0.6534 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.13407    0.19578    0.68     0.50    
## x1           1.09177    0.11482    9.51  1.7e-15 ***
## x2          -0.05659    0.06608   -0.86     0.39    
## x3           0.00259    0.07613    0.03     0.97    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.298 on 96 degrees of freedom
## Multiple R-squared: 0.916,   Adjusted R-squared: 0.914 
## F-statistic:  350 on 3 and 96 DF,  p-value: <2e-16


## Correlation matrix
cor(data)

##         y     x1     x2     x3
## y  1.0000 0.9569 0.8439 0.8926
## x1 0.9569 1.0000 0.8939 0.9314
## x2 0.8439 0.8939 1.0000 0.8142
## x3 0.8926 0.9314 0.8142 1.0000

Note how the test for \( \beta_1 = 0 \) is less significant in the MLR than in the SLR: the standard error of the x1 coefficient rises from 0.0319 to 0.11482, so the t value drops from 32.62 to 9.51.
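
One way to quantify this inflation is the variance inflation factor, \( \mathrm{VIF}_1 = 1 / (1 - R_1^2) \), where \( R_1^2 \) comes from regressing x1 on the other explanatory variables. The sketch below is not part of the original demo. It regenerates the data with the same seeds so that it runs on its own, and the names `r2_x1`, `vif_x1`, `se_ratio` and `predicted_ratio` are made up for illustration. For least-squares fits with an intercept, the ratio of the two standard errors should equal \( \sqrt{\mathrm{VIF}_1} \) times the ratio of the residual standard errors.

```r
# Regenerate the demo data so this block runs on its own (same seeds as above)
set.seed(101);    x1 <- rnorm(100)
set.seed(1002);   x2 <- x1 + rnorm(100, 2, 0.5)
set.seed(10003);  x3 <- x1 + rnorm(100, -2, 0.4)
set.seed(100004); y  <- sapply(x1, function(z) rnorm(1, mean = z, sd = 0.3))

f1 <- lm(y ~ x1)            # simple linear regression
f2 <- lm(y ~ x1 + x2 + x3)  # multiple linear regression

# R^2 from regressing x1 on the other explanatory variables
r2_x1  <- summary(lm(x1 ~ x2 + x3))$r.squared
vif_x1 <- 1 / (1 - r2_x1)   # variance inflation factor for x1

# Standard errors of the x1 coefficient in each fit
se_slr <- coef(summary(f1))["x1", "Std. Error"]
se_mlr <- coef(summary(f2))["x1", "Std. Error"]

# Observed SE ratio versus the ratio implied by the VIF
se_ratio        <- se_mlr / se_slr
predicted_ratio <- sqrt(vif_x1) * summary(f2)$sigma / summary(f1)$sigma

print(c(VIF = vif_x1, se_ratio = se_ratio, predicted_ratio = predicted_ratio))
```

Since car is already loaded in this demo, `vif(f2)` should report the same quantity for every term at once.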

## Added variable plot
avPlots(f2)

[Figure: added-variable plots for x1, x2 and x3 in the model f2]

Neither x2 nor x3 explains much once the other variables have been taken into account.
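
The same question can be put as a formal test by comparing the two nested fits with a partial F-test, which asks whether x2 and x3 jointly reduce the residual sum of squares beyond what x1 alone achieves. This is a minimal sketch rather than part of the original demo. It regenerates the data with the same seeds so that it runs on its own, and the names `nested` and `p_value` are made up for illustration.

```r
# Regenerate the demo data so this block runs on its own (same seeds as above)
set.seed(101);    x1 <- rnorm(100)
set.seed(1002);   x2 <- x1 + rnorm(100, 2, 0.5)
set.seed(10003);  x3 <- x1 + rnorm(100, -2, 0.4)
set.seed(100004); y  <- sapply(x1, function(z) rnorm(1, mean = z, sd = 0.3))

f1 <- lm(y ~ x1)            # reduced model
f2 <- lm(y ~ x1 + x2 + x3)  # full model

# Partial F-test: H0 is that the coefficients of x2 and x3 are both zero
nested <- anova(f1, f2)
print(nested)

# p-value for the joint contribution of x2 and x3
p_value <- nested[2, "Pr(>F)"]
```

A large `p_value` here would agree with the added-variable plots: once x1 is in the model, x2 and x3 carry little additional information about y.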