This is a short demo showing that using collinear explanatory variables can make the test of an individual coefficient less significant. Collinearity can cause other problems with the coefficient estimates as well, but those are not explored here.
## Generate data
set.seed(101)
x1 <- rnorm(100)
## x2 and x3 are x1 plus noise, so all three predictors are strongly collinear
set.seed(1002)
x2 <- x1 + rnorm(100, 2, 0.5)
set.seed(10003)
x3 <- x1 + rnorm(100, -2, 0.4)
## y depends on x1 alone; x2 and x3 carry no additional information about y
set.seed(100004)
y <- sapply(x1, function(z) {
    rnorm(1, mean = z, sd = 0.3)
})
## Merge the data
data <- data.frame(y, x1, x2, x3)
## Load graphic library
library(car)
## Loading required package: MASS
## Loading required package: nnet
## Make a scatterplot matrix of the data
scatterplotMatrix(~y + x1 + x2 + x3, data = data, smooth = FALSE,
    col = c("blue", "black", "orange"), diag = "none")
## Simple Linear Regression
f1 <- lm(y ~ x1, data = data)
summary(f1)
##
## Call:
## lm(formula = y ~ x1, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.7144 -0.1771 -0.0325  0.1934  0.6298
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.0114     0.0296    0.39      0.7
## x1            1.0395     0.0319   32.62   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.296 on 98 degrees of freedom
## Multiple R-squared: 0.916, Adjusted R-squared: 0.915
## F-statistic: 1.06e+03 on 1 and 98 DF, p-value: <2e-16
## Multiple Linear Regression
f2 <- lm(y ~ x1 + x2 + x3, data = data)
summary(f2)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.7212 -0.1858 -0.0262  0.2089  0.6534
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.13407    0.19578    0.68     0.50
## x1           1.09177    0.11482    9.51  1.7e-15 ***
## x2          -0.05659    0.06608   -0.86     0.39
## x3           0.00259    0.07613    0.03     0.97
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.298 on 96 degrees of freedom
## Multiple R-squared: 0.916, Adjusted R-squared: 0.914
## F-statistic: 350 on 3 and 96 DF, p-value: <2e-16
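To put the two standard errors side by side, they can be pulled directly out of the fitted models (a small sketch, not part of the original demo):
## Standard error of the x1 coefficient under each model
se_slr <- summary(f1)$coefficients["x1", "Std. Error"]
se_mlr <- summary(f2)$coefficients["x1", "Std. Error"]
c(SLR = se_slr, MLR = se_mlr)  # about 0.032 vs. 0.115, matching the summaries above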
## Correlation matrix
cor(data)
##         y     x1     x2     x3
## y  1.0000 0.9569 0.8439 0.8926
## x1 0.9569 1.0000 0.8939 0.9314
## x2 0.8439 0.8939 1.0000 0.8142
## x3 0.8926 0.9314 0.8142 1.0000
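The high pairwise correlations among the predictors can be summarized with variance inflation factors. As a quick check (this call is not part of the original demo, so its output is not shown), car's vif() reports how much each coefficient's variance is inflated by collinearity:
## Variance inflation factors for the MLR; values well above 1 signal collinearity
vif(f2)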
Note how the test of \( \beta_1 = 0 \) is less significant in the MLR than in the SLR: the standard error of \( \hat{\beta}_1 \) is more than three times larger (0.1148 vs. 0.0319), so the t statistic shrinks.
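This inflation has a simple form. In a multiple regression, the sampling variance of \( \hat{\beta}_j \) is

\[ \mathrm{Var}(\hat{\beta}_j) = \frac{\sigma^2}{(n - 1)\,\mathrm{Var}(x_j)} \cdot \frac{1}{1 - R_j^2}, \]

where \( R_j^2 \) is the \( R^2 \) from regressing \( x_j \) on the other predictors. The second factor is the variance inflation factor computed above: because x1 is strongly correlated with x2 and x3, \( R_1^2 \) is close to 1, which inflates the standard error of \( \hat{\beta}_1 \).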
## Added variable plot
avPlots(f2)
The added-variable plots show that neither x2 nor x3 explains much once the other predictors have been taken into account.
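The same point can be checked numerically (again, not in the original output) with a partial F test comparing the two nested fits; if x2 and x3 add little beyond x1, the test should be far from significant:
## Partial F test of H0: the coefficients on x2 and x3 are both zero
anova(f1, f2)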