We will show that the residuals of a linear regression can be correlated with an explanatory variable when the model is fit without an intercept. Here y is a function of two explanatory variables, x1 and x2.
set.seed(0)
x1 <- runif(100)
x2 <- runif(100)
y <- 2 * x1 + x2^2 + 1  # x2 enters quadratically, so a fit that is linear in x2 cannot be perfect
When the model includes an intercept, the residuals sum to zero and their sample correlations with the explanatory variables are zero.
df <- data.frame(x1, x2, y)
fit <- lm(y ~ x1 + x2, data = df)  # named `fit` to avoid masking the lm() function
df$y_hat <- predict(fit, df)
df$residuals <- df$y - df$y_hat
cor1 <- cor(df$residuals, df$x1)
cor2 <- cor(df$residuals, df$x2)
sum_residual <- sum(df$residuals)
head(df)
## x1 x2 y y_hat residuals
## 1 0.8966972 0.6049333 3.159339 3.248314 -0.08897483
## 2 0.2655087 0.6547239 1.959681 1.995757 -0.03607655
## 3 0.3721239 0.3531973 1.868996 1.905783 -0.03678651
## 4 0.5728534 0.2702601 2.218747 2.235086 -0.01633843
## 5 0.9082078 0.9926841 3.801837 3.670992 0.13084524
## 6 0.2016819 0.6334933 1.804678 1.842076 -0.03739840
The correlation coefficients between the residuals and x1 and x2 are 2.5381673 × 10^{-15} and -5.0735936 × 10^{-16}, respectively. The sum of all residuals is 4.5075055 × 10^{-14}. All of these values are zero up to floating-point error.
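The reason behind these zeros can be checked directly. The sketch below assumes the same simulated data as above; the names `fit_int`, `X`, `e` and `ortho` are only illustrative. Least squares makes the residuals orthogonal to every column of the design matrix, and the intercept contributes a column of ones, which is why the residuals also sum to zero.

```r
# Recreate the data so that this block runs on its own.
set.seed(0)
x1 <- runif(100)
x2 <- runif(100)
y <- 2 * x1 + x2^2 + 1

fit_int <- lm(y ~ x1 + x2)  # fit with an intercept
e <- resid(fit_int)
X <- model.matrix(fit_int)  # columns: (Intercept), x1, x2

# Normal equations: every column of X is orthogonal to the residuals.
# The (Intercept) entry is just sum(e).
ortho <- drop(crossprod(X, e))
print(ortho)

# With mean-zero residuals, cov(e, x) reduces to sum(e * x) / (n - 1),
# so the covariances (and hence the correlations) vanish too.
print(c(cov(e, x1), cov(e, x2)))
```

All printed values should be zero up to floating-point error.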
Now we will see what happens when we do not allow an intercept in the linear fit.
df <- data.frame(x1, x2, y)
fit <- lm(y ~ x1 + x2 + 0, data = df)  # the `+ 0` term removes the intercept from the model
df$y_hat <- predict(fit, df)
df$residuals <- df$y - df$y_hat
cor1 <- cor(df$residuals, df$x1)
cor2 <- cor(df$residuals, df$x2)
sum_residual <- sum(df$residuals)
head(df)
## x1 x2 y y_hat residuals
## 1 0.8966972 0.6049333 3.159339 3.461635 -0.3022961
## 2 0.2655087 0.6547239 1.959681 1.824276 0.1354047
## 3 0.3721239 0.3531973 1.868996 1.608236 0.2607600
## 4 0.5728534 0.2702601 2.218747 2.016174 0.2025737
## 5 0.9082078 0.9926841 3.801837 4.144671 -0.3428341
## 6 0.2016819 0.6334933 1.804678 1.614562 0.1901154
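When we force the intercept to be 0, the correlation coefficients between the residuals and x1 and x2 are -0.692422 and -0.6843629, respectively, and the sum of all residuals is 9.3365199. None of these values is zero: without an intercept, least squares no longer forces the residuals to sum to zero, so their sample correlations with the explanatory variables need not vanish.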
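To see where the non-zero correlations come from, the following sketch (again assuming the same simulated data; `fit_noint`, `raw`, `cov_direct` and `cov_from_means` are illustrative names) checks that the residuals are still orthogonal to x1 and x2 in the raw sense, that is, sum(e * x) is zero. What changes is that their mean is no longer zero. Since cov(e, x) = (sum(e * x) - n * mean(e) * mean(x)) / (n - 1), the covariance is then driven entirely by the product of the two means.

```r
# Recreate the data so that this block runs on its own.
set.seed(0)
x1 <- runif(100)
x2 <- runif(100)
y <- 2 * x1 + x2^2 + 1

fit_noint <- lm(y ~ x1 + x2 + 0)  # fit without an intercept
e <- resid(fit_noint)
n <- length(e)

# The raw cross-products with the regressors are still zero ...
raw <- c(sum(e * x1), sum(e * x2))
print(raw)

# ... but the residuals no longer have mean zero.
print(mean(e))

# With sum(e * x) = 0, the covariance is -n * mean(e) * mean(x) / (n - 1).
cov_direct <- c(cov(e, x1), cov(e, x2))
cov_from_means <- -n * mean(e) * c(mean(x1), mean(x2)) / (n - 1)
print(rbind(cov_direct, cov_from_means))
```

The two rows of the final matrix should agree. Because the residual mean and the means of x1 and x2 are all positive here, both covariances are negative, matching the sign of the correlations reported above.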