Partial regression plots

… show the independent relations between variables

set.seed(1)
x <- runif(100, 0, 100)
set.seed(2)
x1 <- x + rnorm(100, sd=10) # x1 tracks x, with noise
set.seed(3)
x2 <- 1 * x + rnorm(100, sd=10) # x2 also tracks x, so it is positively correlated with x1, with noise
cor.test(x1, x2)
## 
##  Pearson's product-moment correlation
## 
## data:  x1 and x2
## t = 16.515, df = 98, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7952793 0.9021483
## sample estimates:
##       cor 
## 0.8577159
p1 <- 1 # positive effect of x1
p2 <- -1 # neg. effect of x2
set.seed(1)
y <- p1*x1 + p2*x2 + rnorm(100, sd=10) # independent effects of x1 and x2, plus noise
cor(cbind(x, x1, x2, y))
##              x        x1         x2           y
## x   1.00000000 0.9124646  0.9570942 -0.08130745
## x1  0.91246464 1.0000000  0.8577159  0.19951112
## x2  0.95709415 0.8577159  1.0000000 -0.25406373
## y  -0.08130745 0.1995111 -0.2540637  1.00000000

The simple correlations show that x1 and x2 are each only weakly correlated with y.
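
To see this directly (a minimal check, not part of the original output), you can test each marginal correlation on its own:

# marginal correlations of each predictor with y
cor.test(x1, y) # weak positive correlation
cor.test(x2, y) # weak negative correlation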

Multiple regression shows something else.

summary(mod0 <- lm(y ~ x1 + x2))
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.7097  -6.1218   0.4508   5.5007  21.0753 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.29621    1.91797   0.154    0.878    
## x1           0.94909    0.06238  15.214   <2e-16 ***
## x2          -0.93422    0.06028 -15.497   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.016 on 97 degrees of freedom
## Multiple R-squared:  0.7238, Adjusted R-squared:  0.7181 
## F-statistic: 127.1 on 2 and 97 DF,  p-value: < 2.2e-16

The multiple regression, in contrast, shows strong and highly significant effects of both x1 and x2 on y.
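
As a quick sanity check (a sketch, not part of the original output), the fitted coefficients should recover the true effects p1 = 1 and p2 = -1 used to generate y:

# 95% confidence intervals for the coefficients of mod0;
# they should bracket the true effects p1 = 1 and p2 = -1
confint(mod0)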

What gives?

What is the effect of x1 on y after factoring out x2, and vice versa? First we compute the residuals we need.

# y, given x2
mod1 <- lm(y ~ x2)
resid.1 <- resid(mod1)

# x1, given x2
mod2 <- lm(x1 ~ x2)
resid.2 <- resid(mod2)

# y, given x1
mod3 <- lm(y ~ x1)
resid.3 <- resid(mod3)

# x2, given x1
mod4 <- lm(x2 ~ x1)
resid.4 <- resid(mod4)

Now we plot the raw relations (y against x1 and y against x2, which suggest only weak relations) and the partial regression plots (which show the independent effects of x1 and x2 on y).

layout(matrix(1:4, ncol=2, byrow=TRUE))
plot(x1, y, main='y ~ x1')
plot(x2, y, main='y ~ x2')
plot(resid.2, resid.1, main='y|x2 ~ x1|x2')
plot(resid.4, resid.3, main='y|x1 ~ x2|x1')

The partial regression plots reveal the strong independent relations of x1 and x2 with y.
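
As a final check (a sketch, not part of the original): by the Frisch-Waugh-Lovell theorem, the slope of each partial regression equals the corresponding coefficient in the full model mod0. The avPlots() function in the car package draws these added-variable (partial regression) plots directly.

# slope of each partial regression matches the multiple-regression coefficient
coef(lm(resid.1 ~ resid.2))[2] # compare with coef(mod0)["x1"]
coef(lm(resid.3 ~ resid.4))[2] # compare with coef(mod0)["x2"]

# the same plots via the car package (if installed)
# car::avPlots(mod0)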