Do decision trees capture interaction effects automatically, or do we have to include interaction terms as explicit inputs to the model? This toy example shows that decision trees model interaction effects appropriately simply because of the way they are built. We therefore do not need to (and in fact should not) include interaction terms as inputs to a decision tree model.
First, we simulate data from the model Y = aX + bW + cXW + ε, where W is 1 if X is between 1 and 2 and W is 0 for all other values of X. In the code below a = 0.1, b = c = 1, and ε is Gaussian noise.
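Because the data are simulated with random noise, the exact numbers shown below will vary from run to run; if you want reproducible output you can fix the random seed first (the seed value here is arbitrary).
set.seed(1) # arbitrary seed; only needed if you want the same noise draw every run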
x = seq(0,3,0.01)
w = (x>1 & x<2)*1
epsilon = rnorm(length(x),mean=0,sd=0.1)
y = 0.1*x + w + w*x + epsilon
We plot x against y to visualize the relationship.
plot(x,y)
Let's try regressing y on x alone, ignoring w and its interaction with x.
linear.model = lm(y~x)
If we plot predicted vs. actual values, we see how poorly our model is performing. The red line marks a perfect predictor.
plot(predict(linear.model),y,xlab="Predicted",ylab="Actual")
abline(0,1,col='red')
If we look at the residual diagnostics, they give us another view of the model's poor performance.
par(mfrow=c(2,2))
plot(linear.model)
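As a rough numerical check (the exact value depends on the simulated noise), we can also look at how much of the variance the misspecified model explains:
summary(linear.model)$r.squared # in-sample R-squared of the model without w or the interaction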
Obviously, if we include the moderating variable w and its interaction with x in our model, the model performs much better.
interaction.model = lm(y ~ x + w + w:x)
summary(interaction.model)
##
## Call:
## lm(formula = y ~ x + w + w:x)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.25713 -0.06688 -0.00511  0.06178  0.28360
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.008352   0.012522  -0.667    0.505
## x            0.101320   0.006857  14.777   <2e-16 ***
## w            1.082382   0.055932  19.352   <2e-16 ***
## x:w          0.968520   0.036352  26.643   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1015 on 297 degrees of freedom
## Multiple R-squared:  0.9931, Adjusted R-squared:  0.993
## F-statistic: 1.419e+04 on 3 and 297 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(interaction.model)
par(mfrow=c(1,1))
plot(predict(interaction.model),y,xlab="Predicted",ylab="Actual")
abline(0,1,col='red')
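As a sanity check, the residual standard error of the interaction model should be close to the noise standard deviation of 0.1 that we used in rnorm:
sigma(interaction.model) # residual standard error; roughly 0.1, the sd of the injected noise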
Now let's try fitting a decision tree to the same data. Does the decision tree capture the interaction effect?
library(tree)
tree.model2 = tree(y~x+w)
plot(tree.model2)
text(tree.model2,pretty=0)
Note that the decision tree captured the interaction effect of W with X. When W is less than 0.5 (which in this case means it is equal to zero), the tree simply predicts Y as the average value of Y among those observations (refined by any further splits on X). When W is greater than 0.5 (i.e. when W is equal to 1), the prediction of the outcome variable Y is adjusted upward because the tree effectively models the segment 1 < X < 2 separately. This is exactly the interaction that was baked into the simulation. Way to go, decision trees!
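To see the splits numerically rather than graphically, we can print the fitted tree; each row shows a split rule and the mean of y predicted in that node (your exact splits may differ slightly because of the random noise).
tree.model2 # print the splits and the fitted value of y in each node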
How well did we do on our predictions?
plot(predict(tree.model2),y,xlab="Predicted",ylab="Actual")
abline(0,1,col="red")
What if we don't even include the moderating variable in the construction of our decision tree? Can we still predict Y reasonably well?
tree.model = tree(y~x)
plot(tree.model)
text(tree.model,pretty=0)
Indeed, we see that even without including the moderating variable W in our model, we were able to predict Y with reasonable accuracy. This is because the decision tree's recursive binary splits allow it to treat observations with 1 < X < 2 entirely differently from observations with X < 1 or X > 2.
plot(predict(tree.model),y,xlab="Predicted",ylab="Actual")
abline(0,1,col="red")