What are interaction effects?

Sometimes the effect of one predictor variable depends on another. This isn’t the same as multicollinearity, where predictor variables are correlated with one another. When we talk about interaction effects, we mean situations where the slope of the target variable with respect to one predictor changes depending on the value of another predictor.

What are some examples?

Interaction effects are quite common; here are some examples:

In a study looking at health benefits, we might find that lifting more weight improves the health of younger people but damages the health of older people.

In a study looking at consumer preferences, we might find that poor individuals are less interested in purchasing a commodity as its price goes up, while more affluent individuals, who are more interested in status, might be more likely to purchase it as the price goes up. (In these scenarios, sellers can increase gross sales by raising, not lowering, their prices. This phenomenon drove the sales of sparkling waters when they first appeared on the market.)

How do we handle interaction effects in regression?

We can handle interaction effects in regression by creating a term that is the product of the two interacting variables. Using the weightlifting example above, we might multiply the amount of weight lifted by a dummy variable indicating whether the individual is older, as in the sketch below.
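
Here is a minimal sketch of how this looks in R. The data frame health and its columns weight_lifted, older, and health_score are simulated purely for illustration; none of these names come from a real dataset.

set.seed(1)
# Simulated data, for illustration only: weight lifted helps younger
# people's health but hurts older people's (older is a 0/1 dummy).
health <- data.frame(
  weight_lifted = runif(100, 0, 50),
  older         = rbinom(100, 1, 0.5)
)
health$health_score <- 60 + 0.4 * health$weight_lifted -
  0.9 * health$older * health$weight_lifted + rnorm(100, sd = 5)

# Interaction term built by hand as the product of the two predictors:
health$older_x_weight <- health$older * health$weight_lifted
summary(lm(health_score ~ weight_lifted + older + older_x_weight, data = health))

# Equivalently, R's formula interface expands x1 * x2 into x1 + x2 + x1:x2:
summary(lm(health_score ~ weight_lifted * older, data = health))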

How do we know when there are interaction effects in our data?

We can use interaction plots, which display fitted values for the data under each condition. For example, we might plot regression lines of the health benefits from weightlifting for younger and older individuals on the same axes. If the lines have markedly different slopes, or cross, we may suspect an interaction effect. We still need to do hypothesis testing, since such a pattern could arise by chance.

Let’s look more closely at interaction plots. The plots below examine the interaction between gender and the other variables in the teengamb dataset from the faraway package, which records teenagers’ annual expenditure on gambling. (In this dataset, sex is coded 0 for male and 1 for female.)
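
The code that builds the list x of plots isn’t shown here; a minimal sketch of how it might be constructed, assuming ggplot2 and gridExtra (the name x and the exact styling are assumptions), looks like this:

library(faraway)    # teengamb data
library(ggplot2)
library(gridExtra)  # grid.arrange()

data(teengamb)

# One plot per predictor: gamble against the predictor, with separate
# least-squares lines for each sex (0 = male, 1 = female).
x <- lapply(c("sex", "status", "income", "verbal"), function(v) {
  ggplot(teengamb, aes(x = .data[[v]], y = gamble, colour = factor(sex))) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    labs(x = v, colour = "sex")
})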

grid.arrange(grobs=x[c(2:4)], ncol=3)
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

The interaction plots suggest that the other predictors’ relationships to how much an individual spends on gambling depend heavily on gender. This is especially the case with status: the plot suggests that the higher a boy’s status, the less he tends to gamble, while the higher a girl’s status, the more she tends to gamble.

We can test this by adding the interaction term to a multiple regression model and examining the effect.

First we run the regression without the term.

summary(lm(gamble~., teengamb))
## 
## Call:
## lm(formula = gamble ~ ., data = teengamb)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -51.082 -11.320  -1.451   9.452  94.252 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  22.55565   17.19680   1.312   0.1968    
## sex         -22.11833    8.21111  -2.694   0.0101 *  
## status        0.05223    0.28111   0.186   0.8535    
## income        4.96198    1.02539   4.839 1.79e-05 ***
## verbal       -2.95949    2.17215  -1.362   0.1803    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.69 on 42 degrees of freedom
## Multiple R-squared:  0.5267, Adjusted R-squared:  0.4816 
## F-statistic: 11.69 on 4 and 42 DF,  p-value: 1.815e-06

Now we add the term:

library(dplyr)  # for %>% and mutate()

teengamb2 <- teengamb %>%
  mutate(InteractionTerm = sex * status)

summary(lm(gamble~., teengamb2))
## 
## Call:
## lm(formula = gamble ~ ., data = teengamb2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -50.772 -13.212   0.364   7.929  87.668 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      34.8959    17.5484   1.989  0.05345 .  
## sex             -62.8769    20.9378  -3.003  0.00454 ** 
## status           -0.3176     0.3225  -0.985  0.33062    
## income            4.9886     0.9861   5.059 9.28e-06 ***
## verbal           -1.9691     2.1413  -0.920  0.36319    
## InteractionTerm   0.9922     0.4721   2.102  0.04175 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.82 on 41 degrees of freedom
## Multiple R-squared:  0.5728, Adjusted R-squared:  0.5207 
## F-statistic: 10.99 on 5 and 41 DF,  p-value: 9.346e-07

The new term is significant, and the R-squared has improved. The sex variable’s p-value has fallen significantly, and its coefficient has changed. We can interpret the interaction term and main effects as follows:

First, the coefficient for sex holds only when status is 0, and the coefficient for status holds only when sex is 0. In other words, because male = 0, the status coefficient describes males: for them, higher status has a mild dampening effect on gambling. At the same time, the sex coefficient tells us that when status is at or near 0, females spend far less on gambling than males. This matches the findings from our interaction plot. As for the interaction term itself, since sex = 1 for females, it adds 0.992*status to the predicted gambling expenditure when the individual is female; when the individual is male, the interaction term is 0.
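
We can make these implied slopes concrete by pulling them out of the fitted model (a quick check, re-fitting the same model shown above):

fit <- lm(gamble ~ ., data = teengamb2)

# Implied status slope for males (sex = 0): the status coefficient alone.
coef(fit)["status"]                                   # about -0.32

# Implied status slope for females (sex = 1): status plus the interaction.
coef(fit)["status"] + coef(fit)["InteractionTerm"]    # about 0.67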

In sum, interaction terms are important not only because they improve our R-squared and enhance the predictive power of our model. Without knowing about the interactions in our data, we might have drawn incorrect conclusions about the roles of status and gender. When we find ourselves saying, “well, it depends...”, we have probably discovered an interaction effect.