BS2004 | Week 5

2024-02-01

Interactions between continuous EVs

are simply multiplicative effects…

Mathematically, interactions between two continuous EVs are even easier (but beware: nonlinear effects may masquerade as interactions!)

Simple Example

Weight of wood in coniferous trees, given trunk height and diameter at ground level:

Anova( lm(wood ~ height + diam + height:diam, Trees))

## Anova Table (Type II tests)
## 
## Response: wood
##              Sum Sq Df F value    Pr(>F)
## height       2873.5  1 199.608 < 2.2e-16
## diam        10965.7  1 761.729 < 2.2e-16
## height:diam   361.3  1  25.095 2.482e-06
## Residuals    1382.0 96

Note: strong interaction between height and diameter. This means that

Weight of wood from the tree depends on height and diameter (unsurprisingly), but
The effect of height on the weight of wood depends on diameter,
and vice versa.
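To make "depends on" concrete, it can help to write the fitted model out. This is only a sketch: \(h\) stands for height, \(d\) for diameter, and \(\beta_0, \dots, \beta_3\) are generic coefficient symbols introduced here for illustration.

\[\text{wood} = \beta_0 + \beta_1 h + \beta_2 d + \beta_3 (h \times d) + \epsilon\]

\[\frac{\partial\, \text{wood}}{\partial h} = \beta_1 + \beta_3 d\]

So the slope for height is not a single number: it changes linearly with diameter. By symmetry, the slope for diameter is \(\beta_2 + \beta_3 h\).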

Interactions as multiplicative effects

A tree trunk is roughly conical, and the volume \(V\) of a cone depends on diameter (or radius: \(d=2r\)) and height \(h\), as follows:

\[V = \frac{1}{3} \pi r^2 h\]

The effects of \(h\) and \(r\) on \(V\) are not additive; they are multiplicative. Note that the equation includes \(r^2 \times h\), not \(r^2 + h\).
A purely additive model does not work well here.
Including the height:diam interaction in the model means including the product \(d \times h\) (equivalently, \(2r \times h\)) as a predictor!
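Since the data record diameter rather than radius, the cone formula can be rewritten with \(r = d/2\). This is plain algebra on the equation above, with no new assumptions:

\[V = \frac{1}{3} \pi \left(\frac{d}{2}\right)^2 h = \frac{\pi}{12}\, d^2 h\]

Switching from radius to diameter only changes the constant in front; the multiplicative structure (a power of the width times the height) stays the same.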

Interactions as multiplicative effects (2)

We can literally multiply diam and height, and use this as a predictor in place of diam:height:

Trees$diam_x_height <- Trees$diam * Trees$height  # calculating the product
coef( lm(wood ~ diam + height + diam:height, Trees) )   # written as interaction

## (Intercept)        diam      height diam:height 
##  -9.7985935   4.9966979  -0.4913513   0.8098758

coef( lm(wood ~ diam + height + diam_x_height, Trees) ) # with actual product

##   (Intercept)          diam        height diam_x_height 
##    -9.7985935     4.9966979    -0.4913513     0.8098758

You get the same coefficients!
If you compare the Anova() tables, the sum of squares, \(F\) and \(P\) for diam:height and diam_x_height also come out the same (try this yourself using the data in Trees.csv).
The rows for height and diam, however, come out different (despite the coefficients being the same!). Can you guess why? (Hint: diam_x_height is treated as a third main effect.)
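If Trees.csv is not to hand, the same comparison can be sketched on made-up data. Everything in `trees_sim` is hypothetical (arbitrary ranges and noise, not the real Trees data). The code uses base R only; the last step computes the Type II sum of squares for height by hand as a drop in residual sum of squares, assuming the usual Type II rule that a main effect is tested without its own interaction in the model.

```r
# Simulated stand-in for Trees.csv (hypothetical values, NOT the real data)
set.seed(1)
n <- 100
trees_sim <- data.frame(
  height = runif(n, 5, 30),    # assumed range, arbitrary units
  diam   = runif(n, 0.1, 0.6)  # assumed range, arbitrary units
)
# Cone volume from diameter and height, plus some noise
trees_sim$wood <- pi * trees_sim$diam^2 * trees_sim$height / 12 +
  rnorm(n, sd = 0.05)
trees_sim$diam_x_height <- trees_sim$diam * trees_sim$height

fit_int  <- lm(wood ~ diam + height + diam:height, trees_sim)    # interaction
fit_prod <- lm(wood ~ diam + height + diam_x_height, trees_sim)  # product

# Both fits use the same predictor columns, so the estimates agree
coefs_match <- all.equal(unname(coef(fit_int)), unname(coef(fit_prod)))
print(coefs_match)

# Residual sum of squares of a model given by formula f
rss <- function(f) deviance(lm(f, trees_sim))

# Type II SS for height when the product is an interaction:
# the interaction is left out of both models being compared
ss_height_int <- rss(wood ~ diam) - rss(wood ~ diam + height)

# Type II SS for height when the product is a third main effect:
# the product stays in both models being compared
ss_height_prod <- rss(wood ~ diam + diam_x_height) -
  rss(wood ~ diam + height + diam_x_height)

print(c(as_interaction = ss_height_int, as_main_effect = ss_height_prod))
```

The two sums of squares for height answer different questions: the first ignores the product term, the second adjusts for it, which is one way to read the hint above.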

But wait…

Our model is still not quite right, because what we have modelled is

\[V \sim r\times h\]

whereas we know that

\[V \sim r^2\times h\]

— the amount of wood depends nonlinearly on diam!

Later, we’ll learn how to properly model nonlinear effects. For now, I want to finish on a related point: ignoring nonlinearities can give apparent interactions where there are none (this was not the case in the Trees example).

Pseudo-interactions

Why nonlinearities matter

Simulated Data

Two continuous EVs, \(X_1\) and \(X_2\).

\(Y\) depends linearly on \(X_1\); and
\(Y\) depends nonlinearly on \(X_2\).

\[Y \sim X_1 + (X_2)^2 + \epsilon\]

There is also some non-orthogonality: \(X_1\) and \(X_2\) are correlated (\(X_1 \sim X_2\)).
There is no interaction between \(X_1\) and \(X_2\)!

We know the true relationships, because that’s what we’ve used to generate the data. The data are in NonlinearSim.csv.
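For readers without NonlinearSim.csv, data with the same structure can be generated along these lines. This is a sketch under assumed settings: the sample size, slopes, noise levels and the name `sim_demo` are all hypothetical and are not the values used to create the course file.

```r
# Hypothetical simulation in the spirit of NonlinearSim.csv (NOT the real file)
set.seed(42)
n  <- 200
x2 <- rnorm(n)                        # centred on 0, so x2^2 gives a U shape
x1 <- 0.7 * x2 + rnorm(n, sd = 0.7)   # non-orthogonality: x1 correlated with x2
y  <- x1 + x2^2 + rnorm(n, sd = 0.2)  # linear in x1, quadratic in x2, no x1:x2
sim_demo <- data.frame(x1, x2, y)

# The two EVs are clearly correlated
cor_x1_x2 <- cor(sim_demo$x1, sim_demo$x2)
# y tracks x2^2 closely, which is what a U-shaped plot would show
cor_y_x2sq <- cor(sim_demo$y, sim_demo$x2^2)

print(c(cor_x1_x2 = cor_x1_x2, cor_y_x2sq = cor_y_x2sq))
```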

Plot the data

U-shaped relationship between \(Y\) and \(X_2\)

ANOVA on linear model

Anova( lm(y ~ x1 + x2 + x1:x2, Sim) )

## Anova Table (Type II tests)
## 
## Response: y
##           Sum Sq  Df F value    Pr(>F)
## x1         85.73   1  39.102 2.469e-09
## x2         35.44   1  16.163 8.286e-05
## x1:x2     859.93   1 392.214 < 2.2e-16
## Residuals 429.73 196

If our model assumes that the effects of \(X_1\) and \(X_2\) are linear, we get an apparently strong interaction effect \(X_1 \times X_2\), which we know isn't there (because we simulated the data!)
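A short piece of algebra suggests where the spurious interaction comes from. Suppose, purely as an illustrative assumption, that the correlation between the EVs takes the form \(X_1 = a X_2 + \delta\), where \(a\) is a constant and \(\delta\) is noise unrelated to \(X_2\) (neither symbol appears in the original simulation description). Then

\[X_1 \times X_2 = (a X_2 + \delta)\, X_2 = a\,(X_2)^2 + \delta X_2\]

The product term therefore contains a copy of \((X_2)^2\). If the model has no quadratic term, the interaction is the only predictor that can pick up the curvature in \(X_2\), so it soaks up that variation and looks significant.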

ANOVA on non-linear model

Anova( lm(y ~ x1 + x2 + I(x2^2) + x1:x2, Sim) )  # include x2^2

## Anova Table (Type II tests)
## 
## Response: y
##           Sum Sq  Df   F value Pr(>F)
## x1        101.95   1 2354.5327 <2e-16
## x2          0.01   1    0.3081 0.5795
## I(x2^2)   421.29   1 9729.5924 <2e-16
## x1:x2       0.02   1    0.4599 0.4985
## Residuals   8.44 195

If we allow for a quadratic effect of \(X_2\) in the model:

Evidence for interaction \(X_1 \times X_2\): gone!
Evidence for quadratic effect \((X_2)^2\): overwhelming.

— which we know to be true (that’s what we simulated)!
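The whole demonstration can be reproduced from scratch on generated data. As before, this is a sketch with hypothetical settings (sample size, slopes, noise, and the names `sim_demo`, `fit_linear`, `fit_quad` and `p_value` are all made up here), and it uses the t-tests from base R's `summary()` rather than `Anova()`.

```r
# Hypothetical simulation in the spirit of NonlinearSim.csv (NOT the real file)
set.seed(42)
n  <- 200
x2 <- rnorm(n)
x1 <- 0.7 * x2 + rnorm(n, sd = 0.7)   # x1 correlated with x2
y  <- x1 + x2^2 + rnorm(n, sd = 0.2)  # truth: no x1:x2 interaction
sim_demo <- data.frame(x1, x2, y)

# Model that wrongly assumes linear effects only
fit_linear <- lm(y ~ x1 + x2 + x1:x2, sim_demo)
# Model that allows a quadratic effect of x2
fit_quad   <- lm(y ~ x1 + x2 + I(x2^2) + x1:x2, sim_demo)

# Pull the p-value for one coefficient out of the summary table
p_value <- function(fit, term) summary(fit)$coefficients[term, "Pr(>|t|)"]

p_int_linear <- p_value(fit_linear, "x1:x2")  # interaction, linear model
p_int_quad   <- p_value(fit_quad, "x1:x2")    # interaction, quadratic model
p_quad_term  <- p_value(fit_quad, "I(x2^2)")  # the quadratic term itself

print(c(linear_model = p_int_linear, quadratic_model = p_int_quad))
print(p_quad_term)
```

In the linear-only model the interaction should come out strongly significant; once `I(x2^2)` is included, the quadratic term should carry the evidence and the interaction would typically no longer stand out, although any single random sample can behave differently.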

Summary for Week 5

Interactions are really important.
Linearity is an assumption.
If the data don’t fit your assumptions, you get nonsense.
For example, if you ignore non-linear effects, you can get strong “evidence” for interactions where there are none.