Mathematically, interactions between two continuous EVs are even easier (but nonlinear effects may be masquerading as interactions!)
2024-02-01
Weight of wood in coniferous trees, given trunk height and diameter at ground level:
library(car)  # provides Anova()
Anova(lm(wood ~ height + diam + height:diam, Trees))
## Anova Table (Type II tests)
##
## Response: wood
##              Sum Sq Df F value    Pr(>F)
## height       2873.5  1 199.608 < 2.2e-16
## diam        10965.7  1 761.729 < 2.2e-16
## height:diam   361.3  1  25.095 2.482e-06
## Residuals    1382.0 96
Note: strong interaction between height and diameter. This means that the weight of wood depends on both height and diameter (unsurprisingly), but also that the effect of height on the weight of wood depends on diameter, and vice versa.
A tree trunk is conical, and the volume \(V\) of a cone depends on diameter (or radius: \(d=2r\)) and height \(h\), as follows:
\[V = \frac{1}{3} \pi r^2 h\]
Including the height:diam interaction in the model means including \(r \times h\) as a predictor! We can calculate the product of diam and height ourselves, and use this as a predictor in place of diam:height:
Trees$diam_x_height <- Trees$diam * Trees$height  # calculating the product
coef( lm(wood ~ diam + height + diam:height, Trees) )  # written as interaction
## (Intercept)        diam      height diam:height
##  -9.7985935   4.9966979  -0.4913513   0.8098758
coef( lm(wood ~ diam + height + diam_x_height, Trees) ) # with actual product
##   (Intercept)          diam        height diam_x_height
##    -9.7985935     4.9966979    -0.4913513     0.8098758
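To see what the interaction coefficient means, it may help to write the fitted model out. This is a sketch using the (rounded) coefficients printed above; the symbols \(\hat{w}\), \(d\) and \(h\) are introduced here purely for illustration and stand for predicted wood, diam and height:

\[\hat{w} = -9.80 + 5.00\,d - 0.49\,h + 0.81\,d\,h\]

\[\frac{\partial \hat{w}}{\partial h} = -0.49 + 0.81\,d\]

So the slope of wood on height is not a single number: it increases by about 0.81 for every unit increase in diam. By the same argument, the slope on diam is \(5.00 + 0.81\,h\).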
In the Anova() tables, the SSQ, \(F\) and \(P\) for diam:height and diam_x_height also come out the same (try this yourself: data Trees.csv). However, the values for height and diam come out different (despite the coefficients being the same!). Can you guess why? (Hint: diam_x_height is treated as a third main effect.)
Our model is still not quite right, because what we have modelled is
\[V \sim r\times h\]
whereas we know that
\[V \sim r^2\times h\]
so the amount of wood depends on the square of diam!
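Substituting \(r = d/2\) into the cone formula gives \(V = \frac{\pi}{12} d^2 h\), which suggests \(d^2 \times h\) as a single predictor. The following is only a rough sketch of that idea on hypothetical simulated data (not the Trees.csv data): the ranges of diam and height, the factor 500 and the noise level are all assumptions made for illustration.

```r
# Hypothetical simulated cone-shaped trees (NOT the Trees.csv data)
set.seed(1)
n      <- 100
height <- runif(n, 5, 30)       # assumed range of heights
diam   <- runif(n, 0.1, 0.6)    # assumed range of diameters

# Cone volume: V = (1/3) * pi * r^2 * h with r = d/2, i.e. (pi/12) * d^2 * h
volume <- pi / 12 * diam^2 * height
wood   <- 500 * volume + rnorm(n, sd = 5)   # assumed scaling factor and noise

# Model with a plain interaction: captures d * h, but not d^2 * h
fit_rh  <- lm(wood ~ diam + height + diam:height)

# Model with the predictor suggested by the cone formula
fit_r2h <- lm(wood ~ I(diam^2 * height))

r2_rh  <- summary(fit_rh)$r.squared
r2_r2h <- summary(fit_r2h)$r.squared
slope  <- unname(coef(fit_r2h)[2])   # estimate of the factor 500 * pi / 12

print(c(interaction = r2_rh, cone = r2_r2h))
print(slope)
```

In this simulation the single cone-based predictor fits better than the three-term interaction model, because it matches how the data were generated. Real trees need not behave so neatly.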
Later, we’ll learn how to properly model nonlinear effects. For now, I want to finish on a related point:
Two continuous EVs, \(X_1\) and \(X_2\).
\[Y \sim X_1 + (X_2)^2 + \epsilon\]
We have simulated data from this model: NonlinearSim.csv.
U-shaped relationship between \(Y\) and \(X_2\)
Anova( lm(y ~ x1 + x2 + x1:x2, Sim) )
## Anova Table (Type II tests)
##
## Response: y
##           Sum Sq  Df F value    Pr(>F)
## x1         85.73   1  39.102 2.469e-09
## x2         35.44   1  16.163 8.286e-05
## x1:x2     859.93   1 392.214 < 2.2e-16
## Residuals 429.73 196
If we assume in our model that the effects of \(X_1\) and \(X_2\) are linear, we get an apparently strong interaction effect \(X_1 \times X_2\), which we know isn’t there (because we have simulated the data!)
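One way such a spurious interaction can arise is when \(X_1\) and \(X_2\) are correlated. This is an assumption about how the data were generated, not something stated here. Suppose, for illustration, that \(X_1 = a X_2 + \delta\), where \(a\) is a constant and \(\delta\) is noise independent of \(X_2\) (both symbols are hypothetical). Then

\[X_1 X_2 = a X_2^2 + \delta X_2\]

so the product term carries a scaled copy of \(X_2^2\). A model with no quadratic term can then use the interaction to absorb the curvature in \(X_2\).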
Anova( lm(y ~ x1 + x2 + I(x2^2) + x1:x2, Sim) ) # include x2^2
## Anova Table (Type II tests)
##
## Response: y
##           Sum Sq  Df   F value Pr(>F)
## x1        101.95   1 2354.5327 <2e-16
## x2          0.01   1    0.3081 0.5795
## I(x2^2)   421.29   1 9729.5924 <2e-16
## x1:x2       0.02   1    0.4599 0.4985
## Residuals   8.44 195
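If we allow for a quadratic effect of \(X_2\) in the model, the apparent interaction disappears (\(P = 0.50\) for x1:x2), and we are left with a linear effect of \(X_1\) and a quadratic effect of \(X_2\), which we know to be true (that’s what we simulated)!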
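The same phenomenon can be reproduced without the data file. The sketch below simulates its own data rather than reading NonlinearSim.csv, so the sample size, coefficients and noise level are assumptions. In particular, it assumes that x1 and x2 are correlated, which is what lets the product x1:x2 stand in for the missing x2^2 term. It uses base R's summary() instead of Anova() to avoid extra packages.

```r
# Hypothetical simulation (NOT the NonlinearSim.csv data)
set.seed(42)
n  <- 200
x2 <- rnorm(n)
x1 <- 0.7 * x2 + rnorm(n, sd = 0.7)     # assumption: x1 is correlated with x2
y  <- x1 + x2^2 + rnorm(n, sd = 0.2)    # true model: linear in x1, quadratic in x2

# Model 1: linear effects plus interaction (no quadratic term)
fit_linear <- lm(y ~ x1 + x2 + x1:x2)

# Model 2: the same, but allowing for a quadratic effect of x2
fit_quad   <- lm(y ~ x1 + x2 + I(x2^2) + x1:x2)

# P-value and estimate for the interaction term in each model
coefs_linear <- summary(fit_linear)$coefficients
coefs_quad   <- summary(fit_quad)$coefficients
p_linear <- coefs_linear["x1:x2", "Pr(>|t|)"]
p_quad   <- coefs_quad["x1:x2", "Pr(>|t|)"]

print(c(p_linear = p_linear, p_quad = p_quad))
print(round(coefs_quad[, "Estimate"], 3))
```

In the first model the interaction is highly significant even though none was simulated. In the second, the estimate for x1:x2 should be close to zero and the coefficient of I(x2^2) close to 1.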