Phillip M. Alday
22 November 2017 (Nijmegen)
And if it's not hard, then it's probably not worth a whole presentation. So I probably will say something infelicitous at some point.
If you catch it (either now or later), do let me know.
I'm the one-eyed guy with bad cataracts in the land of the blind. I trip a little bit less often, but I still trip.
(the alpha error in the German tradition)
(the beta error in the German tradition)
There's a ton!
And of course see Andrew Gelman's description of our increasing awareness of the role of power in the replication crisis (search for 'timeline')
\[ Y = \beta_{0} + \beta_{1}X_1 + \beta_{2}X_2 + \ldots + \beta_{p}X_p + \varepsilon \] \[ \varepsilon \sim N(0,\sigma) \]
(… lsmeans, but there are potential multiple-comparisons issues)
Additive effect:
\[ \beta_{1}X_1 + \beta_{2}X_2 \]
Letting the coefficient of \( X_1 \) be modulated by \( X_2 \):
\[ (\beta_{1} + \beta_{int}X_2)X_1 + \beta_{2}X_2 \]
Distributive property:
\[ \beta_{1}X_1 + \beta_{int}X_{2}X_1 + \beta_{2}X_2 \]
Commutativity of (element-wise) multiplication:
\[ \beta_{1}X_1 + \beta_{int}X_{1}X_2 + \beta_{2}X_2 \]
Distributive property + commutativity of addition:
\[ \beta_{1}X_1 + (\beta_{2} + \beta_{int}X_{1})X_2 \]
Note that this gives a nice interpretation of quadratic and other polynomial effects – a model term is interacting with itself, thus providing a bit of a feedback cycle and changing its own slope.
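A minimal sketch of this point in R with the mtcars data (the particular quadratic model is purely illustrative):
# a quadratic term is a predictor interacting with itself
summary(lm(mpg ~ disp + I(disp^2), data = mtcars))
# literally the same model, written as an explicit self-interaction
summary(lm(mpg ~ disp + I(disp * disp), data = mtcars))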
summary(lm(mpg ~ disp * cyl, data = mtcars))
Call:
lm(formula = mpg ~ disp * cyl, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.0809 -1.6054 -0.2948 1.0546 5.7981
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.037212 5.004636 9.798 1.51e-10 ***
disp -0.145526 0.040002 -3.638 0.001099 **
cyl -3.405244 0.840189 -4.053 0.000365 ***
disp:cyl 0.015854 0.004948 3.204 0.003369 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.66 on 28 degrees of freedom
Multiple R-squared: 0.8241, Adjusted R-squared: 0.8052
F-statistic: 43.72 on 3 and 28 DF, p-value: 1.078e-10
summary(lm(mpg ~ disp + cyl + disp:cyl, data = mtcars))
Call:
lm(formula = mpg ~ disp + cyl + disp:cyl, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.0809 -1.6054 -0.2948 1.0546 5.7981
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.037212 5.004636 9.798 1.51e-10 ***
disp -0.145526 0.040002 -3.638 0.001099 **
cyl -3.405244 0.840189 -4.053 0.000365 ***
disp:cyl 0.015854 0.004948 3.204 0.003369 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.66 on 28 degrees of freedom
Multiple R-squared: 0.8241, Adjusted R-squared: 0.8052
F-statistic: 43.72 on 3 and 28 DF, p-value: 1.078e-10
summary(lm(mpg ~ disp + cyl + I(disp*cyl), data = mtcars))
Call:
lm(formula = mpg ~ disp + cyl + I(disp * cyl), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.0809 -1.6054 -0.2948 1.0546 5.7981
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.037212 5.004636 9.798 1.51e-10 ***
disp -0.145526 0.040002 -3.638 0.001099 **
cyl -3.405244 0.840189 -4.053 0.000365 ***
I(disp * cyl) 0.015854 0.004948 3.204 0.003369 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.66 on 28 degrees of freedom
Multiple R-squared: 0.8241, Adjusted R-squared: 0.8052
F-statistic: 43.72 on 3 and 28 DF, p-value: 1.078e-10
head(model.matrix(~ disp * cyl, data = mtcars))
(Intercept) disp cyl disp:cyl
Mazda RX4 1 160 6 960
Mazda RX4 Wag 1 160 6 960
Datsun 710 1 108 4 432
Hornet 4 Drive 1 258 6 1548
Hornet Sportabout 1 360 8 2880
Valiant 1 225 6 1350
Note the format: the disp:cyl column is just the element-wise product of the disp and cyl columns.
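A quick check of that in R (mm is just an arbitrary name for illustration):
mm <- model.matrix(~ disp * cyl, data = mtcars)
# the disp:cyl column is the element-wise product of its parent columns
all.equal(mm[, "disp:cyl"], mm[, "disp"] * mm[, "cyl"])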
So when we write the GLM as a matrix operation, we write \[ Y = X \beta + \varepsilon \]
where \( X \) is the model matrix as above, \( \beta \) is a column vector of coefficients/effects (matrices are multiplied row by column), and the result is a column vector of predictions for each observation. The error \( \varepsilon \) is a column vector whose entries are sampled from \( N(0,\sigma) \), where \( \sigma \) is the residual standard deviation.
This is equivalent to the column-vector representation we used earlier; it is easier to manipulate for computations, but harder to use when discussing individual terms.
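A minimal sketch of that identity, reusing the mtcars model from above (fit and X are arbitrary names):
fit <- lm(mpg ~ disp * cyl, data = mtcars)
X <- model.matrix(~ disp * cyl, data = mtcars)
# X %*% beta reproduces the fitted values, i.e. the predictions without the
# residual error term
all.equal(as.vector(X %*% coef(fit)), unname(fitted(fit)))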
What effect does log-transforming variables have on the interpretation of additive and multiplicative effects?
\[ \log(ab) = \log(a) + \log(b) \]
So additive effects on the log scale are multiplicative effects on the original scale!
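A tiny numeric illustration (the numbers are arbitrary):
# additive on the log scale ...
log(3) + log(7)
# ... equals the log of the product on the original scale
log(3 * 7)
# so in a model of log(Y), adding beta to the linear predictor corresponds to
# multiplying Y by exp(beta) on the original scale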
Extending \[ Y = \beta_{0} + \beta_{1}X_1 + \beta_{2}X_2 + \ldots + \beta_{p}X_p + \varepsilon \] \[ \varepsilon \sim N(0,\sigma) \]
All such that \[ S_i \stackrel{iid}{\sim} N(0,\sigma_{S_i}) \] \[ I_i \stackrel{iid}{\sim} N(0,\sigma_{I_i}) \]
The grouping variable behind the vertical bar is used so that each member of that group gets its own offset for each of the fixed effects listed before the vertical bar. The offsets for the observed groups can be predicted (BLUPs/conditional modes), but the model itself only estimates the variance (or equivalently, the standard deviation) of those offsets for each effect, under the assumption that the offsets are drawn from a normal distribution with that variance and a mean of 0. (Remember: since these are offsets from the population-level fixed effect, a mean of 0 is equivalent to saying that the group has the fixed-effect estimate as its mean.)
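A minimal sketch of that syntax with lme4's built-in sleepstudy data (the model is purely illustrative):
library(lme4)
# each Subject gets its own offset for the intercept and for the Days slope
fit <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
# the model estimates only the variances/SDs (and correlation) of those offsets ...
VarCorr(fit)
# ... but the offsets themselves can be predicted afterwards
# (BLUPs / conditional modes)
ranef(fit)$Subject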