This is part 2 of 2 tutorials on mixed effects models. Part 1 is entitled Multiple Regression Model (Part 1 of 2 Mixed Effect Model).
## Loading required package: Matrix
#politeness=read.csv("http://www.bodowinter.com/tutorial/politeness_data.csv")
politeness=read.csv("politeness_data.csv")
#which(!complete.cases(politeness))
which(is.na(politeness$frequency))
## [1] 39
There is s a null value in row 39. However, our model is robust to such disturbances.
boxplot(frequency ~ attitude*gender,col=c("white","lightgray"),politeness)
In this case, there is a fixed effect of only attitude, and two random effects (subject & scenario). This function requires a random effect, adding just fixed effects will generate an error.
politeness.model = lmer(frequency ~ attitude + (1|subject) + (1|scenario), data=politeness)
summary(politeness.model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: frequency ~ attitude + (1 | subject) + (1 | scenario)
## Data: politeness
##
## REML criterion at convergence: 793.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2006 -0.5817 -0.0639 0.5625 3.4385
##
## Random effects:
## Groups Name Variance Std.Dev.
## scenario (Intercept) 219 14.80
## subject (Intercept) 4015 63.36
## Residual 646 25.42
## Number of obs: 83, groups: scenario, 7; subject, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 202.588 26.754 7.572
## attitudepol -19.695 5.585 -3.527
##
## Correlation of Fixed Effects:
## (Intr)
## attitudepol -0.103
Here we have added back in gender as a fixed effect, improving the model. Directly quoting Winter describing a classic example:
“As we didn’t inform our model that there’s two sexes in our dataset, the intercept is particularly off, in between the voice pitch of males and females. This is just like the classic example of a farm with a dozen hens and a dozen cows … where the mean legs of all farm animals considered together is three, not a particularly informative representation of what’s going on at the farm.”
politeness.model = lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario), data=politeness)
summary(politeness.model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: frequency ~ attitude + gender + (1 | subject) + (1 | scenario)
## Data: politeness
##
## REML criterion at convergence: 775.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2591 -0.6236 -0.0772 0.5388 3.4795
##
## Random effects:
## Groups Name Variance Std.Dev.
## scenario (Intercept) 219.5 14.81
## subject (Intercept) 615.6 24.81
## Residual 645.9 25.41
## Number of obs: 83, groups: scenario, 7; subject, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 256.846 16.116 15.938
## attitudepol -19.721 5.584 -3.532
## genderM -108.516 21.013 -5.164
##
## Correlation of Fixed Effects:
## (Intr) atttdp
## attitudepol -0.173
## genderM -0.652 0.004
The variation that’s associated with the random effect “subject” dropped considerably (compared to earlier model) because it is not confounded with gender. For the “fixed effects” attitudepol remains the same, but now we see “genderM” reduces pitch by about 108 Hz.
Likelihood is the probability of seeing the data you collected given your model. Then for the ratio test, you compare the likelihood of two models with each other.
For a trivial example, we compare two models of hiking speed \[mdl1 = hiking\_speed \sim gallon\_of\_ water + flashlight\] \[mdl2 = hiking\_speed \sim flashlight\] For the politeness model in R you construct the null model:
politeness.null = lmer(frequency ~ gender + (1|subject) + (1|scenario), data=politeness, REML=FALSE)
When comparing models you must add the argument REML=FALSE. The details are not too important to the casual user, but simply it changes internal computations (like the likelihood estimator) which is necessary to accurately compare ratio tests see Pinheiro, J.C., & Bates, D.M. (2000). Mixed-Effects Models in S and SPLUS. New York: Springer.
then re-do the full model, again with REML=FALSE:
politeness.model = lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario), data=politeness, REML=FALSE)
now we can compare the two models using the anova() function:
politeness.null = lmer(frequency ~ gender + (1|subject) + (1|scenario), data=politeness, REML=FALSE)
politeness.model = lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario), data=politeness, REML=FALSE)
anova(politeness.null,politeness.model)
## Data: politeness
## Models:
## politeness.null: frequency ~ gender + (1 | subject) + (1 | scenario)
## politeness.model: frequency ~ attitude + gender + (1 | subject) + (1 | scenario)
## Df AIC BIC logLik deviance Chisq Chi Df
## politeness.null 5 816.72 828.81 -403.36 806.72
## politeness.model 6 807.10 821.61 -397.55 795.10 11.618 1
## Pr(>Chisq)
## politeness.null
## politeness.model 0.0006532 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
see the two models you are comparing and then find Chi-Squared value, degrees of freedom, and the p-value. This model would be reported as such:
“… politeness affected pitch (\(\chi^2 (1)\)=11.62, p=0.00065), lowering it by about 19.7 Hz \(\pm\) 5.6 (standard error) …”
Let’s look at the coefficients of the model by subject and by item:
coef(politeness.model)
## $scenario
## (Intercept) attitudepol genderM
## 1 243.4859 -19.72207 -108.5173
## 2 263.3592 -19.72207 -108.5173
## 3 268.1322 -19.72207 -108.5173
## 4 277.2546 -19.72207 -108.5173
## 5 254.9319 -19.72207 -108.5173
## 6 244.8015 -19.72207 -108.5173
## 7 245.9618 -19.72207 -108.5173
##
## $subject
## (Intercept) attitudepol genderM
## F1 243.3684 -19.72207 -108.5173
## F2 266.9443 -19.72207 -108.5173
## F3 260.2276 -19.72207 -108.5173
## M3 284.3536 -19.72207 -108.5173
## M4 262.0575 -19.72207 -108.5173
## M7 224.1292 -19.72207 -108.5173
##
## attr(,"class")
## [1] "coef.mer"
This model has a random intercept, but the slope is constant. So what happens if some people are more polite, or some subjects affect people differently? What we need then is a random slope model.
politeness.model = lmer(frequency ~ attitude + gender + (1+attitude|subject) + (1+attitude|scenario), data=politeness, REML=FALSE)
coef(politeness.model)
## $scenario
## (Intercept) attitudepol genderM
## 1 245.2603 -20.43832 -110.8021
## 2 263.3012 -15.94385 -110.8021
## 3 269.1432 -20.63361 -110.8021
## 4 276.8309 -16.30132 -110.8021
## 5 256.0579 -19.40575 -110.8021
## 6 246.8605 -21.94816 -110.8021
## 7 248.4702 -23.55752 -110.8021
##
## $subject
## (Intercept) attitudepol genderM
## F1 243.8053 -20.68245 -110.8021
## F2 266.7321 -19.17028 -110.8021
## F3 260.1484 -19.60452 -110.8021
## M3 285.6958 -17.91950 -110.8021
## M4 264.1982 -19.33741 -110.8021
## M7 227.3551 -21.76744 -110.8021
##
## attr(,"class")
## [1] "coef.mer"
Now we have a random slope model by scenario and subject (but not by gender). In general, you should favor random slopes over random intercept models. Why? Random intercept models tend to be “anti-conservative”, or in other words have a high Type I error rate (false positive).
All the same assumptions hold for Mixed Effects Models as they do for Multiple Regression Models. The only addition is that one of the main reasons you do a Mixed Effects Model is because you need to resolve non-independencies in your data. As long as you are not missing important interdependencies, the mixed effects model is more robust against violations of the independence assumption.
The function used on the Multiple Regression Model dfbeta() will not work for a mixed effect model. Please see the package influence.ME for similar functionality.
Example adapted from Bodo Winter of the University of California:
Bates, D.M., Maechler, M., & Bolker, B. (2012). lme4: Linear mixed-effects models using S4 classes. R package version 0.999999-0
Nieuwenhuis, R., te Grotenhuis, M., & Pelzer, B. (2012). Influence.ME: Tools for Detecting Influential Data in Mixed Effects Models. R Journal, 4(2): pp. 38-47.
Pinheiro, J.C., & Bates, D.M. (2000). Mixed-Effects Models in S and SPLUS. New York: Springer.
R Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Winter, B. (2013). Linear models and linear mixed effects models in R with linguistic applications. arXiv:1308.5499. [http://arxiv.org/pdf/1308.5499.pdf]