In this example, I want to show how to conduct model comparisons for nested multilevel models. Here I have a previous data set that I used in an earlier blog post: http://rpubs.com/mhanauer/305300
setwd("~/Desktop")
dat = read.csv("SkillsDataSet.csv", header = TRUE)
dat = dat[c("id", "school", "sex", "like")]
dat = as.data.frame(apply(dat, 2, function(x){ifelse(x == "f", 1, ifelse(x == "m", 0, ifelse(x == 19.1, NA, x)))}))
dat = as.data.frame(na.omit(dat))
dat = write.csv(dat, "dat.csv",row.names = FALSE)
dat = read.csv("dat.csv", header = TRUE)
dat = as.data.frame(dat[-c(1:3),])
dat$id = 1:length(dat$like)
dat$school = rep(1:30, each = 46)
head(dat)
## id school sex like
## 4 1 1 1 2
## 5 2 1 1 5
## 6 3 1 1 6
## 7 4 1 1 3
## 8 5 1 1 7
## 9 6 1 0 6
I am going to create two models. First a null or empty model that contains the average or intercept values for the dependent variable like. Then I add the covariate sex to the example, and then have sex have a random slope. When running model comparisons, we need to use maximum likelihood and not the default restricted maximum likelihood (REML), because the degrees of freedom are calculated differently for REML for models with different numbers of parameters. Finally, I will compare all three models using the anova function in the car package.
library(nlme)
library(car)
model1 = lme(fixed = like ~ 1, random = ~1 | school, data = dat, method = "ML")
model2 = lme(fixed = like ~ sex, random = ~1 | school, data = dat, method = "ML")
model3 = lme(fixed = like ~ sex, random = ~sex | school, data = dat, method = "ML")
anova(model1, model2, model3)
## Model df AIC BIC logLik Test L.Ratio p-value
## model1 1 3 5583.230 5598.919 -2788.615
## model2 2 4 5582.317 5603.236 -2787.158 1 vs 2 2.913034 0.0879
## model3 3 6 5581.939 5613.318 -2784.970 2 vs 3 4.377635 0.1120
Overall, there does not appear to be a better fit by adding an additional covariate or a random slopes component for that covariate.