Column 1: School ID
Column 2: Pupil ID
Column 3: Verbal IQ
Column 4: Language achievement score
Column 5: A family’s Socio-Economic Status
Column 6: The number of 8th graders in the classroom
Column 7: Verbal IQ centered
Column 8: School mean centered IQ
Column 9: Group size centered
Column 10: SES centered
# load package
pacman::p_load(mlmRev, tidyverse, lme4, nlme)
# input data
dta1 <- read.table("iq_language.txt", header = T)
#
head(dta1)
## School Pupil IQ Language Group_size IQ_c School_mean Group_mean
## 1 1 17001 15.0 46 29 3.1659379 -1.51406 5.9
## 2 1 17002 14.5 45 29 2.6659379 -1.51406 5.9
## 3 1 17003 9.5 33 29 -2.3340621 -1.51406 5.9
## 4 1 17004 11.0 46 29 -0.8340621 -1.51406 5.9
## 5 1 17005 8.0 20 29 -3.8340621 -1.51406 5.9
## 6 1 17006 9.5 30 29 -2.3340621 -1.51406 5.9
## SES_c
## 1 -4.811981
## 2 -17.811981
## 3 -12.811981
## 4 -4.811981
## 5 -17.811981
## 6 -17.811981
One level: School(randoom effect)
summary(m0 <- lmer(Language ~ (1|School), data = dta1))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Language ~ (1 | School)
## Data: dta1
##
## REML criterion at convergence: 16253.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.11618 -0.65703 0.07597 0.74128 2.50639
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 19.63 4.431
## Residual 64.56 8.035
## Number of obs: 2287, groups: School, 131
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 40.3624 0.4282 94.26
The estimated overall language score is 40.36.
The estimated between school variance in language score is 19.63.
The variation between student within the same school is estimated to be 64.56.
The between school variance is about 19.63 / (19.63 + 64.56) = 0.2331631 (23%)of the total variance. This is the intraclass correlation.
One level with one predictor
summary(m1 <- lmer(Language ~ IQ_c + (1|School), data = dta1))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Language ~ IQ_c + (1 | School)
## Data: dta1
##
## REML criterion at convergence: 15255.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.0939 -0.6375 0.0579 0.7061 3.1448
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 9.602 3.099
## Residual 42.245 6.500
## Number of obs: 2287, groups: School, 131
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 40.60823 0.30819 131.8
## IQ_c 2.48759 0.07008 35.5
##
## Correlation of Fixed Effects:
## (Intr)
## IQ_c 0.018
9.4968/(9.4968 + 42.2272) = .18
9.602/(9.602 + 42.245) = 0.1851988
The between-pupil differences in language scores are partially explained by IQ score.
sjPlot::tab_model(m0, m1, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)
Language | Language | |||
---|---|---|---|---|
Predictors | Estimates | std. Error | Estimates | std. Error |
(Intercept) | 40.36 | 0.43 | 40.61 | 0.31 |
IQ_c | 2.49 | 0.07 | ||
Random Effects | ||||
σ2 | 64.56 | 42.24 | ||
τ00 | 19.63 School | 9.60 School | ||
ICC | 0.23 | 0.19 |
# load them
pacman::p_load(foreign, tidyverse, lme4, merTools)
# file location
fLoc <- "eg_hlm.sav"
# read spss format and covert it to data frame
dta2 <- read.spss(fLoc) %>% as.data.frame
# check data structure
str(dta2)
## 'data.frame': 7230 obs. of 12 variables:
## $ size : num 380 380 380 380 380 380 380 380 380 380 ...
## $ lowinc : num 40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 ...
## $ mobility: num 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 ...
## $ female : num 0 0 0 0 0 0 0 0 0 0 ...
## $ black : num 0 0 0 0 0 0 0 0 0 0 ...
## $ hispanic: num 1 1 1 0 0 0 0 0 1 1 ...
## $ year : num 0.5 1.5 2.5 -1.5 -0.5 0.5 1.5 2.5 -1.5 -0.5 ...
## $ grade : num 2 3 4 0 1 2 3 4 0 1 ...
## $ math : num 1.146 1.134 2.3 -1.303 0.439 ...
## $ retained: num 0 0 0 0 0 0 0 0 0 0 ...
## $ school : Factor w/ 60 levels " 2020",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ cid : Factor w/ 1721 levels " 101480302",..: 244 244 244 248 248 248 248 248 253 253 ...
# set black-and-white theme
theme_set(theme_bw())
ggplot(dta2, aes(year, math, group = cid)) +
geom_point(alpha = .1) +
geom_line(alpha = .1) +
stat_smooth(aes(group = school), method = "lm", se = F, lwd = .6) +
stat_smooth(aes(group = 1), method = "lm", se = F, col = "magenta") +
labs(x = "Year of age (centered)", y = "Math score")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
# nested specification with random intercepts only
# fixed-effect for slope (year)
# same as
# lmer(math ~ year + (1 | school) +( 1 | school:cid) , data = dta))
summary(m1 <- lmer(math ~ year + (1 | school/cid), data = dta2))
## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ year + (1 | school/cid)
## Data: dta2
##
## REML criterion at convergence: 16759.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8093 -0.5831 -0.0270 0.5566 6.0867
##
## Random effects:
## Groups Name Variance Std.Dev.
## cid:school (Intercept) 0.6699 0.8185
## school (Intercept) 0.1869 0.4324
## Residual 0.3470 0.5891
## Number of obs: 7230, groups: cid:school, 1721; school, 60
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.780482 0.061088 -12.78
## year 0.746123 0.005396 138.26
##
## Correlation of Fixed Effects:
## (Intr)
## year -0.031
# random slopes (year)
summary(m2 <- lmer(math ~ ( year | school/cid), data = dta2))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00321233 (tol = 0.002, component 1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ (year | school/cid)
## Data: dta2
##
## REML criterion at convergence: 16553.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2231 -0.5611 -0.0308 0.5312 5.1579
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## cid:school (Intercept) 0.64127 0.8008
## year 0.01132 0.1064 0.55
## school (Intercept) 0.92436 0.9614
## year 0.60793 0.7797 0.92
## Residual 0.30115 0.5488
## Number of obs: 7230, groups: cid:school, 1721; school, 60
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -1.64424 0.05462 -30.11
## convergence code: 0
## Model failed to converge with max|grad| = 0.00321233 (tol = 0.002, component 1)
# compare models
anova(m1, m2)
## refitting model(s) with ML (instead of REML)
## Data: dta2
## Models:
## m1: math ~ year + (1 | school/cid)
## m2: math ~ (year | school/cid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## m1 5 16757 16792 -8373.5 16747
## m2 8 16566 16621 -8275.0 16550 197.1 3 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sjPlot::tab_model(m1, m2, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)
math | math | |||
---|---|---|---|---|
Predictors | Estimates | std. Error | Estimates | std. Error |
(Intercept) | -0.78 | 0.06 | -1.64 | 0.05 |
year | 0.75 | 0.01 | ||
Random Effects | ||||
σ2 | 0.35 | 0.30 | ||
τ00 | 0.67 cid:school | 0.64 cid:school | ||
0.19 school | 0.92 school | |||
τ11 | 0.01 cid:school.year | |||
0.61 school.year | ||||
ρ01 | 0.55 cid:school | |||
0.92 school | ||||
ICC | 0.71 | 0.84 |
# merTools
# estimate random-effects parameters by simulation
m2_re <- REsim(m2)
# plot for random effects
plotREsim(m2_re)
# residual plots of m1
plot(m1, ylim = c(-4, 4),
xlab = "Fitted values", ylab = "Pearson residuals", cex =.5)
# residual plots of m2
plot(m2, ylim = c(-4, 4),
xlab = "Fitted values", ylab = "Pearson residuals", cex =.5)
Column 1: Score
Column 2: School ID
Column 3: Student ID
# load package
pacman::p_load(mlmRev, tidyverse, lme4, nlme)
# input data
dta3_1 <- read.table("demo1.txt", header = T)
dta3_2 <- read.table("demo2.txt", header = T)
# grand mean
summary(dta3_1$Score)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.00 51.00 55.00 62.11 93.00 97.00
summary(dta3_2$Score)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.00 51.00 53.00 57.44 91.00 97.00
dta3_1 <- dta3_1 %>%
group_by(School) %>%
mutate( ave_Score = mean(Score),
c_Score = ave_Score - Score)
dta3_2 <- dta3_2 %>%
group_by(School) %>%
mutate( ave_Score = mean(Score),
c_Score = ave_Score - Score)
ggplot(data = dta3_1, aes(School, Score)) +
geom_point(data = dta3_1, aes(School, Score), colour = 'skyblue', size = 1) +
geom_point(data = dta3_2, aes(School, Score), colour = 'coral', size = 1) +
geom_hline(yintercept = mean(dta3_1$Score), colour = 'skyblue') +
geom_hline(yintercept = mean(dta3_2$Score), colour = 'coral')
## LMER
summary(m0_1 <- lmer(Score ~ (1|School), data = dta3_1))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Score ~ (1 | School)
## Data: dta3_1
##
## REML criterion at convergence: 51.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.3280 -0.4745 0.0000 0.4608 1.3553
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 1679 40.976
## Residual 5 2.236
## Number of obs: 9, groups: School, 3
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 53.01 23.67 2.24
summary(m0_2 <- lmer(Score ~ (1|School), data = dta3_2))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Score ~ (1 | School)
## Data: dta3_2
##
## REML criterion at convergence: 80.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.36734 -0.34286 -0.02776 1.07153 1.13999
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 104.2 10.21
## Residual 967.8 31.11
## Number of obs: 9, groups: School, 3
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 56.36 12.01 4.692
sjPlot::tab_model(m0_1, m0_2, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)
Score | Score | |||
---|---|---|---|---|
Predictors | Estimates | std. Error | Estimates | std. Error |
(Intercept) | 53.01 | 23.67 | 56.36 | 12.01 |
Random Effects | ||||
σ2 | 5.00 | 967.76 | ||
τ00 | 1679.06 School | 104.17 School | ||
ICC | 1.00 | 0.10 |
plot(m0_1, ylim = c(-20, 20),
xlab = "Fitted values", ylab = "Pearson residuals", cex =.5)
plot(m0_2, ylim = c(-20, 20),
xlab = "Fitted values", ylab = "Pearson residuals", cex =.5)
ICC(outcome = "Score", group ="School", data=dta3_1)
## [1] 0.9970309
ICC(outcome = "Score", group ="School", data=dta3_2)
## [1] 0.0971812
summary(m0_1c <- lm(Score ~ School:c_Score, data = dta3_1))
##
## Call:
## lm(formula = Score ~ School:c_Score, data = dta3_1)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9
## -50.111 -50.111 -9.111 -9.111 -9.111 31.889 31.889 31.889 31.889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.111 14.406 4.311 0.00763 **
## SchoolS1:c_Score -1.000 30.560 -0.033 0.97516
## SchoolS2:c_Score -1.000 15.280 -0.065 0.95036
## SchoolS3:c_Score -1.000 9.664 -0.103 0.92161
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 43.22 on 5 degrees of freedom
## Multiple R-squared: 0.003202, Adjusted R-squared: -0.5949
## F-statistic: 0.005354 on 3 and 5 DF, p-value: 0.9994
summary(m0_2c <- lm(Score ~ School:c_Score, data = dta3_2))
##
## Call:
## lm(formula = Score ~ School:c_Score, data = dta3_2)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9
## -26.444 -26.444 -4.444 -4.444 -4.444 16.556 16.556 16.556 16.556
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 57.4444 7.5340 7.625 0.000617 ***
## SchoolS1:c_Score -1.0000 0.7991 -1.251 0.266142
## SchoolS2:c_Score -1.0000 0.4094 -2.443 0.058462 .
## SchoolS3:c_Score -1.0000 0.5131 -1.949 0.108844
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.6 on 5 degrees of freedom
## Multiple R-squared: 0.6938, Adjusted R-squared: 0.5101
## F-statistic: 3.777 on 3 and 5 DF, p-value: 0.09333
sjPlot::tab_model(m0_1c, m0_2c, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)
Score | Score | |||
---|---|---|---|---|
Predictors | Estimates | std. Error | Estimates | std. Error |
(Intercept) | 62.11 | 14.41 | 57.44 | 7.53 |
School [S1] : c_Score | -1.00 | 30.56 | -1.00 | 0.80 |
School [S2] : c_Score | -1.00 | 15.28 | -1.00 | 0.41 |
School [S3] : c_Score | -1.00 | 9.66 | -1.00 | 0.51 |