The data set is concerned with grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. The number of pupils is 2287, and the number of schools is 131. Class sizes are from 4 to 35.

The question of interest is how the language test score depends on the pupil’s intelligence and his family’s socio-economic status, and on some school variables.

Source: Snijders, T. & Bosker, R. (2002). Multilevel Analysis.

Column 1: School ID Column 2: Pupil ID Column 3: Verbal IQ Column 4: Language achievement score Column 5: A family’s Socio-Economic Status Column 6: The number of 8th graders in the classroom Column 7: Verbal IQ centered Column 8: School mean centered IQ Column 9: Group size centered Column 10: SES centered

1 load packages

pacman::p_load(mlmRev, tidyverse, lme4, nlme)

2 input data

dta_IQ<-read.table("C:/Users/Ching-Fang Wu/Documents/data/iq_language.txt", h=T)

3 show first 7 lines

head(dta_IQ,7)
##   School Pupil   IQ Language Group_size       IQ_c School_mean Group_mean
## 1      1 17001 15.0       46         29  3.1659379    -1.51406        5.9
## 2      1 17002 14.5       45         29  2.6659379    -1.51406        5.9
## 3      1 17003  9.5       33         29 -2.3340621    -1.51406        5.9
## 4      1 17004 11.0       46         29 -0.8340621    -1.51406        5.9
## 5      1 17005  8.0       20         29 -3.8340621    -1.51406        5.9
## 6      1 17006  9.5       30         29 -2.3340621    -1.51406        5.9
## 7      1 17007  9.5       30         29 -2.3340621    -1.51406        5.9
##        SES_c
## 1  -4.811981
## 2 -17.811981
## 3 -12.811981
## 4  -4.811981
## 5 -17.811981
## 6 -17.811981
## 7  -4.811981

4 show data structure

str(dta_IQ)
## 'data.frame':    2287 obs. of  9 variables:
##  $ School     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Pupil      : int  17001 17002 17003 17004 17005 17006 17007 17008 17009 17010 ...
##  $ IQ         : num  15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
##  $ Language   : int  46 45 33 46 20 30 30 57 36 36 ...
##  $ Group_size : int  29 29 29 29 29 29 29 29 29 29 ...
##  $ IQ_c       : num  3.166 2.666 -2.334 -0.834 -3.834 ...
##  $ School_mean: num  -1.51 -1.51 -1.51 -1.51 -1.51 ...
##  $ Group_mean : num  5.9 5.9 5.9 5.9 5.9 5.9 5.9 5.9 5.9 5.9 ...
##  $ SES_c      : num  -4.81 -17.81 -12.81 -4.81 -17.81 ...

5 data management

dta_IQ[,1:2] <- apply(dta_IQ[,1:2], 2, as.factor) #將學校ID和學生ID資料屬性轉換為factor

6 baseline model

summary(m0 <- lmer(Language ~ (1|School), data = dta_IQ))#(1 | School)是random intercept,意思是假設每個學校的截距不同,其中"1"代表截距項。
## Linear mixed model fit by REML ['lmerMod']
## Formula: Language ~ (1 | School)
##    Data: dta_IQ
## 
## REML criterion at convergence: 16253.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.11618 -0.65703  0.07597  0.74128  2.50639 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept) 19.63    4.431   
##  Residual             64.56    8.035   
## Number of obs: 2287, groups:  School, 131
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  40.3624     0.4282   94.26

7 random intercept; fixed-effect is IQ_c

summary(m1 <- lmer(Language ~ IQ_c + (1|School), data = dta_IQ))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Language ~ IQ_c + (1 | School)
##    Data: dta_IQ
## 
## REML criterion at convergence: 15255.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.0939 -0.6375  0.0579  0.7061  3.1448 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept)  9.602   3.099   
##  Residual             42.245   6.500   
## Number of obs: 2287, groups:  School, 131
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 40.60823    0.30819   131.8
## IQ_c         2.48759    0.07008    35.5
## 
## Correlation of Fixed Effects:
##      (Intr)
## IQ_c 0.018

9.602/(9.602+42.245) =0.185

The between-pupil differences in language scores are partially explained by IQ score.

8

sjPlot::tab_model(m0, m1, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)
  Language Language
Predictors Estimates std. Error Estimates std. Error
(Intercept) 40.36 0.43 40.61 0.31
IQ_c 2.49 0.07
Random Effects
σ2 64.56 42.24
τ00 19.63 School 9.60 School
ICC 0.23 0.19

1.The estimated mean of Language scores for all children is 40.61 with an SD of √(42.24+9.6)

2.For one point increase in IQ_c(School mean centered IQ), Language score increase by 2.49 point, on average.

3.Adjusting for IQ_c (School mean centered IQ), correlation between Language scores among children attending the same school is 0.1.

4.About 34.57%〔=(64.56-42.24)/64.56〕 of variance in gain scores can be attributed to differences in IQ_c(School mean centered IQ) children attending different schools.

5.About 38.42% (=〔(64.56+19.63)-(42.24+9.6)〕/(64.56+19.63)) of variance in gain scores can be attributed to differences in IQ_c(School mean centered IQ) among pupils attending the same school.

9 compare models

anova(m0, m1)
## refitting model(s) with ML (instead of REML)
## Data: dta_IQ
## Models:
## m0: Language ~ (1 | School)
## m1: Language ~ IQ_c + (1 | School)
##    npar   AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## m0    3 16259 16276 -8126.6    16253                         
## m1    4 15260 15283 -7625.9    15252 1001.4  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#The End