The data set is concerned with grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. The number of pupils is 2287, and the number of schools is 131. Class sizes are from 4 to 35.

The question of interest is how the language test score depends on the pupil’s intelligence and his family’s socio-economic status, and on some school variables.

Source: Snijders, T. & Bosker, R. (2002). Multilevel Analysis.

Column 1: School ID Column 2: Pupil ID Column 3: Verbal IQ Column 4: Language achievement score Column 5: A family’s Socio-Economic Status Column 6: The number of 8th graders in the classroom Column 7: Verbal IQ centered Column 8: School mean centered IQ Column 9: Group size centered Column 10: SES centered

1 load packages

pacman::p_load(mlmRev, tidyverse, lme4, nlme)

2 input data

dta_IQ<-read.table("C:/Users/Ching-Fang Wu/Documents/data/iq_language.txt", h=T)

3 show first 7 lines

head(dta_IQ,7)

##   School Pupil   IQ Language Group_size       IQ_c School_mean Group_mean
## 1      1 17001 15.0       46         29  3.1659379    -1.51406        5.9
## 2      1 17002 14.5       45         29  2.6659379    -1.51406        5.9
## 3      1 17003  9.5       33         29 -2.3340621    -1.51406        5.9
## 4      1 17004 11.0       46         29 -0.8340621    -1.51406        5.9
## 5      1 17005  8.0       20         29 -3.8340621    -1.51406        5.9
## 6      1 17006  9.5       30         29 -2.3340621    -1.51406        5.9
## 7      1 17007  9.5       30         29 -2.3340621    -1.51406        5.9
##        SES_c
## 1  -4.811981
## 2 -17.811981
## 3 -12.811981
## 4  -4.811981
## 5 -17.811981
## 6 -17.811981
## 7  -4.811981

4 show data structure

str(dta_IQ)

## 'data.frame':    2287 obs. of  9 variables:
##  $ School     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Pupil      : int  17001 17002 17003 17004 17005 17006 17007 17008 17009 17010 ...
##  $ IQ         : num  15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
##  $ Language   : int  46 45 33 46 20 30 30 57 36 36 ...
##  $ Group_size : int  29 29 29 29 29 29 29 29 29 29 ...
##  $ IQ_c       : num  3.166 2.666 -2.334 -0.834 -3.834 ...
##  $ School_mean: num  -1.51 -1.51 -1.51 -1.51 -1.51 ...
##  $ Group_mean : num  5.9 5.9 5.9 5.9 5.9 5.9 5.9 5.9 5.9 5.9 ...
##  $ SES_c      : num  -4.81 -17.81 -12.81 -4.81 -17.81 ...

5 data management

dta_IQ[,1:2] <- apply(dta_IQ[,1:2], 2, as.factor) #將學校ID和學生ID資料屬性轉換為factor

6 baseline model

summary(m0 <- lmer(Language ~ (1|School), data = dta_IQ))#(1 | School)是random intercept，意思是假設每個學校的截距不同，其中"1"代表截距項。

## Linear mixed model fit by REML ['lmerMod']
## Formula: Language ~ (1 | School)
##    Data: dta_IQ
## 
## REML criterion at convergence: 16253.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.11618 -0.65703  0.07597  0.74128  2.50639 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept) 19.63    4.431   
##  Residual             64.56    8.035   
## Number of obs: 2287, groups:  School, 131
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  40.3624     0.4282   94.26

7 random intercept; fixed-effect is IQ_c

summary(m1 <- lmer(Language ~ IQ_c + (1|School), data = dta_IQ))

## Linear mixed model fit by REML ['lmerMod']
## Formula: Language ~ IQ_c + (1 | School)
##    Data: dta_IQ
## 
## REML criterion at convergence: 15255.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.0939 -0.6375  0.0579  0.7061  3.1448 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept)  9.602   3.099   
##  Residual             42.245   6.500   
## Number of obs: 2287, groups:  School, 131
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 40.60823    0.30819   131.8
## IQ_c         2.48759    0.07008    35.5
## 
## Correlation of Fixed Effects:
##      (Intr)
## IQ_c 0.018

9.602/(9.602+42.245) =0.185

The between-pupil differences in language scores are partially explained by IQ score.

8

sjPlot::tab_model(m0, m1, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)

	Language		Language
Predictors	Estimates	std. Error	Estimates	std. Error
(Intercept)	40.36	0.43	40.61	0.31
IQ_c			2.49	0.07
Random Effects
σ²	64.56		42.24
τ₀₀	19.63 _School		9.60 _School
ICC	0.23		0.19

1.The estimated mean of Language scores for all children is 40.61 with an SD of √(42.24+9.6)

2.For one point increase in IQ_c(School mean centered IQ), Language score increase by 2.49 point, on average.

3.Adjusting for IQ_c (School mean centered IQ), correlation between Language scores among children attending the same school is 0.1.

4.About 34.57%〔=(64.56-42.24)/64.56〕 of variance in gain scores can be attributed to differences in IQ_c(School mean centered IQ) children attending different schools.

5.About 38.42% (=〔(64.56+19.63)-(42.24+9.6)〕/(64.56+19.63)) of variance in gain scores can be attributed to differences in IQ_c(School mean centered IQ) among pupils attending the same school.

9 compare models

anova(m0, m1)

## refitting model(s) with ML (instead of REML)

## Data: dta_IQ
## Models:
## m0: Language ~ (1 | School)
## m1: Language ~ IQ_c + (1 | School)
##    npar   AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## m0    3 16259 16276 -8126.6    16253                         
## m1    4 15260 15283 -7625.9    15252 1001.4  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#The End

Week5 Homework 1

Ching-Fang Wu

Mon Oct 12 12:13:31 2020