We use data on 1721 students nested within 60 urban public primary schools. The outcome of interest is mathematics achievement. The data were collected at the end of first grade and annually thereafter up to sixth grade, but not all students have six observations.
# read spss format and covert it to data frame
dta <- foreign::read.spss("eg_hlm.sav") %>% as.data.frame
# check data structure
str(dta)'data.frame': 7230 obs. of 12 variables:
$ size : num 380 380 380 380 380 380 380 380 380 380 ...
$ lowinc : num 40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 ...
$ mobility: num 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 ...
$ female : num 0 0 0 0 0 0 0 0 0 0 ...
$ black : num 0 0 0 0 0 0 0 0 0 0 ...
$ hispanic: num 1 1 1 0 0 0 0 0 1 1 ...
$ year : num 0.5 1.5 2.5 -1.5 -0.5 0.5 1.5 2.5 -1.5 -0.5 ...
$ grade : num 2 3 4 0 1 2 3 4 0 1 ...
$ math : num 1.146 1.134 2.3 -1.303 0.439 ...
$ retained: num 0 0 0 0 0 0 0 0 0 0 ...
$ school : Factor w/ 60 levels " 2020",..: 1 1 1 1 1 1 1 1 1 1 ...
$ cid : Factor w/ 1721 levels " 101480302",..: 244 244 244 248 248 248 248 248 253 253 ...
# Create the summary table at the school level and calculate the quantiles
with(dta, table(school)) |> quantile() 0% 25% 50% 75% 100%
18.00 66.75 103.00 162.00 387.00
# Create the summary table at the student level and calculate the range
with(dta, table(cid)) |> range()[1] 2 6
We have 60 schools with a median of 103 students for a total of 1721 children, who have between 2 and 6 observations each for a total of 7230 observations.
We have specified two random part, one at the school level (level 3), and one at the child-level (level 2). They are both intercept-only, adding an error term at each level.
# add new column(name: nc, number of observations per school)
dta <- dta %>% group_by(school) %>% mutate(nc=n()) %>% ungroup()ggplot(subset(dta, nc > 200), aes(reorder(cid, math, mean), math))+
stat_summary(fun.data="mean_cl_boot", size = rel(.2))+
facet_wrap(.~ school)+
coord_flip()+
labs(x="Subject ID",
y="Mean math score")+
theme_minimal()+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())ggplot(dta, aes(reorder(school, math, mean), math))+
stat_summary(fun.data="mean_cl_boot", size = rel(.3))+
coord_flip()+
labs(x="School ID",
y="Mean math score")+
theme_minimal()m0 <- lme4::lmer(math ~ 1 + (1 | cid) + (1 | school), data=dta)
summary(m0, corr=FALSE)Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 1 + (1 | cid) + (1 | school)
Data: dta
REML criterion at convergence: 25309.2
Scaled residuals:
Min 1Q Median 3Q Max
-2.674 -0.658 0.026 0.674 3.334
Random effects:
Groups Name Variance Std.Dev.
cid (Intercept) 0.570 0.755
school (Intercept) 0.324 0.569
Residual 1.524 1.234
Number of obs: 7230, groups: cid, 1721; school, 60
Fixed effects:
Estimate Std. Error t value
(Intercept) -0.5100 0.0787 -6.48
The estimated mean math score is -0.51 and the variances are 0.32, 0.57 and 1.52 at the school, child and observation levels.
Most of the variance comes from the measurements (within the same child), whereas the variance in the child level is greater than that at the school level. These results indicate fairly significant variation in math achievement among children and in mean math among schools.
# by hand
v <- c(sigma(m0)^2, unlist(VarCorr(m0)))
v[3]/(v[1] + v[2] + v[3]) school
0.134074
(v[2] + v[3])/(v[1] + v[2] + v[3]) cid
0.369935
# use package
performance::icc(m0, by_group=TRUE)# ICC by Group
Group | ICC
--------------
cid | 0.236
school | 0.134
At the school level, the intra-class correlation is 0.13; 0.37, at the child level. These values indicate the correlation in math achievement between two children in the same school, and between two measurements on the same child (in the same school). It means that 13% of the variation in math achievement can be attributed to attending the same schools and 37% to the children themselves (which includes being in the same school).
The two variables, year and grade, are the same but coded differently. Year zero is near the start of the third grade, and can be referred to as ‘third grade’ for simplicity.
with(dta, table(grade, year)) year
grade -2.5 -1.5 -0.5 0.5 1.5 2.5
0 131 1346 104 1 0 0
1 0 0 1412 194 5 1
2 0 0 4 1470 174 5
3 0 0 0 7 1200 149
4 0 0 0 0 8 1010
5 0 0 0 0 0 9
ggplot(subset(dta, nc < 67), aes(year, math, group=cid))+
facet_wrap(. ~ school)+
stat_smooth(method='lm', formula=y~x, se=F, size=rel(.5), alpha=.5)+
geom_point(pch=1, size=rel(.5)) +
labs(x="Year (centered)",
y="Math score")+
theme_minimal()ggplot(dta, aes(year, math, group=school))+
stat_smooth(method='lm', formula=y~x, se=F, col=8,
size=rel(.5), alpha=.5)+
geom_point(pch=1, size=rel(.5)) +
labs(x="Year (centered)",
y="Math score",
subtitle='School level')+
theme_minimal()set <- aggregate(dta$size, list(dta$school),FUN=mean)
ggplot(subset(dta, size == quantile(set$x, prob=c(.75), type=1)), aes(year, math, , group=cid))+
facet_wrap(. ~ school)+
stat_smooth(method='lm', formula=y~x, se=F, size=rel(.5), alpha=.5)+
geom_point(pch=1, size=rel(.5)) +
labs(x="Year (centered)",
y="Math score")+
theme_minimal() # 上圖結果顯示,在學生人數為Q3(the upper quartile)的學校中,每位學生的數學成績皆有逐年增加的趨勢。We fit a model with random intercepts and slopes, with no predictors by child or school. The variable year is at the observation level.
m1 <- lme4::lmer(math ~ year + (1 + year | cid) + (1 + year | school), data=dta)
summary(m1, corr=FALSE)Linear mixed model fit by REML ['lmerMod']
Formula: math ~ year + (1 + year | cid) + (1 + year | school)
Data: dta
REML criterion at convergence: 16336.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.205 -0.564 -0.037 0.531 5.260
Random effects:
Groups Name Variance Std.Dev. Corr
cid (Intercept) 0.6405 0.800
year 0.0113 0.106 0.55
school (Intercept) 0.1686 0.411
year 0.0113 0.106 0.40
Residual 0.3014 0.549
Number of obs: 7230, groups: cid, 1721; school, 60
Fixed effects:
Estimate Std. Error t value
(Intercept) -0.7792 0.0583 -13.4
year 0.7631 0.0154 49.6
The fixed-effect estimates yield an intercept of -0.78 and a slope of 0.76. These values can be interpreted as the average achievement in third grade, and the average rate of change per year. Average math achievement clearly increases significantly over time.
The intercepts and slopes have correlations of 0.40 at the school level and 0.55 at the child level. The latter can be interpreted as the correlation between a child’s achievement around third grade and her rate of growth; the former, the correlation between a school’s math achievement around third grade and the rate of change per year.
Most of the variance in the intercept comes from the children, whereas the variance in the slope is about the same at each level. These results indicate fairly significant variation among schools in mean achievement around third grade, as well as in the rates of change over time.
One way to look for school or individual characteristics that may be associated with levels and trends in mathematics achievement is to plot or regress the estimates of intercepts and slopes on potential predictors at each level.
# recode ethnicity from indicator variables into one factor variable with
# four levels with white as the reference
dta <- mutate(dta,
eth = factor(1 + black + 2 * hispanic,
labels=c("White", "Black", "Hispanic")))
with(dta, table(eth))eth
White Black Hispanic
1229 4977 1024
ggplot(subset(dta, nc > 167), aes(year, math, group=cid))+
facet_wrap(eth ~ .)+
stat_smooth(method='lm', formula=y~x, se=F, size=rel(.5), alpha=.5, col=8)+
geom_point(pch=1, size=rel(.5)) +
labs(x="Year (centered)",
y="Math score")+
theme_minimal()# 計算平均數學分數
dta <- dta %>% group_by(school, year) %>% mutate(math_mean=mean(math)) %>% ungroup()
dta %>%
group_by(school) %>%
summarise(math_mean = mean(math), lowinc_mean = mean(lowinc)) ->dta2
reg<-lm(math_mean ~ lowinc_mean,data=dta2)
# 繪圖
ggplot(dta, aes(x = lowinc, y = math_mean, color = year, group = year))+
geom_point(size = 1) +
stat_smooth(method = 'lm', formula = y ~ x, se = F) +
labs(x="The percent of students with low income",
y="Average math achievement",
subtitle='School level, Year zero is near the start of the third grade')+
geom_abline(intercept = reg$coefficients[1], slope = reg$coefficients[2], col=5) # 上圖結果顯示,學校的低社經地位學生比例越高,學校的平均數學表現越差。Here we consider a model in which a child’s math achievement depends on to which ethnic group he or she belongs (whites, blacks or hispanics); and where the school average math trend depends on the percent of students with low income.
m2 <- lme4::lmer(math ~ year * (eth + lowinc) + (year | cid) +
(year | school), data=dta)
summary(m2, corr=FALSE)Linear mixed model fit by REML ['lmerMod']
Formula: math ~ year * (eth + lowinc) + (year | cid) + (year | school)
Data: dta
REML criterion at convergence: 16293.2
Scaled residuals:
Min 1Q Median 3Q Max
-3.247 -0.565 -0.033 0.534 5.268
Random effects:
Groups Name Variance Std.Dev. Corr
cid (Intercept) 0.62304 0.789
year 0.01115 0.106 0.56
school (Intercept) 0.08236 0.287
year 0.00846 0.092 0.04
Residual 0.30157 0.549
Number of obs: 7230, groups: cid, 1721; school, 60
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.140234 0.129912 1.08
year 0.874019 0.040017 21.84
ethBlack -0.503498 0.078154 -6.44
ethHispanic -0.319758 0.086258 -3.71
lowinc -0.007560 0.001721 -4.39
year:ethBlack -0.030514 0.022550 -1.35
year:ethHispanic 0.043159 0.024710 1.75
year:lowinc -0.001364 0.000534 -2.56
Starting with a line (along the year) that has 0.1402 points of math achievement by the middle of third grade and increases by 0.8740 points per year, provided the school has no poor students. Each percentage point of low income students is associated with a third grade mean that’s 0.0076 lower and increases 0.0014 points less per year, so a school with 100% poor students would have a third-grade mean of -0.6168 (0.140227-0.756) and an increase of 0.7377 (0.874017-0.1363) points per year. The intercept and slope in any particular school will vary around the average line for its percent low income children.
Blacks have a third-grade mean 0.5035 points below whites in the same school, and the increase per year is 0.0305 points less than for whites in the same school. The results for hispanics indicate a third-grade mean 0.3198 points below whites, but a growth rate per year 0.0432 points higher than whites in the same school. Each child’s growth curve, however, will vary around the average line for kids of the same ethnicity in the same school.
The random effects indicate that the average third-grade score varies by school, with a variance of 0.0824, while the rate of change has a variance of 0.0085. The correlation between intercept and slope is 0.04, suggesting that it can be attributed to the school’s low-income composition.
Variation at the child level is even more sustantial; expected third-grade scores vary with a standard deviation of 0.789, and the rates of change vary with a standard deviation of 0.106. These two random effects remain highly correlated, so children who have high levels of math proficiency in third grade also tend to improve faster over time.
Raudenbush, S., Byrk, A., et al. (2004). HLM 6: Hierarchical Linear and Nonlinear Modeling. Chapter 4. Scientific Software International.
Rodríguez, G. Multilevel Models. 3-Level Model