For this practical, we will use data from a sample of the Teaching dataset of The Longitudinal Study of Young People in England, 2004-2006 available here.
We will be fitting a series of school value-added models, which are one of the most prominent examples of multilevel models applied in education research.
A traditional school value-added model is a model that attempts to isolate the “school effects” from the inherent variability/heterogeneity of the pupils. It attempts to ascertain what schools add to the progress of their pupils beyond what is expected of them, given their circumstances.
You can use any directory in your computer. As in the example below:
setwd("C:/myfolder")
Remember to download the data to the folder you will define as working directory, as this makes matters easier.
You can always load packages later on, but it is a good practice to load packages at the beginning of the session on the top of your script or R markdown file.
In this practical, we will use the packages haven, lme4 and ggplot2. Remember that if you haven’t installed them before, you need to do so before you call the library function:
install.packages("haven")
install.packages("lme4")
install.packages("ggplot2")
install.packages("dplyr")
Then you load them as such:
library(haven)
library(lme4)
library(ggplot2)
library(dplyr)
You can download the data from the UKDS website. There are two SPSS datasets and we will be using the one named “lsype_15000_final_2011_05_04.sav”.
To read this dataset into R, we need to use the package haven:
ype<-read_sav("lsype_15000_final_2011_05_04.sav")
In England and Wales, the Department for Education (DfE) publishes periodically the so-called performance tables, in which schools are assessed (and classified) according to the progress that their pupils make from one stage to another. Secondary schools are judged on the GCSE results of their pupils and the progress they made since the end of primary, when they sat the KS2 tests.
We will select the variables: pupilid, schoolID, ks2stand (KS2 scores), ks4stand (GCSE scores), gender, fsm and indschool.
For this, we need to use the function select of the dplyr package:
valueadded <- select(ype, pupilid, schoolID,
ks2stand, ks4stand, gender,
fsm)
Plot the relationship between KS2 and GCSE scores
plot1 <- ggplot(valueadded, aes(x=ks2stand, y=ks4stand)) +
geom_point() + geom_smooth(aes(x=ks2stand, y=ks4stand), method = "lm")
plot1
1.1 What can you observe in the plot?
1.2 How correlated are KS2 and GCSE scores?
cor(valueadded$ks2stand, valueadded$ks4stand, use="comp")
## [1] 0.6714863
Fit an empty multilevel model of pupils within schools
library(lme4)
We will use the lmer functions, which stands for “linear mixed effects regression”. The basic syntax follows the conventions of most R packages running regression. You specify an outcome regressed ~ on variables. Each variable you add needs to be preceded by a + sign. You specify the data.
Note that we will use ML, not REML. REML is the default in lmer.
Random effects are added within brackets after the fixed effects. 1 indicates that the constant is allowed to vary freely. The random effects are specified like this: (1|level2id). If you want to want random slopes, you specify (1+variable|level2id)
m0<-lmer(ks4stand~1+(1|schoolID), data=valueadded, REML=F)
summary(m0)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: ks4stand ~ 1 + (1 | schoolID)
## Data: valueadded
##
## AIC BIC logLik deviance df.resid
## 111792.8 111815.7 -55893.4 111786.8 15401
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7751 -0.6241 0.0163 0.6378 4.0338
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolID (Intercept) 24.33 4.932
## Residual 75.85 8.709
## Number of obs: 15404, groups: schoolID, 657
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.1125 0.2057 -0.547
This model is called “type 0” value-added model in the literature
2.1. What is the proportion of variation that lies between schools in the empty model?
2.2. What does that value mean?
Fit a model with “prior attainment” as the only covariate. According to the literature this is a “type AA” value-added model. According to the DfE, this is a school value-added model or “VA”
m1<-lmer(ks4stand~ks2stand+(1|schoolID), data=valueadded, REML=F)
summary(m1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: ks4stand ~ ks2stand + (1 | schoolID)
## Data: valueadded
##
## AIC BIC logLik deviance df.resid
## 94184.3 94214.5 -47088.2 94176.3 14048
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.8324 -0.5431 0.0227 0.5858 4.6552
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolID (Intercept) 8.884 2.981
## Residual 44.207 6.649
## Number of obs: 14052, groups: schoolID, 628
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.110626 0.132210 -0.837
## ks2stand 0.644306 0.006298 102.298
##
## Correlation of Fixed Effects:
## (Intr)
## ks2stand 0.006
3.1. How much variance is “explained” by prior attainment?
Fit a model with all the available level 1 variables. In the literature, this model is called “type AA” value-added. The DfE would this model a “contextualised value-added model” or “CVA”.
m2<-lmer(ks4stand~ks2stand+factor(gender)+factor(fsm)+(1|schoolID), data=valueadded, REML=F)
summary(m2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## ks4stand ~ ks2stand + factor(gender) + factor(fsm) + (1 | schoolID)
## Data: valueadded
##
## AIC BIC logLik deviance df.resid
## 91571.5 91616.7 -45779.8 91559.5 13740
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.0675 -0.5341 0.0103 0.5770 4.8736
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolID (Intercept) 8.984 2.997
## Residual 42.304 6.504
## Number of obs: 13746, groups: schoolID, 628
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.547500 0.148461 -3.688
## ks2stand 0.622985 0.006357 98.005
## factor(gender)1 1.911990 0.118926 16.077
## factor(fsm)1 -2.271849 0.157245 -14.448
##
## Correlation of Fixed Effects:
## (Intr) ks2stn fctr(g)1
## ks2stand -0.023
## fctr(gndr)1 -0.394 -0.033
## factr(fsm)1 -0.215 0.183 -0.005
4.1. How much have the variances (at both levels) reduced?
4.2. What does this mean for the concept of value-added?
5.1. Do male and female pupils have different levels of progress?
m3<-lmer(ks4stand~ks2stand+ks2stand*factor(gender)+factor(gender)+
factor(fsm)+(1|schoolID), data=valueadded, REML=F)
summary(m3)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## ks4stand ~ ks2stand + ks2stand * factor(gender) + factor(gender) +
## factor(fsm) + (1 | schoolID)
## Data: valueadded
##
## AIC BIC logLik deviance df.resid
## 91571.8 91624.5 -45778.9 91557.8 13739
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.0469 -0.5340 0.0103 0.5770 4.8719
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolID (Intercept) 8.976 2.996
## Residual 42.300 6.504
## Number of obs: 13746, groups: schoolID, 628
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.544662 0.148432 -3.669
## ks2stand 0.630388 0.008533 73.874
## factor(gender)1 1.911635 0.118920 16.075
## factor(fsm)1 -2.272931 0.157238 -14.455
## ks2stand:factor(gender)1 -0.015305 0.011776 -1.300
##
## Correlation of Fixed Effects:
## (Intr) ks2stn fctr(g)1 fctr(f)1
## ks2stand -0.008
## fctr(gndr)1 -0.394 -0.026
## factr(fsm)1 -0.215 0.133 -0.005
## ks2stnd:()1 -0.015 -0.667 0.002 0.005
5.2. Do FSM eligible pupils make more or less progress?
m4<-lmer(ks4stand~ks2stand+ks2stand*factor(fsm)+factor(gender)+
factor(fsm)+(1|schoolID), data=valueadded, REML=F)
summary(m4)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: ks4stand ~ ks2stand + ks2stand * factor(fsm) + factor(gender) +
## factor(fsm) + (1 | schoolID)
## Data: valueadded
##
## AIC BIC logLik deviance df.resid
## 91556.1 91608.8 -45771.0 91542.1 13739
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.0963 -0.5351 0.0088 0.5752 4.8744
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolID (Intercept) 8.962 2.994
## Residual 42.253 6.500
## Number of obs: 13746, groups: schoolID, 628
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.56386 0.14836 -3.801
## ks2stand 0.63557 0.00703 90.410
## factor(fsm)1 -2.51428 0.16752 -15.009
## factor(gender)1 1.91338 0.11885 16.099
## ks2stand:factor(fsm)1 -0.06332 0.01516 -4.178
##
## Correlation of Fixed Effects:
## (Intr) ks2stn fctr(f)1 fctr(g)1
## ks2stand -0.033
## factr(fsm)1 -0.193 0.007
## fctr(gndr)1 -0.394 -0.028 -0.005
## ks2stnd:()1 0.026 -0.428 0.346 -0.003
One of the strengths of MLM is that we can evaluate the effect of multiple variables at different levels on the outcome of interest. Adding higher-level variables is done in the same way as any other individual-level variable.
We can easily create a new school-level variable from the dataset we have if we aggregate pupil-level data. The code below uses the function mutate of the dplyr package to create a new variable that represents the percentage of pupils eligible for free school meals in each school:
valueadded <- valueadded %>%
group_by(schoolID) %>%
mutate(schoolfsm=mean(fsm, na.rm=T)*100)
You can inspect the results by clicking on the object valueadded that is in your Environment tab.
After that, we’re ready to fit the model with schoolfsm.
m5<-lmer(ks4stand ~ ks2stand + schoolfsm + (1|schoolID), data=valueadded, REML=F)
summary(m5)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: ks4stand ~ ks2stand + schoolfsm + (1 | schoolID)
## Data: valueadded
##
## AIC BIC logLik deviance df.resid
## 94182.1 94219.8 -47086.0 94172.1 14047
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.8302 -0.5454 0.0223 0.5870 4.6542
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolID (Intercept) 8.832 2.972
## Residual 44.203 6.649
## Number of obs: 14052, groups: schoolID, 628
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.191653 0.196889 0.973
## ks2stand 0.642585 0.006354 101.130
## schoolfsm -0.014478 0.007003 -2.067
##
## Correlation of Fixed Effects:
## (Intr) ks2stn
## ks2stand -0.095
## schoolfsm -0.742 0.133
6.1. What is the effect of the percentage of FSM eligible pupils on GCSE scores?
Plotting the higher-level residuals can be helpful to identify groups that have higher or lower than average effect on the individual-level outcome. In the case of school performance, the residuals can be thought of as the effect uniquely attributable to the school on the progress of their pupils.
To plot the residuals with this purpose, we can use a “caterpillar plot”.
u0 <- ranef(m1, condVar = TRUE) # These are the residuals from model "m1"
u0se <- sqrt(attr(u0[[1]], "postVar")[1,,]) # These are the standard errors of the residuals
schoolid <- as.numeric(rownames(u0[[1]])) # This is to create school identifiers
You will there are three additional objects in your environment. To put them together in one dataset, we do the following:
school_resid <- cbind(schoolid, u0[[1]], u0se)
colnames(school_resid) <- c("schoolid","u0","u0se")
# Then we sort the residuals in ascending order:
school_resid <- school_resid[order(school_resid$u0), ]
# And we create a new column (variable) containing the ranks:
school_resid <- cbind(school_resid, c(1:dim(school_resid)[1]))
colnames(school_resid)[4] <- "u0rank" # This is to give a name to the new column containing the ranks
After all this, we end up with a new dataset school_resid containing the school value-added estimates. We can plot with ggplot2 as such:
school_VA_plot <- ggplot(school_resid, aes(x=u0rank, y=u0)) +
geom_point(stat="identity") +
geom_errorbar(aes(ymin=u0-1.96*u0se, ymax=u0+1.96*u0se)) +
geom_hline(yintercept=0,size=1.2, alpha=0.7,colour="#EF3B2C", linetype="twodash") +
xlab("Rank of residuals") +
ylab("School VA estimates") +
theme_bw()
school_VA_plot
In the plot above, the red line at y=0 represents the overall national average. Each school is represented by a point and a vertical segment, which represent the average school-specific effect and its 95% confidence interval (respectively). Schools on the left-hand side of the distribution that do not overlap with the national average line are said to be “significantly underperforming”; whereas those on the right-hand side that do not overlap the red line are “significantly overperforming”. All schools that do overlap are those that can be thought of as “performing as expected”.
NB: This is not the only tool to make such judgements about school performance; a comprehensive accountability system would involve also school inspections and qualitative judgements.
You could plot predictions for each school:
valueadded2<-filter(valueadded, !is.na(ks4stand) & !is.na(ks2stand)) # this filter is necessary to avoid issues with missing values
valueadded2$pred<-fitted(m1)
school_plot<-ggplot(valueadded2, aes(x=ks2stand, y=pred, group=factor(schoolID))) +
geom_smooth(method="lm", colour="black") +
xlab("Standardised KS2 score") +
ylab("Predicted KS4 score") +
theme_bw()
school_plot
In the plot above, each line represents a school. As you can see, there is a lot of variability across schools.In this plot, school predicted lines are parallel because we haven’t allowed the effect of KS2 scores to vary across schools; this is a random intercepts model. You can compare this plot with the first plot we did above, where the single-level regression line was clearly not enough to represent the extreme variability in scores. The multilevel model can account for that variability across schools and hence the multiple regression lines seen here are a much better representation of the observed data.
To visualise non-parallel school predicted lines, i.e. varying slopes, we need to fit a model that allows the effect of KS2 to vary across schools. To fit a random slopes model, we run the following code:
m1_rs<-lmer(ks4stand~as.vector(scale(ks2stand))+
(1+as.vector(scale(ks2stand))|schoolID), data=valueadded, REML=F)
# Note that KS2 have been added like this "as.vector(scale(ks2stand))". This is done to prevent a convergence error. The function "scale" rescales KS2 scores to units of standard deviation.
summary(m1_rs)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## ks4stand ~ as.vector(scale(ks2stand)) + (1 + as.vector(scale(ks2stand)) |
## schoolID)
## Data: valueadded
##
## AIC BIC logLik deviance df.resid
## 94083.5 94128.8 -47035.7 94071.5 14046
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.9474 -0.5315 0.0255 0.5764 4.5486
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schoolID (Intercept) 8.964 2.994
## as.vector(scale(ks2stand)) 1.295 1.138 0.50
## Residual 43.160 6.570
## Number of obs: 14052, groups: schoolID, 628
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.11737 0.13375 -0.877
## as.vector(scale(ks2stand)) 6.42098 0.07797 82.353
##
## Correlation of Fixed Effects:
## (Intr)
## as.vct((2)) 0.282
To compare the random intercepts models with the random slopes model, we can run the following code:
anova(m1, m1_rs)
## Data: valueadded
## Models:
## m1: ks4stand ~ ks2stand + (1 | schoolID)
## m1_rs: ks4stand ~ as.vector(scale(ks2stand)) + (1 + as.vector(scale(ks2stand)) |
## m1_rs: schoolID)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## m1 4 94184 94215 -47088 94176
## m1_rs 6 94083 94129 -47036 94071 104.87 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results indicate that, even though more complex (2 extra parameters), the random slopes model has a significantly better fit than the random intercepts model.
To plot the school predicted lines, we can retrieve the fitted values from the model m1_rs, as such:
valueadded2$pred_rs<-fitted(m1_rs)
Then we can simply copy the code from the previous plot, replacing m1 for m1_rs:
school_plot_rs<-ggplot(valueadded2, aes(x=ks2stand, y=pred_rs, group=factor(schoolID))) +
geom_smooth(method="lm", colour="black") +
xlab("Standardised KS2 score") +
ylab("Predicted KS4 score") +
theme_bw()
school_plot_rs
Voilà! School predicted scores have varying slopes for the relationship between KS2 and GCSE scores. You can see that pupils in some schools make more progress than others on average and some make less.
You can also plot the higher-level (school) residuals to check for normality
hist(school_resid$u0)
You can also plot individual-level to check for normality
valueadded2$ind_resid <- residuals(m1)
hist(valueadded2$ind_resid)
And finally, you can plot individual-level residuals against the predicted values (previously retrieved):
homoscedasticity <- ggplot(valueadded2, aes(y = ind_resid, x = pred)) + geom_point()
homoscedasticity
Multilevel modelling can be used for so much more than modelling educational outcomes of pupils nested within schools. It can also be used to understand variation across time within individuals; variation between prisoners nested within prisons; variation in income for individuals nested within geographical areas; variation in health outcomes of patients nested within GP practices and hospitals; and so many other examples.
Here are some applications that you may want to explore:
a) About school value-added models:
Leckie, G. (2009). The complexity of school and neighbourhood effects and movements of pupils on school differences in models of educational achievement
Rasbash, J., Leckie, G., Pillinger, R., Jenkins, J. (2010). Children’s educational progress: partitioning family, school and area effects
Troncoso, P., Pampaka, M., Olsen, W. (2016). Beyond traditional school value-added models: a multilevel analysis of complex school effects in Chile
Troncoso, P. (2019). A two-fold indicator of school performance and the cost of ignoring it
b) About prison effects:
c) General multilevel modelling books:
Goldstein, H. (2011). Multilevel statistical models (4th ed.). John Wiley and Sons
Hox, J., Moerbeek, M., van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications (3rd Ed). Routledge
Snijders, T., Bosker, R. (2012). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling (2nd ed.). Sage
HIGHLY RECOMMENDED: For a more complete (and free) course on Multilevel Modelling, visit the LEMMA website of the University of Bristol.
Once you see multilevel structures in your data, you cannot unsee them…
-‘I see MLMs!’ (via Giphy)
patricio.troncoso@manchester.ac.uk