CCPD - methodological note and results - revised July 2023
Intro
This document outlines the choices related to statistical modelling of student attainment. The focus of interest is the set of estimates pertaining to the staff component, which are interpreted as quantifying the value added of teaching, measured as the residual variation after regressing students' standardised attainment on the previous year's attainment, by subject and adjusted for relevant intervening factors.
The estimates are then contrasted with student attainments, as well as with teacher observation data.
Data
Following discussion with subject matter experts and a review of the literature, the following variables have been selected as predictors in all of the models.
variable name | description | type | notes |
---|---|---|---|
current_attainment | MAT-wide standardised score | ordinal or numerical {0-9} | this is our outcome variable |
prior_attainment | MAT-wide standardised previous year's score | ordinal or numerical {0-9} | |
year_group | year group | ordinal or factor {8, 9, 10} | |
gender | sex | categorical {F, M} | pupil-level variable |
ethnicity_group | broad ethnic group | categorical {Asian, Black, Chinese, Mixed, Other, Unknown, White} | pupil-level variable |
sen | special education needs | categorical {EHC Plan, Non-SEN, School Support} | pupil-level variable |
age | age of teacher | integer {22-70} | staff-level variable |
pp_sch | proportion of pupil premium students | numerical [0, 1] | school-level variable |
eal_sch | proportion of english as additional language students | numerical [0, 1] | school-level variable |
fsm_sch | proportion of free school meals students | numerical [0, 1] | school-level variable |
sen_sch | proportion of special education needs students | numerical [0, 1] | school-level variable |
subj | subject | categorical {Maths, Science, English Language} | |
staff_id | unique staff identifier | categorical "Staff_xxx" | |
join_key | teaching group | categorical "year group + school id + group id" | used as join key across datasets |
school_id | unique school identifier | categorical "School_xx" | |
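Since `join_key` links the pupil-level attainment records to the staff-level information, a minimal sketch of how the analysis dataset could be assembled is shown below. The file names, and the assumption that the extracts arrive as CSV files, are purely illustrative:

```r
library(dplyr)
library(readr)

# Illustrative file names: one extract with pupil x subject x year-group
# attainment records, one with staff-level information per teaching group.
attainment <- read_csv("attainment_records.csv")
staff      <- read_csv("staff_records.csv")

analysis_df <- attainment %>%
  left_join(staff, by = "join_key") %>%
  mutate(
    year_group      = factor(year_group, levels = c(8, 9, 10), ordered = TRUE),
    gender          = factor(gender),
    ethnicity_group = factor(ethnicity_group),
    sen             = factor(sen),
    subj            = factor(subj)
  )

glimpse(analysis_df)
```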
Each variable might enter each model in a slightly different way, as described in the relevant sections. Below is an example (the first few rows) of the final dataset, which also includes the model-based estimates discussed in the following sections.
The dataset contains 7934 records from - schools on current and previous year's attainments for students and - teachers. The observed frequencies by subject and year group are as follows:
[DETAILED STATS REMOVED JULY 2023]
The sample of schools spans a range of socio-economic characteristics; the table below reports some of these.
[DETAILED STATS REMOVED JULY 2023]
Missingness
The dataset presents a moderate degree of missingness with respect to some teacher-level characteristics (not used in the modelling phase at present) but is broadly complete. The upset plot below shows the most frequently occurring combinations of missingness across variables. When looking at the frequencies, bear in mind that each row in the dataset pertains to a given combination of pupil, subject and year group; for this reason, teacher data can be duplicated across rows, artificially inflating the missingness figures below for staff-level information.
When we take a look at the teacher development data (focussing on staff-level data only, so no duplications), we immediately notice a consistent pattern of missingness for summer observations. This is likely due to most second observations not having been carried out/recorded yet at the time of data sharing.
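As a hedged sketch, an upset plot of this kind could be produced with the `naniar` package, reusing the hypothetical `analysis_df` from the earlier sketch and first de-duplicating to one row per teacher:

```r
library(dplyr)
library(naniar)  # provides upset-style summaries of missingness patterns

# Keep one row per teacher so duplicated pupil x subject x year-group rows
# do not inflate the staff-level missingness counts.
staff_level <- analysis_df %>%
  distinct(staff_id, .keep_all = TRUE)

gg_miss_upset(staff_level)
```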
Descriptives
Below is the full sample distribution of the attainment scores by subject and year group; the first boxplot in each subject group is year 8, the second year 9, the third year 10. The diamonds represent group averages.
Central tendency and variation statistics are in the table below.
subject | year group | average mark | standard deviation | median mark | interquartile range |
---|---|---|---|---|---|
eng_lang | 8 | 4.77 | 1.92 | 5.0 | 2.0 |
eng_lang | 9 | 5.04 | 1.90 | 5.0 | 2.0 |
eng_lang | 10 | 4.60 | 1.78 | 5.0 | 3.0 |
maths | 8 | 4.59 | 2.11 | 4.0 | 3.0 |
maths | 9 | 4.32 | 2.10 | 4.0 | 2.0 |
maths | 10 | 4.08 | 1.89 | 4.0 | 2.0 |
science | 8 | 4.29 | 1.92 | 4.0 | 3.0 |
science | 9 | 5.02 | 2.28 | 5.0 | 4.0 |
science | 10 | 3.60 | 1.98 | 3.5 | 3.5 |
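For reference, a table of this kind can be reproduced with a standard `dplyr` summary; this sketch again uses the hypothetical `analysis_df`, with column names following the variable table above:

```r
library(dplyr)

analysis_df %>%
  group_by(subj, year_group) %>%
  summarise(
    average_mark        = mean(current_attainment, na.rm = TRUE),
    standard_deviation  = sd(current_attainment, na.rm = TRUE),
    median_mark         = median(current_attainment, na.rm = TRUE),
    interquartile_range = IQR(current_attainment, na.rm = TRUE),
    .groups = "drop"
  )
```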
There are patterns in average attainment that seem to suggest a marked decrease in average performance for year 10 students. The MAT suggests this might be due, at least for maths, to the tiered nature of the subject. It is worth remembering, however, that different cohorts contribute to these statistics: the distributions depicted here all refer to the same academic year (2021/2022), with three distinct year groups (8, 9, 10) observed within that year. The following two plots complement the previous one by showing the overall distributions of prior attainment (i.e., in years 7, 8 and 9 during academic year 2020/2021) and the difference between current and prior attainment.
There appears to be some evidence that the average difference in scores between the prior and current year is overall positive for younger students (years 8 and 9 during academic year 2021/2022), and very close to zero for older ones (year 10), across subjects.
As an additional consistency check of the attainment measures, below are the scatterplots of prior vs current attainment, by ethnicity group, by subject. The linear interpolators added on top of the (jittered) data points highlight the dependence between pre/post standardised scores.
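A sketch of how such a panel could be drawn with `ggplot2` (jittered points with a linear smoother, faceted by ethnicity group and subject; names as in the earlier sketches, purely illustrative):

```r
library(ggplot2)

ggplot(analysis_df, aes(x = prior_attainment, y = current_attainment)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_grid(ethnicity_group ~ subj) +
  labs(x = "Prior attainment (standardised)", y = "Current attainment (standardised)")
```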
The next plot presents an overview of the distribution of `current_attainment` across schools and subjects.
[PLOT REMOVED JULY 2023]
Schools present some variability in terms of available data. Subject-specific patterns aside, it is of interest to observe the uneven distribution of data availability: while some schools have attainment records for all year groups and all subjects, most seem to only have data for certain combinations thereof. This creates convergence issues with some of the models, as discussed in the next section.
Models
We have grounded the statistical models discussed in the following on a discussion of the theory underlying teaching value added, and built on the existing implementations in the VAM literature (refs). We have obtained the model estimates using R and Python. R packages: `tidyverse` (data manipulation), `rstan` and `brms` (Bayesian modelling).
Model I - simple “dynamic” or autoregressive model
Full fixed effects, frequentist approach, gaussian errors, as per VAM literature (refs).
\[^{s}y_{ijkl}(t) = \beta_0 + \beta_1\cdot {^{s}y}_{ijkl}(t-1)+ \beta_2 \cdot v_k+\sum_{h=3}^{H+1}\beta_h\cdot x_{(h-1)ijk} + \varepsilon_{ijl}\]
The subscript \(i=1,...,I\) indexes the student, \(j=1,...,J\) the teaching group, \(k=1,...,K\) the teacher, \(l=1,...,L\) the school. The superscript \(s \in \{\text{maths},\text{english language},\text{science}\}\) denotes the subject, \(t\in \{8,9,10\}\) the year group and \(h\) indexes the additional covariates \(x\). The error term \(\varepsilon_{ijl}\) is assumed gaussian.
We fit three versions of the model above in a frequentist setting:
1. no covariates: only adjusts for the teacher indicator
2. with covariates: adjusts for all of the covariates
3. nested: adds a teacher \(\times\) teaching group interaction effect to model 2.
We have observed convergence issues beyond the null model with just the teacher effect. Models 2 and 3 fail to estimate all of the parameters, and uncertainty measures indicate high instability. This is attributed to the sparsity of data for certain combinations of covariates, as highlighted previously: not all schools have data for all subjects and year groups. We have obtained the estimates using both R and Python and reached the same results, up to negligible approximations due to differences in numerical accuracy thresholds across packages.
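For concreteness, a minimal sketch of how the three frequentist specifications could be expressed with base R's `lm()`, fitting one subject at a time; object names are assumptions, variable names follow the table above:

```r
# Fit within a single subject, e.g. maths (models are subject-specific).
maths_df <- subset(analysis_df, subj == "Maths")

# I.a: prior attainment plus the teacher indicator only
fit_Ia <- lm(current_attainment ~ prior_attainment + staff_id, data = maths_df)

# I.b: as I.a, plus pupil-, staff- and school-level covariates
fit_Ib <- lm(current_attainment ~ prior_attainment + staff_id +
               gender + ethnicity_group + sen + age + year_group +
               pp_sch + eal_sch + fsm_sch + sen_sch,
             data = maths_df)

# I.c: as I.b, plus a teacher x teaching-group interaction
fit_Ic <- update(fit_Ib, . ~ . + staff_id:join_key)
```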
The graphs below display the distribution of the estimates of the teaching value added (i.e., the \(K-1\) \(\beta_2\) parameters linked to the teacher indicator \(v_k\)) by model and subject. We also plot the \(v_k\)s against average attainment (current, prior, and difference in attainment measured on the group of pupils taught in the current year by that specific teacher), to get a sense of the association of teaching value added estimates with attainment scores. To better highlight relationships, we overlay linear (in blue) and non-linear (in coral) interpolants. All such plots are stratified by subject and year group (8-10).
Model I.a - no covariates
Model I.b - with covariates
Model I.c - with covariates, interaction teacher\(\times\)teaching group
Convergence issues aside, the distributions appear roughly symmetrical, with a reasonable range.
Model II - mixed effects extensions of model I
To overcome convergence issues, we implement Model I within a Bayesian setting. All the computations have been carried out using R packages that provide an interface to the probabilistic programming language `stan`, and model fits have been processed using `rstan` and related packages.
Another reason to opt for a Bayesian framework is that it allows us to consider more flexible models here, at the relatively small price of some additional work to specify prior distributions. Furthermore, we explore using mixed effects models under a variety of crossing and nesting combinations. Specifically, we have looked into the following extensions and combinations thereof:
- allowing for a non-linear relationship between current and previous year’s attainment via splines
- acknowledging the bounded support of the scores by putting a constraint for the predicted outcome to fall within \([0,9]\) (continuously)
- acknowledging the non-continuous nature of the outcome by considering an ordered logistic regression with logit link, to model the cumulative probabilities of endorsing each attainment score across the \(\{0-9\}\) scale
- considering crossed teacher and teaching group effects, as well as simple (teaching groups within teachers), and more involved (teaching groups within teachers within subjects) nesting structures
- pooling all data and using subject as an explanatory factor in the model (specifically, as nesting level).
In the following we present parameter estimates for some of the combinations we have considered and compare them for consistency. We have assessed satisfactory convergence of the estimates via the usual Hamiltonian Monte Carlo diagnostic checks (\(\widehat{R}\), bulk expected sample size, energy fractions, etc).
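A minimal sketch of the convergence checks mentioned above, where `fit` is a placeholder for any of the `brms` fits discussed below:

```r
library(brms)

summary(fit)                            # reports Rhat and bulk/tail ESS per parameter
rstan::check_hmc_diagnostics(fit$fit)   # divergences, treedepth and E-BFMI (energy)
                                        # checks on the underlying stanfit object
```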
We do not report the prior choices here; they are not the default, very diffuse ones, and were selected via a prior predictive modelling exercise.
Model II.a - teacher and teaching group as crossed random effects
In this model:
- teachers and teaching groups are considered as crossed random effects, i.e., they enter the model as independent grouping factors
- `current_attainment` is modelled as a potentially non-linear function of `prior_attainment` via a spline function
- we adjust for `school_id`
- no other covariates are considered.
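A hedged `brms` sketch of this specification (priors omitted, as per the note above; `analysis_df` is the hypothetical dataset from the earlier sketches, and entering `school_id` as a fixed factor is an assumption about the adjustment):

```r
library(brms)

# Model II.a: crossed random intercepts for teacher and teaching group,
# a smooth (spline) term for prior attainment, adjustment for school.
fit_IIa <- brm(
  current_attainment ~ s(prior_attainment) + school_id +
    (1 | staff_id) + (1 | join_key),
  data   = analysis_df,
  family = gaussian(),
  chains = 4, cores = 4, seed = 2023
)
```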
The distribution of teaching value added estimates presents marked differences in variability by subject. The next graph shows the so-called posterior predictive check, i.e., some draws (in light blue) from the posterior predictive distribution, superimposed on the observed distribution (in black). This offers a visual check of the predictive behaviour of the models: ideally, we would like our models to be able to approximately reproduce the original distribution of the outcome.
The overall shape of the current attainments is recovered, to varying degrees, but due to the modelling assumptions the discrete nature of the observations is lost, as is the range of possible values.
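The corresponding check can be produced directly from the fitted object (in recent `brms` versions; older versions use `nsamples` instead of `ndraws`):

```r
# Overlay a handful of posterior predictive draws (light curves) on the
# observed distribution of current attainment.
pp_check(fit_IIa, ndraws = 50)
```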
Model II.b - simple nesting
Model II.b.1 - no covariates. This model:
- builds on model II.a by considering nesting instead of a cross-classified structure; specifically, each teaching group is now seen as nested within teachers
- everything else is as in model II.a
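In `brms`/`lme4` formula syntax, the change from II.a amounts to replacing the crossed terms with a nested one; a sketch under the same assumptions as before:

```r
# Model II.b.1: teaching groups nested within teachers.
# (1 | staff_id / join_key) expands to (1 | staff_id) + (1 | staff_id:join_key).
fit_IIb1 <- brm(
  current_attainment ~ s(prior_attainment) + school_id +
    (1 | staff_id / join_key),
  data = analysis_df, family = gaussian(),
  chains = 4, cores = 4, seed = 2023
)
```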
Model II.b.2 - with covariates. This model:
- builds on model II.b.1 by additionally adjusting for `gender`, `ethnicity_group`, `sen`, `age`, `year_group`, `pp_sch`, `eal_sch`, `fsm_sch` and `sen_sch`.
Model II.b.3 - with covariates, bounded outcome. This model:
- builds on model II.b.2 by additionally imposing the explicit constraint that the outcome be bounded (continuously) within \([0,9]\).
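One way to encode models II.b.2 and II.b.3 in `brms` is sketched below, with the covariates listed above and the bound handled via a truncated Gaussian; the truncation term is an assumption about the exact implementation, not a statement of the method used:

```r
# Model II.b.3: nested structure, full covariate set, outcome truncated to [0, 9].
# Dropping the trunc() term gives model II.b.2.
fit_IIb3 <- brm(
  current_attainment | trunc(lb = 0, ub = 9) ~ s(prior_attainment) + school_id +
    gender + ethnicity_group + sen + age + year_group +
    pp_sch + eal_sch + fsm_sch + sen_sch +
    (1 | staff_id / join_key),
  data = analysis_df, family = gaussian(),
  chains = 4, cores = 4, seed = 2023
)
```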
Accounting for the nesting structure with this modelling choice does not seem to make much difference in terms of estimates distribution.
Model III - subject as nesting factor
Here we choose subject as a nesting factor, the main reasons being to a) obtain estimates of teaching effect that are, in a sense, ‘standardised’ to have similar variability, and b) have a single model that can leverage the fact that some students in the sample have attainments for more than one subject.
Model III.a - with covariates, subject nesting, bounded outcome
The distribution of attainments pertains to all of the subjects at once.
Model III.b - with covariates, subject nesting, ordinal outcome
- explicitly accounts for the ordinal nature of the outcome via an ordered logistic regression approach (no need to directly constrain the outcome to be bounded anymore)
- everything else as Model III.a.
As with model III.a, all subjects are considered at once. The posterior predictive check function shifts (for some reason) the outcome range from 0-9 to 1-10; I'll look into it, but it's probably just a matter of encoding via the number of categories rather than the actual numerical values.
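A hedged sketch of this ordinal specification follows; the subject-level nesting shown is one plausible reading of "subject as nesting factor", and all names are reused from the earlier sketches. Note that `brms` re-codes the K ordered categories internally as 1 to K, which would account for the apparent 0-9 to 1-10 shift in the posterior predictive checks:

```r
library(brms)
library(dplyr)

# Treat the 0-9 scores as an ordered factor and fit a cumulative (ordered) logit.
ordinal_df <- analysis_df %>%
  mutate(current_attainment_ord = factor(current_attainment,
                                         levels = 0:9, ordered = TRUE))

fit_IIIb <- brm(
  current_attainment_ord ~ s(prior_attainment) + school_id +
    gender + ethnicity_group + sen + age + year_group +
    pp_sch + eal_sch + fsm_sch + sen_sch +
    (1 | subj / staff_id / join_key),
  data = ordinal_df, family = cumulative(link = "logit"),
  chains = 4, cores = 4, seed = 2023
)
```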