CCPD - methodological note and results - revised July 2023

Author

Fede Andreis [federico.andreis@nesta.org.uk]

Intro

This document outlines the choices related to the statistical modelling of student attainment. The focus of interest is the set of estimates pertaining to the staff component, which are interpreted as quantifying the value added of teaching, measured as residual variation after regressing students’ standardised attainment on the previous year’s, by subject and adjusted for relevant intervening factors.

The estimates are then contrasted with student attainments, as well as with teacher observation data.

Data

Following discussion with subject matter experts and a review of the literature, the following variables have been selected as predictors in all of the models.

variable name | description | type | values | notes
current_attainment | MAT-wide standardised score | ordinal or numerical | {0-9} | this is our outcome variable
prior_attainment | MAT-wide standardised previous year’s score | ordinal or numerical | {0-9} |
year_group | year group | ordinal or factor | {8, 9, 10} |
gender | sex | categorical | {F, M} | pupil-level variable
ethnicity_group | broad ethnic group | categorical | {Asian, Black, Chinese, Mixed, Other, Unknown, White} | pupil-level variable
sen | special education needs | categorical | {EHC Plan, Non-SEN, School Support} | pupil-level variable
age | age of teacher | integer | {22-70} | staff-level variable
pp_sch | proportion of pupil premium students | numerical | [0, 1] | school-level variable
eal_sch | proportion of English as additional language students | numerical | [0, 1] | school-level variable
fsm_sch | proportion of free school meals students | numerical | [0, 1] | school-level variable
sen_sch | proportion of special education needs students | numerical | [0, 1] | school-level variable
subj | subject | categorical | {Maths, Science, English Language} |
staff_id | unique staff identifier | categorical | “Staff_xxx” |
join_key | teaching group | categorical | “year group + school id + group id” | used as join key across datasets
school_id | unique school identifier | categorical | “School_xx” |

Each variable might enter each model in a slightly different way, as described in the relevant sections. Below is an example (the first few rows) of the final dataset, which also includes the model-based estimates discussed in the following sections.

The dataset contains 7934 records from - schools on current and previous year’s attainments for students and - teachers. The observed frequencies by subject and year group are as follows:

[DETAILED STATS REMOVED JULY 2023]

The sample of schools spans a range of socio-economic characteristics; the table below reports a selection.

[DETAILED STATS REMOVED JULY 2023]

Missingness

The dataset presents a moderate degree of missingness with respect to some teacher-level characteristics (not used in the modelling phase at present) but is otherwise broadly complete. The upset plot below shows the most frequently occurring combinations of missingness across variables. When looking at the frequencies, one should bear in mind that each row in the dataset pertains to a given combination of pupil, subject and year group; for this reason, teacher data can be duplicated, which artificially inflates the missingness figures below for staff-level information.
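
For reference, a minimal sketch of how an upset plot of missingness can be produced in R with the naniar package; the data frame name ccpd_df is hypothetical:

library(naniar)

# most frequently occurring combinations of missing values across variables
gg_miss_upset(ccpd_df)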

When we take a look at the teacher development data (focussing on staff-level data only, so no duplications), we immediately notice a consistent pattern of missingness for summer observations. This is likely due to most second observations not having been carried out/recorded yet at the time of data sharing.

Descriptives

Below is the full sample distribution of the attainment scores by subject and year group; the first boxplot in each subject group is year 8, the second year 9, the third year 10. The diamonds represent group averages.

Central tendency and variation statistics are in the table below.

subject year group average mark standard deviation median mark interquartile range
eng_lang 8 4.77 1.92 5.0 2.0
eng_lang 9 5.04 1.90 5.0 2.0
eng_lang 10 4.60 1.78 5.0 3.0
maths 8 4.59 2.11 4.0 3.0
maths 9 4.32 2.10 4.0 2.0
maths 10 4.08 1.89 4.0 2.0
science 8 4.29 1.92 4.0 3.0
science 9 5.02 2.28 5.0 4.0
science 10 3.60 1.98 3.5 3.5
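
For reference, a sketch of how such a summary could be computed with the tidyverse; the data frame name ccpd_df is hypothetical, with columns as in the data table above:

library(dplyr)

ccpd_df %>%
  group_by(subj, year_group) %>%
  summarise(
    average_mark        = mean(current_attainment, na.rm = TRUE),
    standard_deviation  = sd(current_attainment, na.rm = TRUE),
    median_mark         = median(current_attainment, na.rm = TRUE),
    interquartile_range = IQR(current_attainment, na.rm = TRUE),
    .groups = "drop"
  )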

There are patterns in average attainment that suggest a marked decrease in average performance for year 10 students. The MAT suggests this might be due, at least for maths, to the tiered nature of the subject. It is worth remembering, however, that different cohorts contribute to these statistics: the distributions depicted here all refer to the same academic year (2021/2022), with three different year groups (8, 9, 10) observed within that year. The following two plots complement the previous one by showing the overall distributions of prior attainment (i.e., in years 7, 8 and 9 during academic year 2020/2021) and the difference between current and prior attainment.

There appears to be some evidence that the average difference between prior and current year scores is overall positive for younger students (years 8 and 9 during academic year 2021/2022) and very close to zero for older ones (year 10), across subjects.

As an additional consistency check of the attainment measures, below are the scatterplots of prior vs current attainment, by ethnicity group, by subject. The linear interpolators added on top of the (jittered) data points highlight the dependence between pre/post standardised scores.

The next plot presents an overview of the distribution of current_attainment across schools and subjects.

[PLOT REMOVED JULY 2023]

Schools present some variability in terms of available data. Subject-specific patterns aside, it is of interest to observe the uneven distribution of data availability: while some schools have attainment records for all year groups and all subjects, most seem to only have data for certain combinations thereof. This creates convergence issues with some of the models, as discussed in the next section.

Models

We have grounded the statistical models discussed below in the theory underlying teaching value added, and built on existing implementations in the VAM literature (refs). We have obtained the model estimates using R and Python. R packages: tidyverse (data manipulation), rstan and brms (Bayesian modelling).

Model I - simple “dynamic” or autoregressive model

Full fixed effects, frequentist approach, gaussian errors, as per VAM literature (refs).

\[^{s}y_{ijkl}(t) = \beta_0 + \beta_1\cdot {^{s}y}_{ijkl}(t-1)+ \beta_2 \cdot v_k+\sum_{h=3}^{H+1}\beta_h\cdot x_{(h-1)ijk} + \varepsilon_{ijl}\]

The subscript \(i=1,...,I\) indexes the student, \(j=1,...,J\) the teaching group, \(k=1,...,K\) the teacher, \(l=1,...,L\) the school. The superscript \(s \in \{\text{maths},\text{english language},\text{science}\}\) denotes the subject, \(t\in \{8,9,10\}\) the year group and \(h\) indexes the additional covariates \(x\). The error term \(\varepsilon_{ijl}\) is assumed gaussian.

We fit three versions of the model above in a frequentist setting (a minimal lm() sketch follows the list):

  1. no covariates: only adjusts for the teacher indicator
  2. with covariates: adjusts for all of the covariates
  3. nested: adds a teacher \(\times\) teaching group interaction effect to model 2.
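
For reference, a minimal frequentist sketch of the three specifications using base R's lm(); ccpd_df is a hypothetical name for the per-subject analysis data frame, and all three versions include the autoregressive prior-attainment term as in the equation above:

# I.a - no covariates: prior attainment and the teacher indicator only
m_Ia <- lm(current_attainment ~ prior_attainment + staff_id, data = ccpd_df)

# I.b - with covariates: additionally adjusts for pupil- and school-level covariates
m_Ib <- lm(current_attainment ~ prior_attainment + staff_id +
             gender + ethnicity_group + sen + age + year_group +
             pp_sch + eal_sch + fsm_sch + sen_sch,
           data = ccpd_df)

# I.c - nested: adds a teacher x teaching group interaction to model I.b
m_Ic <- update(m_Ib, . ~ . + staff_id:join_key)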

We have observed convergence issues beyond the null model with just the teacher effect. Models 2 and 3 fail to estimate all of the parameters, and uncertainty measures indicate high instability. This is attributed to the sparsity of data for certain combinations of covariates, as highlighted previously: not all schools have data for all subjects and year groups. We have obtained the estimates using both R and Python and reached the same results, up to negligible approximations due to differences in numerical accuracy thresholds across packages.

The graphs below display the distribution of the estimates of the teaching value added (i.e., the \(K-1\) \(\beta_2\) parameters linked to the teacher indicator \(v_k\)) by model and subject. We also plot the \(v_k\)s against average attainment (current, prior, and difference in attainment measured on the group of pupils taught in the current year by that specific teacher), to get a sense of the association of teaching value added estimates with attainment scores. To better highlight relationships, we overlay linear (in blue) and non-linear (in coral) interpolants. All such plots are stratified by subject and year group (8-10).

Model I.a - no covariates

Model I.b - with covariates

Model I.c - with covariates, interaction teacher\(\times\)teaching group

Convergence issues aside, the distributions appear roughly symmetrical, with a reasonable range.

Model II - mixed effects extensions of model I

To overcome the convergence issues, we implement Model I within a Bayesian setting. All computations have been carried out using R packages that provide an interface to the probabilistic programming language Stan, and model fits have been processed using rstan and related packages.

Another reason to opt for a Bayesian framework is that it allows us to consider more flexible models here, at the relatively small price of some additional work to specify prior distributions. Furthermore, we explore mixed effects models under a variety of crossing and nesting combinations. Specifically, we have looked into the following extensions and combinations thereof:

  • allowing for a non-linear relationship between current and previous year’s attainment via splines
  • acknowledging the bounded support of the scores by constraining the predicted outcome to fall within \([0,9]\) (continuously)
  • acknowledging the non-continuous nature of the outcome by considering an ordered logistic regression with logit link, to model the cumulative probabilities of endorsing each attainment score across the \(\{0-9\}\) scale
  • considering crossed teacher and teaching group effects, as well as simple (teaching groups within teachers), and more involved (teaching groups within teachers within subjects) nesting structures
  • pooling all data and using subject as an explanatory factor in the model (specifically, as nesting level).

In the following we present parameter estimates for some of the combinations we have considered and compare them for consistency. We have assessed satisfactory convergence of the estimates via the usual Hamiltonian Monte Carlo diagnostic checks (\(\widehat{R}\), bulk effective sample size, energy diagnostics, etc.).

We do not report prior choices here; they are not the default, very diffuse ones, but have been selected via a prior predictive modelling exercise.

Model II.a - teacher and teaching group as crossed random effects

In this model (a brms sketch follows the list):

  • teachers and teaching groups are considered as crossed random effects, i.e., they enter the model as independent grouping factors
  • current_attainment is modelled as a potentially non-linear function of prior_attainment via a spline function
  • we adjust for school_id
  • no other covariates are considered.
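
A minimal brms sketch of this specification; the data frame name ccpd_df and the fit object name are hypothetical, and the (non-default) priors discussed above are omitted for brevity:

library(brms)

fit_iia <- brm(
  # spline on prior attainment; teacher (staff_id) and teaching group (join_key)
  # enter as crossed random intercepts; school_id as a fixed adjustment
  current_attainment ~ s(prior_attainment) + school_id +
    (1 | staff_id) + (1 | join_key),
  data   = ccpd_df,
  family = gaussian(),
  chains = 4, cores = 4
)

# posterior predictive check: draws from the posterior predictive distribution
# overlaid on the observed outcome distribution
pp_check(fit_iia, ndraws = 50)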

The distribution of teaching value added estimates presents marked differences in variability by subject. The next graph shows the so-called posterior predictive check, i.e., some draws (in light blue) from the posterior predictive distribution, superimposed on the observed distribution (in black). This offers a visual check of the predictive behaviour of the models: ideally, we would like our models to approximately reproduce the original distribution of the outcome.

The overall shape of the current attainments is recovered, to varying degrees, but due to the modelling assumptions the discrete nature of the observations is lost, as is the range of possible values.

Model II.b - simple nesting

Model II.b.1 - no covariates. This model:

  • builds on model II.a by considering nesting instead of a cross-classified structure; specifically, each teaching group is now seen as nested within teachers
  • everything else is as in II.a

Model II.b.2 - with covariates. This model:

  • builds on model II.b.1 by additionally adjusting for gender, ethnicity_group, sen, age, year_group, pp_sch, eal_sch, fsm_sch and sen_sch.

Model II.b.3 - with covariates, bounded outcome. This model:

  • builds on model II.b.2 by additionally constraining the outcome to be bounded (continuously) within \([0,9]\) (a brms sketch follows this item).
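
A minimal brms sketch of this nested, truncated specification (hypothetical object and data frame names, priors again omitted):

fit_iib3 <- brm(
  # outcome constrained to [0, 9] via truncation; teaching groups nested within teachers
  current_attainment | trunc(lb = 0, ub = 9) ~ s(prior_attainment) +
    gender + ethnicity_group + sen + age + year_group +
    pp_sch + eal_sch + fsm_sch + sen_sch + school_id +
    (1 | staff_id / join_key),
  data   = ccpd_df,
  family = gaussian(),
  chains = 4, cores = 4
)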

Accounting for the nesting structure with this modelling choice does not seem to make much difference in terms of the distribution of the estimates.

Model III - subject as nesting factor

Here we choose subject as a nesting factor, the main reasons being to a) obtain estimates of teaching effect that are, in a sense, ‘standardised’ to have similar variability, and b) have a single model that can leverage the fact that some students in the sample have attainments for more than one subject.

Model III.a - with covariates, subject nesting, bounded outcome

The distribution of attainments pertains to all of the subjects at once.

Model III.b - with covariates, subject nesting, ordinal outcome

  • explicitly accounts for the ordinal nature of the outcome via an ordered logistic regression approach (no need to directly constrain the outcome to be bounded anymore)

  • everything else as Model III.a.

Same as model III.a, all subjects are considered at once. The posterior predictive check function shifts (for some reason) the outcome range from 0-9 to 1-10; I’ll look into it, but it is probably just a matter of the outcome being encoded via the number of categories (1-10) rather than the actual numerical values.
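
A minimal brms sketch of the ordered logistic specification with subject as an additional nesting level (hypothetical names); the outcome enters as an ordered factor, which is consistent with the 1-10 category indexing remarked upon above:

# code the 0-9 scores as an ordered factor (categories are indexed internally from 1)
ccpd_df$current_attainment_ord <- factor(ccpd_df$current_attainment,
                                         levels = 0:9, ordered = TRUE)

fit_iiib <- brm(
  current_attainment_ord ~ s(prior_attainment) +
    gender + ethnicity_group + sen + age + year_group +
    pp_sch + eal_sch + fsm_sch + sen_sch + school_id +
    (1 | subj / staff_id / join_key),
  data   = ccpd_df,
  family = cumulative(link = "logit"),
  chains = 4, cores = 4
)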

Overall comparison

Comparison of estimates across models, for consistency. The labels in the plots refer to the models as follows:

label | model
lm_basic | I.a - linear model, fixed effects, no covariates
lm_all | I.b - linear model, fixed effects, with covariates
lm_nest | I.c - linear model, fixed effects, with covariates, interaction teacher \(\times\) teaching group
crossed | II.a - mixed effects, no covariates
nested | II.b.1 - mixed effects, no covariates, nesting teacher\(\setminus\)teaching group
nested_adj | II.b.2 - mixed effects, with covariates, nesting teacher\(\setminus\)teaching group
nested_adj_trunc | II.b.3 - mixed effects, with covariates, nesting teacher\(\setminus\)teaching group, bounded outcome
all | III.a - mixed effects, with covariates, additional nesting within subject
all_ordinal | III.b - mixed effects, with covariates, additional nesting within subject, ordered logistic regression

Correlation matrix of the estimates across models, missing values excluded on a pairwise availability basis.

est_teacher_effect_crossed est_teacher_effect_nested est_teacher_effect_nested_adj est_teacher_effect_nested_adj_truncated est_teacher_effect_lm_basic est_teacher_effect_lm_all est_teacher_effect_lm_nest est_teacher_effect_all est_teacher_effect_all_ordinal
est_teacher_effect_crossed 1.00 1.00 0.97 0.96 0.78 0.84 0.83 0.79 0.78
est_teacher_effect_nested 1.00 1.00 0.97 0.96 0.77 0.83 0.82 0.79 0.77
est_teacher_effect_nested_adj 0.97 0.97 1.00 0.99 0.80 0.83 0.81 0.82 0.81
est_teacher_effect_nested_adj_truncated 0.96 0.96 0.99 1.00 0.80 0.82 0.81 0.81 0.80
est_teacher_effect_lm_basic 0.78 0.77 0.80 0.80 1.00 0.87 0.87 0.89 0.89
est_teacher_effect_lm_all 0.84 0.83 0.83 0.82 0.87 1.00 0.99 0.89 0.89
est_teacher_effect_lm_nest 0.83 0.82 0.81 0.81 0.87 0.99 1.00 0.89 0.88
est_teacher_effect_all 0.79 0.79 0.82 0.81 0.89 0.89 0.89 1.00 0.99
est_teacher_effect_all_ordinal 0.78 0.77 0.81 0.80 0.89 0.89 0.88 0.99 1.00

Teacher development data

We now turn our attention to the teacher development data. We will only be looking at the autumn observations in this instance, as the summer ones are heavily missing. We provide some exploratory analysis (as well as an attempt at some confirmatory modelling) and focus, in the last section, on the association between observations and teaching value added estimates.

[CATEGORIES ABSTRACTED AND ITEMS ANONYMISED - JULY 2023]

Descriptives

There appear to be only two distinct missingness patterns: for each teacher, either all of the observation scores are missing, or only those related to four items are.

For this first look into the observation data, we have mapped the four item levels to integers. Below are the correlogram and correlation matrices, missing values excluded on a pairwise availability basis. In the plot below, colours range from dark red (perfect negative correlation) to dark blue (perfect positive correlation).

item_1 item_2 item_3 item_4 item_5 item_6 item_7 item_8 item_9 item_10 item_11
item_1 1.00 0.83 0.82 0.65 0.68 0.65 0.69 0.72 0.70 0.61 0.70
item_2 0.83 1.00 0.81 0.65 0.64 0.65 0.68 0.66 0.60 0.59 0.61
item_3 0.82 0.81 1.00 0.68 0.65 0.64 0.67 0.72 0.63 0.63 0.64
item_4 0.65 0.65 0.68 1.00 0.77 0.68 0.80 0.74 0.69 0.65 0.68
item_5 0.68 0.64 0.65 0.77 1.00 0.74 0.78 0.75 0.67 0.64 0.72
item_6 0.65 0.65 0.64 0.68 0.74 1.00 0.67 0.73 0.75 0.61 0.76
item_7 0.69 0.68 0.67 0.80 0.78 0.67 1.00 0.71 0.63 0.69 0.74
item_8 0.72 0.66 0.72 0.74 0.75 0.73 0.71 1.00 0.73 0.70 0.75
item_9 0.70 0.60 0.63 0.69 0.67 0.75 0.63 0.73 1.00 0.63 0.71
item_10 0.61 0.59 0.63 0.65 0.64 0.61 0.69 0.70 0.63 1.00 0.63
item_11 0.70 0.61 0.64 0.68 0.72 0.76 0.74 0.75 0.71 0.63 1.00
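
For reference, a sketch of how the pairwise-complete correlation matrix and correlogram could be produced; obs_items is a hypothetical data frame holding the integer-coded autumn item scores:

library(corrplot)

item_cors <- cor(obs_items, use = "pairwise.complete.obs")
round(item_cors, 2)

# correlogram: colours range from dark red (perfect negative) to dark blue (perfect positive)
corrplot(item_cors, method = "color", tl.col = "black")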

We now take a more in-depth look at the (linear) correlation structure of the observation data. In the following, we present the results from both exploratory and confirmatory factor analysis.

Exploratory Factor Analysis

We consider observation data for both teachers in the subset used for the analysis and in the full dataset (again only looking at the autumn measurements and omitting records with missing data), for additional assurance that no significant selection-to-sample mechanisms are at play.

Visual inspection of the scree plots above seems to indicate no difference between the subset and the full sample in terms of the suggested number of factors. A one factor solution would then seem to be preferable.
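
The scree plots could be reproduced along the following lines with the psych package (hypothetical data frame names for the study subset and the full sample); the exact plotting route is not shown here, so this is indicative only:

library(psych)

# scree plots of the factor solutions for the study subset and the full sample
scree(obs_items_subset, factors = TRUE, pc = FALSE)
scree(obs_items_full,   factors = TRUE, pc = FALSE)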

From the MAT’s documentation, we learn that the 11 items we have data for are categorised into 4 larger strands of the rubric, as follows:

[CATEGORIES ABSTRACTED AND ITEMS ANONYMISED - JULY 2023]

In the interest of exploring and understanding the 4 strands used by the MAT, we have carried out our analyses considering 1 to 4 factors. We have explored both orthogonal and non-orthogonal rotations. For solutions with two or more factors, we estimate a strong correlation amongst all factors, thus pointing to a non-orthogonal rotation as preferable. However, with the usual oblimin approach this results in convergence issues when the number of factors is greater than 2, and overall it does not provide substantially different estimates for the loadings. We overcome this issue by employing a Procrustes rotation of the varimax solution (usually labelled promax): this still allows for correlation amongst factors, while being numerically more robust.
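
For reference, the rotation comparison described above can be sketched with psych::fa (hypothetical data frame name), mirroring the calls whose output is reported below:

library(psych)

# oblique oblimin rotation: runs into convergence issues here for more than 2 factors
efa_oblimin <- fa(obs_items_subset, nfactors = 3, rotate = "oblimin", fm = "pa", max.iter = 100)

# promax (Procrustes rotation of the varimax solution): numerically more robust,
# while still allowing correlated factors
efa_promax  <- fa(obs_items_subset, nfactors = 3, rotate = "promax", fm = "pa", max.iter = 100)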

Study dataset

What follows is a visualisation of the \(1-\) to \(4-\)factor solutions based on the subset of the full sample used for the analysis. We display edges only for loadings with absolute value above 0.3.

One factor solution

Factor Analysis using method =  pa
Call: fa(r = combined_df_, nfactors = 1, rotate = "promax", max.iter = 100, 
    fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
         PA1   h2   u2 com
item_1  0.85 0.72 0.28   1
item_2  0.81 0.66 0.34   1
item_3  0.83 0.69 0.31   1
item_4  0.84 0.71 0.29   1
item_5  0.85 0.72 0.28   1
item_6  0.83 0.68 0.32   1
item_7  0.85 0.72 0.28   1
item_8  0.87 0.76 0.24   1
item_9  0.82 0.67 0.33   1
item_10 0.76 0.58 0.42   1
item_11 0.84 0.70 0.30   1

                PA1
SS loadings    7.61
Proportion Var 0.69

Mean item complexity =  1
Test of the hypothesis that 1 factor is sufficient.

df null model =  55  with the objective function =  11.06 with Chi Square =  1288.44
df of  the model are 44  and the objective function was  1.27 

The root mean square of the residuals (RMSR) is  0.05 
The df corrected root mean square of the residuals is  0.05 

The harmonic n.obs is  122 with the empirical chi square  31.66  with prob <  0.92 
The total n.obs was  122  with Likelihood Chi Square =  147.22  with prob <  4.7e-13 

Tucker Lewis Index of factoring reliability =  0.895
RMSEA index =  0.138  and the 90 % confidence intervals are  0.115 0.164
BIC =  -64.16
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   PA1
Correlation of (regression) scores with factors   0.98
Multiple R square of scores with factors          0.96
Minimum correlation of possible factor scores     0.92

Two-factor solution

Factor Analysis using method =  pa
Call: fa(r = combined_df_, nfactors = 2, rotate = "promax", max.iter = 100, 
    fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
         PA1  PA2   h2   u2 com
item_1  0.14 0.81 0.84 0.16 1.1
item_2  0.04 0.88 0.82 0.18 1.0
item_3  0.11 0.82 0.81 0.19 1.0
item_4  0.81 0.06 0.73 0.27 1.0
item_5  0.86 0.01 0.75 0.25 1.0
item_6  0.78 0.06 0.70 0.30 1.0
item_7  0.80 0.07 0.73 0.27 1.0
item_8  0.75 0.15 0.77 0.23 1.1
item_9  0.69 0.16 0.68 0.32 1.1
item_10 0.68 0.11 0.59 0.41 1.1
item_11 0.82 0.04 0.72 0.28 1.0

                       PA1  PA2
SS loadings           5.42 2.72
Proportion Var        0.49 0.25
Cumulative Var        0.49 0.74
Proportion Explained  0.67 0.33
Cumulative Proportion 0.67 1.00

 With factor correlations of 
     PA1  PA2
PA1 1.00 0.79
PA2 0.79 1.00

Mean item complexity =  1
Test of the hypothesis that 2 factors are sufficient.

df null model =  55  with the objective function =  11.06 with Chi Square =  1288.44
df of  the model are 34  and the objective function was  0.56 

The root mean square of the residuals (RMSR) is  0.03 
The df corrected root mean square of the residuals is  0.03 

The harmonic n.obs is  122 with the empirical chi square  9.78  with prob <  1 
The total n.obs was  122  with Likelihood Chi Square =  64.43  with prob <  0.0012 

Tucker Lewis Index of factoring reliability =  0.96
RMSEA index =  0.085  and the 90 % confidence intervals are  0.053 0.118
BIC =  -98.9
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   PA1  PA2
Correlation of (regression) scores with factors   0.97 0.97
Multiple R square of scores with factors          0.95 0.93
Minimum correlation of possible factor scores     0.90 0.86

Three-factor solution

Factor Analysis using method =  pa
Call: fa(r = combined_df_, nfactors = 3, rotate = "promax", max.iter = 100, 
    fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
          PA1   PA3  PA2   h2   u2 com
item_1   0.19 -0.04 0.80 0.84 0.16 1.1
item_2  -0.04  0.07 0.88 0.82 0.18 1.0
item_3   0.03  0.08 0.82 0.81 0.19 1.0
item_4   0.12  0.73 0.06 0.77 0.23 1.1
item_5   0.32  0.58 0.01 0.76 0.24 1.6
item_6   0.83  0.04 0.01 0.75 0.25 1.0
item_7  -0.05  0.92 0.06 0.85 0.15 1.0
item_8   0.57  0.22 0.13 0.77 0.23 1.4
item_9   0.80 -0.04 0.11 0.74 0.26 1.0
item_10  0.34  0.36 0.11 0.58 0.42 2.2
item_11  0.66  0.21 0.01 0.73 0.27 1.2

                       PA1  PA3  PA2
SS loadings           3.17 2.65 2.61
Proportion Var        0.29 0.24 0.24
Cumulative Var        0.29 0.53 0.77
Proportion Explained  0.38 0.31 0.31
Cumulative Proportion 0.38 0.69 1.00

 With factor correlations of 
     PA1  PA3  PA2
PA1 1.00 0.83 0.78
PA3 0.83 1.00 0.75
PA2 0.78 0.75 1.00

Mean item complexity =  1.2
Test of the hypothesis that 3 factors are sufficient.

df null model =  55  with the objective function =  11.06 with Chi Square =  1288.44
df of  the model are 25  and the objective function was  0.31 

The root mean square of the residuals (RMSR) is  0.02 
The df corrected root mean square of the residuals is  0.02 

The harmonic n.obs is  122 with the empirical chi square  3.79  with prob <  1 
The total n.obs was  122  with Likelihood Chi Square =  35.01  with prob <  0.088 

Tucker Lewis Index of factoring reliability =  0.982
RMSEA index =  0.057  and the 90 % confidence intervals are  0 0.099
BIC =  -85.09
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   PA1  PA3  PA2
Correlation of (regression) scores with factors   0.96 0.96 0.97
Multiple R square of scores with factors          0.92 0.93 0.93
Minimum correlation of possible factor scores     0.85 0.85 0.87

Four-factor solution

Factor Analysis using method =  pa
Call: fa(r = combined_df_, nfactors = 4, rotate = "promax", max.iter = 100, 
    fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
          PA1  PA2   PA3   PA4   h2   u2 com
item_1   0.17 0.80 -0.01 -0.01 0.84 0.16 1.1
item_2   0.02 0.90  0.09 -0.11 0.83 0.17 1.1
item_3  -0.05 0.79  0.03  0.18 0.82 0.18 1.1
item_4   0.08 0.06  0.63  0.16 0.77 0.23 1.2
item_5   0.35 0.04  0.57 -0.03 0.77 0.23 1.7
item_6   0.92 0.03  0.05 -0.11 0.81 0.19 1.0
item_7  -0.04 0.07  0.90  0.01 0.86 0.14 1.0
item_8   0.37 0.08  0.11  0.41 0.80 0.20 2.2
item_9   0.63 0.10 -0.06  0.23 0.73 0.27 1.3
item_10  0.08 0.03  0.24  0.52 0.64 0.36 1.5
item_11  0.59 0.03  0.22  0.07 0.73 0.27 1.3

                       PA1  PA2  PA3  PA4
SS loadings           2.63 2.57 2.36 1.06
Proportion Var        0.24 0.23 0.21 0.10
Cumulative Var        0.24 0.47 0.69 0.78
Proportion Explained  0.31 0.30 0.27 0.12
Cumulative Proportion 0.31 0.60 0.88 1.00

 With factor correlations of 
     PA1  PA2  PA3  PA4
PA1 1.00 0.76 0.80 0.78
PA2 0.76 1.00 0.73 0.73
PA3 0.80 0.73 1.00 0.75
PA4 0.78 0.73 0.75 1.00

Mean item complexity =  1.3
Test of the hypothesis that 4 factors are sufficient.

df null model =  55  with the objective function =  11.06 with Chi Square =  1288.44
df of  the model are 17  and the objective function was  0.23 

The root mean square of the residuals (RMSR) is  0.01 
The df corrected root mean square of the residuals is  0.02 

The harmonic n.obs is  122 with the empirical chi square  2.32  with prob <  1 
The total n.obs was  122  with Likelihood Chi Square =  26.35  with prob <  0.068 

Tucker Lewis Index of factoring reliability =  0.975
RMSEA index =  0.067  and the 90 % confidence intervals are  0 0.115
BIC =  -55.32
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   PA1  PA2  PA3  PA4
Correlation of (regression) scores with factors   0.96 0.97 0.96 0.90
Multiple R square of scores with factors          0.92 0.94 0.92 0.81
Minimum correlation of possible factor scores     0.84 0.87 0.85 0.62

Full dataset

Same as the previous sub-section, but with observation data from the full dataset. Here we observe some slight differences in the proposed solutions; it would be interesting to think about what that implies.

One factor solution

Factor Analysis using method =  pa
Call: fa(r = staff_gtr, nfactors = 1, rotate = "promax", max.iter = 100, 
    fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
         PA1   h2   u2 com
item_1  0.84 0.71 0.29   1
item_2  0.81 0.65 0.35   1
item_3  0.81 0.66 0.34   1
item_4  0.85 0.72 0.28   1
item_5  0.86 0.73 0.27   1
item_6  0.87 0.75 0.25   1
item_7  0.86 0.73 0.27   1
item_8  0.87 0.76 0.24   1
item_9  0.85 0.72 0.28   1
item_10 0.81 0.66 0.34   1
item_11 0.85 0.72 0.28   1

                PA1
SS loadings    7.82
Proportion Var 0.71

Mean item complexity =  1
Test of the hypothesis that 1 factor is sufficient.

df null model =  55  with the objective function =  11.13 with Chi Square =  6582.75
df of  the model are 44  and the objective function was  0.74 

The root mean square of the residuals (RMSR) is  0.04 
The df corrected root mean square of the residuals is  0.04 

The harmonic n.obs is  597 with the empirical chi square  106.33  with prob <  4.5e-07 
The total n.obs was  597  with Likelihood Chi Square =  436.02  with prob <  5.8e-66 

Tucker Lewis Index of factoring reliability =  0.925
RMSEA index =  0.122  and the 90 % confidence intervals are  0.112 0.133
BIC =  154.78
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   PA1
Correlation of (regression) scores with factors   0.98
Multiple R square of scores with factors          0.96
Minimum correlation of possible factor scores     0.93

Two-factor solution

Factor Analysis using method =  pa
Call: fa(r = staff_gtr, nfactors = 2, rotate = "promax", max.iter = 100, 
    fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
         PA1   PA2   h2   u2 com
item_1  0.14  0.78 0.81 0.19 1.1
item_2  0.01  0.89 0.81 0.19 1.0
item_3  0.10  0.80 0.78 0.22 1.0
item_4  0.84  0.04 0.75 0.25 1.0
item_5  0.93 -0.05 0.78 0.22 1.0
item_6  0.76  0.13 0.76 0.24 1.1
item_7  0.82  0.06 0.75 0.25 1.0
item_8  0.77  0.13 0.77 0.23 1.1
item_9  0.65  0.23 0.72 0.28 1.2
item_10 0.63  0.21 0.66 0.34 1.2
item_11 0.78  0.10 0.74 0.26 1.0

                       PA1  PA2
SS loadings           5.49 2.83
Proportion Var        0.50 0.26
Cumulative Var        0.50 0.76
Proportion Explained  0.66 0.34
Cumulative Proportion 0.66 1.00

 With factor correlations of 
    PA1 PA2
PA1 1.0 0.8
PA2 0.8 1.0

Mean item complexity =  1.1
Test of the hypothesis that 2 factors are sufficient.

df null model =  55  with the objective function =  11.13 with Chi Square =  6582.75
df of  the model are 34  and the objective function was  0.12 

The root mean square of the residuals (RMSR) is  0.01 
The df corrected root mean square of the residuals is  0.01 

The harmonic n.obs is  597 with the empirical chi square  8.94  with prob <  1 
The total n.obs was  597  with Likelihood Chi Square =  69.97  with prob <  0.00027 

Tucker Lewis Index of factoring reliability =  0.991
RMSEA index =  0.042  and the 90 % confidence intervals are  0.028 0.056
BIC =  -147.35
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   PA1  PA2
Correlation of (regression) scores with factors   0.98 0.96
Multiple R square of scores with factors          0.96 0.93
Minimum correlation of possible factor scores     0.91 0.85

Three-factor solution

Factor Analysis using method =  pa
Call: fa(r = staff_gtr, nfactors = 3, rotate = "promax", max.iter = 100, 
    fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
         PA1   PA2   PA3   h2   u2 com
item_1  0.14  0.80 -0.02 0.81 0.19 1.1
item_2  0.00  0.90  0.01 0.81 0.19 1.0
item_3  0.08  0.81  0.01 0.78 0.22 1.0
item_4  0.86  0.09 -0.07 0.78 0.22 1.0
item_5  0.87 -0.02  0.06 0.79 0.21 1.0
item_6  0.60  0.09  0.29 0.78 0.22 1.5
item_7  0.80  0.10 -0.01 0.77 0.23 1.0
item_8  0.72  0.16  0.05 0.77 0.23 1.1
item_9  0.50  0.20  0.25 0.73 0.27 1.8
item_10 0.52  0.20  0.19 0.66 0.34 1.6
item_11 0.61  0.06  0.29 0.75 0.25 1.4

                       PA1  PA2  PA3
SS loadings           4.84 2.88 0.71
Proportion Var        0.44 0.26 0.06
Cumulative Var        0.44 0.70 0.77
Proportion Explained  0.57 0.34 0.08
Cumulative Proportion 0.57 0.92 1.00

 With factor correlations of 
     PA1  PA2  PA3
PA1 1.00 0.78 0.61
PA2 0.78 1.00 0.64
PA3 0.61 0.64 1.00

Mean item complexity =  1.2
Test of the hypothesis that 3 factors are sufficient.

df null model =  55  with the objective function =  11.13 with Chi Square =  6582.75
df of  the model are 25  and the objective function was  0.07 

The root mean square of the residuals (RMSR) is  0.01 
The df corrected root mean square of the residuals is  0.01 

The harmonic n.obs is  597 with the empirical chi square  4.78  with prob <  1 
The total n.obs was  597  with Likelihood Chi Square =  42.29  with prob <  0.017 

Tucker Lewis Index of factoring reliability =  0.994
RMSEA index =  0.034  and the 90 % confidence intervals are  0.015 0.051
BIC =  -117.51
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   PA1  PA2  PA3
Correlation of (regression) scores with factors   0.97 0.96 0.79
Multiple R square of scores with factors          0.95 0.93 0.63
Minimum correlation of possible factor scores     0.89 0.86 0.26

Four-factor solution

Factor Analysis using method =  pa
Call: fa(r = staff_gtr, nfactors = 4, rotate = "promax", max.iter = 100, 
    fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
         PA1   PA2   PA3   PA4   h2   u2 com
item_1  0.10  0.77  0.10 -0.08 0.82 0.18 1.1
item_2  0.01  0.92 -0.06  0.05 0.82 0.18 1.0
item_3  0.09  0.80 -0.01  0.02 0.77 0.23 1.0
item_4  0.84  0.05  0.05 -0.09 0.78 0.22 1.0
item_5  0.93 -0.01 -0.09  0.10 0.80 0.20 1.0
item_6  0.58  0.10  0.11  0.26 0.80 0.20 1.5
item_7  0.86  0.11 -0.09  0.01 0.77 0.23 1.1
item_8  0.68  0.12  0.14 -0.04 0.78 0.22 1.2
item_9  0.24  0.07  0.62  0.04 0.82 0.18 1.3
item_10 0.51  0.20  0.10  0.12 0.66 0.34 1.5
item_11 0.59  0.07  0.17  0.15 0.74 0.26 1.3

                       PA1  PA2  PA3  PA4
SS loadings           4.65 2.73 0.88 0.30
Proportion Var        0.42 0.25 0.08 0.03
Cumulative Var        0.42 0.67 0.75 0.78
Proportion Explained  0.54 0.32 0.10 0.03
Cumulative Proportion 0.54 0.86 0.97 1.00

 With factor correlations of 
     PA1  PA2  PA3  PA4
PA1 1.00 0.79 0.81 0.39
PA2 0.79 1.00 0.76 0.40
PA3 0.81 0.76 1.00 0.41
PA4 0.39 0.40 0.41 1.00

Mean item complexity =  1.2
Test of the hypothesis that 4 factors are sufficient.

df null model =  55  with the objective function =  11.13 with Chi Square =  6582.75
df of  the model are 17  and the objective function was  0.03 

The root mean square of the residuals (RMSR) is  0.01 
The df corrected root mean square of the residuals is  0.01 

The harmonic n.obs is  597 with the empirical chi square  2.31  with prob <  1 
The total n.obs was  597  with Likelihood Chi Square =  19.88  with prob <  0.28 

Tucker Lewis Index of factoring reliability =  0.999
RMSEA index =  0.017  and the 90 % confidence intervals are  0 0.042
BIC =  -88.79
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   PA1  PA2  PA3   PA4
Correlation of (regression) scores with factors   0.97 0.96 0.91  0.66
Multiple R square of scores with factors          0.95 0.93 0.83  0.44
Minimum correlation of possible factor scores     0.90 0.86 0.65 -0.13

Confirmatory Factor Analysis

Moving on now to a confirmatory approach, we consider several models, outlined below. For each model, we present the underlying implied structure, the estimates and the corresponding path diagram. We also present the most common fit statistics (CFI, RMSEA, BIC, AIC).
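
As an illustration, the rubric-based Model 0 below corresponds to a lavaan specification along these lines (item-to-strand mapping as in the reported output; the data frame name is hypothetical):

library(lavaan)

model_0 <- '
  strand1 =~ item_1 + item_2 + item_3
  strand2 =~ item_4 + item_5 + item_6 + item_7
  strand3 =~ item_8 + item_9 + item_10
  strand4 =~ item_11
'

fit_0 <- cfa(model_0, data = obs_items_subset)
summary(fit_0, standardized = TRUE)
fitMeasures(fit_0, c("cfi", "rmsea", "bic", "aic"))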

Model 0 - testing the rubric structure

lavaan 0.6.15 ended normally after 60 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        27

  Number of observations                           122

Model Test User Model:
                                                      
  Test statistic                                62.932
  Degrees of freedom                                39
  P-value (Chi-square)                           0.009

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  strand1 =~                                                            
    item_1            1.000                               0.794    0.922
    item_2            0.908    0.057   16.000    0.000    0.721    0.898
    item_3            0.945    0.059   16.136    0.000    0.750    0.901
  strand2 =~                                                            
    item_4            1.000                               0.683    0.869
    item_5            0.976    0.072   13.517    0.000    0.667    0.880
    item_6            0.872    0.073   12.018    0.000    0.595    0.828
    item_7            1.016    0.077   13.263    0.000    0.694    0.872
  strand3 =~                                                            
    item_8            1.000                               0.628    0.891
    item_9            0.970    0.077   12.641    0.000    0.610    0.834
    item_10           0.906    0.081   11.122    0.000    0.569    0.779
  strand4 =~                                                            
    item_11           1.000                               0.721    1.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  strand1 ~~                                                            
    strand2           0.456    0.072    6.354    0.000    0.842    0.842
    strand3           0.433    0.066    6.523    0.000    0.869    0.869
    strand4           0.413    0.067    6.201    0.000    0.723    0.723
  strand2 ~~                                                            
    strand3           0.408    0.061    6.652    0.000    0.951    0.951
    strand4           0.410    0.062    6.576    0.000    0.833    0.833
  strand3 ~~                                                            
    strand4           0.381    0.057    6.673    0.000    0.841    0.841

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .item_1            0.112    0.022    5.002    0.000    0.112    0.150
   .item_2            0.124    0.022    5.733    0.000    0.124    0.193
   .item_3            0.130    0.023    5.655    0.000    0.130    0.188
   .item_4            0.151    0.024    6.428    0.000    0.151    0.245
   .item_5            0.129    0.021    6.265    0.000    0.129    0.225
   .item_6            0.163    0.024    6.848    0.000    0.163    0.315
   .item_7            0.152    0.024    6.391    0.000    0.152    0.240
   .item_8            0.102    0.018    5.592    0.000    0.102    0.206
   .item_9            0.162    0.025    6.603    0.000    0.162    0.304
   .item_10           0.210    0.030    7.024    0.000    0.210    0.393
   .item_11           0.000                               0.000    0.000
    strand1           0.630    0.095    6.603    0.000    1.000    1.000
    strand2           0.467    0.078    5.995    0.000    1.000    1.000
    strand3           0.395    0.064    6.210    0.000    1.000    1.000
    strand4           0.519    0.066    7.810    0.000    1.000    1.000

     cfi    rmsea      bic      aic 
   0.982    0.071 1936.147 1860.438 

Model 1 - only one factor

     cfi    rmsea      bic      aic 
   0.914    0.144 2004.191 1942.503 

Model 2 - two-factors, based on the corresponding exploratory version

lavaan 0.6.15 ended normally after 44 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        23

  Number of observations                           122

Model Test User Model:
                                                      
  Test statistic                                71.534
  Degrees of freedom                                43
  P-value (Chi-square)                           0.004

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  strand1 =~                                                            
    item_1            1.000                               0.794    0.922
    item_2            0.909    0.057   16.027    0.000    0.721    0.899
    item_3            0.944    0.059   16.090    0.000    0.750    0.901
  other =~                                                              
    item_4            1.000                               0.672    0.855
    item_5            0.974    0.076   12.776    0.000    0.655    0.864
    item_6            0.894    0.074   12.009    0.000    0.601    0.835
    item_7            1.013    0.081   12.526    0.000    0.681    0.855
    item_8            0.917    0.070   13.075    0.000    0.617    0.875
    item_9            0.894    0.076   11.685    0.000    0.601    0.822
    item_10           0.835    0.080   10.448    0.000    0.562    0.768
    item_11           0.909    0.074   12.335    0.000    0.611    0.848

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  strand1 ~~                                                            
    other             0.460    0.072    6.410    0.000    0.863    0.863

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .item_1            0.112    0.022    4.998    0.000    0.112    0.151
   .item_2            0.123    0.022    5.702    0.000    0.123    0.191
   .item_3            0.131    0.023    5.665    0.000    0.131    0.189
   .item_4            0.166    0.024    6.848    0.000    0.166    0.268
   .item_5            0.145    0.021    6.769    0.000    0.145    0.253
   .item_6            0.157    0.022    6.998    0.000    0.157    0.302
   .item_7            0.171    0.025    6.852    0.000    0.171    0.269
   .item_8            0.116    0.017    6.658    0.000    0.116    0.234
   .item_9            0.173    0.024    7.076    0.000    0.173    0.324
   .item_10           0.219    0.030    7.304    0.000    0.219    0.410
   .item_11           0.146    0.021    6.909    0.000    0.146    0.281
    strand1           0.630    0.095    6.600    0.000    1.000    1.000
    other             0.452    0.077    5.865    0.000    1.000    1.000
     cfi    rmsea      bic      aic 
   0.978    0.074 1925.532 1861.040 

Model 3 - three-factors, based on the corresponding exploratory version

lavaan 0.6.15 ended normally after 50 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        25

  Number of observations                           122

Model Test User Model:
                                                      
  Test statistic                                53.102
  Degrees of freedom                                41
  P-value (Chi-square)                           0.098

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  strand1 =~                                                            
    item_1            1.000                               0.794    0.922
    item_2            0.908    0.057   16.044    0.000    0.721    0.899
    item_3            0.943    0.059   16.101    0.000    0.749    0.900
  strand2. =~                                                           
    item_4            1.000                               0.693    0.881
    item_5            0.960    0.070   13.746    0.000    0.665    0.878
    item_7            1.019    0.073   14.031    0.000    0.706    0.886
    item_10           0.807    0.076   10.646    0.000    0.559    0.765
  strand3. =~                                                           
    item_6            1.000                               0.614    0.853
    item_8            1.013    0.078   13.026    0.000    0.622    0.882
    item_9            1.001    0.084   11.963    0.000    0.615    0.841
    item_11           1.010    0.081   12.461    0.000    0.620    0.861

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  strand1 ~~                                                            
    strand2.          0.459    0.072    6.365    0.000    0.835    0.835
    strand3.          0.418    0.066    6.359    0.000    0.857    0.857
  strand2. ~~                                                           
    strand3.          0.397    0.061    6.501    0.000    0.934    0.934

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .item_1            0.111    0.022    4.972    0.000    0.111    0.149
   .item_2            0.123    0.022    5.714    0.000    0.123    0.192
   .item_3            0.131    0.023    5.682    0.000    0.131    0.190
   .item_4            0.138    0.023    6.123    0.000    0.138    0.223
   .item_5            0.132    0.021    6.188    0.000    0.132    0.230
   .item_7            0.136    0.023    6.033    0.000    0.136    0.215
   .item_10           0.221    0.031    7.139    0.000    0.221    0.415
   .item_6            0.141    0.021    6.548    0.000    0.141    0.272
   .item_8            0.110    0.018    6.146    0.000    0.110    0.222
   .item_9            0.156    0.023    6.675    0.000    0.156    0.293
   .item_11           0.135    0.021    6.461    0.000    0.135    0.259
    strand1           0.631    0.095    6.610    0.000    1.000    1.000
    strand2.          0.480    0.078    6.123    0.000    1.000    1.000
    strand3.          0.377    0.065    5.813    0.000    1.000    1.000
     cfi    rmsea      bic      aic 
   0.991    0.049 1916.708 1846.608 

Model 4 - like model 3, but with literacy on its own

lavaan 0.6.15 ended normally after 61 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        27

  Number of observations                           122

Model Test User Model:
                                                      
  Test statistic                                48.533
  Degrees of freedom                                39
  P-value (Chi-square)                           0.141

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  strand1 =~                                                            
    item_1            1.000                               0.794    0.922
    item_2            0.908    0.057   16.028    0.000    0.721    0.899
    item_3            0.944    0.059   16.111    0.000    0.749    0.901
  strand2.. =~                                                          
    item_4            1.000                               0.697    0.887
    item_5            0.957    0.069   13.976    0.000    0.668    0.881
    item_7            1.017    0.071   14.287    0.000    0.709    0.890
  strand3. =~                                                           
    item_6            1.000                               0.612    0.851
    item_8            1.018    0.078   13.030    0.000    0.623    0.884
    item_9            1.005    0.084   11.942    0.000    0.615    0.842
    item_11           1.011    0.082   12.363    0.000    0.619    0.859
  strand3 =~                                                            
    item_10           1.000                               0.731    1.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  strand1 ~~                                                            
    strand2..         0.457    0.072    6.338    0.000    0.826    0.826
    strand3.          0.417    0.066    6.351    0.000    0.857    0.857
    strand3           0.387    0.066    5.877    0.000    0.666    0.666
  strand2.. ~~                                                          
    strand3.          0.393    0.061    6.467    0.000    0.921    0.921
    strand3           0.378    0.061    6.189    0.000    0.742    0.742
  strand3. ~~                                                           
    strand3           0.339    0.055    6.167    0.000    0.759    0.759

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .item_1            0.111    0.022    4.982    0.000    0.111    0.150
   .item_2            0.124    0.022    5.719    0.000    0.124    0.192
   .item_3            0.131    0.023    5.672    0.000    0.131    0.189
   .item_4            0.131    0.022    5.865    0.000    0.131    0.213
   .item_5            0.128    0.021    5.990    0.000    0.128    0.224
   .item_7            0.131    0.023    5.798    0.000    0.131    0.207
   .item_6            0.143    0.022    6.592    0.000    0.143    0.276
   .item_8            0.108    0.018    6.127    0.000    0.108    0.218
   .item_9            0.155    0.023    6.681    0.000    0.155    0.291
   .item_11           0.136    0.021    6.502    0.000    0.136    0.262
   .item_10           0.000                               0.000    0.000
    strand1           0.631    0.095    6.607    0.000    1.000    1.000
    strand2..         0.486    0.079    6.181    0.000    1.000    1.000
    strand3.          0.375    0.065    5.789    0.000    1.000    1.000
    strand3           0.534    0.068    7.810    0.000    1.000    1.000
     cfi    rmsea      bic      aic 
   0.993    0.045 1921.747 1846.038 

Overall similar results for all models when using the entire Staff GTR dataset:

model cfi rmsea bic aic
model_0 0.993 0.045 9278.842 9160.260
model_1 0.940 0.123 9599.130 9502.507
model_2 0.990 0.050 9274.897 9173.883
model_3 0.992 0.047 9273.831 9164.033
model_4 0.994 0.040 9268.671 9150.089

Association with teaching value added estimates

As previously, we consider observations taken in autumn. To retain more information where available, we drop missing values on a pairwise availability basis, and employ linear interpolants to provide a sense of trends. Non-linear smoothers can also be presented, if of interest.

The following subsections are labelled as per the MAT’s thematic grouping of the observation items. We present both the overall and by-subject linear association between estimated teacher effect and observation scores (simply coded as numerical).

Each colour represents one model used to estimate the teacher effect, as outlined below:

We have chosen to present association as estimated by each of the models as a sensitivity check on our assumptions.
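
A sketch of how one such panel could be drawn with ggplot2, assuming a hypothetical teacher-level data frame teacher_level holding the observation score, the estimated teacher effect and the model label:

library(ggplot2)

ggplot(teacher_level,
       aes(x = observation_score, y = est_teacher_effect, colour = model)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +   # one linear interpolant per model
  facet_wrap(~ subj)                         # subject-specific panels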

Strand 1

Overall linear trends:

Subject-specific linear trends:

Strand 2

Overall linear trends:

Subject-specific linear trends:

Strand 3

Overall linear trends:

Subject-specific linear trends:

Strand 4

Overall linear trends:

Subject-specific linear trends: