Ofqual have provided a detailed technical report about the algorithm they used to determine the grades to be given for A level and GCSE students in England in 2020. Here, I’ll try to explain it step by step. Note this is not (currently) a critique; I’m just sharing it to help people understand it.
If you have any questions or things that you’d like more explanation around, please contact me on Twitter and I’ll do my best.
The algorithm itself is described in Section 8 of Ofqual’s technical report (p83). It includes the following steps:
This algorithm is used if a school has more than fifteen children doing an A level or GCSE in a given subject.
If a school has five or fewer children doing an A level or GCSE in a given subject, steps 1-7 get skipped, and the rough grades that get used to allocate marks to students are based on the grades their teachers originally predicted for them.
If a school has between five and fifteen children doing an A level or GCSE in a given subject, then a combination of the teacher predictions and the algorithmic predictions get used.
As teachers overall tend to over-estimate grades, this means overall scores will tend to be higher for small classes.
In the first step, Ofqual create a historic profile of the grades pupils have previously achieved for each subject offered at each school.
For A levels, this historic profile looks at the past three years of results.
For GCSEs it’s a bit more complicated because they have recently gone through reform: the grading has changed to numbers (9 to 1) and the curriculum has also changed to be more demanding. This reform has been staggered in phases. For maths and English (Phase One) this happened in 2017. For most other popular GCSEs (Phase Two, including science, humanities and modern foreign languages like French and Spanish) it was in 2018. For less common GCSEs (Phase Three, including astronomy, sociology and Polish), the only grades we have come from 2019, the first year the reformed curriculum came into play.
As a consequence, for Phase One and Phase Two GCSEs, two years of historic data is used. For Phase Three GCSEs, one year of historic data is used. Phase Four GCSEs don’t have any historic data, so, again, raw teacher predictions are used for these subjects (biblical Hebrew, Gujarati, Persian, Portuguese and Turkish).
This example, used in the technical report, shows how this is done for A levels, but it’s similar for GCSEs.
Year | A* | A | B | C | D | E | U | Total students |
---|---|---|---|---|---|---|---|---|
2017 | 2 | 5 | 7 | 9 | 4 | 2 | 1 | 30 |
2018 | 3 | 3 | 7 | 11 | 5 | 2 | 1 | 32 |
2019 | 1 | 3 | 7 | 8 | 4 | 2 | 2 | 27 |
If you prefer something more visual, here’s a graph of the same numbers.
The majority of the algorithm works on the percentage of students achieving at least a given grade. Here’s what that looks like:
In the algorithm the total number of students from the last one, two or three years achieving at least a given grade is converted into a percentage. The percentage achieving at least the lowest grade is always 100%.
A*+ | A+ | B+ | C+ | D+ | E+ | U+ |
---|---|---|---|---|---|---|
6.7 | 19.1 | 42.7 | 74.2 | 88.8 | 95.5 | 100 |
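As a rough sketch of this step, the snippet below pools the three years of results from the example table and converts them into "at least this grade" percentages. The names `HISTORIC` and `cumulative_percentages` are made up for illustration; they don’t come from Ofqual’s report.

```python
# Grade counts per year, copied from the example table (columns: A*, A, B, C, D, E, U).
HISTORIC = {
    2017: [2, 5, 7, 9, 4, 2, 1],
    2018: [3, 3, 7, 11, 5, 2, 1],
    2019: [1, 3, 7, 8, 4, 2, 2],
}

def cumulative_percentages(counts_by_year):
    """Pool all years, then return the % of students achieving at least each grade."""
    # Add the years together grade by grade.
    totals = [sum(column) for column in zip(*counts_by_year.values())]
    n_students = sum(totals)
    running = 0
    result = []
    for count in totals:
        running += count  # students with this grade or better
        result.append(round(100 * running / n_students, 1))
    return result

historic_pct = cumulative_percentages(HISTORIC)
print(historic_pct)
```

Pooling the years before taking percentages means a year with more students counts for more than a year with fewer.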
In this step, Ofqual zooms out to look at the whole of England and all the students who have studied and got results in the relevant qualifications. Then Ofqual looks at their prior attainment: how well they did in their GCSEs for A levels, or their Key Stage 2 results for GCSEs.
Key Stage 2 results come from testing at the end of Year 6 (primary school) and give pupils scores in reading, writing and maths. These scores represent a general level of achievement at the end of primary school and are commonly used as a basis to understand how much progress children make during secondary school.
Rather than using raw scores from this prior attainment, Ofqual created ten buckets, each containing 10% of students, based on their GCSE or Key Stage 2 results.
How do these measures of prior attainment relate to the final grades students achieve in different subjects? To understand that, Ofqual create a matrix for each subject. Each row contains a different prior attainment (one of the ten buckets as described above) and each column an A level or GCSE grade as applicable. Here’s an example of what that table looks like.
As you might expect, looking across the whole of England, in general students that do well on Key Stage 2 will do well on GCSEs, and those that do well at GCSEs will do well at A levels. However, there are also some students who despite doing poorly at Key Stage 2 do very well on particular subjects at GCSE, and vice versa.
This matrix can be used to predict how many students will achieve particular grades in a subject, assuming they performed at the national average. To make it useful for doing that, all the numbers are again turned into percentages, and instead of looking at the percentage that achieved each grade, you look at the percentage that achieved at least that grade.
In this step, Ofqual look at the final results of previous students compared to that cohort’s prior attainment. They’re interested in seeing what pattern of grades the school would have got, if it had performed in a way aligned with the national average. So they count up pupils from previous years (eg 2017-2019 for A levels; 2018 and 2019, or just 2019 depending on the subject for GCSEs) and see what buckets their prior attainment (GCSEs for A levels, Key Stage 2 for GCSEs) fell into.
Here’s an example of what that prior attainment pattern might look like:
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|
9 | 7 | 10 | 16 | 11 | 12 | 10 | 3 | 5 | 2 |
Now we can feed these numbers into the matrix we created earlier. So for example, there are 10 students in the GCSE prior attainment category ‘3’. If we look back at the matrix, we can see the relevant percentage of these students who get at least each grade:
Prior | A*+ | A+ | B+ | C+ | D+ | E+ | U+ |
---|---|---|---|---|---|---|---|
3 | 2.8 | 23 | 56.1 | 83.9 | 96.4 | 99.7 | 100 |
These are percentages, so we can divide the numbers by 10 to understand how the 10 previous “prior attainment bucket 3” students in this school would have fared. Out of the 10 students, we would expect 0.28 of them to achieve an A*, 2.3 to achieve an A or more, 5.61 to achieve a B or more, and 9.97 to achieve an E or more.
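Here is the same arithmetic as a small sketch, using the bucket 3 row quoted above. The function name `expected_students` is just an illustrative choice.

```python
# National "at least this grade" percentages for prior attainment bucket 3,
# copied from the row quoted above.
BUCKET_3_PCT = {
    "A*": 2.8, "A": 23.0, "B": 56.1, "C": 83.9,
    "D": 96.4, "E": 99.7, "U": 100.0,
}

def expected_students(cumulative_pct, n_students):
    """Scale cumulative percentages to an expected (fractional) number of students."""
    return {
        grade: round(pct / 100 * n_students, 2)
        for grade, pct in cumulative_pct.items()
    }

# The school had 10 previous students in bucket 3.
bucket_3_expected = expected_students(BUCKET_3_PCT, 10)
print(bucket_3_expected)
```

The results are fractions of a student because they are expectations rather than head counts.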
The algorithm performs the same calculation for each of the prior attainment buckets, and creates another matrix that shows how many of these previous students would have been expected to achieve each grade.
Prior attainment bucket | A*+ | A+ | B+ | C+ | D+ | E+ | U+ |
---|---|---|---|---|---|---|---|
1 | 3.2 | 7.1 | 8.5 | 8.9 | 9.0 | 9.0 | 9 |
2 | 0.6 | 3.1 | 5.4 | 6.6 | 6.9 | 7.0 | 7 |
3 | 0.3 | 2.3 | 5.6 | 8.4 | 9.6 | 10.0 | 10 |
4 | 0.2 | 2.0 | 6.4 | 11.6 | 14.9 | 15.8 | 16 |
5 | 0.1 | 0.7 | 2.9 | 6.5 | 9.5 | 10.8 | 11 |
6 | 0.0 | 0.5 | 2.0 | 5.5 | 9.4 | 11.5 | 12 |
7 | 0.0 | 0.2 | 1.1 | 3.6 | 7.0 | 9.3 | 10 |
8 | 0.0 | 0.0 | 0.3 | 0.9 | 1.9 | 2.7 | 3 |
9 | 0.0 | 0.0 | 0.3 | 1.0 | 2.5 | 4.1 | 5 |
10 | 0.0 | 0.0 | 0.1 | 0.3 | 0.7 | 1.3 | 2 |
Now we can add all these predicted numbers together for each grade, to give the total number of students we think would have achieved at least each grade, if the school was performing the same as the average English school. For example, we expect there to be 8.5 students from prior attainment bucket 1 getting at least a B, 5.4 from bucket 2, 5.6 from bucket 3 and so on. Add all those up and we come to 32.6 students getting at least a B.
A*+ | A+ | B+ | C+ | D+ | E+ | U+ |
---|---|---|---|---|---|---|
4.4 | 15.9 | 32.6 | 53.3 | 71.4 | 81.5 | 85 |
Again, these can get turned into percentages:
A*+ | A+ | B+ | C+ | D+ | E+ | U+ |
---|---|---|---|---|---|---|
5.2 | 18.7 | 38.4 | 62.7 | 84 | 95.9 | 100 |
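To check the addition, here is a sketch that sums the expected students per bucket (the rounded figures from the table above) and converts the totals into percentages. The variable names are illustrative only.

```python
# Expected students achieving at least each grade, one row per prior
# attainment bucket (1 to 10), copied from the table above.
# Columns: A*+, A+, B+, C+, D+, E+, U+.
EXPECTED = [
    [3.2, 7.1, 8.5, 8.9, 9.0, 9.0, 9],
    [0.6, 3.1, 5.4, 6.6, 6.9, 7.0, 7],
    [0.3, 2.3, 5.6, 8.4, 9.6, 10.0, 10],
    [0.2, 2.0, 6.4, 11.6, 14.9, 15.8, 16],
    [0.1, 0.7, 2.9, 6.5, 9.5, 10.8, 11],
    [0.0, 0.5, 2.0, 5.5, 9.4, 11.5, 12],
    [0.0, 0.2, 1.1, 3.6, 7.0, 9.3, 10],
    [0.0, 0.0, 0.3, 0.9, 1.9, 2.7, 3],
    [0.0, 0.0, 0.3, 1.0, 2.5, 4.1, 5],
    [0.0, 0.0, 0.1, 0.3, 0.7, 1.3, 2],
]

# Sum each column across the ten buckets.
totals = [round(sum(column), 1) for column in zip(*EXPECTED)]

# The last column (U+) is everyone, so it serves as the denominator.
predicted_pct = [round(100 * t / totals[-1], 1) for t in totals]

print(totals)
print(predicted_pct)
```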
We can compare these numbers with the actual attainment in the school that we looked at in step one.
Minimum grade | Actual percentage | Predicted percentage |
---|---|---|
A*+ | 6.7 | 5.2 |
A+ | 19.1 | 18.7 |
B+ | 42.7 | 38.4 |
C+ | 74.2 | 62.7 |
D+ | 88.8 | 84.0 |
E+ | 95.5 | 95.9 |
U+ | 100.0 | 100.0 |
In this example, the school performs slightly better than the national average. For example, just over 74% of students achieve a C or more, compared to the prediction (based on the prior attainment of the students who achieved these grades) of just under 63%.
Ofqual next predict the achievement of the current cohort of students in the class they’re interested in, in exactly the same way as for previous students, based on the performance of students nationally.
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|
4 | 4 | 3 | 6 | 5 | 6 | 1 | 1 | 1 | 0 |
Grade | Actual percentage of previous students | Predicted percentage of previous students | Predicted percentage of current students |
---|---|---|---|
A*+ | 6.7 | 5.2 | 6.5 |
A+ | 19.1 | 18.7 | 21.9 |
B+ | 42.7 | 38.4 | 43.9 |
C+ | 74.2 | 62.7 | 67.7 |
D+ | 88.8 | 84.0 | 88.1 |
E+ | 95.5 | 95.9 | 97.1 |
U+ | 100.0 | 100.0 | 100.0 |
Here, we can see the current intake is predicted to perform better than previous cohorts. This is because it has a higher proportion of students who had good prior attainment (GCSEs for A levels, Key Stage 2 results for GCSEs).
The data that Ofqual have about students isn’t perfect. In particular, it’s not always possible to match a given student against their prior attainment (GCSEs or Key Stage 2 results). They might not have taken them (for example if they’ve recently moved to England), or there might be problems in the data itself, such as students changing their names.
Table D.1 on page 195 of the technical report provides data about the proportion of centres in which matches were possible between pupils’ previous attainment at Key Stage 2 and GCSE.
This is important because low match rates lead to Ofqual just looking at the historic results (from Step One), without trying to adjust those results based on the difference between the prior attainment of previous students and the current students.
Not having a good match percentage benefits students if past students did well in a subject, but it might disadvantage them if past students did poorly.
The proportion of poor matching varies by subject, as can be seen here for GCSEs:
This has a differential effect because some subjects are more popular than others.
Combining these, we can see how many schools are affected by low levels of matching with prior attainment data.
You can see there are very low levels of matching for English and maths in around 1000 of the just over 5000 schools. In some subjects, for example Italian, there are very low levels of matching in a large proportion of the schools that offer the subject.
This is significant because, as we’ll see next, when there are low levels of matching Ofqual again fall back to the grades achieved by previous cohorts of students.
Here are the equivalent graphs for A level subjects.
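Ofqual now have three sets of grades for each subject in each school: the actual distribution achieved by previous students, the predicted distribution for previous students, and the predicted distribution for current students.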
Both predicted distributions are based on what was achieved nationally in the subject, in previous years, by children with similar prior attainment (GCSEs or Key Stage 2 results).
Now Ofqual have the information they need to produce a target distribution of grades for each school.
In schools where there are no matches on students (either historic ones or current ones), the algorithm uses the actual distribution of grades from previous years as the target distribution for the current year. So in about 1000 schools, the target distribution for maths will be exactly the same as the average in those schools of the last one or two years.
Grade | Actual percentage of previous students | Predicted percentage of previous students | Predicted percentage of current students | Grade adjustment | Target percentage |
---|---|---|---|---|---|
A*+ | 6.7 | 5.2 | 6.5 | 0 | 6.7 |
A+ | 19.1 | 18.7 | 21.9 | 0 | 19.1 |
B+ | 42.7 | 38.4 | 43.9 | 0 | 42.7 |
C+ | 74.2 | 62.7 | 67.7 | 0 | 74.2 |
D+ | 88.8 | 84.0 | 88.1 | 0 | 88.8 |
E+ | 95.5 | 95.9 | 97.1 | 0 | 95.5 |
U+ | 100.0 | 100.0 | 100.0 | 0 | 100.0 |
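In schools where there’s full visibility of all the prior attainment for previous and current students, on the other hand, the calculation is a bit more complicated: the grade adjustment is the predicted percentage for current students minus the predicted percentage for previous students, and the target percentage is the actual percentage for previous students plus that adjustment.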
With the example that we’ve been using, this would work out as follows:
Grade | Actual percentage of previous students | Predicted percentage of previous students | Predicted percentage of current students | Grade adjustment | Target percentage |
---|---|---|---|---|---|
A*+ | 6.7 | 5.2 | 6.5 | 1.3 | 8.0 |
A+ | 19.1 | 18.7 | 21.9 | 3.2 | 22.3 |
B+ | 42.7 | 38.4 | 43.9 | 5.5 | 48.2 |
C+ | 74.2 | 62.7 | 67.7 | 5.0 | 79.2 |
D+ | 88.8 | 84.0 | 88.1 | 4.1 | 92.9 |
E+ | 95.5 | 95.9 | 97.1 | 1.2 | 96.7 |
U+ | 100.0 | 100.0 | 100.0 | 0.0 | 100.0 |
For classes in between, where some but not all students can be matched to their prior attainment, the result is weighted towards the historic actual grade distribution for the school. For example, if 89% of the students can be matched to their prior attainment, and 11% can’t, it would end up as follows:
Grade | Historic actual percentage | Historic predicted percentage | Current predicted percentage | Grade adjustment | Target percentage |
---|---|---|---|---|---|
A*+ | 6.7 | 5.2 | 6.5 | 1.3 | 7.9 |
A+ | 19.1 | 18.7 | 21.9 | 3.2 | 21.9 |
B+ | 42.7 | 38.4 | 43.9 | 5.5 | 47.6 |
C+ | 74.2 | 62.7 | 67.7 | 5.0 | 78.7 |
D+ | 88.8 | 84.0 | 88.1 | 4.1 | 92.4 |
E+ | 95.5 | 95.9 | 97.1 | 1.2 | 96.6 |
U+ | 100.0 | 100.0 | 100.0 | 0.0 | 100.0 |
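The three cases (no matching, full matching and partial matching) can be sketched with one function. The formula here is inferred from the numbers in the tables above, not quoted from the report: the grade adjustment is the current prediction minus the historic prediction, scaled by the proportion of matched students. The name `target_distribution` and its `match_rate` parameter are assumptions made for this sketch.

```python
# Cumulative percentages copied from the example tables (A*+ down to U+).
ACTUAL = [6.7, 19.1, 42.7, 74.2, 88.8, 95.5, 100.0]     # previous students, actual
PRED_PREV = [5.2, 18.7, 38.4, 62.7, 84.0, 95.9, 100.0]  # previous students, predicted
PRED_CURR = [6.5, 21.9, 43.9, 67.7, 88.1, 97.1, 100.0]  # current students, predicted

def target_distribution(actual, pred_prev, pred_curr, match_rate):
    """Historic actual percentage plus the grade adjustment scaled by match_rate.

    match_rate is the proportion of students matched to prior attainment
    (0.0 to 1.0). Assumption: the scaling is inferred from the example tables.
    """
    return [
        round(a + match_rate * (c - p), 1)
        for a, p, c in zip(actual, pred_prev, pred_curr)
    ]

no_match = target_distribution(ACTUAL, PRED_PREV, PRED_CURR, 0.0)
full_match = target_distribution(ACTUAL, PRED_PREV, PRED_CURR, 1.0)
partial = target_distribution(ACTUAL, PRED_PREV, PRED_CURR, 0.89)

print(no_match)
print(full_match)
print(partial)
```

With a match rate of 0 the target is simply the historic distribution, and with a match rate of 1 the full adjustment is applied. The 89% case lands between the two and agrees with the table to within rounding.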
Ofqual are now in a position to assign rough grades to students based on the ranking provided by teachers. Students in a class are assigned grades such that the percentage getting at least that grade is at minimum the target percentage worked out in the previous step.
For example, if there are 35 students, each one is equivalent to 2.9 percentage points. So if the target percentage of students to get As or A*s is 21.5%, 7.5 students can get those grades at this school in this subject. Obviously that’s not a whole number. In the report from Ofqual, they round up, effectively giving more students the higher grades, in this case 8 students.
35 students in a class with a target distribution as calculated above would be distributed as follows:
Grade | Target percentage | Calculated students | Final students | Final percentage |
---|---|---|---|---|
A*+ | 7.9 | 2.8 | 3 | 8.6 |
A+ | 21.9 | 7.7 | 8 | 22.9 |
B+ | 47.6 | 16.7 | 17 | 48.6 |
C+ | 78.7 | 27.5 | 28 | 80.0 |
D+ | 92.4 | 32.3 | 33 | 94.3 |
E+ | 96.6 | 33.8 | 34 | 97.1 |
U+ | 100.0 | 35.0 | 35 | 100.0 |
Looking at this another way, the number of students in the class of 35 that will get each grade is as follows:
A* | A | B | C | D | E | U |
---|---|---|---|---|---|---|
3 | 5 | 9 | 11 | 5 | 1 | 1 |
A smaller class, of 15 students, would be distributed differently:
Grade | Target percentage | Calculated students | Final students | Final percentage |
---|---|---|---|---|
A*+ | 7.9 | 1.2 | 2 | 13.3 |
A+ | 21.9 | 3.3 | 4 | 26.7 |
B+ | 47.6 | 7.1 | 8 | 53.3 |
C+ | 78.7 | 11.8 | 12 | 80.0 |
D+ | 92.4 | 13.9 | 14 | 93.3 |
E+ | 96.6 | 14.5 | 15 | 100.0 |
U+ | 100.0 | 15.0 | 15 | 100.0 |
Resulting in:
A* | A | B | C | D | E | U |
---|---|---|---|---|---|---|
2 | 2 | 4 | 4 | 2 | 1 | 0 |
Because of the way rounding up is applied, classes with lower numbers of students in them are less likely to have students that get a U.
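The rounding-up rule can be sketched as follows, using the target percentages from the partial-matching example. The function `allocate` is an illustrative name, and it only reproduces the counting, not the assignment of grades to individual ranked students.

```python
import math

GRADES = ["A*", "A", "B", "C", "D", "E", "U"]
# Target "at least this grade" percentages from the example above.
TARGET_PCT = [7.9, 21.9, 47.6, 78.7, 92.4, 96.6, 100.0]

def allocate(target_pct, class_size):
    """Return how many students get each grade when cumulative counts are rounded up."""
    # Round the cumulative head count up at every grade. The inner round()
    # guards against floating-point noise turning an exact whole number
    # into something fractionally above it.
    cumulative = [
        math.ceil(round(pct * class_size / 100, 9)) for pct in target_pct
    ]
    # Convert "at least this grade" counts into counts for each grade.
    per_grade = [cumulative[0]] + [
        later - earlier for earlier, later in zip(cumulative, cumulative[1:])
    ]
    return dict(zip(GRADES, per_grade))

class_of_35 = allocate(TARGET_PCT, 35)
class_of_15 = allocate(TARGET_PCT, 15)
print(class_of_35)
print(class_of_15)
```

In the class of 15, rounding 14.5 up to 15 at the E boundary uses up every student, which is why no one is left to receive a U.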
In the normal run of things, students get marks before they get grades, and the mark (from the exam and coursework) determines the grade they get. Exam boards always go through a process of adjusting the grade boundaries so that roughly the desired proportions of children get different grades.
Obviously there aren’t any marks from exams and coursework this year, but marks are still needed for this standard process of setting boundaries to give the right national picture. So the next step in the process is to assign marks to the students based on the grades they have been given.
To do this, each grade is broken up into 100 marks, and each student is given a mark within the grade based on their rank within the grade.
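As a very rough illustration of the idea, the sketch below spreads the students in one grade evenly across a 100-mark band according to their rank. The even spacing, the function name `marks_within_grade` and the `band_floor` parameter are all assumptions made for this sketch; the exact formula Ofqual used isn’t reproduced here.

```python
def marks_within_grade(n_students, band_floor, band_width=100):
    """Spread n_students evenly across one grade's band of marks.

    Rank 1 is the strongest student in the grade and gets the highest mark.
    Assumption: students sit at the midpoints of equal-width slices.
    """
    return [
        band_floor + band_width * (n_students - rank + 0.5) / n_students
        for rank in range(1, n_students + 1)
    ]

# Four students in a grade whose (hypothetical) band starts at 300 marks.
print(marks_within_grade(4, 300))
# A single student in the grade sits in the middle of the band.
print(marks_within_grade(1, 300))
```

Under this assumption, the more students share a grade, the closer the top and bottom ranked ones sit to the edges of the band.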
Another way of looking at this graph is to consider the grade boundaries against the scores the students are given:
The final step is the adjustment of the grade boundaries so that, overall across England, the distribution of scores across the different grades is similar to what it was in the same subject in previous years.
There is data about overall grades in different subjects available for A levels and GCSEs.
Looking at A levels, the distributions differ by subject as follows:
This recalibration process will include students whose grades were allocated from the predictions provided by their teachers. In general, these students will have been given higher scores than the students whose grades come through the standardisation process (given that teachers tend to overestimate in their predictions). So this recalibration will push students’ grades down further.
Furthermore, because of the process of allocating marks to students, student marks will be closer to grade boundaries (and thus subject to revision) when they are in classes with large numbers of other students performing at the same level as them. This works in both directions (revisions up and down) but is most likely to result in downgrading of marks.