30 Sep Dissecting Progress 8. The good, the bad and the ugly.
Now that Progress 8 has been born and looks like it’s here to stay, there’s no use wallowing for too long in outrage at the data-garbage bonkersness of it all. We might all agree that averaging out every child’s achievements across the school via a super-convoluted and arbitrary algorithm, to generate a single number with a massive margin of error by which schools are then judged and ranked, is INSANE. But let’s move on. Let’s explore what P8 tells us. Having received the detailed data from the DfE on Monday, we’ve been examining how our P8 is constructed and, for some time now, we’ve been discussing how to use it constructively. Here are some observations.
1. Curriculum structure
A positive feature of the Attainment 8, Progress 8 structure is that it does actually give a healthy weighting towards all students having a broad curriculum. The Ebacc slots and Open slots need to be filled in order to secure the best possible outcomes. Our most recent cohort followed a curriculum model I inherited that was not remotely designed to yield P8 maximisation. Of 200 students, only 136 had all 8 slots filled; 64 had at least one slot empty. The students with a full set of P8 slots averaged 0.3 higher than our overall P8 score. That’s an own goal in data terms but, more importantly, it’s unhealthy for the students’ curriculum breadth. P8 does, therefore, introduce a strongly positive incentive to ensure all students are entered for enough subjects. There’s no incentive to fuel non-entries. A G counts more than a zero. There’s no incentive to reduce Science to Core instead of Core+Additional.
There’s a mechanism that allocates subjects to slots to maximise the P8 score for Ebacc at the expense of the Open slots. That’s important to note. When French or History could be exchanged in the different slots, the highest score seems to go to the Ebacc slot – so the Open slot P8 average is likely to be less; it’s important not to see this as a performance issue; it’s largely a technical one.
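To see why the Open slot average tends to come out lower, here is a rough Python sketch of that kind of allocation. It is a simplification under stated assumptions, not the DfE's actual procedure: the function name `allocate_slots`, the subjects and the point scores are all made up, the English and Maths slots are ignored, and the only rule modelled is that the three Ebacc slots take the highest-scoring Ebacc subjects first, with the Open slots taking the best of whatever is left.

```python
def allocate_slots(results, ebacc_subjects, n_ebacc=3, n_open=3):
    """Fill Ebacc slots first, then Open slots, highest points first.

    results: dict of subject -> points (hypothetical values).
    ebacc_subjects: set of subjects assumed to count as Ebacc.
    Simplified: the English and Maths slots are not modelled.
    """
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    # The best Ebacc-eligible results go into the Ebacc slots.
    ebacc = [(s, p) for s, p in ranked if s in ebacc_subjects][:n_ebacc]
    used = {s for s, _ in ebacc}
    # Everything not already used competes for the Open slots.
    open_slots = [(s, p) for s, p in ranked if s not in used][:n_open]
    return ebacc, open_slots


# Dummy results for one student (not real data).
results = {"French": 7, "History": 6, "Science": 5,
           "Geography": 4, "Art": 6, "Drama": 3}
ebacc_subjects = {"French", "History", "Science", "Geography"}

ebacc, open_slots = allocate_slots(results, ebacc_subjects)
ebacc_mean = sum(p for _, p in ebacc) / len(ebacc)
open_mean = sum(p for _, p in open_slots) / len(open_slots)
print("Ebacc slots:", ebacc, "mean", round(ebacc_mean, 2))
print("Open slots:", open_slots, "mean", round(open_mean, 2))
```

With these dummy numbers, Geography is pushed into an Open slot because three stronger Ebacc results got there first, which drags the Open average down without saying anything about how well the Open subjects were taught.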
The impact of non-Ebacc Humanities subjects is interesting. We had 8 students taking either Sociology or RE and no MFL; this left them with a vacant Ebacc slot because of the folly of denigrating these subjects relative to History and Geography. (You’ll never persuade me that Sociology is less academically rigorous than Geography.) However, at this scale, the impact is marginal on the overall P8 figure; it’s not a big deal to stick to our principles here instead of insisting on History or Geography for the non-linguists (who would be students taking extra English and Maths instead of MFL).
2. Every grade counts
Above all else, this is the most positive aspect of Progress 8: every grade counts. I think this will be influential. We have seen that, as a consequence of the 5A*-C culture being so intense for so long, grades below C have had a falling-off-the-cliff effect. In other words, Ds slip to Es; Es slip to Gs – because the value given to the outcomes from all concerned has been too low. Now, the P8 mechanism reinforces the value of every grade. If a student falls back from a C prediction to a D, it’s vital to hold them there at the very least. The needs of a student on a B who could get an A need to be regarded as being as vital as those of a student moving from E to D. It’s crucial to give students a sense that any grade is better than none. I think this will change our psychology quite a bit. Hopefully any residual tendency in some departments to debate whether to enter students at the lowest end, for fear of getting Us, will stop. Who cares about Us if there’s a chance of a G or an F?
3. Cohort Profiles
I’ve found seeing the students’ individual Attainment 8 and Progress 8 scores hugely valuable. The biggest flaw in the P8 concept is that it ends up as an average. Averages are terrible measures – because school cohorts are so much more varied than a single number can show. I’ve produced profiles of our P8 and A8 scores so that I can see the range. This tells me far more than the average does. Year on year, the comparison of these profiles will be very instructive. Here are some illustrations based on dummy data, not my school’s.
The balance of students achieving P8 averages above and below zero and the numbers gaining very high or very low scores is fascinating. Similarly, the profile of Attainment 8 scores is very interesting. It shows how wide the spread of outcomes is in my school. We might, in future, find success raising the very lowest end, the middle range or the very highest end – but we shall see by looking at the profile. The average P8 score will lose this detail.
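For anyone wanting to build a profile like this from a list of individual P8 scores, a minimal Python sketch follows. The scores are dummy values, the function name `p8_profile` is hypothetical, and the band width of 0.5 is just an assumption; any width that suits the cohort would do.

```python
from collections import Counter
import math


def p8_profile(scores, band_width=0.5):
    """Count students per P8 band; each band is keyed by its lower edge."""
    counts = Counter(math.floor(s / band_width) * band_width for s in scores)
    return dict(sorted(counts.items()))


# Dummy individual P8 scores (not real data).
dummy_scores = [-3.2, -1.4, -0.6, -0.3, -0.1, 0.0, 0.2, 0.4, 0.7, 1.1, 1.6]
band_width = 0.5

profile = p8_profile(dummy_scores, band_width)
above_zero = sum(s > 0 for s in dummy_scores)
below_zero = sum(s < 0 for s in dummy_scores)

# Crude text histogram: one '#' per student in each band.
for lower, n in profile.items():
    print(f"{lower:+.1f} to {lower + band_width:+.1f}: {'#' * n}")
print("above zero:", above_zero, "below zero:", below_zero)
```

Running the same function on successive cohorts gives the year-on-year comparison of profiles, rather than a comparison of two averages.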
4. The significant impact of outliers
This has been a huge eye-opener. In my school we had a minority of students who found the GCSE run-in very difficult to engage with; various chronic mental health or family difficulties took their toll and we had a tail of students who did not do well. The impact they had on P8 was HUGE. Our lowest scoring 18 students reduced our P8 score by 0.3. Our lowest scoring 7 students, who only took a smattering of exams and had individual P8 scores of -3.0 or below, reduced our overall P8 by 0.14. That’s about the same size as the whole confidence interval of roughly +/- 0.15. It’s hugely significant. The effect of these students outweighs any number of other tweaks and imperfections elsewhere. A score of -3.0 means one student lost 30 Attainment 8 points from what was expected; that’s the same as 30 students each dropping one grade in one subject.
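The arithmetic is easy to check. The Python sketch below uses an invented cohort, not my school's data: 200 students, 7 of them at -4.0 and everyone else at exactly 0.0, with a hypothetical helper `outlier_impact` that compares the cohort mean with and without the lowest scorers. It also relies on the fact that a pupil's P8 score is their Attainment 8 shortfall divided by 10.

```python
def mean(values):
    return sum(values) / len(values)


def outlier_impact(scores, n_lowest):
    """How far the n lowest scores pull the cohort mean down."""
    ranked = sorted(scores)
    return mean(ranked[n_lowest:]) - mean(ranked)


# Dummy cohort of 200: 7 students at -4.0, the other 193 at 0.0.
dummy_cohort = [-4.0] * 7 + [0.0] * 193
impact = outlier_impact(dummy_cohort, 7)
print("P8 lost to the lowest 7:", round(impact, 2))

# A pupil's P8 is (actual A8 - estimated A8) / 10, so a P8 of -3.0
# corresponds to 30 Attainment 8 points below the estimate.
a8_points_lost = 3.0 * 10
print("A8 points lost by one student at -3.0:", a8_points_lost)
```

In this made-up cohort the other 193 students are exactly on estimate, yet the school figure still moves by 0.14 because of seven individuals.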
To me, the mathematical weight given to the performance of outliers is a deep flaw in the whole mechanism. A few students who underachieve miserably create a totally disproportionate effect relative to the performance of the cohort as a whole. When schools are ranked by their P8 scores, this will be hidden; the true story will not be known and that is wrong. It seems absolutely obvious to me that schools should be able to remove outliers from their data in order to present a meaningful figure.
The implications are that we need to watch out very carefully for students who fall off the GCSE cliff. Writing students off is absolutely out of the question. Every grade we can get out of every student, whatever it takes with the resources we have, will matter. Previously, a student below the 5A*-C EM threshold might have represented a disappointment in a binary sense (over the bar, under the bar). Now, exactly how far they fall makes a big difference in this measure.
5. Diagnostic power
Finally, despite the problems with P8 as a meaningful, absolute measure of progress, there is some value in using internal comparison to trigger support and interventions. Previously, we might have targeted the C/D borderline students. Now, we’re looking at our lowest scoring P8 students. The lists are similar but not entirely the same. Here’s a small sample, with the P8 scores in rank order on the right:
Using colour coding in SIMS to trigger responses to concerns around individual subject grades is helpful. However, global support can be targeted at students with low P8 predictions. This includes some students who are securing Cs and Bs but should be getting As as well as those securing Fs who should be securing Ds and Cs. The P8 number itself means nothing of value; it’s only its position relative to the scale within the cohort that matters and this is useful.
My conclusion is that the most useful aspects of P8 are those that push us to support more students to succeed in more subjects, across the attainment spectrum. Even without playing any artificial games to plug some of the vacant slots with unrigorous qualifications, the P8 measure will be a positive influence on our thinking around curriculum entitlement for our lowest attaining students. Where it falls down is when all the detail is averaged out, losing all the meaning and nuance in what is a complex data set. Data profiles are the way to go as I’ve said before. P8 cohort profiles could be a thing of the future…that’s what we’ll be doing for sure.
- This is an interesting post on P8 as a floor standard measure from datalab: http://educationdatalab.org.uk/2016/10/how-should-we-define-ks4-floor-standards/
- I’ve found that the raw data in the P8 files allows us to calculate a useful and illuminating two-way table of P8 figures: Disadvantaged/Non-disadvantaged and Low, Middle and High prior attainers. This yields a figure for High prior attaining Disadvantaged students distinct from the overall figures for either category. In my school the figures vary and we have a different trend in relation to prior attainment for the PP and non-PP subgroups.
- I realised late in the day that there would be a national P8 score for Disadvantaged/Pupil Premium students overall – because their Attainment 8 is compared to the median attainment of all students for a given KS2 fine level. A message on Twitter suggested this is -0.4. I don’t know the source. That’s significantly below 0.0. In a school like mine with 70% PP students, it’s important to know this figure so we can know how well our PP students did compared to PP students nationally as well as all students nationally. I don’t suppose the figure will be published officially until RAISEOnline comes out.
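The two-way table of P8 figures mentioned in the updates above can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names `disadvantaged`, `prior_band` and `p8` are hypothetical and will not match the column headings in the actual P8 files, and the pupils are dummy data.

```python
from collections import defaultdict


def two_way_p8(pupils):
    """Mean P8 for each (disadvantaged, prior-attainment band) cell."""
    cells = defaultdict(list)
    for pupil in pupils:
        cells[(pupil["disadvantaged"], pupil["prior_band"])].append(pupil["p8"])
    return {cell: sum(vals) / len(vals) for cell, vals in cells.items()}


# Dummy pupils (not real data); field names are assumptions.
dummy_pupils = [
    {"disadvantaged": True, "prior_band": "High", "p8": 0.4},
    {"disadvantaged": True, "prior_band": "High", "p8": -0.2},
    {"disadvantaged": True, "prior_band": "Low", "p8": -0.5},
    {"disadvantaged": False, "prior_band": "High", "p8": 0.6},
    {"disadvantaged": False, "prior_band": "Middle", "p8": 0.1},
    {"disadvantaged": False, "prior_band": "Middle", "p8": -0.3},
]

table = two_way_p8(dummy_pupils)
for (disadvantaged, band), avg in sorted(table.items()):
    label = "Disadvantaged" if disadvantaged else "Non-disadvantaged"
    print(f"{label:18} {band:7} {avg:+.2f}")
```

Cells with only a handful of students will swing wildly, so it is worth printing the count alongside each average before reading much into it.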