24 Feb
More issues with Progress 8.
In this post I’m going to set out a few of the issues I see with the use of Progress 8 data, especially as one of the indicators of school effectiveness in inspections and in general school-to-school comparison and evaluation.
For further reading on Progress 8, you may also wish to look at the following:
- My blog, Dissecting Progress 8: The good, the bad and the ugly – where I explore the issue of outliers.
- An earlier blog, Progress 8: Looks like data garbage to me – where I explore the convoluted P8 mechanism and its tenuous link to any actual learning.
- I also recommend the many Datalab posts on the subject including this latest post by Dr Becky Allen: https://educationdatalab.org.uk/2017/01/outliers-in-progress-8/
- Update: This post by Dave Thomson at Education Datalab is a must-read: https://educationdatalab.org.uk/2017/03/putting-progress-8-in-context/
1. Progress 8 is an inadequate average measure, massively open to distortion by outliers.
Take the sample data above. This is based on real school data from a headteacher colleague, but the figures and details have been changed. At face value, P8 for this sub-group of 80 students is -0.20. That's certainly below average, with a degree of significance. A report might state: Middle Ability Disadvantaged students make below average progress, based on this data. But actually, that's not quite true. It takes only five students out of the 80 in the group to distort the average. These five students each have very low P8 scores, signifying major issues with their final exams. Any number of factors could be at play, but they wouldn't be related to standards as such; at this scale it would need to be missed exams, massive exam blow-outs and so on. Outliers ‘lie outside’ the main pattern of a data set, and it's fair to say that these five students don't represent general standards or issues for this group. The other 75 students, who make up the core of this sub-group, actually score a P8 of +0.07. It's not even negative. The crude, raw average doesn't really tell the story of achievement for the group unless it is broken down and explored.
Clearly outcomes for those five students matter – but their numerical attainment scores have an undue effect on what is an arbitrary numerical representation of ‘progress and standards’ and this needs to be recognised if we’re going to talk sensibly about these things in the context of school improvement.
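The arithmetic behind this example is worth spelling out, because it shows how extreme the five outliers must be. The sketch below uses only the figures quoted above (80 students at -0.20 overall, 75 of them at +0.07); the function name is my own, purely for illustration.

```python
def implied_outlier_mean(group_size, overall_p8, core_size, core_p8):
    """Mean P8 the excluded students must have, given the group and core averages."""
    outlier_count = group_size - core_size
    group_total = overall_p8 * group_size  # sum of all P8 scores in the group
    core_total = core_p8 * core_size       # sum of P8 scores for the core group
    return (group_total - core_total) / outlier_count

# Figures from the example: 80 students at -0.20 overall, 75 of them at +0.07.
mean_of_five = implied_outlier_mean(80, -0.20, 75, 0.07)
print(f"Implied mean P8 of the five outliers: {mean_of_five:.2f}")
```

On these numbers the five students average roughly -4.25 each. Together they contribute about -0.27 to the mean of the whole group of 80, which is enough to turn a core score of +0.07 into -0.20.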
2. Averages mask profiles and the spread of performance within a cohort.
Consider two schools, A and B. Both have the same P8 score of -0.13. However, if we look at a profile, School A has a very different make-up from School B. School A looks to have some very low-scoring students; possibly outliers if studied in depth. The core progress for School A looks to be just above zero, pulled down by the outlier effect described above. In fact, School A also has some strong P8 performance at the top, so the school is capable of delivering this outcome for some students. School B is centred more tightly around the mean value; because the spread is much smaller, its P8 score is more likely to represent a typical progress outcome.
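A profile need not be complicated. As a minimal sketch, the Python below summarises two hypothetical ten-student cohorts (invented numbers, not real school data) that share a mean P8 of -0.13 but differ sharply in median and spread.

```python
import statistics

def profile(scores):
    """Summarise a cohort's P8 distribution rather than just its mean."""
    return {
        "mean": round(statistics.mean(scores), 2),
        "median": round(statistics.median(scores), 2),
        "stdev": round(statistics.pstdev(scores), 2),  # population spread
        "min": min(scores),
        "max": max(scores),
    }

# Hypothetical cohorts: A has low outliers and a strong top end; B is tightly clustered.
school_a = [-3.0, -2.5, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 1.5, 1.5]
school_b = [-0.4, -0.3, -0.2, -0.2, -0.1, -0.1, -0.1, 0.0, 0.0, 0.1]

for name, scores in (("School A", school_a), ("School B", school_b)):
    print(name, profile(scores))
```

Both cohorts report the same mean, but School A's median sits above zero (0.2) while School B's is -0.1, and the standard deviations differ by an order of magnitude. A single P8 figure hides all of that.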
I’ve always argued for profiling all data outcomes instead of the stupidity of condensing all student achievement into one number. Now that P8 is here and is already being used in its crude average form to inform inspectors' judgements on schools, my worst fears are being realised.
3. Just possibly, the KS2 input is the biggest factor, not anything the school does:
When we look at the range of KS2 outcomes, do we really think that children’s abilities and learning are so different from region to region, or is it simply that some schools are better at converting learning into the specific form of SATs outcomes? Let’s just assume that some primary schools generate better outcomes from students of broadly similar capabilities; it’s not a big stretch.
Now look at this part of the Progress 8 alchemy-algorithm:
Let’s say a school has an average KS2 fine level intake of 4.5 (remembering that this derives from an average of two made-up numbers, themselves derived from two raw scores in maths and English that measure totally different things – and you times by six or something….). The expected A8 for that starting point, the national average Attainment 8 score for pupils with that prior attainment, is 47.85 (four sig figs..??). But given the national variation in KS2 outcomes and the differing degrees of pressure schools feel to prime-pump the SATs machine, it's not inconceivable that another secondary school inherits very similar students with an average fine score of 4.6. (Pretty much the same, right?) This, however, leaves them needing to achieve an expected Attainment 8 score of 49.96, just over two points higher. That's equivalent to two more GCSE grades across the 10 Attainment 8 slots. In Progress 8 terms that's equivalent to +0.2 (because you just divide by 10). So, without doing anything special at all, the school that receives the 4.5 KS2 intake has a 0.2 advantage over the school with the 4.6 KS2 intake, assuming they actually have broadly similar students. Bearing in mind that the coasting schools threshold is -0.25, this is pretty major.
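Here is the baseline effect as a short Python sketch. The two expected A8 values are the ones quoted above; the shared actual A8 of 48.85 is a hypothetical figure, and treating every pupil in a cohort as if they had the cohort's average fine level is a simplification of how P8 is really calculated pupil by pupil.

```python
# Expected Attainment 8 scores for the two KS2 fine levels, as quoted above.
EXPECTED_A8 = {4.5: 47.85, 4.6: 49.96}
A8_SLOTS = 10  # eight qualifications, with English and maths double-weighted

def p8_score(actual_a8, ks2_fine_level):
    """Simplified cohort P8: (actual A8 - expected A8) / 10."""
    return (actual_a8 - EXPECTED_A8[ks2_fine_level]) / A8_SLOTS

# Hypothetical: both schools' broadly similar students achieve the same average A8.
same_a8 = 48.85
school_45 = p8_score(same_a8, 4.5)
school_46 = p8_score(same_a8, 4.6)
print(f"KS2 4.5 intake: P8 {school_45:+.2f}")
print(f"KS2 4.6 intake: P8 {school_46:+.2f}")
print(f"Gap from the baseline alone: {school_45 - school_46:.2f}")
```

Whatever value you choose for the shared actual A8, the gap stays at (49.96 - 47.85) / 10 ≈ 0.21. It comes entirely from the KS2 baseline, not from anything the secondary schools did.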
I think this is significant. A shift from 4.5 to 4.6 is small; a shift from 4.5 to 4.7 would also be conceivable, with even bigger swings in P8. I’ve looked at data for hundreds of schools and I reckon that, if all truths could be known, we would find that lots of schools in England get decent P8 scores without really working too hard for it. It’s simply a function of having averagely hard-working students who went to primary schools that weren’t very pushy or pressured. Meanwhile, elsewhere, schools are busting a gut just to get level, in an environment where the primaries are also busting a gut a bit more to satisfy the accountability machine and their outcomes are notched up just a bit more.
In other words, we don’t really know how much of P8 reflects the reliability of KS2 data as a baseline for secondary schools, rather than the secondary schools’ own provision and standards. Everything in P8 rests on those KS2 tests being absolutely rigorously consistent across the entire nation. I’m doubtful. Just look at the map, and imagine what the more detailed picture would be.
4. The Disadvantage Factor. Let’s not pretend this isn’t a real thing:
Repeating the scatter-graph from the top of this post: this data comes from the KS4 performance data on the DfE website. I made this myself; it's a simple plot of over 3000 schools’ P8 vs the % of ‘ever 6 FSM’ students. The pattern is clear. Of the schools with 20% or fewer PP students, over 75% have positive Progress 8 scores. Of the schools with 60% or more PP students, fewer than 30% have a positive Progress 8 score. It’s just a massive factor. If you combine this with the primary school factor above, we’ve got two major inputs, the KS2 baseline and the level of disadvantage, that pretty much shape a school’s P8 parameters before anything happens in lessons. I’m absolutely certain that success in P8 is super easy for some schools in certain situations; they are a quantum universe away from those that struggle. They are not better schools; they just don’t need to try as hard.
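The banding behind figures like these can be sketched in a few lines of Python. This is an illustration only: the ten (PP%, P8) pairs are hypothetical, not taken from the DfE data. To use the real KS4 performance tables you would have to map their actual column names onto these two values, and I haven't assumed what those names are.

```python
def share_positive(schools, low, high):
    """Fraction of schools with low <= PP% <= high whose P8 is above zero."""
    in_band = [p8 for pp, p8 in schools if low <= pp <= high]
    if not in_band:
        return None  # no schools in this band
    return sum(p8 > 0 for p8 in in_band) / len(in_band)

# Hypothetical (PP%, P8) pairs standing in for the national data set.
schools = [
    (8, 0.35), (12, 0.10), (15, -0.05), (18, 0.22),
    (35, 0.02), (42, -0.18),
    (61, -0.30), (66, 0.05), (72, -0.41), (80, -0.12),
]

print("20% PP or fewer:", share_positive(schools, 0, 20))
print("60% PP or more:", share_positive(schools, 60, 100))
```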
It’s obviously inspiring to see how well some high PP schools do. There’s a box of just a few schools above 70% PP with positive P8; well done to them. But, tempting as it may be to imagine that all schools could emulate them, this is an illusion. Once we add in Ofqual’s comparable outcomes approach, which sets a de facto limit on the number of top-end grades available on the bell-curve, it’s clear that not all students can meet or exceed the national average Attainment 8 score for their starting point. By definition, P8 scores for each starting point average zero nationally, so there is a repeated zero-sum effect: roughly half of the students for any starting point must get a negative P8 score. This is more likely if they live with real disadvantage (and if their SATs results were just a tiny bit better than someone else’s).
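Here is a minimal sketch of the zero-sum effect. It assumes, as I understand the DfE method, that each pupil's expected A8 is the mean A8 of pupils nationally with the same KS2 prior attainment. The eight pupils are hypothetical, and "nationally" is collapsed to this tiny cohort.

```python
import statistics
from collections import defaultdict

def progress_scores(pupils):
    """P8-style scores: (A8 - mean A8 for the same KS2 level) / 10."""
    by_level = defaultdict(list)
    for ks2, a8 in pupils:
        by_level[ks2].append(a8)
    expected = {level: statistics.mean(a8s) for level, a8s in by_level.items()}
    return [(a8 - expected[ks2]) / 10 for ks2, a8 in pupils]

# Hypothetical (KS2 fine level, A8) pairs.
pupils = [(4.5, 40.0), (4.5, 46.0), (4.5, 50.0), (4.5, 56.0),
          (4.6, 44.0), (4.6, 49.0), (4.6, 51.0), (4.6, 56.0)]

scores = progress_scores(pupils)
print("Sum of P8 scores:", round(sum(scores), 6))
print("Pupils below zero:", sum(s < 0 for s in scores), "of", len(scores))
```

Because the benchmark is an average of the same pupils, each group's scores sum to zero, so every gain above the estimate is offset by someone falling below it. The exact share below zero depends on the shape of the distribution. In this symmetric toy example it is exactly half.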
It’s clear from this graph that high levels of FSM have a disproportionate impact on outcomes. It’s not a linear effect: once you go beyond a critical mass of FSM students, P8 scores fall off a cliff. And yet judgements are made as if all these things are equal. No excuses.
So, there we are. I described P8 as data garbage once. I was chastised for doing so. But please, let’s at least see it for what it is, and let’s not continue to judge and rank schools using crude single-figure numbers without knowing about their profile and their context. I can’t help thinking that, one day, when we’ve finally established an education system that is truly world class, P8 will be long forgotten: a slightly embarrassing hangover from the days when we lost the plot with uber-accountability. Sadly, that time is rather a long way off.
Having studied this further, prompted by someone suggesting that the P8 vs PP graph showed two distinct populations, I’ve separated out the 400 special schools, which I hadn’t done before. Clearly that makes a big difference, so here are the separate graphs:
The regression line is less steep, but there is certainly a significant pattern, especially given the scale that determines coasting schools (-0.25) and the impact of KS2 inputs described above, where +/- 0.2 is easily lost or gained. Among non-special schools with over 60% PP, 43% have a positive P8, compared to 75% of those with PP below 20%.
The distribution for special schools is shown below. There’s a similar pattern, but the scale of P8 is very different.
As an aside, it’s interesting to note that there is only one special school where the PP% is below 20%. Is this a function of more affluent parents using independent schools for children with high levels of special needs?
Update: The March Ofsted bulletin for inspectors shows that they have now recognised the outliers issue and have asked for this to be taken into account:
[Embedded tweet: Tom Sherrington (@teacherhead), March 2, 2017]