17 Jul Assessment, Standards and the Bell Curve
After announcements about new proposals for KS2 assessment, the issues around relative and absolute standards are getting a working over. Director of The Institute of Education, Chris Husbands, has written a thoughtful blog on some of the issues. I’ve had personal reasons to engage recently as the parent of a Year 6 student who has just received his SATs results. The playground and school gate are buzzing with ‘who got what?’
My feeling is this. Firstly we need to hold on to the notion of a broad education where only certain aspects can be assessed. We need to give value to the learning experience in all its diverse glory, going far, far beyond what can be measured in tests. There are ways of doing this… but that’s another post. However, at the same time, we need to recognise that formal assessment is an important element in the big picture. Here we need an honest re-assessment of what we mean by standards; we need to face the reality that absolute standards are very difficult to isolate from a deep-seated devotion to the bell-curve.
Of course we should also acknowledge the limitations a bell-curve philosophy imposes on us. Some MPs and newspaper editors don’t seem to understand that “Shock!! HALF of all students are BELOW AVERAGE!!” is a joke. With the trailed idea of placing children in deciles after KS2 tests, we are likely to see “Disgrace! Still only 10% of students are progressing to the top decile!!” I predict that this will happen. It will.
Similarly, politicians need to get their sums right. There is a fairly profound schizophrenia that leads to people jumping up and down about standards slipping because too many people gain As and A*s at GCSE. This is ‘proof’ that standards are falling. At the same time, there is some kind of outcry that students reaching varying degrees of Level 4 at KS2 then go on to reach different standards at GCSE:
This shock news that students with lower attainment at KS2 subsequently also gain fewer GCSEs on average… is worrying. Don’t they understand how the system works? If all these L4s actually did turn into 5 A*-Cs with English and Maths, the shouts of ‘dumbing down’ would drown out the cheers of celebration – by the same people. There’s not enough room up that end of the bell-curve.
However, before we go all Watergate, Grassy Knoll, Area 51 on all of this, let’s examine our own sense of what we mean by standards. We can all acknowledge that marking extended writing is a messy business. We can give an impression mark using some criteria; we can count up some definable features – we can give a score out of 30 referencing some exemplars. All of this is complex. Try getting two or three people to agree the mark or level on 10 pieces of work… then scale that up to 2000 markers and see the scale of the issue. What is a Level 4A? How do you measure it? On a comprehension paper, if the marks vary from one year to another, how do we know whether it was a hard paper or a less successful cohort of students? We need reference points and these generally arise from the cohort. It is a safer bet that each cohort has a similar ability profile year on year than that the tests we set each year are of the same standard – especially on a national scale.
Now let’s look at something that is purely objective. The High Jump. Imagine 1000 children being asked to do the high jump – give them a few attempts to get their personal best. There is no limit imposed on them. Imagine John goes home and tells Dad “I jumped 120 cm today”. He’s delighted of course… but then asks ‘Was that good? How high did everyone else jump?’ It’s the obvious question. Without any reference point, 120cm means little. It could be very high, average or very low – Dad needs to know what the background profile is in order to gauge his son’s achievement. 190cm is an exceptional jump because of how it compares to everyone else. 50cm is rather poor – but only because almost anyone could jump this high. Of course we can talk about progress, about personal bests, personal triumphs, disability and so on… but every child will also want to know where they stand along the line before their personal achievement makes sense.
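To make Dad's question concrete, here is a minimal sketch of how the same 120cm jump reads against two different background profiles. The function name `percentile_rank` and both cohorts are invented purely for illustration; they are not real data.

```python
from bisect import bisect_left


def percentile_rank(height_cm, cohort_heights_cm):
    """Percentage of the cohort who jumped strictly lower than height_cm."""
    ordered = sorted(cohort_heights_cm)
    below = bisect_left(ordered, height_cm)  # count of jumps below height_cm
    return 100.0 * below / len(ordered)


# Hypothetical cohorts (made-up heights in cm, for illustration only)
strong_cohort = [110, 115, 120, 125, 130, 135, 140, 145, 150, 155]
weak_cohort = [60, 70, 80, 90, 95, 100, 105, 110, 115, 120]

john = 120
print(percentile_rank(john, strong_cohort))  # 20.0: most of this cohort jumped higher
print(percentile_rank(john, weak_cohort))    # 90.0: nearly everyone here jumped lower
```

The measurement itself never changes; only the reference group does, and that alone decides whether 120cm counts as a good jump.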
The same is true of piano exams. The grades 1 – 8 are chosen at intervals of difficulty that span the typical range of human ability playing the piano. It would be daft to set Grade 1 so high, no-one could pass or so low that anyone could pass. It represents a fair challenge – a standard – for beginners. This is uncontroversial. Even within the cohort of Grade 1 entrants, there is a range. A bell-curve. Some fail, some pass, some get Merit and some get Distinction. These measures are defined with reference to a background spread of ability within what is possible given the pieces and scales (the curriculum) set for Grade 1. There is a good sense of piano exams being absolute standards, but every aspect has an origin in relative standards. As I’ve described in Data Delusion Solutions, piano exams are excellent. In passing Grade 1 – and since 2 and 3 – my son did not feel that he failed Grade 8. He started out at the bottom end of the bell curve and is making his way along at his own pace. The structure of assessment builds confidence at each step whilst remaining challenging continually… but he knows where he stands.
Looking at NC levels, and subject based assessment in general, of course this is still true. We might like the idea of fixed benchmarks that anyone can meet but, in practice, do we really?
- Elif can spell ‘accommodation’.
- James can multiply any three digit number by any other.
- Hassan can describe the formation of an ox-bow lake.
- Louisa can write a coherent paragraph analysing Othello’s character flaws.
- Joshua can work out the nth term of a linear sequence.
- Eni can state which of two compounds is formed via ionic bonding.
These statements of what children can do appear to be absolute. But, if they are used to set standards, they are no use unless we know how difficult they are. So – we ask 1000s of children questions that tease out whether they can do these things and see what we get. If lots of children can do something – we say it is easier than something only a few children can do. Absolute standards only fully gain meaning in reference to a cohort. Of course this is how SATs tests work anyway. In Maths, tests are marked and scored and the scores are morphed into levels. 6C and 5A are literally just different sections of a bell-curve based on marks on a test.
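As a rough illustration of scores being morphed into levels, here is a minimal sketch that bands a cohort purely by rank order. It is a deliberate simplification, not the actual SATs procedure: the helper `assign_bands`, the equal-sized bands, the marks and the three sub-level labels are all assumptions made for the example.

```python
def assign_bands(marks, band_labels):
    """Split a cohort into equal-sized bands by rank order of marks.

    band_labels runs from lowest to highest. Tied marks are separated
    arbitrarily by sort order, which a real scheme would have to handle.
    """
    n = len(marks)
    order = sorted(range(n), key=lambda i: marks[i])  # student indices, lowest mark first
    bands = [None] * n
    for position, student in enumerate(order):
        bands[student] = band_labels[position * len(band_labels) // n]
    return bands


labels = ["4B", "4A", "5C"]                 # illustrative sub-levels only
easier_paper = [12, 25, 31, 18, 40, 22]     # made-up raw marks
harder_paper = [m - 8 for m in easier_paper]  # same students, every mark 8 lower

print(assign_bands(easier_paper, labels))
print(assign_bands(harder_paper, labels))
```

Both calls print the same list of levels. Under this kind of banding a harder paper changes the raw marks but not the levels, because a level is a statement about position in the cohort and nothing more.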
English is much closer to a criterion-referenced assessment process, but when the new tests come in, it will soon emerge that certain scores fit in certain places on the bell curve in much the same way as levels do now. So, unless we change how well children are taught, new tests and new measures are really just cosmetic… we just get rank order described slightly differently and slightly more explicitly.
So – big reality check. Here are some truths about exams.
- Exams are a competition. Not everyone can get the top grades… because outcomes that deviate from a bell-curve do not fit with our values system around excellence. The system continually reacts to changes to ensure outcomes revert to the bell-curve; it’s a strong feedback loop.
- Assessment is fuzzy-edged. Levels and GCSE grades cannot be assessed accurately enough to withstand the scrutiny they are placed under. Every year, students from across the same national cohort will get different grades for work of the same standard – it is inevitable and unavoidable, especially in subjects where examiners need to reference success criteria.
- Grade inflation is systematic if examiners are asked to give candidates the benefit of the doubt on the 2% tolerance for grade boundaries. We either accept it or stop it dead by fixing grades.
- In any large cohort, students given a test which doesn’t place limits on achievement at the top will generate outcomes that are very close to a bell-curve based on raw data. Turning this data into NC levels or GCSE grades is a distortion of the raw data and any notion that a grade or level describes specific learning is wrong. Students with the same level or grade will be able to do different things; the link is not the nature of what they know, but their position on the bell curve.
- Absolute standards can go up over time. The whole nation could get better at maths and reading. However, in our current system, the exam boards and Ofqual have too many reasons to resist giving that impression: there is too much evidence that standards are not improving; too many examples of accountability-driven outcome improvements that aren’t driven by genuine cognitive development and, consequently, the bell-curves are being held firmly in place. I think we should be glad about that at the moment because, otherwise, the gaming machinery will win the day.
Finally, before I go on for ever… let’s look at ranking. People have been getting worked up about the idea of giving students a decile ranking. It feels wrong… but I’d argue that telling a student they are in Decile 3 or Decile 9 isn’t far off saying 6C or 3B. Either way you know where you stand… and the process for determining them is pretty much the same – you take a test and see how you did compared to everyone else nationally.
If this does get introduced we need to work Growth Mindset thinking into our practice regardless. Here is a ray of hope. At KEGS, students gain a place based on their rank order on our entrance exam. They know their rank order because this affects several schools that use the same test. However, we use this image to reassure them (and warn them) that rank order means little after a couple of years.
Our intake is taken from the far right-hand end of the bell curve… Here, there is significant statistical fluctuation so on any given test over time, rank order would vary. However, it is a good message to give our students that they could be first or last when they started – it is still all to play for as they move through the school. The same will apply to deciles… we can’t allow them to become a barrier to improvement or a mechanism to crush self-esteem.
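The fluctuation point can be sketched with a toy simulation. Everything here is an assumption chosen for illustration, not KEGS data: ability is drawn from a standard normal distribution, each test adds independent noise, the 'intake' is the top 10% on the first test, and `rank_stability` is a made-up helper. It only models test-to-test noise, not how students actually develop.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def ranks(values):
    """Rank positions (0 = lowest); ties are very unlikely with float scores."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, i in enumerate(order):
        result[i] = position
    return result


def rank_stability(n_students=20000, top_fraction=0.1, noise_sd=0.5, seed=1):
    """Rank correlation between two noisy tests: whole cohort vs selected intake."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n_students)]
    entrance = [a + rng.gauss(0, noise_sd) for a in ability]  # first test
    later = [a + rng.gauss(0, noise_sd) for a in ability]     # second test, fresh noise
    whole = pearson(ranks(entrance), ranks(later))
    # Keep only the students at the top end of the entrance test
    cutoff = sorted(entrance)[int(n_students * (1 - top_fraction))]
    chosen = [i for i in range(n_students) if entrance[i] >= cutoff]
    within = pearson(ranks([entrance[i] for i in chosen]),
                     ranks([later[i] for i in chosen]))
    return whole, within


whole, within = rank_stability()
print(round(whole, 2), round(within, 2))
```

Under these assumptions the rank correlation inside the selected group comes out noticeably lower than across the whole cohort: squeeze everyone into a narrow slice of the curve and the same amount of noise reshuffles the order far more.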
UPDATE: From meetings at the DfE, where nods and winks are given by advisers, I doubt very much that the KS2 deciles will see the light of day. Protestors, stand down. But whatever we do get, let’s not hope to avoid the inevitability that some students will always do much better and worse than others. There will be a bell curve. The only question is how this is presented and how obvious it is to parents where their children sit along the curve.