03 Apr
Educational Lab Rats: The Search for Evidence
The recent wave of blogs and Twitter exchanges focusing on the evidence base that underpins educational policy and practice has been fascinating. I am one of many eagerly anticipating the ResearchED Conference organised by Tom Bennett at Dulwich College in September. This surge of interest has been catalysed in part by the exuberant Ben Goldacre, author of Bad Science. His book is a must-read, and not just for its demolition of the credentials of TV diet magician Gillian McKeith (“…or to use her full medical title, Gillian McKeith”… a classic line!). If you’ve ever believed that those drops of Balsamum Peruvianum or alpine blackbird spit from the homeopathy cabinet made you better… well, sorry… you’ve been suckered by the voodoo. Except in one respect: the placebo effect.
Ben’s book explores this in detail, and it is truly amazing. Proper scientific studies have shown how powerful the placebo effect is. Positive effects from taking neutral sugar pills are reported in trials across all kinds of medical scenarios. The size of the effect can be influenced by the colour of the pills and the manner in which they are prescribed. For example, if a doctor believes the pills are likely or, conversely, unlikely to work (having been pre-conditioned as part of a blind trial), this has a huge impact on the patient’s placebo response because of the resulting doctor-patient dialogue. Evidently, if the effect is convincing enough, even placebo knee operations can work! Homeopathy is essentially a giant exercise in distributing placebo pills and potions; people want to believe in them, so they use them despite there being zero evidence for their efficacy that would survive a randomised controlled trial (RCT). I find this interesting because it highlights the complexity of the interaction between social/emotional effects and physical biochemical mechanisms, even when evaluating a highly reproducible event: the taking of a pill.
Turning towards education, Ben Goldacre and others are suggesting that a more highly developed research culture would benefit policy makers and practitioners. It is hard to argue with that. The big question, however, is ‘what kind of evidence do we need?’ And what kind of research is needed to provide that evidence? Given that no two teacher-student interactions are the same, and no learning process is entirely reproducible, how are we going to use research methodologies to the greatest effect? As I argue in The Data Delusion, even physical systems that appear simple (like dropping a bag of marbles) are actually too complex to be predictable because there are too many variables; we’re left with broad general patterns at best and a list of average effects (as with the Hattie effect sizes). If we throw in all the psychological factors that do their work in medical placebo effects, establishing educational cause and effect becomes highly problematic from a research perspective. In looking for evidence, we must proceed cautiously, with realistic expectations.
To explore this further here are some scenarios:
1) MA Research: Dialogue as a precursor for writing.
A colleague, Emma, completed her MA in Education at Cambridge with a thesis on the process of students engaging in extended dialogue before writing an analysis of a text. Her methodology section was fascinating in itself. There is a large body of literature on the validity and limitations of a wide range of social science research methods; in developing our thinking in the current debate, we’d be wise to engage with it, because this isn’t new ground. Emma’s work involved a series of detailed interviews with three of her students, an established method. This allowed her to examine the effects of the dialogic exchanges on the subsequent writing in some detail; this narrow but deep method yielded insights rather than quantitative data. Instead of tables and graphs, the thesis contains transcripts of the student-teacher dialogues and the interviews. It’s a supremely interesting piece of work from which other teachers in her department have benefited.
Questions to consider:
Does Emma’s research provide evidence that this teaching method could be applied in another context?
To what extent would the findings be more valid or more insightful if extended to a large scale trial?
Is it necessary to quantify the impact of the process in order to have confidence that it works?
Emma is an inspirational teacher in any case. Would another teacher have had the same effect with the same method? Would we find a similar effect on average if 100 teachers tried it? If the results from 100 teachers were positive on average, would that mean that this method ‘works’? What proof would be needed? In practice, given all the variables, is ‘insight’ all we need, as opposed to ‘evidence’?
2) Observational Experience: Think Pair Share
My blog post about the effect of students discussing in pairs before giving answers (Think Pair Share, or TPS), as opposed to the default ‘hands up’ method, is one of my most popular. In my judgement, based on years of teaching and of observing lessons taught by lots of different teachers, it is immensely powerful. However, despite calling it ‘the washing hands of learning’, I’ve never actually measured the difference in student outcomes generated by the two methods. My convictions rest on observing the quality of class interactions and the verbal responses generated. The learning process seems significantly more positive and engaging for all, and I suppose I’m assuming that higher-quality interactions and answers lead to deeper understanding. But on that point I could well be wrong… after all, plenty of students appear to learn well in didactic university lectures.
To find out whether my hunch is valid or the dubious quackery of a charlatan, it would certainly be possible to conduct a trial: several hundred students could be taught with TPS as the default questioning mode and several hundred others using ‘hands up’. Understanding of some specific content could be assessed before and after the trial and the results compared. What would this show? If the data showed general support for my experience-based hunch, I’d feel vindicated. It would suggest alignment between the observable interactions and measurable learning outcomes; all neat and tidy; q.e.d.
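For readers curious what ‘comparing the results’ might involve in practice, here is a minimal Python sketch, assuming each student sits the same test before and after the trial. It computes each student’s gain and then a standardised mean difference (Cohen’s d, broadly the kind of measure behind Hattie-style effect sizes) between the TPS group and the ‘hands up’ group. The function names and the scores in the usage example are hypothetical illustrations, not data from any real trial.

```python
from math import sqrt
from statistics import mean, variance


def gains(pre, post):
    """Per-student gain: post-test score minus pre-test score (lists paired by student)."""
    if len(pre) != len(post):
        raise ValueError("pre and post scores must be paired per student")
    return [after - before for before, after in zip(pre, post)]


def cohens_d(group_a, group_b):
    """Cohen's d: difference in mean gains divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two students")
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical scores for illustration only: not real trial data.
    tps_gain = gains(pre=[40, 52, 47, 60], post=[55, 63, 58, 70])
    hands_up_gain = gains(pre=[42, 50, 49, 58], post=[50, 61, 55, 66])
    d = cohens_d(tps_gain, hands_up_gain)
    print(f"Effect size (d) for TPS vs hands up: {d:.2f}")
```

A positive d would favour TPS on this particular test; a value near or below zero would not. Either way, the number only reflects what the test itself measured.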
But what if the effect was small, neutral or, heaven forbid… negative? I’d have to re-evaluate my position and perhaps promote the idea a little less, but I’d still use the method myself. Why? Mainly because no amount of data would override my sense that ‘hands up’ is a poor process. I would argue that the test was too limiting and that the in-class interactions amount to more than can be meaningfully measured in a test; I’d impose my value system regardless. However, I could not promote TPS as a way to pass tests.
What does this say? It suggests that, in conducting research, we need to be clear about which outcomes we value. If an initiative cannot be shown to have a reproducible, quantifiable effect in a certain direction, deciding what to do becomes more clearly a matter of working from our values: our gut instincts, biases and prejudices. At a school level, is it not valid to reach a consensus on what the value system is? Beyond that? Probably not. The DfE can’t dictate the values at play in a classroom… even if it wanted to.
3) Action Research: Co-construction.
As I have described in the post Research as CPD: CPD as Research, action research is a routine feature of life at my school. Every teacher is involved in a project where they are trying to find out about the impact of a particular teaching method. The findings are shared as part of our in-house professional dialogue, and some teachers are involved with the Cambridge CamSTAR group, disseminating their work more widely. Each project is small-scale, often limited to one class. Our focus is on developing insights.
Should we be trying to scale these projects up or attempting to run them as RCTs? I’m not sure. The main purpose of our action research is to find strategies that work for each teacher in their own context; sharing the findings provides a source of reflection for others. The whole process is highly motivating, including the collaborative aspect, and this in itself feels like an important ingredient in driving effective teaching and learning. There is often an organic consensus among a group of teachers about the value of a certain strategy but, at the same time, there are variations in the details of how each person implements the idea. It seems to me that an RCT or large-scale trial would require a much tighter definition of the specific strategy in order to be valid.
For example, I have been working with a colleague to develop the idea of co-construction. We each use different methods under a common umbrella; this allows us to compare notes and learn from each other. However, we wouldn’t be able to state ‘co-construction works’ in any definitive sense. This is especially true because the value of the process lies not in enhancing measurable content-based outcomes but in developing a range of other skills and aptitudes, such as the confidence to plan and teach a lesson to your peers. I’m not sure we could scale up the trial without having to prescribe many elements of the process, thereby losing its spirit. At the same time, I wouldn’t ever insist that any teacher adopt the method… I don’t have evidence to suggest it would work for them; all I can do is suggest it might be worth trying, with as much enthusiasm as the idea deserves. Is that not still worthwhile? I think it is. Small-scale action research is immensely powerful in creating a culture in which outstanding teachers thrive; knock the value of action research at your peril! Size isn’t everything, and the results of a small local trial could well be more meaningful in that context than findings transferred from data-rich, large-scale RCTs that average out the detail.
On the issue of measuring outcomes, as a cautionary aside, I often reflect on the sad fact that the very best exam results I’ve ever had for a class of my own came after we ran out of time and I taught the P3 GCSE module in about 15 lessons; we crammed, taught to the test in a mad panic and drilled on past papers. Bingo! A*s galore. Were they all that good at Physics? No. Were they well prepared for A level? No. Was it a good learning experience? No. But the data never lies! What this shows is that the testing process is limited and that surface recall gets you a long way; too far. We can’t always measure what we value and that is a key concern in conducting a trial.
With all that said, here are some examples of RCTs I’d like to see the results of:
- Does teaching about particles, atoms and molecules before teaching about chemical reactions improve understanding? Logically it should… but does it? My hunch is that effective sequencing in the curriculum is an important area.
- Does extensive use of mini-whiteboards in class discussion give Maths teachers an understanding of students’ capabilities that is as good as, or better than, the one gained from marking books after the work has been completed? Does it have any impact at all? (My bet is ‘yes’, but only where the teacher engages with the responses… Hmmm, how to control for that?)
- How does the improvement in students’ writing following peer assessment using a given technique compare with the impact of teacher assessment, controlling for time spent and other variables? Related to this: does a student’s writing improve if they have regular opportunities to peer assess the work of others?
- If Year 7 were taught the same scheme of work as Year 8, would they do just as well? (With obvious implications…)
- If, in multiple trials, teams of three teachers taught parallel mixed-ability groups and then the same teachers taught three ability-tiered groups, competing to gain the highest progress score each time, which structure would yield the best outcomes?
- If students with level 3 in English engage in paired reading with a Sixth Former for half an hour every day for two months, does their reading age improve significantly?
If these trials and hundreds of others like them were conducted, we’d certainly be in a better position. However, in my opinion, even then we’d still be working in the territory of ‘insight’. The results might influence some changes in policy and practice, but ultimately I suspect that any changes would always be driven primarily by socio-political values, as teachers and politicians continue to cherry-pick the bits of evidence that suit them. We might at least be better placed to resist the imposition of national policies that we don’t agree with, which would certainly be a good thing. In the end, I would also bet that the greatest gains for students come from the reflection/self-evaluation effect of teachers engaging in and with research processes in their local contexts, regardless of the outcomes of the trials themselves. It would take a mega-meta-RCT to prove that!