High priests, method police, and why it’s time for a new conversation
A fortnight ago, Esther Duflo, Abhijit Banerjee and Michael Kremer won the Nobel prize for economics for their “experimental approach to alleviating poverty.” For those not following the debate on Twitter, it has been spirited to say the least. Dozens of articles and blogs have been written about Randomised Controlled Trials (RCTs) and the laureates themselves, ranging from hero worship to accusations that they are impoverishing economics. With few exceptions, the exchanges have been mostly dialogues of the deaf: unstoppable force meets immovable object.
I mostly work in “monitoring and evaluation,” rather than research, but the aforementioned dialogue got me thinking about how often the advocates of even complementary methods for monitoring and evaluation speak to one another or read each other’s work. I recently went to trainings on outcome harvesting and realist evaluation. Both were good trainings which I would recommend, but I was surprised by how little my fellow participants, or even the trainers, were aware of the potential complementarities of other similar methods. One trainer joked about the “method police.” It sounded rather like one of those jokes that was only partially in jest. Methodological alignment can be important to some degree, but in my personal experience, I’ve found clear boundaries more of a hindrance than a support system, and I’ve never felt greater security by locking into an orthodoxy. Methods are good or bad depending on how appropriate they are for the task at hand, but I’ve found that rarely will any single method give you all you want or need.
When I raised this issue with my trainers, I got the sense they would welcome more of a conversation. So here goes.
No method is a panacea and no method is a gold standard. There was a time when RCTs were presented as if they might be the solution to all our problems. I remember my boss getting very excited about the promise of a young “left-of-centre French academic” nearly a decade ago. That promise was more than realised, but the notion of silver bullets or gold standards had already been comprehensively debunked well before the New Yorker article my boss shared with me (see Scriven, 2008). Notwithstanding the recent fanfare (and in response to it), fear of RCT hegemony has only grown since (see Pritchett, 2017; Deaton and Cartwright, 2017; Ravallion, 2018; Kabeer, 2019).
This fear might be somewhat misplaced, as RCTs are not nearly as popular as is commonly assumed. And for good reason. They have particularly strong epistemic biases, “black box” problems, serious issues of external validity, and acute problems of ethics. They fail to adequately appreciate the role of human agency, they are rarely a good fit for “hard to measure” interventions, and they are extremely expensive, to mention a few limitations. RCTs are good for some things, but not for others. Rather ironically, RCT advocates’ appeal to exceptionality (an appeal not made by the laureates themselves) has become both a blessing and a curse. For their defenders, special status has become an article of faith; for their critics, it makes RCTs an easier target (and a straw man for experimental methods in general).
RCT-bashing may be fun, but it’s worth remembering that RCTs aren’t actually that exceptional, and many of their shortcomings are shared by various other methods (whether experimental or not). So, the Nobel Prize should prompt us all to reflect, not just the randomistas.
While drowned out in the Twitter exchanges, more thoughtful advocates of experimental methods have, in fact, begun talking more openly about the benefits of “theory-based approaches,” and the importance of “mechanisms” and “local context” (see Bates and Glennerster, 2017; Gugerty and Karlan, 2018). It may not be fair to say “we won, they lost” or that there has been a Damascene conversion, but these acknowledgements show that even some of the randomista high priests value aspects of “generative” methods. Some randomistas are a lot more thoughtful than their caricature would suggest. Especially as different methods tend to answer different questions (and in different ways), there may be more scope for complementarity than we might care to acknowledge.
Last year, I declined to be part of an advisory group for a recent systematic review on participation, inclusion, transparency and accountability. My concern at the time was the review’s lack of methodological pluralism (it had a clear hierarchical preference for experimental methods). While I was unable to convince the authors to consider Qualitative Comparative Analysis (QCA) or process tracing, for example, they did employ a realist-informed framework synthesis approach to look at context and mechanisms. So, while I disagreed with some of the authors’ choices, the end product was thoughtful and is well worth reading, despite its limitations.
Given these apparent concessions, perhaps those of us who use mostly theory-based and participatory methods should spend less time RCT-bashing and more time talking among ourselves about what we can learn from one another to improve how we respond to questions RCTs can’t help us answer. Since the Stern review in 2012, this conversation has been surprisingly thin. With relatively few exceptions of pluralism and cross-fertilisation (here, here, here, here, here and here), what I see is M&E methodologists becoming more and more entrenched in their own preferred methods. We appear to be fighting one another for primacy and talking at one another (even in the same paper) rather than talking about complementarities, asking the right questions, and addressing common challenges.
But can we now move beyond simply naming the right questions and common challenges?
Over the course of a few blogs, I intend to discuss how a number of theory-based and participatory methods can actually help each other out, and hopefully trigger more thoughtful conversations about how we can move forward together.
In the first blog, I will write about “context” and “mechanisms” and how process tracing and realist evaluation might benefit from a deeper conversation. In the second blog, I will look at how both outcome mapping and contribution analysis can help us better understand boundaries of influence, and why it’s important to focus on monitoring incremental change (especially for adaptive management). In the third blog, I will hopefully show how outcome harvesting can help sharpen qualitative outcome reporting (whatever method you choose). In the fourth blog, I will consider how process tracing can help all qualitative methods focus on the quality of evidence (and why this matters at least as much for monitoring as it does for evaluation). In the fifth blog, I will reflect on how rubrics can provide us with shortcuts to decide what is worth evaluating in the first place.
Depending on how the conversation evolves, and as I learn more, these topics might also evolve. For anyone who feels they have insights to share on the above, or on similar methods I’ve (wrongly) ignored, please feel free to join the conversation.