I was recently discussing future work at Results for Development with Mario Picon and Kathy Bain, and we got to talking about where development interventions begin and end.
Mario asked “well, what is an intervention?”
I’ve actually been thinking about this question for a while, and it came up in a recent evaluation for World Vision in the Dominican Republic.
Merriam-Webster’s dictionary offers several definitions of an “intervention.” The first two are the following:
- The act of interfering with the outcome or course, especially of a condition or process.
- The interference of a country in the affairs of another country for the purpose of compelling it to do or forbear doing certain acts.
The third definition concerns an intervention staged for someone struggling with drug addiction. The common thread here is interference by an external agent.
In practice, rather than drawing on these dictionary definitions, we tend to equate interventions with medical interventions, extending that logic to international development. As Isabel Guérin and François Roubaud note, in a Randomised Controlled Trial (RCT):
‘Two groups are randomly selected from a homogeneous population: the first receives an “intervention” (medicine, grant, loan, training, etc.), while the second gets a “placebo” — either a different intervention or no intervention at all. After a certain time, the two groups are evaluated to compare the efficacy of the intervention or analyze two distinct approaches.’
So, the intervention is effectively the input one group receives that the other group doesn’t. OK, that’s what an intervention is. Or is it?
Definitions of what an intervention is (and isn’t) matter more than you might think for evaluation and what is supposedly known about “what works.” Under the reports section of its website, the International Initiative for Impact Evaluation (3ie) refers to ‘impact evaluations that assess the difference a development intervention has made to social and economic outcomes.’ Their Evidence Gap Maps are structured around interventions, and their Development Evidence Portal offers a taxonomy of development interventions (you can read my thoughts on the Portal here).
Yet, nowhere could I find a definition of what they mean by an “intervention.” For example, when I look at “responsiveness and accountability,” which are most definitely not a homogeneous intervention (or outcome) category, there is an information box which merely says “intervention that seeks to augment responsiveness and accountability in institutions.” Not only is this a circular definition of what are already contested and convoluted concepts, but it even introduces a further conceptual problem of what an “institution” is and a unit of analysis problem for what types of institutions (likely, meaning organisations) we might be talking about.
I am a pedant for such things, but this is not mere pedantry. In the past, I discussed the porous boundaries between outputs and outcomes, and I think we need to think a little more seriously about what we really mean by an “intervention.” In several respects, that means we need to talk about inputs and activities. In Patricia Rogers’ theory of change guide for UNICEF, we find the following definition of an input in the glossary: ‘the financial, human and material resources used in a programme or policy.’ Activities seem relatively intuitive, so there is no glossary entry, but Rogers refers to the example of producing and distributing a newsletter which should then lead to an increase in knowledge.
There is a long-standing debate among realist evaluators about what a “mechanism” is and what it isn’t (which I discussed previously). A mechanism is not the intervention, but disentangling the two has proved a persistent challenge. Pawson and Tilley note that identifying what the intervention actually is can itself be difficult. Nick Tilley’s (2016) EMMIE approach to evidence adds fuel to this fire because he equates “mechanisms” with “mediators” and defines these as “causal processes” which, in turn, he defines as “generally intervening variables.” There is a whole discussion of the potential problems of conflating these terms in Beach and Pedersen’s book on Causal Case Study Methods.
Similarly, Punton et al. point out that, in practice, there are difficulties in disentangling contextual features from mechanisms:
‘Context-mechanism-outcome configurations (CMOs) are the core analytical building blocks of realist evaluation. They… take the form of sentences or short paragraphs explaining how mechanisms interact with features of the context to generate outcomes. Both… evaluations added intervention factors (I) to the configuration, to differentiate features of the intervention from features of the wider context, to create CIMOs.’
Because colleagues of mine were struggling to disentangle intervention activities and mechanisms in some research on 20 years of UK-supported governance programming in Nigeria, I recommended the intermediate step of including interventions before mechanisms, knowing this would be seen as somewhat heretical by the realist method police. This allowed the team to better understand whether and how intervention factors may or may not have triggered reasoning and responses from public authorities. So, making the distinction helped.
The diffusion of interventions or the diffusion of innovations?
In line with the explanation by Guérin and Roubaud above, randomistas also have an opinion on this. Duflo and Kremer’s (2015) and Kremer et al.’s (2019) reviews of which innovations scaled from USAID’s Development Innovation Ventures (DIV) note a widespread view that few innovations scale, that most innovation investments are unsuccessful (Kenny and Sandefur, 2013), and that most innovations which do scale take a long time to do so. All but one of the top five innovations in terms of reach (> 4 million people) were simple, cheap, and tested with RCTs. The top five were software for community health workers, voter report cards, election monitoring technology, affordable glasses for presbyopia, and road safety stickers. In such cases, then, inputs, interventions, and innovations are almost indistinguishable. From the randomistas’ perspective, they almost have to be, because of issues of fidelity and replication, as I’ll discuss below.
The criticism that interventions like community scorecards and social audits are “widgets” is valid from this narrow input and activity-based conceptualisation. Anu Joshi and Peter Houtzager are right that these interventions need to be evaluated as part of an ‘ongoing political engagement by social actors’ and situated in historical context. However, the binary of such initiatives either evolving organically or being introduced by external actors is misleading and inaccurate. If you actually trace the evolution of social accountability tools and processes, there is substantial hybridisation and bricolage.
Further, as Michael Woolcock reminds us in a paper largely about how poorly RCTs deal with external validity, two famous studies by Banerjee and Duflo, in India and Kenya respectively, show that sometimes your inputs aren’t even the same in the first place. In India, contract teachers were “better” (more effective and less costly) than regular teachers, and when regular (state) teachers in Kenya replaced NGO teachers, they actually performed worse than the control group. So, there are serious questions about what “interventions” are actually being replicated or repurposed.
Nonetheless, there is still something to be answered here on “widgets.”
Interventions and innovations in the Dominican Republic’s education sector
History matters. But which history matters? I recently evaluated a Global Partnership for Social Accountability-funded social accountability project in the Dominican Republic’s education sector implemented by World Vision, and I was keen to retrace some of this history. Poli et al. (2020) already did this for me, as you can see below.
But where should I start? I could chart the history back to the Coalición Educación Digna (CED, Coalition for Dignified Education) in 2010 or even to the General Education Act (and I did so), but as the CED effectively disbanded years ago (surviving only as a Facebook page), the more relevant history since 2010 came from other social accountability projects (and their tools).
At the heart of the World Vision intervention, Community Participation in How is My School Doing (Mi Comunidad Participa en cómo va mi Escuela), was an adapted version of the Citizen Voice and Action (CVA) approach. This, in turn, was an evolution of the community scorecards developed by CARE Malawi in 2002, which were a common inspiration for several previous projects in the Dominican Republic’s education sector (e.g., Reportes Comunitarios). I wanted to compare the key inputs and activities of the World Vision project with those of previous projects, especially as one of them, from the World Bank, had an almost identical name (Cómo va mi Escuela). As a result, some staff in both the World Bank and the Ministry of Education understandably confused the two. I compared not only these two projects but several others funded by either the World Bank or USAID in the education sector. Below you can see some of the key intervention features, slightly abstracted:
Ostensibly, there are various similarities, particularly the common thread of action plans. Unfortunately, I don’t have fully comparable data across the projects on satisfaction levels, on whether action plans made a difference, or on response rates within action plans. However, the annual resolution rate of action points was nearly twice as high in the World Vision project as in the Reportes Comunitarios of Progresando con Solidaridad, which I evaluated in 2015 (though lower than the overall resolution rate). In the Cómo va mi Escuela project, fewer than 4 in 10 parents and guardians considered that the action plans made a positive difference (6 in 10 for the Reportes Comunitarios), whereas most of the 9 (of 60) schools I consulted in the World Vision project clearly did see a positive difference. Several were already using parts of the CVA methodology for other purposes and were keen to repeat the process in the next school year. A couple of extra schools and a neighbourhood association even joined the project to develop action plans of their own. Of course, the project was not without its limitations (it did take place during COVID-19, after all), but it’s worth reflecting on why interventions might be received differently.
Why the difference in perceived or substantive results? The answer seems to have little to do with classic intervention inputs (i.e., tools). All three projects had something they referred to as a scorecard or a report card, and they each had an action plan with a similar format. Instead, what likely made the difference was the knowledge and relationships of the project team and how the interventions were implemented with schools (i.e., soft skills). Cómo va mi Escuela was a classic low-dose, locally bounded, helicopter RCT experiment. Most people I talked to called it a “study” rather than a project, and many years later the study team still hasn’t shared its results back with the schools. Several respondents said that they felt “abandoned.”
On the other hand, in roughly 1 in 10 schools, World Vision’s team had worked with the schools previously through the Leer (Read) project. Unlike in Cómo va mi Escuela, or even Progresando con Solidaridad (which was implemented by government actors), they were able to meaningfully follow up action points, broker connections, and triage so that resolvable issues were identified and a relatively high proportion of them were resolved quite quickly.
So, this takes us back to Mario’s question: what exactly are the interventions here? Are they the material inputs and a set of consistently implemented activities, or are they something else? I would argue that while your tools and other inputs do matter, the who (the most important input) and the how matter at least as much as the what. Taking complexity into account means there are path dependencies and interactions between interventions. As Boulton et al. argue in Embracing Complexity, the important question is not so much whether interventions are simple or complex, but to ‘consider the interplay between interventions (complex or not) and contexts.’ In this case, we saw both layering on past experience and interaction effects between different interventions with a similar name.
Reflecting on these dimensions also raises questions about what can be scaled up or not. Focusing on inputs, as many randomistas do, might be rather unwise.
The straitjacket of intervention fidelity
The premise of replicability in science is that you are looking to see if someone else can get the same result in the same circumstances. Yet, the notion of the same circumstances is flawed when we adopt a more accurate view of what an intervention is, even without considering the importance of wider support factors (or moderators) which shape the possibility of effects (see Cartwright et al. 2020).
Nancy Cartwright and Jeremy Hardie’s book, Evidence-Based Policy, offers a long discussion of fidelity. The argument goes that where good studies show a policy (or intervention) worked somewhere (or several somewheres), you are ‘enjoined to implement the policy exactly as it was done in the study situations.’ Cartwright and Hardie compare the Bangladesh Integrated Nutrition Programme with a similar programme in Tamil Nadu: the causal cake that worked in Tamil Nadu did not work in Bangladesh. This rather throws fidelity out the window. Cartwright and Hardie instead recommend thinking about which facts are relevant to the interventions and why they may or may not have worked, rather than the crude proxies of whether the inputs were the same and whether the activities were implemented in the same way.
Annette Brown, for instance, thoughtfully discusses issues of the ecological validity of experiments (i.e., whether the methods, materials, and setting of a study approximate the real world). After all, experimental interventions are unnatural by design. Yet, partly as a result, as Brown notes:
‘We have very little evidence about whether estimated effect sizes from pilots can be observed at scale. When we have impact evaluation evidence from the pilot, we rarely bother to conduct another rigorous evaluation at scale.’
This prompted many randomistas to advocate for bigger embedded experiments, like the one implemented in Bihar by Banerjee et al. between 2010 and 2012. Jean Drèze explains the many problems this created, alongside the challenges raised by Woolcock on external validity above.
There are other ways to look at scaling and what is being scaled. In agricultural research, for example, there has been a maturation of what scaling really means for interventions that are not mere technical widgets. As Schut et al. argue, ‘scaling should be understood and approached more as a set of interdependent changes in a broader system, rather than as the scaling of a specific technology [or “widget”].’ Garb and Friedlander (2014) also refer to “translation” and “re-innovation” as a prerequisite for scaling.
So, if we focus too much on the specific inputs, the fidelity of activities, or the supposed quality of research, we may miss the fundamental point that achieving impact isn’t just about things (deworming pills, textbooks, malaria nets, or scorecards), but people undertaking interactive and social transaction-intensive processes in the real world in different contexts.