Niches, edifices, and evaluation jargon
In a recent blog, I argued that criticising the Randomised Controlled Trial (RCT) “movement” is largely a waste of time. After all, those who criticise it tend to have a different view of causation, epistemology, and ideology. They are also usually asking different questions in the first place. So instead, I suggested that the anti-RCT movement (if we can call it such) could have a more fruitful conversation by comparing what we have to offer as alternatives, and spend more time learning from one another to move forward together.
While I’ve seen some signs of movement recently, what stands in the way of a more productive conversation is not just ontological, epistemic, or methodological differences, but politics, power, and prestige. So, in this blog I want to lay out some of the political artifices in the monitoring, evaluation, and learning (MEL) of international development that we need to challenge to make progress.
In his (relatively) well-known book on theory-based evaluation, Huey Chen argued that evaluation credibility relies on both scientific principles and stakeholders’ perceptions. He claimed that:
“Programme evaluation is both a science and an art” (Chen, 2015: 22).
I’m pretty keen on art. So, over this blog series I’ll discuss some of both the science and art of MEL. I’ll also draw on the perspectives of other MEL peers in the process. I hope this will allow us to reflect upon how different ways of seeing shape which and whose knowledge and perspectives are valued (or not) in the process.
In response to my first blog, Dave Algoso, a consultant like me, argued that:
“Consultants and academics alike face real economic incentives to carve out their space and convince people that the particular nuances of their methods make a difference. It ties to the idea of expertise, that the more arcane and jargon-filled your approach, the more advanced it must be.”
It’s undoubtedly true that we have incentives to create and defend niches. A niche is defined as both a comfortable position of employment and a shallow recess or alcove which houses a particular ornament. MEL is no more immune to ornament than any other industry. One of the titans of evaluation, Michael Quinn Patton, for example, appears to create not just a new method, but new approaches, new principles, and even new premises every couple of years. While these are invariably edifying, they’re very niche, and often feel like a private club for the wealthy, highly educated English-speaker who can not only afford the books but also knows the right lingo. Duflo and Banerjee’s comments in The Guardian recently should also be taken in light of the fact they are also trying to sell a new book and talk up what Stewart Donaldson referred to as evaluation “brands,” vying to be part of the “global evaluation tool store.”
So, how much of this is about creating niches and promoting brands? Well, it varies. Some, like my recent purchase, are so niche that the few copies published are sold out to adoring fans on the first day and are not available for purchase outside Amazon’s reach in rich countries (my copy has now arrived).
Other method advocates have tried harder to create either free platforms or learning communities which are relatively inclusive for those who sign up. Yet, there remains surprisingly little dialogue between these groups, even when there is much scope for mutual learning.
The photo at the top is of il nichioni (the “monster niche”) in the Vatican courtyard. While this niche was designed for the pope (and thus exclusive), its size illustrates that while most niches are small, they don’t need to be. We should be striving for bigger and more inclusive niches. Or perhaps, as Chris Roche suggested, we should be looking to create better connected MEL networks or healthier learning ecosystems. But, I think there’s still a fair way to go before we can really use that language.
After all, part of our artifice as consultants and academics is presenting our methods as being more complicated than they really are (or need to be). As an ex-colleague of mine, Ximena Echeverría, pointed out to me recently, this conceit makes theory-based methods seem daunting, even intimidating, for those who are unfamiliar with them but just want to improve their MEL. She argues that this doesn’t mean we have to use simple methods, but we need to make more complicated methods more accessible. One key battle in that fight is language.
Nearly all MEL literature is in English. Methods such as outcome mapping made a deliberate effort to translate materials some time ago and BetterEvaluation has also translated a number of theory-based and participatory methods into Spanish. But the fact that it has taken until September 2019 to translate many of these methods suggests a wider problem.
Each method has created its own sophisticated and confusing terminology (especially when English isn’t your first language), offering us unique terms like “boundary partners,” “social actors,” “progress markers,” “mechanisms,” “hoop tests,” “smoking guns,” “straws-in-the-wind,” “ontological depth,” “retroduction,” and even “blue marble.” The list is almost endless, and it makes the disconnections between us greater, even when some of these terms convey relatively similar concepts. A “boundary partner” is really just an actor/group you work with directly. So, this could simply be a partner? Also, what kind of actors are not social? Anti-social actors?
I’m told that, in developing outcome harvesting, Ricardo Wilson-Grau replaced outcome mapping’s term “boundary partner” with “social actor,” in part because the translation of boundary partner to Spanish is awkward (typically, socio directo). Boundary partners and social actors are sometimes the same, but not always. It might be that your partner changed their behaviour, or it might be that some other actor you targeted changed their behaviour. While semantic differences sometimes convey meaningful distinctions, they rarely make a big difference in practice, once translated.
While a number of listservs offer helpful tips and exchange, I fear that far too many questions are about what is officially correct, rather than a debate about different interpretations. For example, “Am I Deviating from Realist Principles?” does far more to reinforce gatekeeping by the method police than to prompt thoughtful debate (especially between methods).
Added to this in many cases is an elaborate infrastructure of accreditation and status validation. Trusted advisors, peer reviewers, and advisory boards have a great deal of influence over whose evaluation methods are chosen and who gets access to funding (as is the case in research). This is often based as much on world views as it is on specific methodological or empirical concerns. Indeed, whether a matter of thematic or methodological preference, we rarely consider how narrow a cultural landscape this advice generally comes from.
Moving beyond the Anglosphere
Let’s be clear: evaluation is dominated by the Anglosphere, and this finds expression in Anglophone bias everywhere. This is particularly strong in process tracing metaphors (“hoop tests,” “smoking guns,” “straws-in-the-wind”). These are commonly narrated in terms of Sherlock Holmes’ reasoning, but they also resonate with other European cultural reference points such as Nordic Noir and American courtroom dramas or legal conspiracies like the brilliant Netflix show Making a Murderer (which I will discuss in a future blog). These all suit the tastes and traditions of the Anglophone evaluator.
I’ve been concerned about translating process tracing into Spanish for some time. I’ve given brief introductions to a few teams in Spanish and have also run evaluation workshops partly in French. While it’s possible to convey the general idea, the metaphors are cumbersome and translate poorly. It turns out that we need different language for French, Spanish, and Italian. So, just imagine how difficult this might be for non-European languages.
In process tracing, “hoop tests” concern evidence that matters most when we fail to find it. Failing to go through the hoop (when we would expect to) damages the credibility of our explanation. Essentially, if the evidence is missing, we can’t be sure our explanation is correct. Words like “sensitivity,” “certainty,” or “necessity” are often used to explain what hoop tests are about. It’s evidence we need to find for basic plausibility.
“Smoking guns” are a murder metaphor. This test is about unlikely evidence which, if we find it, helps to confirm our explanation (related to a low probability of false positives). We tend to use the word “uniqueness” because confirmatory evidence is about ruling out alternative explanations. However, in both Spanish and Italian, it seems more accurate to use the statistical language of specificity (especificidad) to refer to false positives, because unique (único) and necessary (necesario) take you to a different form of reasoning in Spanish.
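For readers who find numbers easier than metaphors, here is a minimal sketch of the two tests. It assumes we read them through simple Bayesian updating, which is one common way of formalising process tracing but not the only one. The function name `posterior` and all the probabilities below are made up for illustration; they are not drawn from any real evaluation.

```python
def posterior(prior: float, sensitivity: float, specificity: float, found: bool) -> float:
    """Update our confidence in an explanation after looking for one piece of evidence.

    sensitivity: chance of finding the evidence if the explanation is true
    specificity: chance of NOT finding it if the explanation is false
                 (so 1 - specificity is the false positive rate)
    """
    if found:
        p_if_true = sensitivity
        p_if_false = 1.0 - specificity
    else:
        p_if_true = 1.0 - sensitivity
        p_if_false = specificity
    numerator = p_if_true * prior
    return numerator / (numerator + p_if_false * (1.0 - prior))


# Hypothetical values: a hoop test is high on sensitivity, low on specificity;
# a smoking gun is the reverse.
hoop = {"sensitivity": 0.95, "specificity": 0.30}
smoking_gun = {"sensitivity": 0.30, "specificity": 0.95}

prior = 0.5  # we start undecided about our explanation

hoop_pass = posterior(prior, found=True, **hoop)
hoop_fail = posterior(prior, found=False, **hoop)
gun_found = posterior(prior, found=True, **smoking_gun)
gun_missing = posterior(prior, found=False, **smoking_gun)

print(f"hoop passed:         {hoop_pass:.2f}")
print(f"hoop failed:         {hoop_fail:.2f}")
print(f"smoking gun found:   {gun_found:.2f}")
print(f"smoking gun missing: {gun_missing:.2f}")
```

With these invented numbers, passing the hoop barely moves us (from 0.50 to about 0.58), while failing it is damaging (about 0.14). The smoking gun works the other way round: finding it lifts confidence to about 0.86, while not finding it costs little (about 0.42). The asymmetry is the point, not the particular values.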
While these adaptations may be formally inaccurate for the method police, it is more important to be understood than to be 100% correct. We should embrace such adaptations, because cross-cultural translation really matters. The importance of this was especially clear to me in helping to develop a guide for Oxfam Pacific on Thinking and Working Politically (TWP) last year which included a number of MEL methods and tools I mentioned in the first blog. This was appropriately renamed Planning and Navigating Social Change: Tools for Pacific Voyagers.
Many of the national staff did not speak English as a first language, many had not been to university, and many were not necessarily big readers. Guidance and tools were thus often commandeered by expatriates. We felt we had already removed as much jargon as possible from the first draft of the guide, but we still needed to go further. On the one hand, TWP was how staff had to work to get things done (navigating a constellation of family-clan-wantok relationships for social and economic security). On the other hand, discussion of TWP imposed its own form of “isomorphic mimicry”: it gave the teams a lens through which outsiders could see what was (already) going on. Being contextually appropriate, locally driven, and adaptable was not new, but the language was. The team wanted to strip back the language as much as possible, as it was felt such language would enhance external specialists’ power rather than ensure that power lay in local people’s hands.
We pitched a number of different tools from which the team could choose. The simplest and most accessible generally went down best, as these required the least facilitation. As relationships were key, outcome mapping fit particularly well. The actor-centred nature of outcome mapping and the idea of progress markers also resonated with a journeying metaphor. Essentially, it fit a particular worldview in the Pacific. The benefits of the simplicity of outcome mapping’s progress markers and the importance of relationships were also key findings from a MEL for adaptive management workshop in the Philippines by the Asia Foundation.
I found my co-author Doug Orr’s reinterpretation of adaptive management metaphors particularly interesting. He noted that adaptive management resonated because it fit with Pacific notions of journeys and the metaphor of way-finding. This was deemed particularly relevant given the revival of traditional canoes and techniques as part of identity-building, especially in Polynesia and Micronesia.
Doug sketched out the way-finding approach roughly like this:
- You need to know the features of the destination (an island) but you may not necessarily have a map to reach the destination;
- You need to understand the system (ocean, stars, winds, clouds, fish, birds etc.) and how this might give you cues for your, as yet unseen, destination;
- Having signs helps navigators to understand where they are and where they came from, so they don’t go backwards or sail around in circles;
- Understanding specific events as indications of the state of the system at different stages of the journey (stages of change);
- Developing an understanding of these events and what they signify through generations of close observation of nature — which would have had to be passed down from person to person;
- Journeying in fleets of canoes, requiring communication for relaying information from other canoes about signs spotted (e.g. birds);
- Everyone having and playing roles to contribute to the voyage — so there must have been people in each canoe (not just navigators) in the fleet who were keeping their eyes out for signs.
In response to some of these reflections, Chris Roche mentioned that:
“With local organisations in the Pacific we have found an approach which starts with questions about how is change already happening in your context (i.e. a slightly different approach to ‘theory of change’ vs. theory of action) can lead to a rich discussion which factors in culture, relationships, politics quite naturally, and is less organisation-centric.”
While theories of change were an Oxfam International standard, they were a less good fit than outcome mapping for these Pacific teams. And it may be that a form of systemic action research (which is what Chris hints at here) would also be quite useful.
All of this requires a sense of humility and adaptiveness on the part of us political economy and MEL nerds. Insights from an emerging literature on indigenous knowledge systems and ground rules and data sovereignty in Australia and New Zealand might help us rethink whose learning and adaptation counts in the first place. Weekend reading, perhaps?
Also, while there are thousands of (impact) evaluators out there, including in sub-Saharan Africa, the issue is less about whether evaluators know impact evaluation methods than about whether they understand the context and the nature of a specific sector, so they know what they’re evaluating. When 9/10 of these evaluators are in only three sectors (health, education and agriculture), there remains a huge gap for more complex programming such as governance and accountability. Knowing the lingo and debates in Washington and London is also a huge advantage. Let’s not pretend that’s not the case.
In the next blog, I will build on the above focus on relationships, taking a look at a few methods that appear to have the most promise to fill this gap. Stay tuned.
My thanks for the helpful feedback that drove this conversation go to Florencia Guerzovich, Chris Roche, Ximena Echeverría, Kaia Ambrose, Maria Cavatore, Soledad Gattoni, Doug Orr, and Dave Algoso.