Toby Lowe recently argued that ‘it is impossible for organisations to “demonstrate their impact” if they work in complex environments.’
Does this herald the end of impact evaluation?
While I agree with several points Lowe makes about performance management, to conclude from these pitfalls that organisations should cease trying to demonstrate their impact in such environments seems to be an unhelpful over-correction, and I figured that I’d explain why.
TL;DR: There are many ways to think about impact and how we assess it without outcome-based performance management, counterfactuals, and supposedly “corrupted” data. Learning about whether, how and why organisations make contributions to both short and longer-term effects in different contexts is a key part of navigating complex environments. Asking organisations to forget about demonstrating their impact (in a broad sense of the term) doesn’t make genuine change work any easier; it makes it even harder. So, let’s not throw the baby out with the bathwater.
Performance management isomorphism
First, let’s cover several areas where I agree with Lowe. I think the main thrust of his argument is about performance management rather than performance measurement. More precisely, it’s about the perils of certain types of (mostly quantified) performance measurement to guide performance management.
Lowe is certainly right that “impact isn’t delivered” (like an output). I was surprised when he pointed me to several articles where people thought it could be. To make his argument, Lowe cites articles on outcome-based contracts, outcome-based performance management, and their unintended consequences. I agree that the record of Payment by Results (PbR) is questionable, and it’s certainly the case that such contracts create all sorts of perverse incentives which waste time, money, and energy. In short, for anything that isn’t an output, they are a bad idea (see here for a discussion on outputs and outcomes). PbR often relies on setting (often meaningless and unhelpful) performance measures, usually called Key Performance Indicators (KPIs). As Chris Mowles pointed out recently:
“Performance measures can themselves create a performance — they become performative… [this may include] the performance of gaming the indicators in order to appear to be performing.”
This closely resembles the argument made by Matt Andrews et al. on “isomorphic mimicry” where organisations perform by adopting the form expected to hit a KPI without the function which delivers that performance. One major reason we evaluate is to assess whether this connection is credible or not. There’s also little doubt that Monitoring and Evaluation is often complicit in creating this mimicry. In Navigation by Judgement, Dan Honig reminds us that a focus on “meeting metrics sometimes undermines performance.” I’d argue it’s more frequent than sometimes.
A few years ago I wrote a blog with Alina Rocha Menocal where we argued that the growing pressures for upward accountability (or accountability to DFID/FCDO and the wider UK public) and for the delivery of quick and more easily quantifiable results did not help deliver better results. PbR did not help the programme Alina and I worked on, and due to the performative mimicry the targets created, it distracted staff attention away from actually spending time delivering the work to achieve results. Yet, we noted that our concern was “not about results and the need to report on them, but rather about how reporting is undertaken within a given programme and the purpose this serves.”
Demonstrate, their, impact
While it may be fair to say that using outcomes as performance management tools doesn’t necessarily work well in complex environments (at least in the way Lowe seems to define these), it doesn’t follow that it’s impossible for organisations to demonstrate their impact. Demonstrate just means “show clearly.” I don’t think I have to define “their,” but I probably do have to define impact, as evaluators do. The most widely used definition comes from OECD-DAC. They define impact as “positive or negative, intended or unintended, higher-level effects” (see here for more info).
Lowe uses a systems map of the outcome of obesity to demonstrate his point. He’s right that there are many factors which contribute to reducing obesity. I cited this previously. My mum is a nurse who specialises in diabetes, so I’ve heard quite a lot about it. Lowe then asks: how would you distinguish the impact of your weight loss programme from the influence of all the other factors in this system? Lowe’s question assumes an exclusively counterfactual view of causation, performed by experiments through which one can isolate external factors and clearly attribute the difference to your intervention. I think that Lowe’s right that you can’t do this in complex environments (or interventions). I wrote two blogs critiquing Masset et al. for suggesting you can attribute changes via factorial designs or adaptive trials in more complex contexts.
Lowe’s also right that “your actions are part of a web of relationships, most of which are beyond your control, many of which are beyond your influence,” and some of which may well be “completely invisible to you.” Yet, Lowe’s insistence on counterfactuals and attribution is misleading. A lot of the disquiet seems to stem from the highly delimited framing of causation by certain management and business school professors, citing David Hume (or sometimes Immanuel Kant). For example, in Managing the Unknowable, Ralph Stacey argued that:
“Predictability and intention are possible only if there are direct, clear-cut connections between cause and effect — if a specific action in specific circumstances dependably leads to a specific outcome.”
Yet, why the requirement of dependability? It’s been argued since before I was born that causes need not dependably guarantee their effects for one to infer causation. There are, at least, three other well-established approaches to causation (set theoretic, generative, and dispositional, and several relevant alternatives beyond these) that don’t rely on constant conjunctions or dependable statistical associations between cause and effect or experimental control to infer causation. If you don’t believe me, read Causation: A Very Short Introduction.
So, to answer Lowe’s question: You don’t have to distinguish the impact of your weight loss programme from the influence of all the other factors in this system. You never did.
There are even deeper issues hidden here regarding what we can know in the first place. Merriam-Webster’s dictionary defines unknowable as: “lying beyond the limits of human experience or understanding.” So, we are not merely talking about something which is extremely difficult to know and understand, as Dave Snowden discusses, but something beyond the limits of our perception and cognition.
This raises a doubt I’ve always had about certain complexity typologies. In a famous article, Snowden and Mary Boone refer to the complicated domain as the realm of “known unknowns,” the complex domain as the realm of “unknown unknowns,” and the chaotic domain as the realm of “unknowables.” Barbara Zimmerman et al. go beyond unknown unknowns (“things… we don’t know we don’t know,” to use Donald Rumsfeld’s words) to refer to complex issues as unknowable — things we (presumably) can’t know. Yet, in Complex Responsive Processes in Organizations, Stacey refers to the “future as unknowable, but yet recognisable, the known-unknown.” I don’t know who’s right here. Though it’s always seemed strange to me how much people purport to know about the, apparently, recognisably unknowable. I can’t help but feel this is a form of gnosis (higher knowledge) to which only an enlightened few have access. I am clearly not one of them.
Lowe then asked what does “contribution not attribution” mean in practice? The answer, which Lowe himself has expressed before, is that many factors combine to produce outcomes, particularly multi-component outcomes like obesity. Though this does not mean that every factor is equally important for every one of those outcomes (as Jonny Morell reminds us). Arguing that many factors combine takes you to one of the other established approaches to causation I mentioned above (set-theoretic, or configurational). Indeed, the notion that other factors mediate or moderate change alongside your intervention is increasingly gaining acceptance as a mainstream view, including by several committed randomistas.
Now, there are long-standing debates on this in philosophy of science. Most notably, John Mackie’s work on INUS (insufficient but necessary part of an unnecessary but sufficient) conditions in The Cement of the Universe. Lowe writing his blog was an INUS condition for me writing this blog. I could, of course, have written a blog about this subject (there is an alternative possible future), but I probably wouldn’t have written this blog, would I? That is the difference between contribution and attribution.
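To make the INUS logic a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the factor names other than Lowe's blog (`free_evening`, `conference_debate`) are hypothetical, invented for the example, and the outcome is modelled very crudely as occurring whenever all the factors in at least one sufficient combination are present.

```python
# Hypothetical sufficient combinations for the outcome "I write a blog on this subject".
# Only "lowe_blog" comes from the text above; the other factors are invented for illustration.
SUFFICIENT_SETS = [
    frozenset({"lowe_blog", "free_evening"}),  # sufficient together, but not the only route
    frozenset({"conference_debate"}),          # an alternative route to the outcome
]


def outcome_occurs(present, sufficient_sets):
    """The outcome occurs if every factor of at least one sufficient set is present."""
    return any(s <= present for s in sufficient_sets)


def is_inus(factor, sufficient_sets):
    """Check whether a factor is an Insufficient but Necessary part of an
    Unnecessary but Sufficient combination."""
    for s in sufficient_sets:
        if factor not in s:
            continue
        # Insufficient: the factor on its own does not produce the outcome.
        insufficient_alone = not outcome_occurs(frozenset({factor}), sufficient_sets)
        # Necessary part: without it, the rest of this combination no longer suffices.
        necessary_part = not outcome_occurs(s - {factor}, sufficient_sets)
        # Unnecessary whole: some other combination could also produce the outcome.
        unnecessary_whole = any(not s <= other for other in sufficient_sets)
        if insufficient_alone and necessary_part and unnecessary_whole:
            return True
    return False


print(is_inus("lowe_blog", SUFFICIENT_SETS))          # True
print(is_inus("conference_debate", SUFFICIENT_SETS))  # False: sufficient on its own
```

On this toy reading, a contribution claim asks whether your factor was a necessary part of the combination that actually produced the outcome, not whether it would have produced the outcome by itself.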
Lowe is right to suggest that there are a number of challenges in “zooming in” on critical causal links in impact pathways to assess how an intervention contributes (or fails to contribute) to change, as a “causal hotspot” form of Contribution Analysis proposes to do. He argues that relying on predetermined “critical causal links” sounds an awful lot like seeking attributable change. But, I fear this misunderstands both the set-theoretic causal logic which underpins Theories of Change and the generative logic which underpins Contribution Analysis. All such an approach is actually doing, if you read Marina Apgar and Giel Ton’s blog, is asking whether there is a place where evidence is contested or an area that is emphasized by one or more stakeholders in the evaluation. In other words, is this an area that merits gathering further evidence and is this an area communities care about? It has nothing to do with counterfactual causation at all.
Lowe then critiques what he calls a “linear planning process” like the following programme logic model:
I’ve heard this critique more times than I can count (or measure). It speaks to another common misconception about and misuse of theories of change. Here Lowe seems to equate non-linear cause and effect (i.e., non-proportional relationship between inputs and outcomes), a linear planning process, and a set of boxes with a (unidirectional) set of arrows from inputs to long-term outcomes (see my previous blog on this). Recall Edward Lorenz:
“A linear process is one in which, if a change in any variable at some initial time produces a change in the same or some other variable at some later time, twice as large a change at the same initial time will produce twice as large a change at the same later time… A nonlinear process is simply one that is not completely linear.”
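Lorenz's definition can be put in rough symbolic form. This is only a sketch: the notation is mine rather than Lorenz's, with $\delta$ standing for a change in some variable at the initial time $t_0$ and $\Delta(\delta)$ for the resulting change in the same or another variable at the later time $t_1$.

```latex
% Linear process: doubling the initial change doubles the later change.
\Delta(2\delta) = 2\,\Delta(\delta) \quad \text{for every initial change } \delta
% Nonlinear process: the equality fails for at least one \delta.
\exists\, \delta : \; \Delta(2\delta) \neq 2\,\Delta(\delta)
```

Read this way, linearity is a claim about proportionality between changes, not about whether a diagram has arrows pointing in one direction.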
Never, in my experience, have I seen anyone use a theory of change in such a way as to suggest that simply increasing inputs by a certain amount will increase outcomes or impacts by a proportional amount. They aren’t linear in this sense. Anyone who uses a theory of change in such a way doesn’t understand what an outcome is. Theories of change (ToC) employ a set theoretic (combinations of causes lead to an effect) and generative (“mechanisms” explain effects) logic to explain how causes lead to effects, including the direction(s) of those effects. It’s debatable whether theories of change are really about making predictions in the traditional sense. There have been some recent efforts to improve the reliability of predictions in theories of change. However, a review on theory of change use by the Consultative Group on International Agricultural Research (CGIAR) argued that:
“A ToC is not a prediction of impact — it is a description based on the information available at the time of what will need to be done to achieve a desired impact, and what might prevent that happening (CGIAR, IEA, 2017).”
Even if you view the propositional logic of a theory of change as a form of prediction, you are talking about incomplete, imperfect, contingent, caveated predictions and conditional probabilities rather than marginal probabilities. So, this line of critique is often misleading.
Lowe argues that “creating linear programme logic models is essentially time spent inventing a fantasy.” Vision and mission statements, or impact statements, are often fantastical. I agree with Lowe that “if we care about making impact in the real world we need to stop pretending.” As Ann-Murray Brown quips, a theory of change charts “how an intervention/programme takes the world from a place of desolation to utopia.”
But, not really. Brown reminds us that many ToC outcome pathways include an “accountability ceiling”, usually at outcome level. Or, borrowing from Outcome Mapping, they denote spheres of control, influence, and interest. This reflects the fact that we all know outcomes are uncertain. We all know outcomes can’t be guaranteed. This is why outcome-based payment models are a bad idea.
The illusion of certainty
Lowe then turns to the well-worn critique about theories of change and certainty. Theories of change aren’t supposed to create a sense of certainty. Quite the contrary, just like the CGIAR example above, they ought to reflect the levels of uncertainty we have about whether and under what conditions higher-level outcomes (i.e., outcomes of outcomes) might be achieved, or what might prevent them from materialising. Certainly, a simple diagram with unidirectional arrows seems to imply an ineluctable process, but this fundamentally misreads logic models and theories of change. Theories of change are a best guess based on limited information at a particular point in time. They rely on implicit and explicit assumptions, and they almost always have to be updated and adapted as we learn more about the environmental context in which projects/programmes operate.
Firstly, Lowe seems to omit the fact that in the diagram above we have “environmental context.” The environmental context is supposed to feed into each step in the hypothesised process. John Mayne, for instance, recommends the use of nested theories of change for methods such as Contribution Analysis. Process Tracing reconstructs hypothesised causal chains. These may resemble parts of a logic model, but they are rarely, if ever, identical. This demonstrates that there is almost always a difference between what we tentatively hope may happen and what actually does. Few evaluators have illusions about this.
Secondly, as Apgar et al. point out, ‘when the theory of change is uncertain and contested, the steps are best depicted as an iterative cycle of learning,’ reflecting on assumptions and revising the story. Similarly, in their Outcome Harvesting guide, Ricardo Wilson-Grau and Heather Britt remind us that the ‘objectives and the paths to achieve them are largely unpredictable and predefined objectives and theories of change must be modified over time to respond to changes in the context.’ We all know there will be inaccuracies due to incomplete information and the unpredictability stemming from emergence.
Thirdly, assumptions reflect uncertainty, not certainty. Any theory of change should have assumptions. I’ve argued on several occasions that assumptions are the key part of theories of change, and fundamental to learning. It may reasonably be argued that many people don’t take this seriously in practice, but this reflects a misuse of the tool and of the process of developing theories of change. For example, Heather Britt’s work on Causal-Link Monitoring with Richard Hummelbrunner is instructive here because it focuses on assumptions and how likely they are to fail. It doesn’t assume that inputs will ineluctably lead to impacts; it suggests there are many things that might well derail the process.
Hence, I think many of Lowe’s critiques regarding contribution, linearity, certainty, and fantasy are quite poorly founded. No doubt there’s some poor practice out there, but I think Lowe significantly overstates people’s lack of awareness, or intelligence.
Imagining alternative futures
I agree with Lowe that we should ask people/organisations to be accountable for experimenting and learning together. Rocha Menocal and I said as much in our blog. But, forgetting all about demonstrating impact (positive and negative, intended and unintended higher-level effects) leaves learning seemingly bereft of a discussion of contribution to results.
It’s one thing to conclude that outcome-based contracts and outcome-based performance management are bad ideas and perhaps we should abandon them. It’s quite another to extrapolate from this evidence that organisations should forget entirely about demonstrating whether they are contributing to positive or negative, intended or unintended, higher-level effects.
Even more problematic for Lowe’s argument is that one of the main studies he cites in support calls for more Randomised Controlled Trials (RCTs) to give us a “robust evidence base” for comparison. As Emma Tomkinson put it, “real value will be produced if studies are able to isolate the effects of different contracting variables.” I thought this was what we couldn’t do in complex systems. Perhaps it’s this incompatibility with experimental methods that leads to Lowe’s radical and hyperbolic conclusion that we should abandon impact assessment.
I’m strongly in favour of experimenting and learning, and I’ve reviewed a lot of the evidence on adaptive management in international development. In my view, there’s now enough evidence to argue it’s passed the proof of concept stage. However, I’m deeply suspicious of anyone telling us how “real impact is made.” Evidently, there are many paths to impact, and many ways of assessing impact (i.e., beyond counterfactual causation).
Experimenting and learning might be helpful to achieving “real impact.” But, it’s unlikely they are sufficient conditions. It seems even less likely that deciding not to assess impact will achieve more impact. Wouldn’t we need an impact assessment to make such a claim anyway? So, in my view, we shouldn’t throw the baby out with the bathwater. Instead, let’s experiment and learn about different ways to assess impact in complex systems.
Impact evaluation is dead, long live impact evaluation.
Consider signing up to the Causal Pathways Symposium if you’re interested in hearing more about different ways of assessing impact.
Many thanks to Alan Hudson for comments on an earlier draft.