Complexity, context, checklists?
Michael Bamberger recently offered a very thoughtful two-part blog series on building complexity into development evaluation (blog 1; blog 2). In his view, most evaluations largely ignore contextual complexities and how these affect design and implementation.
I’m listening…
Ignore context at your peril
Bamberger argues that:
‘Most development programs are designed and implemented in complex political, socio-cultural, economic and ecological contexts where outcomes are influenced by many factors over which program management has little or no control. These factors interact in different ways in different project locations. Consequently, a project, which has a clear design and implementation strategy, may produce significantly varying outcomes in different locations or at different points in time.’
He notes that, despite the frequent use of the word “complex,” most evaluations apply designs that assume the project is ‘simple’ and that there is a clear relationship between the project inputs and a limited set of outcomes. We might also add: with baked-in assumptions of linear change. Most of Bamberger’s critique is aimed at neat quantitative evaluations, centred on a pre-test/post-test comparison-group design, where impact is defined as the difference in the rate of change between the two groups.
The intervention trap
I couldn’t help but feel this recent blog and parts of the related webinar on evaluating complex interventions from the Centre of Excellence for Development Impact and Learning (CEDIL) partly fell into a similar trap.
According to Eduardo Masset, a complex intervention may be defined as one with multiple interacting components whose interactions generate complex outcomes. Masset’s proposed taxonomy of (1) multi-component interventions; (2) portfolio interventions; and (3) system-level interventions is helpful, but a focus on complex interventions alone seems to misunderstand some of the key tenets of complexity. Complexity is about boundaries, inter-relationships, and perspectives. In other words, it’s about interactions between an intervention and relevant contextual factors within a system, not simply factors within the intervention itself.
The whole canon of realist evaluation has been making this point about context, mechanisms and outcomes for a generation, yet Masset makes only one passing reference to context in the webinar.
Complexity isn’t simply about how complicated and multi-faceted an intervention is. You have to look beyond the intervention to understand how it will be received. Controlling for demographic features may not suffice. Reception depends on pre-existing relationships and connections as well as particular social norms. This is why we get different effects in different contexts (even if they’re ostensibly comparable). Different communities will also respond differently to an intervention because they have different and/or diverse perspectives. A lack of agreement is central to the Stacey Matrix, which is often used to help understand the factors that contribute to complexity.
- Rick Davies offers a helpful comparison between the Stacey Matrix and the commonly used Cynefin framework here. I prefer the former.
With an intervention-centric understanding of system boundaries, Masset appears to propose several new-ish experimental methods, such as factorial and adaptive designs, which don’t seem entirely appropriate to the task at hand.
Complexity goes beyond interventions. This is my mantra today. It refers to both features endogenous to interventions and features exogenous to them (or, as Bamberger puts it, internal and external systems). Interaction effects are not simply within and between parts of an intervention, but between the intervention and its context. This is a fundamental epistemological issue. Yet it’s an issue which experimental approaches and methods have largely failed to grasp, because grasping it means letting go of various inflated assumptions about control and deflated assumptions about social complexity.
I generally prefer not to comment on experimental methods, but I think this is a necessary exception. Factorial and adaptive designs may well be more appropriate than Randomised Controlled Trials (RCTs) for assessing complex interventions and contexts. Or perhaps just less inappropriate. But they’re still designed to assess only a handful of variables/factors, and the focus is almost exclusively on intervention features rather than contextual features.
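To make that limitation concrete, here is a minimal simulated sketch. The two components (“training” and “cash grant”), the effect sizes and the data-generating process are all hypothetical; the point is simply that a 2x2 factorial design recovers two main effects and one interaction between intervention components, while any interaction with context is left in the error term.

```python
# Toy sketch of what a 2x2 factorial design estimates: main effects and an
# interaction for two pre-specified intervention components. All numbers
# are simulated and hypothetical.

import random

random.seed(1)

def simulate_outcome(training, cash_grant):
    """Hypothetical data-generating process for one participant."""
    effect = 0.3 * training + 0.2 * cash_grant + 0.4 * training * cash_grant
    return effect + random.gauss(0, 1)

# One cell per combination of the two intervention components.
cells = {(t, c): [simulate_outcome(t, c) for _ in range(500)]
         for t in (0, 1) for c in (0, 1)}

mean = {cell: sum(values) / len(values) for cell, values in cells.items()}

# Main effects and the interaction, computed from the cell means.
main_training = ((mean[(1, 0)] + mean[(1, 1)]) - (mean[(0, 0)] + mean[(0, 1)])) / 2
main_cash = ((mean[(0, 1)] + mean[(1, 1)]) - (mean[(0, 0)] + mean[(1, 0)])) / 2
interaction = (mean[(1, 1)] - mean[(1, 0)]) - (mean[(0, 1)] - mean[(0, 0)])

print(f"training main effect ~ {main_training:.2f}")
print(f"cash grant main effect ~ {main_cash:.2f}")
print(f"training x cash interaction ~ {interaction:.2f}")
# Note: nothing here captures interactions between the intervention and its
# context (trust, norms, local leadership); those sit in the error term
# unless the design explicitly builds them in.
```

Even in this best case, the design tells us about interactions *within* the intervention, not between the intervention and the system it lands in.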
A mapping of the applications of these new-ish methods is a valuable exercise, but Masset found only one “true” factorial design that looked at more than two factors, and only two adaptive trials that fit the search criteria. This strikes me as a somewhat desperate attempt to show the relevance of an approach that is a fundamentally poor fit for evaluating complexity (perhaps the paper will be more convincing).
Mental models and appropriate methods
Our preferred epistemology not only shapes how we understand boundaries, but also how we interpret different methods. I believe a control mindset has influenced how Masset explains Qualitative Comparative Analysis (QCA), and I think this is a problem. QCA is often stuck in a tug-of-war between quantitative and qualitative scholars and evaluators. It’s a set-theoretic and configurational method which relies on within-case evidence, but it also offers the opportunity for systematic comparison between cases of potentially necessary and sufficient conditions. The strength of QCA is that it bakes in (generative) within-case explanation prior to developing truth tables for comparison (as Rick Davies pointed out in the webinar). Without this foundation, I would argue, QCA offers relatively limited added value as (essentially) medium-n statistics.
While, in evaluation, QCA will tend to focus on intervention features, it isn’t simply ‘useful for understanding the combination of programme activities that are most likely to result in the intended impact of the intervention,’ as Marcella Vigneri describes it. The combinations aren’t just programme activities, but relevant contextual and intervention features which are argued to explain the outcome in a particular case. This is also, in my view, why a dialogue between Realist Evaluation on Context, Mechanism and Outcome (CMO) combinations and potential necessary and sufficient conditions in QCA truth tables seems helpful.
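For readers less familiar with truth tables, here is a deliberately simplified crisp-set sketch in Python. The cases, conditions and outcome below are entirely hypothetical, and real QCA depends on calibration and within-case evidence that no script can replace; the point is simply that configurations can mix intervention and contextual conditions.

```python
# Minimal, illustrative sketch of a crisp-set QCA-style truth table.
# Cases, conditions and outcome are hypothetical, coded 1/0.

# One intervention feature (TRAINING) and two contextual features
# (PRIOR_TRUST, LOCAL_LEADERSHIP), plus the observed outcome.
cases = {
    "Case A": {"TRAINING": 1, "PRIOR_TRUST": 1, "LOCAL_LEADERSHIP": 1, "OUTCOME": 1},
    "Case B": {"TRAINING": 1, "PRIOR_TRUST": 0, "LOCAL_LEADERSHIP": 1, "OUTCOME": 1},
    "Case C": {"TRAINING": 1, "PRIOR_TRUST": 0, "LOCAL_LEADERSHIP": 0, "OUTCOME": 0},
    "Case D": {"TRAINING": 0, "PRIOR_TRUST": 1, "LOCAL_LEADERSHIP": 1, "OUTCOME": 0},
    "Case E": {"TRAINING": 1, "PRIOR_TRUST": 1, "LOCAL_LEADERSHIP": 0, "OUTCOME": 0},
}

conditions = ["TRAINING", "PRIOR_TRUST", "LOCAL_LEADERSHIP"]

# Build the truth table: one row per observed configuration of conditions,
# recording which cases share it and whether the outcome was present.
truth_table = {}
for name, case in cases.items():
    config = tuple(case[c] for c in conditions)
    truth_table.setdefault(config, {"cases": [], "outcomes": []})
    truth_table[config]["cases"].append(name)
    truth_table[config]["outcomes"].append(case["OUTCOME"])

print("Configuration -> cases, outcome consistency")
for config, row in truth_table.items():
    consistency = sum(row["outcomes"]) / len(row["outcomes"])
    print(dict(zip(conditions, config)), row["cases"], f"consistency={consistency:.2f}")

# In crisp-set terms, a condition is a candidate necessary condition if it
# is present in every case where the outcome occurs.
for c in conditions:
    necessary = all(case[c] == 1 for case in cases.values() if case["OUTCOME"] == 1)
    print(f"{c}: {'candidate necessary condition' if necessary else 'not necessary'}")
```

In this toy table, TRAINING only “works” in combination with LOCAL_LEADERSHIP, a contextual condition: exactly the kind of configurational claim, spanning intervention and context, that the paragraph above is pointing at.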
At this point though, we might run away and say it’s just too difficult. As Megan Colnar pointed out only last week, in the Transparency, Participation and Accountability (TPA) sector:
Indeed, as both Masset and Peter Craig note in the webinar, not everything is complex, and we quite often confuse “complicated” and “complex.” Given its buzzword appeal, you often hear someone use the word “complex” when they mean something entirely different.
While Bamberger’s critique is mostly focused on the limitations of quantitative methods, qualitative methods aren’t immune from the challenges of evaluating complex interventions and complex contexts. As I discussed recently, even highly appropriate methods such as Process Tracing have limitations in terms of where boundaries are drawn and how we look at inter-relationships, for example. No approach or method is a panacea.
Complexity checklists and chunking
So, how might we go about addressing the problem? Bamberger offers a five-step process to help us. Rick Davies still has some concerns about measuring complexity which are perhaps not covered by Bamberger’s process. However, I find two elements of Bamberger’s approach offer a good place to start.
The first suggestion is the use of complexity checklists to help assess complexity dimensions. I’m not usually one for checklists, but I think the one below is actually quite helpful for considering what we ought to take into account before we choose appropriate approaches and methods.
I think this kind of checklist could also be helpful for monitoring and reporting processes, given that the degree of complexity is a pretty good indicator of what it is reasonable to report, how, and when.
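As an illustration only, a checklist can be thought of as a simple scoring structure that flags when a “simple” evaluation design is unlikely to be adequate. The dimensions, scores and thresholds below are hypothetical placeholders, not Bamberger’s actual checklist.

```python
# Illustrative only: hypothetical complexity dimensions scored 1 (low) to
# 3 (high). This is a stand-in structure, not Bamberger's checklist.

checklist = {
    "number of intervention components": 3,
    "stakeholder agreement on objectives": 2,
    "predictability of the causal pathway": 3,
    "stability of the implementation context": 2,
    "degree of embeddedness in other programmes": 1,
}

total = sum(checklist.values())
maximum = 3 * len(checklist)

print(f"complexity score: {total}/{maximum}")
if total > 0.66 * maximum:
    print("High complexity: a single 'simple' pre/post design is unlikely to suffice.")
elif total > 0.4 * maximum:
    print("Moderate complexity: consider chunking into evaluable components.")
else:
    print("Lower complexity: a simpler design may be defensible.")
```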
The second step, which Bamberger breaks down in more detail, I would call “chunking”: Breaking the project into evaluable components and identifying the units of analysis for the component evaluation. Davies questions how we transition from step 1 to step 2, and one might argue that there are perhaps too many dimensions in the full checklist (25) to assess how they might apply to step 2 (hence I prefer the above shortlist).
Chunking does have its risks if you break things into too many parts, or the wrong parts. Chris Roche put it to me that this can sometimes be like “breaking a mirror.” It can make it harder to put things back together (especially if you don’t have a coherent analytical frame). Nonetheless, some kind of chunking seems necessary to determine what might be amenable to different methods (econometrics, case studies, etc.). As Estelle Raimondo mentioned in the same webinar, partly for this reason, chunking may be as important for portfolio evaluation as it is for evaluating complicated multi-component projects.
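A hedged sketch of what chunking might look like in practice: mapping hypothetical project components to units of analysis and candidate methods. The components and method pairings are invented for illustration, and the coherent analytical frame that holds the pieces back together still has to come from elsewhere.

```python
# Hypothetical 'chunking' of a project into evaluable components, each with
# a unit of analysis and candidate methods. Purely illustrative.

chunks = [
    {"component": "cash transfer to households",
     "unit_of_analysis": "household",
     "candidate_methods": ["econometric impact estimation"]},
    {"component": "community scorecard process",
     "unit_of_analysis": "community",
     "candidate_methods": ["case studies", "QCA across communities"]},
    {"component": "national policy advocacy",
     "unit_of_analysis": "policy process",
     "candidate_methods": ["process tracing"]},
]

for chunk in chunks:
    print(f"{chunk['component']} -> unit: {chunk['unit_of_analysis']}; "
          f"methods: {', '.join(chunk['candidate_methods'])}")
```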
Only after we have broken things down into more manageable chunks, Bamberger argues, should we choose which approaches and methods are a good fit. While by no means perfect, I think these first two steps are a pragmatic place to start. I’m curious to hear what complexity evangelists think.