Sam Francis, Around the Blues

What’s wrong with process tracing?

I’ve written quite extensively on the use of process tracing in evaluation in the last few years, trying to find ways to make it a little more accessible and useful. This included a paper with Alix Wadeson and Bernardo Monzani on learning from six process tracing evaluations, a blog explaining the logic of process tracing using the Netflix show Making a Murderer, and even a blog series on how to combine process tracing with realist evaluation. Alix and I will also be doing a training on process tracing at the European Evaluation Society (EES) conference.

In short, I’m a fan, and I think there are many benefits, but I also believe that no approach, method, or tool is perfect, so we should take critique seriously.

My interest was spurred by an argument between Andrew Bennett, Tasha Fairfield, and Andrew Charman on one side and Sherry Zaks on the other in a recent volume of Political Analysis. What Zaks is, in fact, critiquing is Bayesian process tracing, and a rather idiosyncratic and disputed interpretation of it at that (without explicit causal chains or process tracing tests, which are the backbone of the method, in my view).

The main thrust of Zaks’ critique of Bayesian process tracing revolves around three statements from a recent Fairfield and Charman paper (which isn’t even really about process tracing): on the status of new evidence, on the order in which we learn from information, and on whether we should report the timing of the research process. All three statements contradict what I learned when I was taught process tracing.

Nonetheless, Zaks’ paper does offer an interesting critique, and some of it is relevant to various forms of process tracing in research but also (for my interest here) in evaluation.

Bayesian evangelism

Zaks notes that Bayesian process tracing has quickly become ‘a frontier of qualitative research methods.’ Her prime adversaries and Thomas Bayes’ greatest evangelists, Fairfield and Charman, view the turn to Bayesianism as a watershed for in-depth, small-n research. There also seems to be a growing consensus that Bayesian inference is at the heart of process tracing.

But Zaks concludes that:

‘Until Bayesian proponents can demonstrate where their method reveals new conclusions or more nuanced inferences, the costs of adoption will continue to outweigh the benefits.’

I’ve argued elsewhere that Bayesian forms of process tracing do, in fact, reveal new conclusions and more nuanced inferences in evaluation and, in my view, do justify the opportunity costs. You don’t have to take my word for it, though: you can listen to my ex-colleague, Samuel Addai-Boateng, give his view during the Overseas Development Institute’s Measuring the Hard to Measure in Development discussion. Nonetheless, I do want to discuss some potential weaknesses.

Miracles or magic?

Bayesianism originates with a Presbyterian minister, Thomas Bayes, who was apparently looking for evidence of miracles. Process tracing has evolved substantially as a method since Stephen Van Evera, Andrew Bennett and Alexander George brought it to popular attention in 1997. Back then, it wasn’t considered Bayesian. There is now ever greater philosophical and conceptual delineation and additional methodological sophistication. A lot of this is pretty impenetrable to those looking up at ivory towers. Metaphors like “straw-in-the-wind” or “smoking gun” hardly translate well (I’ve done process tracing in Spanish and French). Indeed, the recent application of Bayesian formulae, explicitly specifying quantities and prior probabilities, such as in contribution tracing or the Bayesian Integration of Quantitative and Qualitative data (BIQQ), while potentially useful, doesn’t make things more accessible. All this jargon makes us sound like alchemists.

As Colin Hay puts it in a 2016 critique of process tracing, all good social science traces processes and always has. Zaks also rightly points out that the fundamentals of the Bayesian reasoning which underlies process tracing, ‘“updating our views about which hypothesis best explains the…outcome of interest as we learn additional information (Fairfield and Charman, 2019: 158),” may seem so obvious as to not need a name beyond research.’

There’s a great video which explains Bayes’ theorem in a relatively simple way through geometry. So, if this is all new to you, it’s worth pausing to watch it.

Overall, the foundations are relatively straightforward, but the tools and techniques we use to operationalise Bayesian logic and process tracing are not.
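For readers who prefer to see the arithmetic, here is a minimal Python sketch of the basic updating logic. It is only an illustration under assumptions of my own: the function name `update` and all the probabilities are hypothetical, and it treats a claim and "not the claim" as the only two options.

```python
def update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) using Bayes' theorem for a claim H and its negation.

    prior            -- confidence in the claim before seeing the evidence
    p_e_given_h      -- probability of finding the evidence if the claim is true
    p_e_given_not_h  -- probability of finding the evidence if the claim is false
    """
    numerator = prior * p_e_given_h
    evidence = numerator + (1.0 - prior) * p_e_given_not_h
    if evidence == 0:
        raise ValueError("The evidence has zero probability under both options.")
    return numerator / evidence


# Illustrative numbers only (assumptions, not taken from any evaluation).
confidence = 0.5                                # start agnostic about the claim
after_first = update(confidence, 0.8, 0.2)      # evidence fits the claim fairly well
after_second = update(after_first, 0.6, 0.3)    # a weaker second piece of evidence

print(round(after_first, 3), round(after_second, 3))
```

Each piece of evidence moves the level of confidence up or down depending on how much more likely it is under the claim than under the alternative. The hard part in practice is justifying the numbers you feed in, not the calculation itself.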

Process tracing is technically difficult

As Scott Bayley laments, authors have a habit of proposing ‘a new method that is quick and simple and will “solve” our current challenges.’ He argues, not without good reason, that, as with contribution analysis, some have given the impression that process tracing is easy.

I’m inadvertently guilty of this. In my efforts to strip away the complicated theory, techniques, and language to the fundamentals, I made an adaptation of Bayesian process tracing, contribution rubrics, which I unwisely subtitled “a simple way to assess influence.” Of course, by simple, I meant simpler.

Learning how to do process tracing well, and particularly Bayesian forms of process tracing, is technically challenging. At least for evaluation, you have to re-programme your thinking around degrees (or levels) of confidence in a hypothesis rather than viewing it as simply true or false. You have to de-programme assumptions you may hold about attribution. You have to learn how to make theories of change (or “causal chains”) more specific and testable. You also need to learn what the metaphors really mean, distinguish between them, and practise the application of evidence tests. You have to be much more serious about considering rival (or alternative) explanations. And, for some forms of Bayesian process tracing (e.g. contribution tracing), you have to learn how to calculate probabilities. All of these require training and practice.

Zaks notes that the ‘explicit Bayesian approach, in particular, requires marked start-up costs in the way of training.’ I can certainly vouch for this from my experience with both contribution tracing and contribution rubrics. Yet Zaks presents a misleading zero-sum trade-off between the merits of training in process tracing and training in ethnographic, interview, and sampling techniques. In practice, you’ll find that interviewing techniques, for instance, are fundamental to good process tracing, so it’s not an either/or. And other methods like realist evaluation (especially realist interviewing) are equally, if not more, technically challenging. To return to Hay’s pithy remark, all good social science requires these basic skills.

Process tracing is time consuming

An obviously related point is that doing process tracing effectively is time consuming. While it’s not necessarily more time consuming than other theory-based methods (e.g. realist evaluation or contribution analysis), the time investment needed to learn the theory and to practise developing causal chains and conducting evidence tests is substantial.

For researchers like Zaks, who already have impressive research skills, one might question what the value added of the various techniques is. Some researchers and evaluators are reasonably sceptical of formulae, notation, and mathematical calculation. While none of these are strictly necessary, they do provide some of the structure and guidance Zaks asks for. As my co-authors and I discuss elsewhere, numbers themselves can be potentially misleading, but discussing what numbers represent is a helpful critical thinking exercise to justify why a particular number (i.e., probability) is proposed.

Moreover, monitoring and evaluation teams don’t necessarily have the research skills of a university professor. Many won’t be experts in developing theories of change, and they may not have heard of a “mechanism,” “generative causation,” or a “type 1 error.” Learning this stuff takes time.

On a more practical note, most evaluations tend only to have budgets for about 30 days. So, if you want to make the process participatory (and I recommend you do), you’ll likely require a minimum of a few days of training, a few days to develop theories of change and causal chains, and a few more days of coaching, even before you start planning to gather any data. These initial steps help to improve critical thinking and reflection on potential biases. But, by now, it should be obvious that process tracing isn’t a quick fix.

There are ways to make it quicker and more agile, and there are various things you can do to make the process a little easier and more comprehensible (Alix and I will discuss some of these at EES), but it doesn’t wave a magic wand, nor paper over the cracks you may have in your monitoring and evaluation system.

Zaks makes various criticisms which apply only to Bennett, Charman, and Fairfield’s interpretations of Bayesian process tracing. She claims a “consensus” which doesn’t exist, but she helpfully questions whether:

‘The act of explicitly justifying [probabilities] and subjecting both the probabilities and justifications to scrutiny (via peer review) contributes to greater transparency of the assumptions researchers make implicitly (even without a Bayesian approach).’

One doesn’t necessarily have to make probabilities explicit, nor calculate them, but subjecting your hypotheses to scrutiny prior to data collection is helpful because it can force you to focus your data collection on the data which should matter most for your hypothesis and alternatives (despite Fairfield and Charman’s objections).

Does process tracing really assess mutually exclusive hypotheses?

Zaks offers an interesting discussion regarding whether Bayesian process tracing (or any other form of process tracing, for that matter) develops mutually exclusive hypotheses or not. Zaks argues that:

‘Bayesian method for inference relies on the often incorrect assumption that all rival hypotheses are mutually exclusive: that is, they cannot simultaneously be true (my emphasis).’

At least, in part, I think Zaks is right here.

In most complex phenomena there are no singular causes and explanations, but a set of causes which together produce an outcome, under certain conditions. In research, comparing rival hypotheses is not unique to Bayesians (Fairfield implies that it is, and that she has discovered the holy grail of iterative and abductive research). We typically weigh up whether one hypothesis is a better explanation for the phenomenon than another.

In her evangelism, Fairfield both oversells Bayesianism as a panacea and overstates the degree to which we can fully rule out alternative hypotheses. Often, while one hypothesis may be a better fit, this doesn't necessarily mean that it is the sole explanation, that none of the factors expressed within a rival hypothesis are of any consequence, and that none of these factors interact with the factors expressed within one’s preferred hypothesis (unless we are randomistas, of course). They are sometimes merely of lesser consequence.

In evaluation, you’re unlikely to be considering something as grand as what explains institutional development in Peru. Instead, you’re making claims about whether and how an intervention may (or may not) have made a significant contribution to an outcome, or whether that outcome might not have materialised without the intervention. So, confirming “contribution claims” (as we call them) is potentially easier in evaluation than research (there are likely to be fewer significant contributing factors). You can also potentially rule out rival claims by demonstrating that these were of little or no evident consequence to the materialisation of the outcome. I would, however, question whether anyone fully or equally investigates rival explanations. Add up the time you spend on rival explanations. And consider whether you can even gain access to the right material to investigate them fully.

Nonetheless, for any important outcome, there tend to be contextual features and complementary contributions which also aid the materialisation of the outcome. If you fail to consider these, you run the risk of overclaiming. Process tracing has four evidence tests. One of these is called the “doubly decisive” test. Finding evidence to pass this test not only confirms your claim, but it rules out all others. In a murder case, it’s possible to confirm who the murderer is (e.g., video of the murder in action) and in the process rule out all other prime suspects. While we might rule out another rival murderer, we might wrongly assume that there are no accomplices.

If there actually were an accomplice in the case, we might confirm who the murderer was (i.e., “smoking gun” evidence) without ruling out the support of accomplices. This speaks to Zaks’ complaint about a rival hypothesis not necessarily being considered less plausible if you haven’t found evidence against it. It rather depends on whether we’re talking about a rival murderer or an accomplice. That is, are we talking about the way in which complementary factors combine, or about factors which are truly mutually exclusive?
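The rival-versus-accomplice distinction can be sketched numerically. The following is a hedged illustration, not anyone's published method: the suspects "A" and "B", the priors, and the likelihoods are all made up, and the helper `posterior` is a hypothetical name. The trick is to write the accomplice scenario as its own hypothesis so that the set really is mutually exclusive.

```python
def posterior(priors: dict, likelihoods: dict) -> dict:
    """Normalise prior x likelihood over a set of mutually exclusive hypotheses."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    return {h: value / total for h, value in unnormalised.items()}


# Three hypotheses that cannot be true at the same time (assumed for illustration).
priors = {
    "A acted alone": 0.4,
    "B acted alone": 0.4,              # the rival murderer
    "A acted with accomplice B": 0.2,  # the complementary explanation
}

# Evidence: video of A committing the murder. It is very likely under either
# hypothesis in which A is the murderer, and very unlikely if B acted alone.
likelihoods = {
    "A acted alone": 0.9,
    "B acted alone": 0.01,
    "A acted with accomplice B": 0.9,
}

post = posterior(priors, likelihoods)
for hypothesis, probability in post.items():
    print(f"{hypothesis}: {probability:.3f}")
```

With these assumed numbers, the video all but eliminates the rival murderer, yet the relative weight of "acted alone" versus "acted with an accomplice" does not move at all, because the evidence is equally likely under both. Evidence that confirms a claim only rules out what it actually discriminates against.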

This is where thinking through process tracing evidence tests is helpful. Fairfield and Charman’s faith in Minister Thomas Bayes is so unshakeable that they, unfortunately, view tests as unnecessary. This, I think, is an error.

One of Zaks’ critiques of Fairfield and Charman is regarding the order in which we learn and report new information. If you carry out evidence tests, you tend to carry out “hoop tests” first because you want to know first what might disconfirm (or rule out) your claim before searching for evidence that would help confirm or add confidence to the claim.

To take the murder example again, you consider whether it was plausible that the suspect was the murderer first (hoop test), as you may have various suspects and various potential accomplices (i.e., rival and complementary hypotheses, if you like). You do this to know whether you can reasonably rule a suspect out. You might want to know whether they were in the right place, whether they had a credible alibi, etc. before expending further effort to link them specifically to the crime (smoking gun test). A murder is itself a useful but imperfect analogy. It’s really good for explaining how “probative value” (i.e., evidentiary force) works, but our mental model of the lone murderer implies full attribution, so for complex policy change processes, for example, the murder analogy is slightly misleading.

We should also remember that the adequacy of these evidence tests relies on how well calibrated they are. And this calibration is only as good as your critical thinking, reasoning, and judgement. Whether something is a good “hoop test” or not, for example, tends to require some discussion and a good understanding of intervention context. One might dishonestly proclaim a “hoop test” passed, when it was never really evidence that might have disconfirmed part of your hypothesis in the first place.
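One way to think about the calibration of the four evidence tests is in terms of two probabilities: how likely you are to find the evidence if the claim is true, and how likely you are to find it if the claim is false (the “type 1 error” mentioned earlier). The sketch below is an illustration under assumptions of my own: the function name `classify_test`, the term `sensitivity`, and the 0.9 and 0.1 thresholds are hypothetical choices, not standard cut-offs.

```python
def classify_test(sensitivity: float, type_1_error: float,
                  high: float = 0.9, low: float = 0.1) -> str:
    """Label a piece of expected evidence with one of the four test metaphors.

    sensitivity   -- P(evidence found | claim true)
    type_1_error  -- P(evidence found | claim false)
    high, low     -- assumed thresholds; in practice these are judgement calls
    """
    certain = sensitivity >= high    # failing to find it would undermine the claim
    unique = type_1_error <= low     # finding it is hard to explain any other way
    if certain and unique:
        return "doubly decisive"
    if certain:
        return "hoop"
    if unique:
        return "smoking gun"
    return "straw-in-the-wind"


# Illustrative murder-case evidence with made-up probabilities.
print(classify_test(0.95, 0.60))  # suspect was in town that night
print(classify_test(0.40, 0.05))  # suspect's fingerprints on the weapon
print(classify_test(0.95, 0.01))  # clear video of the suspect in the act
print(classify_test(0.50, 0.40))  # suspect once argued with the victim
```

The labels are only as good as the two probabilities you assign, which is exactly where the discussion and contextual understanding come in. A test labelled a “hoop test” with an inflated first number is no real test at all.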

Can process tracing really trace causal mechanisms?

Mechanisms are a thorny subject in the social sciences, and no one really has the perfect answer to what mechanisms are, let alone whether we’ve found them or adequately explained them. Process tracing certainly has a good case to argue that it can trace them, but it’s not without its challenges.

Colin Hay rightly notes that there is a risk in process tracing of misidentifying causal mechanisms. This is by no means unique to process tracing, but Hay rightly points out that to trace processes you must first have identified them. There’s no escaping this. In evaluation, where you tend to start by considering whether an intervention may have contributed to an outcome, you run the very real possibility that it made only a trivial contribution or even no contribution at all. I know of several occasions where those who commissioned the evaluation clearly miscalculated.

Even if, ostensibly, the intervention appears to have made a difference, the evidence you may find may not necessarily adequately explain the process. Process tracing offers no guarantees. You can either add confidence to or undermine a particular hypothesis.

One strength of realist evaluation, in contrast, is in its efforts to identify an agent’s “reasoning” and thus potentially reveal people’s motivational impulses. As process tracing generally takes a positivist stance, it’s quite common for evaluators to focus on actors and activities (i.e., events) but they should also seek to enquire into why decisions within these events are taken. I talk a bit about this in a webinar on combining process tracing with realist evaluation and built this into contribution rubrics for this reason.

Can process tracing reduce confirmation bias?

As with any theory-based method, process tracing confronts small-n biases, including confirmation bias. Sometimes biases amount to outright lying. As Fairfield argues in her uncharitable critique of Veil of Ignorance Process Tracing,

‘Dishonest scholars can always find ways to be dishonest whatever constraints are imposed by the discipline.’

One way in which this can manifest itself is in cherry picking evidence to suit one’s “pet theories.” Fairfield claims that Bayesian reasoning provides safeguards against confirmation bias, though she mostly refers to the benefits of comparing rival theories rather than Bayesian reasoning per se. Fairfield and Charman advise against elaborating hypotheses and identifying expected evidence on your hypotheses and rival hypotheses before collecting data (as you do in contribution tracing) because they suggest that this makes it more likely to ‘seek out the sorts of evidence that will support our pet theory.’

While it’s likely you’ll identify evidence you’d like to find beforehand, this doesn’t mean you’ll actually find it. Again, the problem here is researcher dishonesty, not methodological constraints. In process tracing, you don’t just seek evidence to support your pet theory, but also evidence that may cast doubt on your theory. This is what “hoop tests” are for — to potentially disconfirm parts of your theory, or all of it, if you don’t find evidence you would reasonably expect to find. Passing “hoop tests” alone, however, doesn’t mean you confirm your theory. To help confirm your theory (or increase confidence in it), you need evidence which is more uniquely tied to your proposed explanation.

So, while Bayesian reasoning can help, it’s really the comparison of rival theories, supported by structured and adequately calibrated evidence tests that can confirm or disconfirm your theory, that helps to mitigate confirmation bias in practice.

To recap, all methods have their weaknesses. Process tracing is no exception, nor indeed are Bayesian forms of process tracing. In my view, the more open we are about discussing potential weaknesses, the easier it will be to find ways to improve. So, I hope this has provided some food for thought for advocates and critics alike. If you want to know more, you can turn up for our training on process tracing or hear me introduce contribution rubrics at the EES next month.

I'm an independent consultant specialising in theory-based and participatory evaluation methods.