“Real” process tracing: part 4 — sampling
Realist Evaluation (RE) and Realist Synthesis (RS) have an awkward relationship with the concept of sampling because there is a tension at the heart of realist research between quantitative and qualitative methods and worldviews. Pawson and Tilley (1997) advocate a “pragmatic approach,” preferring to combine quantitative and qualitative methods. Others acknowledge that combinations are possible, but resist the very concept of sampling, because its most common expression emerges from a positivist epistemology with a frequentist and probabilistic logic (Maxwell, 2012; Emmel, 2013). This form of sampling is thus inconsistent with realist epistemology and the generative logic which underpins it. As Gary Goertz and James Mahoney (2012) explain, while of equal value, these are two distinct cultures.
As realists, we’re supposed to reject the premise of a “random” sample, focused on a particular variable and searching for regularities and averages. Instead, our sampling choices are inherently purposeful. Sampling is driven by the questions we want to answer and the ideas (or theories) about the social world we seek to investigate, anchored to a particular context (Emmel, 2013). So, you should select participants, activities, or incidents on the basis of their relevance to the theory you’re developing, refining, or testing (Maxwell, 2012).
Joseph Maxwell (2012) reminds us that defaulting to probability sampling strategies entails a bias towards uniformity. The notion of probability hides the real diversity that exists in a population and in how its members respond to resources and opportunities. Such a bias can lead to simplistic and misleading generalisations.
Meaning, not measurement
Cases and units are included because they display certain features, not in spite of those features; and their inclusion is intended to help develop and test particular explanations. So they are included to help “solve the puzzle” (Emmel, 2013) rather than because of what they’re supposed to (numerically) represent. This is therefore more about meaning than it is about measurement. As Sayer (2000: 17) put it:
‘Meaning has to be understood, it cannot be measured or counted.’
Perhaps unsurprisingly then, RE also dodges questions about sample size. The number of cases or units included is considered less important than how the insights into events and experiences are used for interpretation, explanation, and as support (or not) for the claims we make (Emmel, 2013). As RE is an iterative (and “retroductive”) process, there’s always a degree of emergence and discovery.
RE thus inexorably encourages us to adopt the qualitative frame, which suggests that the purpose of a sample is to reveal new information or insights. Simply increasing the number of interviews on its own, for instance, does nothing to develop new insight. In essence, rather than qualitative data merely complementing (or filling the gaps of) quantitative data, RE turns this framing on its head.
More widely, there’s increasingly a consensus that the saturation point (the point at which further data collection yields little new information) is probably reached after as few as 12 interviews (Guest et al. 2006) or 6 focus group discussions (Guest et al. 2016). For those of you who hadn’t heard this before, it should give you pause. As you can see below, there’s a significant drop-off in new information:
Realist Syntheses (RS), or reviews, also employ purposive sampling to help answer specific questions or test particular theories. The search is designed to retrieve studies fit for purpose in identifying, refining, or testing programme theories. We stop looking when ‘sufficient evidence has been assembled to satisfy the theoretical need or answer the question’ (Pawson et al. 2004: 20). “Sufficient” is judged by whether new studies add anything new to our understanding (i.e. saturation). Unlike systematic reviews, realist reviews are iterative (with snowballing and complementary searches), so knowing when to stop is said to be as much an art as it is a science. But this webinar does at least help figure it out.
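For anyone who likes to see a stopping rule written down, here is a minimal Python sketch of the saturation logic described above. It is only an illustration under stated assumptions: the function names, the example codes, and the rule of stopping after two consecutive sources that add nothing new are all hypothetical, and none of them come from Pawson et al. or Guest et al. In practice the judgement is interpretive rather than mechanical, which is exactly why it is described as an art as much as a science.

```python
def new_codes_per_source(coded_sources):
    """Count how many previously unseen codes each source contributes, in order."""
    seen = set()
    counts = []
    for codes in coded_sources:
        fresh = set(codes) - seen   # codes not met in any earlier source
        counts.append(len(fresh))
        seen |= fresh
    return counts


def stopping_point(coded_sources, window=2):
    """Return the 1-based position at which `window` consecutive sources
    have added no new codes, or None if that never happens.

    The `window` threshold is an assumption made for illustration only.
    """
    run = 0
    for position, fresh in enumerate(new_codes_per_source(coded_sources), start=1):
        run = run + 1 if fresh == 0 else 0
        if run >= window:
            return position
    return None


# Hypothetical codes assigned to six studies (or interviews), in review order.
studies = [
    {"trust", "incentives"},
    {"trust", "peer pressure"},
    {"incentives", "local leadership"},
    {"trust", "peer pressure"},
    {"local leadership"},
    {"incentives", "trust"},
]

print(new_codes_per_source(studies))
print(stopping_point(studies))
```

The count of new codes is a crude proxy. A source that repeats familiar codes can still sharpen a theory, so a rule like this is better treated as a prompt to ask whether anything new is being learned than as a cut-off.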
We’re then advised to appraise the quality of studies on three criteria: their relevance to the theory under study, the rigour of the evidence they offer on the points that are relevant, and their richness (how much they tell us about context and mechanisms). While “richness” is a new feature (I only heard about it the other day), it’s worth exploring because we shouldn’t just be looking at interventions and outcomes, but rather exploring the inner workings of how context, intervention, mechanisms, and outcomes are connected.
Process tracing sampling
So, how does Process Tracing (PT) square with this?
As I explained in a previous blog, PT, like RE, subscribes to a generative model of causality and can be consistent with critical realist epistemology. It’s also a case-based rather than variable-based method (see Goertz and Mahoney, 2012; Wadeson, Monzani and Aston, 2020).
Firstly, as PT is a case-based method, very often N=1, so what matters most is within-case explanation. While PT obviously requires triangulation, simply increasing the sample size of respondents is worthless unless it provides new information and insights about the case under study. New evidence is only as useful as its “probative value.” This can be understood as the inferential power of evidence for the theory you’re developing, refining, or testing.
For example, if you were to interview a bystander who attended an event by accident and didn’t know what it was about, it’s likely that their perspective would be of limited or no explanatory value (low probative value). I’ve been that bystander at various marches when I lived in La Paz (Bolivia). My testimony then would likely be what’s called “straw-in-the-wind” evidence in PT language, because it does nothing much for someone’s theory about social movement power. If, as happened on at least one occasion, I didn’t even know whether those marching were for or against a particular reform (as often you had both sides marching on the same day), then adding me to a sample would have been pretty worthless. Adding to your sample of respondents therefore only makes a difference when they can potentially shed light on the context-intervention-mechanism-outcome (CIMO) connections relevant to your theory.
PT doesn’t really do synthesis (it’s a single-case method), but if you do want to say something beyond a single case, then you’re supposed to map a population of cases relevant for tracing mechanisms (Beach, Pedersen and Siewert, 2019). Similar to RE then, we’re guided by the ideas (or theories) about the social world we seek to investigate. We’re choosing both cases and units based on their potential to reveal the inner workings of those mechanisms. What were the key events? How were they linked together? Who did what, when, why?
Where possible, we should include positive and negative cases on both potential causes and outcome. However, only positive cases are considered relevant for in-depth tracing of mechanisms (Beach, Pedersen and Siewert, 2019). Of course, this causes randomistas consternation because they want to know: what is the counterfactual? Yet we shouldn’t necessarily expect that the mechanisms linking causes and an outcome will be the same as those that link other causes with the absence of the outcome. As Beach and Pedersen (2013: 306) explain:
‘For example, if we are studying a theory that claims that strong interest-group demands for more spending on an issue area (Cause) results in disproportionate public spending in the issue area (Outcome) through a mechanism involving the activities of lobbyists using campaign contributions, the claim is asymmetric because we would not be making any claims about what happens when there are not strong interest-group demands — nor are we making any claims about other causes of proportionate public spending. We are therefore only focusing on positive cases of both the cause (or set of causes) and the outcome.’
Some will, of course, contest the point here about asymmetry, but it does reveal some potential shortcomings in the types of comparisons we take for granted, and how reasonable (or not) these may be. As Robert Yin (1994) reminds us, we choose case-based methods because we ‘deliberately’ want to ‘cover contextual conditions’ because they’re likely to be ‘highly pertinent’ to the phenomenon we’re studying.
In the next blog, I will discuss perspectives on evidence.