The comeback of the case study?

Jan 11, 2023

Bent Flyvbjerg’s article on misunderstandings about case study research reached 20,000 citations on Google Scholar last week, and it made me wonder if we’d reached a watershed moment.

According to Google Scholar data, Robert Yin’s Case Study Research is the sixth most cited article or book in any field of all time, having been cited over 220,000 times, and Robert Stake’s The Art of Case Study Research has been cited more than 51,000 times. Alexander George and Andrew Bennett further point out that roughly half of all articles in the top political science journals in the early 2000s used case studies. Perhaps, then, we’re not seeing the comeback of the case study, but the quiet, inexorable rise in its scholarly acceptability.

However, Jennifer Widner and Michael Woolcock’s new open access book The Case for Case Studies notes that skepticism lingers in certain quarters (those quarters are, usually, called economists). Widner and Woolcock’s book does a good job of countering this skepticism, particularly the chapters on understanding causes in single cases, the nature of transferability between contexts, and delimiting generalisations.

All of this made me think perhaps it was time for some personal reflection on an approach I, and many others, seem to take for granted. In homage to Flyvbjerg’s achievement, I figured I’d discuss five challenges I’ve had in developing case studies in recent years in research and, particularly, in evaluation.

Five big challenges of case study research

Resources, time… and politics

In my view, the first and most important challenge when developing case studies is securing adequate time and resources to dig deep enough into the case. For Robert Stake, case studies need to be studied at length. But evaluators, in particular, are often asked to produce case studies in a very short period of time. Rarely is enough time allocated to achieve sufficient depth. Budgets are usually quite tight, and that means relatively few days can be dedicated to getting a handle on relevant contextual and causal factors in a case. As Widner, Woolcock, and Ortega Nieto’s chapter notes, there are often constraints of resources, politics, and time. So, in addition to resource and time constraints, organisational politics shouldn’t be underestimated, as politics shapes the criteria for case selection, what evidence can be collected, and what is (and isn’t) published. These practical issues ultimately eclipse any methodological challenges.

The art of casing

The second big challenge is what Charles Ragin and Howard Becker call “casing”: how we choose cases to help us link the theoretical and empirical. Casing isn’t easy, especially with multiple cases. If you want to learn more, Ragin and Becker’s edited book What is a Case? discusses some of the main issues arising from the multiple definitions and uses of the term.

Before we can case, as a verb, we need to know what a case is as a noun. John Gerring reminds us that case studies are notoriously difficult to define. Case study scholars don’t fully agree on what a case actually is. However, drawing on Stake, Rohlfing, and Beach and Pedersen, I define case study research/evaluation as an empirical analysis of temporally and spatially bounded phenomena (or systems) that are instances of a group of similar phenomena.

The temporally and spatially bounded phenomenon (or phenomena) we choose is the main unit (or units) of analysis. When I say that these are instances of a similar group, I’m adopting Ingo Rohlfing’s argument that comparing case studies requires at least some degree of causal homogeneity. This is a key part of what makes a case relevant to answer your research/evaluation question(s). Not all case studies are about a particular outcome (or outcomes), but it could well be argued that “causal” case studies in evaluation are. For this reason, I also take Beach and Pedersen’s (generative) emphasis on a causal process which plays out and links a cause (or set of causes) with an outcome.

When we say single unit, we’re somewhat deceiving ourselves because we’re typically looking at several units and sub-units of analysis, but we’re supposed to have a primary unit of analysis in mind (e.g., a specific programme intervention and location with an anticipated process towards an outcome over a concrete period of time). Stake and George and Bennett remind us that case study research isn’t sampling research; a case isn’t necessarily supposed to be representative of all potential cases. And we often get stuck in searching for representativeness with a capital R.

There’s often a preference for supposedly representative cases (a hangover from quantitative, variable-based approaches to social science), yet true case representativeness is rather a nonsense because no two cases are truly identical. I myself am an identical twin, so I know from experience how small differences in context and treatment can make a big difference. More broadly, countries or sub-national units may be similar but they’re also unique. Also, programmes (or projects) aren’t designed or implemented in exactly the same way in different contexts, and those contexts obviously aren’t truly identical either. So, drawing these boundaries is tricky.

Partly for these reasons, we’re often reminded that it’s difficult to generalise from case studies, and when we do so, we’re told that these need to be bounded with contingencies clearly outlined. This is sound advice. However, Flyvbjerg’s paper shows that it is possible to generalise, to some degree, due to the strategic selection of cases. For Flyvbjerg, typical or average cases don’t tend to have the richest information. Atypical or extreme cases often have much richer information which can help to reveal mechanisms of change.

Flyvbjerg’s table below is a useful illustration of relevant case types and what they can help you to achieve:

Critical cases (or “black swans”) can also be really valuable, at least in theory. If, for example, a proposition can be shown to be false in the most favourable case, then it would most likely be false for intermediate cases too. The problem, of course, is that finding these “critical” cases isn’t easy, and in evaluation, in particular, you’d rarely have the time or resources to identify the perfect case. Which is the case that would (in)validate all others? Is there such a case? Is there really this level of homogeneity?

In complex international development programmes, it’s possible that you could identify a highly promising (and hopefully best) case which still failed to trigger a mechanism, demonstrating that the intervention was a failure for that anticipated outcome, and thus that it probably wouldn’t work elsewhere. Probably is doing some heavy lifting there unless you’ve really found the critical case.

For example, colleagues and I evaluated Village Savings and Loans Associations (VSLAs) in Côte d’Ivoire as part of a wider Process Tracing study for CARE International (my previous employer), where the project team hoped that VSLAs had enabled women’s collective action to advocate for their needs in cocoa-growing communities. Côte d’Ivoire was one of CARE’s star cases but not the star case (Niger was the star). When we looked at the process, we didn’t find clear evidence of collective action. If we’d had more time and resources, might we have found some more evidence? Maybe. But, as the evaluation was implemented with the project team itself, it’s reasonable to believe that we were guided to the best sites where they thought they would find this evidence. Thus, it was perhaps an accidental critical case. The claim that case studies are biased towards verification is one of the misunderstandings Flyvbjerg addresses in his paper. The above example shows that verification is by no means guaranteed. Yet, had the project team believed that we wouldn’t find much evidence, wider organisational politics might very well have gotten in the way.

Does this mean that VSLAs don’t or can’t enable collective action? No. There’s plenty of evidence of this in Niger. But our brief study should temper CARE’s claims of large-scale impact from VSLAs as an organic means to trigger collective action, as well as claims about the external validity of the model in places where both contextual conditions and interventions are, in practice, less similar than one might think. It also suggests that CARE’s work in Niger is an outlier case rather than a potentially representative case, and thus less scalable than CARE might want to admit.

Appropriate methods

The third challenge is finding appropriate methods and tools within the case study. The case study isn’t really a methodology, but rather an approach within which methodologies feature. Case studies aren’t alone in this, at least not in evaluation. Contribution Analysis is generally considered an approach within which we choose methods. Realist Evaluation is also considered an approach rather than a method. In this sense, the case study provides more of a framework than a concrete set of methods or tools to design, collect, or analyse evidence. Indeed, as colleagues and I suggested in a paper in 2021, various complexity-aware evaluation methods such as Outcome Harvesting and Process Tracing can also be placed under the umbrella of case-based evaluation. So, there are plenty of complementarities.

Yin tells us that due to its richness, a case study cannot rely on a single data collection method, and thus we should deploy a combination of methods that yield the most fruitful insights in response to the questions we have. This is really a good practice recommendation that applies well beyond case studies. In reality, hardly any good research or evaluation relies on a single data collection or analysis method. Beyond data collection, Marina Apgar and I have also argued that bricolage can help you to choose the best bits of case-based evaluation methods to enhance rigour.

The right fit evidence

The fourth challenge I wanted to discuss is finding sufficient evidence with high enough “probative value” on causal mechanisms. When Yin appeals to methods to gain the most fruitful insights, this is only half the battle. Assuming you can navigate time, power, and resource constraints, you’ve chosen the right case(s), and have the right data collection tools, you then need to actually gather that evidence. But, you can’t guarantee you’ll find everything you’re looking for from either primary or secondary data sources.

One common critique of case studies is that they make inferences on few observations — the classic King, Keohane, and Verba (KKV) type misunderstanding which confuses case and observation. While the number of case studies is typically small, they tend to have a large number of causal process observations within cases.

As a side note, I also don’t think we need to spend much time addressing Campbell and Stanley’s hyperbolic critique related to the absence of control, especially given that Donald Campbell later revised his view, as Flyvbjerg discusses.

Notwithstanding, the brute number of observations is actually a poor basis on which to make inferences in case-based approaches anyway. Beach and Pedersen discuss these issues at length, so I won’t go over them here. I’ve argued previously that the “probative value” of that evidence is what matters. Simply having a large quantity of poor quality evidence on causal mechanisms doesn’t much help. Yet, a relatively small amount of good evidence (i.e., high relevance and high uniqueness/distinctiveness) does. The same applies to the number of cases; a large number of cases with poor evidence is obviously worse than a small number of cases with good evidence.

The problem is that the evidence I’m searching for isn’t always available or accessible. There are often sources I would like to consult but which I can’t access due to language or administrative issues. There are people I’d like to interview who could shed light on a key part of the process and the connection between interventions and outcomes, but who aren’t available or don’t have the time to talk. This means that you have to rely on a greater quantity of evidence with lower probative value. I’ve yet to find a way around this. I just dig harder in the hope of finding smoking guns in the straw. Otherwise, I guess you just write a longer list of limitations.

Interaction effects

The final challenge I thought worth discussing is interaction effects between context and the intervention (or key factors) you’re studying. In theory, this is one of the great strengths of case study research/evaluation. As Yin reminds us, including context is a major part of a case study. We choose case studies when we want to cover contextual conditions and when a phenomenon (e.g., a project or programme which contributes to an outcome) ‘is not readily distinguishable from its context.’

Both Yin and Stake note that the aim is to take account of complexities in the case and to explore the interactions between factors considered to be causal (e.g., psychological factors, social factors) and the outcome under study. Widner, Woolcock, and Ortega Nieto refer to a case study’s capacity to ‘elicit the strategic structure of an event — that is, to capture the interactions that produce an important outcome.’ I love that expression. They rightly point out that various outcomes are “conditioned” (or moderated) by factors like income levels or geography, whereas others may be “crafted”, where ‘the outcome is the product of bargaining, negotiating, deal-cutting, brinkmanship, and other types of interaction among a set of specified actors.’ Suffice to say, this isn’t easy.

To some degree, consistency analysis or Qualitative Comparative Analysis (QCA) could help to identify relevant conditions and can also test these (Beach and Pedersen include an interesting discussion). But rarely have I had the time to do QCA surreptitiously as part of an evaluation. I also have colleagues who wanted to do this recently, but didn’t really have the time or budget to do so. To a large degree, the same is true for Realist Evaluation, which is very time consuming, and Process Tracing, which is similarly onerous. As I’ve heard Bob Williams remark, we need much bigger evaluation budgets to do this kind of thing properly, and I agree. Otherwise, we need to be more humble about what can be claimed or proven in a short period of time with limited resources.

While some of these challenges may be daunting, discovering how and why change happens through case studies is one of the most interesting and rewarding things you can do in evaluation. And with Flyvbjerg’s achievement as yet another important milestone in this case, I hope you can see how valuable case studies can be in helping us to understand the world a little better.