Artemisia Gentileschi, Judith Slaying Holofernes

In February, Derek Thorne wrote an article on how we define accountability, cautioning us to be careful about what we call accountability. This triggered a Twitter exchange in which Nathaniel Heller argued “we should abandon the label ‘accountability’ entirely,” as “it’s misleading, empty, and of little use to practitioners.” Nathaniel offered “punishment” and “answerability” as alternatives, while Tiago Peixoto offered “sanction” and “responsiveness.” Jonathan Fox harked back to Andreas Schedler’s definition of “answerability” and “enforcement (application of sanctions).” For my part, I suggested that, in practice, most accountability work is probably about “responsiveness” rather than “answerability” or “sanctions.”

Fox argued that collaborative approaches to social accountability leave out enforcement and noted there are trade-offs in doing so. He suggested that, “in practice, for several large INGOs and aid agencies, seeking enforcement isn’t even on the menu.”

Days before Derek’s post, USAID’s David Jacobstein had written a blog post asking “what is the work” we actually do in social accountability. For me, an important follow-up question is: what are we actually assessing in research and evaluation?

MIT GovLab also recently published an evidence synthesis on information and accountability interventions. As part of their synthesis, which included 30 studies of citizen-driven accountability, they proposed the following causal mechanisms:

You should notice that (negative) “sanction” is a common feature. However, the authors note that evidence on causal mechanisms from the reviewed studies is limited and, in their view, we still have an inadequate understanding of which levers on the chain from information to accountability are most effective at improving governance outcomes (Is it even necessary to start with information?). The authors stressed that:

‘Many studies of citizen accountability focus on whether information provision increases citizen monitoring, but few look at what happens as a result of the monitoring — that is, whether citizens also take steps to sanction poor performers or whether higher levels of government sanction as a result of the citizen monitoring.’

The causal assumption in the diagram is that the effective implementation of sanctions is necessary to improve government performance. Yet although social or formal sanctions were mentioned a full 86 times in the review, the authors found little evidence that the threat of sanctions or their effective implementation made a difference. This is not because sanctions were shown to fail, but because the evidence is largely absent.

Only a handful of the mentions were actually linked to empirical data. I was perplexed. If sanctions are key in our hegemonic “carrots and sticks” paradigm of accountability, then why do we appear to have so little evidence of either the presence or effectiveness of sanctions in social accountability work?

Back in November last year, I co-facilitated a side-meeting at the Global Partnership for Social Accountability (GPSA) where many of the 40 practitioners, researchers, and evaluators present did not appear to believe sanctions mattered as much as the research literature suggests, but rather believed what mattered most was building relationships, harnessing bureaucrats’ incentives and having a clearer focus on sector delivery chains (see Guerzovich 2020; Jacobstein 2020). So, I’ve been reflecting on my own assumptions recently.

With this in mind, I took a look at the eleven most commonly cited syntheses, meta-reviews, and systematic reviews since the seminal World Development Report in 2004 from which social accountability emerged (Rocha Menocal and Sharma, 2008; Gaventa and Barrett, 2010; McNeil and Malena, 2010; Hanna et al. 2011; McGee and Gaventa, 2011; Fox, 2014; e-Pact, 2016; Molina et al. 2017; Waddington et al. 2019; Tsai et al. 2019; Kosec and Wantchekon, 2020). I got a colleague, Grazielli Faria Zimmer Santos, to do the same exercise to hold my biases in check. These reviews searched thousands of studies and included hundreds. What we found may surprise you.

Approximating MIT’s semantic field and the views of those in the aforementioned Twitter exchange, we looked for derivations of the words: sanctions, punishment, enforcement, answerability, litigation, responsiveness, and transparency.
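For readers who want to try a similar search on their own documents, here is a minimal sketch of one way to count derivations of these words. It is not the procedure we actually followed: the stem patterns, the `count_terms` function, and the sample sentence are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical stem patterns approximating the search terms listed above.
# Each pattern matches the stem plus any word characters that follow it.
STEMS = {
    "sanctions": r"sanction\w*",
    "punishment": r"punish\w*",
    "enforcement": r"enforc\w*",
    "answerability": r"answerab\w*",
    "litigation": r"litigat\w*",
    "responsiveness": r"responsive\w*",
    "transparency": r"transparen\w*",
}


def count_terms(text):
    """Count case-insensitive matches of each stem pattern in the text."""
    counts = Counter()
    for label, pattern in STEMS.items():
        matches = re.findall(rf"\b{pattern}\b", text, flags=re.IGNORECASE)
        counts[label] = len(matches)
    return counts


# Made-up sample sentence, not taken from any of the reviews.
sample = (
    "Citizens may sanction officials. Sanctions and enforcement are rare; "
    "transparency and responsive services are not."
)
counts = count_terms(sample)
for label, n in counts.items():
    print(f"{label}: {n}")
```

A count like this only tells you how often a word appears, not whether a mention is tied to empirical data. That still requires reading each reference in context.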

We first considered the frequency with which the words appeared in these reviews. This is a pretty shallow test. Yet, what comes out clearly is that these are predominantly reviews looking at transparency and responsiveness, not answerability or enforcement. Sanctions and punishment, however, do appear quite prominently:

Maybe large INGOs and aid agencies are working on sanctions after all?

We also see that the emphasis across studies varies significantly. There were 86 mentions of “sanctions” in Tsai et al. (2019), but apart from McNeil and Malena’s World Bank study (17 mentions), the PITA systematic review (15 mentions) and Fox (14 mentions), the other studies barely mention sanctions at all. “Teeth” was mentioned almost exclusively by Fox (23 mentions), and nearly all references to “punishment” come from Hanna et al’s study (62 mentions). The word “sanction” doesn’t even appear in DFID’s macro-evaluation Qualitative Comparative Analysis (QCA) of 50 social accountability initiatives, nor do the words “punishment,” “enforcement,” “litigation” or “teeth.” So, perhaps DFID projects don’t have enforcement on the menu?

Similar to Tsai et al’s contention, Anderson, Fox, and Gaventa (2020) recently argued that ‘plausibly, the existence of vertical links to higher level bodies that issues could be escalated to, might provide a degree of pressure or threat of sanction that motivates lower level solutions.’ The authors did not find much evidence to support this hypothesis in their review, despite Fox’s suggestion to me that DFID were actually trying harder than others. So, what evidence might there be beyond DFID programming?

We suspected that, like MIT’s study, mention of sanctions in other reviews might also be chiefly descriptive or conceptual. So, we then looked at whether these references made a clear link to empirical data. Across the reviews, it’s clear that most references aren’t empirical. We found just 21 empirical references to “sanctions” (fewer than 2 per study), 36 references to “punishment” (all in Hanna et al’s anti-corruption systematic review), only 9 references to “enforcement,” 2 references to “litigation,” and 2 references to “teeth.” It’s also worth mentioning that a high proportion of these empirical references are not necessarily social accountability initiatives, as they often lack a citizen-driven component (especially in Hanna et al’s study). There were, however, dozens of studies which assessed transparency and responsiveness. The number of empirical references is below:

One reason “answerability” is missing may be that it’s deemed too unimportant to evaluate. The interest in “responsiveness,” by contrast, may be because it’s often the clearest indication that duty bearers are doing something, and (probably) because responsiveness is still too big a concept.

Given the difficulty of pinning down sanctions, Fox (2014) previously urged us to expand the definition of sanctions to the concept of “teeth.” Expanding the range of government action beyond sanctions is important, especially given the ostensible dearth of evidence on sanctions and their effectiveness. Fox defines “teeth” variously as the ‘capacity to respond to voice,’ the ‘capacity to sanction’ and as ‘clout’ more generally. ‘Clout’ encompasses institutional responses, following citizen recommendations, investigating and verifying complaints and grievances, deploying preventive measures, and shaping incentive structures to discourage abusive or wasteful behaviour. “Clout” thus appears to include an enormous potential range of government action, yet this also makes it a difficult concept to evaluate.

A number of potential reasons stand out for why “teeth” features so little in recent reviews. Many readers wrongly equate “teeth” with sanctions alone. So, its relative absence likely has much to do with what we know (or don’t know) about social or formal sanctions.

First, it’s been argued that many organisations are risk averse and thus avoid working on sanctions. This is a common criticism of some aid agencies and INGOs, one which my own experience suggests may well be somewhat exaggerated (but not unfounded).

Second, limited evidence on sanctions partly suggests that few efforts have been effective at building ‘capacity to sanction.’ This may be because there are few organisations prepared to take the perceived risk. It may be that organisations have tried and failed, or it may be that organisations don’t believe efforts to “name and shame” or to trigger formal punishments will be effective in their context.

Third, ‘capacity to respond to voice’ likely has a significant overlap with responsiveness. So, some references may well be hidden. We frequently see extremely vague (and opaque) references to the influence of citizen “pressure.” There are a lot of assumptions here which are rarely unpacked, but perhaps we will find evidence for the effectiveness of sanctions there.

Fourth, threats of sanction are often themselves hidden because they are implicit. So, even if they are present, they may sometimes seem absent. This last one is something of a get-out-of-jail-free card. Are threats of social sanctions impossible to evaluate? No, I don’t think so. However, all of the above makes investigating sanctions a difficult task.

For me, the question here is not whether “teeth” matter, but rather which teeth matter. Does ‘capacity to respond to voice’ (i.e. capacity to deliver, the molars) matter more than ‘capacity to sanction’ (the canines)? And under what circumstances might different teeth matter?

One key step to understanding this better is to take a closer look at the evidence on sanctions. We’ve identified a few dozen potential cases, but we want to make sure we haven’t missed anything important. So, please share your best cases with us.

I'm an independent consultant specialising in theory-based and participatory evaluation methods.