Scale up in time: How we evidence process and context

Feb 2, 2022

Written with Florencia Guerzovich

This is the last post of a 5-part series about scale in social accountability, and more broadly the transparency, participation, and accountability (TPA) sector. We started by discussing what scale looks like in the real world. We explained how, in our view, there is no single pathway to scale, or quoting Byrne (2013: 1) “different mechanisms [or paths can] produce different outcomes and… [the] same outcome… the search for the — that is to say universal, always and everywhere, nomothetic — model that fits the data” is in vain. Ultimately, no single model fits the data. We then presented multiple paths through which scale up may happen (resonance, resistance, and best practice). Lastly, we focused on a handful of contextual factors that may help practitioners to understand under which conditions each path might be a better bet.

In this post, we argue that politics in the transparency, accountability, and participation field happens in time. There is more to politics and change in time than black-and-white debates about short- and long-term support. Attending to different aspects of temporal context (time horizons, sequences, feedback loops, gradualism vs. shock therapies, etc.) provides one concrete way to grapple with how agents contribute to scale through different pathways in the complex, uncertain systems that make up a world of greys and mixed results.

As we’ve discussed previously, and as outlined in a recent paper on Monitoring and Evaluation for Thinking and Working Politically, traditional approaches and methods are poorly suited to the challenge of evidencing pathways to scale. For those interested in digging deeper, our new paper will discuss methodological issues at greater length (when published). We have also written a range of posts on more detailed methodological issues, portfolio MERL, and the political economy of evidence in the TPA field, which address issues that we don’t cover in full here (you can find them in our Medium accounts @thomasaston and @florcig).

It’s not all or nothing

Scale in social accountability and other participatory work is a fuzzy term, and practitioners have different expectations about what it looks like in practice. In response to our posts, Giulio Quaggiotto from the United Nations Development Programme (UNDP) linked the discussion to Gordon Tulloch’s problematization of the concept, and Derek Thorne of Integrity Action suggested first asking “what are we scaling? (can be more than one thing)”, while Varja Lipovšek from Co-Impact reminded us that different problems need different “scale” and that some may not want to get there. For those who do, scale is hard to achieve and also hard to research. Scale has been called the Achilles heel of social accountability and various other participatory approaches. This is an issue practitioners grapple with often, amidst expert contests over the “best” recipe for fighting this uphill battle.

Social accountability processes often contribute to outcomes at the frontline, but interventions are often complex, as indeed are the contexts in which they are implemented. The quality of implementation also matters too much to be able to simply replicate them at will and expect the same results across many cases. So, while it is potentially easy to see best practice replication because it’s clear and visible, whether it actually happens is another matter entirely.

Resistance can be difficult to evidence in terms of decision-making processes (i.e., how pressure forced someone to change their mind). For instance, a common mechanism of change is “naming and shaming.” Civil society organisations commonly assert that they have forced the government to change its mind. Sometimes they have. Yet, often, this connection is assumed rather than empirically demonstrated. It is difficult to see a change in attitude, position, and expression that happens behind closed doors, and it is usually not in decision-makers’ interest to admit that they were wrong and explain who changed their mind.

Even when a social movement campaign, for instance, obtains a policy win at scale, embedding an intervention type in a program with a budget, we cannot necessarily check the box of a transformational change. After all, many such changes aren’t necessarily implemented downstream.

Removing rose-tinted glasses, embracing complexity

Shift (intentionally) happens over time even if no one gets 100% of their way all the time. For example, the American Civil Rights and the South African anti-Apartheid movements are often cited to inspire expectations about abrupt, wholesale institutional change. Yet, in South Africa, “as in many countries, a fairy-tale climax has not been the end of the story”. Scholars of American Political Development have also shown that in complex systems, the new (civil rights) can coexist with the old (e.g., labour rights or the filibuster) and produce less than perfect change. Various scholars have explained how hard-fought legislative compromises and constitutional clauses invite stakeholders to continue contesting their interpretation through administrative rules, legal doctrine, and practices. Much of this action happens after attention moves to the next thing. It’s under the radar, until a new headline grabs our attention.

Indeed, the very premise of James C. Scott’s seminal book Weapons of the Weak: Everyday Forms of Peasant Resistance is often forgotten:

“Revolutions — are few and far between. The vast majority are crushed unceremoniously. When, more rarely, they do succeed, it is a melancholy fact that the consequences are seldom what the peasantry had in mind.”

Wins and investments aren’t necessarily self-sustaining over time; survival often means that they are actively adapted to changes in the complex systems in which they are embedded. Observers of US politics know that the sites that may have helped fill ambiguity with progressive views at one point in time might do the opposite later (or vice versa). They might be scaled back, or even cut by a new government administration. The same applies to accountability processes: these may exist on paper only and be implemented and/or enforced only further down the line, after a long period of time. Or they might be partially rather than fully routinized. When it comes to institutional development, of which scale up is one form, as Kathleen Thelen points out, “elements of stability and change are inextricably intertwined” in collective, imperfect constructions over time.

Paul Pierson’s (equally seminal) Politics in Time makes the case for taking temporal context seriously in our analysis of complex social and political processes and outcomes. When we strip events or processes from their temporal context, taking a “snapshot” view of complex social and political dynamics instead of paying attention to the “movie” and the patterns that emerge in temporal dimensions, we pay a price. Placing TPA and its politics back in time (e.g., timing, sequencing, time horizons, feedback loops) can greatly help us improve the theories and methods we use to explain them. When researchers removed the blinkers of “big pushes,” “shock therapy,” or “big bangs,” they found gradual development in the UK Parliament, the Brazilian universal health system, Indonesian governance, constitutional history in Argentina, the political regimes of mediaeval Genoa and Venice, and anti-corruption institutions (here and here), among others.

Yet, most research in the TPA sector and its conception of scale has essentially ignored notions of gradualism. Instead, the preference has been for “inspirational” cases of success. “Sampling bias” and limited external validity are sometimes acknowledged explicitly, but these caveats are quickly forgotten because we so desperately want the “few and far between” transformations to be true everywhere. This is, if you wish, the hope of something we might call “outlier scaling.”

As noted previously, it is only recently, when forced to look at less inspirational cases, that the “air of optimism” has diminished, and some of the very same scholars have now made an apparent volte-face through their research into (relatively) fragile and conflict-affected settings. Now, incremental, “transitory,” piecemeal, and “intermediary” changes have finally been deemed worthy of our attention.

As the long-ignored 30-year-old political science literature we cite above shows, this is not only true of the most fragile and conflict-affected settings. “Partial, uneven, and fragile progress” is common and “can be consequential for patterning human behaviour and for shaping political outcomes.”

What we call “best practice” and “resistance” pathways, however, both leave a substantial gap in understanding what many such incremental processes to scale look like in practice.

The challenge of evidencing resonance

As we discuss in our new paper, we believe that the resonance pathway has the potential to enable scale up processes in a significant number of contexts, even if there are far fewer documented cases of this pathway currently out there. There are many reasons for this.

First, it is difficult to find cases of a “resonance” pathway to scale because there are few incentives to show and tell about a seemingly imperfect result. So convinced are we of the hero narrative that unless it looks identical to what we proposed (our agenda in full or our copyrighted tool), we under-report imperfect change. Well done to CARE Malawi for telling their story, despite it not fitting their preferred narrative.

Secondly, and relatedly, as Jim Coe and Rhonda Schlangen point out, our ‘too-simplistic understanding of “contribution” is skewing our approach to evaluation.’ We want to claim that our contributions were decisive: that we were the “primary actor” or “lead contributor,” not a “team contributor” or “role player” in a complex system. This has created considerable blind spots for imperfect reform processes where “attribution” is a challenge. As Steve Powell explains using Alfred Barr’s famous diagram of the history of cubism and abstract art, a unidirectional arrow implies there is an active “giver” (e.g., the NGO, the social movement, the multi-stakeholder initiative) and a passive “receiver” (the government). Yet, it’s often not quite so simple.

Thirdly, and relatedly, it is difficult for practitioners with funding and reputational incentives to publicly share credit for particular reforms. For instance, back in 2015/16 Tom was involved in piloting an adaptation of CARE Egypt’s Third Party Monitoring Model in the Takaful and Karama cash transfer program in Giza and Assiut, known as “Accountability Groups” (already a radical adaptation of the model). This is another case that would sit squarely in the resonance box. The Ministry of Social Solidarity was enthusiastic, and the World Bank provided initial funding for a pilot as it fit their wider interests to promote citizen engagement in cash transfer programs. We later found out that the Ministry activated 2,226 Social Accountability Committees (SACs) across 24 Governorates. The rationale provided was that expanding social accountability would ensure the effectiveness and efficiency of the program. We wanted to learn more. The first response we got when we asked at the World Bank was that we should speak to Amr Lashin (they were unaware that Amr was Tom’s ex-colleague at CARE), and Bank colleagues in Cairo said this was an “incredibly, incredibly familiar” story and that CARE Egypt played a “big role.” We then asked a Bank colleague from Washington what adaptations may have been made. While they recognised that CARE was involved, they had forgotten the specifics of CARE’s contributions (until we shared the guidance for the groups, which they then acknowledged they had reviewed “many times”), but they did, of course, remember their own contributions. This is likely to be a relatively common story. Most of us remember our own role and forget those of others. Government actors themselves very likely wanted (and deserved) some credit too, and probably proposed their own adaptations.

Fourthly, resonance can be hard to identify because much adaptation happens as the tool, method, model, or approach and/or the key actors evolve through their transmission in the system. This is at the heart of Derek Thorne’s query on precisely what is being scaled. Stakeholders may or may not recognize the model due to relabelling, cropping, and editing. The example above is a case in point: the Bank staffer in Cairo did recognise the model, but the colleague in Washington only sort of did. Beyond this, there was staff rotation in CARE, the World Bank, and the Ministry of Social Solidarity over the period, and institutional memory can be a weak spot in TPA. Processes like the one above take years, with many actors making potentially relevant contributions at different points in time. Nonetheless, Ruth Mayne and Irene Guijt’s uncovering of what they call “functional scaling” (a process based on functional improvements via iterative adaptations that fit the political context) suggests that it is possible to identify these models at work in programming if we look for them.

Finally, and probably relatedly, the inadequate theorization of the resonance pathway has created obstacles for knowledge production and an evidence gap resulting from the differential speed at which monitoring and evaluation, on the one hand, and practice, on the other, have evolved. The former has remained anchored in a debate between “best practice” and “resistance” pathways, despite the existence of relevant literature in the governance space which has been forgotten, put aside and/or ignored (e.g., we’ll bring back and reconsider the insights on collaboration and confrontation of Fung and Wright from 2003 in a future post). The latter has unfolded largely under the radar in what we suspect is a significant portion of TPA interventions which attempt to scale up. We are grateful to colleagues such as Jonathan Papoulidis, among others, who are helping us connect literatures and broaden the conversation.

Research and evaluation revisited for complex scale up processes and results

In our explorations of resonance, tangible progress is being made but it’s easy to miss if we’re not looking for it or not open to finding it. So, how do we go about evidencing a resonance pathway? We believe the first step is to be open to learning about the change we seek to make at the edge of practice by embracing greys, multiplicity, and uncertainty. That means challenging what we think we know, and even what we expect to find, and comparing our preferred pathway against alternatives. As James Long put it, we should cast the net of our exploration as far and wide as we can (“soaking and poking”) before we confirm our biases or close off explorations. In other words, we need evaluators and researchers who look beyond business as usual.

We need to challenge our mental models and expectations; we need greater curiosity and inquisitiveness from evaluators and researchers. This isn’t necessarily an easy step to take, but we think it’s an important one to ensure that important stories of change don’t go untold just because they don’t fit neatly in existing boxes.

Some evaluators have been encouraged and supported to take the plunge and begin exploring how scale happens in World Vision Indonesia’s and Cordaid DRC’s programming, among other places. The preliminary answers from these cases, especially in contrast to other literature that argues that other pathways are a better fit in those contexts, make these countries prime cases to learn about all pathways and their conditions of success. For example, Indonesia is one of the few countries where we have lumpy evidence about transparency and accountability. It’s a country where donors have invested in RCTs and replication in the hope of informing decisions about scale (e.g., here and here), but where resistance advocates have also found cases that work for their frameworks (here). And yet, when we interviewed Indonesian colleagues (including supporters of both camps), we began to suspect that an important part of these stories as they happened on the ground, one of resonance, was missing from the papers published and hailed by US-based institutions.

In our forthcoming CEDIL paper, we’ll provide more insights about case selection for social accountability in the health sector. That is: why we think we should prioritise cases such as Indonesia and Cambodia over others to get out of a vicious cycle in which there never seems to be sufficient, good-enough evidence to challenge popular assumptions about scale. We also identified pairings of cases with potential in other sectors, such as education and social protection, which we do not cover in the paper but would be happy to discuss.

Case selection, including but not limited to within- and across-sector comparisons, is another issue that deserves more attention if we are going to build a stronger evidence base that grapples with complex systems, the world of greys, and sometimes mixed scale up results. Alex Gillies, from the Natural Resource Governance Institute, synthesised the practical relevance of comparing across sectors with a transferability mindset in the first United States Agency for International Development (USAID) Anti-Corruption Evidence and Learning Week.

Investing in knowledge for and with the field

In October 2021, Sam Waldock from the Foreign, Commonwealth and Development Office (FCDO) reflected in a webinar on the value of mid-level theory and transferability, and of mapping causal mechanisms (including through process tracing), to build evidence and learning agendas that are useful for decision-makers going forward. David Jacobstein of USAID shared in his quarterly “what we’re reading” list a wealth of resources, including some of our own, and suggested a similar direction of travel.

Our work since 2019 has mostly happened on the sidelines of our day jobs. We drew on networks we have built over decades of work and called in many favours to check and recheck our intuitions and interpretations. The findings of the resulting paper suggest that we can make significant strides by listening to diverse practitioners from different networks and taking a critical look at the literature.

We don’t think our work is done, even though our seed funding ended with recent FCDO cuts. We have collected insights and evidence about more specific cases than we had the time to write about. We have identified cases where tacit learning needs to be captured and analysed in light of the broader framework. We want to think about this further, and not only for the benefit of the specific colleagues who have reached out (or might reach out) for work to inform their programming.

The papers and blog posts we have written and are writing on related issues, sometimes with other colleagues, lay the building blocks of a revamped monitoring, evaluation, research, and learning agenda for transparency, accountability and participation interventions, portfolios, and the field.

The presentations we made in different fora and Twitter debates we started along the way also suggest that we can have a more diverse and grown-up conversation about the work that informs the next generation of efforts. One donor called these “public goods” for the sector. So, to employ a common recommendation: more research in this area (and funding for it) is needed.