Evidence portals to another world?
The promise of a Development Evidence Portal
Some time ago, I read a very grand-sounding blog: “A classification of interventions and outcomes for international development evidence.” The blog introduced an equally impressive-sounding “Development Evidence Portal.”
I originally wrote a version of this in January 2022 and shared it with the International Initiative for Impact Evaluation (3ie). I had some thoughtful responses from one member of the 3ie team and made a few amendments accordingly. But our correspondence ended there in December 2022, so I hope they will consider publishing a response.
A Development Evidence Portal sounds like exactly the sort of thing policymakers would find useful. Who wouldn’t want a shortcut past reading dozens of studies, trusting a reputable impact evaluation organisation to tell you “what the evidence says” and to judge for you which studies you can trust? As the blog asks, “have you ever had to scan a vast number of papers manually, just so you can find the evidence you need?” I first looked at the Portal when I was working on a Foreign, Commonwealth and Development Office (FCDO)-supported Fund and was keen to know whether it might be useful for them.
A few months ago, Ruth Levine seemed convinced by its promise:
“Thanks to decades of research and evaluation across many sectors, and an increasing volume of high quality studies from investigators who have both subject-matter and contextual expertise, USAID program designers can focus resources with a lot of confidence about the likely results. Just look at the Development Evidence Portal at the International Initiative for Impact Evaluation to see what I mean. Evidence on intervention effectiveness (and cost-effectiveness!) is a click away, and it is highly applicable to the types of programs USAID funds.”
OK, let’s say I want to know what the evidence says on “accountability” — a sector I work in regularly, and which Levine herself knows well. When I typed the word, I got 153 results. But I didn’t have time to look through 153 results, so I refined my search by intervention types. What I was really interested in was citizen-driven accountability (or social accountability interventions), but many practitioners use a variety of related terms — a potential problem for an evidence portal. Nonetheless, 3ie’s vision is that, over time, they will be able to:
“Develop a commonly-accepted taxonomy of interventions and outcomes that provide a standardised vocabulary adopted by researchers and practitioners alike when describing interventions and outcomes.”
They’re quite right that this is a very ambitious project, and at the end of the blog they asked for feedback. So, here we go.
When I used the intervention typologies, I saw a jumble of different things at different levels. I found categories as expansive as “civic engagement initiatives” (a common synonym for social accountability). There are literally thousands of these in practice, but my search (rather implausibly) found only 5 eligible studies. These sat alongside micro-specifications of interventions such as “community scorecards” (of which there was, implausibly, only 1). There are hundreds of scorecard interventions worldwide, and some of these (with null effects) have been highly consequential in recent evidence debates in the sector. I then added search terms like “community accountability initiatives” to broaden my search a little. In fact, given the problematic coverage of terms, I found it easier to exclude what I didn’t want from the Portal than to search for what I did. I wasn’t interested in things like “formal credit to farmers” or “infrastructure subsidies” (both of which are very odd matches for a search on “accountability”).
Systematic review of evidence reviews?
What eventually came up for me were 25 impact evaluations and 5 systematic reviews. Levine’s blog pointed out that systematic reviews that summarise lots of studies are particularly valuable. There’s not necessarily any special value to systematic reviews vis-à-vis other evidence reviews, but let’s pretend there were and see what the Portal came up with. On the surface, 5 systematic reviews is not a lot, but there actually aren’t even 5 systematic reviews labelled as such in the sector (there are only a few — one from The Campbell Collaboration, another by Campbell and 3ie, and one from the Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI-Centre)). So, I was intrigued to see what these were.
Of the five systematic reviews, two seemed irrelevant to accountability. That left three: a review of community accountability mechanisms published by the EPPI-Centre (3ie’s systematic review peers), a review of school-based decision-making published by 3ie (which was partially relevant), and a realist review of accountability in the education sector published by the EPPI-Centre. Two of these — those not published by 3ie itself — got a quality score of only 1/3. When I checked the details, I found the following:
“3ie has performed a comprehensive critical appraisal of this review. Based on that appraisal, we have low confidence in the results presented in the review. As a result, we have not provided a summary of the review.”
If you look harder, you find a link to a checklist used to justify the claim that these were poor-quality reviews. I had read all three of these reviews previously. In my view, by far the best, most rigorous, and most useful of them is the realist review, which is not a systematic review. It displays the best thematic knowledge, understanding of context, and explanation of mechanisms of change. These are three criteria I would consider important to assessing rigour, even if 3ie does not.
But then, when you look at the checklist, you start to wonder why this realist review was ever included in the first place. Given that realist reviews aren’t systematic reviews but instead employ theoretical sampling and do not focus on things like effect sizes, this one was always likely to score poorly on the checklist by design. Defining quality is not a neutral exercise; we all have our biases. I was hoping for an explanation of why the EPPI-Centre studies were deemed to be poor quality, but I wasn’t able to find these appraisals for all reviews, which caused me (rightly or wrongly) to question the transparency of how the Portal was appraising studies.
3ie kindly helped me locate the study appraisal for Westhorp et al.’s realist review. I didn’t agree that limitations such as only including English-language studies were grave (the vast majority of studies in the accountability sector are in English anyway), but I’d agree that the apparent lack of independent screening (if true — though very likely false, as I have spoken to two of the study authors) and the failure to report the quality criteria for assessing studies ought to be relevant. The latter criterion, however, rather betrays a limited comprehension of how realist reviews appraise evidence.
I also think that the summaries of study transparency, methodological, sectoral, and geographic information would save me time in locating relevant studies to read, though they wouldn’t sway me on the apparent rigour of those studies.
One of the criteria to assess the quality of the systematic reviews in the checklist was the following question: “was the search for evidence reasonably comprehensive?” Perhaps this is a test we should apply to the Evidence Portal itself.
In 2020, I did a review of reviews in the sector to look at the potential importance of sanctions in social accountability programming. Here’s the long list of evidence reviews: Rocha Menocal and Sharma, 2008; Gaventa and Barrett, 2010; McNeil and Malena, 2010; Hanna et al. 2011; McGee and Gaventa, 2011; Fox, 2014; e-Pact, 2016; Molina et al. 2017; Waddington et al. 2019; Tsai et al. 2019; Kosec and Wantchekon, 2020. I’m not suggesting that all of these reviews are of stellar quality. Yet not one of them, and neither of the two most relevant systematic reviews, came up in the Portal when I searched for “accountability” in January 2022.
As I later found out from additional searches, this didn’t mean that all these studies were necessarily excluded from the Portal itself, but the point was that they didn’t come up when I searched.
Beyond these and the two education sector reviews mentioned above, there is also a recent review of accountability in the water sector by Water Witness and even a review of reviews in the health sector. Both were missing too. Why these 13 reviews were excluded, when a realist review (which was clearly not to 3ie’s taste) was included, was something of a mystery.
What was even more curious was that my search in the Portal failed to surface the most recent and most relevant systematic review, by 3ie themselves, on Participation, Inclusion, Transparency, and Accountability (PITA), co-authored by 3ie’s current director (Waddington et al. 2019).
Good coverage of impact evaluations?
We now come to the individual studies. My search in January 2022 revealed 25 impact evaluations. As mentioned previously, this isn’t a lot. Several names were also conspicuously absent from the list. The accountability sector has a lot of experimental studies (a clear preference at 3ie), including by Nobel laureates Esther Duflo and Abhijit Banerjee. It seemed strange that the parents of the method with which 3ie is so enamoured didn’t come up in my search. Both authors were cited in 3ie’s PITA review and indeed in Westhorp’s realist review mentioned above — so we know someone at 3ie was aware of their studies. Yet neither laureate appeared in my results. A quick side search found as many as 60 entries for Esther Duflo in the Evidence Portal. So it seemed that something had gone wrong. I did, of course, point this out to 3ie.
Of the 25 studies included, 3 were from the same Randomised Controlled Trial (RCT) of a CARE Malawi project — the Maternal Health Alliance Project (MHAP) — published in different journals. There was another study from CARE Ghana, the USAID-funded Ghana Strengthening Accountability Mechanisms (GSAM) project. On the surface, it seemed strange that close to a fifth of all impact evaluations in my search were of the interventions of a single organisation — my previous employer. I happened to work on both these projects. Think, for a second, how unlikely it is that one person could have worked on the projects behind nearly a fifth of the studies included.
I did some research for MHAP alongside the RCT, and I was the proposal writer for GSAM and supported the team to do a formative evaluation at the same time as the RCT. In my view, the MHAP studies were generally of good quality, but as I’ve explained previously (and in my last blog), there were many problems with the GSAM study. Despite these problems, it was held up as a study in USAID’s anti-corruption evidence week last year. 3ie was commissioned by USAID to produce an evidence gap map on good governance (you can find GSAM here — in my blog, I suggested why it was likely included, and it has little to do with being a good study).
It’s also surprising that, when given the option to include “Community-Driven Development” (CDD) in my search, only 1 result came up, although the search actually retrieved 2 CDD studies in practice. There are hundreds of studies of CDD. There’s even a review of CDD which 3ie did themselves. How this didn’t come up wasn’t immediately clear to me (3ie later explained that, as an evidence synthesis, it didn’t fit their criteria).
By my estimation, perhaps half of the 25 studies I found were what I was actually looking for. Again, I didn’t want to know about safety-net scholarships in Indonesia, nor about a petroleum revenue management act in Ghana, did I? So there seemed to be some issues to work out with intervention coding. 3ie can explain this, but the explanation is not altogether satisfactory, in my view.
Other, perhaps better-known, sectors (e.g., social protection) seem better defined and, hopefully, more useful as a result.
This apparent unevenness, however, suggests that 3ie needs to factor in sectoral expertise explicitly for more conceptually contested sectors like accountability (I understand that they did have this for social protection). Ultimately, if it’s not clear what we’re assessing in the first place, a Portal will inevitably be of limited utility. I appreciate that the wider transparency, participation, and accountability sector may be difficult to define, but if the Portal is to live up to its lofty aspirations, it would do well to work on definitions, perhaps in a similar way to 3ie’s own PITA review.
Gold-plated standards?
Then, of course, there comes the perennial question of methodological preferences towards “gold standard” methods. At this point, I’m a broken record on this issue. Much as 3ie has argued that it is methodologically pluralistic, in practice there remains a defiant hierarchy of methodological designs, with only the occasional, half-hearted acknowledgement of why qualitative studies are also (sort of, but not really) important. Especially in a sector where experimental designs are generally a poor fit for designing or assessing the programming, tolerance and pluralism are a must.
As might be expected, we see the words randomised, control, and trial all over my search results. Jonathan Fox has argued, again on 3ie’s own website, that we should be rethinking the lessons from field experiments. He even cites a review of 48 studies (Kosec and Wantchekon, 2020), which was also absent. Given that 3ie’s evidence gap maps are now (at least somewhat) recommended to consider qualitative studies and process evaluations (and others, such as The Campbell Collaboration, the London School of Hygiene and Tropical Medicine (LSHTM), and several other evidence clearing houses, are including these in their gap maps), it’s worth asking why they remain such a conspicuous absence from the Evidence Portal. A lack of funds or time would be a deeply inadequate answer.
So, did the Portal save me time? Yes. Was it useful? Not nearly as useful as it could be. While an Evidence Portal is a great idea, and we should applaud the ambition, in its present incarnation I do not have a “lot of confidence” in it (as Levine does), and I would not recommend it to USAID or the FCDO in the sectors I tend to work in. I think it currently gives a misleading impression of “the evidence.” Its flaws, though, are eminently fixable. I therefore hope that 3ie will take my feedback on board and make the necessary adjustments so that I can recommend it with greater conviction and confidence in the future.
If you have any comments on the Portal yourselves, send an email to info@3ieimpact.org.