The dilemmas of equipoise
The Australian Centre for Evaluation (ACE) recently published Randomised trials in Australian public policy: a review. It’s an interesting review for several reasons. It offers useful data on the scale of Randomised Controlled Trials (RCTs): 369 in the last 35 years, with a dramatic increase in volume since 2017 (largely explained by behavioural insight RCTs).
What caught my eye, though, was the pre-emptive nature of the publication. It was clearly written as much for its critics as its champions. One of the authors, Harry Greenwell, posted the following:
‘Before you shout “There is more to evaluation than just randomised trials”, please read the section on p6 entitled … “There is more to evaluation than just randomised trials.”’
Of course, Greenwell is right, and the phrase “gold standard” is nowhere to be seen in the publication, even though the Minister responsible for ACE himself wrote a book entitled Randomistas. And, in general, I think it’s a helpful publication that can hopefully guide evaluators to design more appropriate trials.
One key area the review addresses is ethics. The authors suggest that ‘a well-designed randomised trial can be a highly ethical form of policy evaluation precisely because it provides such robust evidence to inform decision-making.’ It’s unclear to me why the robustness of evidence makes a study design ethical. Nonetheless, they argue that “the foundation for establishing that randomisation is ethical, in a particular instance, is the principle of equipoise.”
Equipoise is the “state of genuine uncertainty on the part of the clinical investigator regarding the comparative therapeutic merits of each arm in a trial” (Freedman, 1987: 141). This uncertainty is relative rather than complete.
Essentially, you need to establish that it’s justifiable to withhold something from a group who would otherwise be eligible for it. Generally, researchers and intervention designers believe that their intervention will make a positive difference. Yet, particularly for innovative interventions that haven’t been tested (rigorously or not), they often don’t know for certain that their intervention will make a net positive difference.
Indeed, withholding treatment isn’t always harmful. As ACE puts it, ‘if a program is not effective, there may be saved time wasted and inconvenience from not being included.’ Particularly for low dose interventions (e.g., cash transfers with very low benefit values) or nudge interventions, the benefit of receiving the treatment may sometimes be outweighed by the time or conditions associated with receiving it. Furthermore, interventions, whether evaluated through RCTs or not, can sometimes do more harm than good. So, there’s almost always some degree of uncertainty. The question is what level of uncertainty and what level of benefit or harm, and how we ascertain these.
Michel Abramowicz and Ariane Szafarz wrote an important book chapter on the subject, Ethics of RCTs: Should Economists Care about Equipoise? In the chapter, Abramowicz and Szafarz discuss the history of the concept and why we should take it seriously. They argue that ‘the most basic problem, and perhaps the most challenging one in the field, is how to prove that a given trial satisfies clinical equipoise.’ In short, it’s easier said than done.
Setting the bar at the right level
My biggest concern here is how we balance our knowledge and ignorance of costs and benefits. Researchers tend to believe that the intervention they designed will be effective, and they almost always have incentives to publish a study. Equipoise demands some reasonable threshold of ignorance of the merits of an intervention. In one sense, this means there must be some level of doubt regarding whether the intervention will have the desired effect, but just as important is whether positive effects might reasonably outweigh negative effects. One infamous case comes to mind. On 11 November 2018, a study was registered entitled Stay Connected: Encouraging Water Service Payments in Nairobi’s Urban Slums. Several of the researchers are well known and respected in the field, including the lead author of Impact Evaluation in Practice.
We’re not supposed to use the word “slum” anymore, but let’s brush optics to one side, shall we? It’s only downhill from here.
The RCT tested the impact of disconnection notices to several thousand non-paying customers. In a nutshell, landlords regularly defaulted on monthly water fees, and the Nairobi Water and Sewerage Company had already resorted to what it calls “micro-rationing” to address the issue. Media reports suggest that, in fact, ‘many people stopped making payments on their connection loans out of frustration at water that flowed only a few hours one day per week, if at all.’ But, either way, the company had a problem to solve, and the researchers (in their view, if you read their ethics statement) came to the rescue.
The experiment was designed to test whether cutting off tenants’ water would incentivise landlords to pay, thereby “improving revenue collection efficiency” for the Nairobi Water and Sewerage Company. Incredibly (a word I will use a lot here), the study was approved by the Innovations for Poverty Action (IPA) Institutional Review Board and the Maseno University Ethics Review Committee.
The experiment identified customers who were behind on their water connection loan payments, divided these randomly into treatment and control groups, and disconnected the water at treatment properties, but not at control properties. The conceptualisation of informed consent for tenants is particularly questionable here, as they surely would have denied consent had they been adequately informed. Essentially, tenants lost their water if landlords failed to pay the bill. In total, 299 compounds received disconnection notices and 97 compounds were eventually disconnected for failure to pay. Unsurprisingly, the punishment of disconnecting water had a large positive impact on repayment rates.
Nine months later, water coverage was 3.6 percent lower in treatment versus control households. So, presumably some tenants had no water for that length of time. Yet, incredibly, the authors reported no negative effects. The findings section is full of contradictions. For example, ‘during service interruptions’ (their euphemism for cutting off water) tenants could buy water from private vendors, yet miraculously they did not spend any more money on water. As someone who regularly bought drinking water from private vendors in Santo Domingo, I can tell you with confidence that I paid more. Similarly, the authors’ claim that relationships between landlords and tenants did not become more strained is also unbelievable. If you lose your water connection, you will complain more frequently until the problem is fixed (I suspect the issue lies in how they asked the question; see Table A1). Amazingly, in the revised paper, the authors even say their inclusion of the tenant-landlord relationship in outreach efforts was a way to “strengthen bottom-up accountability.” You really couldn’t make it up.
Whatever the reported effects (which I think are highly dubious), this all misses the point on the predictable harms of an intervention, whatever the potential effects on improving repayment rates, and potential sustainability of service provision. Anyone who has gone without water for sustained periods of time (I have) knows that harms are possible within hours, let alone days, weeks, or months. As one person interviewed said:
“We don’t have water and water is life. So, how can you say it doesn’t harm anyone, how, how?”
It doesn’t make any sense, does it? If indeed there are really no negative effects from cutting off tenants’ water, the implications are enormous, and dystopian. Why don’t we try not feeding people (for up to 9 months)?
Unsurprisingly, when the study was published two years later, there was uproar.
Numerous critical media articles followed:
- A new research experiment in Kenya raises questions about ethics
- When economists shut off your water
- How poor Kenyans became economists’ guinea pigs
The authors penned “A Comment on the Ethical Issues Twitter Discussion in ‘Enforcing Payment for Water and Sanitation Services in Nairobi’s Slums’.” In this, the authors blamed a culture of non-payment and what they insinuate to be the questionable ethics of the Nairobi Water and Sewerage Company, and claimed that they ‘worked with Nairobi Water to develop a less disruptive / costly alternative intervention.’ They even suggest that those in the control group were saved from potential harm because of the study. At no point did they acknowledge that cutting off tenants’ water because their landlords were unwilling to pay might be unethical. Nor did they (publicly) consider withdrawing from implementing the intervention/study, despite their (declared) concerns over the disruptiveness of that intervention. These concerns were all known before the study was conducted, and were ‘discussed extensively with the external Human Subjects review IRBs.’
The term “equipoise” is nowhere to be found in Impact Evaluation in Practice. But the authors note that the ‘most basic ethical principle in an evaluation is that the delivery of interventions with known benefits should not be denied or delayed solely for the purpose of the evaluation.’ Thus, they are well aware of the idea behind equipoise, even if they don’t name it. They even refer to the Hippocratic Oath and the principle of do no harm.
The updated paper included a lengthy section on transparency and ethics. The authors cited the Belmont Report’s ethical principles of respect for persons, beneficence, and justice. One article highlights some of the ethical acrobatics, and points out the dubiousness of treating a water contract signed prior to the intervention as tenant consent to the potential of disconnection. There are plenty of other questionable choices, for anyone who wants to read more. In my estimation, their justification is essentially that they were less unethical than the water company, which had a “sub-optimal government intervention.” In essence, we saved the control group from the damage of the intervention we co-designed. And the benefits? Robust evidence that would help utilities direct efforts towards more effective and less costly interventions. Of course: evidence! Costly for whom, one might ask. And they did include the term “equipoise”: they claim that there was genuine ex ante equipoise regarding the costs and benefits of interventions tested. Yet, again, no mention is made of the potential harm of cutting off tenants’ water. This harm is abstracted to the effectiveness of the service in general.
So, the question here is how such a study was approved when acute harm was explicitly built into the intervention design.
I’m not saying interventions should never employ sticks (economists call these “incentives”), but you should be very serious about whether the long-term benefits likely outweigh the short-term costs.
As Abramowicz and Szafarz suggest:
“Often, previous field knowledge makes negative outcomes predictable with some degree of confidence. In such cases, the lack of equipoise associated with the specific risks to which disadvantaged people are exposed result from either the indifference or the insufficient field experience of the experimenters.”
Abramowicz and Szafarz cite Veatch (2007: 182) who claims that:
“It is not anyone’s equipoise that is morally critical; it is whether the potential subjects consent to be randomized without being unduly coerced, manipulated, or exploited.”
Given that we can confidently rule out a lack of field experience of the (very experienced) researchers, indifference to the potential harms for tenants seems to be the only believable reason the study went ahead, and continued, if you read their ethics statement V2. In brief, they took the side of the water company over the treatment group. It would be hard to argue that the tenants weren’t, in some way, exploited through the study to get their landlords to pay so that the water company could recover higher service receipts. These higher receipts would then be used to repay the World Bank, which provided the loan and co-funded the study.
With luck, the company might then provide a more reliable and better quality service. But Thames Water’s recent price hikes in London, despite paying out billions in shareholder dividends while piling on billions in company debt, would suggest that this is a long and questionable causal chain. And, let’s leave to one side the unquestioned neoliberal logic of basic service provision, shall we?
The paradox of equipoise and scaling
A second challenge that equipoise raises is found in the same statement above: if benefits are known, they should not be denied for the sake of an evaluation. This is a dilemma when we consider scaling interventions with RCTs. I responded to Greenwell on LinkedIn that I don’t think we ever have complete certainty (about anything), but when an intervention has been “proven” to work with a good quality RCT, and the proposal is to replicate it elsewhere, what does this mean for equipoise? What threshold of uncertainty are we talking about?
Greenwell responded that the definition of equipoise here should be quite generous and expansive. He suggested that “scaling necessarily means going beyond direct replication to instead replicating aspects of the intervention but with a different population, context, scale or implementation method.” In this sense, there will always be wiggle room for doubt about the benefits and costs of the intervention. To some degree, this is totally understandable. However, the lack of clear scope conditions would tend to suggest that almost anything might be permissible (e.g., the Nairobi nightmare above). This makes the concept difficult to operationalise, and makes it hard to enforce consistent standards in IRBs or elsewhere. Perhaps for this reason, Greenwell notes that equipoise is a necessary but insufficient condition.
In my view, an expansive definition sets too low an ethical bar. Ultimately, there is too much weight given to preferred methods over the benefits and harms for people. When a proposal is being made to scale something, the claim is already being made that “it works.” Indeed, the water payment study authors above claim that the net effects are positive and that there are no negative effects. So, it “works.”
Even when there may be valid questions about generalisability (or transferability), and the need for more than a single study (e.g., the controversial two study rule), the point is that the available evidence suggests the intervention does more good than harm. Withholding the treatment therefore seems ethically dubious and intellectually incoherent. In my view, an RCT with a pure control group cannot be justified unless there are serious doubts about the original study design (e.g., the Nairobi case), credible concerns that the intervention may cause more harm than good in a different context, or intervention contexts so radically different that it’s not plausible that what worked in context A is more likely than not to work in context B. Indeed, under such circumstances, the proposal to replicate would seem highly questionable. For sure, I’d raise serious questions about scaling the water payment RCT on ethical grounds, even if it’s “effective” for the preferred dependent variables. That study tests the very limits of the whole “what works” industry.
As Abramowicz and Szafarz note, ‘RCTs are costly to implement and so divert money away from other, often less-consuming, experimental designs.’ A more ethical alternative if the intervention is credibly effective, even if potentially more costly under certain circumstances, would be to carry out an A/B test with different types of treatment, but not a pure control, and compare the relative costs and benefits of these. It strikes me that either one can credibly appeal to equipoise or one can advocate to scale an intervention based on an assertion that “it works,” but not both.
Therefore, I think it’s good news that equipoise is being explicitly discussed, and I’m grateful to ACE for raising it, but I think there is still some way to go to meaningfully operationalise it. If the Nairobi water RCT passes the test, is anything not fair game?