Rubrics as a harness for complexity
In this final blog in the series, I want to look at the potential value of rubrics. While evaluability assessments can help us to understand what can be evaluated in a reliable and credible fashion, we should also think about what is actually worth spending time, energy, and resources evaluating. I believe rubrics can help us to do this, and much more besides.
The hot topic (down under)
Jade Maloney and Gerard Atkinson argue that rubrics are the hot topic in evaluation right now. The rubrics sessions at the Australian Evaluation Society (AES) International Evaluation Conferences in 2018 and 2019 were apparently extremely well attended. Yet, I’ve rarely heard rubrics discussed in London development circles or seen much published on them, unless written by an Australasian. This was confirmed at the Centre for Excellence for Development Impact and Learning’s (CEDIL) start-up workshop a few weeks ago. One of the projects is proposing to look at rubrics to enhance evaluation use, yet few in the room seemed to know what rubrics were or how they could help.
I also went to an #AdaptDev workshop on Value for Money and Adaptive Management, hosted by the Overseas Development Institute (ODI) as part of the Global Learning on Adaptive Management (GLAM) initiative about 18 months ago. Two of the examples, from Oxford Policy Management (OPM) and Oxfam, employed rubrics (even though the write-up doesn’t mention them). Both frameworks were underpinned by the work of Australasians (Julian King and Michelle Besley).
Rubrics are used elsewhere, but still seem to be under the radar. I’ve used them in Contribution Tracing, and the Significant Instances of Policy and Systems Improvement method by Jess Dart (another Australian resident) uses them too. I’ve also been told about or privately shown the use of rubrics by Integrity, Palladium, and Itad, but it’s spoken of very quietly.
So, what are rubrics?
Just before the American Evaluation Association (AEA) conference in 2018, I became fascinated by rubrics. E. Jane Davidson (2004) wrote a highly accessible book on evaluation basics in which she convincingly made the case for using rubrics.
Apparently, the word rubric originates in the mid-15th century, when it referred to the headings of different sections in a book, linked to the red lettering monks would use (from the Latin for red — ruber). A few decades ago, rubrics became popular among educators as transparent rules to guide scoring for written composition, which might otherwise be graded quite subjectively.
As David Green points out, rubrics are essentially a form of qualitative scale that includes:
- Criteria: the aspects of quality or performance that are of interest, e.g. timeliness.
- Standards: the levels of performance or quality for each criterion, e.g. poor/adequate/good.
- Descriptors: descriptions or examples of what each standard looks like for each criterion in the rubric.
Maloney and Atkinson note that all rubrics have some form of scale that denotes levels of performance. Like any scale, rubrics illustrate what the difference between good and bad is, help to determine what’s in and what’s out, and what meets the grade and what doesn’t. But, they do more than simply assign a number (1, 2, 3) or word (good/bad) to a performance metric. Rubrics explain what the standard means and make clear the reasoning behind an assessment.
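To make the three building blocks concrete, a rubric can be thought of as a simple lookup structure: each criterion has a set of standards, and each standard has a descriptor explaining what it looks like in practice. A minimal sketch in Python — the criterion, standards, and descriptor text below are invented for illustration, not taken from any published rubric:

```python
# Hypothetical rubric: one criterion ("timeliness") with three standards,
# each paired with a descriptor of what that standard looks like.
rubric = {
    "timeliness": {
        "poor": "Outputs routinely miss agreed deadlines with no warning.",
        "adequate": "Most outputs arrive on time; delays are flagged early.",
        "good": "All outputs arrive on or before the agreed deadline.",
    }
}

def describe(rubric, criterion, standard):
    """Return the descriptor explaining what a standard means for a criterion."""
    return rubric[criterion][standard]

print(describe(rubric, "timeliness", "good"))
```

The point of the descriptor layer is exactly what the blog argues: the judgement is not just a label ("good") but an explanation of what the label means, so the reasoning behind an assessment is visible.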
The most common kind of rubric is a descriptive rating scale that can be applied to any criterion, or to clusters of criteria that logically group together, as the table below shows:
This table comes from Kinnect Group’s Evaluation Building Blocks: A Guide, which includes a bunch of other useful examples of different forms of rubrics (analytic, holistic, weighted). Julian King argues that while “we can often pick up new tricks that enhance our evaluation work, it is relatively rare to learn something that fundamentally changes how we approach the whole game.”
For me, rubrics offer two potential game changers: they provide a harness but not a straitjacket for assessing complex change and they help stakeholders build a shared understanding of what success looks like.
A harness, not a straitjacket, for assessing complex change
Despite numerous valiant efforts (Ragin, 2008; Brady et al. 2010; Goertz and Mahoney, 2012; Seawright, 2016; Beach and Pedersen, 2016), a quantitative paradigm prevails in which soft qualitative methods and data are assumed merely to supplement and give texture to hard quant (see KKV, 1994, and After-KKV). More observations, bigger samples, greater power. In fact, as Gary Goertz and James Mahoney (2012) explain, quant. and qual. are internally consistent but different research cultures with different goals: one looks at averages and comparisons across cases, the other looks within cases for necessary and/or sufficient conditions. Yet, in M&E, as Davidson (2011) illustrates, we still try to fit qual. into precise, narrow, and easy-to-measure indicators:
As Chris Roche pointed out to me recently, rubrics are useful when trying to describe and agree what success looks like for tracking changes in complex phenomena such as higher order systems thinking, procedural knowledge, and attitude formation, for which single indicators or quantitative data are insufficient. Measuring precise and narrow indicators for such phenomena can be pretty inaccurate and misleading (they’re a straitjacket). If I say “accountability,” do you think I mean providing information, providing answers, providing responses, or implementing sanctions?
Get the wrong answer and you might end up measuring the wrong thing. Moreover, measurement (the assignment of a number to a characteristic of an object or event) is the wrong way to think about change that is often non-linear, multi-dimensional, and even multi-directional. Rubrics allow us to think about membership rather than measurement. This is far more appropriate where definitional boundaries are necessarily fuzzy, but where you still want some form of harness for making transparent evaluative judgements.
Build a shared understanding of what success looks like
The second potential game changer rubrics offer is a deliberative process to discuss, debate, and define what success looks like. While various other semi-qualitative methods do this, like fuzzy-set QCA, rubrics provide just enough structure for stakeholders to arrive at shared understandings and agreements about what constitutes progress and success, but not so much structure that you lose sight of contextual variation.
Take the example of the widely used net promoter metrics elicited from micro-surveys, which revolve around the question: “would you recommend us to a friend?” This can provide potentially very useful, timely feedback from project participants in humanitarian and development projects. You simply calculate the difference between the promoters (9–10) and the detractors (0–6). Deceptively simple, perhaps?
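The arithmetic really is that simple. A sketch in Python, assuming the conventional scoring: the score is the percentage of promoters (ratings of 9–10) minus the percentage of detractors (0–6), with 7–8 counted as passives:

```python
def net_promoter_score(ratings):
    """Percentage of promoters (9-10) minus percentage of detractors (0-6),
    given 0-10 answers to 'would you recommend us to a friend?'."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Illustrative data: 4 promoters, 3 passives, 3 detractors.
ratings = [10, 9, 8, 7, 6, 3, 10, 9, 5, 7]
print(net_promoter_score(ratings))  # 10.0
```

The calculation hides nothing technically, which is exactly the trap: the number is trivial to compute but says nothing about what a 9 or a 6 means to a respondent in a given context.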
What’s missing, of course, is a contextually anchored description of what the numbers actually mean. Ex-colleagues of mine at CARE applied this in Nepal, Bangladesh, and Ghana. Our partners Keystone Accountability were adamant that the above distribution applies everywhere. The Nepal team protested that these numbers (and their presumed levels) meant something different in their context. Standardisation is designed to help provide useful benchmarks across organisations and geographies, developing competition among organisations to become best in class. Yet, benchmarking is spurious if numbers mean different things in different places. In UK universities, 70% is an excellent grade. In Canada, it’s mediocre (as colleagues and I discuss in relation to rubrics for Process Tracing). Defining rubrics for these levels of membership and describing what each standard looks like can thus transform meaningless data into something meaningful and valuable.
Rubrics for assessing contribution
My own modest contribution to the discussion comes in the form of adapting Process Tracing, Outcome Harvesting, and rubrics. I was looking to develop a simplified version of Contribution Tracing. I lazily called it “Contribution Rubrics.”
A Rubik’s cube was a near homonym and also reflected the fact I wanted to look at three dimensions, taking significance and contribution assessment from Outcome Harvesting and strength of evidence (evidence tests) from Process Tracing. This, I felt, took some of the best of both methods, and rubrics gave me a way to put them together. Where rubrics were helpful was in defining what outcomes were worth evaluating (based on their levels of perceived significance), what level of contribution an organisation had made to that outcome (based on degree of uniqueness) and what the strength of evidence was to support that contribution claim (based on how well evidence fit the explanation).
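As a purely illustrative sketch (the scale labels below are invented, not the actual Contribution Rubrics standards), the three dimensions might be represented as three ordinal scales applied to each outcome claim:

```python
# Hypothetical ordinal scales for the three dimensions described above:
# significance and contribution (from Outcome Harvesting) and strength
# of evidence (from Process Tracing). Labels are for illustration only.
SIGNIFICANCE = ["low", "moderate", "high"]          # perceived significance
CONTRIBUTION = ["incidental", "shared", "unique"]   # degree of uniqueness
EVIDENCE = ["weak", "plausible", "strong"]          # fit with explanation

def assess(outcome):
    """Map an outcome claim's three integer ratings to their rubric labels."""
    return (
        SIGNIFICANCE[outcome["significance"]],
        CONTRIBUTION[outcome["contribution"]],
        EVIDENCE[outcome["evidence"]],
    )

claim = {"significance": 2, "contribution": 1, "evidence": 2}
print(assess(claim))  # ('high', 'shared', 'strong')
```

Keeping the three dimensions separate, rather than collapsing them into one score, preserves the point of the cube metaphor: a highly significant outcome may rest on weak evidence, and the rubric makes that visible.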
While in Māori contexts, Nan Wehipeihana (2011) has found steps to heaven (Poutama) a useful metaphor in relation to levels of learning and intellectual achievement, my own cognitive associations are with the Calvinist asceticism of modernist European art. In particular, my reference point for rubrics was Piet Mondrian’s evolution as an artist. I’m an art history buff. I found it fascinating how Mondrian was able to pare back painting to the bare essentials, influenced by the vertical and horizontal lines in the dunes of Zeeland — as in “Dune Landscape” above (perhaps I was making a connection with my own New Zealand heritage, or reflecting on the fact that rubrics took off in New Zealand). These progressively transformed into rectilinear boxes of primary colours; clean lines which help distinguish boundaries in our fields of vision.
Where next for rubrics?
While rubrics are not a silver bullet, for me, they are full of possibilities for international development monitoring, evaluation and learning. As rubrics get more widely adopted, we might expect to see them solving new problems.
I personally feel that one fertile area for further work is in “hard to measure” sectors like governance and accountability work. Rubrics are great at providing a guiding structure to themes which are conceptually imprecise or where thresholds may be somewhat unclear. They fit nicely with the evolution towards fuzzy-set Qualitative Comparative Analysis (QCA). While rubrics have been used for risk management and value for money, you can flip risks into assumptions, which can then help structure “loose” theories of change. Defining what different outcome thresholds mean can then help complex programmes make better sense of comparable levels of achievement, moving beyond a loose collection of idiosyncratic stories of change (i.e. the misuse and abuse of Most Significant Change).
Another important future direction is around evaluation use and evaluative reasoning. Discussions around rubrics can make stakeholders’ reasoning more explicit, increase transparency, and potentially reduce initial judgment bias among stakeholders. As Christina Peterson suggests, this may be especially the case in contexts where strategic information withholding is expected to be high and stakeholders have a greater tendency to stick to initial judgments despite new evidence.
In writing this blog I spotted a great list of rubric resources by another Kiwi (Will Allen). I highly recommend it if, like me, you’re trying to catch up with the conversation. My thanks to Julian King, Christina Peterson, Jade Maloney, Chris Roche, and Florencia Guerzovich for their helpful comments and suggestions.
In the next blog, I will draw all the blogs in the series together.