Statistics and sports analogies
Stretching back to How to Lie with Statistics, there is no end of books speaking to the limits of a statistical worldview. We saw a resurgence of this line of thinking before the pandemic with popular books such as The Metric Society and The Tyranny of Metrics warning against the perils of metric fixation and false precision.
It might perhaps be argued that, due to the pandemic, we’re more receptive to statistics than ever before. For months, we would all wake up, check Our World in Data in despair, and then try our best to get on with our day. For the first time, regular people would talk about the R value and reproduction rates. They even learned what exponential growth meant.
Statistics and sports analytics have a pretty close relationship. We don’t just learn at school about why statistics matter, we absorb this received wisdom in bars, in the press, and on TV. Sports analytics emerged from baseball in the 1980s, popularised in the film Moneyball about the Oakland A’s general manager Billy Beane who put together a baseball club on the cheap by employing computer-generated analysis to draft his players.
But analytics heavily influences most sports these days. My school cricket coach (a mathematics teacher) would actually compile the batting and bowling averages for all the school’s cricket teams. I knew what my averages were each week, so I’m perhaps primed to see the value in sports analytics, and in statistics generally. This is, of course, a wider phenomenon. We take it for granted that greater volume, frequency, and consistency are invariably good things (and that they’re always possible to achieve).
Arbie Baguios might be right that there are more and less appropriate uses of sport analogies:
Yet, there’s a deeper point here. Our sports analogies, and how we talk about statistics in daily life, carry over into how we evaluate success and failure, make judgements about which interventions work, and therefore how we should invest public and private funds. I think we need to pause and reflect on why sports are such a good place for statistical thinking (and the logic of experiments), but also on which places might not be.
It may surprise you to hear that, as an Englishman, I’m a new basketball fan. I actually became a basketball fan because of basketball statistics. My brother was always into basketball and just as the pandemic started he shared some YouTube videos. Those of one YouTuber, JxmyHighroller, really stood out to me because of his amusing use of charts to show outliers. When a new video drops, my brother and I literally signal with the word “stats.” Or as the YouTuber put it today, “no opinions, no bias, just the facts.” Hmm…
One thing that jumped out at me as I started to watch basketball commentary was the abundance of semi-literate YouTubers talking about sample size. It’s usually a preface to a hot take: “OK, small sample size, but…” The folk usage of “sample size” and “noise” shows how widespread statistical thinking is in the sport, largely without thinking.
Easy measurement; predictable success
Basketball is the ideal arena for statistical thinking, and particularly experiments. It’s an explicitly closed system, with relatively few dimensions and quite limited interactions. To put this in the terms of a recent typology for complex interventions, if basketball were an intervention arena, it would be quite simple. There are only 10 players on a court less than 100ft long and 50ft across, and games last for less than 50 mins. Basketball’s relative simplicity was a major reason I wasn’t much of a fan. From afar, it just seemed like a rather one-dimensional sport for tall people. This was, of course, an unfair characterisation by someone who was never any good at it.
In fact, some have even argued that basketball is among the most skilled sports. The video below from Vox on The Success Equation highlights how some evaluate luck and skill in sports, and illustrates how much we take the underpinnings of statistical thinking for granted. Basketball is argued to be near the top because it’s deemed to be among the least random. In part, this is because basketball has a larger sample size of games, attempts, etc. than (American) football or ice hockey, for example. There are simply more potential observations, but for fewer players, and the best players play proportionally more minutes. So, basketball is more “skilled” because it is more measurable and predictable (individual sports like tennis and swimming do even better), not because basketball players are necessarily more skilled than other sportspersons.
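To make the sample-size point concrete, here is a minimal sketch of my own (it is not taken from The Success Equation or the video). It assumes that the spread in observed success rates is roughly the spread in true skill plus binomial "luck", and that the luck part shrinks as attempts accumulate. The function name skill_share and the five success rates are hypothetical, chosen only for illustration.

```python
from statistics import mean, pvariance


def skill_share(true_rates, attempts):
    """Approximate fraction of observed variance that is due to skill.

    Assumes observed variance = skill variance + luck variance, where
    luck is binomial noise that shrinks as the number of attempts grows.
    """
    skill_var = pvariance(true_rates)
    luck_var = mean(p * (1 - p) for p in true_rates) / attempts
    return skill_var / (skill_var + luck_var)


# Hypothetical "true" success rates for five players (not real data).
rates = [0.40, 0.45, 0.50, 0.55, 0.60]

for n in (10, 100, 1000):
    # With more attempts per player, luck washes out and skill dominates.
    print(n, round(skill_share(rates, n), 2))
```

Under these assumptions the players are exactly as skilled in every row; only the number of observations changes. That is the sense in which a sport can look more "skilled" simply because it is more measurable.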
In this delimited space, basic statistics like points per game, assists, blocks, and steals tell us quite a lot about whether someone is a good basketball player or not. Advanced statistics such as Player Efficiency Rating (PER) or plus/minus can arguably tell us even more.
One big discovery of basketball analytics in recent years is that 3 > 2. I jest, but this discovery has probably done more to change the game in recent years than anything else, and the trend below has continued to rise ever since.
A lot of basketball analysts (often taller ex-basketball players) don’t like this trend, nor the analytics revolution which contributed to it. What it’s produced is a more predictable game, as more and more players attempt the same shot.
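The "3 > 2" logic is just expected points per attempt. The sketch below is an illustration under assumed numbers: the 50% and 36% make rates are made up for the example, not league figures, and the function names are my own.

```python
def expected_points(shot_value, make_rate):
    """Average points scored per attempt for a given shot type."""
    return shot_value * make_rate


def break_even_three_rate(two_point_rate):
    """3-point make rate needed to match a given 2-point make rate."""
    return 2 * two_point_rate / 3


# Illustrative, made-up shooting percentages (assumptions, not real data).
two_pointer = expected_points(2, 0.50)
three_pointer = expected_points(3, 0.36)

print("2-point attempt:", two_pointer)
print("3-point attempt:", three_pointer)
print("break-even 3-point rate:", round(break_even_three_rate(0.50), 3))
```

On these assumed rates, a shooter who makes half of their 2-pointers only needs to make about a third of their 3-pointers to break even, which is why so many players end up taking the same shot.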
Most Valuable Player
Evaluation is ultimately about appraising value: defining criteria and standards, and making judgements. Evaluators are those that make these judgements. But then, every sports fan and basketball analyst is also an evaluator in their own way. Each year, there’s a debate among basketball analysts about who is the Most Valuable Player (MVP). Often, we find that people don’t have the same conception of value. In fact, the debate often revolves around whether they are appraising the same criteria in the first place, prior to assessing standards and making (evaluative) judgements. There are debates as to whether it’s truly an individual award. Some analysts argue that the team’s success matters, because an individual’s achievements ought to have a bearing on whether their team wins or not. These analysts therefore advocate exclusion criteria for eligible players, such as being in the top few teams of a conference at the end of the regular season.
Other analysts argue that the MVP award should be strictly individual. You hear them say that a player needs to pass the “eye test.” No-one quite knows what this is, but the point is you know it when you see it: they need to look like the MVP on the court. Others argue that the storyline matters, and that the player needs to have an interesting story of change. Still others seek the primacy of statistics: the MVP is the player with the best averages. If I had a vote, I’d most definitely go with the statistics.
(Un)natural experiments — the “bubble”
During the pandemic, we actually got to see what happens with and without a crowd in various sports. But, for the NBA playoffs in 2020 we saw an interesting experiment. 22 out of the 30 teams created a “bubble” in Orlando, Florida to follow regulations during the pandemic. The bubble is a pretty good analogy for field experiments and their ideal conditions — isolated, standardised, stable. It was deemed a grand experiment of epidemiology. As a result of this tightly controlled environment, shooting percentages went up (it’s believed), because there were fewer distractions from the crowd.
Recently, the Minnesota Timberwolves’ Karl-Anthony Towns (this year’s 3-point contest champion) made fun of the terrible shooting of the Los Angeles Lakers’ Russell Westbrook (or “Westbrick”; a “brick” is a badly missed shot). When Westbrook air-balled a shot, Towns implied that it must have caught the wind.
Of course, Westbrook’s shot didn’t catch the wind. A basketball arena is a controlled environment. We extrapolate from medical experiments in exactly the same (often inappropriate) way when we bring them outside the lab. The hubris of experiments is not only that we can standardise an intervention, but that we can effectively control environments for our interventions. The above demonstrates that even in the most assiduously controlled environments, context matters, and thus we really need to stop explaining it away.
Whatever the recent criticism, by many measures, Westbrook is a good basketball player. He’s been described as a “walking triple-double” (he gets lots of points, assists, and rebounds). He even won the MVP in 2017 because of his remarkable stats. But, he’s also been criticised as a “stat stuffer” because his pursuit of individual stats is not always beneficial for his team. Other statistics count against him (e.g., turnovers, shooting efficiency). This year, the one-time MVP is, statistically, the worst shooter in the NBA. So, this tells us that even where stats are king, we have to consider which outcomes are most valuable to us when we make an evaluative judgement. Far too often I’ve seen Randomised Controlled Trials (RCTs) or other similar experiments which spin one outcome variable and bury several others. What is portrayed as success might just as often be described as a failure if we look at different outcomes from the same intervention, as Jean Drèze’s recent critique of embedded experiments showed.
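A small sketch shows how much the verdict depends on the outcome you choose to report. The three players and all their numbers are fictional, invented purely for illustration (they are not Westbrook’s or anyone else’s stats), and best_by is a hypothetical helper.

```python
# Fictional season averages for three made-up players (not real data).
players = {
    "Player A": {"points": 24, "assists": 10, "rebounds": 10,
                 "turnovers": 5, "shooting_pct": 0.41},
    "Player B": {"points": 20, "assists": 4, "rebounds": 6,
                 "turnovers": 2, "shooting_pct": 0.52},
    "Player C": {"points": 15, "assists": 7, "rebounds": 12,
                 "turnovers": 3, "shooting_pct": 0.47},
}


def best_by(stat, higher_is_better=True):
    """Return the player who looks best on a single chosen outcome."""
    pick = max if higher_is_better else min
    return pick(players, key=lambda name: players[name][stat])


# The "winner" changes depending on which outcome is put in the headline.
for stat in ("points", "assists", "rebounds", "shooting_pct"):
    print(stat, "->", best_by(stat))
print("turnovers ->", best_by("turnovers", higher_is_better=False))
```

Report only points and assists and Player A looks like the clear success; report turnovers and shooting percentage and the same data favours Player B. The same holds for an intervention with several outcome variables.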
Whether conscious or not, sport analogies are important folk reference points that reinforce statistical and experimental worldviews and their, largely unquestioned, validity. But, how far should such analogies travel? I’ve argued that, outside of a lab, basketball is the ideal arena to show where statistics might tell “the truth” (or get as close to it as we reasonably can).
The problem, however, is that most interventions that we evaluate and the contexts in which they are situated are not nearly so simple as the basketball arena. I’ve argued previously, in a critique of a recent paper inappropriately advocating experimental approaches for evaluating complexity, that experimental thinking (and, to some degree, statistical thinking more broadly) fits when the problems being addressed and the interventions are relatively simple, when control is both possible and desirable (i.e., closed systems), when change processes are expected to be linear, and where contextual and intervention features are stable.
I think we still don’t appreciate just how rare such circumstances are, which is why they have to be created artificially. This artificial push to close, control, simplify, stabilise, and standardise means that method leads intervention design, when it really must be the other way around.
Many interventions are ineluctably situated in open systems, have multitudes of agents, and manifold interconnections. Relatively little that matters can be controlled. Much is irregular and unstable. Much can’t (or shouldn’t) be standardised. Change often emerges from interactions between interventions and their environments, and that change is often non-linear. It would be nice if we could simply and efficiently compute another MVP: the Most Valuable Project. But, it’s worth asking how much your intervention and its context are like a basketball game. The answer to the basketball analogy should help you to understand when statistics and experiments are a good fit, and when they’re a very poor fit indeed.