The Spectacle 07: Uncommon Sense

August 2022

An interview with the world’s most average chess player about why Nicolas Cage isn’t responsible for people drowning in pools. Or an interview with Ian Shepherd – data understander and author of ‘The Average is Always Wrong’ – about navigating statistics. Or somewhere in between.

Q: Ian, thanks for taking the time to chat. To start, tell me a little bit about yourself?

A: I think the most important bit of background for this conversation is that I’m not a statistician. I’ve worked for a long time in different consumer businesses. The nature of those businesses is that you deal with lots of data. If you have millions of customers, you have millions of bits of data. Decision making is quite data oriented as a result. In recent years when I’ve had both more time and the inclination, I’ve written a book about how businesses can best use the data around them. There are a lot of people in the business community who in reality aren’t that comfortable with data and statistics, and particularly now data has been put on steroids with machine learning and artificial intelligence. My aim is to demystify it a little bit through my book, The Average is Always Wrong.

Q: Why did you decide to write the book? Was there a single event or observation that triggered it, or had it been percolating for a while?

A: It was an accumulation really, of observing decision making that did not make sense of the numbers, and of terminology like machine learning being used by people who don’t truly understand it. People were coming to me saying ‘we’ve done an AI’. I started wondering whether they even really knew what that meant as they were saying it.

Q: What do you think underpins that thinking - that data is an end in itself, rather than a means to an end?

A: I think it’s because if you’ve got a big company, and therefore a big data set, then the computational analysis you can do is pretty amazing. But a lot of what ends up happening is that data sits in a bit of a black box. People come up with these incredible insights and actions, but they often forget to tell you the limitations of what they’ve done with the data and its inherent vulnerabilities. I use an example in my book - of a company using data to review all their products and the data highlighting that one particular product was disproportionately bought by pensioners. That’s the thing that stuck in their minds, and was used for targeting. The “pensioner product”, so to speak. But if they really thought about it, they might be wise to ask questions about those numbers. If 5% of the overall customers were pensioners and 6% of the people who bought that product were pensioners, then it’s true to say that it was disproportionately bought by pensioners. But it’s also the case that 94% of that product’s buyers were not pensioners. 94%. Looking at the numbers carefully is important.
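The arithmetic behind the “pensioner product” can be sketched in a few lines of Python. This is only an illustration: the 5% and 6% figures come from the example above, while the function and variable names are invented here.

```python
# Shares taken from the interview example; names are illustrative assumptions.
pensioner_share_all = 0.05      # pensioners as a share of all customers
pensioner_share_product = 0.06  # pensioners as a share of this product's buyers


def over_index(share_in_group, share_overall):
    """How many times more common a segment is among buyers than overall."""
    return share_in_group / share_overall


index = over_index(pensioner_share_product, pensioner_share_all)
non_pensioner_share = 1 - pensioner_share_product

print(f"Pensioners over-index by a factor of {index:.2f}")
print(f"Share of buyers who are not pensioners: {non_pensioner_share:.0%}")
```

Both statements are true at once: the product over-indexes with pensioners by roughly a fifth, and the overwhelming majority of its buyers are still not pensioners.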

Q: To what extent do you think people confuse data with certainty?

A: I think one of the reasons that people can struggle with concepts using data is that they’re not always massively intuitive. Data can be incredibly complex and people end up getting lost in it. Too often we throw around summary statistics as if they are the whole truth - average household income, the COVID mortality rate. In reality, those headline numbers tell only a tiny fraction of the story. Once you learn to dig a bit deeper into the messy reality of the data, much more interesting (and much more useful) conclusions emerge.

Q: So what are the top tips for navigating in a world full of data?

A: I think one of them is to always have in the back of your mind the question, ‘how do you know?’ Because we are presented with these numbers all the time. In politics, media, business. It’s really important to consider how people have actually got to that number. The source. Often you’ll end up seeing a pattern that is quite different, or at least much more complex than the headline.

Q: If you could share one concept from your book, that you want people to take away with them if they haven’t read it - which would it be?

A: Well obviously I’m shocked that there is anyone who hasn’t read the book, but I suppose it’s just about possible. Joking aside, the central theme of the book is that data is more interesting than summary statistics. You can talk endlessly about averages, mean and mode. But you should also just look at the data in its rawest form. Because our most powerful machine learning tool is our brain, and if you look properly you’ll see deviations and insights that build a greater sense of reality.

Q: People are often confused by ‘mean’, ‘mode’ and ‘average’ – do you have a go-to metaphor you use to explain these when at parties?

A: I try to be more interesting at parties than that. In any case, the really interesting stuff is in the detail of the data. The variations to the average, the pockets of people behaving differently.

Q: We’ve never known a time when people were so glued to headline data, with daily COVID statistics and briefings over the last couple of years. Do you think it has made people more interested in, or more wary of, the role of data?

A: Well I think it’s probably both. The last few years have seen a melting pot of data, graphed in different ways, which has caused lots of debate. Really the culmination of it was the world’s discovery of Bayes’ Theorem. The idea is that if 100 people take a 99% accurate test to identify COVID, and 1 of them actually has the disease, that person will very likely test positive. But given the test is only 99% accurate, it is also likely that about 1 other person will test positive even though they don’t have the disease. Both people will be convinced they have it, because the test is 99% accurate. When in reality, between the two of them, each has only a 50% chance of having it. Public discourse did not initially take any of that into account, and it was not reflected in any of the statistical results presented to us on how the pandemic was growing. Subsequently Bayes’ Theorem was used by those opposing lockdowns as an attempt to prove we were all overreacting. But that too missed what was happening in the real world. Since to start with we were only really testing people who were symptomatic, the false positives were much lower in reality than in the theory I’ve just presented. The truth is almost everybody was wrong, almost all of the time. All of which is to say that the data has fascinated people, but it has also alienated them.
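For readers who want to check the reasoning, here is a minimal sketch of the Bayes calculation in Python. It rests on assumptions that are not in the interview: “99% accurate” is read as both 99% sensitivity and 99% specificity, 1 in 100 of the people tested is taken to be infected, and the 30% prevalence used for symptomatic people is a made-up figure chosen purely for illustration.

```python
def chance_infected_given_positive(prevalence, sensitivity, specificity):
    """Bayes' Theorem: P(infected | positive test)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumption: 1 in 100 of those tested is infected, test is 99%/99%.
general_testing = chance_infected_given_positive(0.01, 0.99, 0.99)

# Assumption: hypothetical 30% prevalence among symptomatic people tested.
symptomatic_testing = chance_infected_given_positive(0.30, 0.99, 0.99)

print(f"Tested at random:      {general_testing:.0%}")
print(f"Tested when symptomatic: {symptomatic_testing:.0%}")
```

Under these assumptions, a positive result from random testing is a coin flip, while the same result among symptomatic people is far more likely to be genuine. That is the gap between the theory and what was happening in the real world.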

Q: What’s your favourite example of a statistic that has disproven a widely held belief/assumption?

A: Actually my favourite examples are always the other way around. There is a great website (Tyler Vigen’s Spurious Correlations) which produces tantalising correlation graphs between things which are obviously not related. Causation versus correlation. It turns out the number of people drowning in the US by falling into a pool is surprisingly well correlated with the number of films Nicolas Cage has appeared in that year. It’s unnerving. By illustrating so clearly that correlation does not always equal causation, the graphs are a valuable lesson in showing things are not always what they seem.
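A small Python sketch shows how easily two unrelated series can correlate. The numbers below are invented for illustration and are not taken from Tyler Vigen’s site; the only thing the two series share is that both happen to drift upwards.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Made-up yearly figures: two unrelated quantities that both trend upwards.
films_per_year = [2, 3, 3, 4, 5, 6]
accidents_per_year = [95, 98, 102, 101, 109, 112]

r = pearson(films_per_year, accidents_per_year)
print(f"Correlation: {r:.2f}")
```

The correlation comes out very high, yet nothing in the calculation says anything about one series causing the other.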

Q: Why is the average always wrong?

A: It’s obviously a provocative thought, but I think my argument is that if the average is the correct answer to the question you’ve asked, then you’ve asked the wrong question. An interesting illustration for that is the average income for an author, which is a really richly researched topic. It’s a spectacular example of the average being completely wrong. People have done average income reports for this, which have been widely shared. But in reality these are so massively skewed by JK Rowling. Almost as a single individual, but certainly helped along by the ten or fifteen best paid authors. The average is essentially meaningless unless you take those people out. I also had a similar experience chasing down a statistic I read about how the average person goes to the pub 24 times a year.
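The effect of one enormous earner on an average is easy to reproduce. The incomes below are entirely hypothetical, not real author data; the sketch simply compares the mean and the median with and without a single outlier.

```python
from statistics import mean, median

# Hypothetical annual incomes for ten authors; the last one is the outlier.
incomes = [8_000, 10_000, 12_000, 15_000, 20_000,
           25_000, 30_000, 40_000, 50_000, 100_000_000]

mean_all = mean(incomes)
median_all = median(incomes)

# Remove the single highest earner and recompute the mean.
without_top = sorted(incomes)[:-1]
mean_without_top = mean(without_top)

print(f"Mean with outlier:    {mean_all:,.0f}")
print(f"Median with outlier:  {median_all:,.0f}")
print(f"Mean without outlier: {mean_without_top:,.0f}")
```

In this toy data the mean is in the millions, a figure none of the other nine authors comes anywhere near, while the median and the mean without the top earner both sit in the low tens of thousands.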

Q: Does it make more economic sense for businesses to only cater to ‘average’ – tastes, sizes etc. – or to target those outside the standard deviation?

A: Too often, catering to the average is driven by cost - it’s cheaper for me to make small, medium and large than it is to cater to the actual size of an individual customer. But there is great value in challenging that, and the businesses that find a cost-effective way to cater to the individual rather than the average generally win. We are all individuals, and value being treated that way.

Q: For those that are interested in finding out more about data and their role in decision making and society, are there any books or resources you can recommend? Apart from your own book of course.

A: My bible on this topic is David Spiegelhalter’s The Art of Statistics. He’s the Cambridge professor of statistics. It’s a wonderful book to get under the skin of what statistics really means. A step beyond what you’ll get with The Average is Always Wrong. Though without my folksy charm, obviously.

Q: Thanks so much for chatting. Final question, in what way are you most average?

A: According to the app I play on, I’m the world’s most average chess player. It pains me greatly.

This interview originally appeared in Issue 07 of The Spectacle.