At a recent happy hour, one of my co-workers ordered a glass of wine “because wine makes you skinny.” Curious, I asked her to explain. “I read that people who drink wine have smaller waists than people who drink beer or who don’t drink at all,” she said.
I don’t doubt for a second that at least one study has found a correlation between wine consumption and a lower body mass index (BMI). There have also been studies that correlate aluminum with Alzheimer’s disease, vitamin consumption with better eyesight, and cat ownership with improved health. But does that mean that wine causes weight loss — or that soda cans cause dementia?
Our human brains are wired to seek and see patterns, but sometimes our brains get carried away. And when the runaway brain belongs to a (probably) well-meaning journalist — or even *gasp* an academic — such extrapolations can be misleading. Or hilarious, as Dr. Fiona McQuarrie points out in her excellent post (see below).
So, as you start your week, remember: Just because the facts are accurate doesn't make the conclusion so. And also, Nicolas Cage must be stopped.
Spurious Correlations, or, Why Nicolas Cage Must Be Stopped
There is a misguided assumption in a lot of media reporting on research that correlation equals causation. Correlation is a statistical relationship between two variables (for example, amounts of social service funding and crime rates) that measures how closely the two move together. In other words, if two variables are correlated, a change in one tends to be accompanied by a change in the other.
There is a problem with treating that relationship as cause and effect, however, or at least it's a problem for reporters who can't be bothered to learn basic statistical concepts. A variable that is statistically correlated with another variable may change not because of a change in the other variable, but because of factors that have absolutely nothing to do with that other variable. If you think of the large number of variables that could be related to amounts of social service funding (e.g. what activities the funding is being spent on, how or where the funding allocations are made) and to crime rates (e.g. what kinds of crimes, how much criminal activity is actually reported), you can see why a correlation cannot definitively prove that changes in funding for social services will result in changes in crime rates. And that is why statistics instructors always tell their students: correlation does not imply causation.
I’ve just come across a website, Spurious Correlations, that demonstrates this principle with some great examples. It seems that Nicolas Cage should be banned from making any more movies, because the more he appears in films, the more people drown in swimming pools.
(The correlation number at the bottom of the table indicates the strength of the relationship between the two variables. A positive number means that an increase in one variable relates to an increase in the other variable; a negative number means that an increase in one variable relates to a decrease in the other variable. The closer the correlation number is to +1 or -1, the stronger the relationship between the variables.)
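For readers curious about the arithmetic behind that number: the post doesn't name the statistic, but assuming the site reports Pearson's correlation coefficient r (the most common choice), it is computed from n paired values x_i and y_i, with means x̄ and ȳ, like this:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when the two variables tend to sit above (or below) their means together, and negative when one is high while the other is low. Dividing by the two spreads keeps r between -1 and +1.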
It also appears that increased mozzarella consumption leads to more doctorates in civil engineering in the United States. Maybe hungry American PhD students eat more cheese?
And the website also allows you to generate your own spurious correlations. I know that it rains a lot in Washington, the US state closest to me. But I didn’t know that a decrease in precipitation in Washington leads to fewer lawyers in the Northern Mariana Islands.
A really great feature of this website is that all its spurious correlations are statistically significant. That is, given the number of data points used in each calculation, the correlations are unlikely to have occurred by chance. The fact that these correlations are meaningful by statistical standards, but utterly meaningless in terms of any real effect of the variables on each other, emphasizes even more strongly why it's important to understand statistical concepts.
And it's especially important to be able to think critically and analytically about statistics if you're writing about research based on statistical analyses. If you don't, you may end up misreporting the research and misleading your readers, which is a problem not only for you and for them, but also for society at large, because it misleads all of us about the real reasons things work as they do.