At a recent happy hour, one of my co-workers ordered a glass of wine “because wine makes you skinny.” Curious, I asked her to explain. “I read that people who drink wine have smaller waists than people who drink beer or who don’t drink at all,” she said.

I don’t doubt for a second that at least one study has found a correlation between wine consumption and a lower body mass index (BMI). There have also been studies that correlate aluminum with Alzheimer’s disease, vitamin consumption with better eyesight, and cat ownership with improved health. But does that mean that wine causes weight loss — or that soda cans cause dementia?

Not necessarily.

Our human brains are wired to seek and see patterns, but sometimes our brains get carried away. And when the runaway brain belongs to a (probably) well-meaning journalist — or even *gasp* an academic — such extrapolations can be misleading. Or hilarious, as Dr. Fiona McQuarrie points out in her excellent post (see below).

So, as you start your week, remember: Just because the facts are accurate doesn’t make it so. And also, Nicolas Cage must be stopped.

## Spurious Correlations, or, Why Nicolas Cage Must Be Stopped

There is a misguided assumption in a lot of media reporting on research that correlation equals causation. Correlation is a statistical relationship between two variables – for example, amounts of social service funding and crime rates – in which the two tend to change together. It is tempting to assume that one variable therefore has some degree of dependence on the other. In other words, if one variable changes, there should be a change in the other variable if the two are correlated.

There is a problem with this assumption, however – or at least it’s a problem for reporters who can’t be bothered to learn basic statistical concepts. A variable that is statistically correlated with another variable may change *not* because of a change in the other variable, but because of factors that have absolutely nothing to do with that other variable. If you think of the large number of variables that could be related to amounts of social service funding (e.g. what activities the funding is being spent on, how or where the funding allocations are made) and to crime rates (e.g. what kinds of crimes, how much criminal activity is actually reported), you can see how a correlation cannot definitively prove that changes in funding for social services will result in changes in crime rates. And that is why statistics instructors always tell their students: *correlation does not imply causation.*

I’ve just come across a website, Spurious Correlations, that demonstrates this principle with some great examples. It seems that Nicolas Cage should be banned from making any more movies, because the more he appears in films, the more people drown in swimming pools.

(The correlation number at the bottom of the table indicates the strength of the relationship between the two variables. A positive number means that an increase in one variable relates to an increase in the other variable; a negative number means that an increase in one variable relates to a decrease in the other variable. The closer the correlation number is to +1 or -1, the stronger the relationship between the variables.)
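For readers who want to see where such a number comes from, here is a minimal sketch of the usual Pearson correlation coefficient in plain Python. The function name `pearson_r` and the sample lists are made up for illustration; they are not taken from the Spurious Correlations site, and the site may well compute its figures differently.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two series vary together, and how much each varies alone.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a series never changes")
    return co_variation / sqrt(variation_x * variation_y)


# Invented numbers, purely for illustration.
rising = [1, 2, 3, 4, 5]
also_rising = [2, 4, 6, 8, 10]   # goes up whenever `rising` goes up
falling = [10, 8, 6, 4, 2]       # goes down whenever `rising` goes up

r_up = pearson_r(rising, also_rising)   # perfect positive relationship
r_down = pearson_r(rising, falling)     # perfect negative relationship
print(r_up, r_down)
```

Nothing in the calculation knows or cares what the numbers represent, which is exactly why a high value on its own says nothing about cause.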

It also appears that increased mozzarella consumption leads to more doctorates in civil engineering in the United States. Maybe hungry American PhD students eat more cheese?

And the website also allows you to generate your own spurious correlations. I know that it rains a lot in Washington, the US state closest to me. But I didn’t know that a decrease in precipitation in Washington leads to fewer lawyers in the Northern Mariana Islands.

A really great feature of this website is that all its spurious correlations are statistically significant. That is, based on the number of data points that were used in the calculation, the correlations are unlikely to have occurred by chance. The fact that these correlations are meaningful by statistical standards – but utterly meaningless in terms of any real effect of the variables on each other – emphasizes even more strongly *why* it’s important to understand statistical concepts.

And it’s especially important to be able to think critically and analytically about statistics if you’re writing about research based on statistical analyses. If you don’t, you may end up misreporting the research and misleading your readers – which is a problem not only for you and for them, but also for society at large, because it misleads all of us about the real reasons why things work as they do.
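One way to see how "statistically significant" yet meaningless correlations can turn up is to compare a lot of unrelated series with each other. The sketch below is an illustration under stated assumptions, not a description of how the Spurious Correlations site works: it generates purely random series, checks every possible pair, and counts how many clear a conventional significance cutoff. The series count, the length of ten points, the seed, and the cutoff of 0.632 (roughly the two-tailed 5% threshold for ten data points) are all choices made for this example.

```python
import random
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)


random.seed(0)       # fixed seed so the run is repeatable
N_SERIES = 100       # assumed number of unrelated "variables"
N_POINTS = 10        # assumed number of observations per variable
CRITICAL_R = 0.632   # approximate 5% two-tailed cutoff for 10 points

# Every series is independent random noise: none can influence another.
series = [[random.gauss(0, 1) for _ in range(N_POINTS)]
          for _ in range(N_SERIES)]

# Compare every series with every other series.
pairs = list(combinations(range(N_SERIES), 2))
hits = [(i, j) for i, j in pairs
        if abs(pearson_r(series[i], series[j])) > CRITICAL_R]

print(f"{len(hits)} of {len(pairs)} unrelated pairs look 'significant'")
```

A 5% cutoff means roughly one comparison in twenty passes by luck alone, so testing thousands of pairs will reliably surface some that pass, even though every series here is noise.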

The word “correlation” has two separate states: relational and coincidental. The latter applies where two separate sets of statistics are compared in an arbitrary fashion (usually to test for possible causality), like the number of people who prefer red M&Ms and the number of people who prefer to buy red cars. There is (probably) no appreciable causal relationship between the two behaviours but, as scientists, statisticians feel compelled to ask the question. If the answer implies no causal relationship, the correlation is described as coincidental, and since any two sets of numbers can be compared, it is to be expected that a (statistically) high percentage of comparisons will result in clear proof (or a strong indication) that there is no relationship between the sets. But the word “correlation” implies a relationship of some sort, and whilst the causality might be tenuous and unreliable, in a correlative relationship it cannot be denied or dismissed. Just ask John Donne, and the millions of butterflies who have caused earthquakes in Tokyo.

Wow, Xpat! Did you write that off the top of your head? Because I must say yours is one of the more impressive comments I’ve ever received. Thank you very much for further educating me (and probably a few of my readers) on this topic. But for the record: I still hate lousy headlines based on extrapolating a couple of data points to their most ridiculous conclusion.

Thank you for reading, and especially for taking the time to write.

Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: “There are three kinds of lies: lies, damned lies and statistics.”

– Mark Twain’s Own Autobiography: The Chapters from the North American Review

Oh, and BTW, there is no record of Disraeli ever having said or written this – so who CAN one believe?