Monday, August 12, 2013

Does temperature really cause depression? Correlation vs. Causation in "big data"

Image by Olimpia Zagnoli, New York Times
Back in Mr. Wardle's high school economics class, we were once asked to look for a correlation between any two datasets. I searched through the small school library and stumbled across historical data on marijuana use in Canada.

So, for fun, I compared it to unemployment data and found a strong negative correlation. More marijuana use, less unemployment.

Mr. Wardle liked my assignment. Granted, his sense of humour was famous; on the weekly ten-point quizzes, we got a bonus mark for adding a caption to a Far Side cartoon, and he announced the funniest caption the following class.

When he handed back the assignment, he reminded me of one key rule of research:

Correlation does not imply causation.

Which brings us to an op-ed in yesterday's New York Times. Seth Stephens-Davidowitz, an economist interning at Google, presented evidence from Google searches for possible causes of depression in the United States. After unemployment, what was the best predictor of searches for depression?

I tested dozens of variables in many different categories. The strongest predictor by far: an area’s average temperature in January. Colder places have higher rates of depression, with the correlation concentrated in the colder months. The relationship between weather and mental health has been debated, but those debates have generally relied on “small” data. Google searches, the biggest data source we currently have, are unambiguous: when it comes to our happiness, climate matters a great deal. 

Paging Mr. Wardle, wherever you are.

What else happens in January in cold places?

It is dark. You don't need to be a mental health expert to know about 'seasonal affective disorder', a common condition in places where the winter days are short. Yet Stephens-Davidowitz misses this critically relevant correlate to temperature and goes on to provide temperature-based advice:
The striking correlation between temperature and depression suggests they should consider moving to a more temperate location. Of course, people at risk for depression should hesitate to abandon a job in a cold-winter location for no job in a warm-winter clime, and they should think twice about moving away from family and friends.

The advice may be good, even though the op-ed is probably mistakenly attributing many cases of northern depression to lower temperatures rather than less sunlight. If colder places are also darker in winter, does it matter which variable you use? 
Yes, it matters, because we are talking about a correlation, not a perfect relationship. There are cold, northern cities with glorious sunny winters as well as mild, northern cities with depressing grey winters.

For example, if you suffer from winter depression, should you move from Montreal, with its notoriously frigid winters, to more temperate Vancouver? Probably not, because in Vancouver you're likely to experience weeks on end without seeing the sun. I counted 22 days of non-stop rain a few Novembers back.
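
To make the point concrete, here is a toy simulation in Python. It is a sketch built on invented numbers, not anything from the op-ed or from real data: I assume that sunlight alone drives "depression", that temperature merely tends to follow sunlight, and the coefficients and variable names are all hypothetical. Temperature still shows a clear correlation with depression, and that correlation largely disappears once sunlight is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def residuals(ys, xs):
    """What is left of ys after a straight-line least-squares fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=0):
    """Invented world: sunlight causes depression, temperature does not."""
    rng = random.Random(seed)
    sunlight = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: warmer places tend to be sunnier, but only loosely.
    temperature = [0.7 * s + rng.gauss(0, 0.7) for s in sunlight]
    # Assumption: depression depends on sunlight only, plus noise.
    depression = [-1.0 * s + rng.gauss(0, 1) for s in sunlight]
    return sunlight, temperature, depression


sunlight, temperature, depression = simulate()

# Naive comparison: temperature against depression.
raw = pearson(temperature, depression)

# Partial correlation: the same comparison after removing what sunlight explains.
partial = pearson(
    residuals(temperature, sunlight), residuals(depression, sunlight)
)

print(f"temperature vs. depression: {raw:.2f}")
print(f"same, controlling for sunlight: {partial:.2f}")
```

In this made-up world, the first number should land near -0.5 and the second close to zero, even though temperature never appears in the formula for depression. Real data would be far messier, but that is the trap: the variable you happened to measure can stand in for the one doing the work.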
The availability of internet search data allows researchers to probe questions previously answered only with high-effort, limited-sample-size opinion polls. There can be real value to analyses with Google Trends or other storehouses of search data. The Centers for Disease Control, for example, works with Google because the number of people in an area searching for information on the flu turned out to be the best available indicator of a flu outbreak. On a simpler note, want to know whether people are more likely to use the term "climate change" or the term "global warming"? Try Google Trends, and you'll see the answer is clearly "global warming".

So by all means, examine data with Google Trends. Just remember Mr. Wardle's lesson: correlation does not imply causation. Even Stephens-Davidowitz seemed to understand this, at least in the case of one variable:

More Hispanic-Americans meant fewer searches (though this might have been a result of language factors).

Might have. You think?


Ed Davies said...

Seasonal _Affective_ Disorder. It's a disorder of affect, not a disorder affected (or effected) by seasons.

Center_s_ for Disease Control.

Yours sincerely,

Somebody probably being a bit peevish and pedantic due to too much sunlight (specifically, due to short nights causing some sleep disruption) here in northern Scotland.

Simon Donner said...

Good catches. I could blame the sloppiness on the short summer nights in Vancouver, but that would be correlation, not causation.