An Econometrics Lesson
By Arnold Kling
I received an email from a reader who was very excited to find that over the past 70 years the correlation between excess health care inflation (the price of health care relative to the overall CPI) and the proportion of health care spending paid for by third parties was 0.92 (out of a maximum of 1.00).
I wrote back saying that correlation does not imply causation. He replied that he understood that, but still, with a correlation that high there must be something.
I’m sorry, but the inability to infer causation from correlation has nothing to do with the size of the correlation coefficient. It reflects the process generating the data. In a controlled experiment, you often can say something about causation. When you just observe some data, you cannot.
In addition, time series data (data that cover long time periods) are very subject to spurious correlation. Over time, data tend to follow trends. Any two trending series will show a strong correlation, positive or negative, whether there is a causal relationship or not.
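A small simulation illustrates the point. The sketch below, in Python with only the standard library, builds two series that share nothing except an upward trend and compares their correlation in levels with the correlation of their period-to-period changes. The function names (pearson, trending_series, first_differences), the slopes, the noise levels, and the seed are all assumptions chosen for illustration; none of it is the reader's health care data.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def trending_series(n, slope, noise_sd, rng):
    """A linear trend plus independent Gaussian noise (hypothetical data)."""
    return [slope * t + rng.gauss(0, noise_sd) for t in range(n)]


def first_differences(xs):
    """Period-to-period changes, which remove a linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Two causally unrelated series: the only thing they share is that both rise.
a = trending_series(70, slope=1.0, noise_sd=3.0, rng=rng)
b = trending_series(70, slope=0.5, noise_sd=2.0, rng=rng)

r_levels = pearson(a, b)
r_changes = pearson(first_differences(a), first_differences(b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

With these settings the correlation in levels should come out high even though neither series influences the other, while the correlation of the changes should sit much closer to zero, because differencing strips out the shared trend.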
When you look at data over time, it is important to ask yourself how many data points you really have. With a strong trend, you probably should just think of yourself as having two data points: the beginning and the end point. If there are a few sharp swings in the data, then you might have three or four effective data points. The fewer effective data points you have, the harder it is to distinguish among alternative sources of causality.
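One way to put a rough number on "effective data points" is a common rule of thumb for autocorrelated series: scale the sample size n by (1 - r1*r2) / (1 + r1*r2), where r1 and r2 are the lag-1 autocorrelations of the two series. This formula is swapped in here as an illustration, not as the method behind the two-data-points heuristic above, and it is only an approximation. The names lag1_autocorr and effective_n and all the simulated data in this Python sketch are hypothetical.

```python
import random
import statistics


def lag1_autocorr(xs):
    """Sample correlation between a series and itself shifted by one period."""
    m = statistics.fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


def effective_n(xs, ys):
    """Rule-of-thumb effective sample size for correlating xs with ys.

    Assumption: n * (1 - r1*r2) / (1 + r1*r2), capped at the actual n.
    """
    n = len(xs)
    p = lag1_autocorr(xs) * lag1_autocorr(ys)
    return min(n, n * (1 - p) / (1 + p))


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 70

# Strongly trending series (hypothetical).
trend_a = [1.0 * t + rng.gauss(0, 3.0) for t in range(n)]
trend_b = [0.5 * t + rng.gauss(0, 2.0) for t in range(n)]

# Series with no trend and no memory, for comparison.
noise_a = [rng.gauss(0, 1.0) for _ in range(n)]
noise_b = [rng.gauss(0, 1.0) for _ in range(n)]

n_eff_trend = effective_n(trend_a, trend_b)
n_eff_noise = effective_n(noise_a, noise_b)

print(f"effective n, trending series:  {n_eff_trend:.1f} of {n}")
print(f"effective n, trendless series: {n_eff_noise:.1f} of {n}")
```

For the trending pair, the 70 observations should shrink to a handful of effective data points, while the trendless pair should keep most of its nominal sample size.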
That is why most macro-econometrics is junk science. That is one reason I would tend to suspect that Larry Bartels’ work on Presidential party and income inequality is junk science.
It is possible to do useful empirical work. Amy Finkelstein used a natural-experiment approach to look at the effect of Medicare on health care spending. I suspect that there are actually a lot of pitfalls in her approach, but what she did is far more reliable than plotting two time series, calculating a correlation coefficient, and proclaiming that you have proven that third-party payments are a major cause of health care inflation. I certainly believe that this could be true, but my opinion is not swayed at all by a crude time-series regression.