Ben Haller on My Global Warming Econometrics Bleg
By Bryan Caplan
The more crap you throw into the regression, the more it will be overfit
and won’t give you anything useful back.
Qualitatively, sure. But researchers run multiple regressions with 50-100 observations quite frequently. (See growth regressions, for example.) If you get informative results even with small data sets, that’s strong evidence in your favor. If you can’t, that’s at least weak evidence against.
But it’s worse than that.
All of the random variables you suggest will probably be fairly
correlated with CO2, for the simple reason that CO2 has been steadily
rising, whereas (I would expect) church attendance has been slowly
falling, the Dow Jones has been rising, and televisions per capita has
probably been rising as well. Separating the effects of correlated
variables in a regression is hard; the stronger the correlation, the
bigger the dataset needed to do it.
This is true for all the time series data I’ve worked with. But when people talk about climatological data, they often seem to be saying that their data is so strong that these concerns don’t apply. I want to know if I’m interpreting them correctly.
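To make the collinearity concern concrete, here is a minimal sketch in Python (NumPy assumed). Every number is made up for illustration: `co2` and `dow` are synthetic trending series, not real measurements, and `vif` is a hypothetical helper name. It computes the variance inflation factor, 1 / (1 - R²), which measures how much a coefficient's sampling variance is inflated by that regressor's correlation with the others.

```python
import numpy as np

def vif(target, others):
    """Variance inflation factor of `target` given the other regressors.

    Regresses `target` on a constant plus `others` and returns 1 / (1 - R^2).
    """
    X = np.column_stack([np.ones(len(target))] + list(others))
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
n = 60                               # a small sample, as in the discussion
t = np.arange(n, dtype=float)

# Two SYNTHETIC series that both trend upward (illustrative values only).
co2 = 300 + 1.5 * t + rng.normal(0, 2, n)
dow = 1000 + 50 * t + rng.normal(0, 100, n)
vif_trending = vif(co2, [dow])

# Two unrelated, non-trending series for comparison.
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
vif_independent = vif(x1, [x2])

print("VIF, trending pair:   ", vif_trending)
print("VIF, independent pair:", vif_independent)
```

Under these assumptions the trending pair should show a far larger VIF than the independent pair, meaning the same number of observations pins down the coefficient much less precisely.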
CO2 probably wouldn’t dominate in
such a regression, no. So what? We know from first principles that it
shouldn’t, even if it is causal.
If CO2 is the only causally significant factor that’s changing during the observed period, I say you should expect CO2 to dominate the regression… unless you’re trying out tons of trending variables and only reporting the ones that “work.”
The point is that we have a
mechanistic explanation for why CO2 would cause rising global
temperatures. To conclude that it does not, in fact, have that effect,
despite all of our understanding from basic physics why it should have
that effect, we’d need overwhelmingly strong evidence.
If this is so, what’s the point of all the rhetoric about how “overwhelming” the data is? Why do so many people claim that, “Now the facts can no longer be denied” if the real evidence is basic physics that hasn’t changed in decades or centuries?
Whereas we have
no mechanistic explanation for why church attendance, or the Dow Jones,
or televisions per capita, ought to affect global temperatures – and
thus no good reason to include them in a regression in the first place.
Running the regressions you suggest would be an abuse of statistical
methodology that any competent first-year stats student would call you
out for. So… why exactly do you want to do it?
The baseline regression I suggested – temperature on CO2 and a linear time trend – is one that any competent first-year stats student should take seriously. It’s a standard way to see if your story that “X is making Y go up” is superior to “Y just seems to be going up.” Why add other trending variables? To see if the data are more consistent with your story than random made-up stories. Inquiring minds want to know.
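A rough sketch of what that baseline and one "made-up story" comparison might look like in Python (NumPy assumed). The data are entirely synthetic: the coefficients, noise levels, and the `placebo` series are assumptions chosen for illustration, and `ols` is a hypothetical helper, so the output says nothing about actual climate data. The standard errors are the conventional OLS ones, which ignore the serial correlation typical of real time series.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and conventional (non-robust) standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # coefficient covariance
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n = 100
t = np.arange(n, dtype=float)

# SYNTHETIC data, not measurements. CO2 is given a curved path so that it
# is not perfectly collinear with the linear time trend.
co2 = 310 + 0.8 * t + 0.012 * t**2 + rng.normal(0, 0.5, n)
temp = 0.01 * (co2 - 310) + rng.normal(0, 0.05, n)
placebo = 5 + 0.3 * t + rng.normal(0, 1, n)      # made-up trending series

# Baseline: temperature on a constant, CO2, and a linear time trend.
X_base = np.column_stack([np.ones(n), co2, t])
beta_base, se_base = ols(temp, X_base)

# Same regression with the made-up trending variable added.
X_plac = np.column_stack([X_base, placebo])
beta_plac, se_plac = ols(temp, X_plac)

print("baseline CO2 coef, t-stat:", beta_base[1], beta_base[1] / se_base[1])
print("+placebo CO2 coef, t-stat:", beta_plac[1], beta_plac[1] / se_plac[1])
print("placebo coef, t-stat:     ", beta_plac[3], beta_plac[3] / se_plac[3])
```

In this simulated world CO2 really is causal by construction, so the interesting question for real data is whether the CO2 coefficient survives the time trend and the placebo in the same way.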
At root, my bleg is about truth-in-advertising. If the case for AGW comes from basic physics, even though applied statistics alone counsels agnosticism, I wish experts would tell me.