By Bryan Caplan
A fundamental fallacy in classical statistics is to say, “Therefore, we accept the null hypothesis.” The classical statistical tests are designed to make it difficult to reject the null hypothesis (innocent until proven guilty)…
So, if you set up a classical test with the null hypothesis as “health care has no effect,” you are giving yourself a low probability of rejecting that hypothesis. Getting a statistically insignificant result is no more than that: an insignificant result. All claims that you have shown zero effect of the independent variable are fallacious, because you gave yourself a high probability of showing zero effect to begin with.
Arnold’s quite right if you only have a small data set. But the opposite is true for large data sets. In fact, one of McCloskey’s complaints about tests of, e.g., purchasing power parity is that the number of observations is so large that you almost always reject the null hypothesis even when it’s basically true. William Poley succinctly summarizes McCloskey’s point:
[W]ith the large number of data points typically used in these studies, standard errors can be relatively small. Thus, the estimated β… might be close to 1, say 0.99, but have a minuscule standard error, say 0.0001, which would lead the researcher to conclude that PPP fails. Yet, 0.99 might be close enough to 1 for the scientific purposes at hand, such as determining the efficacy of monetary policy or determining whether profitable arbitrage exists.
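Poley’s numbers are easy to check with a few lines of arithmetic. The sketch below is only an illustration, not anything taken from the PPP literature: it assumes a normal approximation for the test statistic, the helper name `z_test` is made up, and the second standard error (0.01) is a hypothetical stand-in for what a much smaller data set might give you.

```python
import math

def z_test(estimate, null_value, std_error):
    """Return the z-statistic and two-sided p-value under a normal approximation."""
    z = (estimate - null_value) / std_error
    # Two-sided tail probability of a standard normal, via the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Poley's example: estimated beta of 0.99, standard error 0.0001, null hypothesis beta = 1.
z_big, p_big = z_test(0.99, 1.0, 0.0001)

# Hypothetical contrast (assumption, not from the source): same estimate, standard error 0.01.
z_small, p_small = z_test(0.99, 1.0, 0.01)

print(f"large data set: z = {z_big:.1f}, p = {p_big:.3g}")
print(f"small data set: z = {z_small:.1f}, p = {p_small:.3g}")
```

With the tiny standard error, the estimate sits about 100 standard errors away from 1, so the null is rejected overwhelmingly even though the gap is only 0.01. With the hypothetical larger standard error, the very same gap is just one standard error away and nowhere near conventional significance.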
The literatures on health and the family that I discussed vary in quality (what doesn’t?), but at least a fair number of papers have a lot of data points. So rejecting the null hypothesis of no effect is pretty easy. Moreover, it is fairly common in both literatures to discuss the magnitude of the effect, not just statistical significance.
One of my pet peeves is lazy econometrics, where someone tries to estimate an aggregate relationship for a disaggregated process. For example, you can show a significant relationship between many specific medical procedures and longevity for people with the relevant conditions. Yet if you take an aggregate proxy for medical care and an aggregate measure of longevity, there is no relationship. I think of the latter as lazy econometrics rather than a description of the real world.
There is something to Arnold’s point, but if you had to either do econometrics the lazy way or Arnold’s way, I’d recommend you do it the lazy way. Suppose, for example, that you could answer one of the following questions:
(a) Which is more dangerous per mile travelled – planes or cars?
(b) Which is more dangerous if it runs out of fuel while in motion – planes or cars?
I’d say that (a) is a much more valuable question to answer than (b). Yes, if you had unlimited time and data, disaggregating every possible situation would imply the answer to (a) and a lot more. But if your time and data are scarce, you are far better off just knowing the fact that planes are safer than cars.
I agree with Arnold that we lose some valuable information when we look only at the aggregate effect of medicine or parenting. (Indeed, that was my whole point!) But that aggregate information is good to know, and one can imagine the public digesting it. If laymen ignore it, they are highly unlikely to master complex disaggregated statistical analysis instead; they are just going to stick with their priors that medicine and parenting have big positive effects. And they don’t.