# Multicollinearity and Micronumerosity

R.J. Rummel’s critique of a Cato study set off a big debate among bloggers about the value of think tanks. The following passage in Rummel’s critique got my attention:

> This correlation is meaningful for the kind of regression analysis Gartzke did, but he apparently doesn’t know it. A problem in regression analysis is multicollinearity, which is to say moderate or high correlations among the independent variables. If two independent variables are highly correlated they are no longer statistically independent, and the first one entered into the regression, in this case economic freedom, steals that part of the correlation it has with democracy from the dependent variable. Thus, economic freedom is highly significant, while democracy is not. If Gartzke had done two bivariate regressions on his MID data, one with economic freedom and other with democracy as the independent variables, he surely would have found democracy highly significant.

This reminds me of one of the best few pages I’ve ever read in a textbook. The book: Arthur Goldberger’s *A Course in Econometrics*. The subject: Multicollinearity and micronumerosity.

Goldberger’s main point: People who use statistics often talk as if multicollinearity (high correlations between independent variables) biases results. But it doesn’t. Multicollinearity leads to big standard errors, but if your independent variables are highly correlated, **they SHOULD be big**! Intuitively, big standard errors mean that the effects of different variables are highly uncertain, and if your independent variables are highly correlated, highly uncertain is what you should be.
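Goldberger’s point is easy to see in a simulation. The sketch below (a minimal illustration, not from Goldberger; variable names and the correlation level of 0.99 are my own choices) fits the same two-regressor model twice by ordinary least squares, once with uncorrelated regressors and once with highly correlated ones, and computes the textbook coefficient standard errors. The correlated design produces much bigger standard errors, yet the estimates themselves are unbiased in both cases:

```python
import numpy as np

def ols_se(X, y):
    """OLS estimates and classical standard errors: sqrt(diag(s^2 (X'X)^-1))."""
    n, k = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 200
x1 = rng.standard_normal(n)

# Case 1: second regressor independent of the first.
x2_indep = rng.standard_normal(n)
# Case 2: second regressor correlated with the first (r ~ 0.99).
x2_corr = 0.99 * x1 + np.sqrt(1 - 0.99**2) * rng.standard_normal(n)

eps = rng.standard_normal(n)                  # same true model both times:
y_indep = 1.0 * x1 + 1.0 * x2_indep + eps     # y = x1 + x2 + noise
y_corr = 1.0 * x1 + 1.0 * x2_corr + eps

_, se_indep = ols_se(np.column_stack([x1, x2_indep]), y_indep)
_, se_corr = ols_se(np.column_stack([x1, x2_corr]), y_corr)

print("SE with uncorrelated regressors:", se_indep)
print("SE with correlated regressors:  ", se_corr)
```

With a correlation of 0.99 the standard errors blow up by roughly a factor of 1/√(1 − 0.99²) ≈ 7, which is exactly the honest answer: the data can barely tell the two variables apart.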

Goldberger brilliantly drives his point home by introducing the concept of micronumerosity. What’s that? A fancy name for “not having a lot of data.” If you don’t have a lot of data, then again your standard errors tend to be large. *As well they should be!* If you have three data points, you should be uncertain of your results.
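Micronumerosity works the same way in the standard formulas: the slope’s standard error shrinks roughly like 1/√n. A quick sketch (my own toy setup, not Goldberger’s) fits the same bivariate regression with a tiny sample and a large one:

```python
import numpy as np

def slope_se(n, rng):
    """Classical standard error of the slope in y = a + b*x + noise, with n observations."""
    x = rng.standard_normal(n)
    y = 2.0 * x + rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x])      # intercept plus one regressor
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
se_small = slope_se(10, rng)                  # micronumerosity: n = 10
se_large = slope_se(1000, rng)                # plenty of data:  n = 1000
print("SE with n = 10:  ", se_small)
print("SE with n = 1000:", se_large)
```

The small-sample standard error comes out many times larger, as it should: with ten data points, uncertainty is the right answer.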

Conversely, of course, if your independent variables are highly correlated, or your number of observations is small, and you **still** get strong statistical results, you have good reason to believe your conclusion is true. Standard statistical methods have already adjusted for these problems; if you get meaningful answers anyway, you’ve got nothing to apologize for.