Last year, Noah Smith proposed his Two Paper Rule:

If you want me to read the vast literature, cite me two papers that are exemplars and paragons of that literature. Foundational papers, key recent innovations – whatever you like (but no review papers or summaries). Just two. I will read them.

If these two papers are full of mistakes and bad reasoning, I will feel free to skip the rest of the vast literature. Because if that’s the best you can do, I’ve seen enough.

If these two papers contain little or no original work, and merely link to other papers, I will also feel free to skip the rest of the vast literature. Because you could have just referred me to the papers cited, instead of making me go through an extra layer, I will assume your vast literature is likely to be a mud moat.

And if you can’t cite two papers that serve as paragons or exemplars of the vast literature, it means that the knowledge contained in that vast literature must be very diffuse and sparse. Which means it has a high likelihood of being a mud moat.

I never took up Noah’s challenge.  Why not?  To be totally honest, because I don’t know of any empirical papers that meet Noah’s standards.  Yes, there are some literature reviews that I consider excellent, like Clemens’ “trillion-dollar bills on the sidewalk” article or Barnett and Ceci’s piece on transfer of learning.  But I’d be loath to point to any specific piece of research and call it a “paragon” or “exemplar.”  Every article I’ve carefully examined has issues, no matter how firmly I agree with its conclusions.  The highest compliments I’m comfortable paying a paper are “careful” and “cool,” never “compelling” or “clearly right.”

My slogan: No Paper Is That Good.

What’s wrong with every specific empirical paper?

First and foremost, external validity is always debatable.  If you use data from 1950 to 2010, you can reasonably wonder, “But are the results relevant now?”  If you use data from the 50 U.S. states, you can reasonably wonder, “But are the results relevant for Canada, or Germany, or China?”  If you set up a pristine experiment, the problem just gets worse; the experiment might not even be relevant in the real world the day it was performed.

Second, identification is always debatable.  Identifying a genuine “natural experiment” requires wisdom and patience.  Plenty of smart people lack one or both.  Calling something a “natural experiment” doesn’t make it so.

Third, even smart human beings are prone to big careless mistakes.  A paper that seems impeccable to a casual reader might be based on miscoded data.  Or crucial variable names could have been switched.

Fourth, although researchers like to pretend that they base their conclusions purely on “the evidence,” their priors always matter.  If A seems obvious to you from the start, and paper X confirms A, you must struggle, even if you know better, not to say, “X shows that A is true.”  The problem isn’t confidence in A, which may be completely warranted.  The problem is the pretense that you believe A because X confirms A, even though you would believe A no matter how X came out.

Fifth, most researchers’ priors are heavily influenced by some extremely suspicious factors.  Factors like social acceptability, ideological palatability, and what you believed as an ignorant teenager.

To be clear, I freely admit that some papers are better than others.  My claim is simply that the best existing papers are still underwhelming – and probably always will be.  As Saint Paul preaches, “For all have sinned and fall short of the glory of God.”

Imagine placing papers on a continuum of convincingness from 0 to 1, where 0 means “provides no information at all” and 1 means “decisively answers its question.”  At least for questions that anyone cares about, I say the median paper hovers around .05.  The best papers get up to around .20.  Again, No Paper Is That Good.  If you demur, consider this: In twenty years, will you still hold up the best papers of today as “paragons” or “exemplars” of compelling empirical work?  If not, you already agree with me.  The best papers are relatively good but absolutely mediocre.  And no, you can’t just staple five top-scoring papers together to hit 1.0.
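Why not?  Here’s a toy calculation, built on my own illustrative assumption rather than anything in Noah’s rule: if you read each paper’s score as the probability that it independently settles the question, then stapling five independent .20 papers together gets you to 1 − (1 − .20)^5 ≈ .67, well short of 1.0.  And since the papers in any given literature share data, methods, and fashionable priors, their errors are correlated, so the real combined score is lower still.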

Does all of this hold for my papers, too?  Of course.  The most I can claim is that I am hyper-aware of my own epistemic frailty, and have a litany of self-imposed safeguards.  But I totally understand why my critics would look at my best papers and say, “Meh, doesn’t really prove anything.”

Given my grim view of research, how can I remain a professional researcher?  By the power of Stoicism.  I do my best to reach truth despite the fact that No Paper Is That Good.  I read voraciously in all the disciplines relevant to the questions on my mind – especially those review articles that Noah holds in low esteem.  He’s right that research is “full of mistakes and bad reasoning”; I probably perceive even more mistakes and worse reasoning than he does.  But my goal as a reader is to discover whether a paper has anything of value in it.  When I toss a paper in the trash, it’s mostly because I decide the author doesn’t even aspire to answer an important question.

More fundamentally, though, I try to set aside controversial priors in favor of common sense, stay calm, and view all my identities with suspicion.  And I bet.  I wish there were a better way – an algorithm that assures truth.  But I see no sign that such an algorithm exists.