In Chapter 6 of Escaping Paternalism, Rizzo and Whitman argue that paternalistic behavioral economists have recklessly rushed from laboratory experiments to real life.  Even if the experiments were above reproach, their external validity is questionable at best.  Sunstein, Thaler, and the rest have overpromised and underdelivered:

A central claim of behavioral paternalists is that their approach is “evidence-based” (Thaler 2015b, 330–345). They claim to eschew ideology and simply advocate “what works” (Halpern 2015, 266–298). They say their policy recommendations rest on strong evidence provided by both behavioral economics and cognitive psychology. This decades-long research program has supposedly enabled them to discover how actual people behave rather than how hypothetical economic agents behave.

Why is there such a slip ‘twixt cup and lip?

Most importantly, the crafting of behavioral paternalist policies depends not simply on the existence of phenomena such as the endowment effect or present bias, but on the quantitative magnitudes of such phenomena. These magnitudes are indispensable for answering such questions as: How large should sin taxes be? How long should cooling off periods be? How graphic does a risk narrative need to be? How much income should people be defaulted into saving for retirement? How difficult should it be, in terms of time and effort, to opt out of default terms? All of these quantitative questions, raised automatically by any attempt to implement the policies in question, require quantitative inputs to calculate their answers. Research must therefore establish people’s true preferences (whose better satisfaction is the raison d’être of behavioral paternalism), as well as the strength of the biases that impede them. The methods used by behavioral economists have not reliably estimated such quantitative magnitudes.

And if the new paternalists retreat to, “Well, that’s all up to the political process,” what good are they?

Question: Why is behavioral economics so inadequate to the task of crafting specific policies?  RW offer many complementary answers:

1. Because one of their recurring findings is that human behavior is “context-dependent.”  If changing contexts in the lab makes a big difference, imagine what happens when we move from the lab to real life!  Example:

But perhaps the most important contextual question relates to the reference point from which a loss or a gain is defined. The reference point is subjective. As we saw in Chapter 4, the reference point need not be the subject’s current endowment; it could be the subject’s expectation of something in the future.

2. Behavioral economists find strong “hypothetical bias” – yet selectively use hypotheticals to reach desired policy conclusions:

The upshot is that extrapolating the results of stated-choice experiments into the realm of actual behavior is fraught with difficulties, especially since most of the evidence on preference reversal (due to present bias) rests on stated-choice experiments.

3. Psychological findings replicate poorly:

The Open Science Collaboration, a group of more than 125 psychologists, conducted replications of 100 experimental and correlation studies in three major psychology journals for the year 2008. There is no single standard of successful replications, but the results that are most important for our purposes are these: (1) only 36 percent of the replications showed a statistically significant effect in the same direction as the original study, and (2) the “mean effect size of the replication effects . . . was half the magnitude of the mean effect size of the original effects . . . representing a substantial decline” (Open Science Collaboration 2015, 943). The Open Science Collaboration’s project has since been criticized on statistical grounds by Gilbert et al. (2016), who say that the project did not faithfully replicate the conditions of the original studies…

We don’t know how this particular discussion will ultimately be resolved, but it is safe to say that the reproducibility of much psychological research is simply unknown.

The replication of economic experiments comes off only moderately better:

In the first systematic, but limited, effort to replicate laboratory experiments in economics, Camerer et al. (2016) replicated eighteen studies published in the American Economic Review and the Quarterly Journal of Economics between 2011 and 2014. They found that 61 percent of the replications showed a statistically significant effect in the same direction as the original study. However, the mean effect size was 66 percent of the original magnitude. In most cases such a difference in magnitude (or a greater difference, as in the large study of psychology articles) will have a considerable impact on policy prescriptions.

4. Mere replication is not enough!  The populations researchers study aren't just unrepresentative.  They're a skewed sample of a skewed sample: students willing to join experiments.

5. Researchers fail to properly account for incentives and learning.  RW’s discussion is too thoughtful and subtle to quickly capture, but here are two nice cases:

Consider the experiment conducted by Thaler (1985, 206). In a hypothetical beer-on-the-beach scenario, people were asked their maximum willingness to pay for a beer… It turned out that people said that they were willing to pay more for the same beer in the fancy resort case than in the run-down grocery case. In theory, this difference in willingness to pay is deemed irrational, inasmuch as the beer is the same regardless and will be consumed on the beach, not in the place where it was purchased. When the experiment was repeated (Shah et al. 2015) while dividing the participants by income constraint, it was found that there was no statistically significant difference in willingness to pay between the two kinds of stores for the lower-income (i.e., more income-constrained) group.

We do not interpret this result as showing that lower-income people are more standardly rational by temperament or character; this seems highly unlikely. What seems more likely is that for lower-income people, the subjective opportunity cost of money is higher… Thus, the cost of succumbing to so-called irrelevant framing does seem to affect its incidence.


Becker and Rubinstein (2004) examine the use of public bus services in Israel after a spate of suicide bombings on buses. Their hypothesis was, broadly speaking, that the greater the cost of one’s fears in terms of reducing the consumption of the terror-infected good (bus rides), the more agents will expend effort to control those fears…

Since Becker and Rubinstein could not measure fear directly, they sought to measure the effect on the consumption of the terror-infected good. They found that frequent or more intensive users of buses were not affected at all by the terror threat, while all of the reduced consumption was on the part of low-frequency users. This differential impact conforms to the rational application of more effort to reduce fear when there is greater value from doing so. Thus it appears that when the opportunity cost of riding on the bus is relatively high, the operative bias disappears.

6. New paternalists give short shrift to self-help; people often realize that their decisions and beliefs are biased – and strive to offset these biases.  RW’s discussion is rich with detail, so let me just share one striking passage:

In an attempt to produce results of general applicability, many experiments are devoid of relevant context (Loewenstein 1999). Familiar cues are omitted and individuals are treated as abstract agents. And yet this attempt at generality results in an impoverished and narrow view of self-regulation. Consider that it is impossible in a laboratory experiment to avoid facing the prescribed choice. The participants cannot say, “No. I would never face that temptation. I would change or modify the situation.” …In natural environments, people choose their own regulatory strategies.

Chapter 7 zeroes in on “knowledge problems.”  I usually find references to Hayek gratuitous, but not here.

[S]cientific knowledge is not the only kind of knowledge relevant to policy. There is another type of knowledge that lies largely beyond the reach of academics and policymakers: the particular details of time and place that affect the preferences, constraints, and choices of individuals. Following Friedrich Hayek (1945), we will call this kind of knowledge “local knowledge.” In the case of scientific knowledge, a suitable body of experts may legitimately claim to have the best and most recent knowledge available. But when it comes to local knowledge, individuals have insights and perspective unavailable to outside experts.


[S]ometimes individuals lack knowledge of themselves… Behavioral economists may have scientific knowledge of a certain kind of bias that afflicts many people or the “average” person. But they do not typically know how much any particular individual is affected by a given bias, the extent to which the individual has become aware of her own bias, and the ways in which she may have attempted to compensate for it. The best the expert can hope for is population-level or group-level summary statistics, not the specific contextual knowledge needed to guide and correct individual behavior.

What are the central knowledge problems that dog the new paternalist enterprise?

1. Knowledge of “true preferences.”  Helping agents satisfy their “true preferences” is the whole point of the new paternalist project, but distinguishing “true preferences” from “false preferences” is daunting even in theory.  One great observation:

Did the hot decision-maker know she would later regret her decision? In other words, is she sophisticated in her bias? If so, then she may reason in this way: “I know I will regret this in the morning because then I will be in a cool state. But it is totally worth it. My cool self is such a bore. I am always choosing the ‘safe’ way. Perhaps I need to take some risks and live a little.”

2. Knowledge of the extent of the bias.

Just one illuminating passage:

It might be argued that the widely varying existing estimates are good enough; we can simply take their mean or median value. However, it turns out that optimal policies can be highly sensitive to small differences in parameter values. In a theoretical exercise, O’Donoghue and Rabin (2006, 1838) provide a striking example of the sensitivity of their sin-tax model to parameter estimates:

If half the population is fully self-controlled while the other half [of] the population has a very small present bias of β = 0.99, then the optimal tax is 5.15%. If instead the half [of] the population with self-control problems has a somewhat larger present bias of β = 0.90 – which is still a smaller present bias (larger β) than often discussed in the literature – the optimal tax is 63.71%. Thus, a mere 9 percentage-point shift in one parameter (from β = 0.99 to β = 0.90) results in a twelvefold increase in the optimal tax.
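The "twelvefold" claim is easy to verify from the two tax rates quoted above alone (a back-of-the-envelope check; the underlying O'Donoghue–Rabin optimal-tax model itself is not reproduced here):

```python
# Optimal sin-tax rates reported by O'Donoghue and Rabin (2006),
# as quoted by Rizzo and Whitman, for two nearby present-bias values.
tax_mild_bias = 5.15     # optimal tax (%) when the biased half has beta = 0.99
tax_modest_bias = 63.71  # optimal tax (%) when the biased half has beta = 0.90

ratio = tax_modest_bias / tax_mild_bias
print(f"A 0.09 shift in beta multiplies the optimal tax by {ratio:.1f}x")
# -> roughly a twelvefold increase, from a change in beta of less than 10%
```

The point is not the arithmetic but what it implies: if the policy prescription swings by an order of magnitude within the range of plausible parameter estimates, averaging the existing estimates cannot rescue the exercise.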

3. Knowledge of self-debiasing and small group debiasing.  One of many weighty insights:

Self-regulation is complex and much of it is not obvious. Consider an overweight individual with a propensity to eat junk food. Imagine that she often stays away from restaurants that serve junk food but occasionally indulges herself. Does she need the help of a paternalist? Should her indulgences be taxed or should she be nudged away from junk food on those occasions? A person who has made an intrapersonal bargain to abstain, but also to reward her “present self” with some tasty junk food from time to time, may not require a correction. Or, if she does to some extent, the paternalist would have to know in which respects this bargain has broken down and to what extent it is inadequate. To tax the present self reward would tend to unravel the bargain, thereby potentially putting the agent in a worse condition than before.

4. Knowledge of bias interactions.

Because biases can reinforce or offset each other, policies that would improve welfare by correcting a bias if that were the only bias present may in fact reduce welfare when multiple biases are in play. As Besharov (2004) has pointed out, this problem is analogous to the second-best problem in the study of market failure. To take one example, negative externalities (such as air pollution from the burning of fossil fuels) can lead to too much consumption, while a degree of monopoly power (such as that created by OPEC in the petroleum industry) can lead to too little consumption. When both market imperfections are present, theory alone cannot say whether consumption is too high, too low, or just right; any of these are possible.

5. Knowledge of population heterogeneity.

Mitchell (2002) cites at least 100 studies on this point, showing that behavioral phenomena (including cognitive biases) differ in the population along such dimensions as educational level, cognitive ability (as measured by, for instance, performance on the Scholastic Aptitude Test), cognitive mindsets or dispositions, cultural differences, age differences, and gender differences (pp. 94–95, 140–156).


Chapters 1-5 of Escaping Paternalism were very good.  Beginning in Chapter 6, however, the book becomes a relentless bulldozer of the intellectual pretensions of the new paternalism.  After reading Chapters 6 and 7, the idea that behavioral economics provides a “scientific foundation” for any concrete paternalist program seems absurd.  The best-case scenario is that behavioral economics will provide a vague rationalization for the feel-good seat-of-the-pants paternalistic policies governments wished to adopt anyway.  Furthermore, the “It will be up to the democratic process” answer is mere democratic fundamentalism.  Isn’t the central message of behavioral economics that we can’t blithely trust the wisdom of the people?!

Further thoughts:

1. Outsiders may struggle to believe that practicing behavioral economists neglect to offer specific quantitative recommendations.  As far as I can tell, though, the intellectual picture is as dire as RW paint.  Researchers announce the discovery of an “effect,” then let policy-makers use (and abuse) their discoveries to rationalize both old regulations already on the books and new regulations they dream of putting on the books.

2. RW never mention “aging out.”  Yet many forms of self-destructive behavior erode with age, and we can plausibly interpret this as a kind of “learning.”

3. If RW are right, why don’t researchers work harder to achieve external validity?  I prefer a straightforwardly neoclassical story: The costs of external validity are very high, and the professional rewards are low.  If politicians really wanted scientifically grounded policy, of course, matters would be entirely different.  Governments rarely give astronomers credit for qualitative “discoveries” about space travel.

4. RW describe many credible examples of self-debiasing.  The main weakness with their discussion: Most people seem extraordinarily stubborn.  Finding and emulating highly successful people is easy, but few less-successful folks are humble enough to take advantage of this golden opportunity.  In my experience, the typical human being prefers to either (a) keep their own counsel, or (b) “heed” advice that confirms their prejudices.

5. RW’s discussion of the knowledge problem makes old ideas seem new again.  “Helping people achieve their true preferences” indeed!  If revealed preference doesn’t reveal true preference, what on Earth does?

6. The extension of the “second-best” model to individual decision-making is great – and deserves a far wider audience.  Without optimistic bias to counter their unreasonable fear of rejection, how many males would ever find love?  How many children would never have been born?

7. You could object, “Behavioral economics is no worse a foundation for concrete regulation than any other branch of economics.”  And you probably wouldn’t be wrong.  Most economists, sadly, would rather rationalize regulation than hold regulation up to a mirror.  Insurance regulation is a fine example: economists routinely use moral hazard and adverse selection to justify policies that make these “market failures” worse.