Teacher Evaluations and Superstition
By Arnold Kling
Megan McArdle has a long post on the issue of measuring teacher quality. Meanwhile, The New York Times profiles James Heckman, whose careful research suggests that by the time a child reaches school age it is too late to make much difference.
If the best evidence is that it is almost impossible to make a long-term difference in education, then the statistical evidence on teacher quality is bound to be highly unreliable. What appears to be teacher quality is likely to be random variation. The low rate of replication of statistical teacher evaluations that Megan discusses is consistent with that.
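To make the random-variation point concrete, here is a minimal simulation sketch. It is not drawn from any of the studies mentioned; the function name, the class size of 25, and the noise levels are all assumptions chosen for illustration. It gives every teacher a measured score in two successive years and asks what share of the year-one top quintile is still in the top quintile in year two.

```python
import random

def top_quintile_persistence(n_teachers=5000, class_size=25,
                             true_effect_sd=0.0, noise_sd=1.0, seed=0):
    """Share of year-1 top-quintile teachers who are also top-quintile in year 2.

    All parameters are illustrative assumptions, not estimates from real data.
    true_effect_sd=0.0 means teachers do not differ at all in true quality.
    """
    rng = random.Random(seed)
    true_effects = [rng.gauss(0.0, true_effect_sd) for _ in range(n_teachers)]
    # Sampling noise in a class average shrinks with the square root of class size.
    class_noise_sd = noise_sd / class_size ** 0.5

    def one_year():
        # Measured score = persistent teacher effect + that year's classroom noise.
        return [effect + rng.gauss(0.0, class_noise_sd) for effect in true_effects]

    def top_quintile(scores):
        k = n_teachers // 5
        ranked = sorted(range(n_teachers), key=scores.__getitem__, reverse=True)
        return set(ranked[:k])

    year1, year2 = top_quintile(one_year()), top_quintile(one_year())
    return len(year1 & year2) / len(year1)

if __name__ == "__main__":
    print("no true differences:   ", top_quintile_persistence())
    print("large true differences:", top_quintile_persistence(true_effect_sd=1.0))
```

When teachers do not truly differ, about one in five of the apparent stars repeats, which is what chance alone predicts. When true differences are large relative to the noise, nearly all of them repeat. A low replication rate in real evaluations therefore looks more like the first case than the second.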
Daniel Klein alerted me to the term “white hat bias.” It means that findings that favor a popular political viewpoint will be published, while those that contradict that viewpoint will tend to be discarded. So many people have a vested interest in believing that teachers make a difference that one has to be very wary of white hat bias in studies that purport to show such differences.
Along these lines, I am afraid that I am skeptical of Rick Hanushek’s claim that the best teachers are really effective and the worst are really ineffective. If that were true, then I think we would observe private schools dramatically outperforming public schools, holding student characteristics constant, and I do not think that is what the data say. Instead, when we see differences, those differences typically do not persist over time.
Education researchers make intensive efforts to find differences caused by teachers or other inputs. This is a worthwhile effort, but whenever studies are published showing such differences, they need to be discounted heavily for the biases induced by various filters in the research and publication process. The likelihood of any strong difference holding up in repeated study is quite low.
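The effect of such a filter can be sketched with a short simulation. This is only an illustration under stated assumptions: the function name is hypothetical, the true effect is set to zero, every study has the same standard error, and the filter is the simplest one imaginable, namely that a study is published only if it finds a positive effect that clears the conventional 1.96 standard-error threshold.

```python
import random

def simulate_publication_filter(n_studies=20000, true_effect=0.0, se=1.0, seed=1):
    """Simulate studies of one effect when only 'significant positive' results are published.

    Assumptions for illustration only: identical standard errors, a one-sided
    filter at 1.96 standard errors, and one independent replication per published study.
    Returns (share published, mean published estimate, replication rate).
    """
    rng = random.Random(seed)
    threshold = 1.96 * se
    published = []
    replicated = 0
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)
        if estimate > threshold:          # the publication filter
            published.append(estimate)
            # A fresh, independent study of the same (zero) effect.
            if rng.gauss(true_effect, se) > threshold:
                replicated += 1
    if not published:
        return 0.0, float("nan"), float("nan")
    share_published = len(published) / n_studies
    mean_published = sum(published) / len(published)
    replication_rate = replicated / len(published)
    return share_published, mean_published, replication_rate

if __name__ == "__main__":
    share, mean_est, rep = simulate_publication_filter()
    print(f"share published:         {share:.3f}")
    print(f"mean published estimate: {mean_est:.2f} standard errors")
    print(f"replication rate:        {rep:.3f}")
```

Even with no true effect at all, the published studies in this sketch all report effects of roughly two standard errors or more, and only a small fraction of them survive a second look. Real filters are less crude, but the direction of the bias is the same.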