Years ago, I told Tyler Cowen, “It’s surprising that IQ tests predict life outcomes so well, because there’s usually no financial incentive to get a high score.” He replied, “People try out of pride – an under-rated motive.” So when Tyler blogged Duckworth et al, “Role of Test Motivation in Intelligence Testing” I naturally took notice. Key claims:
1. Material incentives boost IQ scores:
In 46 independent samples (n = 2,008), the mean effect of material incentives on IQ was medium to large: g = 0.64 [95% confidence interval (CI) = 0.39, 0.89], P < 0.001.
2. Material incentives have a bigger effect on the IQs of people with low scores:
Because exact baseline IQ scores were not reported in some samples, we created a binary variable where 1 = below average (i.e., IQ < 100) and 2 = above average (i.e., IQ ≥ 100). The effect of incentives was greater for individuals of below-average baseline IQ: Qbetween(1) = 9.76, P = 0.002. In 23 samples with IQ scores below the mean, the effect size was large: g = 0.94 (95% CI = 0.54, 1.35). In contrast, in 23 samples of above-average IQ, the effect was small: g = 0.26 (95% CI = 0.10, 0.41). A similar analysis in which baseline IQ scores (available for 43 of 46 samples) were treated as a continuous moderator indicated that a 1 SD increase in IQ is associated with about two-thirds of an SD decrease in the effect of incentives: b = −0.04, P < 0.001.
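For readers who don’t work with effect sizes: the g here is Hedges’ g, a standardized difference between the incentive and control groups’ mean scores. Here’s a minimal sketch of the computation (the numbers are invented for illustration, not taken from the paper):

```python
# Minimal sketch of a Hedges' g effect size for one incentive-vs-control
# sample. All numbers are hypothetical, not from Duckworth et al.
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with small-sample bias correction."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                   / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sp            # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)     # small-sample correction factor
    return j * d

# A 10-point gain under incentives, against the usual 15-point IQ SD:
print(hedges_g(105, 95, 15, 15, 25, 25))  # ~0.66, "medium to large"
```

By the same conversion, the pooled g = 0.64 amounts to incentives buying something like ten IQ points on average, while they’re in force.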
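And the “two-thirds of an SD” sentence is easier to parse with the arithmetic spelled out. The slope b = −0.04 is in effect-size units per IQ point, so with the conventional 15-point IQ standard deviation (my assumption; the excerpt reports only the slope):

$$15 \times 0.04 = 0.60 \approx \tfrac{2}{3}\ \text{SD per 1 SD of baseline IQ}$$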
The authors reasonably infer that IQ is more of a composite intelligence/motivation measure than usually believed – especially by inter-disciplinary researchers. Their words to the wise:
Our conclusions may come as no surprise to psychologists who administer intelligence tests themselves (49). Where the problem lies, in our view, is in the interpretation of IQ scores by economists, sociologists, and research psychologists who have not witnessed variation in test motivation firsthand. These social scientists might erroneously assume that a low IQ score invariably indicates low intelligence.
It’s hard to evaluate a piece like this without re-doing the underlying research, but the presentation is compelling and plausible. My main complaint is with statements like this:
[W]e hypothesize that test motivation is a third-variable confound that tends to inflate, rather than erode, the predictive power of IQ scores for later-life outcomes.
This is especially odd given Duckworth et al’s effort to distinguish unobserved “true intelligence” from IQ. As far as I can tell, the authors do nothing to show that their results make IQ less predictive. They don’t even show that IQ is more mutable than earlier studies find; boosting incentives boosts scores while the incentives remain in place, but there’s no reason to think the boost lasts after the test-takers receive their pay. All the researchers require us to reconsider is the reason why IQ is so predictive and hard to durably improve.
For example, instead of saying, “IQ tests show that people are poor because they’re less intelligent – and intelligence is hard to durably raise,” we should say, “IQ tests show that people are poor because they’re less intelligent and less motivated – and intelligence and motivation are hard to durably raise.” If, like me, you already believed in the Conscientiousness-poverty connection, that’s no surprise.
In any case, I urge you to read the original article. I’ve been reading IQ research and personality psychology for over a decade, but these results really are news to me. Your thoughts?
READER COMMENTS
Guy in the veal calf office
May 6 2011 at 5:32pm
This story doesn’t add much, but is pretty neat: My parents, dirt poor immigrants, once apologized to me over dinner for not having any wealth to leave me. I replied, “oh, don’t worry, you brought me here and you gave me good genes, that’s enough.”
Not until two years later did I find out that my dad was so taken by that response that he studied for 12 months and then took the Mensa test, at the age of 80. One day, he gave me his Mensa certificate, told me the story, and said, “you know what? I did give you good genes.”
[Last aside, contrast my parents’ apology to what the WSJ reports about boomers: “Rather than passing on their wealth to future generations, they’re expected to splurge mostly on themselves as they move households and pursue active lifestyles.”]
Guy in the veal calf office
May 6 2011 at 5:41pm
I imagine you could isolate the motivation variable by studying people who take both the LSAT and the GMAT (or GRE, etc.). The group that attends law school is likely more motivated when taking the LSAT, and vice versa for those attending business school. Mensa already converts a wide variety of tests into equivalents, so you could use their methodology.
(I know it’s possible that success on the test determines which school to attend, but among high achievers it doesn’t work that way.)
Steve Sailer
May 6 2011 at 6:21pm
I made the same point in my 2007 FAQ on IQ:
“Q. Wait a minute, does that mean that maybe some of the predictive power of IQ comes not from intelligence itself, but from virtues associated with it like conscientiousness?
“A. Most likely. But perhaps smarter people are more conscientious because they are more likely to foresee the bad consequences of slacking off. It’s an interesting philosophical question, but, in a practical sense, so what? We have a test that can predict behavior. That’s useful.”
http://www.vdare.com/sailer/071203_iq.htm
Phil
May 6 2011 at 6:21pm
I just finished reading “The Bell Curve” — 17 years late — but haven’t seen many other works on intelligence. Hope someday you tell us what you think of it … I’ve read some critiques but none that strike me as … Masonomical.
Steve Sailer
May 6 2011 at 6:32pm
Keep in mind that the notorious average group gaps in cognitive test scores show up on high-stakes tests where the testees are highly motivated: the SAT, ACT, LSAT, MCAT, GMAT, GRE, the military’s AFQT enlistment test, NYC firefighting hiring tests, New Haven fire department promotion tests, Chicago cop tests, the NFL’s Wonderlic IQ test, insurance agent licensing tests, and so forth and so on ad infinitum.
I can think of only one example where different levels of group motivation had a sizable effect: the military’s AFQT enlistment test was renormed in 1980 on the National Longitudinal Survey of Youth sample of about 12,000 young people, most of whom weren’t trying to enlist. The test was 105 pages long. It was found years later that the anomalously large white-black gap (18.6 IQ points rather than the usual 15 or 16) on the 1980 AFQT was caused by blacks being more likely to give up from discouragement part way through this long and hard test. (Keep in mind that this was a low-stakes test for the participants, who were just taking part in a social science project, not trying to enlist.)
In 1997, the AFQT was renormed using computer-adaptive testing, in which wrong answers lead to easier questions and thus less discouragement. The white-black gap was only 14.7 points.
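To make the adaptive mechanism concrete, here’s a toy staircase sketch (illustration only; the real CAT-ASVAB uses item response theory, and the names here are made up):

```python
# Toy sketch of the adaptive mechanism: an up/down staircase where a
# wrong answer yields an easier next item, so a struggling test-taker
# never faces a long, discouraging run of items far above his level.
# Illustration only; the real CAT-ASVAB uses item response theory.
import math
import random

def staircase_test(ability, n_items=20, step=0.5):
    difficulty = 0.0                 # start at average difficulty
    right = 0
    for _ in range(n_items):
        # Chance of a correct answer falls as difficulty exceeds ability
        p = 1.0 / (1.0 + math.exp(difficulty - ability))
        if random.random() < p:
            right += 1
            difficulty += step       # correct -> harder next item
        else:
            difficulty -= step       # wrong -> easier next item
    return difficulty, right

print(staircase_test(ability=1.0))   # final difficulty tracks true ability
```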
Steve Sailer
May 6 2011 at 6:54pm
This finding is worth keeping in mind for evaluating school performance test scores, which are usually low stakes tests for the students. Hence, students often get bored or tired and “bubble in” the rest of the answers.
Some of the difference in performance among schools on achievement tests therefore depends upon how well the principal and teachers manage to motivate students to keep working until the end of the test.
So, a lot of reports of miracle schools that seem to fizzle out after a while have to do with higher scores ginned up simply by getting students not to bubble in.
On the other hand, I’d rather send my kid to a school where the management has enough on the ball to figure out how to look better and is persuasive enough to motivate students to work for an extra 20 minutes than to one where the management isn’t. So, once again, the motivation angle to cognitive testing isn’t really all that important.
Steve Sailer
May 6 2011 at 7:01pm
One thing to keep in mind is that in experimental situations involving low-stakes tests, if the experimenters _want_ one group of test-takers to be unmotivated, it’s easy to get them to work less hard on the test. The test administrator can convey that a lackadaisical attitude is okay just through word choice, tone of voice, body language, and so forth.
I suspect this is a major feature of the popular stereotype threat experiments where low-stakes tests are given to blacks. In the test group, blacks are told that they are expected to score low on the following test; in the control group, they aren’t. Not surprisingly, on these tests that are meaningless to the test-takers, the first group is more likely to pick up on the experimenters’ hope that they will work less hard, and they do work less hard.
I’ve never seen stereotype threat confirmed experimentally on high stakes tests. I can’t see how such an experiment would pass an ethical review board.
rapscallion
May 6 2011 at 8:06pm
“I’ve never seen stereotype threat confirmed experimentally on high stakes tests. I can’t see how such an experiment would pass an ethical review board.”
You could make the “stake” be money and offer thousands of dollars to the top performers, so it wouldn’t have to be the SATs or GREs. But to do that you’d have to get grant money for a study that might prove that a favored explanation for group underperformance is probably irrelevant. That’d have to be one heck of a well-written proposal.
Steve Sailer
May 6 2011 at 8:55pm
The Pioneer Fund might put up the money for that experiment, but I can’t imagine anybody else would!
Troy Camplin
May 7 2011 at 2:27am
Sounds like people have context-dependent intelligence. They can call forth more intelligence than they may use in a more typical situation. Is this incentives, or laziness? Many people don’t like to think, and won’t do it if they don’t have to. (Which is too bad, because once you learn how to do it, it’s great fun!)
Evan
May 7 2011 at 3:17am
It seems to me like this study provides greater evidence that blaming poverty on large external factors like discrimination and class is actually likely to contribute to poverty. Obviously someone who sincerely believes “The Man” will stop them from being successful would have no reason to try. This, of course, would probably vindicate Bill Cosby on the subject. This also might help explain Turkheimer et al.’s observation that IQ is less heritable among the poor, since living in an environment where more is expected of you would encourage you to try more.
What I’d like to see is a study of whether or not these boosts are cumulative. Will a person who is constantly exposed to situations where they have to use their IQ end up smarter over the long term, or will they, as Bryan speculates, lose the boost after they receive their pay? The temporary boost idea seems plausible, but a longer-lasting boost would explain Turkheimer’s results fairly well.
JL
May 7 2011 at 7:51am
AFAIK, there’s little to no correlation between the personality trait conscientiousness and IQ (even in low-stakes tests), so how could motivation have such a large effect on IQ? Assuming that personality tests measure conscientiousness at least somewhat reliably, Duckworth’s results are puzzling. I will have to look at it in more detail. Previously, Duckworth wrote a paper called “Self-discipline outdoes IQ in predicting academic performance of adolescents”, which did not prove the claim in the title, because the IQ range in her sample was restricted and because her measure of self-discipline was correlated with IQ.
stephen
May 7 2011 at 9:13am
Something I mentioned on Tyler’s blog: the article seems to say that the study involved running a series of models until the “right” one was selected. No problem with that if the intent is to use the model to predict future data, but simply reporting the output parameters is not meaningful. You can always find the right model.
On a broader level, it seems that ALL tests measure a compound of whatever it is they are testing for and motivation. No matter how talented/prepared you are, if you don’t care you won’t do well. In fact you can do as poorly as you wish. But, no matter how much you care, you are limited by natural ability.
John
May 7 2011 at 10:13am
Has anyone done an analysis of the correlation between IQ scores and GRE/SAT performance? One would expect a very strong relationship here. If the above-mentioned results are right, then we should see a weaker correlation on the GRE than on the SAT, and both should be weaker than we might predict if everyone did their best on IQ tests – say, a correlation statistically indistinguishable from 1.
gFactor
May 7 2011 at 12:55pm
The authors reasonably infer that IQ is more of a composite intelligence/motivation measure than usually believed – especially by inter-disciplinary researchers.
This statement has implications for the SAT and ACT, which are effectively tests of general intelligence (g) as they correlate strongly with g (~.80).
Coyle (Intelligence, 2008; PAID, 2011) and colleagues separated g variance (related to intelligence) and non-g variance (related to motivational variables) in the SAT and ACT. Consistent with the view that IQ is a composite intelligence/motivation measure, they found:
(a) The g component of the SAT and ACT predicted college GPA. This finding is consistent with a century’s worth of evidence documenting the predictive validity of g.
(b) The non-g component of the SAT and ACT also predicted GPA. This finding is inconsistent with the assumption that cognitive tests derive their predictive validity mostly (or even exclusively) through g. It also suggests that the SAT and ACT measure something besides g that contributes to their predictive validity (e.g., motivational variables).
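A hypothetical sketch of one way to separate a test’s g and non-g components (residualizing scores on a g measure in simulated data; this is not necessarily Coyle’s actual procedure):

```python
# Hypothetical sketch: separate a test's g and non-g components by
# residualizing test scores on a measure of g, using simulated data.
# Not necessarily Coyle's actual procedure.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
g = rng.normal(size=n)                   # latent general ability
motivation = rng.normal(size=n)          # latent non-g factor
sat = 0.8 * g + 0.3 * motivation + 0.2 * rng.normal(size=n)
gpa = 0.5 * g + 0.2 * motivation + 0.5 * rng.normal(size=n)

# Regress SAT on g; the residual is the non-g component of the SAT
beta = np.cov(sat, g, ddof=1)[0, 1] / np.var(g, ddof=1)
non_g = sat - beta * g

print(np.corrcoef(g, gpa)[0, 1])         # g component predicts GPA ...
print(np.corrcoef(non_g, gpa)[0, 1])     # ... and the non-g residual does too
```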
Chip Smith
May 8 2011 at 1:21pm
Phil,
Arthur Jensen’s magnum opus, “The g Factor,” was published in 1998, though no one seemed to pay much attention.
The project begun by Murray and Herrnstein seems to live on in a number of class-sensitive works of pop-sociology (like “The Big Sort”) where IQ research is conspicuous for not being explicitly engaged. I think a piquant whiff of this preoccupation can be decoded in the SWPL Lexicon as well.
JL
May 8 2011 at 4:26pm
I knew this study was fishy. Statsquatch’s analysis of the studies in Duckworth’s paper seriously questions the veracity of her claims. He shows that the large effect size is driven by three studies on special ed children conducted by Breuning and Zella in the late 1970s. Not only is it questionable to generalize findings from special ed kids to everybody else, but Breuning is a self-confessed fraudster who fabricated data out of thin air in another study on special ed children, and there’s a suspicion that he may have done it with other studies as well. When Breuning’s studies are removed from the data, there is no statistically significant difference between low-stakes and high-stakes tests!
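The kind of sensitivity check Statsquatch ran is easy to sketch: recompute a simple inverse-variance pooled effect with the suspect studies dropped. The effect sizes and variances below are invented to mimic the pattern he reports, not taken from his analysis:

```python
# Sketch of the sensitivity check described above: recompute a simple
# fixed-effect (inverse-variance) pooled estimate with the suspect
# studies dropped. All numbers are invented for illustration.
import numpy as np

g = np.array([1.3, 1.1, 1.2, 0.10, 0.15, -0.05, 0.10])   # per-study Hedges' g
v = np.array([0.05, 0.04, 0.05, 0.03, 0.04, 0.03, 0.05]) # sampling variances
suspect = np.array([1, 1, 1, 0, 0, 0, 0], dtype=bool)    # e.g. Breuning studies

def pool(g, v):
    w = 1.0 / v                           # inverse-variance weights
    est = np.sum(w * g) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return est, (est - 1.96 * se, est + 1.96 * se)

print(pool(g, v))                         # all studies: large, significant effect
print(pool(g[~suspect], v[~suspect]))     # suspects dropped: CI spans zero
```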