Teaching a class with 90 students gives me a newfound appreciation of multiple-choice tests. One of my objections to them has been that questions are not robust, in that students’ answers may not reflect their knowledge.
My concern is that multiple-choice tests are subject to type I and type II errors. A student who knows the answer can get a question wrong by misreading it or giving it a more “nuanced” reading (type I error). Conversely, a student can get it right by being lucky (a type II error). One way to deal with this is to ask a lot of questions and hope that the law of large numbers is on your side.
Another approach is what I would term rank-order multiple choice. You give students a group of three or four questions, each one of which has a different answer. For example,
Of the three statements below about increased trade between a country with a highly-educated work force and a country with a poorly-educated work force, one is true, one is false, and one is uncertain. Select the correct answer for each question.
1. The country with a highly-educated work force will have a comparative advantage in some goods and services, but not in others.
2. The country with the highly-educated work force will tend to run a trade deficit.
3. The country with the highly-educated work force will suffer a decline in productivity and wages.
My claim is that grouping these questions together and effectively asking students to “rank-order” them in terms of truth or falsehood should reduce at least the type I errors. So I’m thinking that this way of asking questions would lead to results that are more robust than simply asking three separate questions.
Here is another example:
In questions 4-7 below, rank the standard of living of the following people. Put the highest-ranking person in 4, the next-highest ranking person in 5, etc.
(A) someone in the 50th percentile of the Mexican income distribution in 2000
(B) someone in the 20th percentile of the U.S. income distribution in 2000
(C) someone in the 20th percentile of the U.S. income distribution in 1970
(D) someone in the 75th percentile of the U.S. income distribution in 1850
Another example might be to rank four goods in terms of their likely elasticity of demand.
My hypothesis is that putting questions in this format will cause students to see possible errors and correct them, thereby reducing type I errors.
For Discussion. What suggestions do you have for giving multiple-choice tests that are robust?
READER COMMENTS
L.F. Brown
Oct 22 2004 at 6:18pm
I would think students should approach a multiple choice question in the same way, usually eliminating the “false” answer and then moving on to the two more likely answers. I guess you think more about what you are answering when you (presumably) get points for each answer (although I wouldn’t give them points for getting the “false” answer right -if you did give them points for getting the “false” answer right this would not be as robust compared to asking three separate questions…if I’m reading you correctly.) But this seems to be more of a solution for helping ward off laziness (reactive) instead of getting them to always apply rigorous thinking (pro-active). If the question and answer is clear, then there should be no “misreading.” If a question is open to “nuance” then it should more likely be a written answer format as opposed to a multiple choice question. As for the two above points, if a student has a complaint, hopefully they will raise it so that maybe an unclear question can be avoided in the future.
Aargh, this is bringing back bad memories!
I guess it is subjective (for me at least). Having written answers vs multiple choice is a better way of reflecting knowledge (the chance to explain or reason/not “hinting” at the answer) but is much more time consuming for teacher and (probably) student alike.
Psychotic Twins on Alii Drive
Oct 22 2004 at 6:35pm
In other systems where you have a small likelihood of false negative or false positives, the solution is to take the data point multiple times and take the value returned the majority of times. As the number of samples increases, the likelihood of multiple false negatives or positives goes to zero.
Of course, with humans you cannot just ask the same question over and over and consider the answers to be independent events.
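The majority-vote argument can be put in numbers. The sketch below is only an illustration: it assumes each sample is wrong independently with the same probability (exactly the assumption that fails when one person answers the same question repeatedly), and the 20% error rate and the function name are made up for the example.

```python
from math import comb


def majority_wrong_probability(p_error: float, n_samples: int) -> float:
    """Chance that a strict majority of n independent samples is wrong.

    Assumes every sample errs independently with probability p_error.
    """
    if n_samples % 2 == 0:
        raise ValueError("use an odd number of samples so there are no ties")
    needed = n_samples // 2 + 1  # smallest number of wrong samples that wins the vote
    return sum(
        comb(n_samples, k) * p_error**k * (1 - p_error) ** (n_samples - k)
        for k in range(needed, n_samples + 1)
    )


if __name__ == "__main__":
    # Hypothetical 20% chance that any single reading is wrong.
    for n in (1, 3, 5, 9, 15):
        print(n, round(majority_wrong_probability(0.2, n), 4))
```

The shrinking only happens when the single-sample error rate is below one half; above that, more samples make the majority more likely to be wrong.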
Jim Erlandson
Oct 22 2004 at 9:08pm
This I learned in my two quarters of undergrad Econ: Test Bank!
The department kept a file of 600+ multiple choice questions in the library. The final exam consisted only of questions from the test bank. This had the advantage of easy-to-grade exams, a rational method of study and a high quality learning experience — it forced me to spend time in all chapters of the text to come up with the right answers. Plus, I knew exactly what the department expected me to know and study.
It solves your problem since given time to study the question and possible answers, the student is less likely to mis-interpret the question.
The down side is the possibility of group-study and memorizing instead of learning. But a large enough test bank should help with that problem. A large number of questions prevents memorizing all the answers. It is easier to understand than to memorize.
I learned as much studying for the final as I did in lecture — and I had a good prof.
A more elegant approach would be what good market researchers do — ask the same question several times in different ways and compare the answers. This would let you write a program to do some sophisticated (well, not so sophisticated but you can pretend) statistical analysis of each student’s answers to separate ignorance from misinterpretation. Ultimately, it would tell you which questions were poorly constructed and which were brilliant. Or that they were all brilliant.
KipEsquire
Oct 23 2004 at 8:33am
There’s also the tactic (trick? medieval torture device?) of listing several answers then filtering them through the multiple choice options. For example:
Which of the following presidents were Democrats:
I. Andrew Jackson
II. Abraham Lincoln
III. Theodore Roosevelt
IV. Franklin D. Roosevelt
A. I only
B. IV only
C. II and III only
D. I and IV only
E. I, II, III and IV.
The sad part is that such a question does absolutely nothing to minimize Type II errors (a guess is still a guess) but can increase the possibility of Type I error: should a person who knows FDR was a Democrat but does not know Jackson was a Democrat receive the same ZERO as a person who thinks Lincoln and Teddy were Democrats?
Yet standardized tests are increasingly overflowing with questions of this format.
Lawrance George Lux
Oct 23 2004 at 12:29pm
What suggestions do you have for giving multiple-choice tests that are robust?
The most robust and challenging form of multiple choice I have seen is to put the Questions in one column and the Answers in another column, stating that there is a specified number of wrong answers (four is good). The Answers are numbered, with each Question having a line to write down the Answer number. Listed below are four lines for the wrong answers, with extra credit given for listing all four wrong answers there.
Test-taking should be a learning experience in itself, and this type of test is the most thought-provoking. Type II errors are substantially cut down, and Type I errors curtailed by short, precise answers. lgl
John Thacker
Oct 23 2004 at 12:36pm
The problem with the “ranking” questions is that a person who very correctly understands the relative order of three of the answers but is very unsure on the fourth can easily get zero or one correct. As a result you get odd situations like someone who would get all four correct when asked to rank a certain group of four, but when a fifth choice is added, gets zero or one correct.
The ranking questions also inflate the importance of knowing the extremes; whether that’s a good thing or not is up to you. E.g., if the correct ranking is A > B > C > D > E, and you know A > B > C > D, but aren’t sure about E, you can score 0, 1, 2, 3, or 5 depending on where you put E. If you know A > B > D > E but don’t know C, then you score 2, 3, 5, 3, or 2 depending on whether you put C in the first through fifth place, respectively.
You can always score out of a percentage of transitive relationships correct, though. That would be more difficult (not for a computer), but probably fairer.
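The two scoring rules compared here can be sketched in a few lines. This is an illustrative reading, not a grading scheme anyone in the thread specified: "positional" counts items placed in exactly the right slot, and "pairwise" is one possible interpretation of scoring by the share of pairwise (transitive) orderings the student gets right. The function names are hypothetical.

```python
from itertools import combinations


def positional_score(answer, key):
    """Number of items placed in exactly the slot the key assigns them."""
    return sum(a == k for a, k in zip(answer, key))


def pairwise_score(answer, key):
    """Fraction of item pairs that the answer orders the same way as the key."""
    rank = {item: i for i, item in enumerate(answer)}
    pairs = list(combinations(key, 2))  # every (x, y) with x ranked above y in the key
    agree = sum(rank[x] < rank[y] for x, y in pairs)
    return agree / len(pairs)


def insert_at(known_order, item, position):
    """Place the one uncertain item at a given position among the known ones."""
    return known_order[:position] + [item] + known_order[position:]


if __name__ == "__main__":
    key = list("ABCDE")
    # The student knows A > B > C > D and guesses where E belongs.
    for position in range(5):
        answer = insert_at(list("ABCD"), "E", position)
        print("".join(answer), positional_score(answer, key), pairwise_score(answer, key))
```

Under the positional rule the student who misplaces only E can land anywhere from 0 to 5, while the pairwise rule never drops below 6 of 10 pairs correct, since the six orderings among A, B, C and D are always right.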
Bruce Cleaver
Oct 24 2004 at 6:56am
Regarding Kip Esquire’s point:
If Arnold is feeling particularly cruel that day, he may also include in the list (E) None of the above. In one historically sadistic University of Tennessee calculus exam I heard of, there were 30 questions and the correct answer to 26 of the 30 was (E) None of the above. That would test the nerves and confidence of Milton Friedman.
Lawrance George Lux
Oct 24 2004 at 2:26pm
Bruce,
I had a Freshman English professor who gave a 60 question True and False test, where the answer was always False. I also had a Price Theory professor who Everyone knew you could get a B answering all questions true–her class was real trouble, though, think of a Group take-home Final of One question, and it was something else. lgl
Ann
Oct 24 2004 at 4:52pm
I sympathize, having just spent the last week grading more than 100 exams! I had some multiple choice, but apparently there were too many essay and problem-solving, so it took forever to grade.
For multiple choice, I offer many possibilities, as was mentioned above, to minimise the Type II error. It’s harder to guess the right answer with 7 or 8 alternatives (although still possible, of course). I also give partial credit. Suppose the question is “which of the following is false?”. Answers a), b) and c) are statements that may or may not be false, and then I give all the combinations: d) Both a and b. e) Both b and c. f) Both a and c. g) All of the above. h) None of the above.
But, suppose both a and c are false. The highest score would be for answer f) Both a and c, but partial credit would be given for a), c), or g) All of the above, since for each answer, the student would have gotten two out of three right. You could even have a third point level for answers d), e) and h), with b) getting the lowest possible points for getting everything wrong.
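One way to make the partial-credit tiers mechanical is to count, for each option, how many of the three statements the student has classified correctly. The sketch below assumes that counting rule and the eight options listed above; the option table and function name are illustrative, and how the counts map to points is left to the instructor.

```python
# Each option is read as the set of statements the student is calling false.
OPTIONS = {
    "a": {"a"},
    "b": {"b"},
    "c": {"c"},
    "d": {"a", "b"},
    "e": {"b", "c"},
    "f": {"a", "c"},
    "g": {"a", "b", "c"},  # all of the above
    "h": set(),            # none of the above
}
STATEMENTS = ("a", "b", "c")


def statements_right(option, false_statements):
    """Count statements whose true/false status the chosen option gets right."""
    chosen = OPTIONS[option]
    return sum((s in chosen) == (s in false_statements) for s in STATEMENTS)


if __name__ == "__main__":
    # The case from the comment: statements a and c are the false ones.
    for option in sorted(OPTIONS):
        print(option, statements_right(option, {"a", "c"}))
```

A count of 3 is full credit, 0 is the everything-wrong floor, and the counts of 2 and 1 give the two intermediate levels.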
You can give partial credit multiple choice even for somewhat complicated numerical problems, if you can foresee what mistakes most of them will make. Besides the correct answer, you can give the answer they’ll get if they use the wrong discount rate, the answer if they mistake the timing of one of the cash flows (I teach finance – lots of present value problems!), the answer if they use the perpetuity formula rather than the annuity formula, etc. This can allow you to separate out small from large mistakes in a relatively efficient way.
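Generating the wrong answers from the anticipated mistakes is easy to script. The following is a minimal sketch for a level-annuity question using the standard present-value formulas; the payment, rates, and the particular timing mistake (treating every payment as arriving one period early) are assumptions chosen for illustration, not details from the comment.

```python
def annuity_pv(payment, rate, periods):
    """Present value of a level annuity paid at the end of each period."""
    return payment * (1 - (1 + rate) ** -periods) / rate


def perpetuity_pv(payment, rate):
    """Present value of a level payment that continues forever."""
    return payment / rate


def build_options(payment, rate, periods, wrong_rate):
    """Map each anticipated mistake to the number a student would compute."""
    correct = annuity_pv(payment, rate, periods)
    return {
        "correct": correct,
        "wrong discount rate": annuity_pv(payment, wrong_rate, periods),
        "payments one period early": correct * (1 + rate),
        "perpetuity formula": perpetuity_pv(payment, rate),
    }


if __name__ == "__main__":
    # Hypothetical question: 100 per year for 10 years at 5%; a student might grab 10% instead.
    for label, value in build_options(100, 0.05, 10, wrong_rate=0.10).items():
        print(f"{label}: {value:.2f}")
```

Because each distractor is tied to a named mistake, the instructor can decide afterwards how much credit each mistake deserves.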
And, I’m sure you’ve found that it’s easier to come up with strong, unambiguously false answers than to come up with “true” answers. Out of 100 students, there’s always the risk that someone will come up with an alternate interpretation of a statement that you considered true, an interpretation that has some merit but wasn’t at all what you were aiming for. It’s a surprisingly tricky process, since you’re asking them to read so much into every single word! One alternative I’ve tried, that hasn’t been too horrible to grade, is to ask them to correct a false statement.
For big numerical problems, one of my colleagues draws a box for the correct final number, and tells the students that he will not look at anything outside the box. Thus, no partial credit – either you get it or you don’t. It’s a breeze to grade, but of course a student who’s way off gets the same score as one that was almost right. I don’t like this but can see why he does it.
I used to think that, if video lectures ever took off, they wouldn’t need as many people teaching classes. But then it occurred to me that if I didn’t have to stand in the same room repeating the same thing for three sections, I could instead spend more time giving the students direct feedback (i.e. more time grading). It can be particularly helpful to go through several rounds on a big project, telling them how they should have done it and making them try it again. But who has the time?
jaime
Oct 24 2004 at 8:17pm
I would use multiple-choice to confront the student with a two tiered set of questions:
– Numerous simple questions in order to evaluate the student's familiarity with the subject. Easy questions like the meaning of technical terms that a well prepared student can answer in no time.
– A few complicated problems, requiring the use of equations and discernment.
Different weight should be given to each set of questions.
Evaluation using market-research methods, including statistical analysis, seems an efficient way to avoid errors. But applying statistical methods of evaluation goes against the clarity of the relation between the answers provided and the result of the exam. One of the main goals of an exam is to convince the student that the teacher's evaluation of his knowledge and ability is correct and that he truly deserves the number he got. The exam should make that relation very clear and incontestable.
shamus
Oct 25 2004 at 9:36pm
Having graded many exams, I’ve come to the firm conclusion that if you want to figure out what a student knows, then you need to use essay (or at least short answer) questions. Multiple choice is a guessing game motivated by professorial sloth. Sorry Arnold, but if you’re too lazy to give your students a real grade, then multiple guess won’t help you.
Bill Fellers
Oct 26 2004 at 1:19am
Can’t you get a student on financial aid to use their work-study allotment to grade short-answer/essay exams? I did that for numerous physics courses from intro to grad, homework and exams. For fairness, I scored each problem according to a rubric and avoided looking at names until I was done. I never had any complaints. I must admit, though, that I was an atypical grader. The department chair, who’d been teaching for 30+ years, said I was the most reliable grader he’d ever had. I was astonished. I guess that old-school Lutheran work ethic is all but lost. Sigh… Lazy Westerners…
Ann
Oct 26 2004 at 8:24am
It’s easy to say that “you shouldn’t use multiple choice”, but grading essay exams consistently is also quite difficult with 100 students. It’s hard to consistently give the same number of points for the same answers, across the many variations and the hours it can take to grade just one question. Sometimes I find myself wanting to get more lenient (‘if that many students came up with the same answer, maybe it’s not all that stupid’), other times I find myself wanting to get more strict (‘I’m sick of this same stupid answer over and over!’). At least with multiple choice, there are consistent standards.
It’s silly to assume that the instructor’s time doesn’t matter. I just spent 40 hours grading only one midterm, and it was that fast only because I used a combination of essay, short answer and multiple choice. And say what you want, but when I had 240 students in one section, I did not use essay exams, and yet I think I was able to come up with reasonable exams.
Using graders is usually more trouble than it’s worth, at least in my experience. For straightforward grading, they’re unnecessary, while for complicated grading, they add noise. I don’t see how a well-constructed multiple choice question, graded consistently, is worse than an essay question graded randomly by a TA.
With a combination of essay and multiple choice, an instructor can cover more material on the exam and give more consistent scores at a lower cost than if (s)he relies on only essay. It’s worthwhile to try to develop good multiple choice exams.