Thursday, March 27, 2008

In the course I'm currently teaching, the exams usually consist of two sections: multiple choice and short answer. I just finished recording the midterm grades, and for curiosity's sake I calculated the correlation between the two sections. For one class, the correlation between multiple choice and short answer was 0.59; for the other class, it was 0.70.
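For anyone who wants to repeat the calculation on their own gradebook, here is a minimal sketch of the usual Pearson correlation coefficient. The function name `pearson` and the six pairs of scores are made up for illustration; they are not my actual class data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a section has no variation")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for six students (illustrative only).
multiple_choice = [32, 41, 28, 45, 37, 30]
short_answer = [25, 38, 30, 41, 29, 27]

r = pearson(multiple_choice, short_answer)
print(f"correlation between sections: {r:.2f}")
```

The result always lies between -1 and 1, and it does not change if every score in one section is shifted or rescaled by a positive constant.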

The question is, what correlation should I hope to get if the test has been optimally designed? Two things occur to me:

1. I don't want a correlation of zero, because I expect smarter and better prepared students to do well on both parts (and dimmer and less prepared students to do poorly on both parts). A correlation of zero would raise the worry that my test doesn't really measure ability and preparation.

2. I don't want a correlation of one, because that means the sections are effectively redundant. Multiple choice is much easier to grade, so why bother grading the short answers if the multiple choice section tells me everything I need to know? If the two sections measure different things, correlation should be less than perfect.

So the correlation should be in between zero and one. But that's a pretty wide range. How much correlation should I aim for? Is there even a right answer to the question?

Gil said...

sqrt(2)/2

I have a truly marvellous proof of this proposition which this margin is too narrow to contain.

Jenny said...

At least you do know something: You didn't get any negative correlations. The argument that certain "types" of students do better on multiple choice and other "types" do better on short answer must not apply to your students.

Anonymous said...

Even if the correlation were perfect, I wouldn't say that the two-part test is redundant. I tend to view test taking as part of the learning experience. I believe adequate time should be allotted for the exam and that students shouldn't be rushed. I don't agree that a test is just a place for the students to "dump" everything they've memorized, but rather an opportunity to think about what they've learned more or less well and form a gestalt of the subject matter to achieve greater understanding, and one hopes, as well, to come up with the right answer (or close to it) when it's being graded.

Anonymous said...

I think you're conflating the average correlation with the tightness of the correlation. For example: suppose half the class scores 25% less on the multiple choice, and half score 25% more. The correlation (as I understand your use of the term) would be 1, but the results aren't actually that predictive, and there's clearly value in having both parts of the test.

Dr. Zeuss said...

I don't think you can rule out one so easily - the real point of a test is the studying it induces, and maybe their studying would be different if they knew in advance that it would be only mc.

Anonymous said...

A high correlation only makes grading, not giving the second section, redundant.

Ran said...

So this question has been floating in the back of my mind, and I think I've reached the conclusion that no, there's no right answer. Imagine a class where all the students are exactly identical and interchangeable, but not deterministic. You'd then expect some score variation in both sections, but no correlation, because there's no reason that the students who happen to do better in one section would also happen to do better in the other. Conversely, imagine a class with only five students, one being a straight-A student, one a straight-B student, etc. You'd then expect a perfect correlation between the two sections; but then, in this case you could theoretically give neither section, and simply assign each student the grade you know (s)he'd otherwise get. (Obviously that's not fair, since students don't even get a chance to prove themselves, but you didn't ask about fairness — and I think if we're considering fairness, then we can ignore the question of redundancy, since the fair approach is to give students both kinds of opportunities.)

Now, this doesn't rule out the possibility that for any given group of students, an optimal test design would have a certain correlation (i.e. that if there are multiple equally optimal tests, they'd all have the same correlation); but certainly there's no way to fully investigate that for a given group. At least, no way without actually giving them various tests …
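The two thought experiments above can be sketched as a small simulation. Everything here is an assumption made for illustration: each section's score is modeled as a shared "ability" plus independent random noise, both normally distributed, and the names `simulate`, `ability_spread`, and `noise` are hypothetical. Under that model the expected correlation works out to ability_spread² / (ability_spread² + noise²), so it depends on the mix of students as much as on the test.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def simulate(ability_spread, noise, n_students=2000, seed=0):
    """Correlation between two sections under the assumed model:
    section score = shared ability + independent per-section noise."""
    rng = random.Random(seed)
    mc, sa = [], []
    for _ in range(n_students):
        ability = rng.gauss(70.0, ability_spread)
        mc.append(ability + rng.gauss(0.0, noise))
        sa.append(ability + rng.gauss(0.0, noise))
    return pearson(mc, sa)

# Identical, interchangeable students: only luck separates their scores.
identical = simulate(ability_spread=0.0, noise=10.0)
# Students who differ in ability and always perform exactly to it.
deterministic = simulate(ability_spread=20.0, noise=0.0)
# A mix of both: real differences in ability plus some luck.
mixed = simulate(ability_spread=20.0, noise=10.0)

print(f"identical students:     {identical:.2f}")
print(f"deterministic students: {deterministic:.2f}")
print(f"mixed:                  {mixed:.2f}")
```

The same pair of sections gives a correlation near zero, essentially one, or something in between, depending only on who is in the room.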

Anonymous said...

What does the numerical correlation signify? Do we know what the "expected" grade was for each student? Do we know what the "expected" average grade for the class was? And when we know those things, what do they tell us?

There are many variable factors, many of them entirely random, that enter into the exam grades. Is there some account taken of these factors, such as weather, individual student sense of well-being, individual student's actual health on exam day, etc, etc.?

So, again, I ask: what do those numbers tell you and of what use is that knowledge to you? The confounding and unaccounted-for factors sure to be present make it impossible to claim some useful knowledge from these numbers, without wholesale disregard for all the other factors leading to those results.