Assessing Assessments

A week ago I promised a follow-up entry on testing. Here goes. We can divvy up tests into two broad categories: formative and summative. The first type, formative, is often the more useful of the two when discussing learning and understanding. It commonly goes by the name of feedback or commenting. The point is that it not only tells the student that he’s got something wrong but also points out why, and how it could be changed for the better. Think annotations on an English composition or history paper. This sort of assessment helps students revise the way they’ve approached the subject and generally improve their grasp of the material. Also, this sort of thing usually isn’t considered testing. It tends to happen during drafting rather than after the final project is complete, though it need not.

When I graded for a freshman honors course in theoretical math, it was not uncommon for me to mark up problem sets with comments like, “This is impossible. I see why you wrote this, but here’s why it doesn’t work…You’ve got the basic idea right, though.” Score: ten out of ten. Sometimes I’d go as low as seven, but you really had to push me there. The score wasn’t important; the reasoning was.

The second kind of assessment, summative, takes its name from the word summary. As such, it usually signals the end of a unit, a chapter, a book, whatever. Once the lesson is done, summative testing quantifies student learning and spits out a grade. It’s dangerous for at least two reasons.

First, summative testing doesn’t help the student learn from his mistakes, at least not as easily. Say a second grader writes on his multi-digit addition exam that 112+37=482. Summative testing tells the student that he’s got it wrong. Formative testing identifies the problem: he’s lining up the numbers the wrong way, aligning their leading digits instead of their ones digits, so that he effectively computes 112+370. If he makes the same mistake consistently, formative assessment would address the root of the problem: he needs to review basic concepts about base-ten number representation.
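To make the diagnosis concrete, here is a small sketch of paper-and-pencil column addition. It assumes, hypothetically, that the student’s error is purely one of alignment (lining up the leftmost digits rather than the ones digits) and that he otherwise adds columns and carries correctly. The function name `column_add` and its `align` parameter are illustrative inventions, not part of any curriculum.

```python
def column_add(a: str, b: str, align: str = "right") -> int:
    """Add two non-negative integers column by column, as on paper.

    align="right" lines up the ones digits (the correct method).
    align="left" lines up the leading digits (the hypothetical mistake),
    which silently multiplies the shorter number by a power of ten.
    """
    width = max(len(a), len(b))
    if align == "right":
        # Pad on the left: 37 becomes 037, so ones sit under ones.
        a, b = a.rjust(width, "0"), b.rjust(width, "0")
    else:
        # Pad on the right: 37 becomes 370, the misaligned reading.
        a, b = a.ljust(width, "0"), b.ljust(width, "0")

    digits = []
    carry = 0
    # Work from the rightmost column to the leftmost, carrying tens.
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(total % 10)
        carry = total // 10
    if carry:
        digits.append(carry)
    return int("".join(str(d) for d in reversed(digits)))


print(column_add("112", "37"))                # 149
print(column_add("112", "37", align="left"))  # 482
```

Because the misaligned version reproduces the student’s 482 exactly, a consistent wrong answer like this points to a single conceptual slip rather than careless arithmetic.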

The second weakness of summative testing comes from the kinds of questions it asks and when it asks them. When kindergarteners enter school, most believe that the world is flat. This belief, after all, is confirmed by common experience. Once a teacher explains to his students that the world is round and not flat, a student may accept this new fact (teachers are authorities, you know) but not in the way the teacher meant. If asked on a test, the kindergartener may successfully report that the earth is round, even though round to her might mean round like a pancake rather than like a ball. The trick, then, is to ask the right sort of question.

Too often summative testing (think SATs) requires a single definitive answer against which all other responses are marked wrong. Multiple-choice questions are especially bad, as the test taker may not know anything about the answer except that it has to be right in front of her. When the Princeton Review guarantees that its instructors can raise test scores simply by teaching testing techniques, they mean it. There’s big money involved. And a lot of that improvement comes simply from the format of the test.

So what should we have asked our kindergartener instead? Well, there’s nothing especially wrong with the first question. We can still ask what shape the earth is. But we need to supplement it with the sorts of questions that probe the student’s foundational knowledge: the conceptions, misconceptions, superstitions, and cultural beliefs she brings with her before she ever enters the classroom. So ask her to name another object that is the same shape as the earth and to draw it. It’s hard to hide a misunderstanding if you look for it in many different ways. For an older child, it might even be appropriate to ask her to design an experiment to confirm her answer. (How do we know that the earth is round like a ball, short of going into space to see?)

Now that we know which kinds of questions to ask, we should think about when we ask them. Timing is crucial. So much summative assessment comes at the end of a chapter. This provides context, and the context may serve as a crutch, giving a false sense of understanding. If a calculus class has just finished a section on integration by parts, there’s a good chance that every question on the test can be solved by integration by parts. Many students dread cumulative final exams. They’re harder, if for no other reason than that the questions come out of context.

In that same freshman math class, a student came up to me during office hours after the midterm exam. She explained that there was one problem that was unlike any other that they had seen and that it was totally unfair and how could the professor do such a thing and how upset she was. After a short deep-breathing exercise and one and a half cups of cold water with a wedge of lemon, she was calm enough to identify the question. It was a three-parter and went something like this:

(a) State the Rank-Nullity Theorem for linear operators.
(b) For a linear transformation A: R³ → R³ that is not the zero matrix and satisfies A² = 0, find a relationship between its image and kernel.
(c) What are the maximum possible dimensions of Image(A) and Ker(A)?

She was right. They had never discussed such a linear transformation (for the curious, such a creature is called nilpotent) in class before. They had, however, proved the Rank-Nullity Theorem. The problem above required students not only to have memorized the statement of the theorem but also to understand it well enough to apply it to a slightly new situation.
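For the curious, one way to work the problem runs as follows. This is a sketch of a standard argument, not necessarily the solution the course expected; the example matrix at the end is an added illustration, not part of the original exam.

```latex
\begin{align*}
\text{(a)}\quad & \dim \operatorname{Im}(A) + \dim \operatorname{Ker}(A) = \dim V
  \quad \text{for any linear } A : V \to V,\ V \text{ finite-dimensional.} \\
\text{(b)}\quad & y = Ax \in \operatorname{Im}(A) \implies Ay = A^2 x = 0
  \implies \operatorname{Im}(A) \subseteq \operatorname{Ker}(A). \\
\text{(c)}\quad & \dim \operatorname{Im}(A) \le \dim \operatorname{Ker}(A) = 3 - \dim \operatorname{Im}(A)
  \implies \dim \operatorname{Im}(A) \le \tfrac{3}{2},\ \text{so } \dim \operatorname{Im}(A) \le 1. \\
& A \neq 0 \implies \dim \operatorname{Im}(A) = 1 \ \text{and}\ \dim \operatorname{Ker}(A) = 2. \\
\text{e.g.}\quad & A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}:\quad
  A^2 = 0,\ \operatorname{Im}(A) = \operatorname{span}\{e_1\},\ \operatorname{Ker}(A) = \operatorname{span}\{e_1, e_2\}.
\end{align*}
```

Nothing here goes beyond the theorem itself; the only new step is noticing in part (b) that the image sits inside the kernel, which is exactly the kind of application a memorized statement alone won’t supply.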

I told the frazzled student that I recognized the question and thought it was very fair; that’s why I had written it in the first place. Neither of us was moved by the other.

The point is, it is possible to write good questions even in a summative testing environment.