On Monday, USA Today released its latest report in a series on standardized testing in American schools. The story focused on Noyes Education Campus, a PK-8 school in D.C., which had been singled out for praise by the city’s former schools chancellor, Michelle Rhee, because of a big jump in its test scores. USA Today found widespread irregularities with the tests at Noyes: a high number of students’ answer sheets for the city’s standardized tests had erasures in which the initial, incorrect answer bubbles filled in by students appeared to have been changed to the correct ones.
There was a time when standardized testing was widely seen as a necessary evil in education, if not anathema to actual learning. But in the years since the passage of No Child Left Behind, testing has come to dominate discussions of education reform and classroom priorities. (Some districts devote several dozen school days to high-stakes testing each year.) Charter schools tout their test scores in their fundraising efforts. Public schools with consistently low scores run the risk of being shut down. Several states are making “value-added” scores (an ostensible measure of how much an individual teacher improves student learning, based on hitting test score targets) a central peg in determining how much a teacher should be paid, whether she should be granted tenure, or whether she should be dismissed outright.
Dana Goldstein writes that, given the stakes, it shouldn’t be too surprising that some folks appear to have been cheating to hit their numbers.
In the social sciences, there is an oft-repeated maxim called Campbell’s Law, named after Donald Campbell, a psychologist who studied human creativity. Campbell’s Law states that incentives corrupt. In other words, the more punishments and rewards—such as merit pay—are associated with the results of any given test, the more likely it is that the test’s results will be rendered meaningless, either through outright cheating or through teaching to the test in a way that narrows the curriculum and renders real learning obsolete.
In the era of No Child Left Behind, Campbell’s Law has proved true again and again. When the federal government began threatening to restructure or shut-down schools that did not achieve across-the-board student “proficiency” on state reading and math exams, states responded by creating standardized tests that were easier and easier to pass. Alabama, for example, reported that 85 percent of its fourth-graders were proficient in reading in 2005, even though only 22 percent of the state’s students demonstrated proficiency on the National Assessment of Educational Progress, the gold standard, no-stakes exam administered by the federal government.
The stat-juking isn’t just happening on the school side, either. A Twin Cities City Pages report by Jessica Lussenhop found that the testing industry, which has tripled in size since 2002, relies on poorly trained temps to score the essay portions of standardized tests. At NCS, one of the largest testing companies, the obvious problem of subjectivity (what, exactly, makes an essay good?) was addressed by a blunt rubric that, for some reason, graded essays higher when they contained longer paragraphs. (A “5” was excellent, a “1” was poor.)
The scanned papers popped up on the screen and her eyes flitted as fast as they could down the lines. The difference between “excellent” and “good” and “adequate” was decided in a matter of seconds, to say nothing of the responses that were simply off the reservation. How do you score a kid who rails that his town sucks? What about an exceptionally well-written essay on why the student was refusing to answer the question?
There were the students who wrote extremely well but whose responses were too short—in his mind he saw them, bored with the essay topic, hurrying to finish. Or the essays where the handwriting got rushed and jumbled at the end, then cut off abruptly—he imagined the proctor telling the frantic student to lay down his pencil on a well-written but incomplete response.
And there were the kids who just did what they wanted. Like the boy from Arkansas who, instead of writing about the most fun thing to do in his town, instead wrote a hilarious essay on why his town is terrible and how he wanted to burn it down and pee on the ashes.
“I wanted the kid to get the score they deserved,” Puthoff says of his time in the business. “But they want to put them in boxes.”
Supervisors at this company were pressured to make sure the aggregated test scores resembled a bell curve. Workers alleged that when scorers doled out too many 2s or too many 5s, their supervisors simply re-graded the essays so that the scores fell in line.
So even as we use tests as the foundation for more of our education policy, can anyone even say for certain what, if anything, we’re actually measuring?