Measurement error?

Some of the most miserable experiences that I’ve ever had with staff development have been connected to reviews of standardized test results.  One session in particular has been burned into my mind because the presenter—whom I happened to admire and enjoy—spent the better part of the session battering my sixth-grade language arts teachers and me with discouraging numbers that seemed to reveal that we were the least effective teachers in the universe.

In fact, he described us in front of our entire faculty as “decidedly average” because our students were showing the least amount of growth on the state’s exam when compared to the other grade levels and departments in our school.  “You’ll notice,” he said, “that you’ve only slightly exceeded the 4-point average growth that middle grades students typically post on the reading portion of the state’s end-of-grade exams.”

Fighting back a bit, one of my peers raised her hand and asked about the measurement error on the exam, a statistic that expresses how far a student’s reported score might stray from his or her true level of achievement.  She was voicing a doubt that has crept into the minds of many educators.  “If we’re going to use these scores as an indicator of our performance, then we need to have confidence that they are reliable measures of student performance,” she explained.

After an awkward pause, the presenter explained that the measurement error on the exam stood at 3 points!

We were shocked and a bit offended all at once.  After all, how seriously could we possibly take the results of our standardized exams when average student growth statewide just barely exceeded the measurement error of the exam?  “Does that mean that when students make average growth, they may—or may not—have mastered a year’s worth of content?” my friend asked.  “Is it possible that some of the kids who failed the exam actually could have made expected growth?”

“Well, sort of,” replied the presenter.  “But you also have to remember that some of the kids who passed the exam actually didn’t make expected growth, either.”
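For anyone who wants to see the arithmetic behind our unease, here is a rough sketch in Python built on the two numbers from that session: a 3-point measurement error and a 4-point average growth.  I’m reading the error figure loosely, as “a reported growth score could plausibly sit about three points above or below a student’s true growth”; the exam’s technical documentation surely defines it more precisely than that.

MEASUREMENT_ERROR = 3  # points, the figure the presenter gave for the exam
EXPECTED_GROWTH = 4    # points, the typical statewide growth he cited

def plausible_true_growth(reported_growth):
    # The range of true growth consistent with a single reported score,
    # under the loose "plus or minus the measurement error" reading above.
    return reported_growth - MEASUREMENT_ERROR, reported_growth + MEASUREMENT_ERROR

for reported in (1, 2, 4, 6, 7):
    low, high = plausible_true_growth(reported)
    if high < EXPECTED_GROWTH:
        verdict = "clearly below the growth target"
    elif low >= EXPECTED_GROWTH:
        verdict = "clearly at or above the growth target"
    else:
        verdict = "can't tell"
    print(f"reported growth of {reported}: true growth somewhere from {low} to {high} -> {verdict}")

Run it and every reported score from one through six points comes back “can’t tell.”  A kid who posted two points of growth might actually have made a year’s worth of progress, and a kid who posted six might not have, which is exactly what my friend and the presenter were conceding to each other.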

Needless to say, we didn’t take any comfort from his explanation!  After all, these scores had just been used as a cudgel to criticize our performance. “If we’re going to get bruised over these scores,” my colleague muttered, “then I’d like it to be by evidence that is a bit more definitive than a strong maybe!”

We were all left to wonder what implications our newest discovery should have for teaching, learning, and student assessment at the school, district, and state levels.  Should my team truly embrace test results as an accurate indicator of performance, and then use those scores to make instructional decisions, when a year’s growth is only slightly higher than the measurement error?  Should we accept accountability for results on an exam that seems barely reliable?

I’m honestly confused because I’ve been trying to support standardized testing as a tool for change… but now I’m not so sure it’s even something I should consider.  “The test” has lost even more credibility in my book!