Why test scores are ALMOST useless to me

If you’ve been following the conversation on the Radical recently (see here and here), you’ll know that we’ve been collectively wrestling with the place that the magic bubbles (read: standardized test scores) should have in assessing student understandings.

One question that I haven’t had a chance to answer yet was posed by regular reader K. Borden.  She wrote:

If that rising sixth grader were entering your class, would you want only her former teachers’ observations or would you want those test results as well? Which would serve you best in providing her an opportunity to be an enthusiastic, confident and competent learner?

Good question, K, and one that has had me thinking over the past few weeks as our school’s end of grade test scores have come back from Scantron Central.

Here’s my response:  When talking about my initial attempts to identify the strengths and weaknesses of the students entering my classroom, test scores are almost completely useless to me.  I have little confidence in them as a measure of an individual child’s abilities and—given the choice—would take observations by both parents and teachers in every circumstance.

That’s not a very tempered response, is it?!

But it’s a response built from the understanding that test scores for individual children can change dramatically from one administration to the next with no apparent explanation.  I first learned this lesson while watching the testing results of a boy (let’s call him Jamison) whom I tutored several years ago.

Like many of the students I tutor, Jamison was the prototypical middle school boy.  He was active and funny, but unpredictable on a good day!  When he was on, he was brilliant—engaged in deep and meaningful conversations about world events that would force the thinking of every other child in his class.  When he was off, he’d be throwing his shoes across the room just because he thought it was funny.

When Jamison took the reading end of grade exam for the first time, his scale score, the primary indicator of a child’s progress from one year to the next, dropped by something like 13 points.  He went from a Level 3, which demonstrates grade-level mastery of material, to a low Level 2, which (no surprise to me) represents unpredictable levels of mastery.

Now here in North Carolina, students who score a Level 2 on the end of grade exams are given a retest the following week.  While teachers are able to give students remediation lessons between the first testing session and the second, there really isn’t much that anyone can do to improve a child’s reading ability in a week.

So I sat with Jamison and reminded him of how important it was to work from the beginning of the test to the end.  We also reviewed a bit of poetry, considering its status as the genre middle school boys like the least!  Then, I crossed my fingers and hoped for the best!  “Work as hard on the last reading selection as you do on the first,” I told Jamison the night before his retest.

When his results came back, Jamison’s scale scores were something like 10 points HIGHER than they’d been the year before!  In the span of one week, he’d seen a swing of over 20 points in his scores.  As a comparison, middle grades students see an average of somewhere between 3 and 7 points of growth from year to year on reading exams.
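For anyone who wants to check my math, here is a quick back-of-the-napkin sketch in Python.  Every number in it is one of the "something like" figures from Jamison's story, so treat them as rough approximations; the function names are just labels I made up for illustration, not anything our state's testing program publishes.

```python
# Approximate figures from Jamison's story; the post only gives
# "something like" values, so these are illustrations, not exact data.
FIRST_ATTEMPT_CHANGE = -13      # first attempt vs. the prior year's scale score
RETEST_CHANGE = 10              # retest vs. the prior year's scale score
TYPICAL_YEARLY_GROWTH = (3, 7)  # typical middle grades growth, in points per year


def swing(first_change, retest_change):
    """Points separating the two attempts taken one week apart."""
    return retest_change - first_change


def years_of_growth(points, yearly_growth):
    """Express a point swing as a range of 'typical years' of growth."""
    low, high = yearly_growth
    return points / high, points / low


week_swing = swing(FIRST_ATTEMPT_CHANGE, RETEST_CHANGE)
fewest, most = years_of_growth(week_swing, TYPICAL_YEARLY_GROWTH)

print(f"Swing between attempts: {week_swing} points")
print(f"Roughly {fewest:.1f} to {most:.1f} years of typical growth")
```

With those rough inputs, the swing works out to 23 points, or somewhere between three and almost eight years' worth of typical growth packed into a single week.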

20 points of academic growth in a week with little to no remediation is simply ridiculous.

Think about what kind of consequences that has for me as both a teacher and a tutor.  I’m left to wonder which of Jamison’s two scores was “the right score.”  Was it his first attempt, which saw him struggle mightily?  If so, I need to seriously look at the instructional strategies that I’m choosing for students like Jamison because something’s not working.

Or was it his second score, which saw him outperform his peers?  Because if it was, I need to seriously look at the instructional strategies that I’m choosing for students like Jamison because I’ve discovered the key to leaving no child behind!

The really sad thing is that Jamison’s story isn’t unique by any means.  Anecdotally, I see Jamisons in my classrooms and schools every single year.  Rarely do retested students see their scores drop from one session to the next, and it’s not unusual to see a child move from “struggling mightily” to “spot on” in no time.

Why does this kind of thing happen? 

My guess:  Jamison worked harder on the second test than he did on the first.  For Jamison, determination to do well, rather than pure academic ability, was the factor that most influenced his final score.

And thankfully, other really, really smart people have seen the same trend in student testing scores.  Take Malcolm Gladwell, for example.  In his newest book, Outliers, Gladwell reviews the work of Erling Boe, a researcher at the University of Pennsylvania who noticed an interesting trend while studying the TIMSS exam, a math and science test given to samples of fourth and eighth grade students in countries around the world.

Like most standardized exams, the TIMSS test begins with a survey that asks students about topics ranging from their opinions towards math and the amount of time spent on homework outside of class to the highest level of education that their parents have ever reached.

Unlike most standardized exams, however, the TIMSS survey is nothing short of grueling.

It consists of something like 120 questions.  And remember, the students taking the TIMSS test are either 9-year-old fourth graders or 13-year-old eighth graders.  (When was the last time that you asked 9-year-olds to fill out a 120-question survey?!)  As Gladwell notes, the TIMSS survey is a test of student determination.  “It is so tedious and demanding,” he writes, “that many students leave as many as ten or twenty questions blank.”

What Erling Boe discovered next, however, was the really interesting part.  As Gladwell writes:

“As it turns out, the average number of items answered on that questionnaire varies from country to country.  It is possible, in fact, to rank all the participating countries according to how many items their students answer on the questionnaire.

Now what do you think happens if you compare the questionnaire rankings with the math rankings on the TIMSS?

They are exactly the same.  In other words, countries whose students are willing to concentrate and sit still long enough and focus on answering every single question in an endless questionnaire are the same countries whose students do the best job of solving math problems.”

(Kindle Location 2972-2996)

Amazing, isn’t it?  What Boe discovered is something that every elementary and middle school teacher has known for as long as they’ve stood in front of squirmy tweens:  End of grade reading and math tests really aren’t reading and math tests at all.

Instead, they’re tests of a student’s resilience, determination, and mental stamina.

The kids who do the best on end of grade exams are like those who do the best on the TIMSS exams:  They’re willing to sit still and concentrate.  They take every question seriously—including those that come at the end of a three-hour testing session two weeks before the end of the school year.

Does that mean that kids like Jamison—who score poorly on end of grade exams—are struggling with grade level content?

The sad fact is that it’s just plain impossible to say.  Jamison could be falling behind academically.  But he might also be the kind of twelve-year-old who struggles to concentrate as he plugs through reading passages on topics that he’s not interested in or that he has no first-hand experience with.

Don’t get me wrong:  End of grade tests have their place.  They allow schools to make comparisons across large samples of students to identify trends in populations.  After looking at the results of all of our sixth graders this year, our teachers may discover that our students struggle with expository text structures or with identifying bias—and those are trends that will help us to tailor our instruction for next year.

But I simply can’t believe that end of grade tests are reliable indicators of an individual child’s academic ability when I see students raise their scores by 20 points in a week.  Such unpredictability calls into question the meaning of every score generated by the magic bubbles, regardless of whether it is higher or lower than expected.