Evaluating the evaluators

Debates continue to swirl over the use of student test scores—and any number of statistical models—to assess which teachers are effective or not. Fueled by the priorities of the U.S. Department of Education (including its Race to the Top and No Child Left Behind waiver rules), 40 states are now using some form of value-added (VA)1 or student growth (SG)2 models in their new teacher evaluation systems.

For example, Colorado’s recent legislation (SB10-191) calls for all teachers, by 2015, to have 50 percent of their effectiveness rating determined by a simple SG model (primarily using standardized tests in math and literacy), with direct consequences for pay, tenure, and whether they can remain credentialed. No doubt the vast majority of teachers seek teaching evaluations that help them improve their practice and student learning, but one Colorado educator made his concerns clear: “Good luck with….mak(ing) new professional rules for teachers based on student performance on standardized tests that have yet to be created based upon standards that have yet to be implemented on various levels of state and local governance with a contested ‘value-added’ assessment of teachers.”

Many researchers have pointed out the inaccuracy of both VA and SG models (e.g., a USDOE study found that 25 percent of teachers will be misclassified on average), but proponents of judging teachers on the basis of student tests typically respond with the cliche, “don’t let perfection get in the way of progress.”

Researchers at the Brookings Institution, well known for their enthusiasm for judging teachers on test scores, now admit that these statistical methods are not perfect, but argue that “a performance measure needs to be good, not perfect” and that “some classification errors are worse than others.”

Other researchers, like Doug Harris, warn that student tracking, especially in middle and high school, renders value-added methods highly suspect as a measure of teacher effectiveness. Dan Koretz of Harvard University has pointed out that VA model errors increase dramatically with more complex curricula (e.g., the Common Core State Standards). He also notes that teachers who are effective in one class may be ineffective in another.

Still other researchers have shown that special education and second language learner teachers are less likely to be deemed effective (even when they are) and that teachers who teach the same students over several years (commonly called looping) soon “max out” on their value-added scores and can be deemed ineffective when they are actually just the opposite.

The Brookings researchers address none of these issues.

Now, in a soon-to-be-released paper,3 Clarin Collins and Audrey Amrein-Beardsley carefully document the challenges states face in implementing new teacher evaluation systems framed by VA and SG models to assess teaching effectiveness. Take a look at some of the issues raised, after seven months of interviews with leading representatives of state education agencies charged with implementing new evaluation systems:

  1. Over 70 percent of state representatives expressed concerns about assessing student progress for teachers of non-tested grades and subject areas;
  2. Over 40 percent of state representatives noted that demographic data will be used to control for student differences and their influence on teacher effects on test score gains;
  3. Only 14 percent of state representatives expressed concerns about reliability, while only 6 percent questioned the validity of their tests to capture teacher effectiveness over time;
  4. Most strikingly, not one state (including DC) has articulated a plan for teachers to use formative assessment data.

Teachers seek better evaluation systems to improve their practice. And they have no reservations about using student learning evidence, as long as the data are sufficiently reliable and valid—and can be interpreted in their specific teaching context (such as team teaching or the quality of the test itself). In fact, a recent poll, funded by the Bill & Melinda Gates Foundation, found that teachers believe the most accurate measures of their effectiveness are shown in student engagement and academic growth data they help assemble—not test score data on a single standardized test.

Our nation’s students deserve a results-oriented teaching profession—one where teachers use multiple indicators of academic gains to determine who does well, or not, and why. The rush by some think tanks and politicians to grade teachers on the basis of tests—driven more by ideology and politics—will undermine well-intended efforts to transform the teaching profession.

The Florida Commissioner of Education, Tony Bennett, and the State Board of Education are now being sued because teachers—like Kim Cook (her school’s teacher of the year)—are being judged ineffective on the basis of test scores of students they had not even taught. It seems that progress can get in the way of the good.

Perhaps it is time to begin grading researchers who create and sell these models—and the policymakers who implement them.

1 Value-added models estimate teachers’ impacts on student growth over time, using advanced statistics (and sometimes controls) to account for student background variables that are known to influence achievement.
2 Student growth models measure academic progress on standardized test scores from one point to another in relation to a similar group of peers.
3 Collins, C., & Amrein-Beardsley, A. (2013). Putting growth and value-added models on the map: A national overview. Teachers College Record.
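
For readers who want to see the difference between the two model families defined in notes 1 and 2, here is a deliberately simplified sketch in Python. Everything in it is an assumption made for illustration: the six student records are invented, `value_added` is a bare gain-score comparison with no statistical controls, and `median_growth` ranks each student only against peers with an identical prior score. Real VA and SG models used by states are far more elaborate, so treat this as a toy, not as any state's actual method.

```python
from statistics import mean, median

# Hypothetical records: (teacher, prior_score, current_score).
# These numbers are invented purely for illustration.
records = [
    ("A", 40, 52),
    ("A", 60, 70),
    ("A", 80, 88),
    ("B", 40, 44),
    ("B", 60, 66),
    ("B", 80, 88),
]


def value_added(records):
    """Toy value-added estimate: a teacher's mean score gain minus the overall mean gain.

    Real VA models add statistical controls; this sketch has none.
    """
    gains = [(teacher, cur - prior) for teacher, prior, cur in records]
    overall = mean(g for _, g in gains)
    teachers = sorted({t for t, _ in gains})
    return {t: mean(g for tt, g in gains if tt == t) - overall for t in teachers}


def growth_percentiles(records):
    """Toy growth percentile: percent of same-prior-score peers a student matched or outscored."""
    result = []
    for teacher, prior, cur in records:
        peers = [c for _, p, c in records if p == prior]
        pct = 100 * sum(c <= cur for c in peers) / len(peers)
        result.append((teacher, pct))
    return result


def median_growth(records):
    """Toy student growth score: the median growth percentile of a teacher's students."""
    pcts = growth_percentiles(records)
    teachers = sorted({t for t, _ in pcts})
    return {t: median(p for tt, p in pcts if tt == t) for t in teachers}


print(value_added(records))
print(median_growth(records))
```

On this invented data, teacher A lands 2 points above the average gain and teacher B 2 points below, while the median growth percentiles come out to 100 and 50. With only three students per teacher, changing a single score can flip either ranking, which is a small-scale version of the reliability concerns raised above.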

1 Comment

John Eller commented on May 28, 2013 at 6:02pm:

Test scores and teacher evaluation

Hi Barnett,

Thanks for your good thoughts on this topic. I am working with principals in helping them in the area of teacher evaluation and the test score piece is an issue that still needs to be addressed. I have read much of the same information you included in the article. There are problems in using value added measures. If principals use multiple data sources that are readily available, they could get a clearer picture of their teachers and their impact on their students. I'm looking forward to reading more of your blogs.

Subscribe to Blogs by Barnett Berry
