Debates continue to swirl over the use of student test scores, and any number of statistical models, to assess which teachers are effective and which are not. Fueled by the priorities of the U.S. Department of Education (including Race to the Top and its No Child Left Behind waiver rules), 40 states are now using some form of value-added (VA)¹ or student growth (SG)² model in their new teacher evaluation systems.
For example, Colorado’s recent legislation (SB10-191) requires that, by 2015, 50 percent of every teacher’s effectiveness rating be determined by a simple SG model (primarily using standardized tests in math and literacy), with direct consequences for pay, tenure, and whether teachers can remain credentialed. No doubt the vast majority of teachers want evaluations that help them improve their practice and their students’ learning, but one Colorado educator made his concerns clear: “Good luck with… mak(ing) new professional rules for teachers based on student performance on standardized tests that have yet to be created based upon standards that have yet to be implemented on various levels of state and local governance with a contested ‘value-added’ assessment of teachers.”
Many researchers have pointed out the inaccuracy of both VA and SG models (e.g., a USDOE study found that, on average, 25 percent of teachers will be misclassified), but proponents of judging teachers on the basis of student tests typically respond with the cliché, “don’t let perfection get in the way of progress.”
Researchers at the Brookings Institution, well known for their enthusiasm for judging teachers on test scores, now concede that these statistical methods are not perfect, but argue that “a performance measure needs to be good, not perfect” and that “some classification errors are worse than others.”
Other researchers, like Doug Harris, warn that student tracking, especially in middle and high school, renders value-added methods highly suspect as a measure of teacher effectiveness. Dan Koretz of Harvard University has pointed out that VA model errors increase dramatically with more complex curricula (e.g., the Common Core State Standards). He also notes that teachers who are effective in one class may be ineffective in another.
Still other researchers have shown that teachers of special education students and second-language learners are less likely to be deemed effective (even when they are), and that teachers who teach the same students over several years (a practice commonly called looping) soon “max out” on their value-added scores and can be deemed ineffective when they are actually just the opposite.
The Brookings researchers address none of these issues.
Now, in a soon-to-be-released paper,³ Clarin Collins and Audrey Amrein-Beardsley carefully document the challenges states face in implementing new teacher evaluation systems that use VA and SG models to assess teaching effectiveness. Consider some of the issues raised in seven months of interviews with leading representatives of the state education agencies charged with implementing the new evaluation systems:
- Over 70 percent of state representatives expressed concerns about assessing student progress for teachers of non-tested grades and subject areas;
- Over 40 percent of state representatives noted that demographic data will be used to control for student differences and their influence on estimates of teachers’ effects on test score gains;
- Only 14 percent of state representatives expressed concerns about reliability, and only 6 percent questioned whether their tests validly capture teacher effectiveness over time;
- Most strikingly, not one state (nor the District of Columbia) has articulated a plan for teachers to use formative assessment data.
Teachers seek better evaluation systems to improve their practice. And they have no reservations about using evidence of student learning, as long as the data are sufficiently reliable and valid and can be interpreted in their specific teaching context (such as team teaching or the quality of the test itself). In fact, a recent poll funded by the Bill & Melinda Gates Foundation found that teachers consider student engagement and the academic growth data they help assemble, not scores on a single standardized test, to be the most accurate measures of their effectiveness.
Our nation’s students deserve a results-oriented teaching profession: one where teachers use multiple indicators of academic gains to determine who does well, who does not, and why. The rush by some think tanks and politicians to grade teachers on the basis of tests, driven more by ideology and politics than by evidence, will undermine well-intended efforts to transform the teaching profession.
Florida’s Commissioner of Education, Tony Bennett, and the State Board of Education are now being sued because teachers, like Kim Cook (her school’s teacher of the year), are being judged ineffective on the basis of the test scores of students they never taught. It seems that progress can get in the way of the good.
Perhaps it is time to begin grading researchers who create and sell these models—and the policymakers who implement them.
1 Value-added models estimate teachers’ impacts on student growth over time, using advanced statistics (and sometimes controls) to account for student background variables that are known to influence achievement.
2 Student growth models measure students’ academic progress on standardized tests from one point in time to another, relative to a group of similar peers.
3 Collins, C., & Amrein-Beardsley, A. (2013). Putting growth and value-added models on the map: A national overview. Teachers College Record.