Value-added measurement scores have become a popular way to identify effective and ineffective teachers. As the L.A. Times publishes a list of these teachers and their scores, one education expert questions the validity of the value-added data and the media’s decision to print these questionable results.
In an interview yesterday with Amanda Paulson of the Christian Science Monitor, I told her that the decision by the Los Angeles Times to publish a list of effective and ineffective teachers, based solely on the results of one researcher’s take on value-added estimates, was absolutely “god-awful.”
Don’t get me wrong: As an organization we are dedicated to advancing a results-oriented profession, which means teacher evaluation metrics must include evidence of student learning. But the approach used by the L.A. Times is rife with huge problems — most notably the fact that value-added data on teacher effectiveness is far too unstable to use as the sole arbiter of who is an effective teacher and who is not. Even the U.S. Department of Education, which is promoting the use of value-added data in teaching quality reforms, released a major research paper last month pointing to the substantial error rates associated with generating lists of good and not-so-good teachers using these methods.
This isn’t an ideological issue. Rick Hess of the American Enterprise Institute and I are frequently at odds about the best ways to assure teaching quality now and in the future. But we are in agreement on the poor judgment underlying the L.A. Times decision. Hess noted in his Education Week blog this morning that he is “increasingly nervous at how casually reading and math value-added calculations are being treated as de facto determinants of ‘good’ teaching.” And well he should be.
Elsewhere I have pointed to a lengthy list of inherent problems in using VAM to identify effective teachers — including the following realities:
- Students are not randomly assigned to teachers, and VAM measures cannot fully sort out the effects caused by differences in students’ needs or preparedness — as distinct from teacher effects.
- The lack of properly scaled year-to-year tests makes it difficult to evaluate gains at all points along the achievement continuum or to assess (for example) a physics teacher’s effectiveness based on her students’ previous scores in chemistry.
- Many students in high-needs schools are highly mobile and do not complete a full year of instruction in a given teacher’s classroom.
- Many students are taught the same subjects by more than one teacher — confounding the capacity of distant statistical models to isolate the effects from individual classroom instructors.
- Depending on which VAM statistical model a researcher uses, the same teacher can be identified as effective or ineffective.
- Researchers have found that the same teacher’s effectiveness rating can change depending on the school in which he or she teaches.
No wonder the USDOE report revealed that even when three years of VAM data are used, over 25 percent of the teachers will be “erroneously identified.” That should make any fair-minded person think twice about using such data to name names and possibly destroy careers.
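The instability at issue is easy to see with a toy simulation. This is a minimal sketch, not the USDOE’s methodology: it assumes illustrative values for the spread of true teacher effects and the year-to-year noise in value-added estimates, averages three years of noisy scores per teacher, flags the bottom fifth as “ineffective,” and then counts how many flagged teachers actually have above-median true effects.

```python
import random
import statistics

random.seed(42)

N_TEACHERS = 1000
N_YEARS = 3        # years of value-added data averaged per teacher
TRUE_SD = 1.0      # spread of true teacher effects (assumed, illustrative)
NOISE_SD = 1.5     # year-to-year noise in each estimate (assumed, illustrative)

# Each teacher has a fixed true effect drawn from a normal distribution.
true_effects = [random.gauss(0, TRUE_SD) for _ in range(N_TEACHERS)]

# Observed score: the true effect plus independent noise each year, averaged.
observed = [
    statistics.mean(t + random.gauss(0, NOISE_SD) for _ in range(N_YEARS))
    for t in true_effects
]

# Rank teachers by observed score and flag the bottom 20% as "ineffective".
cutoff = sorted(observed)[N_TEACHERS // 5]
flagged = [obs < cutoff for obs in observed]

# Count flagged teachers whose true effect is actually above the median.
median_true = statistics.median(true_effects)
false_flags = sum(1 for t, f in zip(true_effects, flagged) if f and t >= median_true)
n_flagged = sum(flagged)
print(f"{false_flags} of {n_flagged} flagged teachers are truly above the median")
```

Even with three years of data averaged together, a nontrivial share of the teachers landing in the “ineffective” bin are above the median in true effectiveness — which is exactly the kind of misclassification the USDOE paper warns about, and exactly why a published list of names is so dangerous.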
The Bill & Melinda Gates Foundation, under the auspices of its Measuring Effective Teaching project, is taking a very thoughtful approach to teacher assessment by looking at multiple measures of student achievement and linking other metrics (e.g., classroom observations, teachers’ analyses of student work and their own teaching, and levels of student engagement) to capture a more robust and accurate view of who is effective and why.
If policymakers and the media mavens really want to learn about effective teacher evaluation, it’s time they turn to the wisdom of those who actually teach — like Larry Ferlazzo, a Teacher Leaders Network member, who penned a thoughtful response in this morning’s Washington Post blog The Answer Sheet to what will surely be a journalistic debacle.
I suspect the Los Angeles Times will soon hear from many teachers branded as “ineffective” who turn out to be “false negatives” (or a Type II error in statistical lingo), and the Times editors will need to re-think their approach to reporting on the debates about effective teaching.
I’m frankly startled by the Times’ decision to publish this highly questionable data and risk humiliating many competent teachers by doing so. I wonder how journalists of national stature, who are trained to look at multiple sides of an issue and to be sure of their facts, could be so reckless.