Fresh debate about teacher evaluation

Tom Toch and Robert Rothman show admirable caution concerning the use of standardized test scores to judge teacher performance in their new report on teacher evaluation, Rush to Judgment: Teacher Evaluation in Public Education, published last week by the Washington think-tank Education Sector.

Among the reasons they cite: (1) Only about half the nation’s teachers teach subjects that are tested; (2) Standardized tests measure low-level skills (thus devaluing the teacher expertise required to move students into higher orders of learning); (3) States still largely determine acceptable performance levels on high-stakes tests and those levels vary significantly from state to state; and (4) Teachers are “dealt different hands from classroom to classroom and school to school.”

“Teachers of rich kids may do a lousy job in the classroom, but their students nonetheless may get higher test scores than their less-privileged peers,” Rothman and Toch write. “And teachers of less privileged students may do a great job, only to have their students come up short compared to students with more advantages. Evaluation systems with this unfairness built into them would create a strong incentive for teachers to abandon challenging students and the schools that enroll them.”

The authors give a nod to value-added methods that seek to measure a teacher’s impact “over the course of a school year,” attempting to adjust for socio-economic influences. They point out “two catches” — teachers who work with small numbers of students (for various reasons) and the poor data systems in most states. They don’t mention a third “catch” — the need for about three years of continuous data to make sound, defensible VAM judgments at the individual teacher level.

Given all the testing caveats, they say, “It’s not surprising…that many teachers are strongly opposed to evaluations based substantially or exclusively on student test results. So there’s an added risk to such evaluations: The people who would be subjected to them don’t think they’re credible.”

Toch and Rothman also make short work of traditional “checklist” evaluations by principals who are often poorly trained, less knowledgeable than teachers themselves about effective practice, and frequently distracted. They argue in favor of “a different solution to drive-by evaluations—comprehensive evaluation systems that measure teachers’ instruction in ways that promote improvement in teaching.” Such systems should be rooted in agreed-upon standards of professional practice, they say, citing the work of Charlotte Danielson, whose Framework for Teaching has influenced most of the attempts at developing comprehensive evaluation that have thus far emerged. “(T)he comprehensive models capture a much richer picture of a teacher’s performance,” they note. As to the expense of such models:

“It’s hard to believe that an industry that spends $400 billion annually on something as central to its success as teachers are to public education pays so little attention to the return on its investment. How can public education hope to improve teacher quality without a reliable way to measure teacher quality?”

Toch (of Education Sector) and Rothman (of the Annenberg Institute for School Reform) support the use of trained teachers as peer evaluators and note the higher quality evaluations that result when evaluators have backgrounds in candidates’ subject and grade levels. “Subject-area and grade-level specialists, scoring rubrics, evaluator training, and recertification requirements…increase the ‘inter-rater reliability’ of evaluations. They produce ratings that are more consistent from evaluator to evaluator and that teachers are more likely to trust.”

Teachers, they say, also like systems that link evaluations to classroom coaching by master and mentor teachers. “Capable people want to work in environments where they sense they matter and using evaluation systems as engines of professional improvement signals that teaching is such an enterprise. Comprehensive evaluation systems send a message that teachers are professionals doing important work.”

Rothman and Toch also challenge the argument (now being made in NYC) that measuring student achievement using standardized tests is a sufficient method of evaluation. “They aren’t great measures of student achievement. As a result, it’s important to evaluate teachers’ actual instruction—the way they work with their students in their classrooms, from their teaching techniques to the types of homework they assign.”

The authors go on to examine the cost and time arguments against more expensive comprehensive evaluation approaches, the linkages between evaluation and performance pay, and the mixed reactions to evaluation reforms by national and local unions. They acknowledge that “regardless of the evaluation system, teachers aren’t going to buy into a performance-pay system that pegs a substantial percentage of their compensation to their performance evaluations.”

They conclude: “To get a fuller and fairer sense of teachers’ performance, evaluations should focus on teachers’ instruction—the way they plan, teach, test, manage, and motivate. They need to move far beyond principal drive-bys to multiple measures, multiple evaluations, and multiple evaluators.”

Rothman and Toch recommend a “hybrid model” that combines measures of instruction and student achievement, multiple evaluations by trained peers, district evaluation teams (because principals lack time and training), the use of teacher portfolios in the evaluation process, and the use of rewards and performance pay. They also support a new definition of “highly qualified teacher” under NCLB, which they would rename “highly qualified effective teacher.” In their conception, test scores would play “a less significant role” in determining effectiveness, and the federal definition would allow states to “innovate with comprehensive standards-based approaches to teacher evaluation.”

In several places in the report, Toch and Rothman speak positively about teacher involvement in the development of teaching standards, evaluation methods and reward systems. From our perspective, involving successful teachers as full partners in every phase of development will be a critical first step in breaking the traditional logjam around evaluation reform. Put another way: If teachers don’t buy it, teachers won’t support it.

(A day after Rush to Judgment was released, Tom Toch discussed early reactions in a post on the Education Sector blog The Quick and the ED, including AFT’s comment that the report is “thoughtful and balanced.”)
