Miserable Teaching Evaluations

March 11, 2016

I have always disliked teaching evaluations, feeling that they fail to measure true teaching effectiveness. And it's not just sour grapes -- really, I swear, I generally do fine and have won several teaching awards. Rather, I simply think that teaching evaluations create bad incentives. Ask yourself: Is the behavior that maximizes teaching evaluations the same behavior that maximizes true teaching effectiveness? No way.

But it may be much worse than that. Check out the abstract below for a seminar to be presented in Penn Statistics next week by Philip Stark, a Berkeley statistician (and Associate Dean of the Division of Mathematical and Physical Sciences). Paper here.


Teaching Evaluations (Mostly) Do Not Measure Teaching Effectiveness


Joint work with Anne Boring (SciencesPo) and Kellie Ottoboni (UC Berkeley)
Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show:
· SET are biased against female instructors by an amount that is large and statistically significant
· the bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded
· the bias varies by discipline and by student gender, among other things
· it is not possible to adjust for the bias, because it depends on so many factors
· SET are more sensitive to students' gender bias and grade expectations than they are to teaching effectiveness
· gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors
These findings are based on permutation tests applied to two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and 43 SET for four sections of an online course in a randomized, controlled, blind experiment at a US university. 
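
For readers who haven't met permutation tests, here is a minimal sketch of the basic idea in Python. It is not the authors' actual procedure (the abstract does not spell it out, and their analysis is surely more careful about course, section, and student structure). It is just the textbook two-sample version: shuffle the group labels many times and ask how often the shuffled difference in mean ratings is at least as large as the observed one. The function name, the parameters, and the ratings in the usage example are all made up for illustration.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value). Hypothetical helper,
    not taken from the paper.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    # Under the null hypothesis the group labels are exchangeable,
    # so pool the ratings and reassign labels at random.
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # The +1 terms count the observed labeling itself, so the
    # estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up ratings on a 1-5 scale, purely illustrative.
    ratings_group_1 = [4.0, 3.5, 4.5, 3.0, 4.0, 3.5]
    ratings_group_2 = [4.5, 4.0, 5.0, 4.0, 4.5, 3.5]
    diff, p = permutation_test(ratings_group_1, ratings_group_2)
    print(f"observed difference in means: {diff:.3f}, p-value: {p:.4f}")
```

The appeal of this approach is that it needs no distributional assumptions about the ratings: the reference distribution comes from reshuffling the data themselves.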

