But it may be much worse than that. Check out the abstract below for a seminar to be presented at Penn Statistics next week by Philip Stark, a Berkeley statistician (and Associate Dean of the Division of Mathematical and Physical Sciences). Paper here.

# TEACHING EVALUATIONS (MOSTLY) DO NOT MEASURE TEACHING EFFECTIVENESS

**PHILIP STARK - UNIVERSITY OF CALIFORNIA, BERKELEY**

Joint work with Anne Boring (SciencesPo) and Kellie Ottoboni (UC Berkeley)

Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show:

· SET are biased against female instructors by an amount that is large and statistically significant

· the bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded

· the bias varies by discipline and by student gender, among other things

· it is not possible to adjust for the bias, because it depends on so many factors

· SET are more sensitive to students' gender bias and grade expectations than they are to teaching effectiveness

· gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors.

These findings are based on permutation tests applied to two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and 43 SET for four sections of an online course in a randomized, controlled, blind experiment at a US university.
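For readers unfamiliar with permutation tests, here is a minimal sketch of the basic idea. It is a generic, unstratified two-sample test on a difference in mean ratings, not the authors' exact procedure, and the ratings and the function name `permutation_test` are hypothetical. If the group label (say, the instructor's perceived gender) had no effect on ratings, then shuffling the labels should produce differences as large as the observed one fairly often. The p-value is the fraction of shuffles that do.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in mean ratings.

    Hypothetical helper for illustration; returns (observed_diff, p_value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Reassign labels at random, as if the label had no effect.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Count the observed labeling itself so the p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical toy ratings on a 1-5 scale (NOT data from the study).
ratings_male_label = [4, 5, 4, 4, 5, 3, 4, 5]
ratings_female_label = [3, 4, 3, 4, 3, 3, 4, 3]

diff, p = permutation_test(ratings_male_label, ratings_female_label)
print(f"observed difference = {diff:.3f}, p ~ {p:.4f}")
```

Because the test only reshuffles the observed data, it does not assume the ratings are normally distributed. That assumption would be hard to justify for ratings on a short ordinal scale.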

