Amelia, it was just a false alarm

Nah, jet fuel can’t melt steel beams
I’ve watched enough conspiracy documentaries – Camp Cope

Some ideas persist long after the mounting evidence against them becomes overwhelming. Some of these things are kooky but probably harmless (try as I might, I do not care about ESP etc.), whereas some are deeply damaging (I’m looking at you, “vaccines cause autism”).

When these ideas have a scientific (be it social or physical) basis, there’s a pretty solid pattern to be seen: there is a study, usually one that over-interprets a small sample of data, and there is an explanation for the behaviour that people want to believe is true.

This is on my mind because today I ran into a nifty little study looking at whether or not student evaluations of teaching (SETs) have any correlation with student learning outcomes.

As a person who’s taught at a number of universities for quite a while, I have some opinions about this.

I know that when I teach, my SET scores had better be excellent or else I will have some problems in my life. So I put some effort into making my students like me (trust me, it’s a challenge) and perform a burlesque of hyper-competence lest I get that dreaded “doesn’t appear to know the material” comment. I give them detailed information about the structure of the exam. I don’t give them tasks that they will hate, even when I think those tasks would benefit certain learning styles. I don’t expect them to have done the reading*.

Before anyone starts up on a “kids today are too coddled” rant, it is not the students who make me do this. I teach the way I do because ensuring my SET scores are excellent is a large part** of both my job and my job security. I adapt my teaching practice to the instrument used to measure it***.

I actually enjoy this challenge. I don’t think any of the things I do to stabilize my SET scores are bad practice (otherwise I would do them differently), but let’s not mistake the motives.

(For the record, I also adapt my teaching practice to minimize the effects of plagiarism and academic dishonesty. This means that take-home assignments cannot be a large part of my assessment scheme. If I had to choose between being a dick to students who didn’t do the readings and being able to assess my courses with assignments, I’d choose the latter in a heartbeat.)

But SETs have some problems. Firstly, there’s increasingly strong evidence that women, people of colour, and people who speak English with the “wrong” accent**** receive systematically lower SET scores. So as an assessment instrument, SETs are horribly biased.

The one shining advantage of SETs, however, is that they are cheap and easy to collect. They are also centred on the student experience, and a number of studies have suggested that SET scores are correlated with student results.

However, a recent paper by Bob Uttl, Carmela A. White, and Daniela Wong Gonzalez suggests that the correlation observed in these studies is most likely an artefact of their small sample sizes.

The paper is a very detailed meta-analysis (and a meta-reanalysis of the previous positive results) of studies on the correlation between SET scores and final grades. These studies are based on large, multi-section courses in which the sections are taught by different instructors. Each instructor’s SET score is compared with the outcomes of the students in their section (after adjusting for various differences between cohorts). Because the comparison is made between sections, the effective sample size is the number of sections, and some of these courses have relatively few of them, so the observed correlation will be highly variable.
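To see why a handful of sections is a problem, here is a small simulation sketch (my own illustration, not taken from the paper). It assumes, purely for the sake of the example, that section-level SET scores and section-level outcomes are independent standard normal draws, so the true correlation is exactly zero, and then looks at how much the sample correlation bounces around with 10, 30, and 200 sections. The helpers `pearson` and `null_correlations` are hypothetical names written for this example.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def null_correlations(n_sections, n_reps=1000, seed=1):
    """Simulated correlations when SET scores and outcomes are unrelated.

    Assumption: both section-level quantities are independent N(0, 1),
    so any correlation we see is pure sampling noise.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_reps):
        set_scores = [rng.gauss(0, 1) for _ in range(n_sections)]
        outcomes = [rng.gauss(0, 1) for _ in range(n_sections)]
        out.append(pearson(set_scores, outcomes))
    return out


for n in (10, 30, 200):
    rs = null_correlations(n)
    share_big = sum(abs(r) > 0.5 for r in rs) / len(rs)
    print(f"{n:>3} sections: sd(r) = {statistics.stdev(rs):.2f}, "
          f"share with |r| > 0.5 = {share_big:.3f}")
```

With only 10 sections, correlations stronger than 0.5 in either direction turn up fairly regularly even though nothing is going on, while with 200 sections they essentially never appear. A few small studies reporting big correlations is exactly what noise alone would produce.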

The meta-analytic methods used in this paper are heavily based on p-values, but are also very careful to correctly account for the differing sample sizes across studies. The paper also points out that if you look at the original data from the studies, some of the single-study correlations are absurdly large. It’s always a good idea to look at your data!
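As a rough illustration of what “accounting for differing sample sizes” can look like, here is a sketch of the textbook fixed-effect way of pooling correlations: transform each study’s r with Fisher’s z, weight it by n - 3 (its approximate inverse variance), average, and transform back. This is a standard approach I am assuming for illustration, not necessarily the method Uttl, White, and Wong Gonzalez actually use, and the four studies in the usage lines are made-up numbers, not data from any real study. The function names are hypothetical.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z-transform; its sampling variance is roughly 1 / (n - 3)."""
    return math.atanh(r)


def pooled_correlation(studies: list[tuple[float, int]]) -> tuple[float, float]:
    """Fixed-effect pooled correlation from (r, n) pairs.

    Each study's z is weighted by n - 3, so a study with 8 sections counts
    far less than one with 150. Returns (pooled r, standard error of the
    pooled value on the z scale).
    """
    num = 0.0
    den = 0.0
    for r, n in studies:
        if n <= 3:
            raise ValueError("each study needs n > 3")
        w = n - 3
        num += w * fisher_z(r)
        den += w
    z_bar = num / den
    return math.tanh(z_bar), 1 / math.sqrt(den)


# Hypothetical (r, n) pairs: two tiny studies with big correlations,
# two larger studies with correlations near zero.
studies = [(0.9, 5), (0.7, 8), (0.05, 120), (0.0, 150)]
naive = sum(r for r, _ in studies) / len(studies)
pooled, se_z = pooled_correlation(studies)
print(f"naive mean r = {naive:.2f}, pooled r = {pooled:.2f}")
```

The two tiny hypothetical studies drag the naive average of the correlations above 0.4, but once each study is weighted by its size the pooled estimate drops to roughly 0.05.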

So does this mean SETs are going to go away? I doubt it. Although SETs don’t measure the effectiveness of teaching, they do measure the student experience directly, and universities increasingly market themselves on the student experience. And let us not forget that, in exactly the same way that I adapt my teaching to the metrics used to evaluate it, universities will adapt to the metrics used to evaluate them. Things like the National Student Survey in the UK and the forthcoming Teaching Excellence Framework (also in the UK) will strongly influence how universities expect their faculty to teach.

Footnotes:

*I actually experimented once with assuming the students would do the reading, when I taught a small grad course. Let’s just say the students vociferously disagreed with the requirement. I’ve taught very similar material since then without this requirement (also with some other changes) and it went much better. Obviously the very small cohorts and various other changes mean that I can’t definitively say it was the reading requirement that sank that course, but I can say that it’s significantly easier to teach students who don’t hate you.

** One of the things I really liked about teaching in Bath was that another of the requirements was to make sure that a scatterplot of each student’s result in my class against the average of their marks in their other subjects that semester was clustered around the line y = x.

***I have unstable employment and a visa that is tied to my job. What do you expect?

**** There is no such thing as a wrong English accent. People are awful, and students are people.

The post Amelia, it was just a false alarm appeared first on Statistical Modeling, Causal Inference, and Social Science.