Question 3 of our Applied Regression final exam (and solution to question 2)

Here’s question 3 of our exam:

Here is a fitted model from the Bangladesh analysis predicting whether a person with high-arsenic drinking water will switch wells, given the arsenic level in their existing well and the distance to the nearest safe well.

glm(formula = switch ~ dist100 + arsenic, family=binomial(link="logit"))
            coef.est coef.se
(Intercept)  0.00     0.08
dist100     -0.90     0.10
arsenic      0.46     0.04
n = 3020, k = 3

Compare two people who live the same distance from the nearest well but whose arsenic levels differ, with one person having an arsenic level of 0.5 and the other person having a level of 1.0. Approximately how much more likely is this second person to switch wells? Give an approximate estimate, standard error, and 95% interval.
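One way to explore this question is shown below as a minimal Python sketch. It is not an official solution, and it treats the fitted coefficients above as exact. It computes the difference in predicted switching probability between arsenic levels 1.0 and 0.5 at a few hypothetical distances. It then compares that difference to the "divide by 4" rule of thumb, which bounds how much a logistic regression coefficient can change the probability. The distances are illustrative assumptions, and `dist100` is assumed to be distance in units of 100 meters.

```python
import math

# Coefficients from the fitted model above (point estimates), plus the arsenic standard error.
B0, B_DIST, B_ARSENIC = 0.00, -0.90, 0.46
SE_ARSENIC = 0.04

def invlogit(x: float) -> float:
    """Inverse logit: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def prob_switch(dist100: float, arsenic: float) -> float:
    """Predicted probability of switching under the fitted logistic regression."""
    return invlogit(B0 + B_DIST * dist100 + B_ARSENIC * arsenic)

def exact_difference(dist100: float) -> float:
    """Difference in predicted probability between arsenic 1.0 and 0.5 at a given distance."""
    return prob_switch(dist100, 1.0) - prob_switch(dist100, 0.5)

# Divide-by-4 rule: the slope of invlogit is at most 1/4, so a change of 0.5 in
# arsenic changes the probability by at most 0.46 * 0.5 / 4 (and likewise for the se).
delta_arsenic = 0.5
approx_est = B_ARSENIC * delta_arsenic / 4
approx_se = SE_ARSENIC * delta_arsenic / 4
approx_interval = (approx_est - 2 * approx_se, approx_est + 2 * approx_se)

# Hypothetical distances (assumed to be in hundreds of meters), for illustration only.
for d in [0.0, 0.5, 1.0, 2.0]:
    print(f"dist100={d:.1f}: exact difference = {exact_difference(d):.3f}")
print(f"divide-by-4: estimate {approx_est:.4f}, se {approx_se:.4f}, "
      f"interval ({approx_interval[0]:.4f}, {approx_interval[1]:.4f})")
```

The exact difference depends on the distance, because the slope of the inverse logit depends on where the linear predictor lands. The difference is largest when the predicted probability is near 0.5, and it never exceeds the divide-by-4 bound.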

And the solution to question 2:

2. A multiple-choice test item has four options. Assume that a student taking this question either knows the answer or guesses purely at random among the four options. A random sample of 100 students takes the item, and 60% get it correct. Give an estimate and 95% confidence interval for the percentage in the population who know the answer.

Let p be the proportion of students in the population who would get the question correct. p has an estimate of 0.6 and a standard error of approximately sqrt(0.5^2/100) = 0.05, using the conservative value 0.5 for sqrt(p(1 – p)).
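The short Python sketch below checks the conservative standard error against the plug-in standard error that uses the observed proportion of 0.6. Because sqrt(p(1 – p)) is flat near p = 0.5, the two values barely differ.

```python
import math

n = 100        # students in the sample
p_hat = 0.6    # observed proportion answering correctly

# Conservative standard error: sqrt(p(1-p)) is at most 0.5, attained at p = 0.5.
se_conservative = math.sqrt(0.5**2 / n)
# Plug-in standard error using the observed proportion.
se_plugin = math.sqrt(p_hat * (1 - p_hat) / n)

print(round(se_conservative, 3), round(se_plugin, 3))  # 0.05 0.049
```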

Let theta be the proportion of students in the population who actually know the answer. Based on the description above, we can write:
p = theta + 0.25*(1 – theta) = 0.25 + 0.75*theta,
thus theta = (p – 0.25)/0.75.
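This gives us an estimate of theta of (0.6 – 0.25)/0.75 = 0.47. Because theta is a linear function of p, its standard error is the standard error of p divided by 0.75: 0.05/0.75 = 0.07. An approximate 95% interval is the estimate ± 2 standard errors, which is about [0.33, 0.60].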
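The arithmetic above is short enough to reproduce in a few lines of Python. This is a sketch that follows the text's conservative standard error for p and uses the ± 2 standard error rule for the interval.

```python
import math

n = 100
p_hat = 0.6
se_p = math.sqrt(0.5**2 / n)   # 0.05, the conservative standard error for p

# theta = (p - 0.25) / 0.75 is a linear transformation of p, so the estimate
# is transformed directly and the standard error is divided by 0.75.
theta_hat = (p_hat - 0.25) / 0.75
se_theta = se_p / 0.75

# Approximate 95% interval: estimate +/- 2 standard errors.
lower, upper = theta_hat - 2 * se_theta, theta_hat + 2 * se_theta
print(f"theta: {theta_hat:.2f} (se {se_theta:.2f}), 95% interval [{lower:.2f}, {upper:.2f}]")
```

This rule works cleanly because the transformation is linear. For a nonlinear transformation, one would need something like the delta method or simulation instead.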

Common mistakes

Most of the students had no idea what to do here, but some of them figured out how to solve for theta. None of them got the standard error correct. The students who figured out the estimate of 0.47 simply computed a standard error as sqrt(0.47*(1 – 0.47)/100), as if theta were itself a directly observed proportion. Kinda frustrating. I’m not really sure how to teach this, although of course I could just assign this particular problem as homework and then maybe students would remember the general point about estimates and standard errors under transformations.
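To see how much the mistake matters, the Python sketch below puts the two calculations side by side. The "wrong" version treats theta as a binomial proportion from the 100 sampled students. The "right" version scales the uncertainty in p by 1/0.75.

```python
import math

n = 100
theta_hat = (0.6 - 0.25) / 0.75

# Mistaken approach: treat theta as if it were a directly observed
# binomial proportion from n students.
se_wrong = math.sqrt(theta_hat * (1 - theta_hat) / n)

# Correct approach: the data inform p = 0.25 + 0.75*theta, so the
# standard error of p (about 0.05) is divided by 0.75.
se_right = math.sqrt(0.5**2 / n) / 0.75

print(f"wrong: {se_wrong:.3f}, right: {se_right:.3f}")  # wrong: 0.050, right: 0.067
```

The mistaken standard error is too small by about a quarter. Guessing adds noise to each response, so the data carry less information about theta than about p.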

I’m also thinking this would be a good example to program up in Stan because then all these difficulties are handled automatically.
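The sketch below stands in for a Stan program without requiring a Stan installation. It is a grid approximation of the posterior in plain Python, which is a swapped-in technique and not Stan. It assumes a uniform prior on theta and a binomial likelihood in which each student answers correctly with probability 0.25 + 0.75*theta. The prior and the grid size are illustrative assumptions. As with Stan, the uncertainty in theta comes directly out of the model, with no hand-derived standard error.

```python
import math

# Data from question 2: 60 of 100 sampled students answered correctly.
N_STUDENTS, N_CORRECT = 100, 60
GRID_SIZE = 10_000  # assumed grid resolution

# Grid of theta values at cell midpoints, which avoids theta = 1
# (where P(correct) = 1 and log(1 - p) is undefined).
thetas = [(i + 0.5) / GRID_SIZE for i in range(GRID_SIZE)]

def log_lik(theta: float) -> float:
    """Binomial log-likelihood (up to a constant) with P(correct) = 0.25 + 0.75*theta."""
    p = 0.25 + 0.75 * theta
    return N_CORRECT * math.log(p) + (N_STUDENTS - N_CORRECT) * math.log(1 - p)

# Assumed uniform prior on theta: the log posterior is the log-likelihood plus a constant.
log_post = [log_lik(t) for t in thetas]
max_lp = max(log_post)
weights = [math.exp(lp - max_lp) for lp in log_post]  # subtract max for numerical stability
total = sum(weights)
probs = [w / total for w in weights]  # normalized posterior mass per grid cell

post_mean = sum(w * t for w, t in zip(probs, thetas))
post_sd = math.sqrt(sum(w * (t - post_mean) ** 2 for w, t in zip(probs, thetas)))

def quantile(q: float) -> float:
    """Smallest grid value whose cumulative posterior probability reaches q."""
    cum = 0.0
    for w, t in zip(probs, thetas):
        cum += w
        if cum >= q:
            return t
    return thetas[-1]

lower, upper = quantile(0.025), quantile(0.975)
print(f"posterior mean {post_mean:.3f}, sd {post_sd:.3f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

With 60 of 100 correct, the posterior mean and standard deviation come out close to the hand calculation (about 0.46 and 0.06). The slightly smaller standard deviation reflects the text's use of 0.5 rather than sqrt(0.6*0.4) in the standard error of p.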