Question 10 of our Applied Regression final exam (and solution to question 9)

Here’s question 10 of our exam:

10. For the above example, we then created indicator variables, age18_29, age30_44, age45_64, and age65up, for four age categories. We then fit a new regression:

lm(formula = weight ~ age30_44 + age45_64 + age65up)
(Intercept)     157.2     5.4
age30_44TRUE     19.1     7.0
age45_64TRUE     27.2     7.6
age65upTRUE       8.5     8.7
  n = 2009, k = 4
  residual sd = 119.4, R-Squared = 0.01

Make a graph of weight versus age (that is, weight in pounds on y-axis, age in years on x-axis) and draw the fitted regression model. Again, this graph should be consistent with the above computer output.
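One way to check that a sketch matches the output is to work out the fitted value in each age category. Because age18_29 is the omitted baseline, the intercept is its fitted weight and each coefficient is an offset from it. Here is a minimal sketch in Python (the exam itself used R); the function name and the cutoffs at exactly 30, 45, and 65 are my assumptions, read off the variable names.

```python
# Coefficients copied from the regression output above.
# age18_29 is the omitted baseline category, so the intercept is its fitted weight.
INTERCEPT = 157.2
OFFSETS = {"age30_44": 19.1, "age45_64": 27.2, "age65up": 8.5}


def fitted_weight(age):
    """Fitted weight in pounds for an age in years (a step function)."""
    # Assumption: category boundaries fall exactly at ages 30, 45, and 65.
    if age < 18:
        raise ValueError("the sample covers adults only")
    if age < 30:
        return INTERCEPT
    if age < 45:
        return INTERCEPT + OFFSETS["age30_44"]
    if age < 65:
        return INTERCEPT + OFFSETS["age45_64"]
    return INTERCEPT + OFFSETS["age65up"]


# One representative age per category.
for age in (25, 40, 50, 70):
    print(age, round(fitted_weight(age), 1))
```

So the fitted model is four flat segments, at about 157, 176, 184, and 166 pounds, rather than a single sloped line.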

And the solution to question 9:

9. We downloaded data with weight (in pounds) and age (in years) from a random sample of American adults. We created a new variable, age10 = age/10. We then fit a regression:

lm(formula = weight ~ age10)
(Intercept)    161.0     7.3
age10            2.6     1.6
  n = 2009, k = 2
  residual sd = 119.7, R-Squared = 0.00

Make a graph of weight versus age (that is, weight in pounds on y-axis, age in years on x-axis). Label the axes appropriately, draw the fitted regression line, and make a scatterplot of a bunch of points consistent with the information given and with ages ranging roughly uniformly between 18 and 90.

The x-axis should go from 18 to 90, or from 0 to 90, and the y-axis should go from approximately 100 to 300, or from 0 to 300. It’s easy enough to draw the regression line, as the intercept and slope are right there; just remember that the slope is per 10 years of age, so the line rises only 2.6 pounds per decade, from about 166 pounds at age 18 to about 184 pounds at age 90. The scatterplot should have enough vertical spread to be consistent with a residual sd of 120. Recall that approximately 2/3 of the points should fall within +/- 1 sd of the regression line in vertical distance.
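To get a feel for how much spread that is, you can simulate fake data from the fitted model and count the points near the line. This is only a sketch in Python (not the R used for the exam). It assumes normally distributed errors, which the output does not tell us, and with an sd this large some simulated weights come out negative, which real weights cannot. The seed and variable names are arbitrary.

```python
import random

random.seed(0)  # arbitrary seed, only so the run is reproducible

# Estimates copied from the regression output for question 9.
INTERCEPT, SLOPE_PER_DECADE, RESID_SD = 161.0, 2.6, 119.7
N = 2009


def fitted(age):
    """Fitted weight in pounds; the slope is per 10 years, hence age / 10."""
    return INTERCEPT + SLOPE_PER_DECADE * age / 10


# Ages roughly uniform between 18 and 90, as the question specifies.
ages = [random.uniform(18, 90) for _ in range(N)]

# Assumption: normal errors with sd equal to the reported residual sd.
weights = [fitted(a) + random.gauss(0, RESID_SD) for a in ages]

# Share of points within one residual sd of the line, in vertical distance.
within = sum(abs(w - fitted(a)) <= RESID_SD for a, w in zip(ages, weights))
frac_within_1sd = within / N
print(round(frac_within_1sd, 2))
```

The printed fraction should come out near 2/3, and a plot of weights against ages would show a cloud so wide that the slope of the line is barely visible.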

Common mistakes

Everyone could draw the regression line; almost nobody could draw a good scatterplot. Typical scatterplots were very tightly clustered around the regression line, not at all consistent with a residual sd of 120 and an R-squared of essentially zero.
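A quick back-of-the-envelope calculation shows why the two numbers go together. The sketch below (Python, not the exam's R) compares the spread of the fitted values to the residual sd. It assumes ages are exactly uniform on 18 to 90, which the question only says holds roughly.

```python
import math

# Estimates copied from the regression output for question 9.
SLOPE_PER_DECADE = 2.6
RESID_SD = 119.7

# Assumption: ages uniform on [18, 90]; a uniform's sd is (b - a) / sqrt(12).
age_sd_years = (90 - 18) / math.sqrt(12)

# Sd of the fitted values: slope (per decade) times the sd of age in decades.
fitted_sd = SLOPE_PER_DECADE * age_sd_years / 10

# R-squared = explained variance / (explained variance + residual variance).
r_squared = fitted_sd**2 / (fitted_sd**2 + RESID_SD**2)

print(round(fitted_sd, 1), round(r_squared, 3))  # prints: 5.4 0.002
```

The line accounts for a spread of about 5 pounds against residual scatter of about 120 pounds, so a tightly clustered scatterplot cannot be right.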

I guess we should have more assignments where students draw scatterplots and sketch possible data.