(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)
At the sister blog, Henry writes about poll averaging and election forecasts. He argues that “These models need to crunch lots of polls, at the state and national level, if they’re going to provide good predictions.” Actually, you can get reasonable predictions from national-level forecasting models plus previous state-level election results, and then, as the election comes closer, you can use national and state polls as needed. See my paper with Kari Lock, “Bayesian combination of state polls and election forecasts.” (That said, the method in that paper is fairly complicated, much more so than simply taking weighted averages of state polls, if such abundant data happen to be available. And I’m sure our approach would need to be altered if it were used for real-time forecasts.)
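As a rough illustration of the simplest version of this idea, here is a minimal Python sketch that combines a pre-poll forecast for one state with that state's polls by precision weighting (a conjugate normal update). It is not the method of the Lock and Gelman paper. The function names, the prior standard deviation, the "state lean," and the poll numbers are all hypothetical, and the poll variance includes only binomial sampling error, ignoring house effects and other nonsampling error.

```python
import math

def poll_variance(share, n):
    """Binomial sampling variance of a poll's two-party share.

    Assumption: simple random sampling; nonsampling error is ignored.
    """
    return share * (1.0 - share) / n

def combine_forecast_and_polls(prior_mean, prior_sd, polls):
    """Normal-normal update: precision-weighted average of a forecast and polls.

    prior_mean, prior_sd: hypothetical pre-poll forecast of one state's
        two-party vote share (e.g., a national forecast plus the state's
        previous deviation from the national vote).
    polls: list of (share, sample_size) tuples for that state.
    Returns (posterior_mean, posterior_sd).
    """
    precision = 1.0 / prior_sd**2
    weighted_sum = prior_mean * precision
    for share, n in polls:
        p = 1.0 / poll_variance(share, n)  # precision of this poll
        precision += p
        weighted_sum += share * p
    return weighted_sum / precision, math.sqrt(1.0 / precision)

# Hypothetical inputs: national forecast 0.52, state lean +0.03, two polls.
prior = 0.52 + 0.03
mean, sd = combine_forecast_and_polls(prior, 0.03, [(0.58, 600), (0.54, 900)])
print(f"posterior mean {mean:.3f}, sd {sd:.3f}")
```

With no polls the function simply returns the forecast; as polls accumulate, they dominate the prior. That is one reason the forecast alone carries most of the information early in the campaign.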
Having a steady supply of polls of varying quality from various sources allows poll aggregators to produce news every day (in the sense of pushing their estimates around), but it doesn’t help much with forecasting the actual election outcome. (See my P.S. here.)
Since 1992 (when Gary and I did our research indicating that poll movements are mostly noise), I’ve thought that the repeated-polling business model of news reporting was unsustainable, but it’s only been getting worse and worse. Maybe Henry is right that recent developments will push it over the edge.
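To see how much apparent poll movement can come from sampling alone, a quick simulation helps: hold true support fixed and draw repeated polls. This Python sketch assumes simple random sampling, a hypothetical true share of 52%, and polls of 800 respondents; `simulate_polls` is a hypothetical helper written for this illustration. Under those assumptions the standard error of a single poll is sqrt(0.52 × 0.48 / 800) ≈ 0.018, so consecutive polls can differ by a few percentage points even though nothing has changed.

```python
import math
import random

def simulate_polls(true_share, n, num_polls, seed=0):
    """Draw repeated polls of size n from a population whose support never changes."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    results = []
    for _ in range(num_polls):
        supporters = sum(1 for _ in range(n) if rng.random() < true_share)
        results.append(supporters / n)
    return results

true_share, n = 0.52, 800  # hypothetical values
se = math.sqrt(true_share * (1 - true_share) / n)  # sampling SE of one poll
polls = simulate_polls(true_share, n, num_polls=30)
swings = [abs(b - a) for a, b in zip(polls, polls[1:])]
print(f"sampling SE of one poll: {se:.4f}")
print(f"largest poll-to-poll swing: {max(swings):.3f}")
```

Every "swing" printed here is pure sampling noise, since the true share is constant by construction; real polls add house effects and differential nonresponse on top of this.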
One reason that political scientists have not generally been doing poll aggregation is that, at least for the general election for president, there’s little point in doing so. To put it another way, just about any averaging would do fine, no technology needed. Recall that Nate made his reputation during the 2008 primary elections. Primaries are much harder to predict for many reasons (less lead time, candidates with similar positions, no party labels, unequal resources, more than two serious candidates, etc.), and being sophisticated about the polls makes much more difference there.