OK, so the “occasionally other stuff” really means politics, and that’s been a major distraction for the past few weeks. The difference between the electoral-vote tallies from state-by-state polling, which gave Obama a persistent (though at times small) EV lead, and the national polling, which gave Romney a persistent (though at times small) lead, is an obvious conundrum. So I had begun to poke around in the fairly easily available data to see what biases there might be in state polling and in models such as Nate Silver’s, even before attacking Silver’s model became the new favorite pundit sport.
My first question was: How good were state polls in past elections? I didn’t find an easy way to pull together information on that, so I settled for the closely related question: “How good was Nate Silver’s model in 2008?” The shorthand answer — it missed only one state (Indiana) and one Nebraska CD — isn’t actually very helpful, since this year’s election could easily come down to one or two states.
1) Comparing Obama’s actual percentage of the vote with the 538 model’s projection for the 56 electoral-vote entities (the 50 states, plus DC, plus the 2 Maine CDs and 3 Nebraska CDs), on average 538 underestimated Obama’s vote share by 0.9%, quite comparable to the 1.1% underestimate in its national popular-vote estimate.
2) The bias was noticeably smaller (-0.07% vs. 1.32%) in the states/entities where the race was close (an Obama-McCain difference of 10 points or less) than in those where the difference was large.
3) There were a number of significant (>2%) “misses” in states where the race was predicted to be close. However, only three of these — Colorado, where the Obama vote was underestimated by 2.4%, Indiana, where it was underestimated by 2.5%, and Nevada, where it was underestimated by 8% — were in states that had been heavily polled before the election (see Silver’s discussion of the accuracy of state polling).
4) For the states predicted to be close where there was substantial polling, the 538 model was, on average, slightly closer to the final result than the polling alone — though in the one case (Indiana) in which the polling and the model predicted different winners, the polling was correct.
5) Finally, a subtle but interesting trend in the comparison: the model had a fairly strong tendency to underestimate the margin of victory of the winner of each state — i.e., it underestimated Obama’s margin in states that Obama won, and McCain’s margin in states that McCain won. The relationship was approximately linear: the underestimate increased with the candidate’s winning percentage. Could this be the result of bandwagon effects, or possibly unmodeled political polarization?
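For what it’s worth, the kind of comparison behind point 5 can be sketched in a few lines of Python. The margins below are made-up placeholders, not the actual 2008 results or 538 projections, and the helper function name is my own:

```python
# Sketch of the "winner's margin underestimate" comparison in point 5,
# using hypothetical margins (NOT the real 2008 numbers).
# Margins are Obama minus McCain, in points (negative = McCain lead).

states = {
    # state: (projected Obama margin, actual Obama margin) -- hypothetical
    "A": (2.0, 4.5),
    "B": (-8.0, -11.0),
    "C": (15.0, 19.0),
    "D": (-1.0, -1.5),
}

def winner_margin_underestimate(projected, actual):
    """Signed amount by which the winner's actual margin beat the projection.

    Positive means the state's winner did even better than projected.
    """
    winner_sign = 1 if actual > 0 else -1
    return winner_sign * (actual - projected)

# Does the underestimate grow with the winner's margin, as in point 5?
xs = [abs(actual) for (_, actual) in states.values()]
ys = [winner_margin_underestimate(p, a) for (p, a) in states.values()]

# Ordinary least-squares slope; a positive slope would reproduce the
# "underestimate increases with winning percentage" pattern.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
```

With real data one would of course fit this over all 56 entities and check the residuals, but the signed-toward-the-winner error is the key transformation.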
Details and graphs to follow.