Political season is data science season! Here is some more on Nate Silver’s forecasting methods. If you are reading this in real time (Sunday, January 31), by tomorrow night we will find out what actually happens in Iowa. I will reproduce some graphics here – these are all from the FiveThirtyEight site, so please thank me for the free advertising and don’t send me to copyright jail.
For Clinton vs. Sanders, here is Nate’s average of polls as of today. He gives more recent polls greater weighting, and also adjusts somehow for bias shown in the same polls in the past.
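To make the "greater weighting for recent polls" idea concrete, here is a minimal sketch in Python. I don't know Silver's actual weighting formula, so the exponential decay, the one-week half-life, and the three polls below are all my own made-up assumptions, and the sketch skips the adjustment for past bias entirely.

```python
from datetime import date


def weighted_poll_average(polls, today, half_life_days=7.0):
    """Average poll shares, halving a poll's weight every `half_life_days`.

    The decay rule is an illustrative assumption, not FiveThirtyEight's method.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for poll_date, share in polls:
        age_days = (today - poll_date).days
        weight = 0.5 ** (age_days / half_life_days)
        weighted_sum += weight * share
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical polls for one candidate: (date taken, percent support).
polls = [
    (date(2016, 1, 10), 45.0),
    (date(2016, 1, 20), 47.0),
    (date(2016, 1, 30), 49.0),
]
today = date(2016, 1, 31)

simple = sum(share for _, share in polls) / len(polls)
weighted = weighted_poll_average(polls, today)
print(f"simple average:   {simple:.1f}")
print(f"weighted average: {weighted:.1f}")
```

Because the made-up candidate is trending upward, the recency-weighted average lands above the simple one. That is the whole point of weighting: the newest polls pull the number toward where the race is now.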
Average of polls: Clinton 48.0% vs. Sanders 42.7%
Now, this 5.3-point gap is within the 4-6% “margin of error” reported by most polls. (I find the margins easier to find on the RealClearPolitics site, although curiously it lists margins of error for Democratic polls but not Republican ones. RealClearPolitics does a straight-up poll average without all the corrections, which today is Clinton 47.3% vs. Sanders 44%. So all the corrections don’t make an enormous difference.) I can’t easily and quickly find information on whether the “margin of error” is a standard error or a confidence interval or what, but generally when the gap between candidates is within the margin of error the media tends to report it as a “statistical tie” or dead heat. And that is exactly what they are saying in this case.
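For what it's worth, the usual convention is that a poll's reported margin of error is the half-width of a 95% confidence interval, i.e. about 1.96 standard errors of a single candidate's share. Here is a rough sketch of that arithmetic, assuming simple random sampling and a hypothetical sample of 500 respondents (I have not checked the real sample sizes). The second function is the textbook formula for the margin on the gap between two candidates in the same poll, which is wider than the margin on either share alone.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """95% margin of error, in points, for one candidate's share (in percent)."""
    p = share / 100.0
    standard_error = math.sqrt(p * (1.0 - p) / sample_size)
    return 100.0 * z * standard_error


def lead_margin_of_error(share_a, share_b, sample_size, z=1.96):
    """95% margin of error, in points, for the lead (share_a - share_b).

    Uses the multinomial variance of a difference of two proportions
    measured in the same poll.
    """
    p_a = share_a / 100.0
    p_b = share_b / 100.0
    variance = (p_a + p_b - (p_a - p_b) ** 2) / sample_size
    return 100.0 * z * math.sqrt(variance)


# Hypothetical sample size; the shares are the poll averages quoted above.
n = 500
moe_single = margin_of_error(48.0, n)
moe_lead = lead_margin_of_error(48.0, 42.7, n)
print(f"margin on one share: +/- {moe_single:.1f} points")
print(f"margin on the lead:  +/- {moe_lead:.1f} points")
```

With 500 respondents this gives about 4.4 points on a single share and about 8.3 points on the lead, so under these assumptions a 5-point lead in any one poll really is inside the noise.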
Nate Silver does a set of simulations – it sounds very complicated, but in essence I assume he takes his adjusted poll average for each candidate and some measure of spread like the standard error, then runs a whole bunch of simulations. That leads to results like this:
Based on this, Nate Silver gives Clinton an 80% chance of winning Iowa and Sanders only a 20% chance.
So what’s interesting is that you have the average of polls (48-43 or 47-44 depending on source), which everyone says is a statistical tie. You have Silver’s predicted result (50-43) based on a large number of simulations, and then you have the resulting odds considering both the predicted result and the spread in the predictions (80-20). In other words, the computer is generating random numbers and 80% of simulations end up favoring Clinton. Of course in real life the dice get rolled only once, but these odds seem pretty good for Clinton.
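Here is a toy version of that dice-rolling, just to show how a predicted result plus a spread turns into odds. It is a sketch of my guess at the general idea, not Silver's model: I assume each candidate's share is an independent normal draw, and the spread of 6 points is a number I picked because it lands near 80-20, not one taken from FiveThirtyEight.

```python
import random


def win_probability(mean_a, mean_b, spread, n_sims=20000, seed=538):
    """Fraction of simulated elections in which candidate A beats candidate B.

    Each share is drawn independently from a normal distribution with the
    given mean and standard deviation (an illustrative assumption).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins_a = 0
    for _ in range(n_sims):
        share_a = rng.gauss(mean_a, spread)
        share_b = rng.gauss(mean_b, spread)
        if share_a > share_b:
            wins_a += 1
    return wins_a / n_sims


# Predicted result 50-43, with an assumed spread of 6 points per candidate.
p_narrow = win_probability(50.0, 43.0, spread=6.0)
# Same predicted result, but with twice the assumed spread.
p_wide = win_probability(50.0, 43.0, spread=12.0)
print(f"spread 6:  A wins {p_narrow:.0%} of simulations")
print(f"spread 12: A wins {p_wide:.0%} of simulations")
```

With the narrower spread the leader wins roughly four simulations out of five. Doubling the spread without touching the predicted result drags the odds back toward a coin flip, which is why the spread matters as much as the averages do.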
Meanwhile, the Trump-Cruz contest is similarly close in the polls (30-25 in favor of Trump), but the predicted result (26-25 in favor of Trump) and odds (48-41 in favor of Trump) are much closer. From a quick glance, this appears to be because the spreads are much wider. I don’t know why that would be the case – presence of more viable candidates on the Republican side? Or maybe there is just more variability in the polls and nobody actually knows why.