There was only one prediction that we were prepared to make around 9:45 p.m. on the night of the Alabama special election in December: If Roy Moore had won, it would have been the end of our live forecast dial, affectionately or grudgingly known as the “needle.”
The Upshot writes about data and policy, and the needle is our way of following live election results and adjusting for the votes that have yet to be counted.
At that point, a victory for Mr. Moore would have been no surprise for those watching returns on television or online. He held a lead of 8.3 percentage points with two-thirds of precincts counted. But the needle was unconvinced. Doug Jones, the Democratic candidate, had a 62 percent chance to win, according to our estimates.
For many careful consumers of news and journalism, a loss for Mr. Jones would have been an incontrovertible failure for our forecast. And it would have come on the heels of the needle’s performance in the 2016 election, which was widely, though wrongly, perceived as a failure. The reaction would have been scathing.
In the end, Mr. Jones prevailed. “All hail the needle” resounded on liberal Twitter. But the needle could have very easily been “wrong,” if “wrong” meant the leading candidate went on to lose.
The reaction after Alabama weighed on our thinking as we considered whether to publish a forecast for the special election on Tuesday in Pennsylvania’s 18th Congressional District.
It was sobering to see the triumphant coverage of the needle after Alabama, especially compared with how people reacted after the presidential race. Yes, it forecast a win for Mr. Jones before many others did. But if anything, it was as close to “wrong” as it had ever been in a high-profile setting: For five minutes, it showed Mr. Jones favored to win by eight points. (He went on to win by about 1.5 points.)
By our standards, the live forecast actually performed much better on election night in 2016, when it showed Donald J. Trump had a greater than 95 percent chance of winning hours before the race was called.
Yes, that night started with Hillary Clinton as a favorite, based on pre-election polls that had given Mrs. Clinton a clear but modest lead. But the needle makes election night forecasts, not pre-election forecasts. It had to start somewhere, and it merely inherited Mrs. Clinton’s presumed edge.
The needle did its job: It swung swiftly when votes from Midwestern battleground states began to be counted. It gave Mr. Trump the edge in Pennsylvania, even when Mrs. Clinton had a double-digit lead in the state. It performed exactly as we had hoped and, frankly, if more readers and journalists were conservatives, they would have seen it just as they saw it in Alabama: as the leading indicator of a thrilling upset.
With that history, do we really want to risk being “wrong” in a lower-profile special election?
Right or wrong, we’re doing it anyway.
Why the Needle
Incomplete election results are often deeply unrepresentative. That was certainly the case in Alabama. The state’s predominantly white, rural and Republican counties were among the first to report, giving Mr. Moore a big but unsustainable lead.
Many sophisticated analysts would have recognized that Mr. Jones was poised to close the gap. Television viewers might have gotten a sense of Mr. Jones’s route to victory from on-air analysts like MSNBC’s Steve Kornacki or CNN’s John King, who pointed to the remaining Democratic vote in Montgomery and Birmingham. Network decision desks certainly knew: They have their own internal forecasts, which will probably compete with the needle one day, perhaps even as soon as this year’s midterm elections. Maybe they’ll put the needle out of business.
Online viewers looking at a traditional election results page, on the other hand, would have had no idea, unless they knew Alabama’s political geography well enough to figure it out for themselves.
Our live forecast is just a formal means to do for online viewers what analysts like Mr. Kornacki or Mr. King have been doing for television viewers for years. It looks at where votes remain to be counted, and makes an educated guess about how those votes will break based on past election results and trends evident in initial returns.
There is an argument that all of this is a waste of time: After all, we’ll know the actual result soon enough. We are sympathetic to that idea, but if people are going to follow election night results online — and they do, by the millions — they ought to have the context to understand them. In 2016, that meant letting readers know Mr. Trump was an overwhelming favorite as soon as we knew.
This time, the special election in Pennsylvania has near-optimal conditions for a needle. Election officials say that precinct-level results will be available. There is also no meaningful early vote, which can often be deeply unrepresentative. This is just about the best possible environment to make quick and relatively confident inferences about the favorite.
We do have one big concern: A county might be slow to publish precinct results, or might entirely fail to do so. If the precinct data falls too far behind the county-level results from The Associated Press, we might even have to turn the needle off. But that’s out of our hands.
Why the Needle Is Risky
Despite these advantages, the needle could easily be “wrong” tonight, at least if the measure of right or wrong is whether the leading candidate goes on to win.
The challenge boils down to two unavoidable problems: Our initial expectations for how a place might vote are never perfect, and initial results are never perfectly representative.
To understand, it’s worth taking a moment to consider how the needle works. Simply put, it estimates the final result by looking at what votes are left to be counted.
It starts with a weakly held assumption about the turnout and vote choice of every precinct or county. We call that set of assumptions a baseline. As votes are tabulated, the needle compares the returns with the prior baseline. If the reported vote share or turnout differs from our initial expectations, the needle will assume that similar outstanding areas will differ from our expectations in a similar manner.
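To make the mechanics concrete, here is a minimal sketch of that adjustment in Python. It is an illustration under simplifying assumptions, not the model we actually run: the `Area` fields, the `kind` label used to decide which areas count as "similar," and the sample numbers are all hypothetical, and it produces a single point estimate with no uncertainty attached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Area:
    name: str
    kind: str                       # hypothetical label grouping "similar" areas
    base_votes: float               # baseline turnout expectation
    base_margin: float              # baseline Dem-minus-Rep margin, as a fraction
    votes: Optional[float] = None   # actual turnout, once the area has reported
    margin: Optional[float] = None  # actual margin, once the area has reported


def project_margin(areas):
    """Project the final overall margin from partial returns."""
    # Step 1: measure how reported areas of each kind differ from the baseline.
    stats = {}
    for a in areas:
        if a.votes is None:
            continue
        s = stats.setdefault(a.kind, {"base": 0.0, "votes": 0.0, "shift": 0.0})
        s["base"] += a.base_votes
        s["votes"] += a.votes
        s["shift"] += a.votes * (a.margin - a.base_margin)

    # Step 2: keep reported areas as they are; adjust unreported areas by the
    # turnout ratio and margin shift seen in reported areas of the same kind.
    total_votes = 0.0
    net_votes = 0.0
    for a in areas:
        if a.votes is not None:
            votes, margin = a.votes, a.margin
        else:
            s = stats.get(a.kind)
            if s and s["base"] > 0 and s["votes"] > 0:
                turnout_ratio = s["votes"] / s["base"]
                shift = s["shift"] / s["votes"]
            else:
                # Nothing similar has reported yet: fall back to the baseline.
                turnout_ratio, shift = 1.0, 0.0
            votes = a.base_votes * turnout_ratio
            margin = a.base_margin + shift
        total_votes += votes
        net_votes += votes * margin
    return net_votes / total_votes


# Hypothetical returns: one rural area has reported, two areas are outstanding.
areas = [
    Area("Rural A", "rural", 1000, -0.40, votes=800, margin=-0.30),
    Area("Rural B", "rural", 1000, -0.50),
    Area("Urban C", "urban", 2000, 0.30),
]
projection = project_margin(areas)
print(f"Projected margin: {projection:+.3f}")
```

In this made-up example, the Republican leads the counted vote by 30 points, yet the projection moves from about 7.5 points Republican under the baseline to about 1 point Democratic, because the one reported rural area came in with lower turnout and a narrower Republican margin than expected.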
As the night progresses, the needle will have more and more data to work with. But early on, issues with the baseline are a big factor.
Imagine, for instance, that our baseline tonight underestimates the final performance of the Democrat, Conor Lamb, in white working-class areas but overestimates him in well-educated suburbs. If the first results are from white working-class areas, the needle will tick to the left as Mr. Lamb beats expectations. The Republican, Rick Saccone, would eventually counter in the suburbs, but at the beginning of the night there would be no way to know.
The bigger danger to the needle is representativeness: the possibility that the reported results are particularly good for one candidate, even after considering the political and demographic characteristics of the reported areas.
One obvious example is the early vote, which often reports first and leans Democratic. This problem is pretty easy to fix if the state reports the absentee or early vote separately. We used this separate data in recent elections in Virginia and Georgia’s Sixth Congressional District. Otherwise, it’s very hard to correct. Fortunately, this is not a factor in Pennsylvania.
A less obvious example is the sequencing of the results: A candidate’s best or worst areas could report first. After all, there’s always going to be somewhere with surprising results. If the worst or best news for a party comes first, the needle could mislead — even for a long time.
To some extent, this happened in Alabama. The Republicans had a big turnout problem, but for a few minutes our forecast believed it was catastrophic, since the first counties to complete their count included some of the weakest Republican turnouts of the night. Mr. Jones briefly soared in our estimate. The magnitude of our error at that point — nearly seven points — was huge, even if brief.
Why the Needle Will Be ‘Wrong’
But as perilous as Mr. Jones’s fleeting eight-point lead was for The Upshot, or as frustrating as Mrs. Clinton’s initial edge may have been for her supporters, the needle hasn’t yet faced its real worst-case scenarios.
The needle’s worst nightmare will come at the end of a night, not the beginning.
With so much data already in by that time, unrepresentative data at the end of a night can be a big surprise. It could mean that one candidate loses a projected lead in the final returns, perhaps even after leading the whole night.
Imagine, for instance, that Fayette County (where Republicans had their worst turnout in Alabama, at just 63 percent of our expectations) was the last county to finish its count, rather than the fourth earliest. And imagine that it wasn’t a small county, but a large metropolitan one, with 200,000 voters initially expected. In this hypothetical, the final turnout would be just 126,000 votes, 74,000 short of initial expectations and probably still 50,000 short of what we would have expected by the end of the night. Mr. Moore won Fayette County by a 50-point margin, so the turnout error would have been worth a net of 25,000 votes. That’s a two-point swing in the projected statewide popular vote that we would learn about only in the final moments of the race.
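The arithmetic behind that hypothetical can be checked in a few lines of Python. The statewide vote total is an assumption added here for illustration (roughly 1.35 million ballots, a figure not stated above); the other inputs come from the scenario itself.

```python
# Inputs taken from the hypothetical above.
expected_turnout = 200_000   # voters initially expected in the county
turnout_pct = 63             # actual turnout as a percent of expectations
late_shortfall = 50_000      # gap still unaccounted for late in the night
margin_pct = 50              # Mr. Moore's margin in the county, in points

final_turnout = expected_turnout * turnout_pct // 100
shortfall_vs_baseline = expected_turnout - final_turnout
net_votes = late_shortfall * margin_pct // 100

# Assumption, not from the article: an approximate statewide vote total.
assumed_statewide_votes = 1_350_000
swing_points = 100 * net_votes / assumed_statewide_votes

print(f"Final turnout: {final_turnout:,}")
print(f"Short of initial expectations: {shortfall_vs_baseline:,}")
print(f"Net votes lost to the turnout error: {net_votes:,}")
print(f"Swing in statewide margin: {swing_points:.1f} points")
```

Under that assumed total, the late surprise is worth a little under two points of statewide margin, consistent with the two-point figure above.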
If the needle lives long enough, one night it will suggest Democrats are poised to win, even as Republicans hold a lead in the tabulated vote. In the end, the Republican candidate will survive as a major urban center doesn’t yield as many votes as the needle imagined. Or alternately, the Republican will be favored all night, only to be overtaken because of an unexpectedly large Democratic turnout.
Keep these possibilities in mind if one candidate tonight has a 90 percent chance of victory at some point. You won’t have any way of knowing whether the 10 percent chance of an upset is about to become a reality; know only that such a thing could really happen. If an upset weren’t a possibility, the model would assess a 100 percent chance of victory. And if the needle were always “right,” then our probabilities would be quite wrong.
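A quick calculation shows why a well-calibrated forecast should sometimes be "wrong." The sketch below assumes, purely for illustration, that each forecast is an independent event and that a stated 90 percent chance is exactly right; the function name is hypothetical.

```python
def chance_of_at_least_one_upset(favorite_prob, n_forecasts):
    """Probability that at least one favorite loses across n independent
    forecasts, if each favorite truly wins with probability favorite_prob."""
    return 1.0 - favorite_prob ** n_forecasts


for n in (1, 5, 10, 20):
    chance = chance_of_at_least_one_upset(0.90, n)
    print(f"{n:>2} forecasts at 90 percent: {chance:.0%} chance of an upset")
```

Under those assumptions, the chance of at least one upset is about 41 percent after five such forecasts and about 65 percent after ten. A needle that never missed at 90 percent would be understating its own confidence.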
Ultimately, it is impossible to prevent the needle from being “wrong.” It is an inevitability, if the needle is put to work in enough elections. The challenge is to communicate the possibility of an upset, for as long as the possibility remains. Traditionally, we have used probabilities to do so, but we increasingly doubt whether this helps readers understand the uncertainty.
This was evident before the result of the 2016 election, so we tried something new: a jitter, where the needle quivered to reflect the uncertainty around the forecast. Although many readers disliked it, the jitter reflected an earnest attempt to give tangible meaning to abstract probabilities. Nonetheless, we turned the jitter off for all of our 2017 forecasts.
Tonight, readers will have the option to turn the jitter off. We expect that some readers will opt to do so, but remember this: Switching it off only hides the uncertainty — it doesn’t make it go away.
Ultimately, it’s not over until the race is called.