Saturday, February 27, 2010
Trouble, Trouble for the Democrats in the United States Senate
Sunday, February 21, 2010
Republican Blizzard on the Generic Ballot
Wednesday, February 17, 2010
Collaboration allows all of us to be better informed
In my post on Strategic Vision, I asked anyone who had more polls than I had archived to come forward. Professor Michael Weissman, the man responsible for the fantastic Fourier analysis exposing Strategic Vision, did just that. He wrote a comment on this blog and sent me an email informing me that my 209 polls were fewer than the number he had worked with in his analysis. Professor Weissman was kind enough to send me his file of polls so that I could see why this difference existed.
It turns out that I had missed 17 polls. Most of those were in the state of Florida where I missed a total of 13 polls. The large majority of those Florida polls were from 2007. I thank Professor Weissman for allowing me to discover my errors and share them with you.
The other great part of Professor Weissman's initiating a conversation with me is that I was able to share with him a new finding, which I can now share with you. As Mark Blumenthal (my boss for the spring and summer) has previously pointed out, Strategic Vision conducted polling during the 2004 election cycle. Nate Silver's initial analysis of Strategic Vision's possible fraud, Michael Weissman's follow-up, and the polling list I provided you only included polling from 2005 onward. Neither Weissman's son Jonathan nor I had found any links that allowed us to access polls from 2004.
This afternoon, after receiving Professor Weissman's email, I decided to try one last time to find links that would let us secure Strategic Vision polling information from 2004. Although most of the links I could find are now "dead", I was able to find what I call a "golden key".
In an effort to give their readers an easy-to-read comparison of 2004 "swing state" "polling" over time, Strategic Vision created one page with data from 9 polls in each of 8 states (a total of 72 polls), with 10 questions from each poll (a total of 720 poll questions). While I know more 2004 data exists that we cannot currently access, this newly discovered data provides information that was not previously available.
I hope these new polls allow Professor Weissman and others to perform exciting new statistical analyses. More than that, I love how our collaboration allowed both of us, and you, to access as much information as possible.
The new polling file including Professor Weissman's polls, the polls I missed, and the original file are available here. An archived version of the 2004 data is available here.
P.S. As always, please let me know if you have polling data from Strategic Vision that I did not include in these files.
Tuesday, February 16, 2010
A Call for Standards on Wikipedia
An example of why firsthand information should be used was demonstrated on Monday in a post about Evan Bayh's retirement by one of the coolest kids on the block, Mr. Nate Silver. The post compares the Cook Partisan Voting Index (which measures how Democratic or Republican a state or Congressional District is compared to the country as a whole) to DW-Nominate scores (which measure how conservative or liberal a Congressman's or Senator's voting record is). Silver was basically trying to show that Bayh is more liberal than one would expect an Indiana (or Hoosier) Senator to be.
In the post, Silver shows his tendency, like many others, to cite Wikipedia almost obsessively. In the article I linked to, four of the five citations are to Wikipedia. I am sure most of the information in these articles is accurate, but one of them, the article on the Cook Partisan Voting Index (CPVI or PVI), had one of the aforementioned warning flags at the top. In fact (see the video posted below), I could find only one link in the article on PVI. The page links to the House PVI rankings after the 2008 elections. Not surprisingly, the House rankings that appear on Wikipedia seem accurate.
But no link exists for the State PVI rankings, which are what Silver is using. I was going to work on my own piece using the PVI, so I decided to check the numbers that appeared on Wikipedia against the original source. It turns out the Wikipedia page is inaccurate.
Twenty-six of the 50 states have inaccurate PVI figures. On average, the Wikipedia value differs from the actual PVI by about 0.62 points when Cook's website calculations are taken to the tenths place, and by about 0.64 points when they are rounded to whole numbers. At whole-number precision, 21 states differ by 1 point, 4 differ by 2 points, and 1 differs by 3 points. Note that Silver's Indiana is at R+5, when in fact it should be at R+6 (R+6.2 to the tenths place).
Considering Silver's entire article is based on PVI, these errors could definitely affect his findings. I would re-run Silver's numbers, but I am not certain how he arrived at them; his description of his method is a bit unclear. This post will be edited if I or someone else (perhaps Silver himself) can figure out exactly what he did. I have, however, reproduced at the bottom of this page part of his graph showing the relationship between PVI and DW-Nominate scores.
Of course, the post would not need correcting if Silver had checked the firsthand source. Silver is certainly not the only one who links to Wikipedia (heck, I have cited it once on this blog before). I am just using his post (because his blog is read by so many people) as an example of what can go awry when you do not check your sources.
I really hope we will all simply verify anything we find on Wikipedia. It is the right and academic thing to do. See for yourself how easy it is to make an error by citing Wikipedia, and how easy it is to avoid one.
The District of Columbia, which is not used in Silver's analysis, is not included in the tables, but its correct PVI of D+40.7 differs from the D+39 shown on Wikipedia.
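The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for this post: it assumes both sources are stored as dictionaries of signed scores (Republican leans positive, Democratic leans negative), that Wikipedia lists whole numbers, and that Cook's tenths round half away from zero (Cook's own rounding rule is an assumption). The function names are hypothetical. The only values taken from this post are Indiana's; the other states would have to be filled in from Cook's website.

```python
import math
from collections import Counter


def round_half_away(x):
    """Round to the nearest whole number, rounding .5 away from zero (assumed rule)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def compare_pvi(wikipedia, cook):
    """Compare Wikipedia PVI values with Cook's published values.

    Both arguments map state name -> signed PVI (R+ positive, D+ negative).
    Returns a tuple of:
      - number of states that differ at whole-number precision,
      - mean absolute difference using Cook's tenths,
      - mean absolute difference using Cook rounded to whole numbers,
      - Counter of the whole-number differences among mismatched states.
    """
    states = sorted(set(wikipedia) & set(cook))
    if not states:
        raise ValueError("no states in common")
    tenths = [abs(wikipedia[s] - cook[s]) for s in states]
    whole = [abs(wikipedia[s] - round_half_away(cook[s])) for s in states]
    mismatches = Counter(d for d in whole if d != 0)
    return (
        sum(mismatches.values()),
        sum(tenths) / len(states),
        sum(whole) / len(states),
        mismatches,
    )
```

With just Indiana, `compare_pvi({"Indiana": 5}, {"Indiana": 6.2})` reports one mismatched state, a one-point whole-number difference, and a difference of about 1.2 points at tenths precision. Keeping the tenths and whole-number averages separate mirrors the two figures (0.62 and 0.64) quoted above.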
To interpret the table and graph:
Negative PVI scores indicate a state that was more Democratic on average than the country as a whole over the last two presidential elections (2004 and 2008), while negative DW-Nominate scores indicate a Senator with a more liberal voting record. One would expect Senators from Republican states to have higher DW-Nominate scores and Senators from Democratic states to have lower ones. Feel free to play around with the correct PVI data.
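Because the tables use signed numbers while Cook publishes labels such as R+6.2 or D+40.7, a small converter helps when playing with the data. This is a hypothetical helper written for illustration; the label format it accepts (including an "EVEN" label for a perfectly balanced state) is an assumption about how the scores are written, and it follows the sign convention described above.

```python
import re

# Matches labels like "R+6.2", "D+40.7", or "d + 3" (case-insensitive).
_PVI_LABEL = re.compile(r"^\s*([RD])\s*\+\s*(\d+(?:\.\d+)?)\s*$", re.IGNORECASE)


def pvi_to_signed(label):
    """Convert a Cook-style PVI label to a signed number.

    Democratic leans become negative and Republican leans positive,
    matching the tables here. An "EVEN" label maps to 0.0.
    """
    if label.strip().upper() == "EVEN":
        return 0.0
    match = _PVI_LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized PVI label: {label!r}")
    party, value = match.group(1).upper(), float(match.group(2))
    return value if party == "R" else -value


print(pvi_to_signed("R+6.2"))   # 6.2
print(pvi_to_signed("D+40.7"))  # -40.7
```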
Wednesday, February 10, 2010
Strategic Vision Not So Strategic After All
In an interesting epilogue to the entire affair, Strategic Vision, LLC, has requested that the Internet Archive remove polling data from 2005-2007 that was previously available there. Mark Blumenthal of Pollster.com (and, in full disclosure, my future boss) has rightly noted that "given the swirl of accusations about Strategic Vision arising from a failure to disclose basic information about their methods, this new effort to scrub previously disclosed information from what is essentially a public library for the Internet is more than a little creepy." Blumenthal tells us that the son of one of those who completed a statistical analysis indicating possible fraud by Strategic Vision, LLC "realized that Strategic Vision might delete their archive, and thus downloaded everything he could before it disappeared. So the archived pages live on..."
But I have my own twist to add to the story: the web pages can still be accessed online right now, even without the Internet Archive! That's right, you can still access any of the original poll data from 2005-2009 on Strategic Vision, LLC's website. How? It turns out that, although the site no longer has a single page displaying the polling data from 2005-2007 (it does for 2008 and 2009), one can still retrieve the original individual pages on which the polling data were displayed. In what can only be deemed one of the WORST coverups of all time, Strategic Vision, LLC left the individual polling pages on its servers.
All you need to access the data is the original link to any poll. Those links are easily available from polling aggregation sites such as RealClearPolitics.com, Pollster.com, and even Wikipedia.org. To see how easy it is to get the original data, you can watch the video below.
It takes a lot of time to visit what turned out to be 209 pages (208 polls) of data since 2005, and I am afraid that Strategic Vision, LLC may take the pages down. So I have downloaded every single poll file from 2005-2009 and uploaded them in a single zip file for anyone to download.
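For anyone who wants to reproduce the archive, the download-and-zip step can be sketched with Python's standard library. This is a hypothetical script, not the one used to build the zip file here: the poll URLs must be gathered by hand (for example from RealClearPolitics.com or Pollster.com), and the file-naming scheme and the `fetcher` parameter are assumptions made so the download step can be swapped out or tested offline.

```python
import urllib.request
import zipfile
from pathlib import PurePosixPath
from urllib.parse import urlparse


def fetch(url, timeout=30):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def archive_pages(urls, zip_path, fetcher=fetch):
    """Store the content of every URL in one zip file.

    Each page is saved under the last segment of its URL path, prefixed with
    its position in `urls` so that names stay unique. Pages that fail to
    download (network errors are OSError subclasses) are skipped and
    returned so they can be retried later.
    """
    failed = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for index, url in enumerate(urls):
            name = PurePosixPath(urlparse(url).path).name or "index.html"
            try:
                data = fetcher(url)
            except OSError:
                failed.append(url)
                continue
            archive.writestr(f"{index:03d}_{name}", data)
    return failed
```

Calling `archive_pages(poll_urls, "polls.zip")` with a hand-collected list `poll_urls` (a hypothetical name) returns the links that could not be fetched rather than aborting the whole run, so a second pass can retry just those.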
P.S. If you believe a poll is missing, please just leave a comment :)
Sunday, February 07, 2010
Another Argument Against IVR Polling Falls Flat
Democratic pollster Mark Mellman (who has more polling experience than I could dream of having) points to three elections in which he believes IVR polls were incorrect until the final round of polls. Of course, as Mellman himself notes, there is no way of knowing whether a poll is right two months before an election. Instead, we must play a sort of guessing game about what "seems" more realistic. While I think this game leaves open way too much room for interpretation, I'll play along.
Mellman's first case is the recent Massachusetts Special Senatorial Election, in which now-Senator Scott Brown came from behind to defeat Attorney General Martha Coakley. In that election, a University of New Hampshire poll taken January 2nd through the 6th and Mellman's own poll taken the 8th through the 10th showed Coakley leading by 17 and 14 points, respectively. These two live interviewer polls differed significantly from three IVR polls taken during the same period. A Rasmussen poll taken January 4th, a Public Policy Polling poll taken the 7th through 9th, and a Rasmussen poll taken the 11th showed a Coakley lead of 9, a Brown lead of 1, and a Coakley lead of 2, respectively. The IVR polls clearly show movement towards Brown during the week of the 3rd, while these live interviewer polls showed no such trend. The live interviewer polls did not show a trend until the following week, when Mellman's own final poll before the January 19th Special Election predicted, as did the final Public Policy Polling poll, a 5-point Brown victory.
Obviously, these two sets of polls cannot both be right. Mellman believes that "given the timing of ads and the feel on the ground, our story strikes me as more plausible". Is this story really more "plausible"? Well, we know that an internal Republican poll from all the way back in mid-December showed Brown down only 13. Brown, along with special interest groups, had aired television ads (I saw them on TV in New Hampshire) and campaigned across Massachusetts during this period and into early January, while Coakley was on vacation. I would think this campaigning would have cut the lead. Furthermore, an internal Coakley poll conducted by Celinda Lake showed Brown cutting Coakley's lead from 15 on January 2nd-4th to 5 on the 9th-11th.
You get that? Coakley's own internal (live interviewer) poll differed by 9 points from Mellman's poll taken over pretty much the same period. A Suffolk University poll, also live interviewer, taken in the three days immediately after Mellman's poll showed a 4-point Brown lead. In other words, if we believe that only live interviewer polls show the true story, Brown saw an 18 (yes, 18) point bounce in a matter of 3 days. So despite all of Brown's advertising in the month prior, something (of unknown origin) broke in that three-day period to give Brown a 4-point lead? Would it not make more sense that Brown slowly but surely chipped away at Coakley's lead (as Coakley's own live interviewer internal polling somewhat supports) as his message took hold? Could it be that Mellman's own polling was wrong? I think that explanation makes a lot more sense. Thus, it was not so much a matter of IVR vs. live interviewer polls as a battle between polls that were right and polls that were wrong, and the wrong ones happened to be mostly live interviewer polls in this instance. A look at the graph to the right shows that, if nothing else, many IVR polls differed little from live interviewer polls.
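The arithmetic behind the 18-point "bounce" and the 9-point gap between Mellman's and Lake's polls is easy to check. The sketch below assumes margins are recorded as signed Coakley leads; the constant and function names are illustrative labels for the polls cited above, not anything the pollsters published.

```python
# Signed Coakley margins in points (positive = Coakley ahead, negative = Brown ahead).
MELLMAN_JAN_8_10 = 14      # Mellman, live interviewer: Coakley +14
LAKE_JAN_9_11 = 5          # Coakley internal by Celinda Lake, live interviewer: Coakley +5
SUFFOLK_NEXT_3_DAYS = -4   # Suffolk University, live interviewer: Brown +4


def shift_toward_brown(earlier, later):
    """Points the race moved toward Brown between two signed margins."""
    return earlier - later


def disagreement(margin_a, margin_b):
    """Size of the gap, in points, between two polls' margins."""
    return abs(margin_a - margin_b)


print(shift_toward_brown(MELLMAN_JAN_8_10, SUFFOLK_NEXT_3_DAYS))  # 18
print(disagreement(MELLMAN_JAN_8_10, LAKE_JAN_9_11))              # 9
```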
As for the other two races Mellman cites to support his point, I see a selective use of data. In the 2006 Washington Senate race, Rasmussen polling showed Democratic candidate Maria Cantwell's one-time mid-teens lead drop to the mid-single digits throughout the summer, while Mellman's own polling showed a relatively consistent high-teens to low-twenties lead for Cantwell. (Mellman's poll is not shown in the table because he did not release the exact dates on which it was conducted.) Mellman asks whether we should believe a "big initial lead that narrowed somewhat as the campaign engaged, or the bottom suddenly falls out for Cantwell for no discernible reason, but she recovers her advantage after both sides hit the airwaves?". I'm inclined to agree with Mellman that his own polling was right, but this was not a matter of IVR vs. live interviewer.
The live interviewer firm Elway Research showed Cantwell's lead dropping from 29 in April to 14 in July, nearly equal to Rasmussen's 11-point lead in July. While Rasmussen's poll showed that margin shrinking to 6 in August, SurveyUSA (an IVR firm) showed Cantwell with a comfortable 17-point lead. Cantwell's SurveyUSA lead shrank to around 10 points as the race entered September and October, but her lead also dropped to this level in live interviewer polls conducted by Mason-Dixon. In other words, the polls may have been inconsistent in Washington, but this inconsistency occurred across both IVR and live interviewer polls.
In the Connecticut 2006 Senate race, we see more of this selective use of data. Mellman points out that Rasmussen's polls showed that Lamont and Lieberman were "tied in July... and were still neck-and-neck in September... [and that when] Rasmussen called the race even, Quinnipiac gave the incumbent a 24-point edge. September found the challenger [Lamont] closing the gap to 10 points, a lead Lieberman held through Election Day".
The problem with claiming that live interviewers showed a more realistic picture is that, like Rasmussen, the live interviewer firm American Research Group also found Lieberman (the independent) and Lamont (the Democrat) statistically tied in August and September. At the same time, IVR pollster SurveyUSA gave Lieberman a 13-point lead in early September, equal to the Quinnipiac poll's margin. It was not that the IVR polls were in one camp and live interviewers in another. You can find (see the table above from Pollster.com) both live interviewer and IVR polls that tell the "more realistic" and the "less realistic" story.
In all three of these races, "unrealistic" polls were unrealistic because of something other than being IVR or live interviewer polls. Those who continue to try to find fault with IVR polls will have to look for reasons other than their showing an "unrealistic" picture.
And truthfully, I think those reasons are getting harder and harder to find every day.
Wednesday, February 03, 2010
Brady still looks like a winner
First, the most obvious mistake was that there were still some votes to be counted downstate, where Brady was performing better than Dillard. Now, with most of those votes in, Brady's lead has gone up to 751 votes. With 96 of the 97 remaining precincts in Cook County, will Dillard really close the gap by some 500 votes? I doubt it.
Silver's projection is based on the Associated Press count, which does not differentiate between Chicago (within Cook County) and suburban Cook County. Of the 96 precincts left to be counted in Cook, 72 are in Chicago. Why does this make a difference? The vote from Chicago is not only significantly less Republican (that is, fewer votes there are cast in the Republican primary), but Brady is also performing slightly better relative to Dillard in the city than in the suburbs.
Out of the 2501 precincts counted in Chicago, Dillard has 5,269 votes and Brady has 1,778. If we project the remaining 72 precincts based on this count, Dillard would pick up 152 votes and Brady would pick up 51 votes.
Out of the 1913 precincts counted in suburban Cook, Dillard has 23,315 votes and Brady has 6,272. If we project the remaining 24 precincts based on this count, Dillard would pick up 293 votes and Brady would pick up 79 votes.
Combining Chicago and suburban Cook, Dillard picks up about 445 votes and Brady picks up 130. That would cut Brady's lead by 315 votes, but he would still lead by 436 votes.
Will this lead hold up? Absentee ballots can still come in (up to 2 weeks after the election), and provisional ballots must still be counted. After that, and after a random check of some of the results, the vote will be certified. As long as Brady's lead remains above, say, 150 votes after the additional absentee and provisional ballots and the recheck (he could lose 66% of his projected lead and still clear that mark), he is almost assured of victory even if Dillard asks for a recount.
Consider the recent recount in Minnesota, which, like Illinois (save a few machines), uses optical scan ballots. In that recount, Al Franken gained 527 votes. Of course, in that election the two candidates had a little more than 2.4 million votes between them. In this primary, the two leading candidates have only about 310,000 votes combined. That's a ratio of about 7.75 to 1 between Minnesota and Illinois. If you scale the Minnesota recount gain down by this ratio, Dillard might gain about 68 votes in a recount. Even if we double that gain, he would still come up short. Of course, Dillard may also lose votes in a recount; we really do not know where the votes would go. Finally, the people most likely to cast overvotes or undervotes missed by the machines are minorities and young first-time voters. Unlike the 2008 Minnesota Senate election (where plenty of these voters existed), Republican primaries are not exactly a breeding ground for young or minority voters.
In other words, it is a VERY uphill climb for Dillard. Let the votes be counted, but at the end of the day I think Bill Brady is going to be the Republican nominee for governor.
EDIT (3:40 EST): The vote from suburban Cook has come in... Dillard picked up 359 votes (66 more than projected based on the other precincts), while Brady picked up 117 votes (38 more than projected). That cuts Brady's lead to 508 votes. Of the remaining 73 precincts to be counted, 72 are in Chicago. Dillard would have to win 5 times the margin projected from the already-counted precincts to pull even.
EDIT (4:40 EST): The complete vote from Chicago is apparently now in. Dillard picked up an additional 115 votes (36 fewer than predicted), while Brady picked up 39 votes (12 fewer than predicted).
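The projection above simply assumes each uncounted precinct will look like the average counted precinct in the same area. Here is a minimal Python sketch of that calculation, using the precinct and vote counts quoted in this post; the function and variable names are hypothetical, and gains are rounded per area to match the hand calculation.

```python
# (precincts counted, precincts remaining, Dillard votes so far, Brady votes so far)
AREAS = {
    "Chicago": (2501, 72, 5269, 1778),
    "Suburban Cook": (1913, 24, 23315, 6272),
}


def project_area(counted, remaining, votes_so_far):
    """Votes a candidate is projected to add in one area's uncounted precincts.

    Assumes each remaining precinct gives the candidate the same average
    number of votes as the precincts already counted in that area.
    """
    return round(votes_so_far / counted * remaining)


def project_final_lead(brady_lead, areas):
    """Return (Dillard gain, Brady gain, projected Brady lead)."""
    dillard_gain = sum(project_area(c, r, d) for c, r, d, _ in areas.values())
    brady_gain = sum(project_area(c, r, b) for c, r, _, b in areas.values())
    return dillard_gain, brady_gain, brady_lead - dillard_gain + brady_gain


print(project_final_lead(751, AREAS))  # (445, 130, 436)
```

Splitting Cook into Chicago and the suburbs is the whole point: a single county-wide rate would overstate how many Republican primary votes remain in the city.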
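The recount extrapolation is a simple proportional scaling, sketched below under the same rough assumption the paragraph makes: that a recount's net shift grows in proportion to the number of ballots cast. The vote totals are the approximate figures quoted above, the 436-vote lead and 150-vote cushion come from the previous paragraphs, and the helper names are hypothetical.

```python
# Approximate figures quoted in the post.
MINNESOTA_VOTES = 2_400_000   # Franken + Coleman, "a little more than 2.4 million"
MINNESOTA_GAIN = 527          # Franken's net gain in the 2008 recount
ILLINOIS_VOTES = 310_000      # Brady + Dillard, "about 310,000"


def scaled_recount_gain(votes, ref_votes=MINNESOTA_VOTES, ref_gain=MINNESOTA_GAIN):
    """Scale a reference recount's net gain by the ratio of total votes cast."""
    return ref_gain * votes / ref_votes


def share_of_lead_lost(lead, floor):
    """Fraction of `lead` that can be lost while staying at or above `floor`."""
    return (lead - floor) / lead


gain = round(scaled_recount_gain(ILLINOIS_VOTES))   # about 68 votes
doubled = 2 * gain                                   # still well short of a 436-vote lead
cushion = share_of_lead_lost(436, 150)               # roughly 0.66
```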
All together that gives a final margin of 433 votes to Bill Brady... which is 3 votes off my estimate.
EDIT (7:55 EST): Apparently one precinct had yet to report (I should have paid closer attention to the post I linked to)... With the complete count in, Dillard actually picked up 156 votes in Chicago (4 more than predicted), while Brady picked up 53 votes (2 more than predicted).
In total that gives Dillard 516 more and Brady 170 more votes than they had before Cook County completed its count. That means Dillard made up 346 votes... leaving him 405 votes behind. Thus, Dillard performed better (relative to Brady) than I calculated thanks to a stronger performance in suburban Cook. Total error was 31 votes.
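Checking the projection against the final count is a matter of applying both sets of gains to the 751-vote lead. This sketch uses the combined Cook County totals reported in the edits above; the variable and function names are illustrative.

```python
def final_lead(lead_before, dillard_gain, brady_gain):
    """Brady's lead after adding each candidate's newly counted votes."""
    return lead_before - dillard_gain + brady_gain


LEAD_BEFORE_COOK = 751
projected = {"Dillard": 445, "Brady": 130}   # from the precinct projection
actual = {"Dillard": 516, "Brady": 170}      # totals reported in the final edit

projected_lead = final_lead(LEAD_BEFORE_COOK, projected["Dillard"], projected["Brady"])
actual_lead = final_lead(LEAD_BEFORE_COOK, actual["Dillard"], actual["Brady"])

# Dillard beat his projection by 71 votes and Brady by 40, so the net miss is 31.
miss = {name: actual[name] - projected[name] for name in actual}
error = abs(projected_lead - actual_lead)
```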
Brady will hold on in Illinois Gov
As I illustrated (oddly enough) on election night in Massachusetts, I look for numbers wherever I can get them... Silver's post http://www.fivethirtyeight.com/2010/02/recount-all-but-assured-in-illinois.html (probably because it was 2:30 a.m. EST...) doesn't differentiate between suburban Cook County and Chicago within Cook County. The latter, of course, is slightly (but in this case significantly) more Democratic (David Wasserman noted this earlier this evening)... Ergo, fewer Republican votes are out there.
It turns out that of the 96 precincts now left to count in Cook County, some 72 are in Chicago. How do I know this? Because, as I mentioned, I soak this stuff up and know the Chicago Board of Elections publishes its own results: http://www.chicagoelections.com/results/race_results.php?id=1236
Both Brady and Dillard are running just terribly in Chicago... and Dillard is running worse relative to Brady in the city than in the county as a whole. I'm calculating that instead of the 12.2 votes per precinct he gets in Cook (outside Chicago), Dillard is garnering only 2 votes per precinct within the city. Brady, on the other hand, is garnering about 3 1/2 votes per precinct in non-Chicago Cook, but only about 7/10 of a vote per precinct within the city.
The ratio of Dillard to Brady votes outside the city (but within the county) is something like 3.5 to 1, but within the city it is only something like 2.9 to 1. Combine that with the fact that both are simply getting fewer total votes within the city proper, and I find that Brady is more likely to lead by about 195-200 votes, not the 1 vote that Silver projects, when all is said and done.
Considering that I believe Illinois uses mostly optical scan ballots (http://www.elections.state.il.us/VotingInformation/VotingEquip.aspx), my guess, based on some quick and VERY rough calculations using the last recount I know of that involved optical scan ballots (Minnesota), is that after the recanvass (which I assume would take place; if not, the point is even stronger) and a full recount, the difference between a 200-vote lead and a 1-vote lead would be HUGE. If the lead is, in fact, 200 votes, I estimate that Dillard could pick up a net 115-120 votes at most (unless something is really funky). Combine that with the fact that the areas where undervotes most often happen (less educated and minority neighborhoods) aren't really casting that many votes, and I have to think Dillard's chances of closing the gap are VERY small.
All this is part guesswork, part math, but there it is...
Tuesday, February 02, 2010
It's Primary Time! Illinois Style
Believe it or not, I think the Democratic Gubernatorial contest will probably be called relatively early tonight. The trend lines are all in Hynes' direction. He has raised Quinn's unfavorables with a host of negative ads that make voters question Quinn's handling of the state's inmate release program and budget. Unfortunately for this prognosticator, the polls may be missing a lot of action.
In the final week of the campaign, Hynes released an ad featuring the venerable Harold Washington, Chicago's first black Mayor, lambasting Quinn for incompetence in Washington's administration. Quinn, who is relying on strong African-American support, fired back with accusations about Hynes' own fumbling of the Burr Oak Cemetery disaster (a sore point for black voters). The only polls taken after the ads' release, one from Public Policy Polling and one from Rasmussen, suggested that Hynes is still rising, but both were taken just after the ads' release. I am not confident that these polls fully registered the firestorm caused by the ads. The polls also missed what was arguably Quinn's strongest debate performance, on Friday.
All that said, one must think the ads probably cancel each other out. Both are negative, and both probably help drive up the opponent's negative ratings. I do not think the ads had much of an overall effect on the direction of this race, but I really do not know. One important point to take into account is that the final polls that asked for voters' opinions of the candidates found that Hynes had higher net favorables (favorable rating minus unfavorable rating). Even the Chicago Tribune poll that had Quinn up by 4 had Hynes with a 6-point higher net favorable rating. In the Rasmussen poll, more voters held a favorable view of Hynes, even though more voters had no opinion of him. Undecideds typically break toward the challenger, and I expect this time will be no different. Based on all the numbers, I estimate that Hynes pulls out a surprisingly large victory (10+ points possible).
As for the United States Senate race, I really do not know what to say. Giannoulias looked to be cruising to the nomination, but ties to a shady family bank have really hurt him in the press (which, incidentally, has overwhelmingly endorsed his opponent). As in the Gubernatorial primary, the underdog, Hoffman, has seen his numbers double in the past month and a half. Of course, Hoffman started from a much lower position than Hynes. Unlike what we see in the Gubernatorial contest, Giannoulias' net favorables remain higher than Hoffman's.
Finally, unlike Quinn's, Giannoulias' approval rating is the same as it was last month. Doubts about Giannoulias' electability (stemming from the bank problems) seem to be the only thing holding him back from jumping to a huge lead.
With a third candidate, Cheryle Jackson (whose support is likely to be confined to African-Americans), taking about 20% of the vote, the winning candidate will need only about 40% to win. Giannoulias is right near the finish line, but something just irks me about the fact that, despite higher net favorables and high approvals, he cannot pull away. And, as in the Gubernatorial race, the polls have not registered the final days' events (the bank-tie hits against Giannoulias). I have to give the edge to Giannoulias by the pure math of it, but I really would not be shocked by any outcome.
On the Republican side, Congressman Mark Kirk will easily win the nomination for United States Senate, and, as the chart below shows, nobody has any clue who will win the Gubernatorial primary. I am not even going to try to guess.
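The 40% figure follows from a short calculation: if Jackson takes about 20%, the two leading candidates split the remaining 80%, so anything above half of that guarantees first place. A minimal sketch, assuming a three-way race and ignoring the minor candidates on the ballot (votes they take would only lower the bar); the function name is hypothetical.

```python
def winning_threshold(third_share):
    """Share (percent) above which a candidate is sure to win a three-way race.

    If the third candidate takes `third_share`, the two leaders split the
    remaining votes; topping half of that remainder beats the other leader,
    and the winner must also exceed the third candidate's own share.
    """
    if not 0 <= third_share <= 100:
        raise ValueError("share must be between 0 and 100")
    return max((100 - third_share) / 2, third_share)


print(winning_threshold(20))  # 40.0
```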