Wednesday, February 17, 2010

 

Collaboration allows all of us to be better informed

Yesterday morning, I asked all political bloggers, including myself, to follow the academic standards for data mining. With that in mind, I want to call attention to another of the most important parts of academia: collaboration. We need to share ideas with each other, point out (in a professional manner) when others make mistakes, and be willing to take constructive criticism.

In my post on Strategic Vision, I asked anyone who had more polls than I had archived to come forward. Professor Michael Weissman, the man responsible for the fantastic Fourier analysis exposing Strategic Vision, did just that. He left a comment on this blog and sent an email informing me that my 209 polls were fewer than he had worked with in his analysis. Professor Weissman was kind enough to send me his file of polls so that I could see why the difference existed.

It turns out that I had missed 17 polls. Most were in Florida, where I missed a total of 13 polls, the large majority of them from 2007. I thank Professor Weissman for helping me discover my errors and share them with you.

The other great part of Professor Weissman initiating a conversation with me is that I was able to share with him a new finding that I can now share with you. As Mark Blumenthal (my boss for the spring and summer) has previously pointed out, Strategic Vision conducted polling during the 2004 election cycle. Nate Silver's initial analysis of Strategic Vision's possible fraud, Michael Weissman's follow-up, and the polling list I provided you included only polling from 2005 onward. Neither Weissman's son Jonathan nor I had found any links that allowed us to access polls from 2004.

This afternoon, after receiving Professor Weissman's email, I decided to try one last time to find links that would let us obtain Strategic Vision polling information from 2004. Although most of the links I could find are now "dead", I was able to find what I call a "golden key".

In an effort to give readers an easy-to-read comparison over time of its 2004 "swing state" "polling", Strategic Vision created one page with data from 9 polls in each of 8 states (72 polls in total), with 10 questions from each poll (720 poll questions in total). While I know that more 2004 data exists that we cannot currently access, this newly discovered data provides information not previously available.

I hope these new polls allow Professor Weissman and others to perform exciting new statistical analyses. More than that, I love that our collaboration gives both of us, and you, access to as much information as possible.

The new polling file, including Professor Weissman's polls, the polls I missed, and the original file, is available here. An archived version of the 2004 data is available here.

P.S. As always, please let me know if you have polling data from Strategic Vision that I did not include in these files.

Comments:
Harry is being a little too modest. He also caught a case where our files had a duplicated Iowa poll, which SV had posted twice under distinct URLs. That's important because duplication, unlike random deletions, can create statistical artifacts. (In this case, accounting for less than 0.1% of the data we are using, it didn't make trouble.)

BTW, at very first glance at the 2004 data, it looks like SV may have underestimated the spread required to match the statistical error bars. It may be possible, given almost 90 sets of 8 polls, to do a chi-squared test with about 600 degrees of freedom (depending on whether temporal de-trending is needed) to show that these have an improbably low dispersion.
 
Harry, it's OK to post the results. Of the 80 sets of repeated poll questions (8 states * 10 questions), 27 showed less variance in the margins (differences between the first two percentages) than would be found by chance in 3.38% of such collections, even if there were no change over time in the population's opinions. In other words, 27 sets met that criterion instead of the expected 2.7 or fewer. The odds of that are 5 out of 100,000,000,000,000,000,000.

Nice find!
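
The arithmetic in the comment above can be checked with a simple binomial model. This is only a sketch, not necessarily the method the commenter used: it assumes the 80 sets are independent and that each one has exactly a 3.38% chance of showing variance that low. The function name `binomial_tail` is made up for illustration.

```python
from math import comb

def binomial_tail(n, k, p):
    """Probability of k or more successes in n independent trials,
    each succeeding with probability p (upper binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Figures quoted in the comment; treating the sets as independent
# is an assumption of this sketch.
n_sets = 80      # 8 states * 10 questions
p_low = 0.0338   # chance that one set shows variance this low
observed = 27    # sets that actually met the criterion

expected = n_sets * p_low
tail = binomial_tail(n_sets, observed, p_low)

print(f"expected sets: {expected:.1f}")
print(f"chance of {observed} or more: {tail:.1e}")
```

Under those assumptions the expected count comes out near 2.7, and the tail probability is on the order of 5 in 10^20, in line with the figures quoted in the comment.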