Wednesday, March 17, 2010


New and Improved Model for Predicting Support for Same-Sex Marriage Bans

For those who have been following my blogging over the past four months, you may remember that my first post was on support for same-sex marriage ban referenda. That post was a response to both Nate Silver's same-sex marriage model and the aggregate incorrectly projecting the failure of Question 1 in Maine, which sought to overturn the legalization of same-sex marriage in that state. Before the election in Maine, I had amassed a dataset of polls on same-sex marriage referenda taken within three weeks of each election. After the election, I decided to use that dataset along with some additional variables [the year of the election, whether the election took place in an "off-year", the religiosity of a state, and whether the ban was a constitutional amendment (or simply a law)] to create a regression model.

The results were pretty good. I found that "92.1% of the variation between the different same-sex marriage elections was explained by the model compared with 80.7% for Silver's...model". More importantly, the model would have correctly predicted the passing of Maine's Question 1.

Despite this success, I was still bugged by a few things about my initial model.

1. I felt the difference (2.3%) between the model's predicted (50.6%) and actual (52.9%) support for Maine's Question 1 was too large.

2. I prefer when all the variables in my model are statistically significant (with at least 90% confidence). The amendment vs. just law variable was not.

3. Most importantly, the model produced a major outlier in the state of North Dakota. This outlier of more than 9% between the predicted and actual support of the ban (in North Dakota) would really make me think twice before any vote. I felt like I'd be asking myself, "is this going to be another North Dakota?"

I wanted to see if I could create a new model that dealt with one or more of these concerns. Since concern three was my number one problem with my initial model, I wanted to address it first. I tried adding many different variables (the number of Democrats, liberals, seniors, and 18-29 year olds; an alternate measure of religiosity; a dummy variable for the region of each state; etc.), but none of these variables made the model more accurate. I thought all was lost. Then a professor of mine, Michael Bronski, came up with what I dare say was one of the most ingenious ideas I had heard in a while. He said, and I paraphrase, "if you are trying to gauge the acceptance of same-sex relationships in a community, find a variable that is highly correlated with that". He suggested a variable that anyone who has been in a middle or high school recently would know: gay-straight alliances (GSAs) in high schools and middle schools.

Gay-straight alliances usually only pop up when the LGBTQ lifestyle is at least moderately accepted within a community. One would not expect youths to feel comfortable joining a gay-straight alliance in a non-accepting community. Following this line of thought, more gay-straight alliances in a state should mean fewer people favoring a ban on same-sex marriage. The only problem now would be to actually find out how many gay-straight alliances exist in each state.

Fortunately, the Gay, Lesbian, and Straight Education Network (GLSEN) produced a list in 2007 of exactly this. To standardize the measurement across states (we don't want the model to only see that more GSAs exist in states with larger populations), I have simply taken the number of high school and middle school students, as provided by the 2007 American Community Survey, and divided it by the number of GSAs in a given state. So a higher value of this variable (more students per GSA) means a theoretically less gay-friendly community. How does this variable help our model?
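As a rough sketch of that calculation, here is the students-per-GSA ratio in Python. The function name, the state labels, and the counts are all made up for illustration; they are not the GLSEN or American Community Survey figures.

```python
def students_per_gsa(students: int, gsas: int) -> float:
    """Return middle/high school students per GSA.

    A higher value means fewer GSAs relative to the student population.
    """
    if gsas <= 0:
        # The ratio is undefined for a state with no listed GSAs.
        raise ValueError("number of GSAs must be positive")
    return students / gsas


# Hypothetical counts, for illustration only.
example_states = {
    "State A": {"students": 400_000, "gsas": 200},
    "State B": {"students": 100_000, "gsas": 10},
}

gaystraight = {
    name: students_per_gsa(counts["students"], counts["gsas"])
    for name, counts in example_states.items()
}

for name, value in gaystraight.items():
    print(f"{name}: {value:.0f} students per GSA")
```

Note that State A has twenty times as many GSAs as State B in this toy example, yet the ratio is what matters: State B has five times as many students per GSA, so the variable would treat it as the less accepting of the two.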

Interestingly, the addition of the "Gaystraight" variable makes the religiosity variable insignificant, so I have eliminated that variable. This new model (without the religiosity variable) does all of the things I wanted to accomplish from the outset.

Stata Output for Regression:

First, we see that the model is even more accurate than our prior one. We can now explain 94.6% of the variation of support for gay marriage bans between states.

Second, fewer of the states' predicted levels of support for bans are off by a very large margin, as measured by the root-mean-squared-error (about 2.23 compared to 2.69 for my prior model). We see that the error for the North Dakota outlier has dropped from greater than 9% to a little over 5% (5.19%), an error that is no worse than the normal margin of error in many state polls. Why did this error drop? North Dakota has the third fewest GSAs in our dataset, and we expect that the polling data (even when controlling for the other variables seen above) will underestimate the amount of support for gay marriage bans in states with a limited number of GSAs. I should also note that North Dakota is the only error of over 4%.
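For readers who want to see what the root-mean-squared-error measures, here is a minimal sketch. The residuals below are invented, and `n_params` is my own name for the number of estimated coefficients (including the constant). Stata's regression output reports "Root MSE" with the residual degrees of freedom in the denominator, which is why the sketch allows for that adjustment.

```python
import math


def root_mse(residuals, n_params=0):
    """Square root of the sum of squared residuals over (n - n_params).

    With n_params=0 this is the plain root-mean-squared error. Passing the
    number of estimated coefficients gives the degrees-of-freedom-adjusted
    version that regression software typically reports.
    """
    dof = len(residuals) - n_params
    if dof <= 0:
        raise ValueError("need more observations than estimated parameters")
    return math.sqrt(sum(r * r for r in residuals) / dof)


# Hypothetical residuals (predicted minus actual support, in points).
residuals = [1.0, -2.0, 2.0, -1.0, 3.0, -3.0]

plain = root_mse(residuals)
adjusted = root_mse(residuals, n_params=2)
print(f"plain RMSE: {plain:.2f}, adjusted Root MSE: {adjusted:.2f}")
```

Because the errors are squared before averaging, one large miss (like North Dakota) pulls this number up far more than several small ones do.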

Third, our Maine measurement is more accurate. When we eliminate Maine from the dataset (obviously we wouldn't know the final result from Maine before the vote took place), our new model would have over-predicted support for Question 1 by 1.8% (instead of under-predicting by 2.3% as our prior model did). Not only is this an improvement, but (because of the direction of the error) it would have also changed the characterization of my election prediction before the vote. Instead of claiming that the election was "too close to call", I would have been more forceful in predicting the passage of Question 1. No doubt part of this error in Maine was due to the rarity of this type of election. Only two prior same-sex marriage elections with a poll taken within three weeks of the election had taken place in an "off-year", and only one election (California's Proposition 22 in 2000) involved only a change of law, not a change to a state's constitution. Not one election was both off-year AND just a change in law.
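To make the "eliminate Maine from the dataset" check concrete, here is a minimal sketch of a hold-one-out prediction. It is a simplification: it uses a single predictor (the poll share) rather than the full set of variables in the model, the numbers are invented, and the state labels and the `holdout_error` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def holdout_error(data, held_out):
    """Fit without `held_out`, then return predicted minus actual for it.

    A positive value means the model over-predicted support for the ban.
    """
    train = [v for k, v in data.items() if k != held_out]
    slope, intercept = fit_line([p for p, _ in train], [a for _, a in train])
    poll, actual = data[held_out]
    return (intercept + slope * poll) - actual


# Hypothetical (poll share, actual share) pairs, in percent.
elections = {
    "State A": (50.0, 55.0),
    "State B": (60.0, 65.0),
    "State C": (70.0, 75.0),
    "Held-out state": (52.0, 56.0),
}

error = holdout_error(elections, "Held-out state")
print(f"hold-out error: {error:+.1f} points")
```

The point of refitting without the held-out election is that the prediction then relies only on information that would have been available before the vote.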

Fourth, all of the variables in this model are statistically significant with 95% confidence. As with our prior model, this model shows that polls are the best predictor of same-sex marriage ban elections. This model also indicates that the polls tend to under-predict support in off-year elections, have become more accurate recently, and are less likely to under-predict the support for the bans in elections when the ban is not an amendment. See my prior write-up for reasons behind the year and off-year effect.

As for why the polls seem to over-predict (when controlling for the other variables) in elections not involving changes to a state’s constitution, I am not sure. It could be that we have had only two cases, so any effect we see is really just a mirage (false significance). It could be that people are less willing to admit that they want to write "discrimination" into the constitution, a sort of “gay Bradley Effect”. No academic studies I have found seem to back up this hypothesis, but maybe it does exist? I just do not know.

What we do know is that this model does a very good job of predicting support for same-sex marriage referenda. Going forward, as new same-sex marriage referenda come up for votes, I hope this model can be used to find out when the polls are right and when they are wrong.

Notes on the data

1. I have used the same dataset that I used in my prior blog with one exception: Ohio 2004. I found an "Ohio Poll" produced by the University of Cincinnati that was conducted a week closer to the election than the polls previously used in my dataset for Ohio 2004. Using the previous Ohio measurement, all the variables I used above would still have been statistically significant; I would have been able to explain 94.5% of the variation of support for gay marriage bans between states; the root-mean-squared-error would have been 2.25; and the highest error (North Dakota) would have been 5.11%.

The other notes have been reproduced from my prior post.

2. For my model, off-year is defined as any election that did not take place during a presidential election (primary or general) or a midterm general election. This includes Missouri 2004, Kansas 2005, Texas 2005, and Maine 2009. Silver's model only counts Kansas 2005, Texas 2005, and Maine 2009 as off-year elections. I used my measure because non-presidential primaries, like traditional off-year elections, are often plagued by low turnout.

3. For Silver's and my model, religiosity is measured by the percentage of adults in a state who considered religion an important part of their daily lives in a 2008 Gallup study.

4. Prior studies have found that, because ballot questions are confusing, voters become increasingly aware of what a "yes" and a "no" vote mean for same-sex marriage ballot measures as the election approaches (most likely by relying on advertisements). For that reason, my polling variable only uses data taken within three weeks of the election. When more than one firm conducted a poll within three weeks of the election and less than a week separated the polls, I used an average of the firms' final polls. For Maine, this rule means I included an average of the final Public Policy Polling and Research 2000 polls in my dataset, but not the Pan Atlantic poll, because it was taken more than a week before Public Policy Polling's final poll was conducted.

5. While most of the data in my model is easily available, prior polling for same-sex marriage referenda is surprisingly difficult to find. I managed to locate and verify 25 elections with a measure to ban (or allow the state legislature to ban, as is the case with Hawaii) same-sex marriage and a poll within three weeks of the election. I simply allocated undecideds in proportion to how already-decided voters were planning to vote: projected vote in favor of the amendment by polls = those planning on voting yes / (those planning on voting yes + those planning on voting no).
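The poll rules in notes 4 and 5 can be sketched roughly as follows. This is my reading of the rules rather than the exact procedure: I assume "within three weeks" means 21 days, that "less than a week" is measured against the most recent final poll, and that the firms' projected shares (not their raw yes/no numbers) are what get averaged. The firm names, dates, and percentages are invented.

```python
from datetime import date, timedelta


def projected_yes_share(yes_pct, no_pct):
    """Allocate undecideds proportionally: yes / (yes + no), in percent."""
    decided = yes_pct + no_pct
    if decided <= 0:
        raise ValueError("no decided voters")
    return 100.0 * yes_pct / decided


def select_final_polls(polls, election_day):
    """Keep each firm's last poll inside the 21-day window, then drop any
    final poll taken a week or more before the most recent one."""
    window_start = election_day - timedelta(days=21)
    latest_by_firm = {}
    for poll in polls:
        if not window_start <= poll["end_date"] <= election_day:
            continue
        current = latest_by_firm.get(poll["firm"])
        if current is None or poll["end_date"] > current["end_date"]:
            latest_by_firm[poll["firm"]] = poll
    if not latest_by_firm:
        return []
    newest = max(p["end_date"] for p in latest_by_firm.values())
    # Assumption: a gap of exactly seven days counts as too far apart.
    return [p for p in latest_by_firm.values()
            if newest - p["end_date"] < timedelta(days=7)]


# Hypothetical polls for a hypothetical election.
election_day = date(2010, 11, 2)
polls = [
    {"firm": "Firm A", "end_date": date(2010, 10, 15), "yes": 45, "no": 50},
    {"firm": "Firm A", "end_date": date(2010, 10, 31), "yes": 51, "no": 47},
    {"firm": "Firm B", "end_date": date(2010, 10, 28), "yes": 48, "no": 47},
    {"firm": "Firm C", "end_date": date(2010, 10, 20), "yes": 42, "no": 53},
]

selected = select_final_polls(polls, election_day)
shares = [projected_yes_share(p["yes"], p["no"]) for p in selected]
poll_variable = sum(shares) / len(shares)
print(sorted(p["firm"] for p in selected), round(poll_variable, 2))
```

In this toy example Firm C plays the role that the Pan Atlantic poll played in Maine: it falls inside the three-week window but is more than a week older than the newest final poll, so it is left out of the average.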
