Monday, March 22, 2010


Same-sex marriage referenda from within... You only need three

Any modern student of political science has read Andrew Gelman et al.'s magnum opus Red State, Blue State, Rich State, Poor State. In the book, Gelman et al. illustrate one of the more interesting facets of American political life: rich states vote Democratic, while poor states vote Republican. It is a bit counter-intuitive (as Democrats are supposedly the party of the workingman), but it turns out that it all makes sense in the end. WITHIN each state, rich voters are more Republican, while poor voters are more Democratic. You might be wondering what this has to do with gay marriage bans. The answer is pretty simple: what works to explain electoral phenomena between states does not necessarily work to explain what happens within them.

For those who follow this blog, predicting (and explaining) support for same-sex marriage referenda has been a favorite topic. Before I discovered that polling data worked so well to predict same-sex marriage referenda, I tried a buffet of other explanatory variables. With the ever-expanding research in this subsection of LGBTQ studies, I mostly concentrated on the findings of Patrick Egan and Ken Sherrill in their 2008 paper "California's Proposition 8: What Happened and What Does the Future Hold?". The paper was published after Californians voted in favor of Proposition 8 to dispel the myth that African-Americans were responsible for the measure's passage. Egan and Sherrill found that race had very little predictive or explanatory power in understanding a person's vote on Proposition 8. Instead, people's votes were much more likely to break down along party identification and level of religiosity (tendency to attend church). One variable they did not (and could not) check (because their dataset did not have a question on it) was level of education. Yet in the less discussed, but equally close, Florida Amendment 2 to ban same-sex marriage (and civil unions), education was found to be the main factor in people's votes. In fact, Daniel Smith found that in the Florida vote, "education level was five times more important than race in determining how people voted. The more educated people were, the more likely they were to oppose the amendment". Thus, we had three variables that predicted how people voted in California and Florida: education, party identification, and religiosity.

Wanting to make a big splash (as I always want to do), I decided to plug these variables into a large model containing the votes of same-sex marriage ban referenda nationwide. Since 1998, there have been 33 such referenda. For each referendum, I collected the percentage of people in each state who consider religion to be an important part of their lives, the percentage of people in each state who hold a bachelor's degree, and the partisan lean of each state as measured by the Cook Partisan Voting Index (many states do not have voter registration by party identification).

In a linear regression model of these variables predicting support in the aforementioned 33 elections, we can explain about 65% of the differences in support between states. That is pretty decent, but it is nowhere near the 95% of our polling model. More than that, only religiosity and the percentage of people with bachelor's degrees really add anything to the model. That is, the partisan nature of each state does little to predict the changes in support for the referenda from state to state.
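As a rough illustration of the setup (the state rows below are made up, not my actual 33-referendum dataset), a bare-bones ordinary least squares fit of ban support on religiosity, education, and partisan lean looks like this:

```python
import numpy as np

# Made-up state rows (not the actual 33-referendum dataset):
# religiosity %, % with a bachelor's degree, Cook PVI (positive = more Republican)
X = np.array([
    [58.0, 21.0,   9.0],
    [44.0, 30.0,  -7.0],
    [65.0, 19.0,  13.0],
    [40.0, 35.0, -12.0],
    [52.0, 24.0,   3.0],
    [47.0, 27.0,  -2.0],
])
y = np.array([62.0, 48.0, 71.0, 44.0, 57.0, 52.0])  # % voting for the ban

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(len(y)), X])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)

# R^2: share of the between-state variation in ban support the model explains
resid = y - A @ coefs
r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
print(np.round(coefs, 2), round(r2, 3))
```

The coefficient signs and the R² are the quantities of interest; with the real 33 observations, the R² lands around 0.65.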

But this comparison is a bit unfair. Nate Silver's research tells us that earlier same-sex marriage elections as well as those that just ban same-sex marriages (not both same-sex marriages and civil unions) are more likely to garner higher percentages of the vote. What happens when we control for the year and whether an election is a ban on just same-sex marriage or a ban on same-sex marriage and civil unions?

Now, we see that the partisan nature of a state, its religiosity, and its education level do have statistically significant predictive (explanatory) power, with 90% confidence, for support for same-sex marriage referenda. As we would expect, Republican states, religious states, and states with less educated populations tended to vote for same-sex marriage bans in higher percentages. In addition, this model explains 83% of the difference in vote between different elections. However, it should be noted that it is really religion, the year, and the nature of the election (a marriage-only ban versus a marriage and civil-union ban) that have the greatest impact on understanding the vote in each state. To me, this model does not add much to our understanding or predictive value. Our polling model does a much better job.
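To sketch what controlling for the year and ban type looks like (again with invented numbers, not my real dataset), one can simply append the two controls as extra columns of the design matrix: a centered election year and a dummy for whether the measure also banned civil unions:

```python
import numpy as np

# Made-up referendum rows; the two controls are appended as extra columns
religiosity = np.array([58.0, 44.0, 65.0, 40.0, 52.0, 47.0, 55.0])
bachelors   = np.array([21.0, 30.0, 19.0, 35.0, 24.0, 27.0, 23.0])
pvi         = np.array([ 9.0, -7.0, 13.0, -12.0, 3.0, -2.0, 6.0])
year        = np.array([2000, 2004, 2004, 2008, 2006, 2009, 2002]) - 2004  # centered
also_unions = np.array([0, 1, 0, 1, 1, 0, 0])  # 1 = banned civil unions too
support     = np.array([68.0, 52.0, 73.0, 42.0, 56.0, 50.0, 60.0])

A = np.column_stack([np.ones(7), religiosity, bachelors, pvi, year, also_unions])
coefs, *_ = np.linalg.lstsq(A, support, rcond=None)
resid = support - A @ coefs
r2 = 1 - resid @ resid / ((support - support.mean()) @ (support - support.mean()))
print(np.round(coefs, 2), round(r2, 3))
```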

Still, I remembered what Gelman et al. found with regard to income and state. I asked myself whether the differences between any two states' votes cannot be explained by a model because the dynamics of the elections played out differently. That is, a highly publicized vote in California is going to be different by nature than, say, one in South Dakota. At the same time, religion might be played up more in some elections than in others. It would be very difficult for a simple regression model built mostly on demographic data to pick up on these differences. Fortunately, within any one election, the dynamics would be the same. Most people would read the same newspaper endorsements, watch the same television advertisements, and receive the same mailers.

To test my theory, I decided to check the county-by-county differences in vote in three recent same-sex marriage elections. More than 1,000 counties have voted in same-sex marriage referenda, so looking at all of them is beyond the scope of this study. Instead, I downloaded data from the Arizona 2006, California 2008, and Maine 2009 same-sex marriage elections. For each election, I collected the percentage of people with bachelor's degrees in each county, the percentage of people who are religious adherents, and the percentage of the vote each county gave to Obama in 2008 (a measure of party identification). Within each state (5 of 15 counties in Arizona, 18 of 58 in California, and 1 of 16 in Maine), some counties do not have a bachelor's degree measurement because of a lack of population. Still, each state has more than two-thirds of its counties covered, and there is no reason (that I can think of) that omitting these counties would greatly affect our models. With these counties and the variables I had, I ran a linear regression for each of the three states: percentage of county supporting a ban on same-sex marriage = percentage of people with a bachelor's degree within each county + percentage of vote Obama earned in each county + percentage of people who are religious adherents in each county.
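A minimal sketch of this within-state setup, using invented county rows rather than my actual Census and election data, fits the same three-variable regression separately inside each state:

```python
import numpy as np

# Invented county rows: (state, % bachelor's, % Obama '08, % religious adherents, % yes on ban)
rows = [
    ("AZ", 15.0, 40.0, 45.0, 62.0), ("AZ", 30.0, 55.0, 35.0, 48.0),
    ("AZ", 22.0, 45.0, 50.0, 58.0), ("AZ", 18.0, 38.0, 55.0, 66.0),
    ("AZ", 25.0, 50.0, 42.0, 54.0),
    ("CA", 35.0, 70.0, 30.0, 40.0), ("CA", 20.0, 45.0, 50.0, 60.0),
    ("CA", 28.0, 60.0, 40.0, 50.0), ("CA", 16.0, 40.0, 55.0, 64.0),
    ("CA", 40.0, 75.0, 25.0, 35.0),
]

def fit_state(state):
    """Fit ban support on the three county-level variables within one state."""
    data = np.array([r[1:] for r in rows if r[0] == state])
    X, y = data[:, :3], data[:, 3]
    A = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coefs
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return coefs, r2

for state in ("AZ", "CA"):
    coefs, r2 = fit_state(state)
    print(state, np.round(coefs, 2), round(r2, 3))
```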

It turns out that this model does a very good job of explaining the differences in vote within each state. In each state, over 91% of the difference in vote between the counties is explained by these three variables alone. In all three states, all three variables were statistically significant with at least 90% confidence. Counties with more religious adherents, fewer Obama supporters, and less educated people were more likely to vote for the same-sex marriage bans. Perhaps it is intuitive, but due to the nature of regression we also know such things as: more religious Democratic counties are more likely to vote for bans than less religious Democratic counties.

In the 2006 Arizona election, 94.7% of the vote differences between counties can be understood with these three variables. Obama vote and bachelor's degree are statistically significant with 99% confidence, while religion is significant with 90% confidence.

In the 2008 California election, I confirm the finding of Egan and Sherrill and find that religion and partisanship are statistically significant in their prediction of voting yes on Proposition 8 with well over 99% confidence. Education, a variable that they did not check, is also significant with well over 99% confidence. Overall, 94.0% of the vote difference between counties can be explained by these three variables.

In the 2009 Maine election, 91.5% of the differences between the 15 counties I checked are explained by these three variables. Both religious adherence and Obama vote are statistically significant with near 99% confidence, while the percentage of people with a bachelor's degree is significant with greater than 99% confidence.

All of these results make a lot of sense. Democrats are likely to see same-sex marriage as a civil rights issue, and therefore support it. Education affects views on same-sex marriage because, as Daniel Smith put it, "Education is so important because it increases exposure to those who are different. Studies show very clearly that the more educated people are, the more tolerant they are of differences". Religious people are more likely to oppose same-sex marriage on "moral grounds".

Going forward, it will be interesting to see if these three states are merely aberrations or indicative of a larger trend. One thing I do caution against is extrapolating these results to the individual (person) level. The research indicates that education, political affiliation, and religiosity do impact people's stances on the issue of same-sex marriage at the county level, but that may not hold at the individual level in all these states (with the exception of California, where Egan and Sherrill have found it to); however, I would guess that it does.


1. Religiosity was measured in two ways. For the state-by-state model, religiosity is measured by the percentage of adults in a state who considered religion an important part of their daily lives in a 2008 Gallup study. I think this is a very good measure, but it is not available by county. For the county data, I had to settle for the percentage of religious adherents per county. The estimate comes from the Association of Statisticians of American Religious Bodies (ASARB). The data has been “adjusted” by Roger Finke and Christopher P. Scheitle of Pennsylvania State University to account “for the historically African-American denominations and other religious groups not listed in the ASARB totals”.

2. Education is measured by the percentage of people with at least a bachelor's degree in a given county or state. Since this data only goes back to 2001 (and it would take a lot of time to gather the data by year of election for the state level... something I was not too interested in obtaining for this study), I used data from the 2008 American Community Survey. For the county data, I used the American Community Survey data from 2006 (the year of the election) for Arizona, 2008 (the year of the election) data for California, and an average of 2006-2008 (in order to obtain data from more counties) for Maine.

3. I prefer the Cook Partisan Voting Index as a measure of partisanship (as it combines data from the 2004 and 2008 Presidential Elections), but it is not published county-by-county. Therefore, I just use Obama vote share by county.

4. In case you are wondering, the addition of a race variable has little predictive power. In Arizona, it adds nothing to the model. In California, it explains an additional one percent of the vote. Counties with fewer white people were more likely to vote for the ban. In Maine (not exactly a racially diverse state), counties with fewer white people were actually LESS likely to vote for the ban. Overall, race just did not seem to have much of an effect.


What do Arizona (2006), California (2008), and Maine (2009) have in common?

The percentage of religious adherents, people with bachelor's degrees, and Obama vote explains over 90% of the variance between county votes on same-sex marriage within each state.

More in the PM.

Saturday, March 20, 2010


Kendrick Meek

I'll have something to say about his Senate candidacy at some point this weekend or Monday, but needless to say I think he might end up being the Democrats' best chance to win a Senate seat currently held by Republicans.

More later on that...


Don and Juan Killed My Marriage: Another look at divorce rates and same-sex marriage

“Gay marriage will lead to our divorce” is a line you might expect from someone against same-sex marriage.

In January of this year, Nate Silver conducted a preliminary analysis that seemed to show, for the first time that I know of, that this argument was not only statistically untrue, but that the opposite was true. He found that divorce rates fell from 2003-2008 in states with NO constitutional same-sex marriage bans enacted prior to 2008, while they rose in states with constitutional same-sex marriage bans enacted prior to 2008. He finished his post with the following: "There is, however, probably now enough data on this subject to engage in more sophisticated longitudinal studies on this subject (more sophisticated than I have engaged in here), which might produce more robust conclusions."

Being a student who needs to do essays (otherwise he (I) fails out), I figured I would try to take Mr. Silver up on his offer. That is, I would try to add other variables and use a more "rigorous" (though not greatly) statistical technique to test the relationship between same-sex marriage bans and divorce rates.

What I found is going to be revealed in this post in three parts: 1. I am going to show what I believe to be serious errors in Mr. Silver's initial analysis. 2. I am going to complete my more "rigorous" analysis. 3. I am going to show why this whole thing is a crapshoot.

Before I begin, I want to say that I tried to reach out to Mr. Silver after finding what I did. Because of the massive volume of email he receives (he's a popular guy because of his many talents), I never received a response. I have sat on this information for over a month, but I finally had to complete this for school credit.

The Problems with Silver's Initial Analysis

I initially wanted to study whether adding any more explanatory variables (changes in income, education, age, race, etc.) to Silver's numbers would change any of his findings. Before I did, I had to download the data as Mr. Silver instructed in his post. I gathered, for each state, the number of divorces in 2003 and 2008 from the Centers for Disease Control and Prevention (CDC) and the number of married people in 2003 and 2008 from the American Community Survey (run by the kind folks at the Census Bureau). Because some states are missing from the CDC data, we are left with 43 states and the District of Columbia.

Trying to be a legitimate academic (we will see if you agree), my first step was to reproduce Silver's changes in divorce rates (so that I could then add other variables). I quickly ran into a problem. Many of the rates I was finding did not match Mr. Silver's. I spent around twelve hours trying to figure out why, and then I got it.

According to Silver's post, he calculated the divorce rate in 2003 as such: (# of Divorces in 2003 / # of Married People in 2003) * 2 (every couple has 2 people). In 2008, he supposedly did the same calculation: (# of Divorces in 2008 / # of Married People in 2008) * 2. The change (as it will be for all calculations in this post) from 2003-2008: (rate in second period (08) / rate in first period (03)) - 1.

Except (perhaps because his post was at 4:12 am... pot calling kettle black), Mr. Silver did the following for calculating the divorce rate in 2003: (# of Divorces in 2003 / # of Married People in 2008) * 2. For calculating the divorce rate in 2008, he made the same error in reverse: (# of Divorces in 2008 / # of Married People in 2003) * 2.

The major problem with this error is that many of the states that have passed same-sex marriage bans are gaining population, while many of the states that have not passed bans (or legalized same-sex marriage in the case of Massachusetts) are losing population. In mathematical terms, Silver's mistake artificially increases the divorce rate in 2003 in states with no constitutional bans (because the # of married people in 2008 in these states is lower), while it artificially decreases the rate in states with constitutional bans (because the # of married people in 2008 in these states is higher). The exact opposite effect happens when calculating the 2008 divorce rates. Overall, this spuriously drops the change in divorce rate by a much larger margin than it should in states with no ban, while it spuriously raises the change in divorce rate in states with bans.
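A quick toy calculation (the figures below are invented, not actual CDC/ACS numbers) shows how swapping the denominators distorts the change for a growing state:

```python
# Invented numbers for a hypothetical growing state (not real CDC/ACS figures):
divorces_03, divorces_08 = 20_000, 21_000
married_03, married_08 = 900_000, 1_000_000   # married *persons*, hence the factor of 2

def rate(divorces, married_persons):
    # each divorce ends one couple, i.e. two married people
    return divorces * 2 / married_persons

# Correct: match each year's divorces to that year's married population
correct_change = rate(divorces_08, married_08) / rate(divorces_03, married_03) - 1

# The swapped denominators described above (2003 divorces over 2008 marrieds, and vice versa)
swapped_change = rate(divorces_08, married_03) / rate(divorces_03, married_08) - 1

print(round(correct_change, 4), round(swapped_change, 4))  # prints: -0.055 0.1667
```

With these made-up figures, the correct calculation shows a 5.5% decline in the divorce rate, while the swapped denominators turn it into a 16.7% rise, exactly the direction of bias described above for population-gaining ban states.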

In addition to this problem, Mr. Silver's initial analysis did not include the District of Columbia (an area that just legalized same-sex marriage). The District saw a major increase in divorce rates from 2003-2008. It also seems that Mr. Silver mistakenly assigned a constitutional ban to the State of Washington.

What follows is a table with Mr. Silver's found divorce rate changes from 2003-2008 and my calculated (and verified by someone a lot smarter than I) divorce rate changes from 2003-2008. States in red had a constitutional ban against same-sex marriage implemented prior to 2008; states in black had no ban nor allowed same-sex marriage; and, states in blue legalized same-sex marriage prior to 2008.

As you can see, the differences between the two datasets are pretty astounding. Some states changed their relative position in change of divorce rate by as many as 21 spots. The average change per state (or district) is 8.


The two states with the most negative divorce rate changes from 2003 to 2008 are now states that passed same-sex marriage bans prior to 2008, Alabama and South Carolina. Also, states with a ban had an average decrease in divorce rates of 2.9%, not an increase of 0.9%.

The area with the biggest increase in divorce rate is now Washington, D.C., which did not have a ban. Overall, areas that did not pass a ban actually saw a decrease of 6.2%, not 8.0% as Mr. Silver found. And while states (+ the District of Columbia) that did not pass constitutional bans against same-sex marriage had a larger decrease in divorce rates relative to those states that did pass a ban, the difference between the areas that had a ban and did not have a ban is NO longer statistically significant.

Still, I wanted to complete a slightly more sophisticated analysis.

A New Model

The analysis above is not as in-depth as I would like. The model does not take into account the fact that not all constitutional bans against same-sex marriage enacted prior to 2008 went into effect in 2004. Eight states had bans that went into effect in 2006. How do we know these states did not see an increase (or decrease) in divorce rates from 2003-2006, only to see the exact opposite happen from 2006-2008? At the same time, three states had bans enacted prior to 2003 (Alaska, Nebraska, and Nevada). How do we know these states did not see an increase (or decrease) in divorce rates in the immediate aftermath of their bans passing, only to see the opposite happen in later years?

I also wanted to take Mr. Silver's advice and add some other explanatory variables that might better explain divorce rates. The following variables have been linked with divorce: political affiliation, income, race, education level, and age. People living in blue states are less likely to get divorced, as are richer people, whites, the more educated, and older people.

So what exactly did I do? Using the American Community Survey for marriage data and the CDC for divorce data, I went all the way back to 2001 and calculated the change in divorce rate for three two-year periods (2001-2003, 2003-2005, and 2005-2007) in all states with available data plus D.C. I also calculated the change in median household income, the percentage whites make up of the population, the percentage of those over 25 with a bachelor's degree, and the median age. I measured political affiliation using the 2008 Cook Partisan Voting Index (PVI). I would have liked a political affiliation variable that changed with the years, but unfortunately those figures are not available. As it is, PVI will act as a constant. Quibble statistically, if you must.

The three intervals allow us to take care of the problems described in paragraph 1 of this section. In each time period, we can separate out the states that had a ban in place in the first year in a given interval (e.g. Nebraska had a ban in both 2001 and 2003), had a ban take effect during a given interval (e.g. Nevada did not have a ban in 2001, but passed one in 2002), or did not have one during a given interval (e.g. Alabama's ban did not go into effect until 2006). Therefore, we can see the overall differences in divorce rates between those states that had the constitutional ban for more than a year (Nebraska in 2003), just passed a ban (Nevada in 2003), or did not have a ban (Alabama in 2003).

I did not use the 2004-2006 interval because 18 states had bans for part of one of these years, while not having one in the other part of the year. In addition, Massachusetts legalized same-sex marriage in the middle of 2004. Kansas and Texas had bans implemented in the middle of 2005, so I have eliminated their 2003-2005 and 2005-2007 data. The elimination of data in these instances is in an effort to make my groupings (ban vs. no ban vs. marriage) as "pure" as possible. Despite these eliminations, I still had 131 observations.
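The grouping logic for each two-year interval can be sketched as follows (the effective years here follow the Nebraska, Nevada, and Alabama examples above; a full analysis would of course need every state's date):

```python
# Ban effective years for the three example states; absent = no ban in this window
ban_year = {"NE": 2000, "NV": 2002, "AL": 2006}

def classify(state, start, end):
    """Label a state's ban status for the two-year interval [start, end]."""
    year = ban_year.get(state)
    if year is None or year > end:
        return "no ban"
    if year <= start:
        return "ban throughout"
    return "newly implemented"

# The 2001-2003 interval:
print(classify("NE", 2001, 2003))  # Nebraska's ban predates 2001
print(classify("NV", 2001, 2003))  # Nevada's ban passed mid-interval (2002)
print(classify("AL", 2001, 2003))  # Alabama's ban did not arrive until 2006
```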

With these observations, I found the following results. Keep in mind this is preliminary (as is this post... hey it's a blog). I am more than happy to share my data with others, if they feel like they can do something better statistically. If you see a mistake, let me know in the comments.

First, a simple regression of just the change in overall divorce rate against a state having a ban by the end of an interval: states that did not have a ban had on average a 3.5% drop in divorce rate over each two-year interval, while states that did have a ban saw no change in divorce rates. This difference was statistically significant with 95% confidence.
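Regressing on a lone ban dummy is equivalent to comparing group means, which is easy to verify with toy numbers (invented here, though chosen to echo the -3.5% versus roughly-no-change finding):

```python
import numpy as np

# Made-up interval observations: divorce-rate change (%) and a ban dummy
changes = np.array([-3.5, -4.0, -2.8, 0.3, -0.4, 0.1])
has_ban = np.array([0, 0, 0, 1, 1, 1])

# With an intercept plus a single dummy, OLS reduces to a difference in means:
# the intercept is the no-ban average, the slope is the ban/no-ban gap
A = np.column_stack([np.ones(len(changes)), has_ban])
(intercept, gap), *_ = np.linalg.lstsq(A, changes, rcond=None)
print(round(intercept, 2), round(gap, 2))
```

Whether that gap is statistically distinguishable from zero is then a standard t-test on the dummy's coefficient.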

Second, I broke down the ban grouping between a newly implemented ban and having a ban throughout a given interval. I ran simple regressions of divorce rate against a state having a newly implemented ban (withholding observations of those areas that had a ban at the beginning of an interval) and against a state having a ban at the beginning of a period (withholding those areas with newly implemented bans). It turns out that only states with a newly implemented ban (rise of .05%) differ significantly from those that have no ban (drop of 3.5%). Those states that had the ban at the beginning of an interval actually had a drop of .05%. I will allow you to draw your own conclusions from the Massachusetts numbers, but because it is only one observation I would ignore it. Overall, if you believe these numbers, it would mean that states that pass a ban initially vary with regard to divorce rate from those states that do not pass a ban; however, this difference begins to disintegrate with time.

Third, I added PVI plus the other explanatory variables (income, race, education level, and age) and their changes over the three intervals to the freshly implemented ban regression model. Even with the addition of these explanatory variables, divorce rates in newly implemented ban states were still significantly higher (with 90% confidence) than in those states with no ban. No other variable was statistically significant, but states with a higher PVI (more Republican) were more likely to have higher divorce rates. This may suggest that the same-sex marriage ban variable is a stand-in for something happening in red states. Religion perhaps?

Fourth, I added these explanatory variables to the model where we measured the differences between states that had a ban for longer than a year vs. no ban. No variables were statistically significant, but states with same-sex marriage bans were still more likely to see higher divorce rates.

Overall, the addition of these other explanatory variables did not really do anything to help explain away what we saw in our "second" step.

The Crapshoot

As I admitted for the model in our first section, I will admit the models in our second section are elementary. All one has to do is look at what happens when we concentrate on only changes during the third interval (2005-2007).

In this third period, the divorce rate change among states without a ban rose by 0.7%, rose by 0.5% in states with a newly implemented ban, and dropped in states that had bans prior to 2005 by 0.5%. In other words, the relationship we were seeing when looking at the three intervals (01-03, 03-05, and 05-07) in combination disappears and somewhat reverses itself when only looking at 05-07.

When we add the explanatory variables, states with new bans (those passed in 2006) are still more likely to see a drop in divorce rates than states with no ban. The difference is not statistically significant, and none of the other explanatory variables has a statistically significant impact on predicting divorce rates.

HOWEVER, when we compare those states that had a ban prior to 2006 (excluding Kansas and Texas for the reason stated earlier) to those states without a ban, states without a ban are on the edge of being statistically significantly (with 90% confidence) more likely to see a rise in divorce rates. Unlike in the other models we ran, states with increases in income were also more likely to see drops in divorce rates. We would expect this, as financial problems can pull families apart.

Does this mean that we are likely to see divorce rates continuing to rise in states that do not pass a ban relative to those states that have? I VERY HIGHLY DOUBT IT. The truth of the matter to quote a friend of mine is that most likely "no relationship exists".

I truly believe that people get divorced because the person they thought they were compatible with turned out not to be. Truthfully, if a woman I thought was tough, smart, and kind (and if you are those please send me an email... TY) turned out to be weak, not as smart as I hoped, and mean, then I would want to break it off. I think most of us do not have our relationships impacted if Laser and Blazer decide to get married. Then again, I really do not know.

In Conclusion

What I have done here is pretty much show what we thought already. I have run a slightly more complicated model than Mr. Silver had previously implemented. I have shown that the relationship between same-sex marriage and divorce rates really does not exist in any consistent fashion. I would be more than happy to have other people suggest variables or ways to carry this out, but at the end of the day I do not think a relationship exists.

Thursday, March 18, 2010


Polling Standards at the New York Times?

When I visited the Daily Kos blog this afternoon, I was greeted by the following headline, "Chamber of Commerce Skews Polling in Dem Swing Districts," linking to a New York Times blog piece. In the Times piece, author Robb Mandelbaum explains that the Times cannot publish results from the partisan Chamber surveys because

Instead of randomly selecting their respondents, the Chamber of Commerce sampled from voter lists, a practice The New York Times and many other media pollsters do not endorse because the lists are often outdated and are generally not representative -- they do not include unlisted telephone numbers, for example.

In other words, the New York Times claims it will not publish polls conducted using registration based list sampling (RBS).

As I am not familiar with the New York Times' "stringent standards" for publishing poll results, I was admittedly perplexed when I read about the New York Times' opposition to RBS polling. Why? Because I had seen them publish polls in the past that use RBS.

Just today, in fact, the Times published an RBS result in a blog post discussing Senator Barbara Boxer's bid for re-election:

'A new Field Poll shows that the three candidates hoping to unseat Senator Barbara Boxer have gained ground. Senator Boxer, who is in her third term, trails Tom Campbell, a former congressman, 44 to 43 percent, and leads Carly Fiorina, the former chief executive of Hewlett-Packard, 45 to 44 percent.'

The Field Poll, one of the oldest and most widely respected polling firms in California, uses RBS technology "when conducting surveys of the state's registered voter population". A search of the New York Times' archive reveals 20 mentions of the Field Poll in the last 12 months.

The Field Poll is not the only firm to use RBS technology. The vaunted pre-caucus Iowa Poll conducted by Ann Selzer rode RBS to being the only poll to predict a Kerry/Edwards 1-2 finish in the 2004 Democratic Iowa Caucus, and it accurately projected Obama and Huckabee victories in the 2008 Iowa Caucuses. The New York Times has quoted Selzer's pre-caucus polls.

Of course, I would still be somewhat suspicious of the Chamber of Commerce-sponsored polls, and Mandelbaum implies the Times is too. They are, after all, polls conducted by a Republican-leaning firm for an organization against the current healthcare reform bill. But for the New York Times to claim it never publishes RBS polls is laughable.

Indeed, it appears that the Times accepts list-based samples in some instances but not others. So what is the New York Times' standard for publishing RBS polls?

This post originally appeared

Wednesday, March 17, 2010


New and Improved Model for Predicting Support for Same-Sex Marriage Bans

For those who have been following my blogging over the past four months, you may remember that my first post was on support for same-sex marriage ban referenda. The post was in response to both Nate Silver's same-sex marriage model and the polling aggregate incorrectly projecting the failure of Question 1 in Maine, which sought to overturn the legalization of same-sex marriage in that state. Before the election in Maine, I had amassed a dataset of same-sex marriage referendum polls taken within 3 weeks of the election. After the election, I decided to use that dataset along with some additional variables [the year of the election, whether the election took place in an "off-year", the religiosity of a state, and whether the ban was a constitutional amendment (or simply a law)] to create a regression model.

The results were pretty good. I found that "92.1% of the variation between the different same-sex marriage elections was explained by the model compared with 80.7% for Silver's...model". More importantly, the model would have correctly predicted the passing of Maine's Question 1.

Despite this success, I was still bugged by a few things about my initial model.

1. I felt the difference (2.3%) between the model's predicted (50.6%) and actual (52.9%) support for Maine's Question 1 was too large.

2. I prefer when all the variables in my model are statistically significant (with at least 90% confidence). The amendment vs. just law variable was not.

3. Most importantly, the model produced a major outlier in the state of North Dakota. This error of more than 9% between the predicted and actual support of the ban would really make me think twice before any vote. I felt like I'd be asking myself, "is this going to be another North Dakota?"

I wanted to see if I could create a new model that dealt with one or more of these concerns. Since concern three was my number one problem with my initial model, I wanted to address it first. I tried adding many different variables (number of Democrats, liberals, seniors, 18-29 year olds, an alternate measure of religiosity, dummy variables for region, etc.), but none of these variables made the model more accurate. I thought all was lost. Then a professor of mine, Michael Bronski, came up with what I dare say was one of the most ingenious ideas I had heard in a while. He said, and I paraphrase, "if you are trying to gauge the acceptance of same-sex relationships in a community, find a variable that is highly correlated with that". He suggested a variable that anyone who has been in a middle or high school recently would know: gay-straight alliances (GSAs) in high schools and middle schools.

Gay-straight alliances usually pop up only when the LGBTQ lifestyle is at least moderately accepted within a community. One would not expect youths to feel comfortable joining a gay-straight alliance in a non-accepting community. Following this logic, more gay-straight alliances in a state should mean fewer people favoring a ban on same-sex marriage. The only remaining problem would be to actually find out how many gay-straight alliances exist in each state.

Fortunately, the Gay, Lesbian, and Straight Education Network (GLSEN) produced a list of exactly this in 2007. In an effort to standardize the measurement across states (we don't want the model merely to pick up that more GSAs exist in states with larger populations), I simply took the number of high school and middle school students, as provided by the 2007 American Community Survey, and divided it by the number of GSAs in a given state. A higher value of this variable therefore indicates a theoretically less gay-friendly community. How does this variable help our model?
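As a concrete sketch of the normalization (with invented figures, not the actual GLSEN or ACS numbers), the variable is just students divided by GSAs:

```python
# Invented illustrative figures; the real inputs are GLSEN's 2007 GSA list
# and 2007 American Community Survey student counts.
students = {"State A": 480_000, "State B": 50_000}   # middle + high school students
gsas     = {"State A": 200,     "State B": 5}        # registered GSAs

# Students per GSA: a higher value suggests a less gay-friendly climate
students_per_gsa = {state: students[state] / gsas[state] for state in students}
print(students_per_gsa)  # State A: 2400.0, State B: 10000.0
```

Here hypothetical "State B", with one GSA per 10,000 students, would be scored as far less accepting than "State A", with one per 2,400.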

Interestingly, the addition of the "Gaystraight" variable renders the religiosity variable insignificant, so I have eliminated religiosity from the model. This new model (without religiosity) does everything I wanted to accomplish from the outset.

Stata Output for Regression:

First, we see that the model is even more accurate than the prior one. We can now explain 94.6% of the variation in support for gay marriage bans between states.

Second, the number of states whose predicted level of support for a ban was off by a very large margin has dropped, as measured by the root-mean-square error (about 2.23, compared with 2.69 for my prior model). The error for the North Dakota outlier has dropped from greater than 9% to a little over 5% (5.19%), an error no worse than the normal margin of error in many state polls. Why did this error drop? North Dakota has the third fewest GSAs in our dataset, and we expect that the polling data (even when controlling for the other variables seen above) will underestimate support for gay marriage bans in states with a limited number of GSAs. I should also note that North Dakota is the only error of over 4%.
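The root-mean-square error cited here is computed from the per-state residuals (predicted minus actual support). A sketch with made-up residuals, where 5.19 stands in for the North Dakota outlier:

```python
import math

# Hypothetical residuals in percentage points (predicted minus actual support);
# only 5.19, the North Dakota figure quoted above, comes from the post.
residuals = [1.1, -0.8, 5.19, -2.0, 0.4, -1.5]

# Root-mean-square error: square each residual, average, take the square root
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
worst = max(abs(r) for r in residuals)
print(round(rmse, 2), worst)  # 2.42 5.19
```

Note how a single large residual dominates the RMSE, which is why shrinking the North Dakota error from 9% to 5.19% moves the overall figure so much.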

Third, our Maine measurement is more accurate. When we eliminate Maine from the dataset (obviously we wouldn't have known the final result from Maine before the vote took place), the new model would have over-predicted support for Question 1 by 1.8% (instead of under-predicting by 2.3%, as the prior model did). Not only is this an improvement, but (because of the direction of the error) it would also have changed the characterization of my prediction before the vote. Instead of claiming that the election was "too close to call", I would have been more forceful in calling for the passage of Question 1. No doubt part of the error in Maine was due to the rarity of that type of election. Only two prior same-sex marriage elections with a poll taken within three weeks of the vote had taken place in an "off-year", and only one election (California's Proposition 22 in 2000) involved only a change of law, not a change to a state's constitution. Not one election was both off-year AND just a change in law.

Fourth, all of the variables in this model are statistically significant with 95% confidence. As with the prior model, this model shows that polls are the best predictor of same-sex marriage ban elections. It also indicates that the polls tend to under-predict support in off-year elections, have become more accurate in recent years, and are less likely to under-predict support for bans when the ban is not an amendment. See my prior write-up for the reasons behind the year and off-year effects.

As for why the polls seem to over-predict (when controlling for the other variables) in elections not involving changes to a state's constitution, I am not sure. It could be that, with only two such cases, any effect we see is really just a mirage (false significance). It could be that people are less willing to admit that they want to write "discrimination" into a constitution, a sort of "gay Bradley effect". No academic study I can find seems to back up this hypothesis, but maybe the effect does exist? I just do not know.

What we do know is that this model does a very good job of predicting support for same-sex marriage referenda. Going forward, as new same-sex marriage referenda come up for votes, I hope this model can be used to tell when the polls are right and when they are wrong.

Notes on the data

1. I have used the same dataset that I used in my prior blog with one exception: Ohio 2004. I found an "Ohio Poll" produced by the University of Cincinnati that was conducted a week closer to the election than the polls previously used in my dataset for Ohio 2004. Using the previous Ohio measurement, all the variables used above would still have been statistically significant; I would have been able to explain 94.5% of the variation in support for gay marriage bans between states; the root-mean-square error would have been 2.25; and the highest error (North Dakota) would have been 5.11%.

The other notes have been reproduced from my prior post.

2. For my model, an off-year election is any election that did not take place during a presidential election (primary or general) or a midterm general. This includes Missouri 2004, Kansas 2005, Texas 2005, and Maine 2009. Silver's model counts only Kansas 2005, Texas 2005, and Maine 2009 as off-year elections. I used my measure because non-presidential primaries, like traditional off-year elections, are often plagued by low turnout.

3. For Silver's and my model, religiosity is measured by the percentage of adults in a state who considered religion an important part of their daily lives in a 2008 Gallup study.

4. Because prior studies have found that, due to the confusing nature of ballot questions, voters become increasingly aware of the meaning of a "yes" and a "no" vote on same-sex marriage ballot measures as the election approaches (most likely relying on advertisements), my polling variable uses only data taken within three weeks of the election. When more than one firm conducted a poll within three weeks of the election and less than a week separated the polls, I used an average of the firms' final polls. For Maine, this rule means I included an average of the final Public Policy Polling and Research 2000 polls in my dataset, but not the Pan Atlantic poll, because it was taken more than a week before Public Policy Polling's final poll.

5. While most of the data in my model is easily available, prior polling for same-sex marriage referenda is surprisingly difficult to find. I managed to locate and verify 25 elections with a measure to ban (or to allow the state legislature to ban, as is the case with Hawaii) same-sex marriage and a poll within three weeks of the election. I allotted undecideds according to how decided voters were planning to vote: projected vote in favor of the amendment = those planning on voting yes / (those planning on voting yes + those planning on voting no).
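The undecided-allocation rule in note 5 is easy to express directly; the poll numbers below are invented for illustration:

```python
def projected_yes(yes_pct, no_pct):
    """Allocate undecideds proportionally to decided voters: yes / (yes + no)."""
    return 100.0 * yes_pct / (yes_pct + no_pct)

# A hypothetical poll: 48% yes, 42% no, 10% undecided
print(round(projected_yes(48, 42), 1))  # 53.3
```

In other words, a poll showing 48-42 with 10% undecided enters the dataset as a projected 53.3% in favor of the ban, since undecideds are split in the same proportion as decided voters.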

Saturday, March 13, 2010


On the docket

After a week off (final papers...), I'll be blogging up a storm this week.

On the agenda:

1. I'll be weighing in (once again) on the subject of interactive voice response polls. I have already blogged on the subject, and I will once again be knocking down a case that just makes no sense to me.

2. Same-sex marriage... Same-sex marriage... Same-sex marriage. I'll be looking at divorce rates, when polls fail to accurately predict support for same-sex marriage amendments, a new measure to accurately predict support for same-sex marriage amendments, and how all this relates to Washington D.C. (which just legalized gay marriage). I suspect the polls might be under-predicting support for a ban in our nation's capital...

Overall, I expect it to be a very BUSY week... So please check back here and at (the place where Obama's pollster gets his numbers...)

Monday, March 01, 2010


Is Blanche Lincoln the new Joe Lieberman?

With Lt. Governor Bill Halter entering the Democratic Senatorial primary in Arkansas, the first question most are asking is "can he win?" With Senator Blanche Lincoln an almost certain loser in the general election, I think we can agree that he can. So how likely is a Halter victory? According to a January Mason-Dixon poll, Lincoln led Halter 52-34% in a hypothetical match-up. In addition, her approval rating among Democrats was only 51% (with 35% disapproving) in an early February Public Policy Polling survey. While these polls indicate that Lincoln is vulnerable to a primary challenge, I would argue they could be underselling her vulnerability.

As it stands right now, it is clear Halter is going to challenge Lincoln from the left, with Lincoln's position against the public option being the main issue. While the polling numbers gauging the public option in Arkansas are a little stale, we do know that a December Daily Kos/Research 2000 poll found that 84% of Democratic primary voters supported the public option. In a late October poll conducted by Research 2000 for the liberal Progressive Change Campaign Committee and Democracy for America, 43% of Democratic voters said that if Lincoln did not support the public option, they would be less likely to vote for her.

Thus, if Halter can phrase the health care question correctly (public option vs. government run healthcare, etc.), I think Lincoln can be beaten on this one issue. Why do I believe this even though Halter is not winning already?

The idea of a one-issue ideological primary makes me think back to the last big-time Democratic Senatorial primary as it stood two months out.

Two months before the 2006 Connecticut Democratic Senatorial primary between Ned Lamont and Senator Joe Lieberman, a Quinnipiac poll headline read in part "Anti-Bush, Anti-War Feeling Does Not Hurt Lieberman". Quinnipiac found that 60% of Democrats approved of Joe Lieberman's job performance. This relatively high approval came despite 55% of Democrats knowing Lieberman supported the War in Iraq, 72% wanting a decrease in troops in Iraq, and 83% believing we should never have entered the war. Why the disparity between support for Lieberman and opposition to the war? Only 12% of Democrats in this late April poll said a candidate's position on the war was the only issue they were voting on.

Flash forward two months, to mid-July and early August 2006, and the final three Quinnipiac polls of the Democratic primary. After two months of Ned Lamont hammering Lieberman over his support for the War in Iraq, the importance of the war as an issue rose dramatically and Lieberman's approval among Democrats dropped. In the final three polls, the percentage of voters who pledged to vote against Lieberman because of Iraq was 28%, 44%, and 36%. That is anywhere from a 16-32% jump from the late April polling, with the two higher percentages coming in the two weeks before the primary. In addition, Lieberman's approval among Democrats dropped from 60% (a net approval of +29) in late April to 47% (a net approval of +3) in mid-July.

What am I getting at here? Voters in Connecticut did not seem interested in voting for or against Lieberman based on the Iraq War, despite the overwhelming number of them opposed to it. Then another candidate (Lamont) broached the subject, and things began to unravel for Lieberman. His approval ratings took a dive, and the Iraq War that primary voters opposed became much more important in determining their votes.

In Arkansas, many Democrats have already indicated that they would be less likely to vote for Senator Lincoln because of her stance on the public option. If the prior polling on the public option is correct (and Halter can frame the healthcare question as one of the "public option"), I would not be surprised to see her approval numbers take a dive as another candidate (Halter) raises what is shaping up to be the signature issue of the primary (healthcare) and Lincoln's stance on it. Many Democratic voters in Arkansas may be unaware of her stance, and those who are aware may just need a little persuasion to vote on the issue.

Of course, primaries are odd in nature. We cannot know how a primary electorate will react to a new candidate and his/her arguments, and Arkansas is no Connecticut.

Still, I would not be surprised if the next polling numbers out of Arkansas show Halter closing fast on Lincoln.
