Sunday, March 27, 2011


Give me Obama's stats, not anecdotes

President Obama may or may not be re-elected in 2012 (models indicate a close vote is in store), but I have to take some issue with a recent piece by Republican pollster David Hill. He claims that "five items point to Obama loss", and I pretty much disagree with each point.

1. Hill posits that Obama is not likable. While I feel that most elections are not decided on the likability factor, I think the American public likes Obama. For one thing, 83% of the public found Obama "likable" in a January GfK poll. Hill brings up some facts about Obama's strict diet (among other things) that may or may not be true, but I think the actual polls speak for themselves.

Obama's favorables (which Hill probably believes are not the best measure of "likability") run ahead of his approval ratings. In fact, his aggregate favorable rating has never dropped below his aggregate unfavorable rating, while the same cannot be said for his approval/disapproval.

There is a good 5% (+/- a few % depending on the poll) who hold a favorable view of Obama, but do not think he is doing a good job. That type of spread between favorability and approval is pretty much equal to what "likable" President Bush saw throughout most of his Presidency.

2. Hill submits that Obama does not have support from his base. While I could tell you anecdotally that my roommate Derek (a diehard Obama supporter) still loves Obama, you're better informed by knowing Obama's approval among Democrats is actually above what either of the last two Democratic Presidents (Carter and Clinton) had in the middle of their first term.

His current ~80% is a stunning ~10% above Clinton's and ~30% above Carter's. How is that not getting support from your base? If anything Obama's problem is that he is too polarizing to those outside of his base.

3. Hill puts forth the belief that Obama has had no "noteworthy" accomplishments. This type of question is subjective (and one I am not in a position to answer). What I can say is that I am not sure it even matters. If these big "accomplishments" matter in the end, then why can economic models do such a fine job of explaining past Presidential election results? The only error the famous Hibbs model had was in a year (2000) where the incumbent could not run for re-election. If the economy does improve (see point 5), then I am sure pollsters will begin to hear "the economic recovery" as the number 1 accomplishment for Obama.

4. Hill argues that voters do not feel "personally connected" with Obama. I have not seen any polling to suggest that voters actually feel this way, nor that it actually matters. Hill may have some insider information that he could share with us, but the fact is that the aforementioned GfK poll also found that 61% of Americans said that Obama is "in touch with ordinary Americans". That seems, to me anyway, like Americans feel they have a personal connection with the President.

5. Hill points to the famous "are you better off than you were four years ago" question that Carter failed and says Obama is failing it too. I actually think Hill and I agree on this point. Obama would lose based on economic performance through this point in his Presidency. Luckily for Obama, the economy should improve (see these forecasts from the Philadelphia Fed and Wells Fargo) on a number of measures (GDP, Real Disposable Income, and Unemployment) to the point that Obama is at least a 50/50 proposition to be re-elected.

To reiterate, Hill may very well end up being correct: Obama may lose in 2012. However, none of the evidence he presents is very convincing.

Thursday, March 24, 2011


Is Your House (model) On Order?

Earlier this evening, Nate Silver posted a response to my House model. In a sentence, I agree with his basic finding (like many of his). In fact, I made it a week ago in my comment section. My model simply will not work for 1948. It has a difficult time for any year prior to 1948.

Issues such as those raised in Nate's piece about coding often plague political scientists. When does a new political era begin? What constitutes an aggressive war? I do not pretend to know the answers. I only hope to know them.

I have made my judgement about 1952 beginning a new political era that continues to this day. Nate clearly disagrees with that belief. My argument is pretty simple. World War II along with the Great Depression forever changed the political landscape of our country. Some might argue that then I should include 1948 in my model, as it is post World War II. My postulation is that World War II played a role in the 1948 election in ways we cannot model.

Truman won re-election despite having negative yearly growth in real disposable personal income per capita in the first three years of his term (before a recovery in the fourth year). While quarterly disposable personal income data was not published for the first half of Truman's term, my guess is that Douglas Hibbs' Presidential model, which is built on quarterly growth in real disposable personal income per capita (and which Nate and I agree is solid), would have difficulty handling the Truman election.

We know, for instance, that Ray Fair (who has his own economic Presidential model) had to make ad-hoc adjustments to all the elections (including 1948) in which the President's term encompassed any part of America's involvement in World War I or World War II. My model is not immune to this World War problem, but considering we have not since had a World War and will probably not have one for a long while, I am not particularly worried for future forecasting.

Another issue that Nate finds fault with is my use of the war variable, which my dataset codes as true for 1976 and 2008. As I noted in my piece, this variable is merely a dummification (not a word, I know) of Douglas Hibbs' Fatality variable. I apply the dummy only to years in which the Majority party in the House differs from the party that controls the Presidency. To answer Nate's question, Libya will count as a war in my book if Douglas Hibbs counts it in his.

A fair question to ask is why this variable does not apply to years in which the party in the White House is the same as the Majority party in the House. Note that including it does not affect my findings. Why? The answer is that there is another variable that encompasses the war variable in years in which the President's party is the same as the Majority party.

As Hibbs' model illustrates, Presidential vote (which is my model's main variable for predicting House seats in years in which the Majority party is the same as the President's) is mostly a function of the quarterly real disposable personal income per capita growth and military fatalities. Thus, it would make sense (to me anyway) that the model washes out any effect my war variable would have when the Presidential vote variable is in effect. The question that should be asked is why economic growth does not matter in House elections in which the Majority party differs from that of the President.

Of course, coding of wars is always interesting to deliberate. Hibbs does not count any of the fatalities under Nixon's first term (and Obama's first term for Afghanistan and Iraq) against him because he "inherited" the war from Johnson. I happen to agree with Hibbs, but I know some who do not. The point is that a "right" answer in coding is not always apparent.

I also want to address a few other points.

First, Nate and I have "known" each other (at least in the blogosphere) for about a year now. During that time, I have felt (and continue to feel) that we have had a respectful and amicable relationship. I have "critiqued" some of his work (see here), and he has returned the favor. This exchange is healthy for me (as I hope it is for Nate), and more than that, it is important for the reader. Without this back and forth, anyone could post a poor model on a whim, and we would be none the wiser.

Second, part of the reason that it was so easy for Nate (or anyone) to check my work is that I post my datasets for everything I do online. If I do not, I tell readers (and I really do mean it) that they can email me. I wish more online political writers would post their datasets. It is the right thing to do.

Third, margins of error can be tricky for any out-of-data forecast. I do not pretend to know that my +/- 10 will definitely turn out to be right (e.g. it might actually be closer to +/- 12), but it is based on something concrete (the existing dataset). I would rather make my margin of error that way than create some large arbitrary error to cover my behind. Some can point to this simple Gallup House model as an example where the in-dataset margin of error (+/- 11 seats) did not quite cover the spread in 2010 because it ended up being off by 13 seats. Unlike that model, however, mine is not reliant on some measure (a Gallup poll) that is subject to additional error.

Finally, a wise man once said, "to be clear, accuracy ought to be the paradigm here. We're not trying to prove or disprove anything to an academic certain degree of certitude; we're trying to make a forecast... [My] model may be wrong, but I'd rather fail by being too ambitious than too stubborn."

I agree with him.

EDIT: One more point on degrees of freedom and number of cases. If you take the full model, you have 6 variables on 15 cases. If you take the not full model (i.e. only having the interaction variables without the original), you have 4 variables on 15 cases. I am in no way comparing my model to Douglas Hibbs' midterm House model, but there you had 3 variables on 15 cases. The Gallup model discussed above also had 3 on 15.

If you were to break down my model to the years in which the Majority party was different from the President's party (which is basically my argument... that we have two types of House elections in Presidential years since 1952), you would get the same result with 10 observations but only 2 variables (war and previous seat count). I have seen some suggestions that a solid model should have at least 10 to 20 times as many observations as variables. Most academic Congressional models and mine fail this test.
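To make the cases-per-variable comparison concrete, here is a small sketch that just divides the counts quoted above. The labels are my own shorthand, and the threshold of 10 is simply the low end of the 10-to-20 rule of thumb mentioned above, not a formal test.

```python
# Cases-per-variable ratios for the specifications mentioned in the text.
# The (observations, variables) counts come from the post; the labels are
# informal shorthand chosen for this sketch.
specs = {
    "full model": (15, 6),
    "interaction-only model": (15, 4),
    "Hibbs midterm House model": (15, 3),
    "Gallup model": (15, 3),
    "split-control years only": (10, 2),
}

RULE_OF_THUMB = 10  # low end of the "10 to 20 observations per variable" suggestion


def cases_per_variable(n_obs, n_vars):
    """Return how many observations there are for each variable."""
    return n_obs / n_vars


ratios = {name: cases_per_variable(n, k) for name, (n, k) in specs.items()}

for name, ratio in ratios.items():
    verdict = "passes" if ratio >= RULE_OF_THUMB else "fails"
    print(f"{name}: {ratio:.2f} cases per variable ({verdict})")
```

Every specification listed lands well under 10 cases per variable, which is the sense in which these models "fail this test."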

The real question, however, is whether statistically the model is too "tight" a fit. I have made the point that 1952 began a new political era. This belief has worked through 2008. In fact, as you can see in the comment section of my original post, the model did just fine in every year since 1952. If 2012 ends up being different from the '52-'08 dataset, the model will meet Reaper, Grim.

However, I really do not believe '12 will mean the model's death. Good aggregate based modeling (such as Hibbs and even the Gallup model that Nate does not like) did just fine in 2010. Most of them were inside their margin of error, and even the Gallup model missed the margin of error by just a few seats (or less than three standard errors). I will be making a post showing that the generic ballot (at least at this point) also supports my model.

The true test will be in a year and a half. If it is clear my model is going down the drain before then, I will jump off the bus faster than fans of the Yankees when they are heading for last. And as those who know me can attest, I will come back here and admit my fault.

Monday, March 21, 2011


Three models say: It's a Tossup in 2012 Prez Race

In a few weeks, I'll be re-debuting my 2012 fundamentals based Presidential vote prediction model. It is a re-specification of Douglas Hibbs' bread and peace model that if nothing else will make you think. Until that time (and even after it), I want to throw out some caution on any long term Presidential fundamental (i.e. non-poll based) forecast (including my own).

Currently, three solid models* are out and about in the blogosphere (with many more on a Google Scholar near you).
-The aforementioned Douglas Hibbs Bread and Peace model that utilizes quarter-to-quarter growth rates in real disposable personal income per capita (dpipc) expressed annually and fatalities in "unprovoked, hostile deployment of American armed forces in foreign conflict" over the term**.

-Ray Fair's model that employs real per capita GDP quarter-to-quarter growth rates expressed annually in the 13th-15th quarters of a term, GDP deflator quarter-to-quarter growth rates expressed annually over the 1st-15th quarters of the term, and the number of quarters in the first 15 quarters of the Presidential term where the real per capita GDP growth rate expressed annually is above 3.2%.

-Alan Abramowitz's model that uses 2nd quarter election year real GDP growth rate expressed annually, election year's June Presidential net approval (as measured by Gallup), and a dummy variable for number of terms the incumbent party has been in the White House.

Capitalizing on available economic forecasts from Wells Fargo (as well as from Ray Fair for his own prognostication), we can get an idea of what each of these models would call for if the election were today.

Hibbs' has the election teetering on the edge with Obama winning 50.05%-49.95% in the two-major-party vote; Fair's has Obama taking 52.5% of the two-major-party vote (down from 55.9% in his November prediction); while Abramowitz's sees Obama winning between 53-54%*** of the two-major-party vote.

All three estimates for Obama's vote share are pretty much within each model's (at least +/- 3.5%) margin of error, so the spread between them is more than understandable. Therein lies part of the problem, however.

These models (and my own) are not built for close elections. Yes, sometimes they will do very well in close (53.5% or less for the winner) races (such as in 2004), but sometimes they will have large errors (2000 for Hibbs and Abramowitz and 1992 for Fair) where their winner for the popular vote received at least 3% less of the vote than forecasted. Considering all the models are giving Obama a small majority of the vote, the words "margin of error" cannot be repeated enough.

What about difficulties specific to each model?

Solid as a rock for post-election explanation of election results, the Hibbs model (as with mine) is sensitive to very small changes in real dpipc.

For example, the Hibbs model utilizing an economic forecast from two months ago would have had Obama winning 50.3%-49.7%. A small change to be sure, but the only difference in Wells Fargo's forecast of the real dpipc growth rate was that it was 0.1%-0.2% higher in some quarters (e.g. 2011's quarter 1 forecast of 4.0% then to 3.9% now).

Still not worried? Between September 2010 and November 2010, Wells Fargo's forecast for real dpipc growth in quarter 1 of 2011 changed by 0.6%****.

Thus, it is more than possible that slight (to not-so-slight) revisions, especially in the long-term, will be made to the real dpipc forecasts. These possible changes make any Presidential forecast based on this measure of economic strength difficult.

The Fair model is less sensitive to slight shifts to two of its main economic variables (GDP deflator and real per capita GDP). Its greatest issue (besides ad-hoc adjustments) is the use of an interval variable to express the number of quarters during the first 15 quarters of a Presidential term in which real per capita GDP growth at an annual rate is greater than 3.2%.

Only one such quarter has occurred during Obama's Presidency, but Fair is predicting that there will be 3 more. My issue with this estimation is that I simply do not believe it will happen.

When we adjust Wells Fargo's real GDP growth rate to real per capita***** GDP (which basically takes about 1% off growth), no quarter-to-quarter growth at an annual rate for the rest of Obama's term climbs higher than 2.2%. Even the more optimistic Philadelphia Fed consensus estimate (when adjusted to per capita rates) gets only to about 2.7% in the best quarter.

And if only one quarter during Obama's term has real per capita growth rate above 3.2%, Fair's model actually shows the Republican candidate winning the election with 50.5% of the two-major-party vote!
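The per capita adjustment described above can be sketched as follows. The 670,000-per-quarter population gain comes from the footnote; the population base of roughly 309 million is my own assumption (about the 2010 U.S. population) and is not stated in this post, and the 3.0% input is purely illustrative.

```python
QUARTERLY_POP_GAIN = 670_000   # from the footnote in this post
POPULATION_BASE = 309_000_000  # ASSUMPTION: approximate 2010 U.S. population, not from the post


def annualized_pop_growth(gain=QUARTERLY_POP_GAIN, base=POPULATION_BASE):
    """Annualized population growth rate (in %) implied by a quarterly gain."""
    quarterly_rate = gain / base
    return ((1 + quarterly_rate) ** 4 - 1) * 100


def per_capita_growth(real_gdp_growth_pct, pop_growth_pct):
    """Convert an annualized real GDP growth rate (%) to a per capita rate (%)."""
    return ((1 + real_gdp_growth_pct / 100) / (1 + pop_growth_pct / 100) - 1) * 100


pop_growth = annualized_pop_growth()
example = per_capita_growth(3.0, pop_growth)  # 3.0% is an illustrative input only

print(f"Annualized population growth: {pop_growth:.2f}%")
print(f"3.0% real GDP growth is roughly {example:.2f}% per capita")
```

Under these assumptions the adjustment shaves a bit under one point off the headline growth rate, in line with the "about 1%" figure used above.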

The Abramowitz model, on the other hand, actually lends itself the most to a long range forecast, despite its use of Presidential approval in June of election year. Why?

Presidential approval is only a small part of the model with a coefficient for the variable of only 0.107; holding all other variable values constant, we should only expect about a 2% difference in Obama's projected vote percentage if his net approval is -10% vs. +10% (a 20% swing).
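The arithmetic behind that "about 2%" figure is just the coefficient times the swing in net approval. A minimal sketch, using only the 0.107 coefficient quoted above and holding everything else in the model fixed (the function name is my own):

```python
APPROVAL_COEF = 0.107  # coefficient on June net approval, as quoted in the post


def approval_effect(net_approval_low, net_approval_high, coef=APPROVAL_COEF):
    """Change in projected vote share (points) between two net approval values."""
    return coef * (net_approval_high - net_approval_low)


swing_effect = approval_effect(-10, 10)  # a 20-point swing in net approval
print(f"Projected vote share difference: {swing_effect:.2f} points")
```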

The model's utilization of real GDP is also an asset as the forecast for 2012's second quarter real GDP rate expressed annually has held steady at right around 3%, according to Wells Fargo. Even if the actual growth rate ended up being 2%, Obama would only lose about 0.5% off his estimated vote percentage.

The possible problem with the Abramowitz model is that (as Abramowitz, himself, points out) in each of the last four Presidential elections the model has over-predicted the vote of the incumbent candidate by at least 1.85%.

If Obama's net June approval ends up at -5% (very possible), and real GDP growth expressed annually for the second quarter is at 2.9% (per Wells Fargo), a 1.85% error in the negative direction pushes Obama under 51%. Even so, I would say that Abramowitz's model still indicates that Obama is the slight frontrunner.

So what can we take away from this analysis?

Two models based entirely off the fundamentals have the election shaping up to be a 50/50 affair where stump speeches in suburbs of Denver could make the difference. The other model (with a poll variable) contends that Obama is the slight favorite.

I think we can say that the 2012 election is going to be very tight.


*Please visit each writer's link to get full details on their models. They do a better job of explaining them than I ever could. If you're still confused, just ask me a question.
**Note that a President gets a one term grace period for wars that his party did not start, so this value is currently 0 for 2012.
***Given Obama's net approval is between -3% and +6%.
****Both of these forecasts were before the extension of the tax breaks to those in the upper income brackets. Evidence suggests this extension may have negatively impacted real dpipc growth and thus had a negative effect on Obama's chances of re-election.
***** Assuming the population grows by 670,000 each quarter. This number was achieved using the average quarter-to-quarter growth during the four quarters of 2010.

Wednesday, March 16, 2011


Republicans to maintain control of the House in 2012

Unless a historic event occurs, Republicans will still be in control of the House of Representatives after the 2012 election. How can I be so confident even when House re-districting is still occurring?

It turns out that the differences between House election results during Presidential election years are very well accounted for by fundamentals variables*. What I mean by "fundamentals" is simply variables that we know (or can reasonably predict) before the election and that do not include polling data. In the 15 Presidential year House elections since 1952 (a common post-war cutoff for political science), these variables** include

-Percentage of seats won by the majority party in the last election. The majority party in the House wins more seats when it previously held more seats. In 2012, this variable is 55.6 because the Republicans won 242 seats out of 435 in the 2010 House elections.

-A dummy (1 or 0) for whether the majority party in the House is the same as the party in control of the Presidency. In 2012, this variable is 0 because the House is controlled by a different party than the White House.

-The percentage of the vote the party in control of the White House wins in the Presidential election during years in which the majority party in the House is the same as the party in control of the Presidency. Not surprisingly, when the party controlling both is the same, the majority party in the House gains more seats when its candidate for President wins a higher percentage of the vote. Surprisingly, Presidential vote has little relation to the House election in years when the House and Presidency are controlled by different parties. In 2012, this variable is 0 because the House is controlled by a different party than the Presidency.

-A dummy (1 or 0) that is 1 when the party in control of the White House is different from the majority party in the House, and the President has started or maintained for more than one term an "unprovoked, hostile deployment of American armed forces in foreign conflict" (as defined by Douglas Hibbs) which has resulted in at least 1 fatality during the past term. Interestingly, as if to penalize the party in the White House, the majority party in the House wins more seats when this variable is true (like in 2008). In 2012, this variable is 0 (false) because the Iraq War was started by a Republican President (Bush), and the Democratic President (Obama) has not continued it for more than one term.

In simple linear regression equation form, the model reads for 2012,

Percentage of seats won by the current Majority Party (54.8%) = Coefficient for Previous Seat Share (.67) * Previous Seat Share (55.6) + Constant (17.3).

The model is therefore predicting that the Republicans will win 238 seats, more than enough for them to maintain their majority.
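For readers who want to check the arithmetic, here is a minimal sketch of the 2012 calculation using only the numbers printed above. The variable names are my own. Note that plugging in the rounded coefficient (.67) and constant (17.3) gives roughly 54.6% rather than the stated 54.8%; I assume the gap comes from rounding of the printed coefficients, so the seat count below is derived from the stated 54.8%.

```python
TOTAL_SEATS = 435
COEF_PREV_SHARE = 0.67  # rounded coefficient, as printed in the post
CONSTANT = 17.3         # rounded constant, as printed in the post
STATED_SHARE = 54.8     # predicted majority-party seat share, as printed in the post

# Republicans won 242 of 435 seats in 2010.
prev_share = 242 / TOTAL_SEATS * 100

# Prediction from the rounded coefficients (other variables are 0 in 2012).
share_from_rounded = COEF_PREV_SHARE * prev_share + CONSTANT

# Seat count implied by the stated 54.8% share.
predicted_seats = round(STATED_SHARE / 100 * TOTAL_SEATS)

print(f"Previous seat share: {prev_share:.1f}%")
print(f"Share from rounded coefficients: {share_from_rounded:.1f}%")
print(f"Seats implied by {STATED_SHARE}%: {predicted_seats}")
```

Either way, the prediction sits comfortably above the 218 seats needed for a majority.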

We must ask: how accurate is this model in explaining past results?

The answer is very accurate. The model is able to account for 95.9% of the difference in the results of the 15 Presidential year House elections since 1952.

What about the chance of error? The root-mean-squared-error (a statistic that measures errors in estimate and penalizes for larger errors) is 1.1% of seat share, which given our small sample size (15) corresponds to a margin of error at 95% confidence of +/- 2.3% of seats in the House. The largest error (and the only one with an error of greater than 1%) in our set is the 1988 election, which the model misfit by 2.5% (or 11 seats). For 2012, our margin of error indicates that Republicans winning as many as 248 seats and as few as 228 is a reasonable expectation.
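Converting those percentages into seats is simple arithmetic on a 435-seat House. A quick sketch, using only figures from the paragraph above (the names are my own):

```python
TOTAL_SEATS = 435
POINT_ESTIMATE = 238  # predicted Republican seats for 2012
MOE_PCT = 2.3         # 95% margin of error, in % of seats
MISS_1988_PCT = 2.5   # largest in-sample miss, in % of seats

moe_seats = round(MOE_PCT / 100 * TOTAL_SEATS)
low, high = POINT_ESTIMATE - moe_seats, POINT_ESTIMATE + moe_seats
miss_1988_seats = round(MISS_1988_PCT / 100 * TOTAL_SEATS)

print(f"Margin of error: +/- {moe_seats} seats ({low} to {high})")
print(f"1988 miss: {miss_1988_seats} seats")
```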

Of course, none of these findings are useful for 2012, unless we know how well the model not only explains but also would have predicted past House elections. To do so, we take out a given election from the model and re-run the regression. The results give us confidence that the 2012 estimate should be a good one. In 2008, the model called for the Democrats to win 252 seats, when they ended up winning 257 seats. In 2004 (thus we take out 2004 and 2008 from our dataset), the model would have forecasted Republicans to win 236 seats, when they won 232.

And what about the last time a Presidential election took place directly after re-districting (1992)? Eliminating 1992 from the dataset and re-estimating the model, we find that the model would have projected the majority party (Democrats) to win 254 seats, when they took 258. Thus, while this year's re-districting may cause the model problems, past history does not indicate that it will.
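The "take out an election and re-run the regression" check can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses a single predictor (previous seat share) rather than the full set of variables, it drops one election at a time (whereas the 2004 check above also drops 2008), and the numbers in `toy_prev_share` and `toy_next_share` are made-up placeholders, NOT the actual dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def leave_one_out_errors(xs, ys):
    """For each case, refit without it and return actual minus predicted."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(ys[i] - (slope * xs[i] + intercept))
    return errors


# PLACEHOLDER values for illustration only; these are not real election results.
toy_prev_share = [52.0, 55.0, 58.0, 61.0, 64.0]
toy_next_share = [51.5, 54.0, 56.0, 59.5, 60.5]

errors = leave_one_out_errors(toy_prev_share, toy_next_share)
for x, err in zip(toy_prev_share, errors):
    print(f"held-out case with previous share {x}: error {err:+.2f} points")
```

The out-of-sample errors from a check like this are a fairer guide to forecasting accuracy than the in-sample fit, since each prediction is made without the model having seen that election.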

Given this past accuracy and current Republican majority, it looks like it will still be Speaker Boehner come January 2013.


*To those statistics masters out there, yes it would be proper to include both "original" independent variables that make up the interaction terms (war and Presidential vote). They were left out to preserve a reasonably low number of variables given the low number of observations as well as for simplicity of explanation. There is no statistically significant difference in the accuracy of the "full" model to explain past results.

**As always, email if you are interested in the dataset or have questions.
