But what about Nate's point that the Hibbs' model widely misses on elections prior to 1952?
There are a number of reasons not to input data prior to 1952 into the model. Some of those (including how voters viewed the role of government differently prior to this party era) are outlined by Seth Masket.
One other point that I have not heard brought up is simply that elections prior to 1952 lacked any true television campaigning. It's difficult to overestimate how much television nationalized elections.
I think these factors, which apply to 1952 and afterward, make these elections fundamentally different from those prior. It's not surprising that 5 of the 7 out-of-dataset errors Nate found prior to 1952 are roughly twice as large, or larger, than any of those from 1992-2008.
Indeed, political scientists have long known that elections before and after WWII seem to respond differently to the fundamentals.
For example, esteemed political scientist James Campbell noted that presidential coattails differed pre- and post-WWII, and he actually mapped out different equations for the two periods. Some scientists have tried to fit pre- and post-1952 data into one model. During the 2010 midterm election, for instance, we saw many fundamentals models (i.e. no polling data) of this design, but not a single one correctly projected the Republicans winning control*. In fact, the only fundamentals model that foresaw the House turning over was Hibbs' midterm House model, which, like the Presidential model, uses real disposable income growth data post-1948.
So, these combined models may take into account more years, but more years does not necessarily mean a better forecasting model.
If Hibbs' Presidential model initially included Nate's estimates for real disposable personal income growth prior to 1952 (i.e. 1924-1948) and was refitted to predict the 1992-2008 elections out-of-dataset, Hibbs' forecasts would have been worse.
Both the 1992 and 2008 elections would have been incorrectly forecasted to be won by the Republican party. Instead of the 2.6% mean error of the original Hibbs' model, Hibbs' estimates would have been off by about 3.91% with '24-'48 data added.
This difference in 1992-2008 out-of-dataset errors between the models that include and do not include pre-1952 data is so large that it's actually greater than the difference in error between my amended Hibbs' model (2.09%) and the final weekend polls (1.11%) during the same 1992-2008 period.
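The comparison above is simple arithmetic on the four error figures quoted in the text. A minimal sketch, with variable names that are my own labels rather than anything from the models themselves:

```python
# Mean out-of-dataset errors for 1992-2008, as quoted in the text (percentage points).
hibbs_original = 2.6          # original Hibbs' model, 1952-onward data only
hibbs_with_pre_1952 = 3.91    # same model refitted with '24-'48 data added
amended_hibbs = 2.09          # the amended Hibbs' model
final_weekend_polls = 1.11    # final weekend polls

# Extra error incurred by adding the pre-1952 elections.
cost_of_pre_1952_data = hibbs_with_pre_1952 - hibbs_original

# Gap between the amended model and the final weekend polls.
model_vs_polls_gap = amended_hibbs - final_weekend_polls

print(f"Cost of adding 1924-1948 data: {cost_of_pre_1952_data:.2f} points")
print(f"Amended model vs. final polls: {model_vs_polls_gap:.2f} points")
```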
Personally, I want the model that best predicts the future, not one that makes us feel good inside for post-projecting elections prior to 1952.
But let's say that you are interested in knowing how well the economy explains the pre-1952 vote. Nate would like you to believe that these high pre-1952 errors prove that "it’s the economy, stupid. And everything else too" when it comes to explaining Presidential elections. And in one way, I agree with him.
See, the number 1 predictor of how any voter casts her/his ballot is not the economy, but party identification. It's the main reason why we never see Presidential candidates winning much more than 60% of the vote, or the losing candidate earning less than ~40%.
In other words, in a race between a Republican and Democratic candidate, each candidate starts with a base of about 40% of the vote with only about 20% of the vote up for grabs.
This means that even if there was 15% growth (or decline) in real disposable personal income, the effect of this growth on the vote would run out as most of the population makes their vote choice based upon the candidates' party affiliation.
You can see this concept graphically for the incumbent party's vote share, which is typically the greater of the two. Note the flat tails at each end as growth becomes incredibly positive or negative.
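To make the flat tails concrete, here is a rough sketch of a curve with that shape. The logistic form, the 40% floor and 60% ceiling (taken from the rough figures above), the midpoint, and the `steepness` value are all illustrative assumptions of mine, not a fitted model and not how Hibbs' model is specified.

```python
import math

def incumbent_vote_share(growth_pct, floor=40.0, ceiling=60.0,
                         midpoint=0.0, steepness=0.5):
    """Hypothetical S-curve: vote share flattens out near the floor and ceiling.

    All parameters are illustrative assumptions, not estimates.
    """
    # Logistic term runs from 0 to 1 as growth moves from very negative to very positive.
    swing = 1.0 / (1.0 + math.exp(-steepness * (growth_pct - midpoint)))
    # Only the (ceiling - floor) slice of the electorate is "up for grabs".
    return floor + (ceiling - floor) * swing

for g in (-15, -5, 0, 5, 15):
    print(f"growth {g:+d}% -> incumbent share {incumbent_vote_share(g):.1f}%")
```

Under these assumptions, moving from 5% to 15% growth adds far less vote share than moving from 0% to 5%, which is the flattening described above.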
This high growth/decline factor is not an issue in the 1952-onward data, but it is for the election years 1932, 1936, and 1944.
The question is how to control for this issue. One adequate and simple (but not perfect) way is to cap the economic growth variable at -5 and +5. That is, we recode the economic variable in 1932 as -5%, in 1936 as +5%, and in 1944 as +5%.
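The capping step can be sketched in a few lines. The function name and the raw growth values below are placeholders of my own for illustration; they are not the actual estimates for any election year.

```python
def cap_growth(growth_pct, lower=-5.0, upper=5.0):
    """Clamp a growth rate (in percent) to the range [lower, upper]."""
    return max(lower, min(upper, growth_pct))

# Placeholder inputs only (NOT real data): one deep decline, one boom, one ordinary year.
raw_growth = {"deep decline": -12.0, "boom": 9.0, "ordinary year": 2.5}

capped = {label: cap_growth(g) for label, g in raw_growth.items()}
print(capped)
```

Values already inside the band pass through unchanged, so the recoding only touches the extreme years.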
Let's also add an interaction variable involving the term of the party in the White House (a fundamental dummy variable that has been found to be quite significant in other models) to this recoded real disposable personal income growth variable and the military fatalities variable. Doing so, we can now explain not 59% of election results from 1924-2008, but 77.4%!
Three variables** accounting for 77.4% of the difference in results among 22 elections spanning two party systems, a World War, and the move into the television age is fantastic as far as I'm concerned.
But I actually believe that we can explain more than 77.4%. When I first built my amended Hibbs' model, I too had searched the Census archives, but could only find yearly data on disposable income from 1929 onward. I must admit that I was surprised that Nate found the growth for the 1928 election to be only about 0.30% (the amount needed for Republican Herbert Hoover to receive 47% of the two-party vote). After all, the 1920s supposedly had a booming economy.
In search of data, I plowed through Google and was able to locate disposable personal income data from this (gated) 1946 economic paper.
The data seems pretty accurate***, and I calculate growth rates**** in real disposable income per capita of 2.00% for 1925, 1.35% for 1926, 1.37% for 1927, and 3.39% for 1928.
Remember, however, that Hibbs' model is based on quarterly income growth in which later quarters are weighted more heavily than earlier ones. Exactly how one converts yearly estimates to quarterly ones will determine the exact weighted quarterly growth rate.
There is no perfect method, but my own imperfect estimation is 1.80%.
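To show why the conversion method matters, here is one possible sketch. It assumes each year's growth rate applies evenly to all four of its quarters and that the weights decay geometrically going back in time; the `decay` value of 0.9 is a hypothetical choice of mine. Since I have not spelled out my own conversion here, this sketch should not be expected to reproduce the 1.80% figure.

```python
# Yearly growth rates (percent) for 1925-1928, as given in the text.
yearly_growth = [2.00, 1.35, 1.37, 3.39]

def weighted_growth(yearly_rates, decay=0.9, quarters_per_year=4):
    """Weighted average growth, with later quarters weighted more heavily.

    Assumptions (illustrative only): each year's rate is spread evenly over its
    quarters, and a quarter k periods before the last gets weight decay**k.
    """
    quarterly = [rate for rate in yearly_rates for _ in range(quarters_per_year)]
    n = len(quarterly)
    # The most recent quarter (i = n - 1) gets weight 1; earlier ones shrink geometrically.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * q for w, q in zip(weights, quarterly)) / sum(weights)

print(f"Equal weights (decay=1.0): {weighted_growth(yearly_growth, decay=1.0):.2f}%")
print(f"Geometric weights (decay=0.9): {weighted_growth(yearly_growth):.2f}%")
```

Changing `decay`, or spreading yearly growth across quarters differently, moves the result, which is exactly why there is no single correct weighted rate.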
I have no idea whether Nate's or my finding for income growth heading into the 1928 election is correct, but I feel my calculated rate better jibes with the overall consensus that the economy was doing quite well during the '20s (despite a few bumps along the way).
If we assume my growth finding is right, then 82%, not 77.4%, of the differences among election results between 1924-2008 are accounted for by the model. This means that this model has only about 5% less explanatory value than the 87% of the original Hibbs' model for 1952-2008.
I find it amazing that three non-polling variables can explain 82% of the difference in results among 22 elections spanning two party systems, a World War, and the move into the television age.
I still stress, however, that I don't find this pre-1952 prediction exercise all that important for predicting results in this modern age.
We should not incorporate more data into a model just to boost the number of data-points. As I discovered, our ability to forecast elections with economic modeling in 2012 and beyond is impeded, not helped, by adding data prior to 1952.
Overall, these exercises should give us confidence that economic modeling can provide a good early read on the political environment in 2012. Certainly the state of the economy gives us a better idea of the ultimate outcome than determining which candidates smile more or are more charismatic, or even the early horse race numbers.
I am willing to say right now that, unless something totally unexpected occurs, the 2012 presidential election will be about the economy just like pretty much every election since 1924. We should be looking to growth in real disposable personal income per capita to lead the way.
*Ray Fair's House model did not estimate seat share. However, almost all political scientists would agree that to win back the House, House Republicans, due to the Democrats' incumbency advantage, needed to take the popular House vote by more than the 1.6% Fair's model projected.
**Four if you include the original "term" variable, although this makes no difference in explanatory power.
***Note, I found slightly different estimates of real disposable income in a few other papers, but all led to the same conclusion I make.
****Yearly income data is given as the average for a given year. I assume that this average is equal to the real disposable income at the mid-point of a given year. To determine what the income is at the beginning of a calendar year, I average the income data for a given year with that of the prior year. It's not perfect, but I tested it on known beginning year data points and found it to be more than acceptable.
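The beginning-of-year approximation in the last footnote can be sketched as follows. The function names, the generic year indices, and the income levels are placeholders of my own, not the 1946 paper's figures, and measuring a year's growth from one beginning-of-year estimate to the next is my reading of the method rather than a documented formula.

```python
def start_of_year_income(annual_avg):
    """Estimate income at the start of each year as the mean of that year's
    average and the prior year's average (treated as mid-year values)."""
    return {
        year: (annual_avg[year] + annual_avg[year - 1]) / 2
        for year in annual_avg
        if year - 1 in annual_avg
    }

def calendar_year_growth(annual_avg):
    """Percent growth over each year, from its start to the start of the next year."""
    start = start_of_year_income(annual_avg)
    return {
        year: 100 * (start[year + 1] / start[year] - 1)
        for year in start
        if year + 1 in start
    }

# Placeholder income levels for three consecutive years (NOT real data).
annual_avg = {1: 100.0, 2: 104.0, 3: 110.0}

print(start_of_year_income(annual_avg))
print(calendar_year_growth(annual_avg))
```

Note that a growth figure for a given year needs the annual averages for the year before, the year itself, and the year after.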