A 2012 published paper by Roy K. Roth of Brigham Young University 1 developed a regression model for predicting whether the Republican party runner or the Democratic party runner would win a U.S. presidential election. That paper developed models to predict whether the Democtaric party would win (0 or 1) and what the percentage of electoral votes received by the Democatic candidate would be. The data was based on n = 20 election years from 1932 through 2008. The author predicted a win for President Obama in 2012.
Considerable research has been done in the field of U.S. presidential electionoutcome prediction. The most famous of which is the recent work of Nate Silver published on the website fivethirtyeight.com (https://projects.fivethirtyeight.com/2016-election-forecast/ ) However this work is based on polling data. For the current study, I wanted to build on the work of Roth, Ray Fair 2, Leo H. Kahane 3 These papers look at the effect of econometric variables on election outcomes, particularly inflation and production,in addition to the question of whether the current president was a Democrat and how long he had been in office.
The basic assumption is that voters vote in their economic self-interest.
The current work uses these econometric variables and examines whether factors such as whether or not the incumbent is running, whether he is a Republican or not.
We based these models on U.S. Election Data from 1928 to 2016.
The Appendix at the end of this report describes and provides summary statistics of the ariables used in this study.
We conduct two analyses;
One in which the the independent variables are: Unemployment Rate,Real GDP, Inflation, consecutive Terms of President Party in office, and Reelect . The response variable is binary logistic regression ( 1 if incumbent party wins and 0 otherwise ) and the second model is percentage of incumbent president electorate vote.
In the second analysis, we have a response variable of percentage of the national electoral vote attained by the incumbent party against the same independent variables.
We ran both models with the current inflation and the inflation for the four year period as well as with current production anf production for the four period sice previous research indicated that it may have been voters’ impression of the utility of an incumbent party right before the election that most affected their preferences.
These models build upon the work of the previously mentioned studies and present a framework for using the facts of whether the incumbent is running or not and, if so, how long the governing party has been in office. This model can be generalized to include presidential elections in other countries, even where there are more than two parties running by definting parties as the incumbent and “all other” and with now that it has been shown with fairly straightforward regression analyses that our variables are, in fact, predictive, we can employ more sophisticated models to try to get the accuracy up higher. All of this model-building suffers from a small sample size.
# install.packages("ResourceSelection")
library(ResourceSelection)
## ResourceSelection 0.3-4 2019-01-08
data1 <- read.csv("glmdata1.csv")
# recode Party as Democrat (0), Republican (1)
data1$Party=ifelse(data1$Party=="D",0,1)
# remove Election.Year; not useful and redundant
data1$Election.Year = NULL
head(data1)
## Year Iwin Party Unemployment RealGDP CGDP CInflation Inflation Prctel
## 1 1928 0 1 0.13 -0.1740 -1.2 0.8 0.064 0.5875490
## 2 1932 1 0 0.21 0.1401 5.1 2.9 0.017 0.4085104
## 3 1936 1 0 0.02 0.0690 17.7 9.9 0.002 0.6245973
## 4 1940 1 0 0.04 0.0450 -1.0 2.2 0.061 0.5498388
## 5 1944 1 0 0.03 -0.0120 -0.6 2.1 0.080 0.5377684
## 6 1948 1 0 0.04 0.0770 4.7 0.7 0.027 0.5231854
## Reelect Terms
## 1 0 1
## 2 1 1
## 3 1 2
## 4 1 3
## 5 1 4
## 6 1 5
# model 1; predict Iwin based on AVERAGE inflation and AVERAGE gdp
glm.fit1 <- glm(formula = Iwin~RealGDP+Unemployment+
Inflation + Reelect +Terms ,family = binomial, data=data1)
summary(glm.fit1)
##
## Call:
## glm(formula = Iwin ~ RealGDP + Unemployment + Inflation + Reelect +
## Terms, family = binomial, data = data1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4729 -1.0252 0.6300 0.9433 1.6265
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4222 1.4392 -0.293 0.769
## RealGDP 8.3246 8.5661 0.972 0.331
## Unemployment -6.8298 12.4983 -0.546 0.585
## Inflation -0.2919 17.9554 -0.016 0.987
## Reelect 0.9155 1.3073 0.700 0.484
## Terms 0.1326 0.6847 0.194 0.846
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 31.841 on 22 degrees of freedom
## Residual deviance: 28.224 on 17 degrees of freedom
## AIC: 40.224
##
## Number of Fisher Scoring iterations: 4
anova(glm.fit1)
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: Iwin
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev
## NULL 22 31.841
## RealGDP 1 1.69508 21 30.146
## Unemployment 1 0.49755 20 29.649
## Inflation 1 0.00513 19 29.644
## Reelect 1 1.38038 18 28.263
## Terms 1 0.03867 17 28.224
# model 2; predict Iwin based on CURRENT inflation and CURRENT gdp
glm.fit2 <- glm(formula = Iwin~CGDP+Unemployment+CInflation+Reelect+Terms,family = binomial, data=data1)
summary(glm.fit2)
##
## Call:
## glm(formula = Iwin ~ CGDP + Unemployment + CInflation + Reelect +
## Terms, family = binomial, data = data1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5235 -0.9788 0.3498 1.0012 1.7854
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7461 1.7747 -0.420 0.674
## CGDP 0.1852 0.1869 0.991 0.322
## Unemployment -5.2200 12.1637 -0.429 0.668
## CInflation -0.1116 0.2191 -0.509 0.611
## Reelect 0.9955 1.2984 0.767 0.443
## Terms 0.2226 0.7148 0.311 0.756
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 31.841 on 22 degrees of freedom
## Residual deviance: 28.109 on 17 degrees of freedom
## AIC: 40.109
##
## Number of Fisher Scoring iterations: 4
anova(glm.fit2)
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: Iwin
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev
## NULL 22 31.841
## CGDP 1 0.80873 21 31.033
## Unemployment 1 0.30634 20 30.726
## CInflation 1 0.54869 19 30.178
## Reelect 1 1.96688 18 28.211
## Terms 1 0.10142 17 28.109
# model 3; predict Prctel based on AVERAGE inflation and AVERAGE gdp
lm.fit1 <- glm(formula = Prctel~Year+Party+Unemployment+RealGDP+Inflation+Reelect+Terms, data=data1)
summary(lm.fit1)
##
## Call:
## glm(formula = Prctel ~ Year + Party + Unemployment + RealGDP +
## Inflation + Reelect + Terms, data = data1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.07178 -0.03521 0.00102 0.02423 0.07696
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.4045210 0.9111604 1.541 0.1440
## Year -0.0004600 0.0004559 -1.009 0.3290
## Party 0.0257977 0.0234292 1.101 0.2882
## Unemployment -0.4713065 0.2979969 -1.582 0.1346
## RealGDP -0.2568946 0.1804258 -1.424 0.1750
## Inflation 0.3742882 0.4289920 0.872 0.3967
## Reelect 0.0787378 0.0304870 2.583 0.0208 *
## Terms -0.0064650 0.0154704 -0.418 0.6819
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.002557151)
##
## Null deviance: 0.079143 on 22 degrees of freedom
## Residual deviance: 0.038357 on 15 degrees of freedom
## AIC: -63.844
##
## Number of Fisher Scoring iterations: 2
anova(lm.fit1)
## Analysis of Deviance Table
##
## Model: gaussian, link: identity
##
## Response: Prctel
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev
## NULL 22 0.079143
## Year 1 0.0048903 21 0.074252
## Party 1 0.0007562 20 0.073496
## Unemployment 1 0.0082050 19 0.065291
## RealGDP 1 0.0019653 18 0.063326
## Inflation 1 0.0012542 17 0.062072
## Reelect 1 0.0232677 16 0.038804
## Terms 1 0.0004466 15 0.038357
# model 2; predict Iwin based on CURRENT inflation and CURRENT gdp
lm.fit2 <- glm(formula = Prctel~Unemployment+CGDP+CInflation+Reelect+Terms, data=data1)
summary(lm.fit2)
##
## Call:
## glm(formula = Prctel ~ Unemployment + CGDP + CInflation + Reelect +
## Terms, data = data1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.090339 -0.032160 -0.006703 0.028734 0.130911
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.4682715 0.0443712 10.554 6.99e-09 ***
## Unemployment -0.2075192 0.3087244 -0.672 0.511
## CGDP -0.0005021 0.0036652 -0.137 0.893
## CInflation 0.0071594 0.0053860 1.329 0.201
## Reelect 0.0469419 0.0314174 1.494 0.153
## Terms 0.0090136 0.0159466 0.565 0.579
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.002955564)
##
## Null deviance: 0.079143 on 22 degrees of freedom
## Residual deviance: 0.050245 on 17 degrees of freedom
## AIC: -61.635
##
## Number of Fisher Scoring iterations: 2
anova(lm.fit2)
## Analysis of Deviance Table
##
## Model: gaussian, link: identity
##
## Response: Prctel
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev
## NULL 22 0.079143
## Unemployment 1 0.0051655 21 0.073977
## CGDP 1 0.0016202 20 0.072357
## CInflation 1 0.0025827 19 0.069774
## Reelect 1 0.0185853 18 0.051189
## Terms 1 0.0009443 17 0.050245
### test GLM
hoslem.test(data1$Iwin, fitted(glm.fit1))
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: data1$Iwin, fitted(glm.fit1)
## X-squared = 9.861, df = 8, p-value = 0.2749
hoslem.test(data1$Iwin, fitted(glm.fit2))
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: data1$Iwin, fitted(glm.fit2)
## X-squared = 10.556, df = 8, p-value = 0.2282
Now let’s try to predict using one of these models and see what percentage of the 23 observations is predicted correctly:
pred <- predict(glm.fit2)
pred = ifelse(pred<0,0,1)
correct = sum(pred == data1$Iwin)/length(pred)
correct
## [1] 0.6956522
variables <-read.csv("variable table.csv")
Mean = round(colMeans(data1),2)
Std.Dev = round(apply(data1,2,sd),2)
Min = round(apply(data1,2,min),2)
Max = round(apply(data1,2,max),2)
variables <-variables[-1,]
variables = cbind(variables, Mean, Std.Dev, Min, Max)
#install.packages("kableExtra")
library(kableExtra)
kable(variables, caption = "Appendix - Variable List - Descriptive Statistics - Years 1928 - 2016 (n = 23)") %>%
kable_styling(full_width = F) %>%
column_spec(1, bold = T, border_right = T) %>%
column_spec(2, width = "10em", background = "yellow")
| Variable.Name | Description | Mean | Std.Dev | Min | Max | |
|---|---|---|---|---|---|---|
| 2 | Year | The year in which the election was held | 1972.00 | 27.13 | 1928.00 | 2016.00 |
| 3 | Iwin | Indicator variable indicating whether the incumbent won (1) or not (0) | 0.52 | 0.51 | 0.00 | 1.00 |
| 4 | Party | Was the winning party Republican ( R ) or Democrat (D) | 0.48 | 0.51 | 0.00 | 1.00 |
| 5 | Unemployment | The average unemployment rate in the nation over the four (4) year period prior to the election | 0.07 | 0.04 | 0.02 | 0.21 |
| 6 | RealGDP | The average GDP growth in the nation over the four(4) period prior to the election | 0.02 | 0.07 | -0.17 | 0.14 |
| 7 | CGDP | The GDP Growth in the one (1) year prior to the election. | 3.29 | 3.88 | -2.50 | 17.70 |
| 8 | CInflation | The inflation rate (CPI) in the nation for the one (1) year prior to the election. | 3.50 | 2.72 | 0.70 | 9.90 |
| 9 | Inflation | The inflation rate (CPI) in the nation for the four (4) year period prior to the election. | 0.04 | 0.03 | 0.00 | 0.10 |
| 10 | Prctel | The percentage of the electoral votes won by the incumbent party in the election | 0.52 | 0.06 | 0.41 | 0.62 |
| 11 | Reelect | Indicator variable indicating whether or not the incumbent office holder was riunning for reelection (1) or not (0) | 0.61 | 0.50 | 0.00 | 1.00 |
| 12 | Terms | Number of terms that the incumbent office holder had been in office prior to the election. | 1.83 | 1.07 | 1.00 | 5.00 |
Roth, Roy K. (2011) “Does the Economy Determine the President? A Regression Model For Predicting US Presidential Elections,” Undergraduate Economic Review: Vol. 8: Iss. 1, Article 11. Available at: http://digitalcommons.iwu.edu/uer/vol8/iss1/11.↩
Fair, R. (1978). The Effect of Economic Events on Votes for President. The Review of Economics and Statistics, 60(2), 159-173. doi:10.2307/1924969↩
Kahane L. It’s the economy, and then some: modeling the presidential vote with state panel data. Public Choice. 2009;139(3/4):343-356. doi:10.1007/s11127-009-9397-z.↩