Project: Development of Models to Predict Whether an Incumbent Party Will Win the U.S. Presidential Election.

Introduction

A 2012 published paper by Roy K. Roth of Brigham Young University ¹ developed a regression model for predicting whether the Republican party runner or the Democratic party runner would win a U.S. presidential election. That paper developed models to predict whether the Democtaric party would win (0 or 1) and what the percentage of electoral votes received by the Democatic candidate would be. The data was based on n = 20 election years from 1932 through 2008. The author predicted a win for President Obama in 2012.

Background

Considerable research has been done in the field of U.S. presidential electionoutcome prediction. The most famous of which is the recent work of Nate Silver published on the website fivethirtyeight.com (https://projects.fivethirtyeight.com/2016-election-forecast/ ) However this work is based on polling data. For the current study, I wanted to build on the work of Roth, Ray Fair ², Leo H. Kahane ³ These papers look at the effect of econometric variables on election outcomes, particularly inflation and production,in addition to the question of whether the current president was a Democrat and how long he had been in office.

The basic assumption is that voters vote in their economic self-interest.

The current work uses these econometric variables and examines whether factors such as whether or not the incumbent is running, whether he is a Republican or not.

We based these models on U.S. Election Data from 1928 to 2016.

Data

The Appendix at the end of this report describes and provides summary statistics of the ariables used in this study.

We conduct two analyses;

One in which the the independent variables are: Unemployment Rate,Real GDP, Inflation, consecutive Terms of President Party in office, and Reelect . The response variable is binary logistic regression ( 1 if incumbent party wins and 0 otherwise ) and the second model is percentage of incumbent president electorate vote.

In the second analysis, we have a response variable of percentage of the national electoral vote attained by the incumbent party against the same independent variables.

We ran both models with the current inflation and the inflation for the four year period as well as with current production anf production for the four period sice previous research indicated that it may have been voters’ impression of the utility of an incumbent party right before the election that most affected their preferences.

Results

We can see from the models that ….

Conclusion

These models build upon the work of the previously mentioned studies and present a framework for using the facts of whether the incumbent is running or not and, if so, how long the governing party has been in office. This model can be generalized to include presidential elections in other countries, even where there are more than two parties running by definting parties as the incumbent and “all other” and with now that it has been shown with fairly straightforward regression analyses that our variables are, in fact, predictive, we can employ more sophisticated models to try to get the accuracy up higher. All of this model-building suffers from a small sample size.

# install.packages("ResourceSelection")
library(ResourceSelection)

## ResourceSelection 0.3-4   2019-01-08

data1 <- read.csv("glmdata1.csv")
# recode Party as Democrat (0), Republican (1)
data1$Party=ifelse(data1$Party=="D",0,1)
# remove Election.Year; not useful and redundant
data1$Election.Year = NULL
head(data1)

##   Year Iwin Party Unemployment RealGDP CGDP CInflation Inflation    Prctel
## 1 1928    0     1         0.13 -0.1740 -1.2        0.8     0.064 0.5875490
## 2 1932    1     0         0.21  0.1401  5.1        2.9     0.017 0.4085104
## 3 1936    1     0         0.02  0.0690 17.7        9.9     0.002 0.6245973
## 4 1940    1     0         0.04  0.0450 -1.0        2.2     0.061 0.5498388
## 5 1944    1     0         0.03 -0.0120 -0.6        2.1     0.080 0.5377684
## 6 1948    1     0         0.04  0.0770  4.7        0.7     0.027 0.5231854
##   Reelect Terms
## 1       0     1
## 2       1     1
## 3       1     2
## 4       1     3
## 5       1     4
## 6       1     5

# model 1; predict Iwin based on AVERAGE inflation and AVERAGE gdp
glm.fit1 <- glm(formula = Iwin~RealGDP+Unemployment+
               Inflation + Reelect +Terms ,family = binomial, data=data1)
summary(glm.fit1)

## 
## Call:
## glm(formula = Iwin ~ RealGDP + Unemployment + Inflation + Reelect + 
##     Terms, family = binomial, data = data1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4729  -1.0252   0.6300   0.9433   1.6265  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -0.4222     1.4392  -0.293    0.769
## RealGDP        8.3246     8.5661   0.972    0.331
## Unemployment  -6.8298    12.4983  -0.546    0.585
## Inflation     -0.2919    17.9554  -0.016    0.987
## Reelect        0.9155     1.3073   0.700    0.484
## Terms          0.1326     0.6847   0.194    0.846
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 31.841  on 22  degrees of freedom
## Residual deviance: 28.224  on 17  degrees of freedom
## AIC: 40.224
## 
## Number of Fisher Scoring iterations: 4

anova(glm.fit1)

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: Iwin
## 
## Terms added sequentially (first to last)
## 
## 
##              Df Deviance Resid. Df Resid. Dev
## NULL                            22     31.841
## RealGDP       1  1.69508        21     30.146
## Unemployment  1  0.49755        20     29.649
## Inflation     1  0.00513        19     29.644
## Reelect       1  1.38038        18     28.263
## Terms         1  0.03867        17     28.224

# model 2; predict Iwin based on CURRENT inflation and CURRENT gdp 
glm.fit2 <- glm(formula = Iwin~CGDP+Unemployment+CInflation+Reelect+Terms,family = binomial, data=data1)
summary(glm.fit2)

## 
## Call:
## glm(formula = Iwin ~ CGDP + Unemployment + CInflation + Reelect + 
##     Terms, family = binomial, data = data1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5235  -0.9788   0.3498   1.0012   1.7854  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -0.7461     1.7747  -0.420    0.674
## CGDP           0.1852     0.1869   0.991    0.322
## Unemployment  -5.2200    12.1637  -0.429    0.668
## CInflation    -0.1116     0.2191  -0.509    0.611
## Reelect        0.9955     1.2984   0.767    0.443
## Terms          0.2226     0.7148   0.311    0.756
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 31.841  on 22  degrees of freedom
## Residual deviance: 28.109  on 17  degrees of freedom
## AIC: 40.109
## 
## Number of Fisher Scoring iterations: 4

anova(glm.fit2)

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: Iwin
## 
## Terms added sequentially (first to last)
## 
## 
##              Df Deviance Resid. Df Resid. Dev
## NULL                            22     31.841
## CGDP          1  0.80873        21     31.033
## Unemployment  1  0.30634        20     30.726
## CInflation    1  0.54869        19     30.178
## Reelect       1  1.96688        18     28.211
## Terms         1  0.10142        17     28.109

# model 3; predict Prctel based on AVERAGE inflation and AVERAGE gdp 
lm.fit1 <- glm(formula = Prctel~Year+Party+Unemployment+RealGDP+Inflation+Reelect+Terms, data=data1)
summary(lm.fit1)

## 
## Call:
## glm(formula = Prctel ~ Year + Party + Unemployment + RealGDP + 
##     Inflation + Reelect + Terms, data = data1)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.07178  -0.03521   0.00102   0.02423   0.07696  
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   1.4045210  0.9111604   1.541   0.1440  
## Year         -0.0004600  0.0004559  -1.009   0.3290  
## Party         0.0257977  0.0234292   1.101   0.2882  
## Unemployment -0.4713065  0.2979969  -1.582   0.1346  
## RealGDP      -0.2568946  0.1804258  -1.424   0.1750  
## Inflation     0.3742882  0.4289920   0.872   0.3967  
## Reelect       0.0787378  0.0304870   2.583   0.0208 *
## Terms        -0.0064650  0.0154704  -0.418   0.6819  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.002557151)
## 
##     Null deviance: 0.079143  on 22  degrees of freedom
## Residual deviance: 0.038357  on 15  degrees of freedom
## AIC: -63.844
## 
## Number of Fisher Scoring iterations: 2

anova(lm.fit1)

## Analysis of Deviance Table
## 
## Model: gaussian, link: identity
## 
## Response: Prctel
## 
## Terms added sequentially (first to last)
## 
## 
##              Df  Deviance Resid. Df Resid. Dev
## NULL                             22   0.079143
## Year          1 0.0048903        21   0.074252
## Party         1 0.0007562        20   0.073496
## Unemployment  1 0.0082050        19   0.065291
## RealGDP       1 0.0019653        18   0.063326
## Inflation     1 0.0012542        17   0.062072
## Reelect       1 0.0232677        16   0.038804
## Terms         1 0.0004466        15   0.038357

# model 2; predict Iwin based on CURRENT inflation and CURRENT gdp 
lm.fit2 <- glm(formula = Prctel~Unemployment+CGDP+CInflation+Reelect+Terms, data=data1)
summary(lm.fit2)

## 
## Call:
## glm(formula = Prctel ~ Unemployment + CGDP + CInflation + Reelect + 
##     Terms, data = data1)
## 
## Deviance Residuals: 
##       Min         1Q     Median         3Q        Max  
## -0.090339  -0.032160  -0.006703   0.028734   0.130911  
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.4682715  0.0443712  10.554 6.99e-09 ***
## Unemployment -0.2075192  0.3087244  -0.672    0.511    
## CGDP         -0.0005021  0.0036652  -0.137    0.893    
## CInflation    0.0071594  0.0053860   1.329    0.201    
## Reelect       0.0469419  0.0314174   1.494    0.153    
## Terms         0.0090136  0.0159466   0.565    0.579    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.002955564)
## 
##     Null deviance: 0.079143  on 22  degrees of freedom
## Residual deviance: 0.050245  on 17  degrees of freedom
## AIC: -61.635
## 
## Number of Fisher Scoring iterations: 2

anova(lm.fit2)

## Analysis of Deviance Table
## 
## Model: gaussian, link: identity
## 
## Response: Prctel
## 
## Terms added sequentially (first to last)
## 
## 
##              Df  Deviance Resid. Df Resid. Dev
## NULL                             22   0.079143
## Unemployment  1 0.0051655        21   0.073977
## CGDP          1 0.0016202        20   0.072357
## CInflation    1 0.0025827        19   0.069774
## Reelect       1 0.0185853        18   0.051189
## Terms         1 0.0009443        17   0.050245

### test GLM
hoslem.test(data1$Iwin, fitted(glm.fit1))

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  data1$Iwin, fitted(glm.fit1)
## X-squared = 9.861, df = 8, p-value = 0.2749

hoslem.test(data1$Iwin, fitted(glm.fit2))

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  data1$Iwin, fitted(glm.fit2)
## X-squared = 10.556, df = 8, p-value = 0.2282

Now let’s try to predict using one of these models and see what percentage of the 23 observations is predicted correctly:

pred <- predict(glm.fit2)
pred = ifelse(pred<0,0,1)
correct = sum(pred == data1$Iwin)/length(pred)
correct

## [1] 0.6956522

Appendices

variables <-read.csv("variable table.csv")
Mean = round(colMeans(data1),2)
Std.Dev = round(apply(data1,2,sd),2)
Min = round(apply(data1,2,min),2)
Max = round(apply(data1,2,max),2)
variables <-variables[-1,]
variables = cbind(variables, Mean, Std.Dev, Min, Max)
#install.packages("kableExtra")
library(kableExtra)
kable(variables, caption = "Appendix - Variable List - Descriptive Statistics - Years 1928 - 2016 (n = 23)") %>%
  kable_styling(full_width = F) %>%
  column_spec(1, bold = T, border_right = T) %>%
  column_spec(2, width = "10em", background = "yellow")

Appendix - Variable List - Descriptive Statistics - Years 1928 - 2016 (n = 23)
	Variable.Name	Description	Mean	Std.Dev	Min	Max
2	Year	The year in which the election was held	1972.00	27.13	1928.00	2016.00
3	Iwin	Indicator variable indicating whether the incumbent won (1) or not (0)	0.52	0.51	0.00	1.00
4	Party	Was the winning party Republican ( R ) or Democrat (D)	0.48	0.51	0.00	1.00
5	Unemployment	The average unemployment rate in the nation over the four (4) year period prior to the election	0.07	0.04	0.02	0.21
6	RealGDP	The average GDP growth in the nation over the four(4) period prior to the election	0.02	0.07	-0.17	0.14
7	CGDP	The GDP Growth in the one (1) year prior to the election.	3.29	3.88	-2.50	17.70
8	CInflation	The inflation rate (CPI) in the nation for the one (1) year prior to the election.	3.50	2.72	0.70	9.90
9	Inflation	The inflation rate (CPI) in the nation for the four (4) year period prior to the election.	0.04	0.03	0.00	0.10
10	Prctel	The percentage of the electoral votes won by the incumbent party in the election	0.52	0.06	0.41	0.62
11	Reelect	Indicator variable indicating whether or not the incumbent office holder was riunning for reelection (1) or not (0)	0.61	0.50	0.00	1.00
12	Terms	Number of terms that the incumbent office holder had been in office prior to the election.	1.83	1.07	1.00	5.00

Roth, Roy K. (2011) “Does the Economy Determine the President? A Regression Model For Predicting US Presidential Elections,” Undergraduate Economic Review: Vol. 8: Iss. 1, Article 11. Available at: http://digitalcommons.iwu.edu/uer/vol8/iss1/11.↩
Fair, R. (1978). The Effect of Economic Events on Votes for President. The Review of Economics and Statistics, 60(2), 159-173. doi:10.2307/1924969 ↩
Kahane L. It’s the economy, and then some: modeling the presidential vote with state panel data. Public Choice. 2009;139(3/4):343-356. doi:10.1007/s11127-009-9397-z.↩

Election Data

Molabou Ouroubou