Based on Ray Fair’s book, Chapter 2
Fair’s big idea is people hold the president (and the president’s party) responsible for the state of the economy and we can use that to predict presidential elections.
First let’s get the data. The data through 2008 is here. I’ve updated this to include data for the past two elections.
presdata <- read.csv('Data_Science_Data/election_forecasting/rayfair.csv', header=TRUE)
## YEAR VP I DPER DUR WAR G P Z
## 1 1916 51.682 1 1 0.00 0 2.229 4.252 3
## 2 1920 36.148 1 0 1.00 1 -11.463 0.000 0
## 3 1924 41.737 -1 -1 0.00 0 -3.872 5.161 10
## 4 1928 41.244 -1 0 -1.00 0 4.623 0.183 7
## 5 1932 59.149 -1 -1 -1.25 0 -14.586 7.164 4
## 6 1936 62.226 1 1 0.00 0 11.836 2.475 9
## YEAR VP I DPER DUR WAR G P Z
## 21 1996 54.737 1 1 0 0 3.258 2.027 4
## 22 2000 50.262 1 0 1 0 2.014 1.641 7
## 23 2004 48.767 -1 -1 0 0 1.989 2.245 1
## 24 2008 53.689 -1 0 -1 0 -2.260 3.052 1
## 25 2012 52.010 1 1 0 0 1.422 1.470 1
## 26 2016 51.100 1 0 1 0 1.970 1.370 3
VP:: Democratic share of the two-party presidential vote
I: 1 if there is a Democratic presidential incumbent at the time of the election and −1 if there is a Republican presidential incumbent
DPER: 1 if a Democratic presidential incumbent is running again, −1 if a Republican presidential incumbent is running again, and 0 otherwise.
DUR: 0 if either party has been in the White House for one term, 1 [−1] if the Democratic [Republican] party has been in the White House for two consecutive terms, 1.25 [−1.25] if the Democratic [Republican] party has been in the White House for three consecutive terms, 1.50 [−1.50] if the Democratic [Republican] party
WAR: 1 for the elections of 1918, 1920, 1942, 1944, 1946, and 1948, and 0 otherwise.
G: growth rate of real per capita GDP in the first three quarters of the on-term election year (annual rate).
P: absolute value of the growth rate of the GDP deflator in the first 15 quarters of the administration (annual rate) except for 1920, 1944, and 1948, where the values are zero.
Z: number of quarters in the first 15 quarters of the administration in which the growth rate of real per capita GDP is greater than 3.2 percent at an annual rate except for 1920, 1944, and 1948, where the values are zero.
Election Data
library(dplyr); library(ggplot2)
presdata <- presdata %>% mutate(IV = ifelse(I==1, VP, 100-VP))
tail(presdata)
## YEAR VP I DPER DUR WAR G P Z IV
## 21 1996 54.737 1 1 0 0 3.258 2.027 4 54.737
## 22 2000 50.262 1 0 1 0 2.014 1.641 7 50.262
## 23 2004 48.767 -1 -1 0 0 1.989 2.245 1 51.233
## 24 2008 53.689 -1 0 -1 0 -2.260 3.052 1 46.311
## 25 2012 52.010 1 1 0 0 1.422 1.470 1 52.010
## 26 2016 51.100 1 0 1 0 1.970 1.370 3 51.100
presdata %>% ggplot(aes(G, IV))+geom_point()+geom_smooth(method="lm", se=FALSE)
coef(lm(IV ~ G, data=presdata))
## (Intercept) G
## 50.7997416 0.8774338
\[ Incumbent\ Vote\ Share = 50.9 + 0.88*(GDP\ Growth) \]
Am I really convinced that higher GDP growth in the firstt 9 months of the election year helps the incumbent party?
We can absolutely use our judgement to help answer this question.
We can also use our new tool – Bootstrapping
We will sample 26 elections with replacement from the 26 actual elections many times and each time we will look at the relationship between GDP growth in the election year (G) and Incumbent Vote Share (IV).
Do we always see a positive relationship?
coef(lm(IV ~ G, data=presdata))
## (Intercept) G
## 50.7997416 0.8774338
sample(1:nrow(presdata), nrow(presdata), replace=TRUE)
## [1] 17 16 6 24 12 3 11 18 7 5 13 19 18 9 9 1 26 9 1 10 2 3 10
## [24] 17 25 18
sort(sample(1:nrow(presdata), nrow(presdata), replace=TRUE))
## [1] 2 2 3 4 5 6 6 7 7 8 11 12 13 13 13 15 17 17 20 20 21 22 22
## [24] 24 24 25
coef(lm(IV ~ G, data=presdata[sample(1:nrow(presdata), nrow(presdata), replace=TRUE),]))
## (Intercept) G
## 51.9366530 0.9764221
coef(lm(IV ~ G, data=presdata[sample(1:nrow(presdata), nrow(presdata), replace=TRUE),]))
## (Intercept) G
## 52.3957380 0.8813672
int_and_slope <- replicate(1e3,
coef(lm(IV ~ G, data=presdata[sample(1:nrow(presdata), nrow(presdata), replace=TRUE),]))) %>%
as.numeric() %>% matrix(ncol=2, nrow=1000, byrow=TRUE)
head(int_and_slope)
## [,1] [,2]
## [1,] 51.35870 0.5322817
## [2,] 51.11783 0.8686014
## [3,] 51.34801 0.6059568
## [4,] 50.48159 0.9098724
## [5,] 51.44142 0.7344160
## [6,] 52.32835 0.8131375
presdata %>% ggplot(aes(G, IV))+geom_point()+
geom_abline(intercept=int_and_slope[,1], slope=int_and_slope[,2], col="blue", lty=2, size=0.1)
The Model’s “Predictions” for Incumbent Vote Share (first four years):
growth_model <- lm(IV ~ G, data=presdata)
presdata$GDPgrowth_preds <- predict(growth_model, presdata)
presdata %>% ggplot(aes(GDPgrowth_preds, IV))+geom_point()+
geom_abline(intercept = 0, slope=1, lty=2, color="red")
The Residuals/“Errors” are the differences between the predictions and the actual results:
\[ residual = actual - predicted \]
presdata <- presdata %>% mutate(G_model_residuals = IV - GDPgrowth_preds)
presdata %>% ggplot(aes(GDPgrowth_preds, G_model_residuals)) + geom_point()
presdata %>% ggplot(aes(YEAR, G_model_residuals)) + geom_point() +
geom_abline(intercept=0, slope=0, color="red", lty=2)
presdata %>% ggplot(aes(G_model_residuals)) + geom_histogram()
presdata %>% ggplot(aes(G_model_residuals)) + geom_histogram() +facet_wrap(~I)
Can you predict these residuals using one of the other variables?
If so, you can build a model that predicts presidential elections based on GDP growth and another variable.
Challenge: Build a Better Model to Predict Presidential Elections using Ray Fair’s Variables.
(Potentially Helpful Code on the Next Slide)
new_growth_model <- lm(IV ~ G + Z + I(abs(DUR)), data=presdata)
coefs <- replicate(1e3,
coef(lm(IV ~ G + Z + I(abs(DUR)),
data=presdata[sample(1:nrow(presdata), nrow(presdata), replace=TRUE),]))) %>%
as.numeric() %>% matrix(ncol=4, nrow=1000, byrow=TRUE)
head(coefs)
## [,1] [,2] [,3] [,4]
## [1,] 52.12263 0.5691348 0.60893097 -5.107743
## [2,] 52.39962 1.2343053 0.04596531 -4.361890
## [3,] 52.98340 0.6723391 0.26004215 -4.909852
## [4,] 49.75329 0.5826170 0.93350760 -5.143337
## [5,] 52.55709 0.6393888 0.46053968 -3.735133
## [6,] 52.47557 0.7139497 0.24867062 -4.954677
hist(coefs[,3])
presdata$newpredictions <- predict(new_growth_model, presdata)