Introduction

This is a collection of notes surrounding the creation of a reasonably simple statistical model to forecast US presidential elections.

The classic paper in this regard is written by Ray Fair, and he helpfully makes all his data available on his website.

The principle of the paper to which this analysis will add (work with Leighton Vaughan Williams) is to forecast as much as possible given data availability. With Fair’s data, we can use real-time data to construct forecasts back to 1992, on the principle that population data, which is only available in real time back to 1996, is not as variable between different data releases as is GDP data.

Replicating Fair’s Model

The regression model is very simple; if \(VP\) is vote share, \(I\) is a dummy for whether the incumbent was Democrat (1) or Republican (-1), \(DPER\) is a dummy for whether the incumbent Democrat (1) or Republican (-1) is running for re-election, \(DUR\) is a dummy representing the length of terms by the formula \(I(0.25T-1)\) where \(T\) is number of terms, \(WAR\) is a dummy for the two World Wars, then \(G\) is real per capita GDP growth in the first three quarters of the election year, \(P\) is inflation (from the GDP deflator) for the first 15 quarters of the administration, and \(Z\) is the number of quarters in which the growth rate of real per capita GDP is greater than 3.2% (annualised rate): \[ VP_t = \beta_0 + \beta_1 I_t + \beta_2 DPER_t + \beta_3 DUR_t + \beta_4 WAR_t + \beta_5 G_t\times I_t + \beta_6 P_t\times I_t + \beta_7 Z_t\times I_t + e_t. \]

We can load this data, having processed it from Fair’s website, and stripping out the non-presidential year variables.

fair <- read.csv("fair.csv",stringsAsFactors=F)
fair <- fair[order(fair$t),]
#fair <- fair[is.na(fair$VP)==F,]
for(i in 1:ncol(fair)) {
  fair[,i] <- as.numeric(fair[,i])
}

## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion

summary(fair)

##        t              VP              VC             VCC       
##  Min.   :1876   Min.   :36.15   Min.   :37.96   Min.   :41.57  
##  1st Qu.:1910   1st Qu.:44.95   1st Qu.:48.63   1st Qu.:46.55  
##  Median :1943   Median :49.93   Median :51.38   Median :50.50  
##  Mean   :1943   Mean   :48.91   Mean   :50.93   Mean   :50.49  
##  3rd Qu.:1976   3rd Qu.:52.17   3rd Qu.:54.01   3rd Qu.:54.18  
##  Max.   :2010   Max.   :62.23   Max.   :58.48   Max.   :58.53  
##                 NA's   :34      NA's   :39      NA's   :39     
##        I                DPER               DUR               WAR         
##  Min.   :-1.0000   Min.   :-1.00000   Min.   :-2.0000   Min.   :0.00000  
##  1st Qu.:-1.0000   1st Qu.:-1.00000   1st Qu.:-1.0000   1st Qu.:0.00000  
##  Median :-1.0000   Median : 0.00000   Median : 0.0000   Median :0.00000  
##  Mean   :-0.1471   Mean   :-0.05882   Mean   :-0.2353   Mean   :0.08824  
##  3rd Qu.: 1.0000   3rd Qu.: 0.75000   3rd Qu.: 0.0000   3rd Qu.:0.00000  
##  Max.   : 1.0000   Max.   : 1.00000   Max.   : 1.7500   Max.   :1.00000  
##                    NA's   :34         NA's   :34                         
##        G                 GCC                P              PCC        
##  Min.   :-14.5860   Min.   :-11.349   Min.   :0.000   Min.   : 0.000  
##  1st Qu.: -2.0577   1st Qu.: -1.174   1st Qu.:1.442   1st Qu.: 1.026  
##  Median :  2.2195   Median :  2.590   Median :2.245   Median : 2.336  
##  Mean   :  0.7025   Mean   :  3.111   Mean   :2.667   Mean   : 2.755  
##  3rd Qu.:  4.0983   3rd Qu.:  5.814   3rd Qu.:3.310   3rd Qu.: 3.434  
##  Max.   : 11.8360   Max.   : 22.006   Max.   :7.864   Max.   :11.480  
##  NA's   :34         NA's   :34        NA's   :35      NA's   :34      
##        Z               ZCC        
##  Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 3.000   1st Qu.: 2.143  
##  Median : 5.000   Median : 4.286  
##  Mean   : 5.182   Mean   : 5.105  
##  3rd Qu.: 7.000   3rd Qu.: 8.571  
##  Max.   :10.000   Max.   :12.857  
##  NA's   :35       NA's   :34

Adding in 2012 data

We can add in the 2012 data, for which Fair wrote on October 26 2012: “the value of G is 1.03, the value of P is 1.57, and the value of Z is still 1”, and on 30 October 2014 “The Democratic share of the two-party presidential vote in 2012 was 52.0, and the Democratic share of the two-party House vote in 2012 was 50.68”.

fair <- rbind(fair,c(2012,52,50.68,NA,1,1,1,0,1.03,NA,1.57,NA,1,NA))

This information is essential for constructing the 2012 forecast, and also evaluating it.

Replicating Fair’s Regression Results

Running the regression model we replicate Fair’s 1996 paper output (Table 2, Eq.~1):

## 
## Call:
## lm(formula = VP ~ G:I + P:I + Z:I + DPER + DUR + I + WAR, data = fair[fair$t >= 
##     1916 & fair$t < 2012, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3393 -1.4517 -0.5878  1.5195  4.7853 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  47.3843     0.6109  77.563  < 2e-16 ***
## DPER          2.9244     1.3394   2.183 0.044249 *  
## DUR          -3.4085     1.1861  -2.874 0.011025 *  
## I            -1.9143     2.2634  -0.846 0.410155    
## WAR           5.0611     2.5402   1.992 0.063677 .  
## G:I           0.6724     0.1081   6.221 1.22e-05 ***
## I:P          -0.6540     0.2831  -2.310 0.034571 *  
## I:Z           0.9904     0.2301   4.305 0.000545 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.495 on 16 degrees of freedom
##   (24 observations deleted due to missingness)
## Multiple R-squared:  0.9116, Adjusted R-squared:  0.8729 
## F-statistic: 23.56 on 7 and 16 DF,  p-value: 2.708e-07

## 
## Call:
## lm(formula = VC ~ G:I + P:I + Z:I + DPER + I + WAR + VCC.2, data = fair[fair$t >= 
##     1916 & fair$t < 2012, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5297 -0.5715 -0.0639  1.0663  3.2607 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  49.5688     0.5644  87.823  < 2e-16 ***
## DPER          2.3533     1.0293   2.286 0.036201 *  
## I            -3.8567     1.6296  -2.367 0.030901 *  
## WAR           3.4032     2.2785   1.494 0.154724    
## VCC.2         0.6440     0.1321   4.874 0.000169 ***
## G:I           0.4130     0.1016   4.065 0.000901 ***
## I:P          -0.2972     0.2593  -1.146 0.268570    
## I:Z           0.5264     0.2166   2.431 0.027209 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.23 on 16 degrees of freedom
##   (24 observations deleted due to missingness)
## Multiple R-squared:  0.8559, Adjusted R-squared:  0.7929 
## F-statistic: 13.58 on 7 and 16 DF,  p-value: 1.17e-05

## 
## Call:
## lm(formula = VC ~ index.VC + DPER + I + WAR + VCC.2, data = fair[fair$t >= 
##     1916 & fair$t < 2012, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3552 -0.5043  0.0844  0.9929  3.3474 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 49.56138    0.53294  92.996  < 2e-16 ***
## index.VC     0.57049    0.08583   6.647 3.08e-06 ***
## DPER         2.47907    0.91695   2.704  0.01454 *  
## I           -3.85160    0.84359  -4.566  0.00024 ***
## WAR          3.20811    1.66481   1.927  0.06991 .  
## VCC.2        0.62440    0.11275   5.538 2.94e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.112 on 18 degrees of freedom
##   (24 observations deleted due to missingness)
## Multiple R-squared:  0.8547, Adjusted R-squared:  0.8144 
## F-statistic: 21.18 on 5 and 18 DF,  p-value: 5.767e-07

## 
## Call:
## lm(formula = VCC ~ PCC:I + ZCC:I + I + WAR + VC.2 + VP.2, data = fair[fair$t >= 
##     1916 & fair$t < 2012, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6406 -1.4618 -0.0689  1.4299  5.5703 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  48.5108     0.7251  66.904  < 2e-16 ***
## I            -2.9797     1.1979  -2.487  0.02355 *  
## WAR           0.8357     2.1283   0.393  0.69947    
## VC.2          0.6463     0.1973   3.275  0.00447 ** 
## VP.2         -0.3225     0.1535  -2.101  0.05086 .  
## PCC:I        -0.4938     0.2246  -2.198  0.04206 *  
## I:ZCC         0.6905     0.2780   2.484  0.02369 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.494 on 17 degrees of freedom
##   (24 observations deleted due to missingness)
## Multiple R-squared:  0.7785, Adjusted R-squared:  0.7003 
## F-statistic: 9.957 on 6 and 17 DF,  p-value: 8.757e-05

## 
## Call:
## lm(formula = VCC ~ index.VCC + I + WAR + VC.2 + VP.2, data = fair[fair$t >= 
##     1916 & fair$t < 2012, ])
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.718 -1.503 -0.117  1.560  5.537 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  48.5261     0.6994  69.385  < 2e-16 ***
## index.VCC     0.7171     0.2471   2.902  0.00950 ** 
## I            -3.0877     0.9736  -3.171  0.00528 ** 
## WAR           0.9663     1.9199   0.503  0.62085    
## VC.2          0.6353     0.1806   3.518  0.00246 ** 
## VP.2         -0.3297     0.1432  -2.302  0.03347 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.426 on 18 degrees of freedom
##   (24 observations deleted due to missingness)
## Multiple R-squared:  0.7781, Adjusted R-squared:  0.7165 
## F-statistic: 12.63 on 5 and 18 DF,  p-value: 2.302e-05

Real-time data

In order to effectively evaluate the quality of forecasts, we need to consider real-time data rather than ex post data. Ex post data is what is contained in the `r fair’ dataset, as it is data as we know it to be now. However, the forecast model was constructed based on data available pre-election in each year.

From the quartery data file also on Fair’s website, we can update the quarterly data, but more importantly we can use real time data. A difficulty is found in the real GDP data, for which the base year of comparing prices is updated at regular intervals. Nonetheless, this is simply a scaling, and as such has an entirely predictable scaling effect on regression estimates.

Using data from Alfred, we have real-time updates for nominal and real GDP back to 1991, and population back to 1996.

Our first step is to replicate the Fair \(G\), \(P\), and \(Z\) variables:

fairq <- read.csv("tbl2.csv",stringsAsFactors=F)
fairq$def <- 100*fairq$Nominal.GDP/fairq$Real.GDP
fairq$def_15 <- c(rep(NA,15),fairq$def[-seq(NROW(fairq)-14,NROW(fairq))])
fairq$P <- ((fairq$def/fairq$def_15)^(4/15)-1)*100
fairq$Real.GDP.pc <- fairq$Real.GDP/fairq$Population
fairq$Real.GDP.pc_1 <- c(NA,fairq$Real.GDP.pc[-NROW(fairq)])
fairq$G0 <- ((fairq$Real.GDP.pc/fairq$Real.GDP.pc_1)^4-1)*100
fairq$Z0 <- as.numeric(fairq$G0>3.2)
fairq$term2 <- floor(fairq$quarter) - as.numeric(floor(fairq$quarter) %% 2 != 0)
fairq$term4 <- floor(fairq$quarter) - as.numeric(floor(fairq$quarter) %% 4)
fairq$Z <- ave(fairq$Z0,by=fairq$term4,FUN=cumsum) 
fairq$Zcc <- (15/7)*ave(fairq$Z0,by=fairq$term2,FUN=cumsum) 
fairq$Real.GDP.pc_3 <- c(rep(NA,3),fairq$Real.GDP.pc[-seq(NROW(fairq)-2,NROW(fairq))])
fairq$G <- ((fairq$Real.GDP.pc/fairq$Real.GDP.pc_3)^(4/3)-1)*100

#ex post forecast values
fairq.x <- fairq[floor(fairq$quarter) %% 2 == 0 & fairq$quarter-floor(fairq$quarter)<0.4 & fairq$quarter-floor(fairq$quarter)>0.2,c("quarter","G","P","Z","Zcc")]

So the object fairq.x contains our recreated Fair values. While Fair is able to, in his papers, refer back to his own previous forecasts, we seek to use forecast we are able to create ourselves. To do this, we create a dataset for each election and construct a forecast for each election.

#real time forecast values
gdp <- read.table("GDP_2_Vintages_Starting_1991_12_04.txt",sep="\t", header=TRUE,stringsAsFactors=F)
gdp <- gdp[,c("observation_date","GDP_19921027","GDP_19941028","GDP_19961030","GDP_19981030",
              "GDP_20001027","GDP_20021031","GDP_20041029","GDP_20061027","GDP_20081030",
              "GDP_20101029","GDP_20121026")]
gdp$quarter <- seq(from=1946,to=2014.5,by=0.25)
gdpr <- read.table("GDPC1_2_Vintages_Starting_1991_12_04.txt",sep="\t", header=TRUE,stringsAsFactors=F)
gdpr <- gdpr[,c("observation_date","GDPC1_19921027","GDPC1_19941028","GDPC1_19961030","GDPC1_19981030",
              "GDPC1_20001027","GDPC1_20021031","GDPC1_20041029","GDPC1_20061027","GDPC1_20081030",
              "GDPC1_20101029","GDPC1_20121026")]
gdpr$quarter <- seq(from=1947,to=2014.5,by=0.25)
pop <- read.table("POP_2_Vintages_Starting_1996_12_06.txt",sep="\t", header=TRUE,stringsAsFactors=F)
pop <- pop[,c("observation_date","POP_19961206","POP_19981002","POP_20001027","POP_20021002","POP_20041029","POP_20061002",
              "POP_20081031","POP_20101008","POP_20121024")]
pop <- pop[grep("-01-01|-04-01|-07-01|-10-01",pop$observation_date),] #get quarterly observations
pop <- pop[-NROW(pop),]
pop$quarter <- seq(from=1952,to=2014.5,by=0.25)
fairq$quarter <- floor(fairq$quarter) + 2.5*(fairq$quarter-floor(fairq$quarter))

It is important to consider the correspondence between the data we are introducing and the data Fair provides. Without looking further at all of Fair’s older papers [TO DO!], I cannot know whether the difference between the real-time data as provided by Fred and Fair’s data is because Fair’s data is revised as provided on his website (although to emphasise, when he does a historical forecast exercise in his paper he uses real-time data).

plot(pop$quarter,pop$POP_20121024,type="l",main="Population",ylab="Population (thousands)",xlab="Year")
lines(fairq$quarter,1000*fairq$Population,type="l",col=2)
legend("topleft",lty=1,col=c(1,2),legend=c("Real time","Fair's ex post"))

plot(gdp$quarter,gdp$GDP_19961030,type="l",
     main="Real GDP",ylab="Real GDP ($bn)",xlab="Year")

## Warning in xy.coords(x, y, xlabel, ylabel, log): NAs introduced by
## coercion

lines(fairq$quarter,fairq$Nominal.GDP,type="l",col=2)
legend("topleft",lty=1,col=c(1,2),legend=c("Real time","Fair's ex post"))

rgdp.scale <- as.numeric(gdpr$GDPC1_19961030[gdpr$quarter==1959.5])/fairq$Real.GDP[fairq$quarter==1959.5]
rgdp.scale2012 <- as.numeric(gdpr$GDPC1_20121026[gdpr$quarter==1959.5])/fairq$Real.GDP[fairq$quarter==1959.5]
plot(gdpr$quarter,gdpr$GDPC1_19961030,type="l",ylim=range(c(as.numeric(gdpr$GDPC1_19961030),fairq$Real.GDP),na.rm=T),
     main="Real GDP",ylab="Real GDP (chained linked $bn)",xlab="Year")

## Warning in xy.coords(x, y, xlabel, ylabel, log): NAs introduced by
## coercion

## Warning in plot.default(gdpr$quarter, gdpr$GDPC1_19961030, type = "l",
## ylim = range(c(as.numeric(gdpr$GDPC1_19961030), : NAs introduced by
## coercion

lines(gdpr$quarter,gdpr$GDPC1_20121026,type="l",lty=2)
lines(fairq$quarter,fairq$Real.GDP,type="l",col=2)
lines(fairq$quarter,rgdp.scale*fairq$Real.GDP,type="l",col=2,lty=2)
legend("topleft",lty=c(1,2,1,2),col=c(1,1,2,2),legend=c("Real time","Real time (2012)","Fair's ex post","Fair's ex post scaled to real time"),bty="n")

The additional lines on the real GDP plot suggest that it is the ex post data that drives the differences.

In order to create real time datasets, we scale down the pre-Alfred real GDP data from Fair to be inline with the Alred real-time data release on account of the fact that the clear difference between the series is driven by the different base years (see the Alfred page for real GDP contains more information on the six different bases used between 1996 and 2014).

The first step is to calculate \(G\), \(P\) and \(Z\) from the real-time data. All three centre on calculating real GDP growth per capita, and the GDP deflator, and additionally calculating annualised growth. The latter is calculated following the formula: \[ r = \left[\left(\frac{Y_t}{Y_0}\right)^{m/n}-1\right]\times100, \] where \(r\) is the annualised growth rate, \(Y_t\) the most recent observation of the data series in levels, \(Y_0\) the value of the data series in levels in the base year, \(m\) is the frequency of data, and \(n\) the number of periods between \(t\) and the base period, so \(t-0\). More details can be found on the BEA website

fair1996 <- data.frame("observation_date"=seq(from=as.Date("1946-01-01"),to=as.Date("2014-07-01"),by="quarter"),"pop"=c(rep(NA,24),pop$POP_19961206),"gdp"=as.numeric(gdp$GDP_19961030),"gdpr"=c(rep(NA,4),as.numeric(gdpr$GDPC1_19961030)), stringsAsFactors=F)

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

##useful date variables
fair1996$quarter <- seq(from=1946,to=2014.5,by=0.25)
fair1996$term2.0 <- floor(fair1996$quarter) - as.numeric(floor(fair1996$quarter) %% 2 != 0)
fair1996$term2 <- c(rep(NA,4),fair1996$term2.0[-seq(NROW(fair1996)-3,NROW(fair1996))])
fair1996$term4.0 <- floor(fair1996$quarter) - as.numeric(floor(fair1996$quarter) %% 4)
fair1996$term4 <- c(rep(NA,4),fair1996$term4.0[-seq(NROW(fair1996)-3,NROW(fair1996))])
fair1996$term4.0 <- NULL
fair1996$term2.0 <- NULL
###GDP deflator for P
fair1996$def <- 100*fair1996$gdp/fair1996$gdpr
fair1996$def_15 <- c(rep(NA,15),fair1996$def[-seq(NROW(fair1996)-14,NROW(fair1996))])
fair1996$P <- ((fair1996$def/fair1996$def_15)^(4/15)-1)*100
###real GDP per capita for G and Z
fair1996$gdprpc <- fair1996$gdpr/fair1996$pop
fair1996$gdprpc_1 <- c(NA,fair1996$gdprpc[-NROW(fair1996)])
fair1996$G0 <- ((fair1996$gdprpc/fair1996$gdprpc_1)^4-1)*100
fair1996$Z0 <- as.numeric(fair1996$G0>3.2)
fair1996$Z <- ave(fair1996$Z0,by=fair1996$term4,FUN=cumsum) 
fair1996$ZCC <- (15/7)*ave(fair1996$Z0,by=fair1996$term2,FUN=cumsum) 
fair1996$gdprpc_3 <- c(rep(NA,3),fair1996$gdprpc[-seq(NROW(fair1996)-2,NROW(fair1996))])
fair1996$G <- ((fair1996$gdprpc/fair1996$gdprpc_3)^(4/3)-1)*100

We next must condense this data down to just the information needed to generate forecasts:

fair1996.x <- fair1996[floor(fair1996$quarter) %% 2 == 0 & fair1996$quarter-floor(fair1996$quarter)<0.6 & fair1996$quarter-floor(fair1996$quarter)>0.4,c("observation_date","quarter","G","P","Z","ZCC")]
fair1996.x$t <- floor(fair1996.x$quarter)
fair1996.x$quarter <-NULL
##add the off-term values of Z into Z for condensing
fair1996.x$Z[fair1996.x$t %% 2==0 & fair1996.x$t %% 4 == 2] <- fair1996.x$ZCC[fair1996.x$t %% 2==0 & fair1996.x$t %% 4 == 2]
fair1996.x$ZCC <- NULL

We now augment our \(G\), \(P\) and \(Z\) series to the older data from Fair:

fair1996.full <- merge(fair[fair$t<1997,],fair1996.x[,c("t","G","P","Z")],by=c("t"),suffixes=c(".ex",".rt"))
fair1996.full[,c("t","G.ex","GCC","G.rt","P.ex","PCC","P.rt","Z.ex","Z.rt")]

##       t   G.ex    GCC       G.rt  P.ex   PCC     P.rt Z.ex      Z.rt
## 1  1946     NA -3.474         NA    NA 0.000       NA   NA        NA
## 2  1948  3.638     NA         NA 0.000    NA       NA    0        NA
## 3  1950     NA 13.635         NA    NA 0.052       NA   NA        NA
## 4  1952  0.726     NA         NA 2.349    NA       NA    7        NA
## 5  1954     NA -0.685         NA    NA 0.761       NA   NA        NA
## 6  1956 -1.451     NA         NA 1.897    NA       NA    5        NA
## 7  1958     NA -1.337         NA    NA 2.709       NA   NA        NA
## 8  1960  0.455     NA  0.8840938 1.938    NA       NA    5        NA
## 9  1962     NA  3.669  3.3301345    NA 1.185       NA   NA  8.571429
## 10 1964  5.087     NA  5.0462592 1.269    NA 1.188754   10 10.000000
## 11 1966     NA  3.525  3.5229642    NA 2.579 2.024482   NA 10.714286
## 12 1968  5.049     NA  4.9267934 3.130    NA 3.183645    7  7.000000
## 13 1970     NA  0.018 -0.0719502    NA 5.034 4.411258   NA  2.142857
## 14 1972  5.949     NA  6.2056874 4.795    NA 4.799025    4  4.000000
## 15 1974     NA -3.013 -2.7951648    NA 8.195 6.157096   NA  4.285714
## 16 1976  3.806     NA  3.9525173 7.632    NA 7.539401    5  5.000000
## 17 1978     NA  6.035  5.4300089    NA 6.728 6.755313   NA  8.571429
## 18 1980 -3.659     NA -3.7077064 7.864    NA 8.113971    5  5.000000
## 19 1982     NA -2.877 -3.2232713    NA 7.041 8.211463   NA  4.285714
## 20 1984  5.424     NA  5.2088370 5.249    NA 5.425439    8  8.000000
## 21 1986     NA  2.241  1.5728947    NA 2.507 3.398495   NA  6.428571
## 22 1988  2.210     NA  2.0768833 2.955    NA 3.292633    4  6.000000
## 23 1990     NA  0.729  0.1490085    NA 3.897 3.925883   NA  0.000000
## 24 1992  2.949     NA  2.3586669 3.310    NA 3.683054    2  1.000000
## 25 1994     NA  2.819  2.6944740    NA 2.164 2.745830   NA  4.285714
## 26 1996  3.258     NA  2.0429080 2.027    NA 2.310833    4  3.000000

##need to put old values into real time series from before Fred data sample starts
fair1996.full$G.rt[is.na(fair1996.full$G.rt)] <- fair1996.full$G.ex[is.na(fair1996.full$G.rt)]
fair1996.full$P.rt[is.na(fair1996.full$P.rt)] <- fair1996.full$P.ex[is.na(fair1996.full$P.rt)]
fair1996.full$Z.rt[is.na(fair1996.full$Z.rt)] <- fair1996.full$Z.ex[is.na(fair1996.full$Z.rt)]
##need to create the interaction terms in order to calculate the forecast
fair1996.full$GI.ex <- fair1996.full$G.ex*fair1996.full$I
fair1996.full$PI.ex <- fair1996.full$P.ex*fair1996.full$I
fair1996.full$ZI.ex <- fair1996.full$Z.ex*fair1996.full$I
fair1996.full$GI.rt <- fair1996.full$G.rt*fair1996.full$I
fair1996.full$PI.rt <- fair1996.full$P.rt*fair1996.full$I
fair1996.full$ZI.rt <- fair1996.full$Z.rt*fair1996.full$I
##1996 regression on ex post data
reg.1996.ex <- lm(VP ~ G.ex:I + P.ex:I + Z.ex:I + DPER + DUR + I + WAR,data=fair1996.full[fair1996.full$t<1996,])
forc.1996.ex <- reg.1996.ex$coefficients %*% t(cbind(1,fair1996.full[fair1996.full$t==1996,c("GI.ex","PI.ex","ZI.ex","DPER","DUR","I","WAR")]))
summary(reg.1996.ex)

## 
## Call:
## lm(formula = VP ~ G.ex:I + P.ex:I + Z.ex:I + DPER + DUR + I + 
##     WAR, data = fair1996.full[fair1996.full$t < 1996, ])
## 
## Residuals:
##          2          4          6          8         10         12 
##  2.220e-16  2.308e+00 -2.581e-01  2.002e+00  3.142e-01 -2.249e+00 
##         14         16         18         20         22         24 
## -1.919e+00 -2.209e-01 -3.730e-01  7.336e-01 -1.723e+00  1.385e+00 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  48.3128     1.1588  41.694 1.98e-06 ***
## DPER         -2.0117     2.5391  -0.792   0.4725    
## DUR          -8.5242     2.3526  -3.623   0.0223 *  
## I             9.2768     6.2620   1.481   0.2126    
## WAR           6.4054     5.3021   1.208   0.2936    
## G.ex:I        0.8582     0.2950   2.909   0.0437 *  
## I:P.ex       -1.0645     0.3846  -2.767   0.0505 .  
## I:Z.ex        0.2296     0.6243   0.368   0.7317    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.442 on 4 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.9454, Adjusted R-squared:  0.8499 
## F-statistic: 9.896 on 7 and 4 DF,  p-value: 0.0214

reg.1996.rt <- lm(VP ~ G.rt:I + P.rt:I + Z.rt:I + DPER + DUR + I + WAR,data=fair1996.full[fair1996.full$t<1996,])
forc.1996.rt <- reg.1996.rt$coefficients %*% t(cbind(1,fair1996.full[fair1996.full$t==1996,c("GI.rt","PI.rt","ZI.rt","DPER","DUR","I","WAR")]))
summary(reg.1996.rt)

## 
## Call:
## lm(formula = VP ~ G.rt:I + P.rt:I + Z.rt:I + DPER + DUR + I + 
##     WAR, data = fair1996.full[fair1996.full$t < 1996, ])
## 
## Residuals:
##          2          4          6          8         10         12 
##  2.220e-16  1.636e+00 -3.136e-01  1.920e+00 -3.273e-02 -1.453e+00 
##         14         16         18         20         22         24 
## -2.118e+00  5.380e-02 -1.499e-01  1.267e+00 -1.791e+00  9.817e-01 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  47.9557     0.9756  49.154 1.02e-06 ***
## DPER         -0.1592     2.5829  -0.062   0.9538    
## DUR          -6.9481     2.3154  -3.001   0.0399 *  
## I             5.2177     6.1496   0.848   0.4440    
## WAR           6.7971     4.3285   1.570   0.1914    
## G.rt:I        0.8054     0.2554   3.154   0.0344 *  
## I:P.rt       -0.9462     0.3462  -2.733   0.0523 .  
## I:Z.rt        0.5282     0.5423   0.974   0.3852    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.172 on 4 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.9568, Adjusted R-squared:  0.8812 
## F-statistic: 12.66 on 7 and 4 DF,  p-value: 0.01365

The 1996 real-time forecast is 53.078315, ex post is 66.9281945. The former is close to the 53.2 Fair reports in his 2010 paper as having been his 1996 forecast in real terms. The actual value was 54.737, giving a forecast error of 1.658685.

We now nest this process in a loop to create forecasts from 1996 to 2012:

forecasts.rt <- data.frame("t"=seq(from=1996,to=2012,by = 2),"VP"=NA,"VC"=NA,"VCC"=NA)
forecasts.ex <- data.frame("t"=seq(from=1996,to=2012,by = 2),"VP"=NA,"VC"=NA,"VCC"=NA)
for (y in 1996:2012){
  if (y %% 2 == 0) {
    print(y)
    #dataset of Fred real time variables
    fair.y <- data.frame("observation_date"=seq(from=as.Date("1946-01-01"),to=as.Date("2014-07-01"),by="quarter"),
                         "pop"=c(rep(NA,24),pop[,grep(y,colnames(pop))]),"gdp"=as.numeric(gdp[,grep(y,colnames(gdp))]),
                         "gdpr"=c(rep(NA,4),as.numeric(gdpr[,grep(y,colnames(gdpr))])), stringsAsFactors=F)
    fair.y$quarter <- seq(from=1946,to=2014.5,by=0.25)
    fair.y$term2.0 <- floor(fair.y$quarter) - as.numeric(floor(fair.y$quarter) %% 2 != 0)
    fair.y$term2 <- c(rep(NA,4),fair.y$term2.0[-seq(NROW(fair.y)-3,NROW(fair.y))])
    fair.y$term4.0 <- floor(fair.y$quarter) - as.numeric(floor(fair.y$quarter) %% 4)
    fair.y$term4 <- c(rep(NA,4),fair.y$term4.0[-seq(NROW(fair.y)-3,NROW(fair.y))])
    fair.y$term4.0 <- NULL
    fair.y$term2.0 <- NULL
    fair.y$def <- 100*fair.y$gdp/fair.y$gdpr
    fair.y$def_15 <- c(rep(NA,15),fair.y$def[-seq(NROW(fair.y)-14,NROW(fair.y))])
    fair.y$P <- ((fair.y$def/fair.y$def_15)^(4/15)-1)*100
    fair.y$gdprpc <- fair.y$gdpr/fair.y$pop
    fair.y$gdprpc_1 <- c(NA,fair.y$gdprpc[-NROW(fair.y)])
    fair.y$G0 <- ((fair.y$gdprpc/fair.y$gdprpc_1)^4-1)*100
    fair.y$Z0 <- as.numeric(fair.y$G0>3.2)
    fair.y$Z <- ave(fair.y$Z0,by=fair.y$term4,FUN=cumsum) 
    fair.y$ZCC <- (15/7)*ave(fair.y$Z0,by=fair.y$term2,FUN=cumsum) 
    fair.y$gdprpc_3 <- c(rep(NA,3),fair.y$gdprpc[-seq(NROW(fair.y)-2,NROW(fair.y))])
    fair.y$G <- ((fair.y$gdprpc/fair.y$gdprpc_3)^(4/3)-1)*100

    ##condensing to just election years
    fair.y.x <- fair.y[floor(fair.y$quarter) %% 2 == 0 & fair.y$quarter-floor(fair.y$quarter)<0.6 
                           & fair.y$quarter-floor(fair.y$quarter)>0.4,
                           c("observation_date","quarter","G","P","Z","ZCC")]
    fair.y.x$t <- floor(fair.y.x$quarter)
    fair.y.x$quarter <-NULL
    
    ##make the non-CC and CC variables separately
    fair.y.x$ZCC[fair.y.x$t %% 4 == 0] <- NA
    fair.y.x$Z[fair.y.x$t %% 4 != 0] <- NA
    fair.y.x$PCC[fair.y.x$t %% 4 != 0] <- fair.y.x$P[fair.y.x$t %% 4 != 0]
    fair.y.x$P[fair.y.x$t %% 4 != 0] <- NA
    fair.y.x$GCC[fair.y.x$t %% 4 != 0] <- fair.y.x$G[fair.y.x$t %% 4 != 0]
    fair.y.x$G[fair.y.x$t %% 4 != 0] <- NA

    y.data <- merge(fair[fair$t<=y,],fair.y.x[,c("t","G","P","Z","GCC","PCC","ZCC")],
                    by=c("t"),suffixes=c(".ex",".rt"),all.x=T)
#    y.data[,c("t","G.ex","GCC","G.rt","P.ex","PCC","P.rt","Z.ex","Z.rt")]
    
    ##need to put old values into real time series from before Fred data sample starts
    y.data$G.rt[is.na(y.data$G.rt)] <- y.data$G.ex[is.na(y.data$G.rt)]
    y.data$P.rt[is.na(y.data$P.rt)] <- y.data$P.ex[is.na(y.data$P.rt)]
    y.data$Z.rt[is.na(y.data$Z.rt)] <- y.data$Z.ex[is.na(y.data$Z.rt)]
    y.data$GCC.rt[is.na(y.data$GCC.rt)] <- y.data$GCC.ex[is.na(y.data$GCC.rt)]
    y.data$PCC.rt[is.na(y.data$PCC.rt)] <- y.data$PCC.ex[is.na(y.data$PCC.rt)]
    y.data$ZCC.rt[is.na(y.data$ZCC.rt)] <- y.data$ZCC.ex[is.na(y.data$ZCC.rt)]

    ##create interaction variables in real time and ex post for forecast construction
    y.data$GI.ex <- y.data$G.ex*y.data$I
    y.data$PI.ex <- y.data$P.ex*y.data$I
    y.data$ZI.ex <- y.data$Z.ex*y.data$I
    y.data$GI.rt <- y.data$G.rt*y.data$I
    y.data$PI.rt <- y.data$P.rt*y.data$I
    y.data$ZI.rt <- y.data$Z.rt*y.data$I
    y.data$index.VC.rt <- 0.672*y.data$G.rt*y.data$I - 0.654*y.data$P.rt*y.data$I + 0.990*y.data$Z.rt*y.data$I
    y.data$index.VC.ex <- 0.672*y.data$G.ex*y.data$I - 0.654*y.data$P.ex*y.data$I + 0.990*y.data$Z.ex*y.data$I
    y.data$index.VCC.rt <- -0.654*y.data$PCC.rt*y.data$I + 0.990*y.data$ZCC.rt*y.data$I
    y.data$index.VCC.ex <- -0.654*y.data$PCC.ex*y.data$I + 0.990*y.data$ZCC.ex*y.data$I

    ##1996 regression on ex post data
    reg.1.ex <- lm(VP ~ G.ex:I + P.ex:I + Z.ex:I + DPER + DUR + I + WAR,data=y.data[y.data$t<y,])
    coefs <- as.numeric(reg.1.ex$coefficients[c("(Intercept)","G.ex:I","I:P.ex","I:Z.ex","DPER","DUR","I","WAR")])
    data <- as.numeric(cbind(1,y.data[y.data$t==y,c("GI.ex","PI.ex","ZI.ex","DPER","DUR","I","WAR")]))
    forecasts.ex[forecasts.ex$t==y,"VP"] <- t(data) %*% coefs
    reg.1.rt <- lm(VP ~ G.rt:I + P.rt:I + Z.rt:I + DPER + DUR + I + WAR,data=y.data[y.data$t<y,])
    coefs <- as.numeric(reg.1.rt$coefficients[c("(Intercept)","G.rt:I","I:P.rt","I:Z.rt","DPER","DUR","I","WAR")])
    data <- as.numeric(cbind(1,y.data[y.data$t==y,c("GI.rt","PI.rt","ZI.rt","DPER","DUR","I","WAR")]))
    forecasts.rt[forecasts.rt$t==y,"VP"] <- t(data) %*% coefs

    reg.2a.rt <- lm(VC ~ index.VC.rt + DPER + I + WAR + VCC.2,data=y.data)
    coefs <- as.numeric(reg.2a.rt$coefficients[c("(Intercept)","index.VC.rt","DPER","I","WAR","VCC.2")])
    data <- as.numeric(cbind(1,y.data[y.data$t==y,c("index.VC.rt","DPER","I","WAR","VCC.2")]))
    forecasts.rt[forecasts.rt$t==y,"VC"] <- t(data) %*% coefs
    
    reg.2a.ex <- lm(VC ~ index.VC.ex + DPER + I + WAR + VCC.2,data=y.data)
    coefs <- as.numeric(reg.2a.ex$coefficients[c("(Intercept)","index.VC.ex","DPER","I","WAR","VCC.2")])
    data <- as.numeric(cbind(1,y.data[y.data$t==y,c("index.VC.ex","DPER","I","WAR","VCC.2")]))
    forecasts.ex[forecasts.ex$t==y,"VC"] <- t(data) %*% coefs

    reg.3a.rt <- lm(VCC ~ index.VCC.rt + I + WAR + VC.2 + VP.2,data=y.data)
    coefs <- as.numeric(reg.3a.rt$coefficients[c("(Intercept)","index.VCC.rt","I","WAR","VC.2","VP.2")])
    data <- as.numeric(cbind(1,y.data[y.data$t==y,c("index.VCC.rt","I","WAR","VC.2","VP.2")]))
    forecasts.rt[forecasts.rt$t==y,"VCC"] <- t(data) %*% coefs
    
    reg.3a.ex <- lm(VCC ~ index.VCC.ex + I + WAR + VC.2 + VP.2,data=y.data)
    coefs <- as.numeric(reg.3a.ex$coefficients[c("(Intercept)","index.VCC.ex","I","WAR","VC.2","VP.2")])
    data <- as.numeric(cbind(1,y.data[y.data$t==y,c("index.VCC.ex","I","WAR","VC.2","VP.2")]))
    forecasts.ex[forecasts.ex$t==y,"VCC"] <- t(data) %*% coefs
  }
}

## [1] 1996

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## [1] 1998
## [1] 2000

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## [1] 2002

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## [1] 2004

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## [1] 2006

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## [1] 2008

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## [1] 2010

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## [1] 2012

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

forecasts.rt

##      t       VP       VC      VCC
## 1 1996 52.80855 47.91334       NA
## 2 1998       NA       NA 47.66593
## 3 2000 52.20871 50.68758       NA
## 4 2002       NA       NA 51.60220
## 5 2004 43.20585 48.08633       NA
## 6 2006       NA       NA 50.81040
## 7 2008 50.79055 55.41992       NA
## 8 2010       NA       NA 49.47886
## 9 2012 48.53606 47.34091       NA

The final table is the real-time forecasts created. Compared to Fair’s Table 5 there are differences. Next we show the forecast errors:

## [1] "Errors"

##           VP         VC       VCC
## 31  1.928452  2.2446568        NA
## 65        NA         NA  1.867067
## 32 -1.946708 -0.8685828        NA
## 66        NA         NA -4.040199
## 33  5.561152  0.5456739        NA
## 67        NA         NA  3.309602
## 34  2.898450  0.1150787        NA
## 68        NA         NA -3.578860
## 69  3.463936  3.3390900        NA

## [1] "Mean Errors"

##         VP         VC        VCC 
##  2.3810564  1.0751833 -0.6105977

## [1] "Mean Squared Errors"

##        VP        VC       VCC 
## 11.766974  3.450689 10.892715

## [1] "Mean Absolute Errors"

##       VP       VC      VCC 
## 3.159739 1.422616 3.198932

Improving the model

Testing the implied restrictions

The dummy variables \(I\), \(DPER\) and \(DUR\) are somewhat restrictive in their format, likely reflecting a preference for smaller econometric models over larger ones. An important step towards such a restrictive model specification is to test those restrictions. We can test them by estimating the unrestricted model and calculating a likelihood ratio test.

We first need to split up the variables. Because \(I\) is only ever 1 or -1 (it is never zero), it is simply a scaling of a standard 0/1 dummy variable and so no restrictions are involved. The \(DPER\) variable, however, can take the value zero. As such, we split this into \(DIR\) and \(RIR\) for Democrat and Republican incumbents standing for re-election, respectively. We then split \(DUR\) into individual dummy variables for each type of term length:

fair$DIR <- as.numeric(fair$DPER==1)
fair$RIR <- as.numeric(fair$DPER==-1)
fair$"1Term" <- as.numeric(fair$DUR==0)
fair$D2Terms <- as.numeric(fair$DUR==1)
fair$R2Terms <- as.numeric(fair$DUR==-1)
fair$D3Terms <- as.numeric(fair$DUR==1.25)
fair$R3Terms <- as.numeric(fair$DUR==-1.25)
fair$D4Terms <- as.numeric(fair$DUR==1.5)
fair$R4Terms <- as.numeric(fair$DUR==-1.5)
fair$D5Terms <- as.numeric(fair$DUR==1.75)
fair$R5Terms <- as.numeric(fair$DUR==-1.75)
fair$R6Terms <- as.numeric(fair$DUR==-2)

We now run the unrestricted version and test the restrictions:

reg.1.u <- lm(VP ~ G:I + P:I + Z:I + DIR + RIR + D2Terms + R2Terms + D3Terms + R3Terms + D4Terms + D5Terms + WAR,data=fair[fair$t>=1916 & fair$t<2012,])
summary(reg.1.u)

## 
## Call:
## lm(formula = VP ~ G:I + P:I + Z:I + DIR + RIR + D2Terms + R2Terms + 
##     D3Terms + R3Terms + D4Terms + D5Terms + WAR, data = fair[fair$t >= 
##     1916 & fair$t < 2012, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6660 -0.2485  0.0000  0.7578  2.9887 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  47.5047     3.7465  12.680 6.59e-08 ***
## DIR           2.1435     2.9500   0.727 0.482652    
## RIR          -1.5416     5.0110  -0.308 0.764096    
## D2Terms      -4.6031     2.7872  -1.652 0.126860    
## R2Terms       3.8078     4.7789   0.797 0.442427    
## D3Terms       3.6892     4.2873   0.860 0.407881    
## R3Terms       6.7444     2.2035   3.061 0.010837 *  
## D4Terms       2.5029     4.2545   0.588 0.568215    
## D5Terms      -8.4792     4.0560  -2.091 0.060586 .  
## WAR          -1.4995     3.8911  -0.385 0.707317    
## G:I           0.4584     0.1297   3.535 0.004676 ** 
## I:P          -0.7811     0.2871  -2.720 0.019919 *  
## I:Z           1.0267     0.2259   4.545 0.000837 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.342 on 11 degrees of freedom
##   (24 observations deleted due to missingness)
## Multiple R-squared:  0.9464, Adjusted R-squared:  0.888 
## F-statistic: 16.19 on 12 and 11 DF,  p-value: 2.795e-05

anova(reg.1,reg.1.u)

## Analysis of Variance Table
## 
## Model 1: VP ~ G:I + P:I + Z:I + DPER + DUR + I + WAR
## Model 2: VP ~ G:I + P:I + Z:I + DIR + RIR + D2Terms + R2Terms + D3Terms + 
##     R3Terms + D4Terms + D5Terms + WAR
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     16 99.573                           
## 2     11 60.333  5    39.239 1.4308 0.2876

On the basis of this test, it appears that the restrictions are not rejected.

Impulse Saturation

The accuracy of any forecast is determined, in large part, by the value of the constant. However, the constant, as the simple average over the entire sample, may not be particularly informative if there have been structural changes over the sample. An effective way of determining the right level is to use impulse saturation. Impulse saturation is a technique developed by David F. Hendry, Soren Johansen and Carlos Santos, and it involves adding large sets of indicator (dummy) variables in batches to determine their significance.

Here we determine whether at each forecast horizon in our real-time exercise impulse saturation could have helped improve the forecast.

We run impulse saturation on each forecast model. For unclear reasons (the R code is still in developmental stages), the \(WAR\) dummy caused problems with singularity in some regressions, and hence was dropped for all; as impulse saturation is an outlier detection mechanism it will pick up dummies, if they are required, for the war periods, and indeed will select individual impulses for different years of different wars if they are helpful for better modelling the data series.

We implement impulse saturation allowing both single impulse dummies (one observation), and step-impulse dummies (dummies for many observations). We then forecast using the selected model, and report statistics about all forecasts below.

## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## [1] 1996

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 4 regressors
## 
## regressor-matrix is column rank deficient, so dropping 3 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## [1] 1998
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 3 regressors
## 
## regressor-matrix is column rank deficient, so dropping 3 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## [1] 2000

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 3 regressors
## 
## regressor-matrix is column rank deficient, so dropping 3 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## [1] 2002

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 3 regressors
## 
## regressor-matrix is column rank deficient, so dropping 3 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## [1] 2004

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 3 regressors
## 
## regressor-matrix is column rank deficient, so dropping 3 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## [1] 2006

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## regressor-matrix is column rank deficient, so dropping 1 regressors
## 
## regressor-matrix is column rank deficient, so dropping 1 regressors
## 
## regressor-matrix is column rank deficient, so dropping 7 regressors
## 
## regressor-matrix is column rank deficient, so dropping 7 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## [1] 2008

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## regressor-matrix is column rank deficient, so dropping 1 regressors
## 
## regressor-matrix is column rank deficient, so dropping 1 regressors
## 
## regressor-matrix is column rank deficient, so dropping 3 regressors
## 
## regressor-matrix is column rank deficient, so dropping 7 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## [1] 2010

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 7 regressors
## 
## regressor-matrix is column rank deficient, so dropping 7 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## [1] 2012

## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion

## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 7 regressors
## 
## regressor-matrix is column rank deficient, so dropping 7 regressors
## 
## regressor-matrix is column rank deficient, so dropping 2 regressors
## 
## regressor-matrix is column rank deficient, so dropping 1 regressors

##      t       VP       VC      VCC  VP.isat  VC.isat VCC.isat
## 1 1996 52.80855 47.91334       NA 54.24743 52.34577       NA
## 2 1998       NA       NA 47.66593       NA       NA 46.99754
## 3 2000 52.20871 50.68758       NA 48.40067 46.77065       NA
## 4 2002       NA       NA 51.60220       NA       NA 52.33356
## 5 2004 43.20585 48.08633       NA 38.19719 46.22998       NA
## 6 2006       NA       NA 50.81040       NA       NA 52.48237
## 7 2008 50.79055 55.41992       NA 52.53429 53.33611       NA
## 8 2010       NA       NA 49.47886       NA       NA 48.85087
## 9 2012 48.53606 47.34091       NA 48.96142 46.03248       NA

Turning to the forecast errors:

## [1] "Errors"

##    fair$t[fair$t >= 1996]        VP         VC       VCC       VP.1
## 31                   1996  1.928452  2.2446568        NA  0.4895652
## 65                   1998        NA         NA  1.867067         NA
## 32                   2000 -1.946708 -0.8685828        NA  1.8613336
## 66                   2002        NA         NA -4.040199         NA
## 33                   2004  5.561152  0.5456739        NA 10.5698099
## 67                   2006        NA         NA  3.309602         NA
## 34                   2008  2.898450  0.1150787        NA  1.1547105
## 68                   2010        NA         NA -3.578860         NA
## 69                   2012  3.463936  3.3390900        NA  3.0385826
##         VC.1     VCC.1
## 31 -2.187775        NA
## 65        NA  2.535456
## 32  3.048350        NA
## 66        NA -4.771558
## 33  2.402016        NA
## 67        NA  1.637631
## 34  2.198890        NA
## 68        NA -2.950868
## 69  4.647518        NA

## [1] "Mean Errors"

##         VP         VC        VCC       VP.1       VC.1      VCC.1 
##  2.3810564  1.0751833 -0.6105977  3.4228004  2.0217998 -0.8873347

## [1] "Mean Squared Errors"

##        VP        VC       VCC      VP.1      VC.1     VCC.1 
## 11.766974  3.450689 10.892715 25.198292  9.256603 10.146440

## [1] "Mean Absolute Errors"

##       VP       VC      VCC     VP.1     VC.1    VCC.1 
## 3.159739 1.422616 3.198932 3.422800 2.896910 2.973878

The first three columns are the standard forecasts, the second three columns are with impulse saturation. By and large, impulse saturation does not help improve the forecasts. Only for off-term house elections (\(VCC\)) does impulse saturation help improve forecasts, on average. That said, for the presidential elections it is only 2004 where isat generates a substantially worse forecast than the simple Fair regression model; for all other presidential elections isat produces smaller forecast errors.

We can represent these forecasts graphically:

## pdf 
##   2

In this plot, election outcomes are in black, real time forecasts are in red, and isat real-time forecasts are in green. The circles are presidential elections, the triangles are on-term congressional elections, and the crosses are off-term congressional elections.

Ray Fair Statistical Model

J James Reade

11/12/2014