This is a collection of notes surrounding the creation of a reasonably simple statistical model to forecast US presidential elections.
The classic paper in this regard is written by Ray Fair, and he helpfully makes all his data available on his website.
The principle of the paper to which this analysis will add (work with Leighton Vaughan Williams) is to forecast as much as possible given data availability. With Fair’s data, we can use real-time data to construct forecasts back to 1992, on the principle that population data, which is only available in real time back to 1996, is not as variable between different data releases as is GDP data.
The regression model is very simple; if \(VP\) is vote share, \(I\) is a dummy for whether the incumbent was Democrat (1) or Republican (-1), \(DPER\) is a dummy for whether the incumbent Democrat (1) or Republican (-1) is running for re-election, \(DUR\) is a dummy representing the length of terms by the formula \(I(0.25T-1)\) where \(T\) is number of terms, \(WAR\) is a dummy for the two World Wars, then \(G\) is real per capita GDP growth in the first three quarters of the election year, \(P\) is inflation (from the GDP deflator) for the first 15 quarters of the administration, and \(Z\) is the number of quarters in which the growth rate of real per capita GDP is greater than 3.2% (annualised rate): \[ VP_t = \beta_0 + \beta_1 I_t + \beta_2 DPER_t + \beta_3 DUR_t + \beta_4 WAR_t + \beta_5 G_t\times I_t + \beta_6 P_t\times I_t + \beta_7 Z_t\times I_t + e_t. \]
We can load this data, having processed it from Fair’s website, and stripping out the non-presidential year variables.
fair <- read.csv("fair.csv",stringsAsFactors=F)
fair <- fair[order(fair$t),]
#fair <- fair[is.na(fair$VP)==F,]
for(i in 1:ncol(fair)) {
fair[,i] <- as.numeric(fair[,i])
}
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
summary(fair)
## t VP VC VCC
## Min. :1876 Min. :36.15 Min. :37.96 Min. :41.57
## 1st Qu.:1910 1st Qu.:44.95 1st Qu.:48.63 1st Qu.:46.55
## Median :1943 Median :49.93 Median :51.38 Median :50.50
## Mean :1943 Mean :48.91 Mean :50.93 Mean :50.49
## 3rd Qu.:1976 3rd Qu.:52.17 3rd Qu.:54.01 3rd Qu.:54.18
## Max. :2010 Max. :62.23 Max. :58.48 Max. :58.53
## NA's :34 NA's :39 NA's :39
## I DPER DUR WAR
## Min. :-1.0000 Min. :-1.00000 Min. :-2.0000 Min. :0.00000
## 1st Qu.:-1.0000 1st Qu.:-1.00000 1st Qu.:-1.0000 1st Qu.:0.00000
## Median :-1.0000 Median : 0.00000 Median : 0.0000 Median :0.00000
## Mean :-0.1471 Mean :-0.05882 Mean :-0.2353 Mean :0.08824
## 3rd Qu.: 1.0000 3rd Qu.: 0.75000 3rd Qu.: 0.0000 3rd Qu.:0.00000
## Max. : 1.0000 Max. : 1.00000 Max. : 1.7500 Max. :1.00000
## NA's :34 NA's :34
## G GCC P PCC
## Min. :-14.5860 Min. :-11.349 Min. :0.000 Min. : 0.000
## 1st Qu.: -2.0577 1st Qu.: -1.174 1st Qu.:1.442 1st Qu.: 1.026
## Median : 2.2195 Median : 2.590 Median :2.245 Median : 2.336
## Mean : 0.7025 Mean : 3.111 Mean :2.667 Mean : 2.755
## 3rd Qu.: 4.0983 3rd Qu.: 5.814 3rd Qu.:3.310 3rd Qu.: 3.434
## Max. : 11.8360 Max. : 22.006 Max. :7.864 Max. :11.480
## NA's :34 NA's :34 NA's :35 NA's :34
## Z ZCC
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 3.000 1st Qu.: 2.143
## Median : 5.000 Median : 4.286
## Mean : 5.182 Mean : 5.105
## 3rd Qu.: 7.000 3rd Qu.: 8.571
## Max. :10.000 Max. :12.857
## NA's :35 NA's :34
We can add in the 2012 data, for which Fair wrote on October 26 2012: “the value of G is 1.03, the value of P is 1.57, and the value of Z is still 1”, and on 30 October 2014 “The Democratic share of the two-party presidential vote in 2012 was 52.0, and the Democratic share of the two-party House vote in 2012 was 50.68”.
fair <- rbind(fair,c(2012,52,50.68,NA,1,1,1,0,1.03,NA,1.57,NA,1,NA))
This information is essential for constructing the 2012 forecast, and also evaluating it.
Running the regression model we replicate Fair’s 1996 paper output (Table 2, Eq.~1):
##
## Call:
## lm(formula = VP ~ G:I + P:I + Z:I + DPER + DUR + I + WAR, data = fair[fair$t >=
## 1916 & fair$t < 2012, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3393 -1.4517 -0.5878 1.5195 4.7853
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.3843 0.6109 77.563 < 2e-16 ***
## DPER 2.9244 1.3394 2.183 0.044249 *
## DUR -3.4085 1.1861 -2.874 0.011025 *
## I -1.9143 2.2634 -0.846 0.410155
## WAR 5.0611 2.5402 1.992 0.063677 .
## G:I 0.6724 0.1081 6.221 1.22e-05 ***
## I:P -0.6540 0.2831 -2.310 0.034571 *
## I:Z 0.9904 0.2301 4.305 0.000545 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.495 on 16 degrees of freedom
## (24 observations deleted due to missingness)
## Multiple R-squared: 0.9116, Adjusted R-squared: 0.8729
## F-statistic: 23.56 on 7 and 16 DF, p-value: 2.708e-07
##
## Call:
## lm(formula = VC ~ G:I + P:I + Z:I + DPER + I + WAR + VCC.2, data = fair[fair$t >=
## 1916 & fair$t < 2012, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5297 -0.5715 -0.0639 1.0663 3.2607
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.5688 0.5644 87.823 < 2e-16 ***
## DPER 2.3533 1.0293 2.286 0.036201 *
## I -3.8567 1.6296 -2.367 0.030901 *
## WAR 3.4032 2.2785 1.494 0.154724
## VCC.2 0.6440 0.1321 4.874 0.000169 ***
## G:I 0.4130 0.1016 4.065 0.000901 ***
## I:P -0.2972 0.2593 -1.146 0.268570
## I:Z 0.5264 0.2166 2.431 0.027209 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.23 on 16 degrees of freedom
## (24 observations deleted due to missingness)
## Multiple R-squared: 0.8559, Adjusted R-squared: 0.7929
## F-statistic: 13.58 on 7 and 16 DF, p-value: 1.17e-05
##
## Call:
## lm(formula = VC ~ index.VC + DPER + I + WAR + VCC.2, data = fair[fair$t >=
## 1916 & fair$t < 2012, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3552 -0.5043 0.0844 0.9929 3.3474
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.56138 0.53294 92.996 < 2e-16 ***
## index.VC 0.57049 0.08583 6.647 3.08e-06 ***
## DPER 2.47907 0.91695 2.704 0.01454 *
## I -3.85160 0.84359 -4.566 0.00024 ***
## WAR 3.20811 1.66481 1.927 0.06991 .
## VCC.2 0.62440 0.11275 5.538 2.94e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.112 on 18 degrees of freedom
## (24 observations deleted due to missingness)
## Multiple R-squared: 0.8547, Adjusted R-squared: 0.8144
## F-statistic: 21.18 on 5 and 18 DF, p-value: 5.767e-07
##
## Call:
## lm(formula = VCC ~ PCC:I + ZCC:I + I + WAR + VC.2 + VP.2, data = fair[fair$t >=
## 1916 & fair$t < 2012, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.6406 -1.4618 -0.0689 1.4299 5.5703
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.5108 0.7251 66.904 < 2e-16 ***
## I -2.9797 1.1979 -2.487 0.02355 *
## WAR 0.8357 2.1283 0.393 0.69947
## VC.2 0.6463 0.1973 3.275 0.00447 **
## VP.2 -0.3225 0.1535 -2.101 0.05086 .
## PCC:I -0.4938 0.2246 -2.198 0.04206 *
## I:ZCC 0.6905 0.2780 2.484 0.02369 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.494 on 17 degrees of freedom
## (24 observations deleted due to missingness)
## Multiple R-squared: 0.7785, Adjusted R-squared: 0.7003
## F-statistic: 9.957 on 6 and 17 DF, p-value: 8.757e-05
##
## Call:
## lm(formula = VCC ~ index.VCC + I + WAR + VC.2 + VP.2, data = fair[fair$t >=
## 1916 & fair$t < 2012, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.718 -1.503 -0.117 1.560 5.537
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.5261 0.6994 69.385 < 2e-16 ***
## index.VCC 0.7171 0.2471 2.902 0.00950 **
## I -3.0877 0.9736 -3.171 0.00528 **
## WAR 0.9663 1.9199 0.503 0.62085
## VC.2 0.6353 0.1806 3.518 0.00246 **
## VP.2 -0.3297 0.1432 -2.302 0.03347 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.426 on 18 degrees of freedom
## (24 observations deleted due to missingness)
## Multiple R-squared: 0.7781, Adjusted R-squared: 0.7165
## F-statistic: 12.63 on 5 and 18 DF, p-value: 2.302e-05
In order to effectively evaluate the quality of forecasts, we need to consider real-time data rather than ex post data. Ex post data is what is contained in the `r fair’ dataset, as it is data as we know it to be now. However, the forecast model was constructed based on data available pre-election in each year.
From the quartery data file also on Fair’s website, we can update the quarterly data, but more importantly we can use real time data. A difficulty is found in the real GDP data, for which the base year of comparing prices is updated at regular intervals. Nonetheless, this is simply a scaling, and as such has an entirely predictable scaling effect on regression estimates.
Using data from Alfred, we have real-time updates for nominal and real GDP back to 1991, and population back to 1996.
Our first step is to replicate the Fair \(G\), \(P\), and \(Z\) variables:
fairq <- read.csv("tbl2.csv",stringsAsFactors=F)
fairq$def <- 100*fairq$Nominal.GDP/fairq$Real.GDP
fairq$def_15 <- c(rep(NA,15),fairq$def[-seq(NROW(fairq)-14,NROW(fairq))])
fairq$P <- ((fairq$def/fairq$def_15)^(4/15)-1)*100
fairq$Real.GDP.pc <- fairq$Real.GDP/fairq$Population
fairq$Real.GDP.pc_1 <- c(NA,fairq$Real.GDP.pc[-NROW(fairq)])
fairq$G0 <- ((fairq$Real.GDP.pc/fairq$Real.GDP.pc_1)^4-1)*100
fairq$Z0 <- as.numeric(fairq$G0>3.2)
fairq$term2 <- floor(fairq$quarter) - as.numeric(floor(fairq$quarter) %% 2 != 0)
fairq$term4 <- floor(fairq$quarter) - as.numeric(floor(fairq$quarter) %% 4)
fairq$Z <- ave(fairq$Z0,by=fairq$term4,FUN=cumsum)
fairq$Zcc <- (15/7)*ave(fairq$Z0,by=fairq$term2,FUN=cumsum)
fairq$Real.GDP.pc_3 <- c(rep(NA,3),fairq$Real.GDP.pc[-seq(NROW(fairq)-2,NROW(fairq))])
fairq$G <- ((fairq$Real.GDP.pc/fairq$Real.GDP.pc_3)^(4/3)-1)*100
#ex post forecast values
fairq.x <- fairq[floor(fairq$quarter) %% 2 == 0 & fairq$quarter-floor(fairq$quarter)<0.4 & fairq$quarter-floor(fairq$quarter)>0.2,c("quarter","G","P","Z","Zcc")]
So the object fairq.x contains our recreated Fair values. While Fair is able to, in his papers, refer back to his own previous forecasts, we seek to use forecast we are able to create ourselves. To do this, we create a dataset for each election and construct a forecast for each election.
#real time forecast values
gdp <- read.table("GDP_2_Vintages_Starting_1991_12_04.txt",sep="\t", header=TRUE,stringsAsFactors=F)
gdp <- gdp[,c("observation_date","GDP_19921027","GDP_19941028","GDP_19961030","GDP_19981030",
"GDP_20001027","GDP_20021031","GDP_20041029","GDP_20061027","GDP_20081030",
"GDP_20101029","GDP_20121026")]
gdp$quarter <- seq(from=1946,to=2014.5,by=0.25)
gdpr <- read.table("GDPC1_2_Vintages_Starting_1991_12_04.txt",sep="\t", header=TRUE,stringsAsFactors=F)
gdpr <- gdpr[,c("observation_date","GDPC1_19921027","GDPC1_19941028","GDPC1_19961030","GDPC1_19981030",
"GDPC1_20001027","GDPC1_20021031","GDPC1_20041029","GDPC1_20061027","GDPC1_20081030",
"GDPC1_20101029","GDPC1_20121026")]
gdpr$quarter <- seq(from=1947,to=2014.5,by=0.25)
pop <- read.table("POP_2_Vintages_Starting_1996_12_06.txt",sep="\t", header=TRUE,stringsAsFactors=F)
pop <- pop[,c("observation_date","POP_19961206","POP_19981002","POP_20001027","POP_20021002","POP_20041029","POP_20061002",
"POP_20081031","POP_20101008","POP_20121024")]
pop <- pop[grep("-01-01|-04-01|-07-01|-10-01",pop$observation_date),] #get quarterly observations
pop <- pop[-NROW(pop),]
pop$quarter <- seq(from=1952,to=2014.5,by=0.25)
fairq$quarter <- floor(fairq$quarter) + 2.5*(fairq$quarter-floor(fairq$quarter))
It is important to consider the correspondence between the data we are introducing and the data Fair provides. Without looking further at all of Fair’s older papers [TO DO!], I cannot know whether the difference between the real-time data as provided by Fred and Fair’s data is because Fair’s data is revised as provided on his website (although to emphasise, when he does a historical forecast exercise in his paper he uses real-time data).
plot(pop$quarter,pop$POP_20121024,type="l",main="Population",ylab="Population (thousands)",xlab="Year")
lines(fairq$quarter,1000*fairq$Population,type="l",col=2)
legend("topleft",lty=1,col=c(1,2),legend=c("Real time","Fair's ex post"))
plot(gdp$quarter,gdp$GDP_19961030,type="l",
main="Real GDP",ylab="Real GDP ($bn)",xlab="Year")
## Warning in xy.coords(x, y, xlabel, ylabel, log): NAs introduced by
## coercion
lines(fairq$quarter,fairq$Nominal.GDP,type="l",col=2)
legend("topleft",lty=1,col=c(1,2),legend=c("Real time","Fair's ex post"))
rgdp.scale <- as.numeric(gdpr$GDPC1_19961030[gdpr$quarter==1959.5])/fairq$Real.GDP[fairq$quarter==1959.5]
rgdp.scale2012 <- as.numeric(gdpr$GDPC1_20121026[gdpr$quarter==1959.5])/fairq$Real.GDP[fairq$quarter==1959.5]
plot(gdpr$quarter,gdpr$GDPC1_19961030,type="l",ylim=range(c(as.numeric(gdpr$GDPC1_19961030),fairq$Real.GDP),na.rm=T),
main="Real GDP",ylab="Real GDP (chained linked $bn)",xlab="Year")
## Warning in xy.coords(x, y, xlabel, ylabel, log): NAs introduced by
## coercion
## Warning in plot.default(gdpr$quarter, gdpr$GDPC1_19961030, type = "l",
## ylim = range(c(as.numeric(gdpr$GDPC1_19961030), : NAs introduced by
## coercion
lines(gdpr$quarter,gdpr$GDPC1_20121026,type="l",lty=2)
lines(fairq$quarter,fairq$Real.GDP,type="l",col=2)
lines(fairq$quarter,rgdp.scale*fairq$Real.GDP,type="l",col=2,lty=2)
legend("topleft",lty=c(1,2,1,2),col=c(1,1,2,2),legend=c("Real time","Real time (2012)","Fair's ex post","Fair's ex post scaled to real time"),bty="n")
The additional lines on the real GDP plot suggest that it is the ex post data that drives the differences.
In order to create real time datasets, we scale down the pre-Alfred real GDP data from Fair to be inline with the Alred real-time data release on account of the fact that the clear difference between the series is driven by the different base years (see the Alfred page for real GDP contains more information on the six different bases used between 1996 and 2014).
The first step is to calculate \(G\), \(P\) and \(Z\) from the real-time data. All three centre on calculating real GDP growth per capita, and the GDP deflator, and additionally calculating annualised growth. The latter is calculated following the formula: \[ r = \left[\left(\frac{Y_t}{Y_0}\right)^{m/n}-1\right]\times100, \] where \(r\) is the annualised growth rate, \(Y_t\) the most recent observation of the data series in levels, \(Y_0\) the value of the data series in levels in the base year, \(m\) is the frequency of data, and \(n\) the number of periods between \(t\) and the base period, so \(t-0\). More details can be found on the BEA website
fair1996 <- data.frame("observation_date"=seq(from=as.Date("1946-01-01"),to=as.Date("2014-07-01"),by="quarter"),"pop"=c(rep(NA,24),pop$POP_19961206),"gdp"=as.numeric(gdp$GDP_19961030),"gdpr"=c(rep(NA,4),as.numeric(gdpr$GDPC1_19961030)), stringsAsFactors=F)
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
##useful date variables
fair1996$quarter <- seq(from=1946,to=2014.5,by=0.25)
fair1996$term2.0 <- floor(fair1996$quarter) - as.numeric(floor(fair1996$quarter) %% 2 != 0)
fair1996$term2 <- c(rep(NA,4),fair1996$term2.0[-seq(NROW(fair1996)-3,NROW(fair1996))])
fair1996$term4.0 <- floor(fair1996$quarter) - as.numeric(floor(fair1996$quarter) %% 4)
fair1996$term4 <- c(rep(NA,4),fair1996$term4.0[-seq(NROW(fair1996)-3,NROW(fair1996))])
fair1996$term4.0 <- NULL
fair1996$term2.0 <- NULL
###GDP deflator for P
fair1996$def <- 100*fair1996$gdp/fair1996$gdpr
fair1996$def_15 <- c(rep(NA,15),fair1996$def[-seq(NROW(fair1996)-14,NROW(fair1996))])
fair1996$P <- ((fair1996$def/fair1996$def_15)^(4/15)-1)*100
###real GDP per capita for G and Z
fair1996$gdprpc <- fair1996$gdpr/fair1996$pop
fair1996$gdprpc_1 <- c(NA,fair1996$gdprpc[-NROW(fair1996)])
fair1996$G0 <- ((fair1996$gdprpc/fair1996$gdprpc_1)^4-1)*100
fair1996$Z0 <- as.numeric(fair1996$G0>3.2)
fair1996$Z <- ave(fair1996$Z0,by=fair1996$term4,FUN=cumsum)
fair1996$ZCC <- (15/7)*ave(fair1996$Z0,by=fair1996$term2,FUN=cumsum)
fair1996$gdprpc_3 <- c(rep(NA,3),fair1996$gdprpc[-seq(NROW(fair1996)-2,NROW(fair1996))])
fair1996$G <- ((fair1996$gdprpc/fair1996$gdprpc_3)^(4/3)-1)*100
We next must condense this data down to just the information needed to generate forecasts:
fair1996.x <- fair1996[floor(fair1996$quarter) %% 2 == 0 & fair1996$quarter-floor(fair1996$quarter)<0.6 & fair1996$quarter-floor(fair1996$quarter)>0.4,c("observation_date","quarter","G","P","Z","ZCC")]
fair1996.x$t <- floor(fair1996.x$quarter)
fair1996.x$quarter <-NULL
##add the off-term values of Z into Z for condensing
fair1996.x$Z[fair1996.x$t %% 2==0 & fair1996.x$t %% 4 == 2] <- fair1996.x$ZCC[fair1996.x$t %% 2==0 & fair1996.x$t %% 4 == 2]
fair1996.x$ZCC <- NULL
We now augment our \(G\), \(P\) and \(Z\) series to the older data from Fair:
fair1996.full <- merge(fair[fair$t<1997,],fair1996.x[,c("t","G","P","Z")],by=c("t"),suffixes=c(".ex",".rt"))
fair1996.full[,c("t","G.ex","GCC","G.rt","P.ex","PCC","P.rt","Z.ex","Z.rt")]
## t G.ex GCC G.rt P.ex PCC P.rt Z.ex Z.rt
## 1 1946 NA -3.474 NA NA 0.000 NA NA NA
## 2 1948 3.638 NA NA 0.000 NA NA 0 NA
## 3 1950 NA 13.635 NA NA 0.052 NA NA NA
## 4 1952 0.726 NA NA 2.349 NA NA 7 NA
## 5 1954 NA -0.685 NA NA 0.761 NA NA NA
## 6 1956 -1.451 NA NA 1.897 NA NA 5 NA
## 7 1958 NA -1.337 NA NA 2.709 NA NA NA
## 8 1960 0.455 NA 0.8840938 1.938 NA NA 5 NA
## 9 1962 NA 3.669 3.3301345 NA 1.185 NA NA 8.571429
## 10 1964 5.087 NA 5.0462592 1.269 NA 1.188754 10 10.000000
## 11 1966 NA 3.525 3.5229642 NA 2.579 2.024482 NA 10.714286
## 12 1968 5.049 NA 4.9267934 3.130 NA 3.183645 7 7.000000
## 13 1970 NA 0.018 -0.0719502 NA 5.034 4.411258 NA 2.142857
## 14 1972 5.949 NA 6.2056874 4.795 NA 4.799025 4 4.000000
## 15 1974 NA -3.013 -2.7951648 NA 8.195 6.157096 NA 4.285714
## 16 1976 3.806 NA 3.9525173 7.632 NA 7.539401 5 5.000000
## 17 1978 NA 6.035 5.4300089 NA 6.728 6.755313 NA 8.571429
## 18 1980 -3.659 NA -3.7077064 7.864 NA 8.113971 5 5.000000
## 19 1982 NA -2.877 -3.2232713 NA 7.041 8.211463 NA 4.285714
## 20 1984 5.424 NA 5.2088370 5.249 NA 5.425439 8 8.000000
## 21 1986 NA 2.241 1.5728947 NA 2.507 3.398495 NA 6.428571
## 22 1988 2.210 NA 2.0768833 2.955 NA 3.292633 4 6.000000
## 23 1990 NA 0.729 0.1490085 NA 3.897 3.925883 NA 0.000000
## 24 1992 2.949 NA 2.3586669 3.310 NA 3.683054 2 1.000000
## 25 1994 NA 2.819 2.6944740 NA 2.164 2.745830 NA 4.285714
## 26 1996 3.258 NA 2.0429080 2.027 NA 2.310833 4 3.000000
##need to put old values into real time series from before Fred data sample starts
fair1996.full$G.rt[is.na(fair1996.full$G.rt)] <- fair1996.full$G.ex[is.na(fair1996.full$G.rt)]
fair1996.full$P.rt[is.na(fair1996.full$P.rt)] <- fair1996.full$P.ex[is.na(fair1996.full$P.rt)]
fair1996.full$Z.rt[is.na(fair1996.full$Z.rt)] <- fair1996.full$Z.ex[is.na(fair1996.full$Z.rt)]
##need to create the interaction terms in order to calculate the forecast
fair1996.full$GI.ex <- fair1996.full$G.ex*fair1996.full$I
fair1996.full$PI.ex <- fair1996.full$P.ex*fair1996.full$I
fair1996.full$ZI.ex <- fair1996.full$Z.ex*fair1996.full$I
fair1996.full$GI.rt <- fair1996.full$G.rt*fair1996.full$I
fair1996.full$PI.rt <- fair1996.full$P.rt*fair1996.full$I
fair1996.full$ZI.rt <- fair1996.full$Z.rt*fair1996.full$I
##1996 regression on ex post data
reg.1996.ex <- lm(VP ~ G.ex:I + P.ex:I + Z.ex:I + DPER + DUR + I + WAR,data=fair1996.full[fair1996.full$t<1996,])
forc.1996.ex <- reg.1996.ex$coefficients %*% t(cbind(1,fair1996.full[fair1996.full$t==1996,c("GI.ex","PI.ex","ZI.ex","DPER","DUR","I","WAR")]))
summary(reg.1996.ex)
##
## Call:
## lm(formula = VP ~ G.ex:I + P.ex:I + Z.ex:I + DPER + DUR + I +
## WAR, data = fair1996.full[fair1996.full$t < 1996, ])
##
## Residuals:
## 2 4 6 8 10 12
## 2.220e-16 2.308e+00 -2.581e-01 2.002e+00 3.142e-01 -2.249e+00
## 14 16 18 20 22 24
## -1.919e+00 -2.209e-01 -3.730e-01 7.336e-01 -1.723e+00 1.385e+00
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.3128 1.1588 41.694 1.98e-06 ***
## DPER -2.0117 2.5391 -0.792 0.4725
## DUR -8.5242 2.3526 -3.623 0.0223 *
## I 9.2768 6.2620 1.481 0.2126
## WAR 6.4054 5.3021 1.208 0.2936
## G.ex:I 0.8582 0.2950 2.909 0.0437 *
## I:P.ex -1.0645 0.3846 -2.767 0.0505 .
## I:Z.ex 0.2296 0.6243 0.368 0.7317
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.442 on 4 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.9454, Adjusted R-squared: 0.8499
## F-statistic: 9.896 on 7 and 4 DF, p-value: 0.0214
reg.1996.rt <- lm(VP ~ G.rt:I + P.rt:I + Z.rt:I + DPER + DUR + I + WAR,data=fair1996.full[fair1996.full$t<1996,])
forc.1996.rt <- reg.1996.rt$coefficients %*% t(cbind(1,fair1996.full[fair1996.full$t==1996,c("GI.rt","PI.rt","ZI.rt","DPER","DUR","I","WAR")]))
summary(reg.1996.rt)
##
## Call:
## lm(formula = VP ~ G.rt:I + P.rt:I + Z.rt:I + DPER + DUR + I +
## WAR, data = fair1996.full[fair1996.full$t < 1996, ])
##
## Residuals:
## 2 4 6 8 10 12
## 2.220e-16 1.636e+00 -3.136e-01 1.920e+00 -3.273e-02 -1.453e+00
## 14 16 18 20 22 24
## -2.118e+00 5.380e-02 -1.499e-01 1.267e+00 -1.791e+00 9.817e-01
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.9557 0.9756 49.154 1.02e-06 ***
## DPER -0.1592 2.5829 -0.062 0.9538
## DUR -6.9481 2.3154 -3.001 0.0399 *
## I 5.2177 6.1496 0.848 0.4440
## WAR 6.7971 4.3285 1.570 0.1914
## G.rt:I 0.8054 0.2554 3.154 0.0344 *
## I:P.rt -0.9462 0.3462 -2.733 0.0523 .
## I:Z.rt 0.5282 0.5423 0.974 0.3852
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.172 on 4 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.9568, Adjusted R-squared: 0.8812
## F-statistic: 12.66 on 7 and 4 DF, p-value: 0.01365
The 1996 real-time forecast is 53.078315, ex post is 66.9281945. The former is close to the 53.2 Fair reports in his 2010 paper as having been his 1996 forecast in real terms. The actual value was 54.737, giving a forecast error of 1.658685.
We now nest this process in a loop to create forecasts from 1996 to 2012:
forecasts.rt <- data.frame("t"=seq(from=1996,to=2012,by = 2),"VP"=NA,"VC"=NA,"VCC"=NA)
forecasts.ex <- data.frame("t"=seq(from=1996,to=2012,by = 2),"VP"=NA,"VC"=NA,"VCC"=NA)
for (y in 1996:2012){
if (y %% 2 == 0) {
print(y)
#dataset of Fred real time variables
fair.y <- data.frame("observation_date"=seq(from=as.Date("1946-01-01"),to=as.Date("2014-07-01"),by="quarter"),
"pop"=c(rep(NA,24),pop[,grep(y,colnames(pop))]),"gdp"=as.numeric(gdp[,grep(y,colnames(gdp))]),
"gdpr"=c(rep(NA,4),as.numeric(gdpr[,grep(y,colnames(gdpr))])), stringsAsFactors=F)
fair.y$quarter <- seq(from=1946,to=2014.5,by=0.25)
fair.y$term2.0 <- floor(fair.y$quarter) - as.numeric(floor(fair.y$quarter) %% 2 != 0)
fair.y$term2 <- c(rep(NA,4),fair.y$term2.0[-seq(NROW(fair.y)-3,NROW(fair.y))])
fair.y$term4.0 <- floor(fair.y$quarter) - as.numeric(floor(fair.y$quarter) %% 4)
fair.y$term4 <- c(rep(NA,4),fair.y$term4.0[-seq(NROW(fair.y)-3,NROW(fair.y))])
fair.y$term4.0 <- NULL
fair.y$term2.0 <- NULL
fair.y$def <- 100*fair.y$gdp/fair.y$gdpr
fair.y$def_15 <- c(rep(NA,15),fair.y$def[-seq(NROW(fair.y)-14,NROW(fair.y))])
fair.y$P <- ((fair.y$def/fair.y$def_15)^(4/15)-1)*100
fair.y$gdprpc <- fair.y$gdpr/fair.y$pop
fair.y$gdprpc_1 <- c(NA,fair.y$gdprpc[-NROW(fair.y)])
fair.y$G0 <- ((fair.y$gdprpc/fair.y$gdprpc_1)^4-1)*100
fair.y$Z0 <- as.numeric(fair.y$G0>3.2)
fair.y$Z <- ave(fair.y$Z0,by=fair.y$term4,FUN=cumsum)
fair.y$ZCC <- (15/7)*ave(fair.y$Z0,by=fair.y$term2,FUN=cumsum)
fair.y$gdprpc_3 <- c(rep(NA,3),fair.y$gdprpc[-seq(NROW(fair.y)-2,NROW(fair.y))])
fair.y$G <- ((fair.y$gdprpc/fair.y$gdprpc_3)^(4/3)-1)*100
##condensing to just election years
fair.y.x <- fair.y[floor(fair.y$quarter) %% 2 == 0 & fair.y$quarter-floor(fair.y$quarter)<0.6
& fair.y$quarter-floor(fair.y$quarter)>0.4,
c("observation_date","quarter","G","P","Z","ZCC")]
fair.y.x$t <- floor(fair.y.x$quarter)
fair.y.x$quarter <-NULL
##make the non-CC and CC variables separately
fair.y.x$ZCC[fair.y.x$t %% 4 == 0] <- NA
fair.y.x$Z[fair.y.x$t %% 4 != 0] <- NA
fair.y.x$PCC[fair.y.x$t %% 4 != 0] <- fair.y.x$P[fair.y.x$t %% 4 != 0]
fair.y.x$P[fair.y.x$t %% 4 != 0] <- NA
fair.y.x$GCC[fair.y.x$t %% 4 != 0] <- fair.y.x$G[fair.y.x$t %% 4 != 0]
fair.y.x$G[fair.y.x$t %% 4 != 0] <- NA
y.data <- merge(fair[fair$t<=y,],fair.y.x[,c("t","G","P","Z","GCC","PCC","ZCC")],
by=c("t"),suffixes=c(".ex",".rt"),all.x=T)
# y.data[,c("t","G.ex","GCC","G.rt","P.ex","PCC","P.rt","Z.ex","Z.rt")]
##need to put old values into real time series from before Fred data sample starts
y.data$G.rt[is.na(y.data$G.rt)] <- y.data$G.ex[is.na(y.data$G.rt)]
y.data$P.rt[is.na(y.data$P.rt)] <- y.data$P.ex[is.na(y.data$P.rt)]
y.data$Z.rt[is.na(y.data$Z.rt)] <- y.data$Z.ex[is.na(y.data$Z.rt)]
y.data$GCC.rt[is.na(y.data$GCC.rt)] <- y.data$GCC.ex[is.na(y.data$GCC.rt)]
y.data$PCC.rt[is.na(y.data$PCC.rt)] <- y.data$PCC.ex[is.na(y.data$PCC.rt)]
y.data$ZCC.rt[is.na(y.data$ZCC.rt)] <- y.data$ZCC.ex[is.na(y.data$ZCC.rt)]
##create interaction variables in real time and ex post for forecast construction
y.data$GI.ex <- y.data$G.ex*y.data$I
y.data$PI.ex <- y.data$P.ex*y.data$I
y.data$ZI.ex <- y.data$Z.ex*y.data$I
y.data$GI.rt <- y.data$G.rt*y.data$I
y.data$PI.rt <- y.data$P.rt*y.data$I
y.data$ZI.rt <- y.data$Z.rt*y.data$I
y.data$index.VC.rt <- 0.672*y.data$G.rt*y.data$I - 0.654*y.data$P.rt*y.data$I + 0.990*y.data$Z.rt*y.data$I
y.data$index.VC.ex <- 0.672*y.data$G.ex*y.data$I - 0.654*y.data$P.ex*y.data$I + 0.990*y.data$Z.ex*y.data$I
y.data$index.VCC.rt <- -0.654*y.data$PCC.rt*y.data$I + 0.990*y.data$ZCC.rt*y.data$I
y.data$index.VCC.ex <- -0.654*y.data$PCC.ex*y.data$I + 0.990*y.data$ZCC.ex*y.data$I
##1996 regression on ex post data
reg.1.ex <- lm(VP ~ G.ex:I + P.ex:I + Z.ex:I + DPER + DUR + I + WAR,data=y.data[y.data$t<y,])
coefs <- as.numeric(reg.1.ex$coefficients[c("(Intercept)","G.ex:I","I:P.ex","I:Z.ex","DPER","DUR","I","WAR")])
data <- as.numeric(cbind(1,y.data[y.data$t==y,c("GI.ex","PI.ex","ZI.ex","DPER","DUR","I","WAR")]))
forecasts.ex[forecasts.ex$t==y,"VP"] <- t(data) %*% coefs
reg.1.rt <- lm(VP ~ G.rt:I + P.rt:I + Z.rt:I + DPER + DUR + I + WAR,data=y.data[y.data$t<y,])
coefs <- as.numeric(reg.1.rt$coefficients[c("(Intercept)","G.rt:I","I:P.rt","I:Z.rt","DPER","DUR","I","WAR")])
data <- as.numeric(cbind(1,y.data[y.data$t==y,c("GI.rt","PI.rt","ZI.rt","DPER","DUR","I","WAR")]))
forecasts.rt[forecasts.rt$t==y,"VP"] <- t(data) %*% coefs
reg.2a.rt <- lm(VC ~ index.VC.rt + DPER + I + WAR + VCC.2,data=y.data)
coefs <- as.numeric(reg.2a.rt$coefficients[c("(Intercept)","index.VC.rt","DPER","I","WAR","VCC.2")])
data <- as.numeric(cbind(1,y.data[y.data$t==y,c("index.VC.rt","DPER","I","WAR","VCC.2")]))
forecasts.rt[forecasts.rt$t==y,"VC"] <- t(data) %*% coefs
reg.2a.ex <- lm(VC ~ index.VC.ex + DPER + I + WAR + VCC.2,data=y.data)
coefs <- as.numeric(reg.2a.ex$coefficients[c("(Intercept)","index.VC.ex","DPER","I","WAR","VCC.2")])
data <- as.numeric(cbind(1,y.data[y.data$t==y,c("index.VC.ex","DPER","I","WAR","VCC.2")]))
forecasts.ex[forecasts.ex$t==y,"VC"] <- t(data) %*% coefs
reg.3a.rt <- lm(VCC ~ index.VCC.rt + I + WAR + VC.2 + VP.2,data=y.data)
coefs <- as.numeric(reg.3a.rt$coefficients[c("(Intercept)","index.VCC.rt","I","WAR","VC.2","VP.2")])
data <- as.numeric(cbind(1,y.data[y.data$t==y,c("index.VCC.rt","I","WAR","VC.2","VP.2")]))
forecasts.rt[forecasts.rt$t==y,"VCC"] <- t(data) %*% coefs
reg.3a.ex <- lm(VCC ~ index.VCC.ex + I + WAR + VC.2 + VP.2,data=y.data)
coefs <- as.numeric(reg.3a.ex$coefficients[c("(Intercept)","index.VCC.ex","I","WAR","VC.2","VP.2")])
data <- as.numeric(cbind(1,y.data[y.data$t==y,c("index.VCC.ex","I","WAR","VC.2","VP.2")]))
forecasts.ex[forecasts.ex$t==y,"VCC"] <- t(data) %*% coefs
}
}
## [1] 1996
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## [1] 1998
## [1] 2000
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## [1] 2002
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## [1] 2004
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## [1] 2006
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## [1] 2008
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## [1] 2010
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## [1] 2012
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
forecasts.rt
## t VP VC VCC
## 1 1996 52.80855 47.91334 NA
## 2 1998 NA NA 47.66593
## 3 2000 52.20871 50.68758 NA
## 4 2002 NA NA 51.60220
## 5 2004 43.20585 48.08633 NA
## 6 2006 NA NA 50.81040
## 7 2008 50.79055 55.41992 NA
## 8 2010 NA NA 49.47886
## 9 2012 48.53606 47.34091 NA
The final table is the real-time forecasts created. Compared to Fair’s Table 5 there are differences. Next we show the forecast errors:
## [1] "Errors"
## VP VC VCC
## 31 1.928452 2.2446568 NA
## 65 NA NA 1.867067
## 32 -1.946708 -0.8685828 NA
## 66 NA NA -4.040199
## 33 5.561152 0.5456739 NA
## 67 NA NA 3.309602
## 34 2.898450 0.1150787 NA
## 68 NA NA -3.578860
## 69 3.463936 3.3390900 NA
## [1] "Mean Errors"
## VP VC VCC
## 2.3810564 1.0751833 -0.6105977
## [1] "Mean Squared Errors"
## VP VC VCC
## 11.766974 3.450689 10.892715
## [1] "Mean Absolute Errors"
## VP VC VCC
## 3.159739 1.422616 3.198932
The dummy variables \(I\), \(DPER\) and \(DUR\) are somewhat restrictive in their format, likely reflecting a preference for smaller econometric models over larger ones. An important step towards such a restrictive model specification is to test those restrictions. We can test them by estimating the unrestricted model and calculating a likelihood ratio test.
We first need to split up the variables. Because \(I\) is only ever 1 or -1 (it is never zero), it is simply a scaling of a standard 0/1 dummy variable and so no restrictions are involved. The \(DPER\) variable, however, can take the value zero. As such, we split this into \(DIR\) and \(RIR\) for Democrat and Republican incumbents standing for re-election, respectively. We then split \(DUR\) into individual dummy variables for each type of term length:
fair$DIR <- as.numeric(fair$DPER==1)
fair$RIR <- as.numeric(fair$DPER==-1)
fair$"1Term" <- as.numeric(fair$DUR==0)
fair$D2Terms <- as.numeric(fair$DUR==1)
fair$R2Terms <- as.numeric(fair$DUR==-1)
fair$D3Terms <- as.numeric(fair$DUR==1.25)
fair$R3Terms <- as.numeric(fair$DUR==-1.25)
fair$D4Terms <- as.numeric(fair$DUR==1.5)
fair$R4Terms <- as.numeric(fair$DUR==-1.5)
fair$D5Terms <- as.numeric(fair$DUR==1.75)
fair$R5Terms <- as.numeric(fair$DUR==-1.75)
fair$R6Terms <- as.numeric(fair$DUR==-2)
We now run the unrestricted version and test the restrictions:
reg.1.u <- lm(VP ~ G:I + P:I + Z:I + DIR + RIR + D2Terms + R2Terms + D3Terms + R3Terms + D4Terms + D5Terms + WAR,data=fair[fair$t>=1916 & fair$t<2012,])
summary(reg.1.u)
##
## Call:
## lm(formula = VP ~ G:I + P:I + Z:I + DIR + RIR + D2Terms + R2Terms +
## D3Terms + R3Terms + D4Terms + D5Terms + WAR, data = fair[fair$t >=
## 1916 & fair$t < 2012, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.6660 -0.2485 0.0000 0.7578 2.9887
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.5047 3.7465 12.680 6.59e-08 ***
## DIR 2.1435 2.9500 0.727 0.482652
## RIR -1.5416 5.0110 -0.308 0.764096
## D2Terms -4.6031 2.7872 -1.652 0.126860
## R2Terms 3.8078 4.7789 0.797 0.442427
## D3Terms 3.6892 4.2873 0.860 0.407881
## R3Terms 6.7444 2.2035 3.061 0.010837 *
## D4Terms 2.5029 4.2545 0.588 0.568215
## D5Terms -8.4792 4.0560 -2.091 0.060586 .
## WAR -1.4995 3.8911 -0.385 0.707317
## G:I 0.4584 0.1297 3.535 0.004676 **
## I:P -0.7811 0.2871 -2.720 0.019919 *
## I:Z 1.0267 0.2259 4.545 0.000837 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.342 on 11 degrees of freedom
## (24 observations deleted due to missingness)
## Multiple R-squared: 0.9464, Adjusted R-squared: 0.888
## F-statistic: 16.19 on 12 and 11 DF, p-value: 2.795e-05
anova(reg.1,reg.1.u)
## Analysis of Variance Table
##
## Model 1: VP ~ G:I + P:I + Z:I + DPER + DUR + I + WAR
## Model 2: VP ~ G:I + P:I + Z:I + DIR + RIR + D2Terms + R2Terms + D3Terms +
## R3Terms + D4Terms + D5Terms + WAR
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 16 99.573
## 2 11 60.333 5 39.239 1.4308 0.2876
On the basis of this test, it appears that the restrictions are not rejected.
The accuracy of any forecast is determined, in large part, by the value of the constant. However, the constant, as the simple average over the entire sample, may not be particularly informative if there have been structural changes over the sample. An effective way of determining the right level is to use impulse saturation. Impulse saturation is a technique developed by David F. Hendry, Soren Johansen and Carlos Santos, and it involves adding large sets of indicator (dummy) variables in batches to determine their significance.
Here we determine whether at each forecast horizon in our real-time exercise impulse saturation could have helped improve the forecast.
We run impulse saturation on each forecast model. For unclear reasons (the R code is still in developmental stages), the \(WAR\) dummy caused problems with singularity in some regressions, and hence was dropped for all; as impulse saturation is an outlier detection mechanism it will pick up dummies, if they are required, for the war periods, and indeed will select individual impulses for different years of different wars if they are helpful for better modelling the data series.
We implement impulse saturation allowing both single impulse dummies (one observation), and step-impulse dummies (dummies for many observations). We then forecast using the selected model, and report statistics about all forecasts below.
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## [1] 1996
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 4 regressors
##
## regressor-matrix is column rank deficient, so dropping 3 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## [1] 1998
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 3 regressors
##
## regressor-matrix is column rank deficient, so dropping 3 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## [1] 2000
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 3 regressors
##
## regressor-matrix is column rank deficient, so dropping 3 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## [1] 2002
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 3 regressors
##
## regressor-matrix is column rank deficient, so dropping 3 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## [1] 2004
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 3 regressors
##
## regressor-matrix is column rank deficient, so dropping 3 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## [1] 2006
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## regressor-matrix is column rank deficient, so dropping 1 regressors
##
## regressor-matrix is column rank deficient, so dropping 1 regressors
##
## regressor-matrix is column rank deficient, so dropping 7 regressors
##
## regressor-matrix is column rank deficient, so dropping 7 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## [1] 2008
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## regressor-matrix is column rank deficient, so dropping 1 regressors
##
## regressor-matrix is column rank deficient, so dropping 1 regressors
##
## regressor-matrix is column rank deficient, so dropping 3 regressors
##
## regressor-matrix is column rank deficient, so dropping 7 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## [1] 2010
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 7 regressors
##
## regressor-matrix is column rank deficient, so dropping 7 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## [1] 2012
## Warning in data.frame(observation_date = seq(from = as.Date("1946-01-01"),
## : NAs introduced by coercion
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 7 regressors
##
## regressor-matrix is column rank deficient, so dropping 7 regressors
##
## regressor-matrix is column rank deficient, so dropping 2 regressors
##
## regressor-matrix is column rank deficient, so dropping 1 regressors
## t VP VC VCC VP.isat VC.isat VCC.isat
## 1 1996 52.80855 47.91334 NA 54.24743 52.34577 NA
## 2 1998 NA NA 47.66593 NA NA 46.99754
## 3 2000 52.20871 50.68758 NA 48.40067 46.77065 NA
## 4 2002 NA NA 51.60220 NA NA 52.33356
## 5 2004 43.20585 48.08633 NA 38.19719 46.22998 NA
## 6 2006 NA NA 50.81040 NA NA 52.48237
## 7 2008 50.79055 55.41992 NA 52.53429 53.33611 NA
## 8 2010 NA NA 49.47886 NA NA 48.85087
## 9 2012 48.53606 47.34091 NA 48.96142 46.03248 NA
Turning to the forecast errors:
## [1] "Errors"
## fair$t[fair$t >= 1996] VP VC VCC VP.1
## 31 1996 1.928452 2.2446568 NA 0.4895652
## 65 1998 NA NA 1.867067 NA
## 32 2000 -1.946708 -0.8685828 NA 1.8613336
## 66 2002 NA NA -4.040199 NA
## 33 2004 5.561152 0.5456739 NA 10.5698099
## 67 2006 NA NA 3.309602 NA
## 34 2008 2.898450 0.1150787 NA 1.1547105
## 68 2010 NA NA -3.578860 NA
## 69 2012 3.463936 3.3390900 NA 3.0385826
## VC.1 VCC.1
## 31 -2.187775 NA
## 65 NA 2.535456
## 32 3.048350 NA
## 66 NA -4.771558
## 33 2.402016 NA
## 67 NA 1.637631
## 34 2.198890 NA
## 68 NA -2.950868
## 69 4.647518 NA
## [1] "Mean Errors"
## VP VC VCC VP.1 VC.1 VCC.1
## 2.3810564 1.0751833 -0.6105977 3.4228004 2.0217998 -0.8873347
## [1] "Mean Squared Errors"
## VP VC VCC VP.1 VC.1 VCC.1
## 11.766974 3.450689 10.892715 25.198292 9.256603 10.146440
## [1] "Mean Absolute Errors"
## VP VC VCC VP.1 VC.1 VCC.1
## 3.159739 1.422616 3.198932 3.422800 2.896910 2.973878
The first three columns are the standard forecasts, the second three columns are with impulse saturation. By and large, impulse saturation does not help improve the forecasts. Only for off-term house elections (\(VCC\)) does impulse saturation help improve forecasts, on average. That said, for the presidential elections it is only 2004 where isat generates a substantially worse forecast than the simple Fair regression model; for all other presidential elections isat produces smaller forecast errors.
We can represent these forecasts graphically:
## pdf
## 2
In this plot, election outcomes are in black, real time forecasts are in red, and isat real-time forecasts are in green. The circles are presidential elections, the triangles are on-term congressional elections, and the crosses are off-term congressional elections.