I. Introduction

In this paper we analyze monthly data for Total Private Residential Construction Spending in the United States. Specifically, we use the Box-Jenkins methodology to build a time series model on the data through 2013 and then use this model to forecast the data from 2014 through 2016. In short, we find that a rolling forecasting scheme of an \(ARIMA(3,1,0)(3,0,0)_{12}\) model produces forecasts that most closely track the actual data.

The remainder of the paper is organized as follows. Section II describes the data source and splits the data into an estimation sub-sample and a prediction sub-sample. Section III applies the three-step Box-Jenkins methodology to the estimation sub-sample to select the model that we will use for forecasting. Section IV takes the model selected in Section III and performs two types of forecasting schemes over the prediction sub-sample. Section V concludes.

II. Downloading the Data

For this assignment we download monthly data for Total Private Residential Construction Spending from the St. Louis Federal Reserve Database (FRED). We then split the data into an estimation sample and a prediction sample. The estimation sample, also called in-sample data, is used to build our time series model. The prediction sample, also called out-of-sample data, is used to test the accuracy of the model we built from the estimation sample. The estimation sample starts in January 1993 and ends in December 2013. The prediction sample starts in January 2014 and ends January 2017.

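# Load the packages used throughout the paper
library(Quandl)    # data download
library(zoo)       # zoo time series returned by Quandl(type = 'zoo')
library(forecast)  # Arima(), forecast(), accuracy()

# Import the Data
Quandl.api_key("75HFthvbcxsrteNWupus")
data.TPRCS <- Quandl("FRED/PRRESCON", type = 'zoo')

# Split Sample - Estimation and Prediction Sample
start.month <- 1993            # January 1993
end.month <- 2013 + (11/12)    # December 2013
data.TPRCS.estimation <- window(data.TPRCS, end = end.month)
data.TPRCS.prediction <- window(data.TPRCS, start = end.month + (1/12))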

III. Box-Jenkins Methodology

The Box-Jenkins methodology is an algorithm used to find an ARIMA model that best fits a time series data set. In other words, the methodology aims to find an ARIMA model that produces data that looks most like the actual data. The Box-Jenkins methodology is composed of three steps:

  1. Identification - make sure the variables are stationary and determine if any transformations need to be made to the data
  2. Estimation - consider a selection of models based upon the results of step one and select the model based on some test statistic
  3. Checking - check the adequacy of the selected model in sample, then compare its forecasts with the actual out of sample data

The remainder of this section gives the details regarding each of these steps.

Step 1 - Identification

In this section we examine the time series plot to determine if any transformations or differences need to be made to produce an approximately weakly stationary time series. We also look at the auto-correlation function (ACF) and partial auto-correlation function (PACF) for each transformation to help us determine if any further transformations need to be made. Once all transformations are complete, we use the final ACF and PACF to determine a set of candidate models and parameters that we will use in step 2.

A. Logarithmic Transformation

We start by determining if we need to log transform our data. To do this we examine the simple time series of our original data as well as the time series of the log transformed data.

From the first time series plot we can see a slight non-linear trend upwards from 1993 to about 2007 and a slight non-linear trend from 2007 to 2010. We also see that the variance of the original time series data appears to get slightly larger over the period 1993 to 2007. The second time series plot shows us the log transformed data, which appears much more linear over each sub-period mentioned. Furthermore, the log transformed data appears to have more constant variance than the original data. Hence, from here on out we will use the log transformed Total Private Residential Construction Spending data.

B. Seasonal and Non-seasonal Differencing

For further examination of the data we look at the ACF and PACF of the log transformed data. Specifically we are looking for any evidence that might suggest the possibility of a unit root.

The slow decay of the ACF of the log transformed data suggests the presence of a unit root, and more importantly suggests that the data is non-stationary. To fix this problem we need to take the difference of the data (first, second, third, etc.). To formally test which difference we should take we could run a series of ADF, KPSS, and ERS tests like we did in Homework Assignment #4. For the sake of time and space we will simply take the first difference of the data.
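To illustrate why a slowly decaying ACF points to a unit root, the sketch below compares the ACF of a simulated random walk with the ACF of its first difference. The series `rw`, the sample size, and the seed are assumptions chosen for illustration; this is not the construction spending data.

```r
# Illustration on simulated data (not the construction spending series).
set.seed(123)
n <- 500

# A random walk has a unit root: y_t = y_{t-1} + e_t
rw <- cumsum(rnorm(n))

# Sample autocorrelations at lags 1..24 (lag 0 is dropped, it is always 1)
acf.level <- acf(rw, lag.max = 24, plot = FALSE)$acf[-1]
acf.diff  <- acf(diff(rw), lag.max = 24, plot = FALSE)$acf[-1]

# The level series decays slowly; the differenced series does not
round(acf.level[c(1, 6, 12)], 2)
round(acf.diff[c(1, 6, 12)], 2)
```

In the level series the autocorrelations stay close to one for many lags, while after first differencing they fall inside the usual significance bounds almost immediately.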

We should also point out that the time series of the log transformed data shows patterns of seasonality. We have monthly data and therefore we suspect that the seasonality can be fixed by differencing across the entire year (every twelve months).

We perform both of these differences on the log transformed data set separately and jointly as shown below.

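# Log transformed estimation sample (used from here on)
log.original <- log(data.TPRCS.estimation)
first.difference <- diff(log.original, 1)                    # non-seasonal difference
seasonal.difference <- diff(log.original, 12)                # seasonal (12-month) difference
first.diff.seasonal.diff <- diff(diff(log.original, 12), 1)  # both differences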
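Written out, with \(y_t\) denoting the log transformed series and \(B\) the backshift operator (\(B y_t = y_{t-1}\), a notation introduced here for convenience), the three series computed above are:

\[ \nabla y_t = (1 - B)\, y_t = y_t - y_{t-1} \]

\[ \nabla_{12}\, y_t = (1 - B^{12})\, y_t = y_t - y_{t-12} \]

\[ \nabla \nabla_{12}\, y_t = (1 - B)(1 - B^{12})\, y_t = y_t - y_{t-1} - y_{t-12} + y_{t-13} \]

The first removes a stochastic trend, the second removes a pattern that repeats every twelve months, and the third applies both.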

We now plot the time series of each of the above three data sets as well as the original log transformed data set. Ideally we want to find the time series that looks most like a weakly stationary time series (constant mean and an autocovariance that depends only on the lag, not on time).

Of the above time series, the first-differenced series (third row) looks the most stationary. To examine more formally which time series is best, we look at the ACF and PACF of each series.

From the second row we see that seasonal differencing alone does not fix the slow decay of the ACF, which still suggests the presence of a unit root. Therefore, seasonal differencing alone cannot produce an approximately weakly stationary time series. The third row of the above figure shows us that first differencing appears to deal with the unit root problem but also shows us that seasonality is definitely present. It is important to note at this point that the ACF and PACF of the third row show good indications that our model is a multiplicative seasonal AR model. We disregard the fourth row since the corresponding time series does not appear to have constant variance across time.

C. Possible ARIMA Models

Using the ACF and PACFs above in Part B we now provide a list of possible ARIMA models that we think will be plausible. We write the models in the form:

\[ ARIMA(p, d, q)(P, D, Q)_{s} \]

where p denotes the non-seasonal AR order, d denotes the non-seasonal differencing order, q denotes the non-seasonal MA order, P denotes the seasonal AR order, D denotes the seasonal differencing order, Q denotes the seasonal MA order, and s denotes the time span of repeating seasonal pattern.
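In backshift notation (with \(B y_t = y_{t-1}\)), and assuming the sign convention used by R's `arima` function in which AR terms are subtracted and MA terms are added, this model for a series \(y_t\) with white noise errors \(\varepsilon_t\) can be written as:

\[ \phi(B)\, \Phi(B^{s})\, (1 - B)^{d}\, (1 - B^{s})^{D}\, y_t = \theta(B)\, \Theta(B^{s})\, \varepsilon_t \]

where

\[ \phi(B) = 1 - \phi_1 B - \dots - \phi_p B^{p}, \qquad \Phi(B^{s}) = 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps} \]

\[ \theta(B) = 1 + \theta_1 B + \dots + \theta_q B^{q}, \qquad \Theta(B^{s}) = 1 + \Theta_1 B^{s} + \dots + \Theta_Q B^{Qs} \]

The word multiplicative refers to the products \(\phi(B)\Phi(B^{s})\) and \(\theta(B)\Theta(B^{s})\): the non-seasonal and seasonal polynomials are multiplied together rather than added.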

The Models we will be Estimating are:
  • Model 1: \(ARIMA(1, 1, 0)(1, 0, 0)_{12}\)
  • Model 2: \(ARIMA(1, 1, 0)(2, 0, 0)_{12}\)
  • Model 3: \(ARIMA(1, 1, 0)(3, 0, 0)_{12}\)
  • Model 4: \(ARIMA(2, 1, 0)(1, 0, 0)_{12}\)
  • Model 5: \(ARIMA(2, 1, 0)(2, 0, 0)_{12}\)
  • Model 6: \(ARIMA(2, 1, 0)(3, 0, 0)_{12}\)
  • Model 7: \(ARIMA(3, 1, 0)(1, 0, 0)_{12}\)
  • Model 8: \(ARIMA(3, 1, 0)(2, 0, 0)_{12}\)
  • Model 9: \(ARIMA(3, 1, 0)(3, 0, 0)_{12}\)
  • Model 10: \(ARIMA(4, 1, 0)(1, 0, 0)_{12}\)
  • Model 11: \(ARIMA(4, 1, 0)(2, 0, 0)_{12}\)
  • Model 12: \(ARIMA(4, 1, 0)(3, 0, 0)_{12}\)
  • Model 13: \(ARIMA(10, 1, 0)(1, 0, 0)_{12}\)
  • Model 14: \(ARIMA(10, 1, 0)(2, 0, 0)_{12}\)
  • Model 15: \(ARIMA(10, 1, 0)(3, 0, 0)_{12}\)
Justification of the Model Choices:

We justify the above model choices using the ACF and PACF of the first difference of the log transformed data. In every model we take the first difference (d = 1) to correct for the unit root issue previously discussed.

The overall pattern of the ACF and PACF both suggest that our model is a multiplicative seasonal AR model. Therefore, we set both the non-seasonal and the seasonal MA component equal to zero (q = 0 and Q = 0). Further, we see that the exponential decay of the ACF occurs every 12 lags starting at lag 12 hence we try setting the seasonal AR component to 1 (P = 1). Similarly, we see a smaller but similar pattern starting at lag 13 and lag 14, therefore we also attempt setting the seasonal AR component equal to 2 and 3 (P = 2, and P = 3).

To determine the non-seasonal AR component of the model we look at the PACF. From the PACF we see that the first and third lags are statistically significant, while the fourth lag is marginally significant and the second lag is almost significant. Hence, we try setting the non-seasonal AR component of the model equal to 1, 2, 3, and 4 (p = {1, 2, 3, 4}). We also note that many of the lags of the PACF are statistically significant, as expected in a multiplicative seasonal AR model, but to ensure that these lags are not induced by the non-seasonal component we try up to lag 10 (p = 10). We do not go further than this due to the number of parameters being estimated.
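As a sketch of how significant PACF lags can be read off numerically rather than from the plot, the code below flags the lags that fall outside the approximate 95% bounds \(\pm 1.96/\sqrt{n}\). It uses a simulated AR(2) series; the series `x`, its coefficients, and the seed are assumptions for illustration only.

```r
# Illustration on a simulated AR(2) series (not the construction spending data).
set.seed(42)
n <- 500
x <- arima.sim(model = list(ar = c(0.5, 0.3)), n = n)

# Sample partial autocorrelations at lags 1..20
pacf.values <- as.numeric(pacf(x, lag.max = 20, plot = FALSE)$acf)

# Approximate 95% bounds under the null of no partial autocorrelation
bound <- qnorm(0.975) / sqrt(n)

# Lags whose PACF falls outside the bounds are candidates for the AR order p
significant.lags <- which(abs(pacf.values) > bound)
significant.lags
```

With roughly one in twenty lags expected to exceed the bounds by chance, isolated significant lags at high orders are weaker evidence than significant lags at the first few orders.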

Note: we also try all of our models with a first seasonal difference (D = 1). We report the results of these models in the Appendix.

Step 2 - Estimation

In this section we estimate all fifteen models suggested in Part C of Step 1. To determine which model is best we look at the coefficients for statistical significance and look at the information criteria of each model. This is done in two rounds.

For the first round we use the information criteria only. We first look at the AICc statistic (AIC corrected for finite samples) and pick the two models that have the lowest AICc statistic. We then look at the BIC statistic and similarly pick the two models that have the lowest BIC statistic (these can be the same as or different from the models chosen using the AICc statistic). Once the models are picked we move to the second round.
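Both criteria trade off fit against the number of estimated parameters. The sketch below recomputes them from a log-likelihood using the standard formulas; the helper name `info.criteria` is hypothetical, the inputs are taken from the Model 4 output reported in Part C of this step, and the effective sample size n = 251 is an assumption (252 monthly observations minus one lost to first differencing).

```r
# Information criteria from a maximised log-likelihood.
#   loglik: log-likelihood of the fitted model
#   k:      number of estimated parameters (coefficients plus innovation variance)
#   n:      effective sample size
info.criteria <- function(loglik, k, n) {
  aic  <- -2 * loglik + 2 * k
  aicc <- aic + (2 * k * (k + 1)) / (n - k - 1)  # finite-sample correction
  bic  <- -2 * loglik + k * log(n)
  c(AIC = aic, AICc = aicc, BIC = bic)
}

# Model 4: three coefficients plus sigma^2 gives k = 4.
# n = 251 is an assumption (see the text above).
m4.criteria <- info.criteria(loglik = 631.17, k = 4, n = 251)
round(m4.criteria, 2)
```

The results agree with the printed Model 4 statistics up to the rounding of the log-likelihood. Because the BIC penalty per parameter, log(n), is larger than the AIC penalty of 2 for this sample size, the BIC tends to favor the smaller models.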

For the second round we look at the coefficients of each model picked by the information criteria in the first round and determine which of these coefficients are statistically significant. If any of these coefficients are not statistically different from zero then we re-run the model and force those coefficients to be equal to zero. We do this for each model picked. We then look at the coefficients of each model again and repeat the process until all of the coefficients are statistically significant. Once all of the coefficients of every model in this round are statistically significant, we compare the AICc and BIC statistics again. If any model has both the lowest AICc and the lowest BIC statistic then this will be the model we use for our forecast. Otherwise we choose the model with the lowest AICc statistic and the model with the lowest BIC statistic. In the next section we examine the in-sample adequacy of each of these models, and whichever model performs better is chosen as the forecasting model.

A. Estimating the Models

Below is the code that we use to estimate the model parameters.

m1 <- Arima(log.original, order = c(1,1,0), seasonal = list(order = c(1,0,0), period = 12))
m2 <- Arima(log.original, order = c(1,1,0), seasonal = list(order = c(2,0,0), period = 12))
m3 <- Arima(log.original, order = c(1,1,0), seasonal = list(order = c(3,0,0), period = 12))

m4 <- Arima(log.original, order = c(2,1,0), seasonal = list(order = c(1,0,0), period = 12))
m5 <- Arima(log.original, order = c(2,1,0), seasonal = list(order = c(2,0,0), period = 12))
m6 <- Arima(log.original, order = c(2,1,0), seasonal = list(order = c(3,0,0), period = 12))

m7 <- Arima(log.original, order = c(3,1,0), seasonal = list(order = c(1,0,0), period = 12))
m8 <- Arima(log.original, order = c(3,1,0), seasonal = list(order = c(2,0,0), period = 12))
m9 <- Arima(log.original, order = c(3,1,0), seasonal = list(order = c(3,0,0), period = 12))

m10 <- Arima(log.original, order = c(4,1,0), seasonal = list(order = c(1,0,0), period = 12))
m11 <- Arima(log.original, order = c(4,1,0), seasonal = list(order = c(2,0,0), period = 12))
m12 <- Arima(log.original, order = c(4,1,0), seasonal = list(order = c(3,0,0), period = 12))

m13 <- Arima(log.original, order = c(10,1,0), seasonal = list(order = c(1,0,0), period = 12))
m14 <- Arima(log.original, order = c(10,1,0), seasonal = list(order = c(2,0,0), period = 12))
m15 <- Arima(log.original, order = c(10,1,0), seasonal = list(order = c(3,0,0), period = 12))

B. First Round - Top Four Models

Below we have constructed a table that shows each model and its AICc and BIC statistics. For convenience, the table also lists the models sorted on the AICc statistic (middle columns) and sorted on the BIC statistic (right-hand columns). From the table we can see that the AICc statistic has chosen Model 9 and Model 13 and the BIC statistic has chosen Model 4 and Model 7. For the next section we will only consider these four models.

Model                         AICc       BIC | Sorted on AICc                AICc | Sorted on BIC                  BIC
m1  ARIMA(1,1,0)(1,0,0)   -1238.70  -1228.22 | m9  ARIMA(3,1,0)(3,0,0)   -1259.49 | m4  ARIMA(2,1,0)(1,0,0)   -1240.23
m2  ARIMA(1,1,0)(2,0,0)   -1236.64  -1222.70 | m13 ARIMA(10,1,0)(1,0,0)  -1258.76 | m7  ARIMA(3,1,0)(1,0,0)   -1238.94
m3  ARIMA(1,1,0)(3,0,0)   -1242.41  -1225.03 | m12 ARIMA(4,1,0)(3,0,0)   -1258.24 | m6  ARIMA(2,1,0)(3,0,0)   -1236.32
m4  ARIMA(2,1,0)(1,0,0)   -1254.17  -1240.23 | m6  ARIMA(2,1,0)(3,0,0)   -1257.13 | m5  ARIMA(2,1,0)(2,0,0)   -1236.19
m5  ARIMA(2,1,0)(2,0,0)   -1253.57  -1236.19 | m15 ARIMA(10,1,0)(3,0,0)  -1256.98 | m9  ARIMA(3,1,0)(3,0,0)   -1235.27
m6  ARIMA(2,1,0)(3,0,0)   -1257.13  -1236.32 | m14 ARIMA(10,1,0)(2,0,0)  -1256.74 | m8  ARIMA(3,1,0)(2,0,0)   -1234.77
m7  ARIMA(3,1,0)(1,0,0)   -1256.33  -1238.94 | m7  ARIMA(3,1,0)(1,0,0)   -1256.33 | m10 ARIMA(4,1,0)(1,0,0)   -1233.77
m8  ARIMA(3,1,0)(2,0,0)   -1255.58  -1234.77 | m8  ARIMA(3,1,0)(2,0,0)   -1255.58 | m12 ARIMA(4,1,0)(3,0,0)   -1230.63
m9  ARIMA(3,1,0)(3,0,0)   -1259.49  -1235.27 | m10 ARIMA(4,1,0)(1,0,0)   -1254.58 | m11 ARIMA(4,1,0)(2,0,0)   -1229.38
m10 ARIMA(4,1,0)(1,0,0)   -1254.58  -1233.77 | m4  ARIMA(2,1,0)(1,0,0)   -1254.17 | m1  ARIMA(1,1,0)(1,0,0)   -1228.22
m11 ARIMA(4,1,0)(2,0,0)   -1253.60  -1229.38 | m11 ARIMA(4,1,0)(2,0,0)   -1253.60 | m3  ARIMA(1,1,0)(3,0,0)   -1225.03
m12 ARIMA(4,1,0)(3,0,0)   -1258.24  -1230.63 | m5  ARIMA(2,1,0)(2,0,0)   -1253.57 | m2  ARIMA(1,1,0)(2,0,0)   -1222.70
m13 ARIMA(10,1,0)(1,0,0)  -1258.76  -1217.76 | m3  ARIMA(1,1,0)(3,0,0)   -1242.41 | m13 ARIMA(10,1,0)(1,0,0)  -1217.76
m14 ARIMA(10,1,0)(2,0,0)  -1256.74  -1212.44 | m1  ARIMA(1,1,0)(1,0,0)   -1238.70 | m14 ARIMA(10,1,0)(2,0,0)  -1212.44
m15 ARIMA(10,1,0)(3,0,0)  -1256.98  -1209.41 | m2  ARIMA(1,1,0)(2,0,0)   -1236.64 | m15 ARIMA(10,1,0)(3,0,0)  -1209.41

C. Second Round - Statistical Significance

Below we now look at each model’s coefficients and their standard errors. Specifically we are looking to see which coefficients are statistically significant.

m4
## Series: log.original 
## ARIMA(2,1,0)(1,0,0)[12]                    
## 
## Coefficients:
##          ar1     ar2    sar1
##       0.4131  0.2595  0.9729
## s.e.  0.0608  0.0608  0.0091
## 
## sigma^2 estimated as 0.0003367:  log likelihood=631.17
## AIC=-1254.34   AICc=-1254.17   BIC=-1240.23

Model 4 has all statistically significant coefficients. As such, we will make no adjustments to this model.

m7
## Series: log.original 
## ARIMA(3,1,0)(1,0,0)[12]                    
## 
## Coefficients:
##          ar1     ar2      ar3    sar1
##       0.4469  0.3149  -0.1304  0.9705
## s.e.  0.0625  0.0659   0.0631  0.0098
## 
## sigma^2 estimated as 0.0003338:  log likelihood=633.29
## AIC=-1256.57   AICc=-1256.33   BIC=-1238.94

The third coefficient of Model 7 is marginally significant. We will try restricting this coefficient to be zero.
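Significance can be judged from the ratio of each estimate to its standard error, which is approximately standard normal in large samples. The sketch below applies this to the Model 7 estimates printed above; the two-sided 5% level is an assumption, since the text does not state the level used.

```r
# Coefficients and standard errors copied from the Model 7 output
estimate  <- c(ar1 = 0.4469, ar2 = 0.3149, ar3 = -0.1304, sar1 = 0.9705)
std.error <- c(ar1 = 0.0625, ar2 = 0.0659, ar3 = 0.0631, sar1 = 0.0098)

# Ratio of estimate to standard error, and two-sided p-value under N(0, 1)
z.stat  <- estimate / std.error
p.value <- 2 * pnorm(-abs(z.stat))

round(cbind(estimate, std.error, z.stat, p.value), 4)
```

The ratio for `ar3` is about -2.07, only just beyond the 1.96 cut-off, which is why this coefficient is described as marginally significant. For a fitted model object the standard errors can be taken from `sqrt(diag(fit$var.coef))`, the same expression that appears in the warning printed for the restricted Model 9.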

m9
## Series: log.original 
## ARIMA(3,1,0)(3,0,0)[12]                    
## 
## Coefficients:
##          ar1     ar2      ar3    sar1    sar2     sar3
##       0.4474  0.3196  -0.1343  0.9148  0.2180  -0.1652
## s.e.  0.0652  0.0671   0.0631  0.0649  0.0867   0.0667
## 
## sigma^2 estimated as 0.0003255:  log likelihood=636.97
## AIC=-1259.95   AICc=-1259.49   BIC=-1235.27

The third coefficient of Model 9 is also marginally significant. We will try restricting this coefficient to be zero.

m13
## Series: log.original 
## ARIMA(10,1,0)(1,0,0)[12]                    
## 
## Coefficients:
##          ar1     ar2      ar3      ar4     ar5      ar6      ar7     ar8
##       0.4618  0.2921  -0.1153  -0.0259  0.0566  -0.0073  -0.1721  0.1855
## s.e.  0.0632  0.0690   0.0712   0.0701  0.0702   0.0692   0.0705  0.0706
##           ar9     ar10    sar1
##       -0.1314  -0.0557  0.9763
## s.e.   0.0693   0.0645  0.0084
## 
## sigma^2 estimated as 0.0003176:  log likelihood=642.03
## AIC=-1260.07   AICc=-1258.76   BIC=-1217.76

The third, fourth, fifth, sixth, and tenth coefficients are not statistically significant. We will try restricting these coefficients to be zero.

Below we give the code used to estimate our restricted models.

m7.restricted <- Arima(log.original, order = c(3,1,0), seasonal = list(order = c(1,0,0), period = 12), 
                       fixed = c(NA, NA, 0, NA))

m9.restricted <- Arima(log.original, order = c(3,1,0), seasonal = list(order = c(3,0,0), period = 12), 
                       fixed = c(NA, NA, 0, NA, NA, NA))


m13.restricted <- Arima(log.original, order = c(10,1,0), seasonal = list(order = c(1,0,0), period = 12), 
                        fixed = c(NA, NA, 0, 0, 0, 0, NA, NA, NA, 0, NA))
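In the `fixed` vector, `NA` marks a coefficient that is estimated and a number holds the coefficient at that value; the entries follow the order in which the coefficients are printed. The sketch below shows the mechanism with base R's `arima` on a simulated series. The series, the seed, and the choice of `stats::arima` instead of `forecast::Arima` (which, to our understanding, wraps it) are assumptions made to keep the example self-contained.

```r
# Illustration on simulated data using stats::arima.
set.seed(7)
x <- arima.sim(model = list(ar = c(0.5, 0.2)), n = 300)

# Restricted AR(3): ar1 and ar2 are estimated (NA), ar3 is held at zero.
# transform.pars = FALSE is set explicitly because arima() does not
# transform the parameters when AR terms are fixed.
fit.fixed <- arima(x, order = c(3, 0, 0), include.mean = FALSE,
                   fixed = c(NA, NA, 0), transform.pars = FALSE)
coef(fit.fixed)

# Holding ar3 at zero gives the same model as an AR(2),
# so the two log-likelihoods should agree up to optimizer tolerance
fit.ar2 <- arima(x, order = c(2, 0, 0), include.mean = FALSE)
c(restricted = fit.fixed$loglik, ar2 = fit.ar2$loglik)
```

The same logic explains why a model with its highest-order coefficient restricted to zero reproduces the fit of the next smaller model.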

We again check whether all of the coefficients of the restricted models are statistically significant.

m7.restricted 
## Series: log.original 
## ARIMA(3,1,0)(1,0,0)[12]                    
## 
## Coefficients:
##          ar1     ar2  ar3    sar1
##       0.4131  0.2595    0  0.9729
## s.e.  0.0608  0.0608    0  0.0091
## 
## sigma^2 estimated as 0.000338:  log likelihood=631.17
## AIC=-1254.34   AICc=-1254.17   BIC=-1240.23

The restricted Model 7 has all statistically significant coefficients; with the third AR coefficient fixed at zero, its estimates coincide with those of Model 4. As such, we will make no adjustments to this model.

m9.restricted
## Series: log.original 
## ARIMA(3,1,0)(3,0,0)[12]                    
## 
## Coefficients:
## Warning in sqrt(diag(x$var.coef)): NaNs produced
##          ar1     ar2  ar3    sar1    sar2     sar3
##       0.4111  0.2632    0  0.9153  0.2163  -0.1611
## s.e.  0.0479     NaN    0  0.0567     NaN   0.0538
## 
## sigma^2 estimated as 0.00033:  log likelihood=634.74
## AIC=-1257.47   AICc=-1257.13   BIC=-1236.32

For the restricted Model 9, the standard errors of some coefficients could not be calculated (they are reported as NaN). Therefore, we will not consider this model further.

m13.restricted
## Series: log.original 
## ARIMA(10,1,0)(1,0,0)[12]                    
## 
## Coefficients:
##          ar1     ar2  ar3  ar4  ar5  ar6      ar7     ar8      ar9  ar10
##       0.4447  0.2377    0    0    0    0  -0.1620  0.1649  -0.1615     0
## s.e.  0.0599  0.0601    0    0    0    0   0.0617  0.0654   0.0623     0
##         sar1
##       0.9773
## s.e.  0.0080
## 
## sigma^2 estimated as 0.0003212:  log likelihood=640.23
## AIC=-1266.46   AICc=-1266   BIC=-1241.78

The restricted Model 13 has all statistically significant coefficients. As such, we will make no adjustments to this model.

We now choose between the unrestricted and restricted models using the AICc and BIC statistics.

Model                                                   AICc       BIC
m4              ARIMA(2,1,0)(1,0,0)                 -1254.17  -1240.23
m7              ARIMA(3,1,0)(1,0,0)                 -1256.33  -1238.94
m9              ARIMA(3,1,0)(3,0,0)                 -1259.49  -1235.27
m13             ARIMA(10,1,0)(1,0,0)                -1258.76  -1217.76
m7.restricted   ARIMA(3,1,0)(1,0,0) - Restricted    -1254.09  -1236.71
m13.restricted  ARIMA(10,1,0)(1,0,0) - Restricted   -1265.15  -1224.16

Although the restricted Model 13 has the lowest AICc statistic, the BIC statistic of this model is one of the highest out of all the models; thus, we will not consider this model, especially given the number of parameters that need to be estimated for it. The next lowest AICc statistic is given by Model 9 and the lowest BIC statistic is given by Model 4.

Step 3 - Model Adequacy

In this section we evaluate the model adequacy of both models chosen from Part C in Step 2. We perform an in-sample adequacy check by looking at characteristics of the residuals of each model. Whichever model performs better in the in-sample adequacy check will be the model we use in the out-of-sample adequacy test; that is, our forecasting test.

Specifically, we examine the residuals of each model, the ACF of the residuals, and the Q-statistics of each model. The residuals of the model need to look like white noise. We can examine this visually by looking at the time series of the residuals of the model. Formally, we can check for white noise by first looking at the ACF of the residuals; if the residuals are white noise then we should see no statistically significant lags in the ACF (not autocorrelated). Also formally, we can examine the Q-statistics of each model, where if the residuals are white noise then the Q-statistics should not be statistically significant.

The code that produces this analysis is given below.

tsdiag(m4, gof.lag = max.lag)
tsdiag(m9, gof.lag = max.lag)
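The Q-statistics are taken here to be Ljung-Box statistics, which is an assumption about the diagnostics used. For a series of length \(n\) with sample autocorrelations \(r_k\), the statistic at lag \(m\) is \(Q(m) = n(n+2)\sum_{k=1}^{m} r_k^2/(n-k)\), and a small p-value indicates remaining autocorrelation. The sketch below computes it by hand on simulated series (the helper `ljung.box.q`, the series, and the seed are hypothetical) and compares the result with base R's `Box.test`.

```r
# Ljung-Box Q-statistic computed by hand.
ljung.box.q <- function(x, m) {
  n <- length(x)
  r <- acf(x, lag.max = m, plot = FALSE)$acf[-1]   # r_1, ..., r_m
  n * (n + 2) * sum(r^2 / (n - seq_len(m)))
}

set.seed(2024)
white.noise <- rnorm(300)                                      # no autocorrelation
ar.series   <- as.numeric(arima.sim(list(ar = 0.8), n = 300))  # strong autocorrelation

q.wn <- ljung.box.q(white.noise, m = 12)
q.ar <- ljung.box.q(ar.series, m = 12)

# p-values from the chi-squared distribution with m degrees of freedom
p.wn <- pchisq(q.wn, df = 12, lower.tail = FALSE)
p.ar <- pchisq(q.ar, df = 12, lower.tail = FALSE)

# The same statistic from base R
box.ar <- Box.test(ar.series, lag = 12, type = "Ljung-Box")
c(by.hand = q.ar, Box.test = unname(box.ar$statistic))
```

When the test is applied to the residuals of a fitted ARMA model, the degrees of freedom are commonly reduced by the number of estimated ARMA coefficients (the `fitdf` argument of `Box.test`).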

We first look at Model 4 - ARIMA(2,1,0)(1,0,0).

Visually the time series of the residuals looks like white noise and there appear to be no outliers. Similarly, the ACF of the residuals appears to have no significant lags. However, many of the Q-statistics are statistically significant. Thus, we most likely will not use this model unless Model 9 produces worse results.

We now look at Model 9 - ARIMA(3,1,0)(3,0,0).

Visually the time series of the residuals looks like white noise and there appear to be no outliers. Similarly, the ACF of the residuals appears to have no significant lags. Unlike Model 4, the Q-statistics have p-values that are much larger, both relative to the critical value and compared with those of Model 4.

Therefore, we will use Model 9 - \(ARIMA(3,1,0)(3,0,0)\) - as our forecasting model.
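Substituting the Model 9 estimates from Part C of Step 2, and assuming R's sign convention in which AR coefficients are subtracted, the fitted model for the log transformed series \(y_t\) can be written with the backshift operator \(B\) (\(B y_t = y_{t-1}\)) as:

\[ (1 - 0.4474B - 0.3196B^{2} + 0.1343B^{3})\,(1 - 0.9148B^{12} - 0.2180B^{24} + 0.1652B^{36})\,(1 - B)\, y_t = \varepsilon_t \]

where \(\varepsilon_t\) is white noise with estimated variance \(\hat{\sigma}^2 = 0.0003255\).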

IV. Forecasting - Out of Sample Adequacy

Using Model 9 we will run two forecasts to check how accurate our model is:

  1. Multistep Forecast Model
  2. One Month ahead Forecasts

Both forecasts cover the period between January 2014 and December 2016.

Forecast 1 - Multistep Forecast

In a multi-step forecast the model parameters for the \(ARIMA(3,1,0)(3,0,0)\) are estimated once over the estimation sub-sample. The parameters estimated for the estimation sub-sample are then used across the entire prediction sample. The code used to produce the multi-step forecast is given below. We also show a plot of the multi-step forecast, along with the actual values over the prediction period.

multistep.forecast <- forecast(m9, h = length(data.TPRCS.prediction))

As we can see, the multi-step forecast follows the same shape as the actual data; however, because the parameters are estimated only once and never updated, the model, which was inaccurate towards the beginning of the prediction sample, remained inaccurate for the remaining periods. Therefore, we also perform a rolling forecast in the following section.

Forecast 2 - Rolling Forecast

In a rolling forecast the estimation sample moves forward as time progresses. Each updated estimation sample contains the same number of observations as the original estimation sample, and each update re-estimates the parameters of the \(ARIMA(3,1,0)(3,0,0)\) model to forecast Total Private Residential Construction Spending one month ahead. The model is re-estimated in this way until we have forecasted the entirety of the prediction sample. The code used to produce the rolling forecast is given below. We also show a plot of the rolling forecast, along with the actual values over the prediction period.

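# Rolling one-month-ahead forecasts of the log series.
#   DATA:        full series in levels (estimation + prediction sample)
#   first.month: start of the initial estimation window
#   last.month:  end of the initial estimation window
#   n.steps:     number of one-step forecasts to produce
Rolling.Forecast <- function(DATA, first.month, last.month,
                             n.steps = length(data.TPRCS.prediction)){
  rolling.forecast <- zoo()
  log.full.data <- log(DATA)
  
  for(i in seq_len(n.steps)) {
    # Shift the fixed-length estimation window forward by (i - 1) months
    temp <- window(log.full.data, start = first.month + (i-1)/12, end = last.month + (i-1) / 12)
    # Re-estimate the model on the current window and forecast one month ahead
    model.update <- Arima(temp, order = c(3,1,0), seasonal = list(order = c(3,0,0), period = 12))
    rolling.forecast <- c(rolling.forecast, forecast(model.update, h = 1)$mean)
  }
  
  rolling.forecast <- as.ts(rolling.forecast)
  
  return(rolling.forecast)
}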

rolling.forecast <- Rolling.Forecast(data.TPRCS, start.month, end.month)

As we can see from the above graph the rolling forecast appears to almost perfectly forecast the actual data over the prediction sample.
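The difference between the two schemes can be seen in a small self-contained sketch. It uses base R's `arima` and `predict` on a simulated ARIMA(1,1,0) series instead of the `forecast` package and the actual data; the series, sample sizes, seed, and the `rmse` helper are all assumptions for illustration.

```r
# Illustration on a simulated ARIMA(1,1,0) series using base R only.
set.seed(1)
n.train <- 120
n.test  <- 12
y <- cumsum(as.numeric(arima.sim(model = list(ar = 0.5), n = n.train + n.test)))

actual <- y[(n.train + 1):(n.train + n.test)]

# Multi-step: estimate once, then forecast all n.test periods from one origin
fit   <- arima(y[1:n.train], order = c(1, 1, 0))
multi <- as.numeric(predict(fit, n.ahead = n.test)$pred)

# Rolling: slide a fixed-length window forward, re-estimate, forecast one step
rolling <- numeric(n.test)
for (i in seq_len(n.test)) {
  window.i   <- y[i:(n.train + i - 1)]
  fit.i      <- arima(window.i, order = c(1, 1, 0))
  rolling[i] <- as.numeric(predict(fit.i, n.ahead = 1)$pred)
}

rmse <- function(forecast, actual) sqrt(mean((actual - forecast)^2))
c(multi.step = rmse(multi, actual), rolling = rmse(rolling, actual))
```

The rolling scheme conditions every forecast on the most recent observation, while the multi-step scheme must forecast up to `n.test` periods beyond the last value it has seen, so part of the rolling scheme's advantage comes from the shorter horizon rather than from the re-estimated parameters.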

Comparing the Forecasts

We now compare the accuracy of the two forecasts by first graphically looking at the time series of both forecasts with respect to the actual data values we saw during the prediction sample. We then quantify the accuracy of the two models using common forecasting error statistics. The graph of both forecasts is given below.

The above graph clearly shows that the rolling forecasting method performs better in the out-of-sample test than does the multi-step forecasting method. To quantify how much better the rolling method is we use the accuracy command in R. The code and results are shown below and comments follow.

accuracy(f = multistep.forecast, x = log(data.TPRCS.prediction))
accuracy(f = rolling.forecast, x = log(data.TPRCS.prediction))

In the table below ME stands for Mean Error, RMSE stands for Root Mean Squared Error, MAE stands for Mean Absolute Error, MPE stands for Mean Percentage Error, and lastly MAPE stands for Mean Absolute Percentage Error. All of the above measures quantify the out-of-sample accuracy of each forecast.

                                 ME       RMSE        MAE         MPE       MAPE
Multi-Step Forecast      -0.2069935  0.2398379  0.2069935  -1.9723477  1.9723477
Rolling Scheme Forecast  -0.0021240  0.0228476  0.0175377  -0.0205714  0.1675144
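The measures in the table can be reproduced by hand from the forecast errors. The sketch below assumes the convention that an error is the actual value minus the forecast; the helper `error.measures` and the toy numbers are hypothetical and serve only to show the formulas.

```r
# Forecast error measures, with errors defined as actual minus forecast
# (the convention assumed here).
error.measures <- function(actual, forecast) {
  e  <- actual - forecast
  pe <- 100 * e / actual            # percentage errors
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(pe),
    MAPE = mean(abs(pe)))
}

# Toy numbers chosen for illustration only
actual   <- c(10, 11, 12)
forecast <- c(10.5, 11, 11)
toy.measures <- error.measures(actual, forecast)
round(toy.measures, 4)
```

When every error has the same sign, the absolute value of ME equals MAE. This is the case in the multi-step row of the table, which indicates that the multi-step forecast lies on the same side of the actual series in every month of the prediction sample.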

We see that the rolling scheme forecast of the \(ARIMA(3,1,0)(3,0,0)\) model far outperforms the multi-step forecasting scheme of the model, since all of its error measures are much smaller and significantly closer to zero than those of the multi-step forecast.

V. Conclusion

In this exercise we have performed a time-series analysis on the Total Private Residential Construction Spending (TPRCS) of the United States. We found that an \(ARIMA(3,1,0)(3,0,0)\) model best produces data that looks like the actual data we see in the TPRCS of the United States out of all of the models tested. We also find that using the rolling forecasting scheme produces much more accurate forecasts of TPRCS than the multi-step forecasting scheme. In short, we should use an \(ARIMA(3,1,0)(3,0,0)\) model to both model and forecast the Total Private Residential Construction Spending (TPRCS) of the United States.