The goal of this assignment is to construct an appropriate time series model of quarterly Real Personal Consumption Expenditures. This data is readily available through the Federal Reserve Economic Data (FRED) database of the Federal Reserve Bank of St. Louis.
The rest of the document is organized as follows. Section I describes how we import the data, and how and why we transform it. Section II derives the ACF and PACF in order to determine which models are appropriate to test. Section III estimates the models identified as plausible in Section II. Section IV tests the adequacy of the models. Section V concludes.
In this section we show how we import the data and how we correct it for possible violations of the weak stationarity assumption.
I downloaded quarterly Real Personal Consumption Expenditures data from Quandl, using the dataset code FRED/PCECC96. I imported this data into R using the following set of commands:
library(Quandl)
library(forecast)
library(xts)  # needed for the xts() and diff() calls below
Quandl.api_key("75HFthvbcxsrteNWupus")
raw.data <- Quandl("FRED/PCECC96")  # quarterly Real PCE from FRED via Quandl
A plot of the Real Personal Consumption Expenditures time series shows a clear upward trend. This implies that the time series is not weakly stationary, which is a necessary condition for most of the models we will be testing. Hence we will need to transform the data before we proceed any further.
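For reference, a minimal sketch of how such a plot can be produced in base R, assuming raw.data has the DATE and VALUE columns used in the import code above:

plot(raw.data$DATE, raw.data$VALUE, type = "l",
     xlab = "Date", ylab = "Real PCE",
     main = "Quarterly Real Personal Consumption Expenditures")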
We now transform the raw data by first taking the log of the consumption values and then taking the first difference from quarter to quarter. Ideally, this allows us to treat our time series as weakly stationary. Mathematically: \[y_t = \Delta \log(c_t)=\log(c_t)-\log(c_{t-1}) \] We reproduce the above equation in R using the following code:
log.data <- xts(x = log(raw.data$VALUE), order.by = raw.data$DATE)  # log of consumption, indexed by date
diff.log.data <- diff(log.data, lag = 1)  # quarter-over-quarter first difference of the log
A plot of the series after the log-difference transformation suggests that the mean of each conditional distribution is at least approximately constant over time. In other words, the transformation allows us to assume that the time series is approximately weakly stationary.
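As an informal cross-check (a sketch, not part of the original analysis), we can compute a rolling mean of the transformed series; if it hovers near a constant, the constant-mean assumption is plausible. The 20-quarter window below is an arbitrary choice:

library(zoo)  # for rollapply(); loaded automatically with xts, listed here for clarity
roll.mean <- rollapply(diff.log.data, width = 20,
                       FUN = function(x) mean(x, na.rm = TRUE), align = "right")
plot(roll.mean, main = "20-quarter rolling mean of the transformed series")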
In this section we show how we derive the ACF and PACF, and then use these functions to determine which models could plausibly have generated the data.
Now that we have our time series, we want to determine which type of model to use. To do so we examine the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the transformed data. Theoretically, there are three scenarios we can distinguish by looking at the ACF and the PACF:
If the true model that produced the data is an auto-regressive model of order p (AR(p)), then the PACF should cut off (equal zero) after lag p.
If the true model that produced the data is a moving average model of order q (MA(q)), then the ACF should cut off (equal zero) after lag q.
If the true model that produced the data is an auto-regressive moving average (ARMA) model, then both the ACF and the PACF will tail off and oscillate around zero rather than cut off sharply, so neither is decisive on its own; see the simulation sketch after this list.
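To make these cut-off patterns concrete, one can simulate from known models and inspect the sample ACF and PACF; the coefficient values below are arbitrary illustrative choices:

set.seed(1)
sim.ar <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)  # AR(2): PACF should cut off after lag 2
sim.ma <- arima.sim(model = list(ma = c(0.4, 0.3)), n = 500)  # MA(2): ACF should cut off after lag 2
Acf(sim.ar, type = 'partial', lag.max = 20)
Acf(sim.ma, type = 'correlation', lag.max = 20)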
We now estimate the ACF and the PACF from the data, up through the 20th lag (5 years of quarterly observations). The code below shows how we produced these functions, and the graphs follow:
ACF <- Acf(diff.log.data, type = 'correlation', lag.max = 20)  # sample autocorrelations
PACF <- Acf(diff.log.data, type = 'partial', lag.max = 20)     # sample partial autocorrelations
The second lag in both figures is statistically different from zero. All other lags (with the exception of a slightly significant 4th lag in the PACF) are not statistically significant. We also note that neither the ACF nor the PACF appears to converge toward zero. This leads us to suspect that the model that produced the data is an ARMA model.
Nevertheless, we will go ahead and estimate the following five model types that seem reasonable given the ACF and PACF results: AR(2), MA(2), ARMA(1,2), ARMA(2,1), and ARMA(2,2).
We now proceed to the estimation of each candidate model.
We use the following R code to estimate each of the models described above.
m1 <- Arima(diff.log.data, order = c(2,0,0))  # AR(2)
m2 <- Arima(diff.log.data, order = c(0,0,2))  # MA(2)
m3 <- Arima(diff.log.data, order = c(1,0,2))  # ARMA(1,2)
m4 <- Arima(diff.log.data, order = c(2,0,1))  # ARMA(2,1)
m5 <- Arima(diff.log.data, order = c(2,0,2))  # ARMA(2,2)
The AR(2) model is given by:
\[y_t = 0.008 + 0.061\,y_{t-1} + 0.319\,y_{t-2} + \varepsilon_t\]
The MA(2) model is given by:
\[y_t = 0.008 + \varepsilon_t + 0.028\,\varepsilon_{t-1} + 0.365\,\varepsilon_{t-2}\]
The ARMA(1,2) model is given by:
\[y_t = 0.008 + 0.186\,y_{t-1} + \varepsilon_t + 0.131\,\varepsilon_{t-1} + 0.362\,\varepsilon_{t-2}\]
The ARMA(2,1) model is given by:
\[y_t = 0.008 + 0.085\,y_{t-1} + 0.316\,y_{t-2} + \varepsilon_t - 0.027\,\varepsilon_{t-1}\]
The ARMA(2,2) model is given by:
\[y_t = 0.008 + 0.193\,y_{t-1} - 0.031\,y_{t-2} + \varepsilon_t - 0.138\,\varepsilon_{t-1} + 0.389\,\varepsilon_{t-2}\]
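The coefficients reported in the equations above can be read directly off the fitted objects; for example, for the MA(2) model:

coef(m2)                 # constant and MA coefficients of the MA(2) model
sqrt(diag(m2$var.coef))  # corresponding standard errors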
We will now test if the above models are adequate and plausible.
Stationarity implies that the mean of a variable and the covariance between the variable and its own lags are time invariant, that is, constant over time. In applications, stationarity allows us to make inferences about future observations. Realistically, time series data are generally not independent (independence being a usual assumption in most statistical applications); stationarity is thus a means by which we can model the dependence between a variable and its own lags. Indeed, stationarity is useful because many statistical properties that hold for independent random variables carry over to dependent time series data when the stationarity assumption holds.
Invertibility implies that more recent observations carry more weight in determining today's observation than observations more distant in the past. Thus, for a model to be invertible, we need more recent information to be more relevant than past information.
We now note that pure AR models are always invertible, and that pure MA models are always stationary. Hence, for an AR model we must check stationarity, and for an MA model we must check invertibility; for an ARMA model we need to check both conditions. Mathematically, any stationary AR process can be expressed as an infinite-order MA process, and any invertible MA process can be expressed as an infinite-order AR process.
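The AR-to-MA duality can be illustrated with the base R function ARMAtoMA(), which returns the leading coefficients of the infinite-order MA representation; as a sketch, using the AR(2) coefficients estimated above:

ARMAtoMA(ar = c(0.061, 0.319), lag.max = 8)  # first 8 psi-weights of the MA(infinity) form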
More empirically, an AR model is stationary if the inverses of the roots of its AR characteristic polynomial lie within the unit circle, and an MA model is invertible if the inverses of the roots of its MA characteristic polynomial lie within the unit circle. An ARMA model satisfies both conditions if and only if both sets of inverse roots lie within the unit circle. To examine these roots we use the code below. (Note: for convenience, we plot the AR and MA inverse roots on the same plot, but it is important to remember that these are separate conditions.)
plot(m1)  # inverse AR roots: checks stationarity of the AR(2) model
plot(m2)  # inverse MA roots: checks invertibility of the MA(2) model
plot(m3)  # checks both stationarity and invertibility of the ARMA(1,2) model
plot(m4)  # checks both stationarity and invertibility of the ARMA(2,1) model
plot(m5)  # checks both stationarity and invertibility of the ARMA(2,2) model
Inverse Roots of AR(2) and MA(2) Models
Inverse Roots of ARMA(1,2) Model
Inverse Roots of ARMA(2,1) Model
Inverse Roots of ARMA(2,2) Model
The above plots show that all five model specifications are stationary and invertible; each model can therefore be considered valid.
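As a numerical cross-check (a sketch using base R's polyroot(), assuming the standard 'ar1'/'ar2' coefficient names), the inverse roots of, say, the AR(2) model can be computed directly:

ar.coefs <- coef(m1)[c("ar1", "ar2")]
inv.roots <- 1 / polyroot(c(1, -ar.coefs))  # roots of 1 - phi1*z - phi2*z^2, inverted
Mod(inv.roots)  # all moduli below one implies the AR(2) model is stationary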
Since all of the models are stationary and invertible, we now check the residuals of each model to determine whether they will be good forecasting models. A good model has uncorrelated residuals, and an even better model has uncorrelated residuals with zero mean. If the residuals are correlated, there is still information in them that could be used in our forecasting; if the residuals do not have zero mean, our forecasts are biased.
We use the following code to evaluate the residuals of each model:
tsdiag(m1, gof.lag = 10)  # standardized residuals, residual ACF, and Ljung-Box p-values
tsdiag(m2, gof.lag = 10)
tsdiag(m3, gof.lag = 10)
tsdiag(m4, gof.lag = 10)
tsdiag(m5, gof.lag = 10)
The code above produces the following graphs:
Residuals of AR(2) Model
Residuals of MA(2) Model
Residuals of ARMA(1,2) Model
Residuals of ARMA(2,1) Model
Residuals of ARMA(2,2) Model
The first graph in each of the panels above is a simple plot of the standardized residuals. This plot is most useful for showing whether the mean of the residuals is approximately zero (i.e., no bias in forecasting). From the plots, each model appears to have zero-mean residuals, but we can be more mathematically precise and calculate the mean of each model's residuals directly:
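A sketch of one way to compute these means, together with a simple t-test of the zero-mean hypothesis:

res <- lapply(list(m1, m2, m3, m4, m5), function(m) na.omit(residuals(m)))
sapply(res, mean)                           # average residual of each model
sapply(res, function(r) t.test(r)$p.value)  # p-values for H0: mean residual = 0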
Hence, all five model specifications have an average residual that is approximately zero (statistically no different from zero).
Since our models all appear to be valid under the above tests, we now determine which is the best model using information criteria. In general, the smaller these statistics, the better the model (except for the log-likelihood, where larger is better). We report the AIC, AICc, BIC, and log-likelihood statistics in the table below:
##                  AIC      AICc       BIC  log.likelihood
## AR(2)      -1917.343 -1917.197 -1902.804        962.6714
## MA(2)      -1920.922 -1920.777 -1906.383        964.4611
## ARMA(1,2)  -1920.413 -1920.194 -1902.239        965.2064
## ARMA(2,1)  -1915.381 -1915.162 -1897.207        962.6906
## ARMA(2,2)  -1918.447 -1918.139 -1896.638        965.2234
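A table like the one above can be assembled directly from the fitted objects, which store these statistics as list components (a sketch):

models <- list("AR(2)" = m1, "MA(2)" = m2, "ARMA(1,2)" = m3,
               "ARMA(2,1)" = m4, "ARMA(2,2)" = m5)
t(sapply(models, function(m) c(AIC = m$aic, AICc = m$aicc,
                               BIC = m$bic, log.likelihood = m$loglik)))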
According to the above statistics, the best model appears to be the MA(2). This is quite surprising, as our original analysis suggested that because the ACF and PACF appeared to be oscillating, the data might have been produced by an ARMA model. Nonetheless, the MA(2) model prevails.
In analyzing the time series of quarterly Real Personal Consumption Expenditures, I have found that an MA(2) model is the most plausible representation of the process that produced the observed data. I have also tested AR(2), ARMA(1,2), ARMA(2,1), and ARMA(2,2) models. The MA(2) model is stationary and invertible, has statistically significant coefficients, and has uncorrelated, zero-mean residuals.