Basic Autoregressive (AR) & Moving-Average (MA) Model Development for Real Personal Consumption Expenditures

Introduction

For this simple model development I will construct a time series of the log change in Real Personal Consumption Expenditures, \(y_t=\Delta \log (c_t) = \log (c_t) - \log (c_{t-1})\), where \(c_t\) is the original quarterly Real Personal Consumption Expenditures series.

Data

I will import the quarterly Real Personal Consumption Expenditures time series from the Quandl website.
Before loading the data I need to take care of a few housekeeping issues: I will load the packages that will be used during this model development.

require(forecast)
require(Quandl)
require(ggplot2)
require(dygraphs)

Next I will load the data.

Quandl.api_key('Ltw-PAye5rkz6MwzLNx-')
rPCECC96 <- Quandl("FRED/PCECC96", type="zoo")

A quick inspection shows that this data on Real Personal Consumption Expenditures is at quarterly frequency and that the data is available for the period 1947 Q1 to 2015 Q4.

str(rPCECC96)

## 'zooreg' series from 1947 Q1 to 2015 Q4
##   Data: num [1:276] 1199 1219 1223 1224 1230 ...
##   Index: Class 'yearqtr'  num [1:276] 1947 1947 1948 1948 1948 ...
##   Frequency: 4

A plot of the original time series data will allow us to quickly determine if any transformations of the data are necessary.

We can see from the above plot that the original data follows a roughly exponential trend and is non-stationary. Applying a log change transformation removes this exponential trend, after which we can check whether the transformed series plausibly satisfies the conditions for weak stationarity.

The following plot shows the time series of the log change in Real Personal Consumption Expenditures, \(y_t=\Delta \log (c_t) = \log (c_t) - \log (c_{t-1})\).
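The code that builds the transformed series is not shown in this write-up, so the following is only a minimal sketch of the log-change calculation. It uses the first five (rounded) values printed by str() above as a toy vector. The names toy_pce and toy_y are hypothetical, and the commented last line is an assumption about how diff_log_rPCECC96, the object used in the models below, may have been created.

```r
# Toy vector: the first five rounded observations shown in the str() output above
toy_pce <- c(1199, 1219, 1223, 1224, 1230)

# Log change: y_t = log(c_t) - log(c_{t-1}); diff() drops the first observation
toy_y <- diff(log(toy_pce))
print(round(toy_y, 4))

# Assumed equivalent for the full zoo series (not confirmed by the original text):
# diff_log_rPCECC96 <- diff(log(rPCECC96))
```

Because one observation is lost to differencing, a series of 276 quarters would yield 275 log changes.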

From the above plot we can see that the exponential trend is gone and that the log change transformation appears to have rendered the series time invariant with respect to the mean and variance. In addition, the covariance between \(y_t\) and \(y_{t-\tau}\) appears to depend only on the lag \(\tau\), where \(\tau\) is a finite integer. The transformed series can therefore be treated as weakly stationary, with \(E(y_t)=\mu\) and \(\gamma_\tau=cov(y_t, y_{t-\tau})\).

Model Identification, Estimation & Checking for Adequacy

The next step will be to identify an appropriate model by looking at a plot of the autocorrelation function (ACF) and partial autocorrelation function (PACF) for \(y_t\).
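The ACF and PACF values behind such plots can be computed as in the sketch below. It runs on a simulated AR(2) series, not on the PCE data, so that it is self-contained. The name sim_y and the coefficients 0.3 and 0.2 are arbitrary assumptions chosen for illustration.

```r
set.seed(123)  # reproducible simulated data; this is NOT the PCE series
sim_y <- arima.sim(model = list(ar = c(0.3, 0.2)), n = 275)

# plot = FALSE returns the values instead of drawing the correlograms
acf_vals  <- acf(sim_y,  lag.max = 12, plot = FALSE)  # lags 0..12
pacf_vals <- pacf(sim_y, lag.max = 12, plot = FALSE)  # lags 1..12

# Approximate 95% bounds under the white-noise null, as drawn on the plots
ci <- qnorm(0.975) / sqrt(length(sim_y))

# Lags whose partial autocorrelation falls outside the bounds
print(which(abs(as.numeric(pacf_vals$acf)) > ci))
```

As a rule of thumb, a PACF that cuts off after lag p points towards an AR(p) model, while an ACF that cuts off after lag q points towards an MA(q) model. With the real data, sim_y would be replaced by diff_log_rPCECC96.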

Autoregressive (AR) Model Exploration

For my AR(p) model, according to the ACF and PACF plots I will look at an AR(2) as my primary candidate and then compare the AR(2) with the AR(1) and AR(3) models.

ar2.model <- arima(diff_log_rPCECC96, order = c(2,0,0))
ar1.model <- arima(diff_log_rPCECC96, order = c(1,0,0))
ar3.model <- arima(diff_log_rPCECC96, order = c(3,0,0))

# Testing AR(2) Model
tsdiag(ar2.model, gof.lag = 12)

# Testing AR(1) Model
tsdiag(ar1.model, gof.lag = 12)

# Testing AR(3) Model
tsdiag(ar3.model, gof.lag = 12)

Based on the diagnostic information above, it appears that the AR(2) model may be the best fitting AR model. The AR(1) residuals show some remaining autocorrelation and, more importantly, the p-values of the Ljung-Box statistics are all small, indicating that some pattern is left in the residuals.

The AR(3) diagnostics are similar to those of the AR(2).
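The Ljung-Box check can also be run directly with Box.test(). As far as I am aware, the p-values drawn by tsdiag() are not adjusted for the number of estimated coefficients, so the sketch below passes fitdf explicitly. It again uses a simulated series (sim_y, an assumption for illustration) in place of the PCE data.

```r
set.seed(123)  # simulated AR(2) data standing in for diff_log_rPCECC96
sim_y <- arima.sim(model = list(ar = c(0.3, 0.2)), n = 275)

fit_ar2 <- arima(sim_y, order = c(2, 0, 0))

# Ljung-Box test on the residuals at lag 12; fitdf = 2 removes one degree
# of freedom for each estimated AR coefficient
lb <- Box.test(residuals(fit_ar2), lag = 12, type = "Ljung-Box", fitdf = 2)
print(lb)
```

A large p-value here means the test finds no evidence of remaining autocorrelation in the residuals up to lag 12.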

I will look at the Akaike Information Criterion (AIC) for the three models to determine which is preferable; a lower AIC indicates a better trade-off between fit and complexity.

aicCompare <- c(ar1=ar1.model$aic, ar2=ar2.model$aic, ar3=ar3.model$aic)
aicCompare

##       ar1       ar2       ar3 
## -1858.642 -1886.158 -1884.233

As we can see, the Akaike Information Criterion favors the AR(2) model among the three models presented in this basic autoregressive (AR) analysis.

Moving Average (MA) Model Exploration

The ACF and PACF plots are equally ambiguous about the order of a moving-average model, so I will again take the MA(2) as my primary model of interest and then compare the results to the MA(1) and MA(3) models.

ma2.model <- arima(diff_log_rPCECC96, order = c(0,0,2))
ma1.model <- arima(diff_log_rPCECC96, order = c(0,0,1))
ma3.model <- arima(diff_log_rPCECC96, order = c(0,0,3))

# Testing MA(2) Model
tsdiag(ma2.model, gof.lag = 12)

# Testing MA(1) Model
tsdiag(ma1.model, gof.lag = 12)

# Testing MA(3) Model
tsdiag(ma3.model, gof.lag = 12)

Based on the diagnostic information above, it appears that the MA(2) model may be the best fitting MA model. The MA(1) residuals show some remaining autocorrelation and, more importantly, the p-values of the Ljung-Box statistics are all small, indicating that some pattern is left in the residuals.

The MA(3) diagnostics are similar to those of the MA(2).

I will look at the Akaike Information Criterion for the three models to determine which is preferable.

aicCompare_ma <- c(ma1=ma1.model$aic, ma2=ma2.model$aic, ma3=ma3.model$aic)
aicCompare_ma

##       ma1       ma2       ma3 
## -1857.783 -1889.769 -1889.209

The Akaike Information Criterion favors the MA(2) model among the three models presented in this basic moving-average (MA) analysis, although only marginally over the MA(3).
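To see how close the candidates are, the AIC values reported in the two outputs above can be expressed as differences from the smallest value. The sketch below simply re-enters those six printed numbers by hand. The names aic_all and delta_aic are hypothetical.

```r
# AIC values copied from the printed output in the AR and MA sections
aic_all <- c(ar1 = -1858.642, ar2 = -1886.158, ar3 = -1884.233,
             ma1 = -1857.783, ma2 = -1889.769, ma3 = -1889.209)

# Difference from the best (lowest) AIC; 0 marks the preferred model
delta_aic <- sort(aic_all - min(aic_all))
print(round(delta_aic, 3))
```

On these numbers the MA(3) is within about 0.6 AIC units of the MA(2), while the AR(2) is about 3.6 units behind. The first-order models trail by roughly 31 to 32 units.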

Conclusion

This project called for the analysis of quarterly Real Personal Consumption Expenditures time series data. The first step was to determine which order autoregressive model would fit this data better. We concluded that an AR(2) model would fit this data set better than an AR(1) or AR(3). The next step was to look at a moving-average model and determine which order MA(q) model might fit this data best. We concluded that an MA(2) model would fit the data better than an MA(1) or MA(3) model.

However, for this time series I might favor an ARMA(1,2) model as a better fit for modeling this data set. Further investigations would be needed to determine the viability of this idea.
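A first step in that investigation could look like the sketch below, which fits an ARMA(1,2) alongside the AR(2) and MA(2) and compares their AIC values. It runs on a simulated series so that it is self-contained. The name sim_y and the simulation coefficients are arbitrary assumptions, so the printed values say nothing about the PCE data itself.

```r
set.seed(42)  # simulated ARMA(1,2) data; NOT the PCE series
sim_y <- arima.sim(model = list(ar = 0.5, ma = c(0.3, 0.2)), n = 275)

# Same arima() call pattern as above, with both AR and MA orders set
fit_arma12 <- arima(sim_y, order = c(1, 0, 2))
fit_ar2    <- arima(sim_y, order = c(2, 0, 0))
fit_ma2    <- arima(sim_y, order = c(0, 0, 2))

# Lower AIC indicates the preferred model among the candidates
aic_cmp <- c(arma12 = fit_arma12$aic, ar2 = fit_ar2$aic, ma2 = fit_ma2$aic)
print(aic_cmp)
```

With the real data, sim_y would be replaced by diff_log_rPCECC96, and the residual diagnostics from tsdiag() would be checked for the ARMA(1,2) in the same way as for the other models.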