Time Series Analysis: Quarterly Real Personal Consumption Expenditures

  1. Read Data
library(ggplot2)
library(ggfortify)
library(forecast)
library(Quandl)
# Import the quarterly Real Personal Consumption Expenditures series (FRED: PCECC96) as a ts object

RPCE <- Quandl("FRED/PCECC96", type="ts")

str(RPCE)
##  Time-Series [1:284] from 1947 to 2018: 1199 1219 1223 1224 1230 ...
  2. Construct the Log Changes
# construct the log changes in the Real Personal Consumption Expenditures

dlRPCE <- diff(log(RPCE), lag=1)

autoplot(RPCE, main="Real Personal Consumption Expenditures")

autoplot(dlRPCE, main="The Log Changes in the Real Personal Consumption Expenditures")
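The log change log(x_t) - log(x_{t-1}) is approximately the quarter-on-quarter growth rate when changes are small, and differencing drops the first observation, so the 284 quarterly levels give 283 log changes. The minimal base-R sketch below illustrates both points on a made-up five-quarter series (hypothetical values, not the RPCE data).

```r
# Hypothetical quarterly levels (illustration only, not the RPCE series)
x <- ts(c(100, 101, 103, 102, 105), start = c(2000, 1), frequency = 4)

# Log changes, computed the same way as dlRPCE in the text
dl <- diff(log(x), lag = 1)

# Exact simple growth rates for comparison: x_t / x_{t-1} - 1
growth <- as.numeric(x[-1] / x[-length(x)] - 1)

# Differencing drops the first observation, so dl starts one quarter later
print(length(dl))   # one fewer than length(x)
print(start(dl))    # c(2000, 2), i.e. 2000 Q2

# Side by side: log change vs. simple growth rate
print(round(cbind(log_change = as.numeric(dl), growth = growth), 5))
```

For growth rates of one to three percent the two columns differ only in the fourth decimal place, which is why log changes are read as growth rates.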

  3. Autocorrelation and Partial Autocorrelation Functions
nlags <- 40

ggAcf(dlRPCE, type="correlation", lag.max=nlags, main="ACF for the log changes in the Real Personal Consumption Expenditures")

ggPacf(dlRPCE, type="partial",lag.max=nlags, main="PACF for the log changes in the Real Personal Consumption Expenditures" )
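The identification rule used in the next section can be checked on theoretical autocorrelations: an MA(q) process has an ACF that is exactly zero beyond lag q, while an AR(p) process has a PACF that is exactly zero beyond lag p. The sketch below uses base R's `ARMAacf()` with the AR(2) and MA(2) coefficient estimates reported later in this document, used here only as example values. It also computes the approximate 95% band, 1.96/sqrt(n), that sample ACF/PACF plots such as `ggAcf()` and `ggPacf()` draw as dashed lines, for n = 283.

```r
# Coefficients taken from the AR(2) and MA(2) fits reported in this document
ar_coef <- c(0.0617, 0.3193)
ma_coef <- c(0.0291, 0.3660)

# Theoretical ACF of the MA(2): element 1 is lag 0, element k + 1 is lag k
acf_ma2 <- ARMAacf(ma = ma_coef, lag.max = 6)

# Theoretical PACF of the AR(2): element k is lag k (starts at lag 1)
pacf_ar2 <- ARMAacf(ar = ar_coef, lag.max = 6, pacf = TRUE)

print(round(acf_ma2, 4))    # zero after lag 2
print(round(pacf_ar2, 4))   # zero after lag 2

# Approximate 95% significance band for a sample ACF/PACF with n = 283
n <- 283
band <- qnorm(0.975) / sqrt(n)
print(round(band, 4))
```

A sample ACF spike at lag 2 that stands well outside this band, with nothing beyond it, is the pattern an MA(2) produces. A PACF spike at lag 2 is the corresponding signature of an AR(2).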

  4. ARIMA
# As the ACF and PACF plots above show, the PACF has significant spikes at lags 2 and 4. Since a significant PACF spike at lag p points to an AR(p) term, we start with an AR(2) model. Its AIC is -1949.09.

ar2 <- arima(dlRPCE, order=c(2,0,0))

ar2
## 
## Call:
## arima(x = dlRPCE, order = c(2, 0, 0))
## 
## Coefficients:
##          ar1     ar2  intercept
##       0.0617  0.3193     0.0082
## s.e.  0.0562  0.0562     0.0007
## 
## sigma^2 estimated as 5.805e-05:  log likelihood = 978.55,  aic = -1949.09
# The ACF has a significant spike at lag 2, and a sample ACF that cuts off after lag q points to an MA(q) model, so we next fit an MA(2), plus an MA(4) as a larger alternative. The AIC is -1952.77 for the MA(2) and -1950.28 for the MA(4). Because the AIC is lower for the MA(2), the MA(2) is the better of the two.

ma2 <- arima(dlRPCE, order=c(0,0,2))

ma2
## 
## Call:
## arima(x = dlRPCE, order = c(0, 0, 2))
## 
## Coefficients:
##          ma1     ma2  intercept
##       0.0291  0.3660     0.0082
## s.e.  0.0560  0.0576     0.0006
## 
## sigma^2 estimated as 5.729e-05:  log likelihood = 980.39,  aic = -1952.77
ma4 <- arima(dlRPCE, order=c(0,0,4))

ma4
## 
## Call:
## arima(x = dlRPCE, order = c(0, 0, 4))
## 
## Coefficients:
##          ma1     ma2     ma3      ma4  intercept
##       0.0564  0.3683  0.0701  -0.0037     0.0082
## s.e.  0.0595  0.0600  0.0571   0.0734     0.0007
## 
## sigma^2 estimated as 5.698e-05:  log likelihood = 981.14,  aic = -1950.28
# As a mixed alternative, we also fit an ARMA(2,2). Its AIC is -1950.26.

arma22 <- arima(dlRPCE, order=c(2,0,2))

arma22
## 
## Call:
## arima(x = dlRPCE, order = c(2, 0, 2))
## 
## Coefficients:
##          ar1      ar2      ma1     ma2  intercept
##       0.1875  -0.0310  -0.1314  0.3901     0.0082
## s.e.  0.1559   0.1786   0.1431  0.1682     0.0007
## 
## sigma^2 estimated as 5.698e-05:  log likelihood = 981.13,  aic = -1950.26
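One quick way to see why the ARMA(2,2) adds little is to compare each coefficient with its standard error. Under the usual large-sample normal approximation, |estimate / s.e.| > 1.96 indicates significance at roughly the 5% level. The sketch below uses the estimates and standard errors printed above, with intercepts omitted.

```r
# Estimates and standard errors copied from the arima() output above
est <- list(
  ar2    = c(ar1 = 0.0617, ar2 = 0.3193),
  ma2    = c(ma1 = 0.0291, ma2 = 0.3660),
  arma22 = c(ar1 = 0.1875, ar2 = -0.0310, ma1 = -0.1314, ma2 = 0.3901)
)
se <- list(
  ar2    = c(0.0562, 0.0562),
  ma2    = c(0.0560, 0.0576),
  arma22 = c(0.1559, 0.1786, 0.1431, 0.1682)
)

# Approximate z-ratios; |z| > 1.96 is roughly significant at the 5% level
z <- Map(function(b, s) round(b / s, 2), est, se)
print(z)
```

In the ARMA(2,2) only ma2 clears the threshold, which is consistent with the extra AR and MA terms adding little beyond the MA(2). With a fitted `arima` object, the same ratios can be computed as `coef(fit) / sqrt(diag(vcov(fit)))`.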
  5. Model Diagnostics: Residuals, Stationarity and Invertibility, AIC and BIC, and Q-statistics
nlags <- 40 

# Residual diagnostics for all three models

ggtsdiag(ar2, gof.lag = nlags)

ggtsdiag(ma2, gof.lag = nlags)

ggtsdiag(arma22, gof.lag = nlags)

# Stationarity and invertibility: plot the inverse AR and MA roots. A model is stationary if all inverse AR roots lie inside the unit circle, and invertible if all inverse MA roots do.

plot(ar2)

plot(ma2)

plot(arma22)
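The root plots can also be checked numerically with base R's `polyroot()`. In `stats::arima()`'s parameterization, the AR polynomial is 1 - ar1*z - ar2*z^2 and the MA polynomial is 1 + ma1*z + ma2*z^2. The sketch below plugs in the estimates printed above and computes the moduli of the inverse roots. The helper `inv_roots` is a hypothetical name introduced for this illustration.

```r
# Hypothetical helper: inverse roots of a lag polynomial c(1, c1, c2, ...)
inv_roots <- function(poly) 1 / polyroot(poly)

# AR polynomials: 1 - ar1*z - ar2*z^2 (stats::arima sign convention)
ar_polys <- list(
  ar2    = c(1, -0.0617, -0.3193),
  arma22 = c(1, -0.1875,  0.0310)
)
# MA polynomials: 1 + ma1*z + ma2*z^2
ma_polys <- list(
  ma2    = c(1,  0.0291, 0.3660),
  arma22 = c(1, -0.1314, 0.3901)
)

# Moduli of the inverse roots; all < 1 means stationary (AR) / invertible (MA)
ar_mod <- lapply(ar_polys, function(p) Mod(inv_roots(p)))
ma_mod <- lapply(ma_polys, function(p) Mod(inv_roots(p)))
print(ar_mod)
print(ma_mod)
```

For the MA(2) the two inverse roots form a complex-conjugate pair whose modulus is sqrt(ma2) = sqrt(0.366), about 0.6, comfortably inside the unit circle.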

# AIC

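# The model with the smallest AIC or BIC is preferred. BIC penalizes each extra parameter by log(n) rather than 2, so it favors more parsimonious models. AIC is often emphasized in smaller samples and BIC in larger ones. The log-change series has 283 observations (284 levels minus one lost to differencing), which is modest, so AIC is a reasonable primary criterion here, with BIC as a check.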

# From the results, the MA(2) model has the smallest AIC (-1952.77), so this criterion prefers the MA(2).

# BIC

# In the BIC() output below, the MA(2) model also has the smallest BIC (-1938.19), so this criterion prefers the MA(2) as well.

# To sum up, the MA(2) has both the lowest AIC and the lowest BIC, so it is the best of the three models under either criterion.

AIC(ar2, ma2, arma22)
##        df       AIC
## ar2     4 -1949.094
## ma2     4 -1952.773
## arma22  6 -1950.263
BIC(ar2, ma2, arma22)
##        df       BIC
## ar2     4 -1934.512
## ma2     4 -1938.191
## arma22  6 -1928.390
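Both criteria can be reproduced from the log-likelihoods printed by `arima()`: AIC = -2 log L + 2k and BIC = -2 log L + k log(n). Here k counts the ARMA coefficients, the intercept, and the innovation variance (the `df` column above), and n = 283 is the number of log changes. The sketch below uses the rounded log-likelihoods from the output above, so its results agree with the tables only to about two decimals.

```r
# Rounded log-likelihoods and parameter counts from the output above
loglik <- c(ar2 = 978.55, ma2 = 980.39, arma22 = 981.13)
k      <- c(ar2 = 4,      ma2 = 4,      arma22 = 6)
n      <- 283  # observations in dlRPCE (284 levels minus one difference)

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(n)

print(round(rbind(AIC = aic, BIC = bic), 2))
# Per-parameter penalty: 2 for AIC, log(n) for BIC
print(round(log(n), 3))
```

Since log(283) is about 5.65, BIC charges each extra parameter almost three times as much as AIC. This is why the gap between the ARMA(2,2) and the MA(2) widens under BIC.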
# Q-statistics: portmanteau tests

# The portmanteau (Box-Pierce or Ljung-Box) test examines the null hypothesis that the residuals are independently distributed. It rests on the idea that the residuals of a correctly specified model should be independent. If they are autocorrelated, the model is likely misspecified. In the results below, the null cannot be rejected for any of the three models (all p-values exceed 0.1), so there is no evidence of autocorrelation in the residuals.

# I test the residuals using the Ljung-Box test at lag 2.

ar2.LB <- Box.test(residuals(ar2),lag=2, type="Ljung")

# The p-value is 0.7406, which is large, so there is no evidence of autocorrelation in the AR(2) residuals.

ma2.LB <- Box.test(residuals(ma2),lag=2,type="Ljung")

# The p-value is 0.8904, which is large, so there is no evidence of autocorrelation in the MA(2) residuals.

arma22.LB <- Box.test(residuals(arma22),lag=2,type="Ljung")

# The p-value is 0.9995, which is large, so there is no evidence of autocorrelation in the ARMA(2,2) residuals.

ar2.LB; ma2.LB; arma22.LB
## 
##  Box-Ljung test
## 
## data:  residuals(ar2)
## X-squared = 0.60066, df = 2, p-value = 0.7406
## 
##  Box-Ljung test
## 
## data:  residuals(ma2)
## X-squared = 0.23218, df = 2, p-value = 0.8904
## 
##  Box-Ljung test
## 
## data:  residuals(arma22)
## X-squared = 0.00095196, df = 2, p-value = 0.9995
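The Ljung-Box statistic is Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k), where r_k is the lag-k residual autocorrelation. Under the null hypothesis it is approximately chi-squared with h - m degrees of freedom, where m = p + q is the number of estimated ARMA coefficients. `Box.test()` applies that correction through its `fitdf` argument, which the calls above leave at its default of 0. With lag = 2 and m = 2 or 4, no degrees of freedom would remain, so a longer lag is usually used for residual checks.

The sketch below simulates an MA(2) series (hypothetical data with parameters close to the estimates above), fits it with `arima()`, and computes Q by hand at lag 10. It then checks the result against `Box.test(..., fitdf = 2)`.

```r
set.seed(42)

# Hypothetical data: an MA(2) with parameters close to the fitted ones
y <- 0.0082 + arima.sim(model = list(ma = c(0.0291, 0.3660)), n = 283, sd = 0.0076)
fit <- arima(y, order = c(0, 0, 2))
res <- residuals(fit)

h <- 10          # number of residual autocorrelations tested
m <- 2           # estimated ARMA coefficients (p + q) in the fitted MA(2)
n <- length(res)

# Ljung-Box statistic computed from its definition
r <- acf(res, lag.max = h, plot = FALSE)$acf[2:(h + 1)]
Q <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))
p_manual <- pchisq(Q, df = h - m, lower.tail = FALSE)

# Same test via Box.test, with fitdf removing the estimated coefficients
lb <- Box.test(res, lag = h, type = "Ljung-Box", fitdf = m)
print(lb)
```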
  6. Model Specification Using auto.arima
m1 <- auto.arima(dlRPCE, ic="aic", seasonal=FALSE, stationary=TRUE, stepwise=FALSE, approximation=FALSE)

m1
## Series: dlRPCE 
## ARIMA(0,0,2) with non-zero mean 
## 
## Coefficients:
##          ma1     ma2    mean
##       0.0291  0.3660  0.0082
## s.e.  0.0560  0.0576  0.0006
## 
## sigma^2 estimated as 5.79e-05:  log likelihood=980.39
## AIC=-1952.77   AICc=-1952.63   BIC=-1938.19
plot(m1)

accuracy(m1)
##                         ME       RMSE         MAE      MPE     MAPE
## Training set -9.922055e-06 0.00756888 0.005296848 29.15054 201.3332
##                   MASE       ACF1
## Training set 0.6629412 0.02844712
ggtsdiag(m1, gof.lag = nlags)

m2 <- auto.arima(dlRPCE, ic="bic", seasonal=FALSE, stationary=TRUE, stepwise=FALSE, approximation=FALSE)

m2
## Series: dlRPCE 
## ARIMA(0,0,2) with non-zero mean 
## 
## Coefficients:
##          ma1     ma2    mean
##       0.0291  0.3660  0.0082
## s.e.  0.0560  0.0576  0.0006
## 
## sigma^2 estimated as 5.79e-05:  log likelihood=980.39
## AIC=-1952.77   AICc=-1952.63   BIC=-1938.19
plot(m2)

accuracy(m2)
##                         ME       RMSE         MAE      MPE     MAPE
## Training set -9.922055e-06 0.00756888 0.005296848 29.15054 201.3332
##                   MASE       ACF1
## Training set 0.6629412 0.02844712
ggtsdiag(m2, gof.lag = nlags)
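With `stepwise = FALSE`, `auto.arima()` fits every candidate order within its search limits and keeps the one with the lowest value of the chosen criterion (here `ic = "aic"` or `ic = "bic"`). The base-R sketch below illustrates exhaustive order selection in simplified form. It loops over p, q in 0..2 with a non-zero mean, skips fits that fail, and ranks the candidates by AIC. It is not `auto.arima()`'s actual implementation, which also handles differencing, larger order limits, and additional model checks. It also runs on simulated MA(2) data rather than on RPCE.

```r
set.seed(1)

# Hypothetical data: an MA(2) with parameters close to the fitted ones
y <- 0.0082 + arima.sim(model = list(ma = c(0.0291, 0.3660)), n = 283, sd = 0.0076)

# Every (p, q) pair with p, q in 0..2
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- NA_real_
grid$bic <- NA_real_

for (i in seq_len(nrow(grid))) {
  # Fit ARMA(p, q) with a mean; return NULL if estimation fails
  fit <- tryCatch(
    arima(y, order = c(grid$p[i], 0, grid$q[i])),
    error = function(e) NULL
  )
  if (!is.null(fit)) {
    grid$aic[i] <- AIC(fit)
    grid$bic[i] <- BIC(fit)
  }
}

# Rank candidate orders by AIC (lowest first; failed fits sort last)
ranked <- grid[order(grid$aic), ]
print(head(ranked, 3))
```

On the real series, `auto.arima()` selects ARIMA(0,0,2) with non-zero mean under both `ic = "aic"` and `ic = "bic"`, matching the manual comparison in the previous section.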

  7. Summary
# The quarterly Real Personal Consumption Expenditures data were analyzed as log changes. Based on the ACF and PACF, three candidate models were compared: AR(2), MA(2), and ARMA(2,2). An MA(4) was also fitted but had a higher AIC than the MA(2). All three candidates appear to fit the data well, but both AIC and BIC favor the MA(2), and auto.arima selects the same ARIMA(0,0,2) with non-zero mean under either criterion.

# To choose the preferred model, we also examined model adequacy and diagnosed the residuals. All three models are adequate and have very similar residual properties based on the residual ACF plots and Q-statistics, so the diagnostics alone do not separate them. As a result, the MA(2), which has the lowest AIC and BIC, is preferred over the other two models.