The objective of this study is to check which Autoregressive (AR(p)) and Moving Average (MA(q)) model fits the log transformed 1st differenced data of ‘Real Personal Consumption Expenditure’. Therefore, different formal tests for identification, model estimation and model diagonostic had been conducted on the log transformed differenced dataset of Real Personal Consumption Expenditure.
The data was collected from Quandl where the main data source is Federal Reserve Economic Data website. For this analysis, quarterly data of Real Personal Consumption Expenditure had been considered and the quaterly data is from January 1947 to October 2015. For convenience, I assume rpc = Real Personal Consumption Expernditure.
rpc<- read.csv(url("http://research.stlouisfed.org/fred2/data/PCECC96.csv"))
str(rpc)
## 'data.frame': 276 obs. of 2 variables:
## $ DATE : Factor w/ 276 levels "1947-01-01","1947-04-01",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ VALUE: num 1199 1219 1223 1224 1230 ...
The following table suggests that the dataset is quarterly:
head(rpc)
## DATE VALUE
## 1 1947-01-01 1199.4
## 2 1947-04-01 1219.3
## 3 1947-07-01 1223.3
## 4 1947-10-01 1223.6
## 5 1948-01-01 1229.8
## 6 1948-04-01 1244.1
Now, if we plot the quarterly data against time, the following plot shows an upward trend over time.
plot(rpc, xlab="Quarterly Time period (1947-2015)", ylab="Billions of Chained $2005 Seasonally Adjusted Annual Rate", main= "Quarterly RPC(Real Personal Consumption Expenditure)")
According to the instruction in the problem, I took log of the quarterly data (defined as “lrpc”“) and then took a difference of the log transformed rpc which had been defined as”dlrpc“.
lrpc<- log(rpc[,2])
dlrpc<- diff(lrpc)
plot(dlrpc, xlab="Quarterly Time period (1947-2015)", ylab="Log difference of RPC", main= "Log difference of Quarterly RPC")
From the plot of the differenced dataset, we can see that the trend had been removed. Furthermore, there is no cyclical or seasonal effect is visible. Therefore, from the plot, we can assume that the data is stationary.
\(Note\) : Usually, a dataset is differenced to make the dataset stationary (as without making the dataset stationary, running an AR(p) or MA(q) model will not give any proper meaning). Therefore, after differencing, whether the data is stationary or not, it is important to do the Unit Root test. I conducted the Augmented Dickey-Fuller test which suggests that the differenced log value of Real Personal Consumption Expenditure (dlrpc) is stationary (\(p-value = 0.01 ; lag = 5\))
The specific order of AR(p) and MA(q) are tried to be found in this stage. The auto correlation function (ACF) and partial autocorrelation function (PACF) are constructed for the differenced dataset.
acf(dlrpc, lag=36, xlab="Quarterly Lag", ylab="ACF", main="Sample ACF")
pacf(dlrpc, lag=36, xlab="Quarterly Lag", ylab="PACF", main="Sample PACF")
Based on the PACF, we can suggest that spike on lag 2 and lag 4 are significant as they are outside of the bound.Hence, we can have an assumption that AR(2) or AR(4) might be a fit. In sample PACF, the spikes are showing decay with an oscialiation process.
The sample ACF shows the spike on lag 2 is significant. Therefore, we can assume that MA(2) can be a better fit for the dataset. In sample ACF, the spikes are showing decay with an oscialiation process.
Based on the observation of sample PACF, AR(1), AR(2), AR(3) and AR(4) had been checked.
arima(dlrpc, order=c(1,0,0))
##
## Call:
## arima(x = dlrpc, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.0893 0.0082
## s.e. 0.0601 0.0005
##
## sigma^2 estimated as 6.649e-05: log likelihood = 932.32, aic = -1858.64
ar1<-arima(dlrpc, order=c(1,0,0))
BIC(ar1)
## [1] -1847.791
ar1$coef/sqrt(diag(ar1$var.coef))
## ar1 intercept
## 1.486139 15.035198
(1-pnorm(abs(ar1$coef)/sqrt(diag(ar1$var.coef))))*2
## ar1 intercept
## 0.1372424 0.0000000
tsdiag(ar1, gof.lag=24)
arima(dlrpc, order=c(2,0,0))
##
## Call:
## arima(x = dlrpc, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 0.0599 0.3188 0.0082
## s.e. 0.0571 0.0570 0.0007
##
## sigma^2 estimated as 5.968e-05: log likelihood = 947.08, aic = -1886.16
ar2<-arima(dlrpc, order=c(2,0,0))
BIC(ar2)
## [1] -1871.691
ar2$coef/sqrt(diag(ar2$var.coef))
## ar1 ar2 intercept
## 1.050373 5.589556 10.899071
(1-pnorm(abs(ar2$coef)/sqrt(diag(ar2$var.coef))))*2
## ar1 ar2 intercept
## 2.935465e-01 2.276514e-08 0.000000e+00
tsdiag(ar2, gof.lag=24)
arima(dlrpc, order=c(3,0,0))
##
## Call:
## arima(x = dlrpc, order = c(3, 0, 0))
##
## Coefficients:
## ar1 ar2 ar3 intercept
## 0.0545 0.3178 0.0165 0.0082
## s.e. 0.0604 0.0571 0.0603 0.0008
##
## sigma^2 estimated as 5.966e-05: log likelihood = 947.12, aic = -1884.23
ar3<-arima(dlrpc, order=c(3,0,0))
BIC(ar3)
## [1] -1866.149
ar3$coef/sqrt(diag(ar3$var.coef))
## ar1 ar2 ar3 intercept
## 0.9031198 5.5618712 0.2738954 10.7257884
(1-pnorm(abs(ar3$coef)/sqrt(diag(ar3$var.coef))))*2
## ar1 ar2 ar3 intercept
## 3.664623e-01 2.668972e-08 7.841650e-01 0.000000e+00
tsdiag(ar3, gof.lag=24)
arima(dlrpc, order=c(4,0,0))
##
## Call:
## arima(x = dlrpc, order = c(4, 0, 0))
##
## Coefficients:
## ar1 ar2 ar3 ar4 intercept
## 0.0570 0.3647 0.0244 -0.1444 0.0082
## s.e. 0.0597 0.0598 0.0598 0.0596 0.0007
##
## sigma^2 estimated as 5.84e-05: log likelihood = 950.02, aic = -1888.03
ar4<-arima(dlrpc, order=c(4,0,0))
BIC(ar4)
## [1] -1866.332
ar4$coef/sqrt(diag(ar4$var.coef))
## ar1 ar2 ar3 ar4 intercept
## 0.9554484 6.0975802 0.4076225 -2.4215870 12.3531993
(1-pnorm(abs(ar4$coef)/sqrt(diag(ar4$var.coef))))*2
## ar1 ar2 ar3 ar4 intercept
## 3.393510e-01 1.076861e-09 6.835508e-01 1.545290e-02 0.000000e+00
tsdiag(ar4, gof.lag=24)
arima(dlrpc, order=c(4,0,0), fixed=c(0,NA,0,NA,NA))
## Warning in arima(dlrpc, order = c(4, 0, 0), fixed = c(0, NA, 0, NA, NA)):
## some AR parameters were fixed: setting transform.pars = FALSE
##
## Call:
## arima(x = dlrpc, order = c(4, 0, 0), fixed = c(0, NA, 0, NA, NA))
##
## Coefficients:
## ar1 ar2 ar3 ar4 intercept
## 0 0.3708 0 -0.1406 0.0082
## s.e. 0 0.0598 0 0.0596 0.0006
##
## sigma^2 estimated as 5.872e-05: log likelihood = 949.28, aic = -1890.55
ar4R<-arima(dlrpc, order=c(4,0,0), fixed=c(0,NA,0,NA,NA))
## Warning in arima(dlrpc, order = c(4, 0, 0), fixed = c(0, NA, 0, NA, NA)):
## some AR parameters were fixed: setting transform.pars = FALSE
BIC(ar4R)
## [1] -1876.086
ar4R$coef/sqrt(diag(ar4R$var.coef))
## Warning in ar4R$coef/sqrt(diag(ar4R$var.coef)): longer object length is not
## a multiple of shorter object length
## ar1 ar2 ar3 ar4 intercept
## 0.0000000 6.2172518 0.0000000 -2.3524939 0.1370206
(1-pnorm(abs(ar4R$coef)/sqrt(diag(ar4R$var.coef))))*2
## Warning in abs(ar4R$coef)/sqrt(diag(ar4R$var.coef)): longer object length
## is not a multiple of shorter object length
## ar1 ar2 ar3 ar4 intercept
## 1.000000e+00 5.059377e-10 1.000000e+00 1.864800e-02 8.910145e-01
tsdiag(ar4R, gof.lag=24)
Analyzing the Standarized Residual plot, ACF of REsiduals plot, and p-values for Ljung-Box Statistic plot. In case of AR(2), AR(4) and restricted AR(4), ACF residual is closer to zero. Moreover, p-value for Ljung -Box stastistic is higher than 0.6. This indicats Restricted AR(4) ad AR(4) are the better models among all the AR(p) models.
Based on the observation of sample ACF, MA(1) and MA(2) had been checked.
arima(dlrpc, order=c(0,0,1))
##
## Call:
## arima(x = dlrpc, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## 0.0546 0.0082
## s.e. 0.0472 0.0005
##
## sigma^2 estimated as 6.67e-05: log likelihood = 931.89, aic = -1857.78
ma1<-arima(dlrpc, order=c(0,0,1))
BIC(ma1)
## [1] -1846.933
ma1$coef/sqrt(diag(ma1$var.coef))
## ma1 intercept
## 1.158286 15.616958
(1-pnorm(abs(ma1$coef)/sqrt(diag(ma1$var.coef))))*2
## ma1 intercept
## 0.2467474 0.0000000
tsdiag(ma1, gof.lag=24)
arima(dlrpc, order=c(0,0,2))
##
## Call:
## arima(x = dlrpc, order = c(0, 0, 2))
##
## Coefficients:
## ma1 ma2 intercept
## 0.0268 0.3660 0.0082
## s.e. 0.0567 0.0586 0.0006
##
## sigma^2 estimated as 5.889e-05: log likelihood = 948.88, aic = -1889.77
ma2<-arima(dlrpc, order=c(0,0,2))
BIC(ma2)
## [1] -1875.302
ma2$coef/sqrt(diag(ma2$var.coef))
## ma1 ma2 intercept
## 0.4727428 6.2471529 12.6517469
(1-pnorm(abs(ma2$coef)/sqrt(diag(ma2$var.coef))))*2
## ma1 ma2 intercept
## 6.363967e-01 4.180019e-10 0.000000e+00
tsdiag(ma2, gof.lag=24)
arima(dlrpc, order=c(0,0,2), fixed=c(0,NA,NA))
##
## Call:
## arima(x = dlrpc, order = c(0, 0, 2), fixed = c(0, NA, NA))
##
## Coefficients:
## ma1 ma2 intercept
## 0 0.3703 0.0082
## s.e. 0 0.0579 0.0006
##
## sigma^2 estimated as 5.893e-05: log likelihood = 948.77, aic = -1891.55
ma2R<-arima(dlrpc, order=c(0,0,2),fixed=c(0,NA,NA))
BIC(ma2R)
## [1] -1880.696
ma2R$coef/sqrt(diag(ma2R$var.coef))
## Warning in ma2R$coef/sqrt(diag(ma2R$var.coef)): longer object length is not
## a multiple of shorter object length
## ma1 ma2 intercept
## 0.000000 582.248376 0.141182
(1-pnorm(abs(ma2R$coef)/sqrt(diag(ma2R$var.coef))))*2
## Warning in abs(ma2R$coef)/sqrt(diag(ma2R$var.coef)): longer object length
## is not a multiple of shorter object length
## ma1 ma2 intercept
## 1.0000000 0.0000000 0.8877262
tsdiag(ma2R, gof.lag=24)
From the above tables,we can see that MA(2) shows only significant coefficient with 1% level of significance. Furthermore,MA(2) suggests a lower AIC (-1889.77) and BIC (-1875.302) compared to MA(1). We have also considered Restricted MA(2) which suggests a lower AIC (-1891.55) and similar BIC (-1880.696). In case of MA(2) and Restricted MA(2), ACF residual is closer to zero. Moreover, p-value for Ljung -Box stastistic is higher than 0.6. This indicats Restricted MA(2)[based on AIC] and MA(2) [based on BIC] are the better models among all the MA(p) models.
4.1.6 Comments on the AR(p) models
AR(1): We can see that the coefficient of AR(1) is not significant (p-value= 0.13724) with 1% or 5% significance level. Here, AIC= -1858.642 and BIC= -1847.791
AR(2): Results from the above table suggest that AR(2) coefficient is significant with 1% level of significance (p-value= 0.00). Here, AIC= -1886.16 , BIC = -1871.691, which suggests that AR(2) is a better fit than AR(1)
AR(3): Coefficient of AR(3) is not significant (p-value= 0.366) with 1% or 5% significaance level. AIC= -1884.23, BIC = 1866.149 which suggests, it is not a better model than AR(2)
AR(4): Coefficient of AR(4) is significant with 1% significance level (p-value = 0.0154). AIC = -1888.03 , BIC = -1866.332 which suggests, it is a better model than AR(2) and AR(3).
Restricted AR(4): AIC= -1890.55, BIC= -1876.086. Based on AIC and BIC, Restricted AR(4) is the best fit among all the previous models.