Build an AR and an MA model for the log changes in Real Personal Consumption Expenditures and check each model for adequacy.
Before getting into the details of the problem, it helps to plot the data and familiarize ourselves with it. The series shows no dramatic jump or drop, and although it is hard to judge by eye, the plot shows no obvious seasonal pattern. A plot over a shorter period might make a seasonal pattern easier to spot; however, in this univariate time series model we will not worry about seasonality yet.
plot(Ct, xlab="", ylab="Billions of Chained 2009 Dollars", main="Real Personal Consumption Expenditures")
plot(Yt, xlab="", ylab="", main="Log-change in Personal Consumption Expenditures")
Let \(C_t\) be quarterly Real Personal Consumption Expenditures. We want to build an adequate AR model and an adequate MA model for the log changes in Real Personal Consumption Expenditures, so we first construct the series of log changes \[Y_t = \log C_t - \log C_{t-1}\]
The first step is to decide the order of the AR and MA models by referring to the ACF and PACF plots. For the AR model we look more closely at the partial autocorrelation function (PACF), while for the MA model we look more closely at the autocorrelation function (ACF).
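As a minimal sketch of this transformation, the log change can be computed with `diff(log(...))`. The series `Ct` below is simulated and purely hypothetical (its start date, level and growth rate are arbitrary); it only stands in for the real consumption data, which is not reproduced here.

```r
# Hypothetical stand-in for the quarterly consumption series Ct
# (simulated values; start date, level and growth rate are arbitrary)
set.seed(1)
growth <- rnorm(275, mean = 0.008, sd = 0.008)
Ct <- ts(2000 * exp(cumsum(c(0, growth))), start = c(2000, 1), frequency = 4)

# Log change: Y_t = log(C_t) - log(C_{t-1})
Yt <- diff(log(Ct))

length(Ct)  # number of levels
length(Yt)  # one fewer, because differencing drops the first observation
```

Note that differencing costs one observation, so a series of 276 levels yields 275 log changes.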
acf(as.data.frame(Yt), type='correlation', lag.max=24, main="Autocorrelation Function of Yt")
The ACF plot shows a damped sine wave that dies down quickly after lag 4. This does not help much in deciding the order of an AR model, so we need the PACF plot to choose the AR order.
acf(as.data.frame(Yt), type='partial', lag.max=24, main="Partial Autocorrelation Function of Yt")
From the partial autocorrelation function, the most suitable AR order is probably AR(2) or AR(4), since the partial autocorrelations at lags 2 and 4 fall outside the confidence band. To decide the order of the MA model we refer to the ACF plot, where MA(2) looks the most probable. After each estimation we inspect the residual plots and decide whether the model is adequate.
As a simple example, consider the residual check for an AR(1) below. The plot of Ljung-Box p-values shows that at most lags we reject the null hypothesis, which indicates serial correlation in the residuals, so we have to try another order.
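Reading lags off the PACF plot can also be done numerically. The sketch below assumes the usual approximate 95% band of plus or minus 1.96 divided by the square root of the sample size, and uses a simulated AR(2) series as a hypothetical stand-in for Yt; the coefficients are illustrative only, so the flagged lags will not match the real data.

```r
set.seed(42)
# Simulated AR(2) series standing in for Yt (illustrative coefficients only)
y <- arima.sim(model = list(ar = c(0.06, 0.32)), n = 275)

# Approximate 95% band drawn as dashed lines on ACF/PACF plots: +/- 1.96 / sqrt(T)
band <- qnorm(0.975) / sqrt(length(y))

# Sample PACF without plotting; $acf holds the partial autocorrelations at lags 1..24
p <- pacf(y, lag.max = 24, plot = FALSE)
partial <- as.numeric(p$acf)

# Lags whose partial autocorrelation falls outside the band
which(abs(partial) > band)
```

Lags returned by the last line are the candidates for the AR order, in the same way lags 2 and 4 were picked from the plot.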
##
## Call:
## arima(x = Yt, order = c(1, 0, 0))
##
## Coefficients:
##          ar1  intercept
##       0.0893     0.0082
## s.e.  0.0601     0.0005
##
## sigma^2 estimated as 6.649e-05: log likelihood = 932.32, aic = -1858.64
Next we estimate the AR(2) model. About half of the lags still have problematic Ljung-Box p-values, so we have to try another order. Since the partial autocorrelation at lag 4 is significantly negative, we try order 4.
m2<-arima(Yt, order=c(2,0,0))
m2
##
## Call:
## arima(x = Yt, order = c(2, 0, 0))
##
## Coefficients:
##          ar1     ar2  intercept
##       0.0599  0.3188     0.0082
## s.e.  0.0571  0.0570     0.0007
##
## sigma^2 estimated as 5.968e-05: log likelihood = 947.08, aic = -1886.16
tsdiag(m2,gof.lag=24)
When we estimate AR(4), the Ljung-Box p-values look better: most of them lie above the significance bound, so we no longer reject the null hypothesis of no serial correlation in the residuals.
m3<-arima(Yt,order=c(4,0,0))
m3
##
## Call:
## arima(x = Yt, order = c(4, 0, 0))
##
## Coefficients:
##          ar1     ar2     ar3      ar4  intercept
##       0.0570  0.3647  0.0244  -0.1444     0.0082
## s.e.  0.0597  0.0598  0.0598   0.0596     0.0007
##
## sigma^2 estimated as 5.84e-05: log likelihood = 950.02, aic = -1888.03
tsdiag(m3,gof.lag=24)
As a simple adequacy check we compare the Akaike Information Criterion (AIC) and the Schwarz Bayesian Criterion (BIC) and choose the model with the smallest value. By AIC, AR(4) is the most suitable model: the values below are AIC differences from the best order, and the zero falls at order 4.
m$order
## [1] 4
m$aic
##          0          1          2          3          4          5
## 29.5898061 29.3905747  1.8742608  3.8053769  0.0000000  1.9999836
##          6          7          8          9         10         11
##  2.8643601  1.8839187  1.6397094  1.8639941  0.5695805  2.5608319
##         12
##  1.1372523
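The code that produced `m` is not shown. One way to obtain an object with `order` and `aic` components of this shape is the `ar()` function, sketched below on simulated data. The series, its coefficients and the default Yule-Walker method are all assumptions for illustration, so the selected order need not be 4.

```r
set.seed(7)
# Simulated AR(4)-type series standing in for Yt (illustrative coefficients only)
y <- arima.sim(model = list(ar = c(0.05, 0.35, 0.02, -0.15)), n = 275)

# Fit AR(0)..AR(12) and pick the order by AIC.
# Assumption: default Yule-Walker estimation; the source's method is unknown.
m <- ar(y, order.max = 12)

m$order  # order with the smallest AIC
m$aic    # AIC of each order minus the minimum AIC, so the best order shows 0
```

Because `m$aic` holds differences rather than raw AIC values, small entries such as those at orders 2 or 10 mark models that are nearly as good as the selected one.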
For BIC we get:
BIC(m2)
## [1] -1871.691
BIC(m3)
## [1] -1866.332
BIC prefers AR(2) to AR(4), whereas AIC prefers AR(4) to AR(2), because BIC penalizes additional parameters more heavily than AIC does.
Referring back to the ACF, we can try either MA(1) (for comparison) or MA(2). After each estimation we test for model adequacy again and decide which model suits the data best. Estimating MA(1), the result is:
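The disagreement between the two criteria can be reproduced by hand from the log-likelihoods printed above. This is a sketch under two assumptions: the parameter count includes the AR coefficients, the intercept and the innovation variance, and the effective sample size is 275 (276 levels minus the one lost to differencing), a value inferred from the reported BIC rather than stated in the text.

```r
# Log-likelihoods as printed in the arima() output for AR(2) and AR(4)
loglik <- c(AR2 = 947.08, AR4 = 950.02)

# Estimated parameters: AR coefficients + intercept + innovation variance
k <- c(AR2 = 2 + 1 + 1, AR4 = 4 + 1 + 1)

# Assumption: 275 usable observations (inferred, not stated in the source)
n <- 275

aic <- -2 * loglik + 2 * k        # penalty of 2 per parameter
bic <- -2 * loglik + log(n) * k   # penalty of log(n) per parameter

round(aic, 2)
round(bic, 2)

# Per-parameter penalty of each criterion
c(AIC = 2, BIC = log(n))
```

The results agree with the reported values up to rounding of the log-likelihoods. With a per-parameter penalty of about 5.6 instead of 2, the two extra coefficients of AR(4) cost more under BIC than the gain in log-likelihood repays.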
m4<-arima(x=Yt, order=c(0,0,1))
m4
##
## Call:
## arima(x = Yt, order = c(0, 0, 1))
##
## Coefficients:
##          ma1  intercept
##       0.0546     0.0082
## s.e.  0.0472     0.0005
##
## sigma^2 estimated as 6.67e-05: log likelihood = 931.89, aic = -1857.78
tsdiag(m4,gof.lag=24)
The residual analysis shows that this model has a problem with its Ljung-Box p-values, so we need to try another order. Referring to the ACF, the next probable order is MA(2). Estimating MA(2) we get:
m5<-arima(x=Yt, order=c(0,0,2))
m5
##
## Call:
## arima(x = Yt, order = c(0, 0, 2))
##
## Coefficients:
##          ma1     ma2  intercept
##       0.0268  0.3660     0.0082
## s.e.  0.0567  0.0586     0.0006
##
## sigma^2 estimated as 5.889e-05: log likelihood = 948.88, aic = -1889.77
tsdiag(m5,gof.lag=24)
From the residual diagnostic plot, MA(2) shows a better plot of Ljung-Box p-values, so we go with MA(2). Besides the Ljung-Box result, MA(2) also has the smaller AIC (-1889.77 versus -1857.78 for MA(1)).
BIC(m4)
## [1] -1846.933
BIC(m5)
## [1] -1875.302
The BIC agrees: MA(2) has the smaller BIC, so it is preferred to MA(1).
For the formal Ljung-Box test, the statistic is asymptotically \(\chi^2\) with \(m - g\) degrees of freedom, where \(m \approx \ln T\), \(T = 276\) is the number of observations and \(g\) is the number of estimated AR or MA coefficients. Here \(\ln 276 \approx 5.6\) and we take \(m = 5\), so for AR(4) the degrees of freedom are \(5 - 4 = 1\). The test output below reports the unadjusted df = 5; the last value, 0.676, is the p-value recomputed with 1 degree of freedom. Since the null hypothesis cannot be rejected, there is no evidence of serial correlation in the residuals, and AR(4) is suitable for the log change in Real Personal Consumption Expenditures.
## List of 5
## $ statistic: Named num 0.175
## ..- attr(*, "names")= chr "X-squared"
## $ parameter: Named num 5
## ..- attr(*, "names")= chr "df"
## $ p.value : num 0.999
## $ method : chr "Box-Ljung test"
## $ data.name: chr "m3$residuals"
## - attr(*, "class")= chr "htest"
##
## Box-Ljung test
##
## data: m3$residuals
## X-squared = 0.17481, df = 5, p-value = 0.9994
## X-squared
## 0.6758749
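For reference, the Ljung-Box statistic behind this test has the standard form below. Here \(T\) is the sample size, \(m\) the number of lags tested and \(g\) the number of estimated AR or MA coefficients; the symbols \(Q(m)\) and \(\hat{\rho}_k\) (the lag-\(k\) sample autocorrelation of the residuals) are notation introduced here, not taken from the text above.

\[Q(m) = T(T+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^{2}}{T-k}, \qquad Q(m) \sim \chi^2_{m-g} \text{ asymptotically under the null of no serial correlation}\]

A small \(Q(m)\), and hence a large p-value, means the residual autocorrelations up to lag \(m\) are jointly indistinguishable from zero.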
For MA(2) the same test has \(5 - 2 = 3\) degrees of freedom, with \(m = 5\) as before. The last value below, 0.982, is the p-value at 3 degrees of freedom, so the null hypothesis again cannot be rejected. Note, however, that the printed output is labelled m3$residuals and repeats the AR(4) statistic; the test needs to be run on m5$residuals to confirm formally that the MA(2) residuals show no serial correlation.
## List of 5
## $ statistic: Named num 0.175
## ..- attr(*, "names")= chr "X-squared"
## $ parameter: Named num 5
## ..- attr(*, "names")= chr "df"
## $ p.value : num 0.999
## $ method : chr "Box-Ljung test"
## $ data.name: chr "m3$residuals"
## - attr(*, "class")= chr "htest"
##
## Box-Ljung test
##
## data: m3$residuals
## X-squared = 0.17481, df = 5, p-value = 0.9994
## X-squared
## 0.9815501
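The degrees-of-freedom adjustment for an MA(2) fit can be sketched with the `fitdf` argument of `Box.test()`. The series below is simulated with illustrative coefficients and only stands in for Yt, and the object names `fit` and `bt` are hypothetical, so the statistic and p-value will differ from those of the real data.

```r
set.seed(123)
# Simulated MA(2) series standing in for Yt (illustrative coefficients only)
y <- arima.sim(model = list(ma = c(0.03, 0.37)), n = 275)
fit <- arima(y, order = c(0, 0, 2))

# fitdf = 2 subtracts the two estimated MA coefficients: df = 5 - 2 = 3
bt <- Box.test(fit$residuals, lag = 5, type = "Ljung-Box", fitdf = 2)
bt$parameter  # degrees of freedom actually used
bt$p.value    # p-value at the adjusted degrees of freedom

# The same adjustment done by hand from the test statistic
pchisq(unname(bt$statistic), df = 5 - 2, lower.tail = FALSE)
```

Passing `fitdf` avoids the separate manual p-value step and keeps the residual series and the degrees of freedom in a single call, which makes it harder to test the wrong model's residuals.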
Judged on the Real Personal Consumption Expenditures data, AR(4) and MA(2) are both adequate.