consumption <- read.csv("http://research.stlouisfed.org/fred2/data/PCECC96.csv")
str(consumption)
## 'data.frame': 276 obs. of 2 variables:
## $ DATE : Factor w/ 276 levels "1947-01-01","1947-04-01",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ VALUE: num 1199 1219 1223 1224 1230 ...
summary(consumption)
## DATE VALUE
## 1947-01-01: 1 Min. : 1199
## 1947-04-01: 1 1st Qu.: 2212
## 1947-07-01: 1 Median : 4050
## 1947-10-01: 1 Mean : 5004
## 1948-01-01: 1 3rd Qu.: 7465
## 1948-04-01: 1 Max. :11322
## (Other) :270
head(consumption)
## DATE VALUE
## 1 1947-01-01 1199.4
## 2 1947-04-01 1219.3
## 3 1947-07-01 1223.3
## 4 1947-10-01 1223.6
## 5 1948-01-01 1229.8
## 6 1948-04-01 1244.1
tail(consumption)
## DATE VALUE
## 271 2014-07-01 10918.6
## 272 2014-10-01 11033.3
## 273 2015-01-01 11081.2
## 274 2015-04-01 11178.9
## 275 2015-07-01 11262.4
## 276 2015-10-01 11322.5
plot(as.Date(consumption$DATE), consumption$VALUE, type="l", xlab="Years", ylab="Consumption Expenditures", main="Trend of Consumption Expenditures from 1947 to 2015")
The series to be modeled is constructed as \[y_t = \Delta \log c_t = \log c_t - \log c_{t-1}\] where \(c_t\) is the original quarterly Real Personal Consumption Expenditures. Taking log differences helps make the time series stationary.
dlconsumption <- diff(log(consumption[,2]))
plot(as.Date(consumption[2:276,1]), dlconsumption, type="l", xlab="Years", ylab="", main="Logarithmic change in consumption expenditure from 1947 to 2015")
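As a small worked check of the transformation, the sketch below applies it to the first three quarterly values shown in the `head(consumption)` output. The variable names (`c_t`, `y_manual`, `y_diff`, `growth`) are introduced here for illustration only. It shows that `diff(log(x))` is the same as subtracting consecutive logs, and that the result is close to the simple quarterly growth rate.

```r
# First three quarterly values, copied from the head(consumption) output
c_t <- c(1199.4, 1219.3, 1223.3)

# Log difference written out: y_t = log(c_t) - log(c_{t-1})
y_manual <- log(c_t[-1]) - log(c_t[-length(c_t)])

# Same quantity using diff(), as in the construction of dlconsumption
y_diff <- diff(log(c_t))

# Simple growth rate c_t / c_{t-1} - 1, for comparison
growth <- c_t[-1] / c_t[-length(c_t)] - 1

print(round(cbind(y_manual, y_diff, growth), 5))
```

For small quarterly changes the log difference and the growth rate nearly coincide, which is why `dlconsumption` can be read as an approximate quarterly growth rate.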
The autocorrelation function (ACF) and the partial autocorrelation function (PACF) show how observations are related across time lags, so both are plotted for the log-differenced series.
acf(dlconsumption, type="correlation", lag.max=275, xlab="Lag", ylab="Correlation", main="ACF")
acf(dlconsumption, type="partial", lag.max=275, xlab="Lag", ylab="Partial correlation", main="PACF")
In an AR(p) model, the PACF cuts off after lag p, while in an MA(q) model, the ACF cuts off after lag q. In our case, neither the ACF nor the PACF shows a clear cutoff. However, the pattern appears to change after lag three.
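The cutoff rule can be illustrated with theoretical (population) values rather than sample estimates. The sketch below uses `ARMAacf()` from the base `stats` package. The coefficients are assumed purely for illustration and are not taken from any fitted model in this report.

```r
# Illustrative coefficients (assumed for this sketch only)
ar_coef <- c(0.1, 0.3)   # AR(2) process
ma_coef <- c(0.1, 0.4)   # MA(2) process

# Theoretical PACF of the AR(2) process at lags 1..6
ar_pacf <- ARMAacf(ar = ar_coef, lag.max = 6, pacf = TRUE)

# Theoretical ACF of the MA(2) process at lags 0..6; drop lag 0
ma_acf <- ARMAacf(ma = ma_coef, lag.max = 6)[-1]

print(round(ar_pacf, 4))  # nonzero at lags 1-2, zero afterwards
print(round(ma_acf, 4))   # nonzero at lags 1-2, zero afterwards
```

Sample ACF and PACF plots only approximate this pattern, which is one reason the cutoff can be hard to read from real data.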
ESTIMATION AND CHECKING FOR ADEQUACY OF MODELS
Constructing AR(p) Models
Let's look at the AR(1) to AR(5) models. The AIC value is used to compare the adequacy of the models.
ar1 <- arima(dlconsumption, order=c(1,0,0))
ar1
##
## Call:
## arima(x = dlconsumption, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.0893 0.0082
## s.e. 0.0601 0.0005
##
## sigma^2 estimated as 6.649e-05: log likelihood = 932.32, aic = -1858.64
tsdiag(ar1, gof.lag=10)
ar2 <- arima(dlconsumption, order=c(2,0,0))
ar2
##
## Call:
## arima(x = dlconsumption, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 0.0599 0.3188 0.0082
## s.e. 0.0571 0.0570 0.0007
##
## sigma^2 estimated as 5.968e-05: log likelihood = 947.08, aic = -1886.16
tsdiag(ar2, gof.lag=10)
ar3 <- arima(dlconsumption, order=c(3,0,0))
ar3
##
## Call:
## arima(x = dlconsumption, order = c(3, 0, 0))
##
## Coefficients:
## ar1 ar2 ar3 intercept
## 0.0545 0.3178 0.0165 0.0082
## s.e. 0.0604 0.0571 0.0603 0.0008
##
## sigma^2 estimated as 5.966e-05: log likelihood = 947.12, aic = -1884.23
tsdiag(ar3, gof.lag=10)
ar4 <- arima(dlconsumption, order=c(4,0,0))
ar4
##
## Call:
## arima(x = dlconsumption, order = c(4, 0, 0))
##
## Coefficients:
## ar1 ar2 ar3 ar4 intercept
## 0.0570 0.3647 0.0244 -0.1444 0.0082
## s.e. 0.0597 0.0598 0.0598 0.0596 0.0007
##
## sigma^2 estimated as 5.84e-05: log likelihood = 950.02, aic = -1888.03
tsdiag(ar4, gof.lag=10)
ar5 <- arima(dlconsumption, order=c(5,0,0))
ar5
##
## Call:
## arima(x = dlconsumption, order = c(5, 0, 0))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 intercept
## 0.0570 0.3647 0.0245 -0.1444 -0.0003 0.0082
## s.e. 0.0604 0.0598 0.0639 0.0597 0.0603 0.0007
##
## sigma^2 estimated as 5.84e-05: log likelihood = 950.02, aic = -1886.03
tsdiag(ar5, gof.lag=10)
Because the sample is relatively small (276 quarterly observations, 275 after differencing), BIC provides a second criterion for judging model adequacy. BIC is calculated for each model.
BIC(ar1)
## [1] -1847.791
BIC(ar2)
## [1] -1871.691
BIC(ar3)
## [1] -1866.149
BIC(ar4)
## [1] -1866.332
BIC(ar5)
## [1] -1860.715
Among the five AR models, AR(4) has the lowest AIC (-1888.03), whereas AR(2) has the lowest BIC (-1871.691). Because the sample is small and BIC penalizes additional parameters more heavily than AIC, AR(2) is the preferred model. To check adequacy, I examined the standardized residual plot, the ACF of the residuals, and the p-values of the Ljung-Box statistic. For AR(1), the Ljung-Box p-values are low, and the residual ACF is close to zero only after lag 2. For AR(2) and AR(3), the residual ACF is close to zero and the Ljung-Box p-values are above 0.6. For AR(4) and AR(5), the residual ACF is close to zero and the Ljung-Box p-values are close to 1. AR(2) is therefore an adequate model.
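The AIC and BIC values reported for the AR models can be reproduced, up to rounding, from the printed log likelihoods. The sketch below assumes the usual definitions AIC = -2 logL + 2k and BIC = -2 logL + k log(n), with k counting the AR coefficients, the intercept, and the innovation variance, and n = 275 observations after differencing. The vector names are introduced here for illustration.

```r
# Log likelihoods copied from the arima() output for AR(1)..AR(5)
loglik <- c(932.32, 947.08, 947.12, 950.02, 950.02)
p <- 1:5

# Parameters per model: p AR coefficients + intercept + sigma^2
k <- p + 2

# Observations after differencing the 276 quarterly values
n <- 275

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(n)

print(round(data.frame(model = paste0("AR(", p, ")"), aic, bic), 2))
print(which.min(aic))  # order with the lowest AIC
print(which.min(bic))  # order with the lowest BIC
```

Each extra parameter costs 2 under AIC but log(275), roughly 5.6, under BIC, which is why the two criteria can disagree on the order.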
Constructing MA(q) Models
ma1 <- arima(dlconsumption, order=c(0,0,1))
ma1
##
## Call:
## arima(x = dlconsumption, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## 0.0546 0.0082
## s.e. 0.0472 0.0005
##
## sigma^2 estimated as 6.67e-05: log likelihood = 931.89, aic = -1857.78
tsdiag(ma1, gof.lag=10)
ma2 <- arima(dlconsumption, order=c(0,0,2))
ma2
##
## Call:
## arima(x = dlconsumption, order = c(0, 0, 2))
##
## Coefficients:
## ma1 ma2 intercept
## 0.0268 0.3660 0.0082
## s.e. 0.0567 0.0586 0.0006
##
## sigma^2 estimated as 5.889e-05: log likelihood = 948.88, aic = -1889.77
tsdiag(ma2, gof.lag=10)
ma3 <- arima(dlconsumption, order=c(0,0,3))
ma3
##
## Call:
## arima(x = dlconsumption, order = c(0, 0, 3))
##
## Coefficients:
## ma1 ma2 ma3 intercept
## 0.0543 0.3687 0.0695 0.0082
## s.e. 0.0604 0.0580 0.0578 0.0007
##
## sigma^2 estimated as 5.858e-05: log likelihood = 949.6, aic = -1889.21
tsdiag(ma3, gof.lag=10)
ma4 <- arima(dlconsumption, order=c(0,0,4))
ma4
##
## Call:
## arima(x = dlconsumption, order = c(0, 0, 4))
##
## Coefficients:
## ma1 ma2 ma3 ma4 intercept
## 0.0541 0.3673 0.0696 -0.0057 0.0082
## s.e. 0.0604 0.0609 0.0579 0.0748 0.0007
##
## sigma^2 estimated as 5.857e-05: log likelihood = 949.61, aic = -1887.21
tsdiag(ma4, gof.lag=10)
ma5 <- arima(dlconsumption, order=c(0,0,5))
ma5
##
## Call:
## arima(x = dlconsumption, order = c(0, 0, 5))
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 intercept
## 0.0544 0.3667 0.0786 -0.0073 0.0136 0.0082
## s.e. 0.0605 0.0610 0.0748 0.0755 0.0715 0.0007
##
## sigma^2 estimated as 5.857e-05: log likelihood = 949.63, aic = -1885.25
tsdiag(ma5, gof.lag=10)
BIC(ma1)
## [1] -1846.933
BIC(ma2)
## [1] -1875.302
BIC(ma3)
## [1] -1871.125
BIC(ma4)
## [1] -1865.514
BIC(ma5)
## [1] -1859.933
MA(2) has the lowest AIC of the five MA models (-1889.77) and also the lowest BIC (-1875.302). To check adequacy, I examined the standardized residuals, the ACF of the residuals, and the p-values of the Ljung-Box statistic. For MA(1), the Ljung-Box p-values are low beyond lag 2, although the residual ACF is close to zero. For all the other models, the Ljung-Box p-values are greater than 0.6 and the residual autocorrelations are close to zero. MA(2) is therefore the best MA model.
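The Ljung-Box p-values read off the diagnostic plots can also be obtained numerically with `Box.test()`. The sketch below is a minimal illustration on simulated data, since it does not download the FRED series: the MA coefficients, standard deviation, and seed are assumptions chosen only to resemble the scale of `dlconsumption`, and the choice `fitdf = 2` (subtracting the two estimated MA coefficients from the degrees of freedom) is one possible convention.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Simulated stand-in for dlconsumption: MA(2) with assumed coefficients
x <- arima.sim(model = list(ma = c(0.05, 0.35)), n = 275, sd = 0.008)

# Fit the same model class as ma2 in the text
fit <- arima(x, order = c(0, 0, 2))

# Ljung-Box test on the residuals up to lag 10
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
print(lb)
```

A large p-value means the test finds no evidence of remaining autocorrelation in the residuals, which is the sense in which a model is called adequate here.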
Hence, AR(2) and MA(2) are the best of the ten models studied.