consumption <- read.csv("http://research.stlouisfed.org/fred2/data/PCECC96.csv")
str(consumption)
## 'data.frame':    276 obs. of  2 variables:
##  $ DATE : Factor w/ 276 levels "1947-01-01","1947-04-01",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ VALUE: num  1199 1219 1223 1224 1230 ...
summary(consumption)
##          DATE         VALUE      
##  1947-01-01:  1   Min.   : 1199  
##  1947-04-01:  1   1st Qu.: 2212  
##  1947-07-01:  1   Median : 4050  
##  1947-10-01:  1   Mean   : 5004  
##  1948-01-01:  1   3rd Qu.: 7465  
##  1948-04-01:  1   Max.   :11322  
##  (Other)   :270
head(consumption)
##         DATE  VALUE
## 1 1947-01-01 1199.4
## 2 1947-04-01 1219.3
## 3 1947-07-01 1223.3
## 4 1947-10-01 1223.6
## 5 1948-01-01 1229.8
## 6 1948-04-01 1244.1
tail(consumption)
##           DATE   VALUE
## 271 2014-07-01 10918.6
## 272 2014-10-01 11033.3
## 273 2015-01-01 11081.2
## 274 2015-04-01 11178.9
## 275 2015-07-01 11262.4
## 276 2015-10-01 11322.5
plot(consumption, xlab="Years", ylab="Consumption Expenditures", main="Trend of Consumption Expenditures from 1947 to 2015")
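
Note that read.csv imports DATE as a factor (see the str output above), so the x-axis of this plot is categorical rather than a true time axis. A minimal sketch of the fix, assuming the same data frame:

# Convert the factor dates to Date class so plot() draws a real time axis
consumption$DATE <- as.Date(consumption$DATE)
plot(consumption, type="l", xlab="Years", ylab="Consumption Expenditures", main="Trend of Consumption Expenditures from 1947 to 2015")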

The series is transformed to quarterly log differences, \[y_t = \Delta \log c_t = \log c_t - \log c_{t-1},\] where \(c_t\) is the original quarterly Real Personal Consumption Expenditures. Working in log differences helps make the series stationary.

dlconsumption <- diff(log(consumption[,2]))
plot(consumption[2:276, 1], dlconsumption, xlab="Years", ylab="", main="Logarithmic change in consumption expenditure from 1947 to 2015")
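
The stationarity claim above can be checked formally with an augmented Dickey-Fuller test. A minimal sketch, assuming the tseries package is installed (it is not part of the original analysis):

library(tseries)
# Null hypothesis: unit root (non-stationary); a small p-value supports stationarity
adf.test(dlconsumption, alternative="stationary")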

The autocorrelation function (ACF) and partial autocorrelation function (PACF) describe how observations in the series relate to their own past values, so both are plotted next.

acf(dlconsumption, type="correlation", lag.max=275, xlab="Lag", ylab="correlations", main="ACF")

acf(dlconsumption, type="partial", lag.max=275, xlab="Lag", ylab="correlations", main="PACF")

For an AR(p) model the PACF cuts off after lag p, while for an MA(q) model the ACF cuts off after lag q. In our case neither plot shows a clean cutoff, although both change behavior after roughly lag three, so low-order candidates on each side are worth trying; an automated order search, sketched below, offers a cross-check.
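
A sketch of that cross-check, assuming the forecast package is available:

library(forecast)
# Search ARMA(p,q) orders on the differenced log series, ranked by AICc
auto.arima(dlconsumption, d=0, max.p=5, max.q=5, seasonal=FALSE, stepwise=FALSE)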

ESTIMATION AND CHECKING FOR ADEQUACY OF MODELS

Designing AR(p) Models

Let's look at the AR(1) through AR(5) models, using AIC as the first criterion when examining model adequacy.

ar1 <- arima(dlconsumption, order=c(1,0,0))
ar1
## 
## Call:
## arima(x = dlconsumption, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.0893     0.0082
## s.e.  0.0601     0.0005
## 
## sigma^2 estimated as 6.649e-05:  log likelihood = 932.32,  aic = -1858.64
tsdiag(ar1, gof.lag=10)

ar2 <- arima(dlconsumption, order=c(2,0,0))
ar2
## 
## Call:
## arima(x = dlconsumption, order = c(2, 0, 0))
## 
## Coefficients:
##          ar1     ar2  intercept
##       0.0599  0.3188     0.0082
## s.e.  0.0571  0.0570     0.0007
## 
## sigma^2 estimated as 5.968e-05:  log likelihood = 947.08,  aic = -1886.16
tsdiag(ar2, gof.lag=10)

ar3 <- arima(dlconsumption, order=c(3,0,0))
ar3
## 
## Call:
## arima(x = dlconsumption, order = c(3, 0, 0))
## 
## Coefficients:
##          ar1     ar2     ar3  intercept
##       0.0545  0.3178  0.0165     0.0082
## s.e.  0.0604  0.0571  0.0603     0.0008
## 
## sigma^2 estimated as 5.966e-05:  log likelihood = 947.12,  aic = -1884.23
tsdiag(ar3, gof.lag=10)

ar4 <- arima(dlconsumption, order=c(4,0,0))
ar4
## 
## Call:
## arima(x = dlconsumption, order = c(4, 0, 0))
## 
## Coefficients:
##          ar1     ar2     ar3      ar4  intercept
##       0.0570  0.3647  0.0244  -0.1444     0.0082
## s.e.  0.0597  0.0598  0.0598   0.0596     0.0007
## 
## sigma^2 estimated as 5.84e-05:  log likelihood = 950.02,  aic = -1888.03
tsdiag(ar4, gof.lag=10)

ar5 <- arima(dlconsumption, order=c(5,0,0))
ar5
## 
## Call:
## arima(x = dlconsumption, order = c(5, 0, 0))
## 
## Coefficients:
##          ar1     ar2     ar3      ar4      ar5  intercept
##       0.0570  0.3647  0.0245  -0.1444  -0.0003     0.0082
## s.e.  0.0604  0.0598  0.0639   0.0597   0.0603     0.0007
## 
## sigma^2 estimated as 5.84e-05:  log likelihood = 950.02,  aic = -1886.03
tsdiag(ar5, gof.lag=10)

Because the sample size is relatively small (276 observations), BIC, which penalizes additional parameters more heavily than AIC, provides a second criterion for judging model adequacy. It is computed for each model below.

BIC(ar1)
## [1] -1847.791
BIC(ar2)
## [1] -1871.691
BIC(ar3)
## [1] -1866.149
BIC(ar4)
## [1] -1866.332
BIC(ar5)
## [1] -1860.715
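
Rather than reading the criteria off one model at a time, the five fits can also be collected into a single comparison table. A compact sketch over the same AR models:

# Refit AR(1)-AR(5) in a loop and tabulate AIC and BIC side by side
orders <- 1:5
fits <- lapply(orders, function(p) arima(dlconsumption, order=c(p, 0, 0)))
data.frame(p=orders, AIC=sapply(fits, AIC), BIC=sapply(fits, BIC))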

Among the five AR models, AR(4) has the lowest AIC (-1888.03), but AR(2) has the lowest BIC (-1871.691). Since BIC is the more reliable criterion in smaller samples, AR(2) is the preferred model. For adequacy, I examined the standardized residual plots, the ACF of the residuals, and the Ljung-Box p-values produced by tsdiag. AR(1) has low Ljung-Box p-values, and its residual ACF only dies out after lag 2. For AR(2) and AR(3) the residual autocorrelations are effectively zero and the Ljung-Box p-values exceed 0.6; for AR(4) and AR(5) the residual autocorrelations are effectively zero and the p-values are close to 1. This makes AR(2) an adequate model; its Ljung-Box statistic can also be computed directly, as sketched below.
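
A sketch of that direct computation for the chosen AR(2) fit, where fitdf=2 accounts for the two estimated AR coefficients:

# Ljung-Box test on the AR(2) residuals at lag 10
Box.test(residuals(ar2), lag=10, type="Ljung-Box", fitdf=2)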

Constructing MA(q) Models

ma1 <- arima(dlconsumption, order=c(0,0,1))
ma1
## 
## Call:
## arima(x = dlconsumption, order = c(0, 0, 1))
## 
## Coefficients:
##          ma1  intercept
##       0.0546     0.0082
## s.e.  0.0472     0.0005
## 
## sigma^2 estimated as 6.67e-05:  log likelihood = 931.89,  aic = -1857.78
tsdiag(ma1, gof.lag=10)

ma2 <- arima(dlconsumption, order=c(0,0,2))
ma2
## 
## Call:
## arima(x = dlconsumption, order = c(0, 0, 2))
## 
## Coefficients:
##          ma1     ma2  intercept
##       0.0268  0.3660     0.0082
## s.e.  0.0567  0.0586     0.0006
## 
## sigma^2 estimated as 5.889e-05:  log likelihood = 948.88,  aic = -1889.77
tsdiag(ma2, gof.lag=10)

ma3 <- arima(dlconsumption, order=c(0,0,3))
ma3
## 
## Call:
## arima(x = dlconsumption, order = c(0, 0, 3))
## 
## Coefficients:
##          ma1     ma2     ma3  intercept
##       0.0543  0.3687  0.0695     0.0082
## s.e.  0.0604  0.0580  0.0578     0.0007
## 
## sigma^2 estimated as 5.858e-05:  log likelihood = 949.6,  aic = -1889.21
tsdiag(ma3, gof.lag=10)

ma4 <- arima(dlconsumption, order=c(0,0,4))
ma4
## 
## Call:
## arima(x = dlconsumption, order = c(0, 0, 4))
## 
## Coefficients:
##          ma1     ma2     ma3      ma4  intercept
##       0.0541  0.3673  0.0696  -0.0057     0.0082
## s.e.  0.0604  0.0609  0.0579   0.0748     0.0007
## 
## sigma^2 estimated as 5.857e-05:  log likelihood = 949.61,  aic = -1887.21
tsdiag(ma4, gof.lag=10)

ma5 <- arima(dlconsumption, order=c(0,0,5))
ma5
## 
## Call:
## arima(x = dlconsumption, order = c(0, 0, 5))
## 
## Coefficients:
##          ma1     ma2     ma3      ma4     ma5  intercept
##       0.0544  0.3667  0.0786  -0.0073  0.0136     0.0082
## s.e.  0.0605  0.0610  0.0748   0.0755  0.0715     0.0007
## 
## sigma^2 estimated as 5.857e-05:  log likelihood = 949.63,  aic = -1885.25
tsdiag(ma5, gof.lag=10)

BIC(ma1)
## [1] -1846.933
BIC(ma2)
## [1] -1875.302
BIC(ma3)
## [1] -1871.125
BIC(ma4)
## [1] -1865.514
BIC(ma5)
## [1] -1859.933

MA(2) has both the lowest AIC (-1889.77) and the lowest BIC (-1875.302) among the five MA models. For adequacy, I again checked the standardized residuals, the ACF of the residuals, and the Ljung-Box p-values. For MA(1) the Ljung-Box p-values are low after lag 2 and the residual ACF is only close to zero; in all the other cases the Ljung-Box p-values exceed 0.6 and the residual autocorrelations are effectively zero. This makes MA(2) the best MA model; a head-to-head comparison with AR(2) follows.
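
A short sketch of that comparison, using the fits estimated above:

# Direct AIC/BIC comparison of the two candidate models
rbind(AR2=c(AIC=AIC(ar2), BIC=BIC(ar2)), MA2=c(AIC=AIC(ma2), BIC=BIC(ma2)))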

Hence, AR(2) and MA(2) are the best of the ten models studied.
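
As a final illustration, the chosen model can be used to forecast the next few quarters of log growth in consumption. A sketch with the AR(2) fit, forecasting one year (four quarters) ahead:

# Point forecasts and standard errors for the next four quarters of log change
fc <- predict(ar2, n.ahead=4)
fc$pred
fc$se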