Q3

We consider the monthly time series of the Consumer Price Index for All Urban Consumers: All Items Less Food and Energy, taken from the FRED database.

First we load the Quandl package along with the other packages we need.

library("Quandl")
library("tseries")
library("urca")
library("stargazer")
library("TTR")

The symbol is FRED/CPILFENS. First, let's compute the trailing 12-month average of the CPI using a simple moving average. Then let's plot the raw series, the log series, the first differences, and the second differences.

cpil<-Quandl("FRED/CPILFENS", type="zoo")
smacpil<-SMA(cpil,12)
acpil<-window(smacpil,start="Dec 1957")
lcpil<-log(acpil)
dlcpil<-diff(lcpil)
ddlcpil<-diff(diff(lcpil))
par(mfrow=c(2,2))
plot(acpil,xlab="", ylab="",main=expression(SMAy))
plot(lcpil,xlab="", ylab="",main=expression(log(SMAy)))
plot(dlcpil,xlab="", ylab="",main=expression(paste(Delta, "log(SMAy)")))
plot(ddlcpil,xlab="", ylab="",main=expression(paste(Delta, Delta, "log(SMAy)")))
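
To make the smoothing step concrete, here is a minimal sketch of a trailing 12-period simple moving average in base R. It uses made-up values rather than the CPI data, and it assumes that SMA(cpil, 12) computes the same equally weighted trailing mean; the names x and sma12 are illustrative only.

```r
# Toy monthly index (hypothetical values, not CPI data)
x <- c(100, 101, 103, 102, 105, 107, 108, 110, 111, 113, 114, 116, 118, 119)

# Trailing 12-period average: equal weights on the current and previous 11 values
sma12 <- stats::filter(x, rep(1 / 12, 12), sides = 1)

# The first 11 entries are NA because a full 12-month window is not yet available
head(sma12, 12)

# Check the 12th value against a direct mean of the first 12 observations
all.equal(as.numeric(sma12[12]), mean(x[1:12]))
```

The leading NA values are presumably why the smoothed series is windowed to start in Dec 1957: if the raw data begin in Jan 1957, that is the first month with a full 12-month window.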

Looking at the plotted series, the second differences appear to be stationary. However, let us conduct the ADF and KPSS tests on the level of the simple moving average series.

adf.cpil<-ur.df(acpil,type="trend",selectlags="BIC")
kpss.cpil<-kpss.test(acpil,null="Trend")
## Warning in kpss.test(acpil, null = "Trend"): p-value smaller than printed
## p-value
summary(adf.cpil)

Augmented Dickey-Fuller Test Unit Root Test

Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
      Min        1Q    Median        3Q       Max
-0.081169 -0.008533  0.000466  0.009322  0.119424

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.580e-03  1.532e-03   1.684 0.092647 .
z.lag.1     -2.555e-04  7.039e-05  -3.630 0.000304 ***
tt           9.511e-05  2.605e-05   3.651 0.000281 ***
z.diff.lag   9.833e-01  5.253e-03 187.190  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01774 on 690 degrees of freedom
Multiple R-squared:  0.9891, Adjusted R-squared:  0.9891
F-statistic: 2.09e+04 on 3 and 690 DF,  p-value: < 2.2e-16

Value of test-statistic is: -3.63 5.5412 6.6752

Critical values for test statistics:
      1pct  5pct 10pct
tau3 -3.96 -3.41 -3.12
phi2  6.09  4.68  4.03
phi3  8.27  6.25  5.34

kpss.cpil
KPSS Test for Trend Stationarity

data:  acpil
KPSS Trend = 1.5225, Truncation lag parameter = 6, p-value = 0.01

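Looking at the ADF test, the statistic \(\tau_3 = -3.63\) lies below the 5% critical value (-3.41) but above the 1% critical value (-3.96), so we can reject the unit-root null \(\gamma = 0\) at the 5% level but not at the 1% level. The joint statistics \(\phi_2 = 5.54\) and \(\phi_3 = 6.68\) likewise exceed their 5% critical values (4.68 and 6.25), so we also reject the joint nulls that include a zero trend coefficient \(\beta\). Taken alone, the ADF test points to the 12-month SMA CPI data being trend stationary. The KPSS test does not confirm this: its null hypothesis is trend stationarity, and with a p-value below 0.01 that null is rejected. The two tests disagree, so the evidence on trend stationarity is mixed.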
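
Reading the ADF output amounts to comparing each statistic with its critical values. A minimal sketch that only re-uses the numbers printed above (the object names stat, crit and reject are illustrative):

```r
# Test statistics and critical values copied from the ur.df summary above
stat <- c(tau3 = -3.63, phi2 = 5.5412, phi3 = 6.6752)
crit <- rbind(
  tau3 = c(-3.96, -3.41, -3.12),
  phi2 = c( 6.09,  4.68,  4.03),
  phi3 = c( 8.27,  6.25,  5.34)
)
colnames(crit) <- c("1pct", "5pct", "10pct")

# tau3 is a left-tailed test: reject when the statistic is below the critical value.
# phi2 and phi3 are F-type tests: reject when the statistic is above it.
reject <- rbind(
  tau3 = stat["tau3"] < crit["tau3", ],
  phi2 = stat["phi2"] > crit["phi2", ],
  phi3 = stat["phi3"] > crit["phi3", ]
)
reject
```

All three nulls are rejected at the 5% and 10% levels, and none is rejected at the 1% level.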

Box-Jenkins Methodology

We truncate the data at the end of 2014. The plots at the beginning of this exercise suggested that the second differences of the log series are stationary, but because the sample is now shorter we double-check by plotting the truncated series again.

acpil2<-window(acpil, end="Dec 2014")
lcpil2<-log(acpil2)
dlcpil2<-diff(lcpil2)
ddlcpil2<-diff(diff(lcpil2))
par(mfrow=c(2,2))
plot(acpil2,xlab="", ylab="",main=expression(SMAy[2014]))
plot(lcpil2,xlab="", ylab="",main=expression(log(SMAy[2014])))
plot(dlcpil2,xlab="", ylab="",main=expression(paste(Delta, log(SMAy[2014]))))
plot(ddlcpil2,xlab="", ylab="",main=expression(paste(Delta, Delta, log(SMAy[2014]))))

The lower right-hand chart confirms that the second differences still look stationary on the truncated sample, so we model the second-differenced log series.
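
As a reminder of what the second-difference transform does, the sketch below checks on a few made-up log values that diff(diff(y)) matches both diff(y, differences = 2) and the explicit formula y_t - 2*y_{t-1} + y_{t-2}. The numbers are hypothetical and serve only to illustrate the identity.

```r
# Hypothetical log-level values, for illustration only
y <- log(c(100, 102, 105, 109, 114, 120))

d2_nested  <- diff(diff(y))                 # form used in the text
d2_direct  <- diff(y, differences = 2)      # equivalent built-in form
d2_formula <- y[3:6] - 2 * y[2:5] + y[1:4]  # y_t - 2*y_{t-1} + y_{t-2}

all.equal(d2_nested, d2_direct)
all.equal(d2_nested, d2_formula)
length(y) - length(d2_nested)  # two observations are lost to differencing
```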

ACF & PACF Suggestions for Estimated Model

Let's look at the ACF and PACF of our second-differenced data.

par(mfrow=c(1,2))
acf(ddlcpil2,type='correlation',na.action=na.pass,lag.max=96)
acf(ddlcpil2,type='partial',na.action=na.pass,lag.max=96)

Looking at the ACF and PACF, the second-differenced series appears to follow an AR(2) process. Let's look at the Ljung-Box Q statistic to check.
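
The pattern one looks for when identifying an AR(2) is an ACF that decays gradually and a PACF that cuts off after lag 2. The sketch below shows this for the theoretical ACF and PACF of an AR(2) with hypothetical coefficients (0.5 and 0.3 are chosen for illustration and are not estimates from the CPI data).

```r
# Hypothetical AR(2) coefficients, chosen only for illustration
phi <- c(0.5, 0.3)

# Theoretical ACF (lags 0..6) and PACF (lags 1..6) implied by these coefficients
rho         <- ARMAacf(ar = phi, lag.max = 6)
pacf_theory <- ARMAacf(ar = phi, lag.max = 6, pacf = TRUE)

round(rho, 3)          # decays gradually
round(pacf_theory, 3)  # zero beyond lag 2; the lag-2 value equals phi[2]
```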

Ljung-Box Q Statistic

ARMA20 <- arima(ddlcpil2, order=c(2,0,0))
tsdiag(ARMA20,gof.lag=12)

After trying various ARIMA specifications and comparing their Ljung-Box Q statistics, I settled on an AR(2) process for the model.
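
The Ljung-Box check can also be run as a single test rather than read off the tsdiag plot. A minimal sketch on a simulated AR(2) series that stands in for ddlcpil2 (the coefficients, seed and sample size are assumptions made for illustration):

```r
set.seed(1)
# Simulated AR(2) series standing in for ddlcpil2 (coefficients are hypothetical)
x <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)

fit <- arima(x, order = c(2, 0, 0))

# Ljung-Box Q test on the residuals; fitdf = 2 removes the two estimated
# AR terms from the degrees of freedom (12 - 2 = 10)
lb <- Box.test(residuals(fit), lag = 12, type = "Ljung-Box", fitdf = 2)
lb
```

A large p-value means the residuals show no detectable autocorrelation up to lag 12, which is what an adequate model should produce. As far as I am aware, the p-values plotted by tsdiag are computed without the fitdf adjustment, so they can differ somewhat from this version.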

Now I test the coefficients of our ARMA(2,0) model for significance.

##          ar1          ar2    intercept 
## 3.224088e-13 1.736295e-06 9.576907e-01
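
The code that produced these p-values is not shown. One common way to obtain values of this kind is a two-sided test based on the normal approximation, dividing each coefficient by its standard error. The sketch below applies that approach to a simulated AR(2) series; it is an assumption about the method, not the original code, and the data are synthetic.

```r
set.seed(2)
# Simulated AR(2) series (hypothetical coefficients), standing in for ddlcpil2
x   <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)
fit <- arima(x, order = c(2, 0, 0))

est  <- coef(fit)               # ar1, ar2, intercept
se   <- sqrt(diag(vcov(fit)))   # standard errors from the estimated covariance
z    <- est / se                # z-statistics
pval <- 2 * (1 - pnorm(abs(z))) # two-sided p-values
pval
```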

Both AR coefficients are significant at any conventional level, while the intercept is not (p-value of about 0.96). Next we do the BIC check.

BIC Check

BIC(ARMA20)
## [1] -9757.388
AR22<-arima(ddlcpil2, order=c(2,0,2))
BIC(AR22)
## [1] -9744.783

We see that the AR(2) model is better than an ARMA(2,2) because the former’s BIC value is lower.
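
For reference, BIC is -2 times the log-likelihood plus the number of estimated parameters times log(n), so extra parameters must buy enough likelihood to pay for the penalty. The sketch below checks this formula against BIC() on a simulated AR(2) fit (synthetic data, hypothetical coefficients) and then compares the two values reported above; bic_manual is an illustrative helper, not part of any package.

```r
set.seed(3)
x   <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)  # synthetic data
fit <- arima(x, order = c(2, 0, 0))

# Illustrative helper: BIC rebuilt from the log-likelihood object
bic_manual <- function(fit) {
  ll <- logLik(fit)
  -2 * as.numeric(ll) + attr(ll, "df") * log(attr(ll, "nobs"))
}
all.equal(bic_manual(fit), BIC(fit))

# BIC values reported above for the CPI models
bic_reported <- c(AR2 = -9757.388, ARMA22 = -9744.783)
names(which.min(bic_reported))                       # model with the lower BIC
unname(bic_reported["ARMA22"] - bic_reported["AR2"]) # size of the gap
```

The gap of about 12.6 BIC points in favor of the AR(2) is fairly large, so the two extra MA terms do not earn their penalty.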

Forecasts

Now we construct forecasts using the AR(2) model and assess how close they come to the actual values of the second-differenced log CPI series from Jan 2015 to Nov 2015. Then I plot that data together with the forecast through the end of 2016.

library("forecast")
ARMA20fore<-forecast(ARMA20,h=24)
plot(ARMA20fore, xlim=c(2005,2017),ylim=c(-0.0005,.0004))
lines(ARMA20fore$mean, type="p", pch=16, col="blue")
lines(ddlcpil, type="o", pch=16)

The blue dotted line is our forecast, and the black dotted line shows the actual values. The AR(2) forecast is relatively close to the actual values and seems reasonable.
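
Closeness is judged visually here. It could also be quantified with an out-of-sample error measure such as RMSE or MAE over the 11 held-out months. The sketch below does this on a simulated AR(2) series, using stats::predict in place of the forecast package so that it runs with base R alone; the data, coefficients and object names are all hypothetical.

```r
set.seed(4)
n <- 300
h <- 11  # hold-out length, mirroring the Jan-Nov 2015 comparison

# Simulated AR(2) series standing in for the second-differenced log CPI
x     <- as.numeric(arima.sim(model = list(ar = c(0.5, 0.3)), n = n))
train <- x[1:(n - h)]
test  <- x[(n - h + 1):n]

fit <- arima(train, order = c(2, 0, 0))
fc  <- predict(fit, n.ahead = h)   # point forecasts ($pred) and std. errors ($se)

err  <- test - as.numeric(fc$pred)
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))

# Share of held-out values inside an approximate 95% forecast interval
coverage <- mean(abs(err) <= 1.96 * as.numeric(fc$se))

c(RMSE = rmse, MAE = mae, coverage = coverage)
```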