Direction

Consider monthly time series or consumer price index, all item less food and energy. Use it to construct 12 month inflation rate. Test for non stationarity and build the model based on Box-Jenkins Methodology until the end of 2014 then forecast until 2016.

Inflation Data

Construct 12 month inflation rate using CPI dataset:

\(Inf_{t}= \frac{CPI_{t} - CPI_{t-12}}{CPI_{t-12}}*100\)

cpit <- Quandl("FRED/CPILFENS", type="ts")
inftall <- (diff(cpit,12)/lag(cpit,-12))*100
plot(inftall, main="12 month Inflation Rate", ylab="% Changes")

For the forecast purpose we have to split the data into two part (up to 2014 and after 2014)

inf <- inftall
inf1 <- window(inf, end=c(2014,12))
inf2 <- window(inf, start=c(2015,1))

# first part used to identify and estimate the model
inft <- inf1

Stationary Test

There will be no transformation needed in the data since we are dealing with Inflation rate, so we can doing the ADF test straightforward

inf.urdf<-ur.df(inft, type="trend", selectlags="BIC")
summary(inf.urdf)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.26363 -0.11118 -0.01523  0.12409  1.82953 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0424792  0.0259896   1.634   0.1026    
## z.lag.1     -0.0072426  0.0036857  -1.965   0.0498 *  
## tt          -0.0000481  0.0000486  -0.990   0.3226    
## z.diff.lag   0.3506552  0.0359593   9.751   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2419 on 678 degrees of freedom
## Multiple R-squared:  0.1262, Adjusted R-squared:  0.1223 
## F-statistic: 32.64 on 3 and 678 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -1.965 1.3851 2.0639 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2  6.09  4.68  4.03
## phi3  8.27  6.25  5.34

We can see that \(\tau_3\) is bigger than the critical value, thus, we cannot reject hypothesis null. Looking at \(\phi_3\) we can also see that hypothesis null cannot be rejected at any level of confidence interval. Estimating model B, the result is:

inf.urdf<-ur.df(inft, type="drift", selectlags="BIC")
summary(inf.urdf)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.26920 -0.11666 -0.01028  0.12278  1.82743 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.022548   0.016431   1.372   0.1704    
## z.lag.1     -0.006333   0.003569  -1.774   0.0765 .  
## z.diff.lag   0.351152   0.035955   9.766   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2419 on 679 degrees of freedom
## Multiple R-squared:  0.1249, Adjusted R-squared:  0.1224 
## F-statistic: 48.47 on 2 and 679 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -1.7743 1.5878 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1  6.43  4.59  3.78

Value of \(\tau_2\) is bigger than the critical value so we cannot reject hypothesis null. If we see at \(\phi_1\) we also get that hypothesis null cannot be rejected at any level of confidence interval. Thus, we have to estimate model A

inf.urdf<-ur.df(inft, type="none", selectlags="BIC")
summary(inf.urdf)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.30086 -0.10505  0.00215  0.12933  1.81181 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.002287   0.002013  -1.136    0.256    
## z.diff.lag  0.348722   0.035935   9.704   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.242 on 680 degrees of freedom
## Multiple R-squared:  0.1226, Adjusted R-squared:   0.12 
## F-statistic:  47.5 on 2 and 680 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -1.1361 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

From estimation result of model A, we see that \(\tau_1\) t-statistics is bigger than the critical value, concluding that we are dealing with random walk data set. This result confirmed with KPSS test.

kpss.test(inft, null="Trend")
## Warning in kpss.test(inft, null = "Trend"): p-value smaller than printed p-
## value
## 
##  KPSS Test for Trend Stationarity
## 
## data:  inft
## KPSS Trend = 1.3385, Truncation lag parameter = 6, p-value = 0.01

So, the next thing that we need to do is to difference the data and re-checking the ADF test

dinft<-diff(inft)
dinf.urdf<-ur.df(dinft, type="trend", selectlags="BIC")
summary(dinf.urdf)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.36961 -0.10403 -0.00324  0.11885  1.77956 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.868e-03  1.833e-02   0.429    0.668    
## z.lag.1     -5.358e-01  4.313e-02 -12.422  < 2e-16 ***
## tt          -2.526e-05  4.649e-05  -0.543    0.587    
## z.diff.lag  -1.794e-01  3.774e-02  -4.752 2.46e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2384 on 677 degrees of freedom
## Multiple R-squared:  0.3489, Adjusted R-squared:  0.3461 
## F-statistic: 120.9 on 3 and 677 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -12.4218 51.4406 77.1603 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2  6.09  4.68  4.03
## phi3  8.27  6.25  5.34

We can see that now \(\tau_3\) t-statistic is smaller than our criterical values. So we have a trend stationary data. This is also confirmed by KPSS test.

kpss.test(dinft, null="Trend")
## Warning in kpss.test(dinft, null = "Trend"): p-value greater than printed
## p-value
## 
##  KPSS Test for Trend Stationarity
## 
## data:  dinft
## KPSS Trend = 0.045618, Truncation lag parameter = 6, p-value = 0.1

Forecasting

The first thing to do is to see the plot of ACF and PACF of the differenced inflation data

acf(as.data.frame(dinft), lag=24, main=expression(Delta("Inflation")))

pacf(as.data.frame(dinft), lag=24, main=expression(Delta("Inflation")))

Judging from the PACF and ACF it seems that we are dealing with an ARMA model, it could be ARIMA(1,0,1)(2,0,1)[12] or ARIMA(1,0,2)(2,0,1)[12] or ARIMA(1,0,3)(2,0,1)[12]

m1 <- arima(dinft,order=c(1,0,1),seasonal=list(order=c(2,0,1),period=12))
m1
## 
## Call:
## arima(x = dinft, order = c(1, 0, 1), seasonal = list(order = c(2, 0, 1), period = 12))
## 
## Coefficients:
##          ar1      ma1    sar1    sar2     sma1  intercept
##       0.9412  -0.6809  0.0222  -0.002  -0.8229    -0.0015
## s.e.  0.0218   0.0514  0.0523   0.048   0.0358     0.0076
## 
## sigma^2 estimated as 0.03576:  log likelihood = 161.95,  aic = -309.9
tsdiag(m1)

Estimating ARIMA(1,0,1)(2,0,1)[12], we can see that p values for the Ljung Box Statistics does not show a good plot, indicating that we should try another model or adding lag

m2 <- arima(dinft,order=c(1,0,2),seasonal=list(order=c(2,0,1),period=12))
m2
## 
## Call:
## arima(x = dinft, order = c(1, 0, 2), seasonal = list(order = c(2, 0, 1), period = 12))
## 
## Coefficients:
##          ar1      ma1      ma2    sar1     sar2     sma1  intercept
##       0.9604  -0.6524  -0.0919  0.0249  -0.0046  -0.8347    -0.0016
## s.e.  0.0161   0.0401   0.0372  0.0510   0.0474   0.0345     0.0084
## 
## sigma^2 estimated as 0.03542:  log likelihood = 164.92,  aic = -313.84
tsdiag(m2)

Estimating ARIMA(1,0,1)(2,0,1)[12] we still see problem with its p-values and the AIC is getting bigger

m3 <- arima(dinft,order=c(1,0,3),seasonal=list(order=c(2,0,1),period=12))
m3
## 
## Call:
## arima(x = dinft, order = c(1, 0, 3), seasonal = list(order = c(2, 0, 1), period = 12))
## 
## Coefficients:
##          ar1      ma1      ma2      ma3    sar1    sar2     sma1
##       0.9674  -0.6514  -0.0105  -0.1134  0.0354  0.0062  -0.8408
## s.e.  0.0140   0.0402   0.0492   0.0418  0.0512  0.0476   0.0342
##       intercept
##         -0.0017
## s.e.     0.0087
## 
## sigma^2 estimated as 0.03505:  log likelihood = 168.56,  aic = -319.13
tsdiag(m3)

In ARIMA(1,0,3)(2,0,1)[12], we see better p-values however the AIC is the biggest among the model, to confirm this, testing the Ljung Box Statistics on ARIMA(1,0,3)(2,0,1)[12]. Our observation is 694, means that our m = log(694), or around to 6.

m3.LB.lag6 <- Box.test(m3$residuals, lag=6, type="Ljung")
m3.LB.lag6
## 
##  Box-Ljung test
## 
## data:  m3$residuals
## X-squared = 3.5643, df = 6, p-value = 0.7354
1-pchisq(m3.LB.lag6$statistic,2)
## X-squared 
## 0.1682784

We can see that we cannot reject hypothesis null, thus, ARIMA(1,0,3)(2,0,1)[12] is appropriate, we can use this model for the forecast

m3.fcast <- forecast(m3, h=13)
plot(m3.fcast, xlim=c(2012,2016))
lines(diff(inftall))

We can see that the forecast data match with the real data closely.