Analysis, Model Development, and Forecasting for Inflation Rates (YOY)-Constructed from Consumer Price Index for All Urban Consumers: All Items Less Food & Energy

# original inflation rate data
str(CPILFENS_all)

## 'zooreg' series from Jan 1958 to Nov 2015
##   Data: num [1:695] 3.16 3.16 2.79 2.43 2.43 ...
##   Index: Class 'yearmon'  num [1:695] 1958 1958 1958 1958 1958 ...
##   Frequency: 12

# subset inflation rate
str(CPILFENS)

## 'zooreg' series from Jan 1958 to Dec 2014
##   Data: num [1:684] 3.16 3.16 2.79 2.43 2.43 ...
##   Index: Class 'yearmon'  num [1:684] 1958 1958 1958 1958 1958 ...
##   Frequency: 12

The above graph shows inflation rates constructed from Consumer Price Index for All Urban Consumers: All Items Less Food & Energy data. We can see by viewing this data that there is no exponential growth and no obvious reason to Log Transform the data. There does not appear to be an noticeable trend in the data.

The ACF and PACF suggest nonstationarity and the presence of deterministic or stochastic trend.

I will look at the data in some detail below to verify there is no Trend Stationarity.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  CPILFENS
## Dickey-Fuller = -3.2109, Lag order = 8, p-value = 0.08605
## alternative hypothesis: stationary

The simple ADF test above seems to suggest we Fail to Reject the null of a difference stationary time series, due to the large p-value.

We will investigate this further with a KPSS test.

# KPSS Test - H0-Trend Stationary (tseries)
kpss.test(CPILFENS, null = "Trend")

## 
##  KPSS Test for Trend Stationarity
## 
## data:  CPILFENS
## KPSS Trend = 1.3385, Truncation lag parameter = 6, p-value = 0.01

# KPSS Test - H0-Level Stationary (tseries)
kpss.test(CPILFENS, null = "Level")

## 
##  KPSS Test for Level Stationarity
## 
## data:  CPILFENS
## KPSS Level = 2.0997, Truncation lag parameter = 6, p-value = 0.01

# KPSS Test - H0-Trend Stationary-mu (urca)
CPILFENS_mu.kpss <- ur.kpss(CPILFENS, type = "mu")
summary(CPILFENS_mu.kpss)

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 2.0997 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

# KPSS Test - H0-Trend Stationary-tau (urca)
CPILFENS_tau.kpss <- ur.kpss(CPILFENS, type = "tau")
summary(CPILFENS_tau.kpss)

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: tau with 6 lags. 
## 
## Value of test-statistic is: 1.3385 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.119 0.146  0.176 0.216

In both of the KPSS tests above we Reject the Null that this time series is Trend Stationary which puts us back to difference stationary.

Before we start the ADF testing we need to determine the appropriate lag length to select based on this time series data set. However, care must be taken to ensure that an appropriate lag length is chosen. If the lag length is too short then the residuals will be serially correlated and the test will be biased. If the lag length is too large the power of the test will suffer. There is some literate precedence to suggest, based on boot strapping the lag length, that erring on the side of too large is better than erring on the side of too short.

To get a staring point for \(p_{max}\) I will use a procedure developed by G. William Schwert in his 1989 paper in The Journal of Business of Economic Statistics.
\[{l_{12}} = {\mathop{\rm int}} \left\{ {12{{\left( {{T \over {100}}} \right)}^{{1 \over 4}}}} \right\}\]
This differs some from the suggestions by Said and Dickey but since I am only using this as a guide it seems reasonable to use.

Once a starting point has been determined, I will use this \(p_{max}\) value as my lag parameter and then plot the residuals of the ADF test to determine the appropriateness of the lag length. If the residuals appear correlated a larger lag value will be chosen. This process will continue until all serial correlation has disappeared. This process is more of an art form than a science but I feel like it is worth the time to ensure a proper lag length is determined.

After executing the above mentioned process, I have determined that a lag length of \(p_{max}=36\).

We will follow the “Proposed Full Procedure for ADF Test” guidelines to determine what type of difference stationary model we are dealing with.

We will start by testing some hypnosis against Model C, below:

# Model C
t1 <- ur.df(CPILFENS, type="trend", lags=36, selectlags = "Fixed")
plot(t1)

summary(t1)

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.27527 -0.10010  0.00496  0.10345  0.94657 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.144e-02  2.459e-02   2.092 0.036829 *  
## z.lag.1      -6.241e-03  3.452e-03  -1.808 0.071082 .  
## tt           -7.496e-05  4.256e-05  -1.761 0.078732 .  
## z.diff.lag1   3.142e-01  3.940e-02   7.973 7.72e-15 ***
## z.diff.lag2   1.971e-01  4.140e-02   4.760 2.42e-06 ***
## z.diff.lag3   2.357e-02  4.196e-02   0.562 0.574492    
## z.diff.lag4  -5.210e-02  4.180e-02  -1.246 0.213097    
## z.diff.lag5   6.115e-02  4.187e-02   1.460 0.144706    
## z.diff.lag6   9.755e-02  4.193e-02   2.327 0.020301 *  
## z.diff.lag7   2.877e-02  4.204e-02   0.684 0.494077    
## z.diff.lag8   4.051e-02  4.201e-02   0.964 0.335295    
## z.diff.lag9   2.410e-02  4.200e-02   0.574 0.566283    
## z.diff.lag10  4.275e-02  4.201e-02   1.018 0.309319    
## z.diff.lag11  4.423e-02  4.173e-02   1.060 0.289652    
## z.diff.lag12 -6.924e-01  4.125e-02 -16.786  < 2e-16 ***
## z.diff.lag13  2.341e-01  4.713e-02   4.968 8.81e-07 ***
## z.diff.lag14  1.337e-01  4.802e-02   2.784 0.005542 ** 
## z.diff.lag15  3.454e-02  4.751e-02   0.727 0.467586    
## z.diff.lag16 -1.229e-01  4.714e-02  -2.607 0.009352 ** 
## z.diff.lag17  6.276e-02  4.732e-02   1.326 0.185286    
## z.diff.lag18  1.575e-01  4.726e-02   3.332 0.000916 ***
## z.diff.lag19  5.662e-02  4.739e-02   1.195 0.232675    
## z.diff.lag20  4.555e-03  4.739e-02   0.096 0.923454    
## z.diff.lag21 -2.679e-02  4.686e-02  -0.572 0.567733    
## z.diff.lag22 -5.058e-02  4.681e-02  -1.081 0.280344    
## z.diff.lag23  5.016e-02  4.641e-02   1.081 0.280282    
## z.diff.lag24 -4.033e-01  4.573e-02  -8.818  < 2e-16 ***
## z.diff.lag25  1.352e-01  4.061e-02   3.328 0.000926 ***
## z.diff.lag26  7.655e-02  4.094e-02   1.870 0.061981 .  
## z.diff.lag27 -5.869e-03  4.081e-02  -0.144 0.885679    
## z.diff.lag28 -1.595e-02  4.063e-02  -0.393 0.694799    
## z.diff.lag29  4.749e-02  4.058e-02   1.170 0.242329    
## z.diff.lag30  3.524e-02  4.053e-02   0.869 0.385033    
## z.diff.lag31 -2.098e-02  4.043e-02  -0.519 0.603901    
## z.diff.lag32 -6.779e-03  4.031e-02  -0.168 0.866505    
## z.diff.lag33 -3.040e-02  4.016e-02  -0.757 0.449360    
## z.diff.lag34  2.335e-02  4.014e-02   0.582 0.560885    
## z.diff.lag35  2.413e-02  3.926e-02   0.615 0.539068    
## z.diff.lag36 -2.109e-01  3.777e-02  -5.584 3.55e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1891 on 608 degrees of freedom
## Multiple R-squared:  0.4976, Adjusted R-squared:  0.4662 
## F-statistic: 15.85 on 38 and 608 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -1.8081 1.6081 2.4121 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2  6.09  4.68  4.03
## phi3  8.27  6.25  5.34

We start by testing against Model C, \[\Delta {y_t} = \gamma {y_{t - 1}} + \mu + \beta t + \sum\limits_{i = 1}^{p - 1} {{p_i}} \Delta {y_{t - i}} + {e_t}\] where we look at the \(\phi_3\) statistic to test \(H_0:\) \(\gamma=\beta=0\)

According to the critical region we Fail to Reject the null which implies that this time series data set contains a unit root. This is further confirmed by looking at \(\tau_3\) statistic for the lagged endogenous variable in levels to see that we Fail to Reject the null that \(H_0:\) \(\gamma=0\). Again a unit root can not be rejected.

We next look at the \(\phi_2\) statistic to determine if this time series data is a random walk or a random walk with drift. The \(H_0: \gamma=\mu=\beta=0\) but since we have already determined that \(\gamma=\beta=0\) in previous hypothesis test we are really testing for \(\mu=0\). We find that we Fail to reject this hypnosis which indicates we are dealing with a Pure Random Walk.

This can be further confirmed by now testing several hypothesis on Model B.

#Model B
t2 <- ur.df(CPILFENS, type="drift", lags=36, selectlags = "Fixed")
plot(t2)

summary(t2)

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.28604 -0.10190 -0.00116  0.09863  0.94111 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.016837   0.014803   1.137 0.255818    
## z.lag.1      -0.004291   0.003275  -1.310 0.190570    
## z.diff.lag1   0.317002   0.039439   8.038 4.77e-15 ***
## z.diff.lag2   0.198346   0.041462   4.784 2.16e-06 ***
## z.diff.lag3   0.023337   0.042032   0.555 0.578947    
## z.diff.lag4  -0.051926   0.041871  -1.240 0.215400    
## z.diff.lag5   0.061422   0.041942   1.464 0.143586    
## z.diff.lag6   0.097414   0.041998   2.319 0.020698 *  
## z.diff.lag7   0.027815   0.042113   0.660 0.509187    
## z.diff.lag8   0.039658   0.042081   0.942 0.346352    
## z.diff.lag9   0.023251   0.042072   0.553 0.580710    
## z.diff.lag10  0.041827   0.042083   0.994 0.320663    
## z.diff.lag11  0.042520   0.041790   1.017 0.309332    
## z.diff.lag12 -0.693504   0.041317 -16.785  < 2e-16 ***
## z.diff.lag13  0.236060   0.047194   5.002 7.44e-07 ***
## z.diff.lag14  0.134628   0.048096   2.799 0.005286 ** 
## z.diff.lag15  0.033430   0.047591   0.702 0.482664    
## z.diff.lag16 -0.123280   0.047221  -2.611 0.009258 ** 
## z.diff.lag17  0.063384   0.047405   1.337 0.181698    
## z.diff.lag18  0.157481   0.047346   3.326 0.000934 ***
## z.diff.lag19  0.055854   0.047472   1.177 0.239830    
## z.diff.lag20  0.003585   0.047464   0.076 0.939811    
## z.diff.lag21 -0.027316   0.046941  -0.582 0.560829    
## z.diff.lag22 -0.051075   0.046888  -1.089 0.276455    
## z.diff.lag23  0.049495   0.046491   1.065 0.287473    
## z.diff.lag24 -0.403340   0.045811  -8.804  < 2e-16 ***
## z.diff.lag25  0.136223   0.040675   3.349 0.000861 ***
## z.diff.lag26  0.077084   0.041006   1.880 0.060606 .  
## z.diff.lag27 -0.006331   0.040876  -0.155 0.876967    
## z.diff.lag28 -0.015700   0.040701  -0.386 0.699832    
## z.diff.lag29  0.047670   0.040646   1.173 0.241328    
## z.diff.lag30  0.034817   0.040603   0.857 0.391508    
## z.diff.lag31 -0.021169   0.040496  -0.523 0.601344    
## z.diff.lag32 -0.007092   0.040377  -0.176 0.860640    
## z.diff.lag33 -0.030228   0.040229  -0.751 0.452697    
## z.diff.lag34  0.023535   0.040208   0.585 0.558542    
## z.diff.lag35  0.023901   0.039328   0.608 0.543584    
## z.diff.lag36 -0.210507   0.037835  -5.564 3.96e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1894 on 609 degrees of freedom
## Multiple R-squared:  0.4951, Adjusted R-squared:  0.4644 
## F-statistic: 16.14 on 37 and 609 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -1.3104 0.8586 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1  6.43  4.59  3.78

Model B,

\[\Delta {y_t} = \gamma {y_{t - 1}} + \mu + \sum\limits_{i = 1}^{p - 1} {{p_i}} \Delta {y_{t - i}} + {e_t}\]
In Model B we first look at the \(\tau_2\) statistic under the null, \(H_0:\) \(\gamma=0\); Fail to Reject. Next we look at \(\phi_1\) statistic under the null \(H_0:\) \(\gamma=\mu=0\) which really means, based on the previous result, that we are testing if \(\mu=0\); Fail to Reject, which confirms again we are dealing with a Pure Random Walk.

Since we suspect we are dealing with a pure random walk model we will investigate Model A.

#Model A
t3 <- ur.df(CPILFENS, type="none", lags=36, selectlags = "Fixed")
plot(t3)

summary(t3)

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.30144 -0.09570  0.00441  0.10401  0.92925 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## z.lag.1      -0.001072   0.001648  -0.651 0.515571    
## z.diff.lag1   0.316498   0.039446   8.024 5.28e-15 ***
## z.diff.lag2   0.197153   0.041459   4.755 2.47e-06 ***
## z.diff.lag3   0.021390   0.042007   0.509 0.610793    
## z.diff.lag4  -0.053672   0.041853  -1.282 0.200197    
## z.diff.lag5   0.059853   0.041929   1.427 0.153958    
## z.diff.lag6   0.095775   0.041983   2.281 0.022877 *  
## z.diff.lag7   0.025800   0.042086   0.613 0.540075    
## z.diff.lag8   0.037442   0.042046   0.890 0.373550    
## z.diff.lag9   0.021066   0.042038   0.501 0.616465    
## z.diff.lag10  0.039546   0.042046   0.941 0.347310    
## z.diff.lag11  0.039746   0.041729   0.952 0.341232    
## z.diff.lag12 -0.696403   0.041248 -16.883  < 2e-16 ***
## z.diff.lag13  0.235645   0.047204   4.992 7.81e-07 ***
## z.diff.lag14  0.133623   0.048100   2.778 0.005637 ** 
## z.diff.lag15  0.031800   0.047580   0.668 0.504173    
## z.diff.lag16 -0.124594   0.047218  -2.639 0.008535 ** 
## z.diff.lag17  0.062482   0.047410   1.318 0.188027    
## z.diff.lag18  0.156168   0.047344   3.299 0.001028 ** 
## z.diff.lag19  0.053870   0.047452   1.135 0.256711    
## z.diff.lag20  0.001256   0.047431   0.026 0.978878    
## z.diff.lag21 -0.029079   0.046927  -0.620 0.535705    
## z.diff.lag22 -0.052952   0.046870  -1.130 0.259020    
## z.diff.lag23  0.047202   0.046458   1.016 0.310031    
## z.diff.lag24 -0.406021   0.045762  -8.873  < 2e-16 ***
## z.diff.lag25  0.135956   0.040684   3.342 0.000883 ***
## z.diff.lag26  0.076501   0.041012   1.865 0.062616 .  
## z.diff.lag27 -0.007401   0.040875  -0.181 0.856385    
## z.diff.lag28 -0.016589   0.040703  -0.408 0.683745    
## z.diff.lag29  0.046723   0.040647   1.149 0.250814    
## z.diff.lag30  0.033581   0.040599   0.827 0.408480    
## z.diff.lag31 -0.022537   0.040488  -0.557 0.577987    
## z.diff.lag32 -0.008667   0.040363  -0.215 0.830062    
## z.diff.lag33 -0.031540   0.040222  -0.784 0.433250    
## z.diff.lag34  0.022120   0.040199   0.550 0.582342    
## z.diff.lag35  0.021852   0.039296   0.556 0.578351    
## z.diff.lag36 -0.213142   0.037773  -5.643 2.56e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1894 on 610 degrees of freedom
## Multiple R-squared:  0.494,  Adjusted R-squared:  0.4633 
## F-statistic:  16.1 on 37 and 610 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -0.6506 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

Model A,

\[\Delta {y_t} = \gamma {y_{t - 1}} + \sum\limits_{i = 1}^{p - 1} {{p_i}} \Delta {y_{t - i}} + {e_t}\]
We are testing \(H_0: \gamma=0\). When we look at \(\tau_1\) we see that we Fail to Reject the null which confirms, in the strongest way, that we are dealing with a Pure Random Walk.

It certainly seems that we are dealing with a difference stationary time series. We will now follow the Box-Jenkins Method to build a time series model.

According to the \(\Delta\) ACF and \(\Delta\) PACF we look like we are dealing with a ARIMA(0,d,q)X(P,D,Q)[12] model with I(1) differeiencing. It looks like there is an AR and an MA component as well as a seasonal component at lags 12, 24, 36.

We will start with an ARIMA(3,1,1)X(0,1,0)[12] model and based on the results we will model a few more ARIMA(p,1,q) models.

m1 <- Arima(CPILFENS, order=c(3,1,1), seasonal= list(order=c(0,1,0), period=12))
m1

## Series: CPILFENS 
## ARIMA(3,1,1)(0,1,0)[12]                    
## 
## Coefficients:
##          ar1     ar2      ar3     ma1
##       0.1111  0.1765  -0.0200  0.1451
## s.e.  0.5576  0.1437   0.0853  0.5567
## 
## sigma^2 estimated as 0.1635:  log likelihood=-344.65
## AIC=699.3   AICc=699.39   BIC=721.84

tsdiag(m1, gof.lag=12)

m1.fcast <- forecast(m1, h=24)

m2 <- Arima(CPILFENS, order=c(4,1,1), seasonal= list(order=c(0,1,0), period=12))
m2

## Series: CPILFENS 
## ARIMA(4,1,1)(0,1,0)[12]                    
## 
## Coefficients:
##          ar1     ar2      ar3     ar4      ma1
##       0.7381  0.0131  -0.1123  0.0451  -0.4816
## s.e.  0.3664  0.1040   0.0686  0.0392   0.3652
## 
## sigma^2 estimated as 0.1634:  log likelihood=-344.37
## AIC=700.73   AICc=700.86   BIC=727.78

tsdiag(m2, gof.lag=12)

m2.fcast <- forecast(m2, h=24)

m3 <- Arima(CPILFENS, order=c(1,1,3), seasonal= list(order=c(2,0,1), period=12))
m3

## Series: CPILFENS 
## ARIMA(1,1,3)(2,0,1)[12]                    
## 
## Coefficients:
##          ar1      ma1      ma2      ma3    sar1    sar2     sma1
##       0.9675  -0.6515  -0.0105  -0.1134  0.0352  0.0060  -0.8406
## s.e.  0.0140   0.0402   0.0492   0.0418  0.0511  0.0475   0.0341
## 
## sigma^2 estimated as 0.03505:  log likelihood=168.55
## AIC=-321.09   AICc=-320.88   BIC=-284.88

tsdiag(m3, gof.lag=12)

m3.fcast <- forecast(m3, h=24)

m4 <- Arima(CPILFENS, order=c(1,1,1), seasonal= list(order=c(2,0,2), period=12))
m4

## Series: CPILFENS 
## ARIMA(1,1,1)(2,0,2)[12]                    
## 
## Coefficients:

## Warning in sqrt(diag(x$var.coef)): NaNs produced

##          ar1      ma1     sar1    sar2     sma1     sma2
##       0.9411  -0.6807  -0.3651  0.0079  -0.4357  -0.3191
## s.e.  0.0218   0.0512      NaN     NaN      NaN      NaN
## 
## sigma^2 estimated as 0.03576:  log likelihood=161.93
## AIC=-309.86   AICc=-309.69   BIC=-278.17

tsdiag(m4, gof.lag=12)

m4.fcast <- forecast(m4, h=24)

m5 <- Arima(CPILFENS, order=c(1,1,1), seasonal= list(order=c(1,0,1), period=12))
m5

## Series: CPILFENS 
## ARIMA(1,1,1)(1,0,1)[12]                    
## 
## Coefficients:
##          ar1      ma1    sar1     sma1
##       0.9414  -0.6812  0.0227  -0.8237
## s.e.  0.0217   0.0511  0.0479   0.0295
## 
## sigma^2 estimated as 0.03576:  log likelihood=161.93
## AIC=-313.86   AICc=-313.77   BIC=-291.23

tsdiag(m5, gof.lag=12)

m5.fcast <- forecast(m5, h=24)

m6 <- Arima(CPILFENS, order=c(1,1,1), seasonal= list(order=c(0,0,1), period=12))
m6

## Series: CPILFENS 
## ARIMA(1,1,1)(0,0,1)[12]                    
## 
## Coefficients:
##          ar1      ma1     sma1
##       0.9419  -0.6817  -0.8158
## s.e.  0.0214   0.0507   0.0254
## 
## sigma^2 estimated as 0.03577:  log likelihood=161.82
## AIC=-315.63   AICc=-315.57   BIC=-297.53

tsdiag(m6, gof.lag=12)

m6.fcast <- forecast(m6, h=24)

par(mfrow=c(1,1), cex=0.65)
plot.splineforecast(m1.fcast, type="o", pch=16, xlim=c(2010, 2017), main="Model 1: Forecasts for ARIMA(3,1,1)X(0,1,0)[12]")
legend(2014, 10,legend=c("Original Data","Forecasted Data","Fitted Data"), col=c("black", "blue", "red"), lty=c(1,1,1))
lines(m3.fcast$mean, type="p", pch=16, col="blue")
lines(CPILFENS_all, type="o", pch=16)

Conclusion

The best model for forecasting this time series data seems to be an ARIMA(3,1,1)X(0,1,0)[12]. This is based on all the analysis presented above. There is a strange correlation in the residuals at lag 1 which I am not sure about. Maybe this is just a strange anomaly with the data at this region. Most of the models developed were reasonable but Model 1 was very slightly better than the rest.