This document was created with R Markdown and exported to PDF for peer-graded evaluation. Code chunks are not echoed in the paper; only their output is shown.
The data are the consumer price indices of the United States of America and of the Euro area (obtained from the Reserve Bank of Australia). The variables are:
- CPI_EUR: Consumer price index in the Euro area
- CPI_USA: Consumer price index in the United States of America
- LOGPEUR: logarithm of CPI_EUR
- LOGPUSA: logarithm of CPI_USA
- DPEUR: first difference of LOGPEUR, monthly inflation rate
- DPUSA: first difference of LOGPUSA, monthly inflation rate
- TREND: linear trend (value 1 in Jan 2000 to value 144 in Dec 2011)
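As a sketch of the (non-echoed) data preparation, assuming the raw CPI series are already loaded into a data frame `df` with columns `CPI_EUR` and `CPI_USA` (monthly, January 2000 to December 2011):

```r
# Construct the transformed variables; the leading NA in each inflation
# series comes from taking first differences.
df$LOGPEUR <- log(df$CPI_EUR)
df$LOGPUSA <- log(df$CPI_USA)
df$DPEUR   <- c(NA, diff(df$LOGPEUR))   # monthly Euro-area inflation
df$DPUSA   <- c(NA, diff(df$LOGPUSA))   # monthly US inflation
df$TREND   <- seq_len(nrow(df))         # 1 = Jan 2000, ..., 144 = Dec 2011
```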
These plots suggest that both log price series trend upward and are therefore likely non-stationary, which we test formally with Augmented Dickey-Fuller (ADF) tests:
##
## Augmented Dickey-Fuller Test
##
## data: df$LOGPEUR
## Dickey-Fuller = -2.8263, Lag order = 3, p-value = 0.2324
## alternative hypothesis: stationary
##
## Augmented Dickey-Fuller Test
##
## data: df$LOGPUSA
## Dickey-Fuller = -2.7345, Lag order = 3, p-value = 0.2706
## alternative hypothesis: stationary
For both variables, the ADF test statistic is larger (less negative) than the critical value of −3.5, so the null hypothesis of non-stationarity (a unit root) is not rejected for the log price levels; we therefore model the inflation rates (first differences) instead.
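The test output above can be reproduced along the following lines (a sketch using `adf.test()` from the `tseries` package; the lag order `k = 3` matches the "Lag order" reported above):

```r
library(tseries)

# ADF tests on the log price levels; the null hypothesis is a unit root
# (non-stationarity), the alternative is stationarity.
adf.test(df$LOGPEUR, alternative = "stationary", k = 3)
adf.test(df$LOGPUSA, alternative = "stationary", k = 3)
```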
We compute the sample autocorrelations (AC) and partial autocorrelations (PAC) of the Euro-area inflation series DPEUR; the lags with the largest partial autocorrelations are:
| lag | AC | PAC |
|---|---|---|
| 12 | 0.554 | 0.398 |
| 6 | 0.403 | 0.374 |
Lags \(t-6\) and \(t-12\) have the largest partial autocorrelations, which justifies the proposed AR specification with these two lags.
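A sketch of how these correlations could be obtained in R (the first observation of `DPEUR` is an `NA` from differencing and is dropped):

```r
# Sample autocorrelations and partial autocorrelations of Euro-area inflation.
dpeur <- na.omit(df$DPEUR)
acf(dpeur,  lag.max = 12, plot = FALSE)   # AC column of the table
pacf(dpeur, lag.max = 12, plot = FALSE)   # PAC column of the table
```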
Estimating the AR model with lags 6 and 12 produces the following fit:
##
## Time series regression with "ts" data:
## Start = 14, End = 132
##
## Call:
## dynlm(formula = ts(DPEUR) ~ L(ts(DPEUR, 6)) + L(ts(DPEUR, 12)),
## data = df_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0103343 -0.0017369 -0.0000475 0.0015322 0.0080903
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0003838 0.0002811 1.365 0.1749
## L(ts(DPEUR, 6)) 0.1887459 0.0772888 2.442 0.0161 *
## L(ts(DPEUR, 12)) 0.5979841 0.0835544 7.157 8.05e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.002569 on 116 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.4232, Adjusted R-squared: 0.4132
## F-statistic: 42.55 on 2 and 116 DF, p-value: 1.381e-14
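The estimation itself is hidden in a code chunk; a minimal sketch, assuming the training sample `df_train` contains the first 132 months (through December 2010, leaving 2011 for the forecast comparison at the end), is:

```r
library(dynlm)

# AR model for Euro-area inflation with lags 6 and 12, estimated on the
# training sample only; the ts() wrapping reproduces the call shown above.
df_train <- df[1:132, ]
ar_fit <- dynlm(ts(DPEUR) ~ L(ts(DPEUR, 6)) + L(ts(DPEUR, 12)), data = df_train)
summary(ar_fit)
```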
We extend the model to an autoregressive distributed lag (ADL) specification by adding US inflation at lags 1, 6, and 12:
##
## Time series regression with "ts" data:
## Start = 14, End = 132
##
## Call:
## dynlm(formula = ts(DPEUR) ~ L(ts(DPEUR, 6)) + L(ts(DPEUR, 12)) +
## L(ts(DPUSA)) + L(ts(DPUSA, 6)) + L(ts(DPUSA, 12)), data = df_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0065866 -0.0016535 -0.0000118 0.0012630 0.0082682
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0004407 0.0002853 1.545 0.125
## L(ts(DPEUR, 6)) 0.2029891 0.0785520 2.584 0.011 *
## L(ts(DPEUR, 12)) 0.6367464 0.0874766 7.279 4.78e-11 ***
## L(ts(DPUSA)) 0.2264287 0.0511286 4.429 2.20e-05 ***
## L(ts(DPUSA, 6)) -0.0560565 0.0547645 -1.024 0.308
## L(ts(DPUSA, 12)) -0.2300418 0.0541695 -4.247 4.47e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.002272 on 113 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.5602, Adjusted R-squared: 0.5408
## F-statistic: 28.79 on 5 and 113 DF, p-value: < 2.2e-16
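Continuing the sketch above, the extended ADL model behind this output would be estimated as:

```r
# ADL model: add US inflation at lags 1, 6 and 12 to the AR(6, 12) model,
# estimated on the same training sample.
adl_full <- dynlm(ts(DPEUR) ~ L(ts(DPEUR, 6)) + L(ts(DPEUR, 12)) +
                    L(ts(DPUSA)) + L(ts(DPUSA, 6)) + L(ts(DPUSA, 12)),
                  data = df_train)
summary(adl_full)
```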
The US lag-6 term is not significant (p-value ≈ 0.31), so we drop it and estimate the restricted model:
##
## Time series regression with "ts" data:
## Start = 14, End = 132
##
## Call:
## dynlm(formula = ts(DPEUR) ~ L(ts(DPEUR, 6)) + L(ts(DPEUR, 12)) +
## L(ts(DPUSA)) + L(ts(DPUSA, 12)), data = df_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0067809 -0.0016356 0.0000532 0.0013660 0.0082448
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0003391 0.0002676 1.267 0.2076
## L(ts(DPEUR, 6)) 0.1687310 0.0710801 2.374 0.0193 *
## L(ts(DPEUR, 12)) 0.6551529 0.0856263 7.651 6.93e-12 ***
## L(ts(DPUSA)) 0.2326460 0.0507772 4.582 1.19e-05 ***
## L(ts(DPUSA, 12)) -0.2264880 0.0540694 -4.189 5.55e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.002273 on 114 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.5561, Adjusted R-squared: 0.5406
## F-statistic: 35.71 on 4 and 114 DF, p-value: < 2.2e-16
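The corresponding sketch for the restricted model simply drops the US lag-6 regressor:

```r
# Restricted ADL model without the insignificant US lag-6 term.
adl_fit <- dynlm(ts(DPEUR) ~ L(ts(DPEUR, 6)) + L(ts(DPEUR, 12)) +
                   L(ts(DPUSA)) + L(ts(DPUSA, 12)), data = df_train)
summary(adl_fit)
```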
We compare the forecasts of the two models using the following error measures:
| Model | RMSE | MAE | SUM |
|---|---|---|---|
| AR model forecasting | 0.0011367 | 0.0008655 | 0.0012774 |
| ADL model forecasting | 0.0009278 | 0.0007104 | 0.0006066 |
We conclude that the ADL model produces better forecasts than the AR model, as it achieves lower values on all three error measures.
Both sets of forecasts are fairly accurate, as the graph shows, but the ADL model, which exploits well-chosen lags of US inflation, is closer to the realized values in almost every month.
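Finally, a sketch of how the error measures in the table could be computed. The assumptions here are that the last 12 months (2011, rows 133-144) form the evaluation sample, that one-step forecasts are built from the estimated coefficients and the realized lagged values (including realized US inflation), and that the `SUM` column is the sum of the forecast errors; `ar_fit` and `adl_fit` are the fitted models from the sketches above.

```r
# Out-of-sample comparison over the assumed evaluation months (rows 133-144).
h     <- 133:144
b_ar  <- coef(ar_fit)    # (Intercept), DPEUR lag 6, DPEUR lag 12
b_adl <- coef(adl_fit)   # (Intercept), DPEUR lags 6 and 12, DPUSA lags 1 and 12

ar_fc  <- b_ar[1] + b_ar[2] * df$DPEUR[h - 6] + b_ar[3] * df$DPEUR[h - 12]
adl_fc <- b_adl[1] + b_adl[2] * df$DPEUR[h - 6] + b_adl[3] * df$DPEUR[h - 12] +
          b_adl[4] * df$DPUSA[h - 1] + b_adl[5] * df$DPUSA[h - 12]

# RMSE, MAE and (assumed) sum of forecast errors for each model.
err <- function(e) c(RMSE = sqrt(mean(e^2)), MAE = mean(abs(e)), SUM = sum(e))
rbind(AR = err(df$DPEUR[h] - ar_fc), ADL = err(df$DPEUR[h] - adl_fc))
```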