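library(readr)    # read_csv(), cols(), col_date()
library(ggplot2)  # ggplot(), geom_line()

# Load the monthly HICP data; "Category" holds the month as YYYY-MM
HICP <- read_csv("C:/Users/david/Downloads/HICP2025.csv",
                 col_types = cols(Category = col_date("%Y-%m")))

# Plot the euro-area year-on-year inflation rate over time
ggplot(HICP, aes(Category, `Euro area`)) +
  geom_line(na.rm = TRUE) +
  xlab("Time") +
  ylab("Year-on-Year inflation rate")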
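In this case, I used the ADF and KPSS tests to check whether the process is stationary.
ADF (Augmented Dickey–Fuller): H₀: the series has a unit root (non-stationary); H₁: the series is stationary.
KPSS (Level): H₀: the series is level-stationary; H₁: the series has a unit root (non-stationary).
Decision rules:
If ADF p-value < α, reject H₀ ⇒ stationary.
If KPSS p-value ≥ α, fail to reject H₀ ⇒ no evidence against stationarity.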
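The two decision rules can be combined into one small helper. This is only a minimal sketch: the function name `classify_stationarity` and its three labels are illustrative and not part of `tseries`, and the p-values in the calls are made-up examples.

```r
# Hypothetical helper: combine ADF and KPSS p-values into one verdict.
classify_stationarity <- function(adf_p, kpss_p, alpha = 0.05) {
  adf_stationary  <- adf_p < alpha    # ADF rejects the unit-root null
  kpss_stationary <- kpss_p >= alpha  # KPSS fails to reject level stationarity
  if (adf_stationary && kpss_stationary) {
    "stationary"
  } else if (!adf_stationary && !kpss_stationary) {
    "non-stationary"
  } else {
    "conflicting"  # the two tests disagree
  }
}

# Illustrative p-values only
classify_stationarity(adf_p = 0.01, kpss_p = 0.20)
classify_stationarity(adf_p = 0.30, kpss_p = 0.01)
```

When the helper returns "conflicting", differencing the series and repeating both tests is one common next step.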
library(tseries)  # provides adf.test() and kpss.test()
adf.test(HICP$`Euro area`[!is.na(HICP$`Euro area`)])
## Warning in adf.test(HICP$`Euro area`[!is.na(HICP$`Euro area`)]): p-value
## smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: HICP$`Euro area`[!is.na(HICP$`Euro area`)]
## Dickey-Fuller = -4.8002, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(HICP$`Euro area`[!is.na(HICP$`Euro area`)])
##
## KPSS Test for Level Stationarity
##
## data: HICP$`Euro area`[!is.na(HICP$`Euro area`)]
## KPSS Level = 0.49293, Truncation lag parameter = 5, p-value = 0.04326
Results on the original series (α = 0.05):
ADF gives p ≈ 0.01 < 0.05 ⇒ reject the null hypothesis of a unit root, which points to stationarity.
KPSS gives p ≈ 0.043 < 0.05 ⇒ reject the null hypothesis of level stationarity at the 5% level.
So the ADF and KPSS tests give conflicting answers about whether the process is stationary.
I therefore difference the series and repeat the tests on the differenced data.
# First difference of the YoY rate; the leading NA keeps the column aligned with the dates
HICP <- HICP |> dplyr::mutate(diff = c(NA, diff(`Euro area`)))
ggplot(HICP, aes(Category, diff)) +
geom_line(na.rm = TRUE) +
xlab("Time") +
ylab("Diff")
HICP_clean <- HICP[!is.na(HICP$diff),]
adf.test(HICP_clean$diff)
## Warning in adf.test(HICP_clean$diff): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: HICP_clean$diff
## Dickey-Fuller = -4.245, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(HICP_clean$diff)
## Warning in kpss.test(HICP_clean$diff): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: HICP_clean$diff
## KPSS Level = 0.039189, Truncation lag parameter = 5, p-value = 0.1
Results on the differenced series (assuming α = 0.05):
In the ADF test, p-value ≈ 0.01 < 0.05 ⇒ reject the null hypothesis.
In the KPSS test, p-value ≈ 0.1 > 0.05 ⇒ fail to reject the null hypothesis.
Therefore, both tests support stationarity, and we can try to fit an ARMA model to the differenced series.
Applying auto.arima with d = D = 0, seasonal = FALSE, and max.p = max.q = 5,
the AICc criterion (lower is better) selects an ARMA(1,1) model.
fit <- forecast::auto.arima(
  HICP_clean$diff,
  d = 0,
  D = 0,
  seasonal = FALSE,
  stepwise = TRUE,
  approximation = FALSE,
  max.p = 5, max.q = 5
)
fit
## Series: HICP_clean$diff
## ARIMA(1,0,1) with zero mean
##
## Coefficients:
## ar1 ma1
## 0.8309 -0.6322
## s.e. 0.0618 0.0835
##
## sigma^2 = 0.09364: log likelihood = -79.86
## AIC=165.73 AICc=165.8 BIC=177.25
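As a rough check, the information criteria can be recomputed from the reported log-likelihood using the standard definitions AIC = -2·logLik + 2k and AICc = AIC + 2k(k+1)/(n - k - 1). The sketch below uses illustrative helper names. It takes k = 3 (ar1, ma1 and sigma^2), and `n_obs` is an assumed placeholder, not a value from the data.

```r
# Standard definitions; the helper names are illustrative.
aic_from_loglik <- function(loglik, k) -2 * loglik + 2 * k
aicc_from_loglik <- function(loglik, k, n) {
  aic_from_loglik(loglik, k) + 2 * k * (k + 1) / (n - k - 1)
}

loglik <- -79.86  # log likelihood reported in the output above
k <- 3            # ar1, ma1 and sigma^2
n_obs <- 340      # ASSUMPTION: placeholder; use length(HICP_clean$diff) in practice

aic_from_loglik(loglik, k)          # close to the reported AIC (log-likelihood is rounded)
aicc_from_loglik(loglik, k, n_obs)  # the small-sample correction is tiny at this n
```

With a few hundred observations the correction term is below 0.1, so AIC and AICc would rank the candidate models almost identically here.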
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1) with zero mean
## Q* = 19.47, df = 8, p-value = 0.01254
##
## Model df: 2. Total lags used: 10
Ljung-Box Test:
Residuals are centered around zero with roughly constant variance and an approximately normal shape.
However, the Ljung-Box test gives p-value ≈ 0.013 < 0.05. We therefore reject H₀ (no residual autocorrelation) at α = 0.05, indicating that some autocorrelation remains after the ARMA(1,1) fit.
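The reported p-value can be checked from the test statistic itself: under H₀, Q* is approximately χ² distributed with (lags used) minus (model df) = 10 - 2 = 8 degrees of freedom. The sketch below redoes that calculation in base R and then runs a `Box.test` call with the same settings on a simulated series. The simulated data, seed and object names are illustrative assumptions, not the HICP data.

```r
# p-value implied by the reported statistic: Q* = 19.47 on 10 - 2 = 8 df
p_reported <- pchisq(19.47, df = 10 - 2, lower.tail = FALSE)
p_reported

# Same kind of test on simulated data (illustrative only, not the HICP series)
set.seed(1)
x <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 300)
fit_sim <- arima(x, order = c(1, 0, 1), include.mean = FALSE)

# fitdf = 2 removes one degree of freedom per fitted ARMA coefficient
lb <- Box.test(residuals(fit_sim), lag = 10, type = "Ljung-Box", fitdf = 2)
lb$parameter  # degrees of freedom used by the test
```

Setting `fitdf` to the number of fitted ARMA coefficients matters: leaving it at 0 would compare the statistic against a χ² with 10 degrees of freedom and give a larger p-value.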
Although the Ljung–Box test rejects white noise, I proceed to produce 12-month-ahead forecasts of the YoY inflation rate from this model.
Since the model is fitted to the differenced series \(\Delta y_t\), predict()
returns \(\widehat{\Delta y}_{T+1},\dots,\widehat{\Delta y}_{T+12}\). I recover level (YoY)
forecasts by cumulatively summing these predicted differences and
adding them to the last observed YoY value \(y_T\): \[
\hat y_{T+h} = y_T + \sum_{i=1}^{h} \widehat{\Delta y}_{T+i},\quad
h=1,\dots,12.
\]
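The recovery formula above is the inverse of first differencing. A minimal sketch on a toy vector shows the round trip; all numbers here are made up for illustration and are not HICP data.

```r
# Toy level series (illustrative values, not HICP data)
y <- c(2.0, 2.3, 2.1, 2.6, 2.4)
dy <- diff(y)  # first differences: y[t] - y[t-1]

# Rebuild the levels from the first value and the cumulated differences
y_rebuilt <- y[1] + c(0, cumsum(dy))
all.equal(y, y_rebuilt)

# Forecast analogue: last observed level plus cumulated predicted differences
dy_hat <- c(0.10, 0.05, 0.02)  # hypothetical predicted differences
y_hat <- y[length(y)] + cumsum(dy_hat)
y_hat
```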
# 12-step predictions of the differenced series
fc <- predict(fit, n.ahead = 12)
len <- nrow(HICP_clean)
last_level <- HICP_clean$`Euro area`[len]
# Level forecasts: last observed YoY rate plus cumulated predicted differences
pred_12 <- ts(last_level + cumsum(as.numeric(fc$pred)), start = len + 1, frequency = 1)
# Bands use fc$se, the standard errors of the predicted differences
pred_12_U <- ts(pred_12 + 1.96 * as.numeric(fc$se), start = len + 1, frequency = 1)
pred_12_L <- ts(pred_12 - 1.96 * as.numeric(fc$se), start = len + 1, frequency = 1)
{
  # Plot the last 13 observations followed by the 12 forecasts
  recent <- HICP_clean$`Euro area`[(len - 12):len]
  x_fc <- length(recent) + (1:12)
  ts.plot(c(recent, rep(NA, 12)), ylim = c(1, 3),
          ylab = 'Year-on-Year inflation rate (%)', main = '12-steps predictions')
  lines(x_fc, pred_12, col = 'red')
  lines(x_fc, pred_12_U, col = 4, lty = 2)
  lines(x_fc, pred_12_L, col = 4, lty = 2)
  legend(1, 3, c('prediction', '95% prediction interval'),
         col = c('red', 4), lty = c(1, 2), pch = c(NA, NA))
}
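One caveat about the bands: `fc$se` holds the standard errors of the predicted differences, whereas the plotted forecasts are levels. For a level forecast built by cumulating differences, the h-step forecast error variance is σ² times the sum of squared cumulated ψ-weights, so it grows with the horizon. The sketch below computes both versions from the reported coefficients. It is an approximation that ignores parameter uncertainty, and the variable names are illustrative.

```r
# Coefficients and innovation variance as reported for the fitted ARMA(1,1)
phi <- 0.8309
theta <- -0.6322
sigma2 <- 0.09364
h <- 12

# psi-weights of the ARMA(1,1) for the differences, with psi_0 = 1
psi <- c(1, ARMAtoMA(ar = phi, ma = theta, lag.max = h - 1))

# Standard error of the h-step forecast of the difference
se_diff <- sqrt(sigma2 * cumsum(psi^2))

# Standard error of the h-step forecast of the level (cumulated psi-weights)
Psi <- cumsum(psi)
se_level <- sqrt(sigma2 * cumsum(Psi^2))

round(cbind(h = 1:h, se_diff, se_level), 3)
```

An alternative would be to fit the model to the level series with `order = c(1, 1, 1)`, so that `predict()` returns level forecasts and their standard errors directly.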
I also used the ARMA(1,1) model to make recursive 1-step predictions, appending each forecast to the series and refitting before the next step.
However, there seems to be no meaningful difference between these two methods in this case.
# Recursive 1-step predictions with an ARIMA(1,0,1) model
x.pred <- NULL    # predicted differences
pred_1 <- NULL    # level forecasts
pred_1_U <- NULL  # upper band
pred_1_L <- NULL  # lower band
difference <- HICP_clean$diff
for (i in 1:12) {
  # Refit on the series extended with the previous forecasts
  # (stats::arima includes a mean term by default)
  fit_i <- stats::arima(difference, order = c(1, 0, 1))
  step <- predict(fit_i, n.ahead = 1)
  x.pred <- c(x.pred, step$pred)
  pred_1 <- c(pred_1, HICP_clean$`Euro area`[len] + sum(x.pred))
  pred_1_U <- c(pred_1_U, pred_1[i] + 1.96 * step$se)
  pred_1_L <- c(pred_1_L, pred_1[i] - 1.96 * step$se)
  difference <- c(difference, step$pred)
}
{
  recent <- HICP_clean$`Euro area`[(len - 12):len]
  x_fc <- length(recent) + (1:12)
  ts.plot(c(recent, rep(NA, 12)), ylim = c(1, 3),
          ylab = 'Year-on-Year inflation rate (%)', main = '1-step predictions')
  lines(x_fc, pred_1, col = 'red')
  lines(x_fc, pred_1_U, col = 4, lty = 2)
  lines(x_fc, pred_1_L, col = 4, lty = 2)
  legend(1, 3, c('prediction', '95% prediction interval'),
         col = c('red', 4), lty = c(1, 2), pch = c(NA, NA))
}
compare_table <- cbind(as.numeric(pred_1), as.numeric(pred_12)) |> as.data.frame()
colnames(compare_table) <- c("1-step predictions", "12-steps predictions")
compare_table
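A possible explanation for the two columns being so close: when a forecast is appended to the series as if it were an observation, its innovation is zero, so with fixed coefficients the next 1-step forecast coincides with the 2-step forecast, and so on. For a zero-mean ARMA(1,1) this means the forecasts decay geometrically at rate φ after the first step. The sketch below checks this on simulated data; the series, seed and object names are illustrative assumptions.

```r
set.seed(42)
x <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 300)  # simulated, not HICP
fit_sim <- arima(x, order = c(1, 0, 1), include.mean = FALSE)

# Direct 12-step forecasts from a single fit
multi_step <- as.numeric(predict(fit_sim, n.ahead = 12)$pred)

# Recursion with fixed coefficients: each forecast is phi times the previous one
phi <- unname(coef(fit_sim)["ar1"])
recursive <- multi_step[1] * phi^(0:11)

all.equal(multi_step, recursive)
```

Refitting at every step, as in the loop above, changes the estimated coefficients slightly, so small discrepancies between the two columns can remain.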
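All 12 point forecasts for YoY inflation are above 2%, and the 95% lower bounds stay above 1.5%, so the model suggests that inflation will stay above 2% over the next year, which might be a cause for concern. However, the COVID period is very volatile and may introduce outliers and structural breaks, and the residuals still show some autocorrelation, so these forecasts may be inaccurate.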