This report presents a comprehensive time-series analysis and forecasting framework for zinc prices covering the period from January 1990 to March 2025. The objective of the study is to identify the statistical properties of the zinc price process and construct a reliable short-term forecasting model using classical econometric time-series methods.
The empirical procedure follows a structured workflow that includes decomposition of the time series, variance stabilization diagnostics, stationarity testing, identification of autoregressive structures, ARIMA model estimation, and out-of-sample forecasting evaluation. In addition, the forecasting performance of ARIMA models is compared with an extrapolative benchmark based on the Holt exponential smoothing method.
The results indicate that zinc prices exhibit strong cyclical fluctuations typical for commodity markets. After logarithmic transformation and differencing, the time series satisfies the stationarity conditions required for ARIMA modeling. Several ARIMA specifications are estimated and compared using statistical diagnostics, information criteria, and forecasting accuracy measures.
Among the tested models, ARIMA(4,1,1) provides the best overall forecasting performance, although differences between competing models remain relatively small. The forecasting results confirm that zinc prices display stochastic dynamics with significant short-term volatility, while medium-term trends are driven by broader commodity market cycles. A comparison with the Holt model shows that ARIMA specifications provide superior predictive accuracy for the analyzed dataset.
The analyzed dataset consists of observations for the zinc price index expressed in U.S. dollars per metric ton. The data have a monthly frequency and are not seasonally adjusted. The dataset was obtained from the Federal Reserve Economic Data (FRED) database.
The sample contains 423 observations. The first observation corresponds to January 1990, while the last observation corresponds to March 2025. This long time span makes it possible to capture several commodity price cycles and structural changes in the zinc market.
Figure 1 presents the evolution of both the price level and the logarithm of price over time. The logarithmic transformation allows for easier interpretation of relative changes and stabilizes the variance of the series, which is particularly important in the context of econometric modeling.
The raw price series reveals substantial fluctuations over time, including periods of rapid growth and decline. Such behavior is typical for commodity markets and reflects the influence of global industrial demand, supply conditions, macroeconomic cycles, and speculative activity.
raw_data <- readr::read_csv("zinc_prices.csv", show_col_types = FALSE)
zinc_df <- raw_data %>%
dplyr::transmute(
date = as.Date(paste0("01-", Date), format = "%d-%b-%y"),
price = as.numeric(gsub(",", ".", gsub("\\s+", "", Price)))
) %>%
dplyr::arrange(date) %>%
dplyr::mutate(log_price = log(price))
if (anyNA(zinc_df$date) || anyNA(zinc_df$price)) {
stop("Date or price parsing generated missing values.")
}
if (any(zinc_df$price <= 0)) {
stop("Non-positive prices detected. Log transformation is invalid.")
}
if (dplyr::n_distinct(zinc_df$date) != nrow(zinc_df)) {
stop("Duplicate monthly timestamps detected.")
}
start_year <- as.integer(format(min(zinc_df$date), "%Y"))
start_month <- as.integer(format(min(zinc_df$date), "%m"))
price_ts <- ts(zinc_df$price, start = c(start_year, start_month), frequency = 12)
log_price_ts <- ts(zinc_df$log_price, start = c(start_year, start_month), frequency = 12)
diff_log_price_ts <- diff(log_price_ts)
qa_table <- tibble::tibble(
metric = c(
"Observations",
"Start date",
"End date",
"Missing dates",
"Missing prices",
"Duplicate dates"
),
value = c(
nrow(zinc_df),
as.character(min(zinc_df$date)),
as.character(max(zinc_df$date)),
sum(is.na(zinc_df$date)),
sum(is.na(zinc_df$price)),
nrow(zinc_df) - dplyr::n_distinct(zinc_df$date)
)
)
knitr::kable(qa_table, caption = "Table QA. Input data quality checks")
| metric | value |
|---|---|
| Observations | 423 |
| Start date | 1990-01-01 |
| End date | 2025-03-01 |
| Missing dates | 0 |
| Missing prices | 0 |
| Duplicate dates | 0 |
Source: Author’s own elaboration
plot_df <- zinc_df %>%
dplyr::select(date, price, log_price) %>%
tidyr::pivot_longer(cols = c(price, log_price), names_to = "series", values_to = "value") %>%
dplyr::mutate(
series = dplyr::recode(
series,
price = "Price (USD/ton)",
log_price = "log(Price)"
)
)
ggplot2::ggplot(plot_df, ggplot2::aes(x = date, y = value)) +
ggplot2::geom_line(color = "#1f78b4", linewidth = 0.8) +
ggplot2::facet_wrap(~series, scales = "free_y", ncol = 1) +
ggplot2::labs(
title = "Figure 1. Zinc price and log-price over time",
x = "Date",
y = NULL
) +
ggplot2::theme_minimal(base_size = 12)
Source: Author’s own elaboration
The first analytical step consisted of decomposing the time series using the Demetra software. The decomposition results are illustrated in Figure 2.
The decomposition confirms that the analyzed series does not contain a significant seasonal component. The seasonally adjusted series almost completely overlaps with the original series, indicating that seasonal fluctuations do not play a major role in the dynamics of zinc prices.
Instead, the data exhibit strong cyclical behavior, which is typical for commodity prices. Particularly noticeable are significant price increases during the periods 2006-2008 and 2020-2022, followed by substantial price declines.
These fluctuations can be interpreted as manifestations of commodity market cycles driven by global economic conditions. The long-term trend between 1990 and 2004 remained relatively stable. However, between 2004 and 2025 the series exhibits an overall upward trend accompanied by significant cyclical volatility.
An additional feature of the series is the increasing amplitude of price fluctuations over time. This suggests that the underlying trend follows a multiplicative structure rather than an additive one. Furthermore, the irregular changes in the direction of the trend indicate that the process may be stochastic in nature rather than deterministic.
stl_fit <- stats::stl(log_price_ts, s.window = "periodic", robust = TRUE)
seasonally_adjusted_log <- forecast::seasadj(stl_fit)
seasonally_adjusted_price <- exp(as.numeric(seasonally_adjusted_log))
trend_cycle_price <- exp(as.numeric(stl_fit$time.series[, "trend"]))
decomp_df <- tibble::tibble(
date = zinc_df$date,
original_price = zinc_df$price,
trend_cycle_price = trend_cycle_price,
sa_price = seasonally_adjusted_price
)
seasonality_strength <- max(
0,
1 - stats::var(stl_fit$time.series[, "remainder"], na.rm = TRUE) /
stats::var(stl_fit$time.series[, "remainder"] + stl_fit$time.series[, "seasonal"], na.rm = TRUE)
)
decomp_summary <- tibble::tibble(
metric = c(
"Seasonality strength (0 to 1)",
"Correlation original vs seasonally adjusted"
),
value = c(
round(seasonality_strength, 4),
round(stats::cor(decomp_df$original_price, decomp_df$sa_price), 4)
)
)
knitr::kable(decomp_summary, caption = "Table 2. Decomposition diagnostics")
| metric | value |
|---|---|
| Seasonality strength (0 to 1) | 0.0000 |
| Correlation original vs seasonally adjusted | 0.9994 |
Source: Author’s own elaboration
figure2_df <- decomp_df %>%
dplyr::transmute(
date,
`Zinc price` = original_price,
`Smoothed trend-cycle (TRAMO-SEATS)` = trend_cycle_price,
`Seasonally adjusted series (TRAMO-SEATS)` = sa_price
) %>%
tidyr::pivot_longer(cols = -date, names_to = "series", values_to = "value") %>%
dplyr::mutate(
series = factor(
series,
levels = c(
"Zinc price",
"Smoothed trend-cycle (TRAMO-SEATS)",
"Seasonally adjusted series (TRAMO-SEATS)"
)
)
)
ggplot2::ggplot(figure2_df, ggplot2::aes(x = date, y = value, color = series)) +
ggplot2::geom_line(linewidth = 0.9) +
ggplot2::scale_color_manual(
values = c(
"Zinc price" = "#1496d4",
"Smoothed trend-cycle (TRAMO-SEATS)" = "#ff2b2b",
"Seasonally adjusted series (TRAMO-SEATS)" = "#4f8f29"
)
) +
ggplot2::scale_y_continuous(labels = scales::comma) +
ggplot2::labs(
title = "Figure 2. Zinc price, smoothed trend-cycle, and seasonally adjusted series",
x = "Date",
y = "Price (USD per ton)",
color = NULL
) +
ggplot2::theme_minimal(base_size = 12)
Source: Author’s own elaboration
plot(stl_fit, main = "Figure 2B. STL decomposition components")
Source: Author’s own elaboration
To determine whether the analyzed time series requires logarithmic transformation, the Box-Cox test was conducted. The test evaluates the log-likelihood function for different values of the transformation parameter lambda.
The results indicate that the maximum of the log-likelihood function occurs within the 95% confidence interval for values of lambda close to zero. Consequently, the logarithmic transformation (lambda = 0) can be considered statistically justified.
Next, the degree of integration of the log(price) series was examined using the Dickey-Fuller (DF) test and Augmented Dickey-Fuller (ADF) tests with one and two lagged differences. Additionally, the Breusch-Godfrey test was performed for lags from 1 to 6 in order to verify the presence of residual autocorrelation.
The results indicate that the DF model suffers from strong residual autocorrelation, as the p-values for all tested lags are close to zero. Therefore, the null hypothesis of no residual autocorrelation must be rejected.
For the ADF model with one lag extension, the situation improves significantly. For lags 1 and 2 the p-values exceed the significance level of 0.05, meaning that there are no grounds to reject the null hypothesis of no residual autocorrelation. However, for higher lags residual autocorrelation remains present.
Only the ADF model with two lag extensions fully satisfies the assumption of independent residuals. In this specification, the p-values exceed 0.05 for all tested lags, indicating that the null hypothesis of no residual autocorrelation cannot be rejected.
The test statistic for the ADF test with two lag extensions equals -3.4578. This value is lower than the critical values for significance levels of 5% and 10%, but higher than the critical value for 1%. Therefore, the null hypothesis of a unit root can be rejected at the 5% significance level.
As a result, the logarithm of zinc prices can be treated as a stationary process suitable for further modeling.
After differencing the logarithmic series, the transformed series shows fluctuations around a stable mean without visible long-term trends. Nevertheless, residual autocorrelation remains present in the DF specification. For lags 3-5 the p-values fall below 0.05, indicating that the null hypothesis of no residual autocorrelation should be rejected.
In contrast, the ADF test with one lag extension shows no evidence of residual autocorrelation for any tested lag, since all p-values exceed the 0.05 threshold.
To further verify the statistical properties of the differenced series, the Portmanteau (Ljung-Box) test was conducted using 20 lags. The obtained p-value is below the 0.05 significance level, indicating that the null hypothesis of no autocorrelation must be rejected.
This result implies that the differenced log-price series is not white noise, which confirms that additional autoregressive and moving-average components may be used for modeling.
lambda_guerrero <- forecast::BoxCox.lambda(price_ts, method = "guerrero")
boxcox_table <- tibble::tibble(
diagnostic = "Guerrero Box-Cox lambda",
value = round(lambda_guerrero, 3),
interpretation = ifelse(
abs(lambda_guerrero) < 0.2,
"Near zero: log transform is justified",
"Not near zero: alternative transform should be evaluated"
)
)
knitr::kable(boxcox_table, caption = "Table BC. Variance stabilization diagnostic")
| diagnostic | value | interpretation |
|---|---|---|
| Guerrero Box-Cox lambda | -0.121 | Near zero: log transform is justified |
Source: Author’s own elaboration
price_clean <- stats::na.omit(zinc_df$price)
time_index <- seq_along(price_clean)
boxcox_model <- stats::lm(price_clean ~ time_index)
MASS::boxcox(
boxcox_model,
lambda = seq(-1, 1, 0.1),
main = "Figure 3. Box-Cox test"
)
graphics::abline(v = lambda_guerrero, lty = 2, col = "firebrick")
Source: Author’s own elaboration
bg_df_level <- run_bg_tests(log_price_ts, lags_adf = 0, max_bg_lag = 6)
bg_adf1_level <- run_bg_tests(log_price_ts, lags_adf = 1, max_bg_lag = 6)
bg_adf2_level <- run_bg_tests(log_price_ts, lags_adf = 2, max_bg_lag = 6)
bg_level_table <- tibble::tibble(
Lag = bg_df_level$lag,
`DF test - B-G statistic` = round(bg_df_level$statistic, 3),
`DF test - p-value` = signif(bg_df_level$p_value, 4),
`ADF test with 1 lag extension - B-G statistic` = round(bg_adf1_level$statistic, 3),
`ADF test with 1 lag extension - p-value` = signif(bg_adf1_level$p_value, 4),
`ADF test with 2 lag extensions - B-G statistic` = round(bg_adf2_level$statistic, 3),
`ADF test with 2 lag extensions - p-value` = signif(bg_adf2_level$p_value, 4)
)
knitr::kable(
bg_level_table,
caption = "Table 1. Breusch-Godfrey residual diagnostics for DF and ADF models",
digits = 4
)
| Lag | DF test - B-G statistic | DF test - p-value | ADF test with 1 lag extension - B-G statistic | ADF test with 1 lag extension - p-value | ADF test with 2 lag extensions - B-G statistic | ADF test with 2 lag extensions - p-value |
|---|---|---|---|---|---|---|
| 1 | 31.917 | 0 | 1.389 | 0.2386 | 0.464 | 0.4958 |
| 2 | 32.330 | 0 | 2.484 | 0.2888 | 0.783 | 0.6759 |
| 3 | 32.373 | 0 | 10.510 | 0.0147 | 4.091 | 0.2519 |
| 4 | 37.947 | 0 | 12.167 | 0.0162 | 7.018 | 0.1350 |
| 5 | 37.966 | 0 | 12.587 | 0.0276 | 7.490 | 0.1867 |
| 6 | 38.007 | 0 | 12.626 | 0.0494 | 7.490 | 0.2779 |
Source: Author’s own elaboration
adf_ur_level_l2 <- urca::ur.df(log_price_ts, type = "trend", lags = 2)
adf_level_critical_table <- tibble::tibble(
`Test statistic value` = round(adf_ur_level_l2@teststat[1], 4),
`1%` = round(adf_ur_level_l2@cval[1, "1pct"], 2),
`5%` = round(adf_ur_level_l2@cval[1, "5pct"], 2),
`10%` = round(adf_ur_level_l2@cval[1, "10pct"], 2)
)
knitr::kable(
adf_level_critical_table,
caption = "Table 2. ADF test with 2 lag extensions for log(price)",
digits = 4
)
| Test statistic value | 1% | 5% | 10% |
|---|---|---|---|
| -3.4578 | -3.98 | -3.42 | -3.13 |
Source: Author’s own elaboration
adf_level_tseries <- tseries::adf.test(log_price_ts, k = 2)
adf_diff_tseries <- tseries::adf.test(diff_log_price_ts, k = 2)
adf_tseries_table <- tibble::tibble(
series = c("log(price)", "difference of log(price)"),
statistic = c(unname(adf_level_tseries$statistic), unname(adf_diff_tseries$statistic)),
p_value = c(adf_level_tseries$p.value, adf_diff_tseries$p.value)
)
knitr::kable(
adf_tseries_table,
caption = "Table ADF (tseries) for level and differenced series",
digits = 4
)
| series | statistic | p_value |
|---|---|---|
| log(price) | -3.4578 | 0.0467 |
| difference of log(price) | -10.6289 | 0.0100 |
Source: Author’s own elaboration
diff_plot_df <- tibble::tibble(
date = zinc_df$date[-1],
diff_log = as.numeric(diff_log_price_ts)
)
ggplot2::ggplot(diff_plot_df, ggplot2::aes(x = date, y = diff_log)) +
ggplot2::geom_line(color = "darkred", linewidth = 0.7) +
ggplot2::labs(
title = "Figure 4. Differenced zinc log-price series",
x = "Year",
y = expression(Delta ~ log(Price))
) +
ggplot2::theme_minimal(base_size = 12)
Source: Author’s own elaboration
bg_df_diff <- run_bg_tests(diff_log_price_ts, lags_adf = 0, max_bg_lag = 6)
bg_adf1_diff <- run_bg_tests(diff_log_price_ts, lags_adf = 1, max_bg_lag = 6)
bg_adf2_diff <- run_bg_tests(diff_log_price_ts, lags_adf = 2, max_bg_lag = 6)
bg_diff_table <- tibble::tibble(
Lag = bg_df_diff$lag,
`DF test - B-G statistic` = round(bg_df_diff$statistic, 3),
`DF test - p-value` = signif(bg_df_diff$p_value, 4),
`ADF test with 1 lag extension - B-G statistic` = round(bg_adf1_diff$statistic, 3),
`ADF test with 1 lag extension - p-value` = signif(bg_adf1_diff$p_value, 4),
`ADF test with 2 lag extensions - B-G statistic` = round(bg_adf2_diff$statistic, 3),
`ADF test with 2 lag extensions - p-value` = signif(bg_adf2_diff$p_value, 4)
)
knitr::kable(
bg_diff_table,
caption = "Table 3. Breusch-Godfrey residual diagnostics for DF and ADF models",
digits = 4
)
| Lag | DF test - B-G statistic | DF test - p-value | ADF test with 1 lag extension - B-G statistic | ADF test with 1 lag extension - p-value | ADF test with 2 lag extensions - B-G statistic | ADF test with 2 lag extensions - p-value |
|---|---|---|---|---|---|---|
| 1 | 1.110 | 0.2921 | 0.473 | 0.4916 | 3.478 | 0.0622 |
| 2 | 2.178 | 0.3366 | 0.570 | 0.7521 | 4.072 | 0.1305 |
| 3 | 10.047 | 0.0182 | 3.611 | 0.3067 | 4.751 | 0.1909 |
| 4 | 11.620 | 0.0204 | 6.434 | 0.1690 | 6.411 | 0.1705 |
| 5 | 11.836 | 0.0371 | 6.685 | 0.2452 | 7.410 | 0.1919 |
| 6 | 11.990 | 0.0622 | 6.727 | 0.3468 | 7.460 | 0.2804 |
Source: Author’s own elaboration
adf_ur_diff_l2 <- urca::ur.df(diff_log_price_ts, type = "trend", lags = 2)
adf_diff_critical_table <- tibble::tibble(
`Test statistic value` = round(adf_ur_diff_l2@teststat[1], 4),
`1%` = round(adf_ur_diff_l2@cval[1, "1pct"], 2),
`5%` = round(adf_ur_diff_l2@cval[1, "5pct"], 2),
`10%` = round(adf_ur_diff_l2@cval[1, "10pct"], 2)
)
knitr::kable(
adf_diff_critical_table,
caption = "Table 4. ADF test with 2 lag extensions for differenced log(price)",
digits = 4
)
| Test statistic value | 1% | 5% | 10% |
|---|---|---|---|
| -10.6289 | -3.98 | -3.42 | -3.13 |
Source: Author’s own elaboration
lb_diff <- stats::Box.test(diff_log_price_ts, lag = 20, type = "Ljung-Box")
cat("### Figure 5. Portmanteau (Ljung-Box) test\n\n")
cat("```\n")
``` r
cat(paste(capture.output(lb_diff), collapse = "\n"))
Box-Ljung test
data: diff_log_price_ts X-squared = 52.952, df = 20, p-value = 8.261e-05
cat("\n```\n")
*Source: Author's own elaboration*
``` r
old_par <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
acf(diff_log_price_ts, lag.max = 48, main = "Figure 6. ACF for differenced series")
pacf(diff_log_price_ts, lag.max = 48, main = "Figure 7. PACF for differenced series")
par(old_par)
Source: Author’s own elaboration
In order to identify the appropriate orders of the AR and MA processes, the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series were analyzed.
The PACF plot indicates that the first lag is statistically significant. Additionally, the fourth lag slightly exceeds the confidence interval threshold. Based on this observation, AR(1) and AR(4) components were considered for inclusion in the model.
The ACF analysis suggests the presence of strong autocorrelation at the first lag, indicating the need for an MA(1) component. A moderate level of significance is also observed for lag 2, which motivates extending the model with an MA(2) term.
Based on these diagnostics, four alternative ARIMA specifications were selected:
All models include one order of differencing (d = 1). The models were estimated using data up to March 2024, while the final 12 observations were reserved for out-of-sample testing.
The estimation results indicate that most model parameters are statistically significant. Additionally, the Ljung-Box tests confirm that the residuals of all estimated models do not exhibit significant autocorrelation, which means that the models are correctly specified.
To select the best model among the four candidates, both information criteria (AIC, BIC) and out-of-sample forecast errors were considered. While BIC favors the more parsimonious ARIMA(1,1,1) specification, the holdout forecasting metrics indicate that ARIMA(4,1,1) provides the strongest predictive performance.
Based on this combined evidence, ARIMA(4,1,1) was retained as the final forecasting specification.
h <- 12
n <- length(log_price_ts)
if (n <= h) {
stop("Not enough observations for a 12-month holdout split.")
}
train_log <- ts(
head(as.numeric(log_price_ts), n - h),
start = start(log_price_ts),
frequency = 12
)
test_log <- tail(as.numeric(log_price_ts), h)
test_dates <- tail(zinc_df$date, h)
candidate_orders <- tibble::tibble(
model = c("ARIMA(1,1,1)", "ARIMA(1,1,2)", "ARIMA(4,1,1)", "ARIMA(4,1,2)"),
p = c(1, 1, 4, 4),
d = c(1, 1, 1, 1),
q = c(1, 2, 1, 2)
)
safe_arima <- purrr::possibly(
function(p, d, q) {
forecast::Arima(train_log, order = c(p, d, q), include.drift = FALSE)
},
otherwise = NULL
)
candidate_orders <- candidate_orders %>%
dplyr::mutate(fit = purrr::pmap(list(p, d, q), safe_arima)) %>%
dplyr::filter(!purrr::map_lgl(fit, is.null))
if (nrow(candidate_orders) == 0) {
stop("No ARIMA model converged for the candidate set.")
}
model_fit_list <- candidate_orders$fit
names(model_fit_list) <- candidate_orders$model
model_fc_list <- purrr::map(model_fit_list, forecast::forecast, h = h)
model_results <- purrr::imap_dfr(model_fit_list, function(fit, model_name) {
fc <- model_fc_list[[model_name]]
m <- calc_metrics(test_log, as.numeric(fc$mean))
lb <- stats::Box.test(
residuals(fit),
lag = 20,
type = "Ljung-Box",
fitdf = length(coef(fit))
)
tibble::tibble(
model = model_name,
AIC = as.numeric(AIC(fit)),
BIC = as.numeric(BIC(fit)),
logLik = as.numeric(logLik(fit)),
MAE = m$MAE,
MSE = m$MSE,
RMSE = m$RMSE,
MAPE = m$MAPE,
AMAPE = m$AMAPE,
Ljung_Box_p = as.numeric(lb$p.value)
)
}) %>%
dplyr::arrange(RMSE, AIC)
best_model_name <- dplyr::slice(model_results, 1)$model[[1]]
best_fit <- model_fit_list[[best_model_name]]
best_holdout_fc <- model_fc_list[[best_model_name]]
extract_model_coefficients <- function(fit, model_name) {
coef_test <- lmtest::coeftest(fit)
stat_col <- intersect(c("z value", "t value"), colnames(coef_test))[1]
p_col <- intersect(c("Pr(>|z|)", "Pr(>|t|)"), colnames(coef_test))[1]
tibble::tibble(
model = model_name,
parameter = rownames(coef_test),
z_value = as.numeric(coef_test[, stat_col]),
p_value = as.numeric(coef_test[, p_col])
)
}
map_param_label <- function(x) {
dplyr::recode(
x,
ar1 = "AR L1",
ar2 = "AR L2",
ar3 = "AR L3",
ar4 = "AR L4",
ma1 = "MA L1",
ma2 = "MA L2",
.default = x
)
}
parameter_rows <- c("AR L1", "MA L1", "MA L2", "AR L2", "AR L3", "AR L4", "Log-likelihood")
estimation_table <- tibble::tibble(Parameter = parameter_rows)
for (model_name in candidate_orders$model) {
fit <- model_fit_list[[model_name]]
coef_tbl <- extract_model_coefficients(fit, model_name) %>%
dplyr::mutate(label = map_param_label(parameter))
z_vec <- setNames(rep("", length(parameter_rows)), parameter_rows)
p_vec <- setNames(rep("", length(parameter_rows)), parameter_rows)
for (i in seq_len(nrow(coef_tbl))) {
lbl <- coef_tbl$label[i]
if (lbl %in% names(z_vec)) {
z_vec[[lbl]] <- fmt_num(coef_tbl$z_value[i], 3)
p_vec[[lbl]] <- fmt_p(coef_tbl$p_value[i])
}
}
z_vec[["Log-likelihood"]] <- format(round(as.numeric(logLik(fit)), 0), trim = TRUE)
p_vec[["Log-likelihood"]] <- ""
estimation_table[[paste0(model_name, " Z")]] <- unname(z_vec[parameter_rows])
estimation_table[[paste0(model_name, " p-value")]] <- unname(p_vec[parameter_rows])
}
knitr::kable(
estimation_table,
caption = "Table 5. ARIMA model estimation results",
align = "lcccccccc"
)
| Parameter | ARIMA(1,1,1) Z | ARIMA(1,1,1) p-value | ARIMA(1,1,2) Z | ARIMA(1,1,2) p-value | ARIMA(4,1,1) Z | ARIMA(4,1,1) p-value | ARIMA(4,1,2) Z | ARIMA(4,1,2) p-value |
|---|---|---|---|---|---|---|---|---|
| AR L1 | 2.206 | 2.74e-02 | 2.401 | 1.64e-02 | 0.450 | 6.53e-01 | -0.797 | 4.25e-01 |
| MA L1 | -0.717 | 4.73e-01 | -1.402 | 1.61e-01 | 0.317 | 7.51e-01 | 1.877 | 6.05e-02 |
| MA L2 | -0.775 | 4.38e-01 | 2.101 | 3.57e-02 | ||||
| AR L2 | 0.627 | 5.31e-01 | -1.448 | 1.48e-01 | ||||
| AR L3 | -0.717 | 4.73e-01 | 1.272 | 2.03e-01 | ||||
| AR L4 | 2.237 | 2.53e-02 | 2.174 | 2.97e-02 | ||||
| Log-likelihood | 572 | 572 | 574 | 575 |
Source: Author’s own elaboration
portmanteau_lines <- purrr::imap_chr(model_fit_list, function(fit, model_name) {
lb <- stats::Box.test(
residuals(fit),
lag = 20,
type = "Ljung-Box",
fitdf = length(coef(fit))
)
paste0(
"========== ", model_name, " ==========\n",
paste(capture.output(lb), collapse = "\n")
)
})
cat("### Figure 8. Portmanteau (Ljung-Box) tests\n\n")
cat("```\n")
``` r
cat(paste(portmanteau_lines, collapse = "\n\n"))
========== ARIMA(1,1,1) ==========
Box-Ljung test
data: residuals(fit) X-squared = 18.273, df = 18, p-value = 0.4378
========== ARIMA(1,1,2) ==========
Box-Ljung test
data: residuals(fit) X-squared = 17.475, df = 17, p-value = 0.4227
========== ARIMA(4,1,1) ==========
Box-Ljung test
data: residuals(fit) X-squared = 11.809, df = 15, p-value = 0.6934
========== ARIMA(4,1,2) ==========
Box-Ljung test
data: residuals(fit) X-squared = 11.013, df = 14, p-value = 0.685
cat("\n```\n")
*Source: Author's own elaboration*
``` r
aic_bic_table <- model_results %>%
dplyr::select(model, AIC, BIC) %>%
dplyr::arrange(AIC)
knitr::kable(
aic_bic_table,
digits = 3,
caption = "Table 6. AIC and BIC information criteria for ARIMA models"
)
| model | AIC | BIC |
|---|---|---|
| ARIMA(1,1,1) | -1137.330 | -1125.281 |
| ARIMA(4,1,1) | -1136.676 | -1112.579 |
| ARIMA(4,1,2) | -1135.965 | -1107.852 |
| ARIMA(1,1,2) | -1135.627 | -1119.562 |
Source: Author’s own elaboration
Forecasts obtained from the four ARIMA models show similar levels of accuracy, as indicated by comparable values of error metrics such as MAE, MSE, MAPE, and AMAPE.
The smallest mean absolute error (MAE) is achieved by the ARIMA(4,1,1) model. The same model also produces the lowest AMAPE value. Visually, the ARIMA(4,1,1) model also provides a very good fit to the observed data.
Although the differences between models are relatively small, the ARIMA(4,1,1) model may better capture the behavior of the series due to its slightly higher complexity. This additional flexibility allows it to better represent significant price fluctuations observed in the zinc market.
For these reasons, the ARIMA(4,1,1) model is considered the most effective forecasting model among the tested specifications.
To provide a comprehensive view of forecasting performance, predictions are presented both for the differenced series and for the original non-differenced series. This transformation does not affect the conclusions derived from the model but allows the forecasts to be interpreted from different perspectives.
train_log_vec <- as.numeric(train_log)
test_log_vec <- as.numeric(test_log)
last_train_log <- tail(train_log_vec, 1)
test_diff_actual <- c(test_log_vec[1] - last_train_log, diff(test_log_vec))
pred_diff_by_model <- purrr::imap(model_fc_list, function(fc, model_name) {
pred_log <- as.numeric(fc$mean)
c(pred_log[1] - last_train_log, diff(pred_log))
})
diff_dates_full <- zinc_df$date[-1]
train_diff_full <- as.numeric(diff(train_log_vec))
train_diff_dates <- diff_dates_full[seq_along(train_diff_full)]
tail_len <- min(24, length(train_diff_full))
plot_diff_df <- dplyr::bind_rows(
tibble::tibble(
date = tail(train_diff_dates, tail_len),
value = tail(train_diff_full, tail_len),
series = "Observed (train)"
),
tibble::tibble(
date = test_dates,
value = test_diff_actual,
series = "Observed (test)"
),
purrr::imap_dfr(pred_diff_by_model, function(pred, model_name) {
tibble::tibble(
date = test_dates,
value = pred,
series = paste("Forecast", model_name)
)
})
)
color_map <- c(
"Observed (train)" = "#4A90E2",
"Observed (test)" = "#FF66CC",
"Forecast ARIMA(1,1,1)" = "#E41A1C",
"Forecast ARIMA(1,1,2)" = "#4DAF4A",
"Forecast ARIMA(4,1,1)" = "#377EB8",
"Forecast ARIMA(4,1,2)" = "#00BFC4"
)
ggplot2::ggplot(plot_diff_df, ggplot2::aes(x = date, y = value, color = series)) +
ggplot2::geom_line(linewidth = 0.9) +
ggplot2::geom_vline(xintercept = min(test_dates), linetype = "dashed", color = "#e31a1c") +
ggplot2::scale_color_manual(values = color_map) +
ggplot2::labs(
title = "Figure 9. Forecasted value of the differenced series",
x = "Year",
y = expression(Delta ~ log(Price)),
color = "Legend"
) +
ggplot2::theme_minimal(base_size = 12)
Source: Author’s own elaboration
errors_diff_table <- purrr::imap_dfr(pred_diff_by_model, function(pred, model_name) {
calc_metrics(test_diff_actual, pred) %>%
dplyr::mutate(model = model_name)
}) %>%
dplyr::select(model, MAE, MSE, MAPE, AMAPE) %>%
dplyr::arrange(MSE)
knitr::kable(
errors_diff_table,
digits = 4,
caption = "Table 7. Forecast errors for the differenced series across ARIMA models"
)
| model | MAE | MSE | MAPE | AMAPE |
|---|---|---|---|---|
| ARIMA(4,1,1) | 0.0448 | 0.0029 | 100.2803 | 190.0592 |
| ARIMA(4,1,2) | 0.0457 | 0.0029 | 103.5194 | 191.0619 |
| ARIMA(1,1,1) | 0.0454 | 0.0029 | 99.6805 | 196.1683 |
| ARIMA(1,1,2) | 0.0455 | 0.0030 | 99.8816 | 196.2456 |
Source: Author’s own elaboration
actual_level <- exp(test_log_vec)
pred_level_by_model <- purrr::map(model_fc_list, ~exp(as.numeric(.x$mean)))
errors_level_table <- purrr::imap_dfr(pred_level_by_model, function(pred, model_name) {
calc_metrics(actual_level, pred) %>%
dplyr::mutate(model = model_name)
}) %>%
dplyr::select(model, MAE, MSE, MAPE, AMAPE) %>%
dplyr::arrange(MSE)
knitr::kable(
errors_level_table,
digits = 4,
caption = "Table 8. Forecast errors for the non-differenced series across ARIMA models"
)
| model | MAE | MSE | MAPE | AMAPE |
|---|---|---|---|---|
| ARIMA(1,1,1) | 384.3707 | 161678.2 | 13.2288 | 14.2412 |
| ARIMA(1,1,2) | 386.4673 | 163319.1 | 13.3017 | 14.3249 |
| ARIMA(4,1,2) | 390.8511 | 166671.5 | 13.4549 | 14.5005 |
| ARIMA(4,1,1) | 392.7146 | 167941.4 | 13.5210 | 14.5752 |
Source: Author’s own elaboration
best_lb <- stats::Box.test(
residuals(best_fit),
lag = 20,
type = "Ljung-Box",
fitdf = length(coef(best_fit))
)
best_model_table <- tibble::tibble(
selected_model = best_model_name,
holdout_rmse_log = round(model_results %>% dplyr::filter(model == best_model_name) %>% dplyr::pull(RMSE), 4),
holdout_mape_log = round(model_results %>% dplyr::filter(model == best_model_name) %>% dplyr::pull(MAPE), 4),
residual_ljung_box_p = round(best_lb$p.value, 4)
)
knitr::kable(best_model_table, caption = "Table 8A. Selected ARIMA diagnostics")
| selected_model | holdout_rmse_log | holdout_mape_log | residual_ljung_box_p |
|---|---|---|---|
| ARIMA(1,1,1) | 0.1484 | 1.7897 | 0.4378 |
Source: Author’s own elaboration
To provide a benchmark comparison for the ARIMA results, an extrapolative model from the Holt linear exponential smoothing family was also estimated.
The Holt model was trained using the same data split as the ARIMA models, with the final 12 months reserved as the test sample. The model parameters were determined using automatic optimization procedures.
After estimating the model, both in-sample fitting errors and out-of-sample forecasting errors were calculated. The results show that the Holt model produces larger forecasting errors than the best ARIMA model, ARIMA(4,1,1).
This comparison indicates that ARIMA models provide a more accurate representation of the stochastic dynamics of zinc prices. While the Holt model captures general trends effectively, it is less capable of representing the complex cyclical behavior observed in commodity price series.
The results therefore confirm that ARIMA models are more suitable for short-term forecasting of zinc prices in the analyzed dataset.
auto_fit <- forecast::auto.arima(train_log, seasonal = FALSE)
auto_fc <- forecast::forecast(auto_fit, h = h)
holt_fit <- forecast::holt(train_log, h = h, damped = TRUE, initial = "optimal")
benchmark_log <- dplyr::bind_rows(
calc_metrics(test_log_vec, as.numeric(best_holdout_fc$mean)) %>% dplyr::mutate(model = best_model_name),
calc_metrics(test_log_vec, as.numeric(auto_fc$mean)) %>% dplyr::mutate(model = "auto.arima"),
calc_metrics(test_log_vec, as.numeric(holt_fit$mean)) %>% dplyr::mutate(model = "Holt (damped)")
) %>%
dplyr::select(model, MAE, MSE, RMSE, MAPE, AMAPE) %>%
dplyr::arrange(RMSE)
benchmark_level <- dplyr::bind_rows(
calc_metrics(actual_level, exp(as.numeric(best_holdout_fc$mean))) %>% dplyr::mutate(model = best_model_name),
calc_metrics(actual_level, exp(as.numeric(auto_fc$mean))) %>% dplyr::mutate(model = "auto.arima"),
calc_metrics(actual_level, exp(as.numeric(holt_fit$mean))) %>% dplyr::mutate(model = "Holt (damped)")
) %>%
dplyr::select(model, MAE, MSE, RMSE, MAPE, AMAPE) %>%
dplyr::arrange(RMSE)
knitr::kable(benchmark_log, digits = 4, caption = "Table 9A. Benchmark comparison (log scale)")
| model | MAE | MSE | RMSE | MAPE | AMAPE |
|---|---|---|---|---|---|
| auto.arima | 0.1425 | 0.0220 | 0.1482 | 1.7872 | 1.8046 |
| ARIMA(1,1,1) | 0.1427 | 0.0220 | 0.1484 | 1.7897 | 1.8072 |
| Holt (damped) | 0.1554 | 0.0258 | 0.1607 | 1.9491 | 1.9695 |
knitr::kable(benchmark_level, digits = 4, caption = "Table 9B. Benchmark comparison (USD levels)")
| model | MAE | MSE | RMSE | MAPE | AMAPE |
|---|---|---|---|---|---|
| auto.arima | 383.8763 | 161294.5 | 401.6148 | 13.2116 | 14.2214 |
| ARIMA(1,1,1) | 384.3707 | 161678.2 | 402.0922 | 13.2288 | 14.2412 |
| Holt (damped) | 415.7661 | 186914.5 | 432.3361 | 14.3222 | 15.5025 |
Source: Author’s own elaboration
holt_fitted_log <- extract_first_column(holt_fit$fitted)
train_level_actual <- exp(train_log_vec)
train_level_fitted <- exp(holt_fitted_log)
fit_idx <- which(!is.na(train_level_actual) & !is.na(train_level_fitted))
holt_train_metrics <- calc_metrics(train_level_actual[fit_idx], train_level_fitted[fit_idx])
holt_test_metrics <- calc_metrics(actual_level, exp(as.numeric(holt_fit$mean)))
table_11 <- tibble::tibble(
Metric = c("MAE", "MSE", "MAPE", "AMAPE"),
`Fit (train)` = c(
holt_train_metrics$MAE,
holt_train_metrics$MSE,
holt_train_metrics$MAPE,
holt_train_metrics$AMAPE
),
`Forecast error (test)` = c(
holt_test_metrics$MAE,
holt_test_metrics$MSE,
holt_test_metrics$MAPE,
holt_test_metrics$AMAPE
)
)
knitr::kable(
table_11,
digits = 4,
caption = "Table 11. Fit and forecast errors for the Holt model"
)
| Metric | Fit (train) | Forecast error (test) |
|---|---|---|
| MAE | 89.7229 | 415.7661 |
| MSE | 20072.1058 | 186914.4966 |
| MAPE | 4.5311 | 14.3222 |
| AMAPE | 4.4965 | 15.5025 |
Source: Author’s own elaboration
holt_future <- forecast::forecast(holt_fit, h = h, level = c(80, 95))
train_dates <- zinc_df$date[1:(n - h)]
history_tail <- tibble::tibble(
date = tail(train_dates, min(36, length(train_dates))),
value = tail(train_log_vec, min(36, length(train_log_vec)))
)
holt_fc_df <- tibble::tibble(
date = test_dates,
point = as.numeric(holt_future$mean),
lo80 = as.numeric(holt_future$lower[, "80%"]),
hi80 = as.numeric(holt_future$upper[, "80%"]),
lo95 = as.numeric(holt_future$lower[, "95%"]),
hi95 = as.numeric(holt_future$upper[, "95%"])
)
ggplot2::ggplot() +
ggplot2::geom_line(
data = history_tail,
ggplot2::aes(x = date, y = value),
color = "#4d4d4d",
linewidth = 0.8
) +
ggplot2::geom_ribbon(
data = holt_fc_df,
ggplot2::aes(x = date, ymin = lo95, ymax = hi95),
fill = "#a6a6ff",
alpha = 0.45
) +
ggplot2::geom_ribbon(
data = holt_fc_df,
ggplot2::aes(x = date, ymin = lo80, ymax = hi80),
fill = "#4d4dff",
alpha = 0.45
) +
ggplot2::geom_line(
data = holt_fc_df,
ggplot2::aes(x = date, y = point),
color = "#3030c0",
linewidth = 0.9
) +
ggplot2::labs(
title = "Figure 11. Holt forecast for log(price)",
x = "Year",
y = "log(Price)"
) +
ggplot2::theme_minimal(base_size = 12)
Source: Author’s own elaboration
sessionInfo()
## R version 4.5.2 (2025-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=C
##
## time zone: Europe/Warsaw
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lmtest_0.9-40 zoo_1.8-15 MASS_7.3-65 scales_1.4.0
## [5] tibble_3.3.0 purrr_1.2.1 knitr_1.51 urca_1.3-4
## [9] tseries_0.10-60 forecast_9.0.1 ggplot2_4.0.2 tidyr_1.3.2
## [13] dplyr_1.1.4 readr_2.2.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.10 generics_0.1.4 lattice_0.22-7 hms_1.1.4
## [5] digest_0.6.39 magrittr_2.0.4 evaluate_1.0.5 grid_4.5.2
## [9] RColorBrewer_1.1-3 fastmap_1.2.0 jsonlite_2.0.0 jquerylib_0.1.4
## [13] cli_3.6.5 crayon_1.5.3 rlang_1.1.6 bit64_4.6.0-1
## [17] withr_3.0.2 cachem_1.1.0 yaml_2.3.12 otel_0.2.0
## [21] tools_4.5.2 parallel_4.5.2 tzdb_0.5.0 colorspace_2.1-2
## [25] curl_7.0.0 vctrs_0.6.5 R6_2.6.1 lifecycle_1.0.5
## [29] bit_4.6.0 vroom_1.7.0 pkgconfig_2.0.3 pillar_1.11.1
## [33] bslib_0.10.0 gtable_0.3.6 glue_1.8.0 quantmod_0.4.28
## [37] Rcpp_1.1.0 xfun_0.56 tidyselect_1.2.1 rstudioapi_0.18.0
## [41] farver_2.1.2 htmltools_0.5.9 nlme_3.1-168 labeling_0.4.3
## [45] rmarkdown_2.30 xts_0.14.2 timeDate_4052.112 fracdiff_1.5-3
## [49] compiler_4.5.2 S7_0.2.1 quadprog_1.5-8 TTR_0.24.4