Introduction

Loading packages and libraries

Import Dataset

df <- read_excel("C:/Users/mateu/OneDrive/Pulpit/Historyczne ceny KGH.xlsx")

Data Description

The study employs monthly time series data covering the period from January 2015 to August 2025, providing approximately 128 observations for each variable. The dataset includes five key economic and financial indicators that reflect both the domestic capital market and global commodity markets. The first variable consists of the stock prices of KGHM Polska Miedź S.A., one of the largest publicly listed companies in Poland and a global producer of copper and silver. The share price of KGHM is strongly influenced by developments in commodity markets, particularly copper and silver, as well as the overall performance of the Warsaw Stock Exchange, making it a natural object of forecasting analysis. The second variable captures silver futures traded on the ICE exchange, which reflect investors’ expectations regarding the future valuation of this precious metal. As silver is one of KGHM’s main products, this factor is directly linked to the company’s financial performance. The third variable includes gold futures (ICE), which serve as a benchmark of global investment sentiment and are commonly perceived as a “safe haven” asset during periods of economic uncertainty. Although KGHM’s exposure to gold is relatively smaller compared to copper and silver, this variable provides important contextual information on the broader precious metals market. The fourth component of the dataset consists of copper futures traded on the COMEX exchange, which represent a crucial indicator of global copper price trends. Given copper’s dominant role in KGHM’s revenue structure, this factor is one of the most significant determinants of the company’s share price dynamics. The final variable in the dataset is the WIG20 index, which comprises the twenty largest companies listed on the Warsaw Stock Exchange. As a synthetic measure of the overall condition of the domestic capital market, WIG20 directly influences the performance of KGHM, one of its largest constituents.

The analysis starts in January 2015 for several reasons. This date marks the beginning of a relatively stable market cycle for KGHM after the significant volatility observed in 2010–2014, which was driven by both global commodity price shocks and internal company-specific factors. Starting the analysis at this point ensures that the dataset captures a complete market cycle characterized by a more consistent relationship between KGHM’s stock price and external macroeconomic indicators. Moreover, the chosen horizon of 2015–2025 encompasses several critical events that shaped both the domestic and international market environment. In 2015, a political shift in Poland marked a turning point that could have influenced the performance of state-linked companies such as KGHM. In 2020, the outbreak of the COVID-19 pandemic caused global disruptions and prompted investors to seek safe-haven assets, including gold and silver, commodities in which KGHM has direct exposure. Finally, the years after 2022 brought a surge in commodity prices driven by geopolitical uncertainty and supply chain disruptions, underscoring the role of copper and other strategic resources in the global economy. By selecting this time frame, the analysis balances the need for a sufficiently long dataset for robust econometric modeling with the relevance of recent developments, thereby allowing for an assessment of how political, economic, and financial factors jointly shaped KGHM’s market valuation.

Visualization Dataset

df_long <- df %>%
  pivot_longer(cols = c(kghm, silver, copper, gold, wig20),
               names_to = "Asset",
               values_to = "Value")


ggplot(df_long, aes(x = Data, y = Value)) +
  geom_line(color = "darkgreen") +
  facet_wrap(~Asset, scales = "free_y", ncol = 1) +   
  ggtitle("Time series") +
  scale_y_continuous(labels = comma) +
  theme_minimal()

Stationarity Tests for Selected Variables and Differencing for Stationarity

The results of the Augmented Dickey–Fuller (ADF) and KPSS tests consistently indicate that all variables in levels (KGHM, WIG20, copper, gold, silver) are non-stationary, as the ADF tests fail to reject the null of a unit root while the KPSS tests reject stationarity. After first differencing, however, the ADF tests strongly reject the unit root and the KPSS statistics no longer reject stationarity for all series. This confirms that the variables are integrated of order one, I(1), and that differencing successfully induces stationarity, making the transformed series suitable for econometric modeling.

tseries::adf.test(na.omit(df$kghm)); tseries::kpss.test(na.omit(df$kghm))

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(df$kghm)
## Dickey-Fuller = -2.9264, Lag order = 5, p-value = 0.1914
## alternative hypothesis: stationary

## Warning in tseries::kpss.test(na.omit(df$kghm)): p-value smaller than printed
## p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  na.omit(df$kghm)
## KPSS Level = 0.86001, Truncation lag parameter = 4, p-value = 0.01

tseries::adf.test(na.omit(df$wig20));     tseries::kpss.test(na.omit(df$wig20))

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(df$wig20)
## Dickey-Fuller = -2.2706, Lag order = 5, p-value = 0.464
## alternative hypothesis: stationary

## Warning in tseries::kpss.test(na.omit(df$wig20)): p-value greater than printed
## p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  na.omit(df$wig20)
## KPSS Level = 0.23831, Truncation lag parameter = 4, p-value = 0.1

tseries::adf.test(na.omit(df$copper));     tseries::kpss.test(na.omit(df$copper))

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(df$copper)
## Dickey-Fuller = -3.1457, Lag order = 5, p-value = 0.1003
## alternative hypothesis: stationary

## Warning in tseries::kpss.test(na.omit(df$copper)): p-value smaller than printed
## p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  na.omit(df$copper)
## KPSS Level = 2.0836, Truncation lag parameter = 4, p-value = 0.01

tseries::adf.test(na.omit(df$gold));     tseries::kpss.test(na.omit(df$gold))

## Warning in tseries::adf.test(na.omit(df$gold)): p-value greater than printed
## p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(df$gold)
## Dickey-Fuller = 0.28564, Lag order = 5, p-value = 0.99
## alternative hypothesis: stationary

## Warning in tseries::kpss.test(na.omit(df$gold)): p-value smaller than printed
## p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  na.omit(df$gold)
## KPSS Level = 2.1497, Truncation lag parameter = 4, p-value = 0.01

tseries::adf.test(na.omit(df$silver));     tseries::kpss.test(na.omit(df$silver))

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(df$silver)
## Dickey-Fuller = -1.7367, Lag order = 5, p-value = 0.686
## alternative hypothesis: stationary

## Warning in tseries::kpss.test(na.omit(df$silver)): p-value smaller than printed
## p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  na.omit(df$silver)
## KPSS Level = 2.0586, Truncation lag parameter = 4, p-value = 0.01

df <- df %>%
  dplyr::mutate(
    d_kghm = c(NA, diff(kghm)),
    d_wig20     = c(NA, diff(wig20)),
    d_copper     = c(NA, diff(copper)),
    d_gold     = c(NA, diff(gold)),
    d_silver     = c(NA, diff(silver))
  )

tseries::adf.test(na.omit(df$d_kghm)); tseries::kpss.test(na.omit(df$d_kghm))

## Warning in tseries::adf.test(na.omit(df$d_kghm)): p-value smaller than printed
## p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(df$d_kghm)
## Dickey-Fuller = -4.5806, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

## Warning in tseries::kpss.test(na.omit(df$d_kghm)): p-value greater than printed
## p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  na.omit(df$d_kghm)
## KPSS Level = 0.042182, Truncation lag parameter = 4, p-value = 0.1

tseries::adf.test(na.omit(df$d_wig20));     tseries::kpss.test(na.omit(df$d_wig20))

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(df$d_wig20)
## Dickey-Fuller = -3.8057, Lag order = 5, p-value = 0.02098
## alternative hypothesis: stationary

## Warning in tseries::kpss.test(na.omit(df$d_wig20)): p-value greater than
## printed p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  na.omit(df$d_wig20)
## KPSS Level = 0.19009, Truncation lag parameter = 4, p-value = 0.1

tseries::adf.test(na.omit(df$d_copper));     tseries::kpss.test(na.omit(df$d_copper))

## Warning in tseries::adf.test(na.omit(df$d_copper)): p-value smaller than
## printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(df$d_copper)
## Dickey-Fuller = -4.4498, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

## Warning in tseries::kpss.test(na.omit(df$d_copper)): p-value greater than
## printed p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  na.omit(df$d_copper)
## KPSS Level = 0.047381, Truncation lag parameter = 4, p-value = 0.1

tseries::adf.test(na.omit(df$d_gold));     tseries::kpss.test(na.omit(df$d_gold))

## Warning in tseries::adf.test(na.omit(df$d_gold)): p-value smaller than printed
## p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(df$d_gold)
## Dickey-Fuller = -4.4338, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

## 
##  KPSS Test for Level Stationarity
## 
## data:  na.omit(df$d_gold)
## KPSS Level = 0.62379, Truncation lag parameter = 4, p-value = 0.02047

tseries::adf.test(na.omit(df$d_silver));     tseries::kpss.test(na.omit(df$d_silver))

## Warning in tseries::adf.test(na.omit(df$d_silver)): p-value smaller than
## printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(df$d_silver)
## Dickey-Fuller = -5.231, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

## Warning in tseries::kpss.test(na.omit(df$d_silver)): p-value greater than
## printed p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  na.omit(df$d_silver)
## KPSS Level = 0.20235, Truncation lag parameter = 4, p-value = 0.1

Autocorrelation and Partial Autocorrelation Analysis (ACF & PACF)

The correlogram analysis for differenced KGHM returns shows that autocorrelation decays quickly after the first lag. In the ACF plot, only the first lag is significant, while subsequent values remain within the confidence bands, suggesting limited persistence. The PACF confirms this pattern, with a single notable spike at lag 1 and no clear structure at higher lags.

Overall, these results are consistent with a weak autoregressive process of low order, most likely AR(1). This indicates that past price changes have only short-lived effects on future dynamics, supporting the use of parsimonious ARIMA specifications for modeling KGHM returns.

acf(na.omit(df$d_kghm), main = "ACF kghm")

pacf(na.omit(df$d_kghm), main = "PACF kghm")

acf(na.omit(df$d_wig20), main = "ACF WIG20")

pacf(na.omit(df$d_wig20), main = "PACF WIG20")

acf(na.omit(df$d_copper), main = "ACF COPPER")

pacf(na.omit(df$d_copper), main = "PACF COPPER")

acf(na.omit(df$d_gold), main = "ACF GOLD")

pacf(na.omit(df$d_gold), main = "PACF GOLD")

acf(na.omit(df$d_silver), main = "ACF SILVER")

pacf(na.omit(df$d_silver), main = "PACF SILVER")

OLS Model Estimation (ΔKGHM vs. Explanatory Variables)

The OLS estimation results show that changes in KGHM stock prices are significantly influenced by movements in the domestic equity index (WIG20) and by commodity market dynamics, particularly copper and silver.

In the full specification, the coefficients on ΔWIG20, Δcopper, and Δsilver are all highly significant and economically meaningful, while Δgold is statistically insignificant and adds no explanatory power. After excluding gold, the restricted model (model_lm2) achieves very similar explanatory strength, with an adjusted R² of about 0.575, indicating that roughly 58% of the variation in KGHM returns is explained by the regressors.

Interestingly, despite the fact that KGHM is also a substantial producer of gold, fluctuations in gold prices do not have a statistically significant impact on its stock returns. This suggests that investors primarily value the company through its exposure to copper and the broader domestic equity market (WIG20), with silver also playing a secondary but meaningful role, while gold remains marginal in shaping short-term market dynamics

Model

model_lm <- lm(d_kghm ~ d_wig20 + d_copper + d_gold + d_silver, data = df)
summary(model_lm)

## 
## Call:
## lm(formula = d_kghm ~ d_wig20 + d_copper + d_gold + d_silver, 
##     data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.3913  -5.0190  -0.7179   5.3535  27.2673 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.547405   0.731192  -0.749    0.456    
## d_wig20      0.036643   0.006707   5.463 2.51e-07 ***
## d_copper    20.296305   3.945055   5.145 1.04e-06 ***
## d_gold      -0.009313   0.013397  -0.695    0.488    
## d_silver     2.524593   0.599701   4.210 4.92e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.992 on 122 degrees of freedom
##   (1 obserwacja została skasowana z uwagi na braki w niej zawarte)
## Multiple R-squared:  0.5864, Adjusted R-squared:  0.5728 
## F-statistic: 43.24 on 4 and 122 DF,  p-value: < 2.2e-16

model_lm2 <- lm(d_kghm ~ d_wig20 + d_copper  + d_silver, data = df)
summary(model_lm2)

## 
## Call:
## lm(formula = d_kghm ~ d_wig20 + d_copper + d_silver, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.0599  -5.0563  -0.4078   5.1811  26.8315 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.65979    0.71159  -0.927    0.356    
## d_wig20      0.03618    0.00666   5.432 2.85e-07 ***
## d_copper    20.36201    3.93563   5.174 9.04e-07 ***
## d_silver     2.27236    0.47647   4.769 5.14e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.976 on 123 degrees of freedom
##   (1 obserwacja została skasowana z uwagi na braki w niej zawarte)
## Multiple R-squared:  0.5847, Adjusted R-squared:  0.5746 
## F-statistic: 57.73 on 3 and 123 DF,  p-value: < 2.2e-16

model_lm3 <- lm(d_kghm ~ d_wig20 + d_copper , data = df)
summary(model_lm3)

## 
## Call:
## lm(formula = d_kghm ~ d_wig20 + d_copper, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.5135  -5.6115  -0.2309   4.9506  31.1343 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.413423   0.769431  -0.537    0.592    
## d_wig20      0.039588   0.007179   5.514 1.94e-07 ***
## d_copper    27.595681   3.937171   7.009 1.36e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.647 on 124 degrees of freedom
##   (1 obserwacja została skasowana z uwagi na braki w niej zawarte)
## Multiple R-squared:  0.5079, Adjusted R-squared:    0.5 
## F-statistic:    64 on 2 and 124 DF,  p-value: < 2.2e-16

model_lm4 <- lm(d_kghm ~  d_copper , data = df)
summary(model_lm4)

## 
## Call:
## lm(formula = d_kghm ~ d_copper, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.5834  -5.5586   0.4091   5.6361  31.2838 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.3550     0.8551  -0.415    0.679    
## d_copper     35.9210     4.0414   8.888 5.73e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.61 on 125 degrees of freedom
##   (1 obserwacja została skasowana z uwagi na braki w niej zawarte)
## Multiple R-squared:  0.3873, Adjusted R-squared:  0.3824 
## F-statistic:    79 on 1 and 125 DF,  p-value: 5.73e-15

model_lm5 <- lm(d_kghm ~  d_wig20 , data = df)
summary(model_lm5)

## 
## Call:
## lm(formula = d_kghm ~ d_wig20, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.972  -6.307  -0.733   5.194  38.375 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.075101   0.903733  -0.083    0.934    
## d_wig20      0.058882   0.007803   7.546 8.09e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.18 on 125 degrees of freedom
##   (1 obserwacja została skasowana z uwagi na braki w niej zawarte)
## Multiple R-squared:  0.313,  Adjusted R-squared:  0.3075 
## F-statistic: 56.95 on 1 and 125 DF,  p-value: 8.092e-12

car::vif(model_lm2)

##  d_wig20 d_copper d_silver 
## 1.186027 1.376894 1.258658

Model Diagnostics and Residual Analysis

The diagnostic tests and residual plots provide a comprehensive assessment of model validity. The Durbin–Watson statistic (DW ≈ 1.89, p = 0.28) does not indicate significant autocorrelation, suggesting that residuals are serially uncorrelated. The Shapiro–Wilk test (p = 0.35) and the Jarque–Bera test (p ≈ 0.07) both fail to reject normality, confirming that the error distribution is close to Gaussian.

However, the Breusch–Pagan test (p = 0.0017) reveals evidence of heteroskedasticity, implying that the variance of residuals is not constant across observations. This motivates the use of heteroskedasticity-robust standard errors, which were applied in the coefficient tests.

Graphical inspection further supports these findings: residuals appear symmetrically distributed around zero with only a few outliers, the Q–Q plot aligns closely with the theoretical normal line, and leverage plots indicate no severe influential observations.

dwtest(model_lm2)

## 
##  Durbin-Watson test
## 
## data:  model_lm2
## DW = 1.8913, p-value = 0.275
## alternative hypothesis: true autocorrelation is greater than 0

bptest(model_lm2)

## 
##  studentized Breusch-Pagan test
## 
## data:  model_lm2
## BP = 15.182, df = 3, p-value = 0.001667

jarque.bera.test(resid(model_lm2))

## 
##  Jarque Bera Test
## 
## data:  resid(model_lm2)
## X-squared = 5.4085, df = 2, p-value = 0.06692

shapiro.test(resid(model_lm2))

## 
##  Shapiro-Wilk normality test
## 
## data:  resid(model_lm2)
## W = 0.98819, p-value = 0.3464

par(mfrow = c(2, 3))

plot(model_lm2, which = 1)  # Res vs Fitted
plot(model_lm2, which = 2)  # Q-Q
plot(model_lm2, which = 3)  # Scale-Location
plot(model_lm2, which = 5)  # Residuals vs Leverage


hist(residuals(model_lm2),
     breaks = "Scott",
     main = "Histogram reszt",
     xlab = "Reszty",
     col = "lightblue", border = "white")


curve(dnorm(x, mean = mean(residuals(model_lm2)), 
               sd = sd(residuals(model_lm2))) * length(residuals(model_lm2)) *
               diff(hist(residuals(model_lm2), plot = FALSE)$breaks)[1],
      add = TRUE, col = "darkgreen", lwd = 2)


par(mfrow = c(1, 1))

coeftest(model_lm2, vcov = vcovHC(model_lm2, type = "HC1"))

## 
## t test of coefficients:
## 
##               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -0.6597941  0.6753081 -0.9770 0.3304736    
## d_wig20      0.0361813  0.0080004  4.5224 1.419e-05 ***
## d_copper    20.3620086  4.4648826  4.5605 1.216e-05 ***
## d_silver     2.2723643  0.6191525  3.6701 0.0003597 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Forecasting

Arima

The ARIMA forecasting analysis was conducted using three model specifications: ARIMA(1,1,2), ARIMA(2,1,1), and ARIMA(0,1,3). All models were estimated on the training set and evaluated using key performance metrics such as AIC, BIC, RMSE, and MAPE. Among them, ARIMA(1,1,2) demonstrated the best in-sample fit, with the lowest error measures and no significant autocorrelation in residuals. Comparative visualizations with confidence intervals confirmed the model’s stability and predictive reliability. These findings support ARIMA(1,1,2) as the most suitable approach for short-term forecasting of KGHM stock prices.

suppressPackageStartupMessages({
  library(forecast)
  library(lubridate)
  library(dplyr)
  library(tibble)
})

# --- 1) Przygotowanie danych (zakładam, że df jest już w pamięci) ---
stopifnot(all(c("Data","kghm") %in% names(df)))
df$Data <- as.Date(df$Data)
df <- df[order(df$Data), ]

n <- nrow(df)
cat("Number of months:", n, "\n")

## Number of months: 128

h <- 5
if (h >= n) {
  h <- n - 1
  cat("Not enough data for 5 test months, setting h to:", h, "\n")
}

pkghm <- ts(df$kghm,
            start     = c(as.numeric(format(min(df$Data), "%Y")),
                          as.numeric(format(min(df$Data), "%m"))),
            frequency = 12)

kghm_train <- window(pkghm, end   = c(time(pkghm)[n - h]))
kghm_test  <- window(pkghm, start = c(time(pkghm)[n - h + 1]))
cat("Training set has", length(kghm_train), "points\n")

## Training set has 123 points

model_list <- list(
  "ARIMA(1,1,2)" = c(1,1,2),
  "ARIMA(2,1,1)" = c(2,1,1),
  "ARIMA(0,1,3)" = c(0,1,3)
)

evaluate_model <- function(order, train_data) {
  fit <- Arima(train_data, order = order)
  acc <- accuracy(fit)  # in-sample
  data.frame(
    AIC  = AIC(fit),
    BIC  = BIC(fit),
    RMSE = acc["Training set","RMSE"],
    MAPE = acc["Training set","MAPE"],
    ACF1 = acf(residuals(fit), plot = FALSE)$acf[2]
  )
}


results <- lapply(model_list, evaluate_model, train_data = kghm_train)
comparison_table <- bind_rows(results, .id = "Model")
print(comparison_table)

##          Model      AIC      BIC     RMSE     MAPE          ACF1
## 1 ARIMA(1,1,2) 959.2661 970.4822 11.84329 8.481985 -0.0062317078
## 2 ARIMA(2,1,1) 962.0099 973.2259 12.01964 8.515117 -0.0036854341
## 3 ARIMA(0,1,3) 961.7703 972.9863 12.00746 8.523572  0.0005509118

plot_arima_forecast <- function(order, label) {
  fit <- Arima(kghm_train, order = order)
  fc  <- forecast(fit, h = length(kghm_test))

  plot(pkghm,
       xlim = c(start(pkghm)[1], end(pkghm)[1] + length(fc$mean)/12),
       ylim = range(pkghm, fc$lower, fc$upper),
       main = paste(label, "Forecast with Confidence Bands"),
       ylab = "KGHM", xlab = "Time", col = "black", lwd = 1)

  time_forecast <- time(fc$mean)

  polygon(c(time_forecast, rev(time_forecast)),
          c(fc$upper[,1], rev(fc$lower[,1])),
          col = rgb(0,0,1,0.2), border = NA)

  polygon(c(time_forecast, rev(time_forecast)),
          c(fc$upper[,2], rev(fc$lower[,2])),
          col = rgb(0,0,1,0.1), border = NA)

  fitted_and_forecast <- ts(c(fitted(fit), fc$mean),
                            start = start(kghm_train),
                            frequency = frequency(kghm_train))
  lines(fitted_and_forecast, col = "blue", lwd = 2)
  lines(pkghm, col = "black")
  lines(kghm_test, col = "red", lwd = 2)

  legend("topleft",
         legend = c("Original Data", "Test Data", "Fitted + Forecast", "95% Forecast CI"),
         col = c("black", "red", "blue", rgb(0,0,1,0.2)),
         lwd = c(1, 2, 2, NA), pch = c(NA, NA, NA, 15),
         pt.cex = 2, bty = "n")
}


for (name in names(model_list)) {
  plot_arima_forecast(model_list[[name]], name)
}

h <- length(kghm_test)  # spójnie z testem
future_dates <- seq(from = max(df$Data) %m+% months(1),
                    by   = "month",
                    length.out = h)

arima_table <- function(order_vec, label){
  fit <- Arima(kghm_train, order = order_vec)
  fc  <- forecast(fit, h = h)
  tibble(
    Model        = label,
    Data         = as.Date(future_dates),
    Prognoza_PLN = round(as.numeric(fc$mean), 2),
    Dolny_80_PLN = round(as.numeric(fc$lower[, 1]), 2),
    Gorny_80_PLN = round(as.numeric(fc$upper[, 1]), 2),
    Dolny_95_PLN = round(as.numeric(fc$lower[,  2]), 2),
    Gorny_95_PLN = round(as.numeric(fc$upper[,  2]), 2)
  )
}

wyniki_arima <- bind_rows(
  arima_table(model_list[[1]], names(model_list)[1]),
  arima_table(model_list[[2]], names(model_list)[2]),
  arima_table(model_list[[3]], names(model_list)[3])
)

print(wyniki_arima, n = nrow(wyniki_arima))

## # A tibble: 15 × 7
##    Model        Data       Prognoza_PLN Dolny_80_PLN Gorny_80_PLN Dolny_95_PLN
##    <chr>        <date>            <dbl>        <dbl>        <dbl>        <dbl>
##  1 ARIMA(1,1,2) 2025-09-01         124         109.          139.        100. 
##  2 ARIMA(1,1,2) 2025-10-01         124.         99.9         147.         87.4
##  3 ARIMA(1,1,2) 2025-11-01         123.         94.4         152.         79.2
##  4 ARIMA(1,1,2) 2025-12-01         123.         90.4         155.         73.4
##  5 ARIMA(1,1,2) 2026-01-01         122.         87.4         157.         68.9
##  6 ARIMA(2,1,1) 2025-09-01         124.        108.          140.        100. 
##  7 ARIMA(2,1,1) 2025-10-01         124.         99.6         148.         86.7
##  8 ARIMA(2,1,1) 2025-11-01         124.         94.4         154.         78.5
##  9 ARIMA(2,1,1) 2025-12-01         124.         90.5         158.         72.6
## 10 ARIMA(2,1,1) 2026-01-01         125.         87.2         162.         67.4
## 11 ARIMA(0,1,3) 2025-09-01         124.        108.          139.         99.7
## 12 ARIMA(0,1,3) 2025-10-01         124          99.6         148.         86.8
## 13 ARIMA(0,1,3) 2025-11-01         124.         94.5         154.         78.6
## 14 ARIMA(0,1,3) 2025-12-01         124.         90.6         158.         72.6
## 15 ARIMA(0,1,3) 2026-01-01         124.         87.1         162.         67.3
## # ℹ 1 more variable: Gorny_95_PLN <dbl>

wyniki_arima_112 <- filter(wyniki_arima, Model == "ARIMA(1,1,2)")
wyniki_arima_211 <- filter(wyniki_arima, Model == "ARIMA(2,1,1)")
wyniki_arima_013 <- filter(wyniki_arima, Model == "ARIMA(0,1,3)")


cat("Prognoza ARIMA obejmuje:",
    format(min(future_dates), "%Y-%m"), "→",
    format(max(future_dates), "%Y-%m"), "\n")

## Prognoza ARIMA obejmuje: 2025-09 → 2026-01

Linear Model

Using the refined linear model (model_lm2) with predictors WIG20, silver, and copper returns, a five-month ahead forecast was generated for KGHM stock price changes. The predicted monthly differences range from a decline of –10.96 PLN to an increase of +16.54 PLN. Starting from the most recent observed price, the cumulative forecast suggests a fluctuating but overall upward trajectory, reaching approximately 134.70 PLN by Month 5. The visualization confirms this dynamic, highlighting both volatility and recovery phases within the forecast horizon.

new_data <- tail(df[, c("d_wig20", "d_silver", "d_copper")], 5)
forecast_diff <- predict(model_lm2, newdata = new_data)
forecast_diff

##          1          2          3          4          5 
## -10.964251   3.271818  16.538644  -9.657504   4.158815

last_price <- tail(df$kghm, 1)
forecast_level <- cumsum(c(last_price, forecast_diff))[-1]

forecast_result <- data.frame(
Month = c("Month 1", "Month 2", "Month 3","Month 4","Month 5"),
kghm_change = round(forecast_diff, 2),
kghm_forecast = round(forecast_level, 2)
)
print(forecast_result)

##     Month kghm_change kghm_forecast
## 1 Month 1      -10.96        120.39
## 2 Month 2        3.27        123.66
## 3 Month 3       16.54        140.20
## 4 Month 4       -9.66        130.54
## 5 Month 5        4.16        134.70

ggplot(forecast_result, aes(x = Month, y = kghm_forecast)) +
geom_line(group = 1, color = "steelblue") +
geom_point(size = 3) +
ggtitle("5-Month Forecast of KGHM Price") +
ylab("Predicted KGHM Price (PLN)") +
theme_minimal()

Sensitivity Analysis

The baseline scenario was constructed by computing the mean values of the independent variables — WIG20, copper, and silver returns — across the entire dataset. These average inputs were then used to generate a point forecast from the linear model, resulting in a predicted change in KGHM’s stock price of approximately +0.20 PLN. This modest increase represents the expected outcome under typical market conditions and serves as a reference point for further scenario testing.

Building on this baseline, the sensitivity analysis evaluates how deviations in the explanatory variables influence the forecast. The results show that changes in the WIG20 index have only a marginal effect on KGHM’s share price: a 10% increase in WIG20 corresponds to an increase of just 0.017 PLN. By contrast, commodity prices — particularly silver and copper — exert a stronger influence, with comparable shocks leading to changes of 0.038 PLN and 0.032 PLN respectively. In the downside scenarios, silver (–0.019 PLN) and copper (–0.016 PLN) again produce stronger effects than WIG20 (–0.009 PLN), reinforcing the importance of global commodity markets.

The multi-factor analysis further demonstrates the combined impact of simultaneous shocks. When both copper and silver increase by 10% while WIG20 remains unchanged, the forecast rises by about 0.069 PLN. The strongest positive deviation occurs when all three variables increase by 10%, raising the forecast by approximately 0.087 PLN. Conversely, the most adverse case appears when all three variables decline by 5%, reducing the forecast by about 0.043 PLN. These results underline the dominant role of commodity markets in shaping KGHM’s performance, with domestic market conditions playing only a secondary role.

stopifnot(all(c("d_wig20","d_copper","d_silver") %in% names(df)))

X_vars   <- names(coef(model_lm2))[-1]
X_means  <- sapply(X_vars, function(v) mean(df[[v]], na.rm = TRUE))

predict_at <- function(x_list) {
  nd <- as.data.frame(as.list(x_list))
  predict(model_lm2, newdata = nd)
}

baseline <- as.numeric(predict_at(X_means))

M0   <- 1.00   
MUP  <- 1.10   # +10%
MDN  <- 0.95   # -5%

pred_for <- function(m_wig, m_cu, m_ag) {
  X <- X_means
  X[["d_wig20"]] <- X[["d_wig20"]] * m_wig
  X[["d_copper"]] <- X[["d_copper"]] * m_cu
  X[["d_silver"]] <- X[["d_silver"]] * m_ag
  as.numeric(predict_at(X))
}

scenarios <- tibble::tribble(
  ~Scenario,                                 ~m_wig, ~m_cu, ~m_ag,
  "Both metals +10%, WIG20 unchanged",        M0,     MUP,   MUP,
  "Copper +10%, WIG20 & Silver unchanged",    M0,     MUP,   M0,
  "Silver +10%, WIG20 & Copper unchanged",    M0,     M0,    MUP,
  "WIG20 +10%, metals unchanged",             MUP,    M0,    M0,
  "WIG20 -5%, metals unchanged",              MDN,    M0,    M0,
  "Metals -5%, WIG20 unchanged",              M0,     MDN,   MDN,
  "Copper -5%, others unchanged",             M0,     MDN,   M0,
  "Silver -5%, others unchanged",             M0,     M0,    MDN,
  "All +10% (WIG20, Copper, Silver)",         MUP,    MUP,   MUP,
  "All -5% (WIG20, Copper, Silver)",          MDN,    MDN,   MDN
)

scen_results <- scenarios %>%
  dplyr::rowwise() %>%
  dplyr::mutate(
    Forecast_PLN   = pred_for(m_wig, m_cu, m_ag),
    Delta_vs_Base  = Forecast_PLN - baseline
  ) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(
    Baseline_PLN = round(baseline, 4),
    Forecast_PLN = round(Forecast_PLN, 4),
    Delta_PLN    = round(Delta_vs_Base, 4)
  ) %>%
  dplyr::select(Scenario, Baseline_PLN, Forecast_PLN, Delta_PLN) %>%
  dplyr::arrange(dplyr::desc(Delta_PLN))

scen_results

## # A tibble: 10 × 4
##    Scenario                              Baseline_PLN Forecast_PLN Delta_PLN
##    <chr>                                        <dbl>        <dbl>     <dbl>
##  1 All +10% (WIG20, Copper, Silver)             0.205        0.291    0.0865
##  2 Both metals +10%, WIG20 unchanged            0.205        0.274    0.0693
##  3 Silver +10%, WIG20 & Copper unchanged        0.205        0.242    0.0375
##  4 Copper +10%, WIG20 & Silver unchanged        0.205        0.236    0.0317
##  5 WIG20 +10%, metals unchanged                 0.205        0.222    0.0172
##  6 WIG20 -5%, metals unchanged                  0.205        0.196   -0.0086
##  7 Copper -5%, others unchanged                 0.205        0.189   -0.0159
##  8 Silver -5%, others unchanged                 0.205        0.186   -0.0188
##  9 Metals -5%, WIG20 unchanged                  0.205        0.170   -0.0346
## 10 All -5% (WIG20, Copper, Silver)              0.205        0.162   -0.0432

apply(scen_results, 1, function(r) {
  cat(
    sprintf(
      "Scenario: %s → forecast = %s PLN (change vs. baseline: %+0.2f PLN)\n",
      r[["Scenario"]], r[["Forecast_PLN"]], as.numeric(r[["Delta_PLN"]])
    )
  )
})

## Scenario: All +10% (WIG20, Copper, Silver) → forecast = 0.2912 PLN (change vs. baseline: +0.09 PLN)
## Scenario: Both metals +10%, WIG20 unchanged → forecast = 0.2740 PLN (change vs. baseline: +0.07 PLN)
## Scenario: Silver +10%, WIG20 & Copper unchanged → forecast = 0.2423 PLN (change vs. baseline: +0.04 PLN)
## Scenario: Copper +10%, WIG20 & Silver unchanged → forecast = 0.2365 PLN (change vs. baseline: +0.03 PLN)
## Scenario: WIG20 +10%, metals unchanged → forecast = 0.2219 PLN (change vs. baseline: +0.02 PLN)
## Scenario: WIG20 -5%, metals unchanged → forecast = 0.1961 PLN (change vs. baseline: -0.01 PLN)
## Scenario: Copper -5%, others unchanged → forecast = 0.1889 PLN (change vs. baseline: -0.02 PLN)
## Scenario: Silver -5%, others unchanged → forecast = 0.1860 PLN (change vs. baseline: -0.02 PLN)
## Scenario: Metals -5%, WIG20 unchanged → forecast = 0.1701 PLN (change vs. baseline: -0.03 PLN)
## Scenario: All -5% (WIG20, Copper, Silver) → forecast = 0.1615 PLN (change vs. baseline: -0.04 PLN)

## NULL

ggplot(scen_results, aes(x = reorder(Scenario, Delta_PLN), y = Delta_PLN)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "darkred") +
  labs(title = "Sensitivity Analysis – Scenario Impacts",
       x = "Scenario",
       y = "Change vs. Baseline (PLN)") +
  theme_minimal()

Model ETS

The exponential smoothing model ETS(A,N,N) applied to the monthly time series of KGHM stock prices represents a formulation with additive errors, no trend, and no seasonal component. This specification implies that the series is best characterized by a relatively stable level over time, with short-term fluctuations captured through additive noise. The smoothing parameter α is estimated at 0.9999, indicating that the model places nearly exclusive weight on the most recent observations, thereby adapting rapidly to changes in the underlying data. The initial level is set at 105.38, while the estimated standard deviation of the residuals (σ = 12.23) reflects moderate volatility in the series.

Model selection criteria, including AIC (1266.04), AICc (1266.24), and BIC (1274.60), suggest a parsimonious fit, although the presence of residual autocorrelation (ACF1 = 0.178) may point to remaining temporal dependencies not fully captured by the model. Forecast accuracy metrics such as RMSE (12.13), MAE (9.36), and MAPE (8.58%) indicate satisfactory predictive performance within the training sample.

The resulting forecast projects a continuation of the current level with widening confidence intervals over the five-month horizon, consistent with the absence of trend and seasonality in the model structure. This behavior underscores the model’s assumption of mean-reverting dynamics and highlights the increasing uncertainty associated with longer-term projections.

The five-month forecast projects a continuation of the current level of approximately 131.35 PLN, with uncertainty bands widening over time. By January 2026, the 95% confidence interval spans from roughly 77.8 PLN to 185.0 PLN, reflecting the inherent uncertainty of medium-horizon forecasts. Overall, the ETS model points to a mean-reverting behavior without strong directional bias, capturing stability in the central tendency but substantial volatility in the forecast distribution.

h <- 5

KGHM <- ts(df$kghm,
           start = c(year(min(df$Data)), month(min(df$Data))),
           frequency = 12)


model_ets   <- ets(KGHM)
forecast_ets <- forecast(model_ets, h = h)

future_dates <- seq(from = max(df$Data) %m+% months(1),
                    by   = "month",
                    length.out = h)

wyniki_ets <- data.frame(
  Data         = as.Date(future_dates),
  Prognoza_PLN = round(as.numeric(forecast_ets$mean), 2),
  Dolny_80_PLN = round(as.numeric(forecast_ets$lower[, 1]), 2),
  Gorny_80_PLN = round(as.numeric(forecast_ets$upper[, 1]), 2),
  Dolny_95_PLN = round(as.numeric(forecast_ets$lower[, 2]), 2),
  Gorny_95_PLN = round(as.numeric(forecast_ets$upper[,  2]), 2)
)


cat("Prognoza ETS obejmuje:",
    format(min(wyniki_ets$Data), "%Y-%m"), "→",
    format(max(wyniki_ets$Data), "%Y-%m"), "\n")

## Prognoza ETS obejmuje: 2025-09 → 2026-01

print(wyniki_ets, row.names = FALSE)

##        Data Prognoza_PLN Dolny_80_PLN Gorny_80_PLN Dolny_95_PLN Gorny_95_PLN
##  2025-09-01       131.35       115.68       147.02       107.38       155.32
##  2025-10-01       131.35       109.18       153.51        97.45       165.25
##  2025-11-01       131.35       104.20       158.50        89.83       172.87
##  2025-12-01       131.35       100.00       162.69        83.41       179.29
##  2026-01-01       131.35        96.31       166.39        77.75       184.95

plot(forecast_ets, main = "KGHM Forecast - ETS")

Monte Carlo

The Monte Carlo simulation applied to the linear regression model of KGHM stock prices provides a probabilistic forecast framework that accounts for parameter uncertainty. Drawing 1,000 samples from the multivariate normal distribution of the estimated coefficients, the simulation generates a distribution of possible future price paths over a five-month horizon. The input matrix for forecasting is constructed using the mean values of the explanatory variables, ensuring consistency with the baseline economic scenario.

The resulting forecast yields a central trajectory of KGHM prices, beginning at the last observed value and incrementally increasing to an expected level of 132.3 PLN by January 2026. The 95% confidence interval expands over time, reflecting the compounding uncertainty inherent in forward-looking projections. Specifically, the lower bound declines to 125.6 PLN while the upper bound rises to 138.62 PLN, indicating a plausible range of outcomes under the modeled assumptions.

The fitted component of the model is reconstructed from the cumulative sum of predicted changes, allowing for a seamless integration of historical dynamics with future expectations. The final visualization overlays the original time series, the fitted and forecasted trajectory, and the Monte Carlo-derived confidence bands, offering a comprehensive view of both model fit and forecast uncertainty.

set.seed(123)
Sigma <- vcov(model_lm2)
B <- mvrnorm(n = 1000, mu = coef(model_lm2), Sigma = Sigma)
h <- 5 

X_vars <- names(coef(model_lm2))[-1]
X0 <- colMeans(df[, X_vars], na.rm = TRUE)

X_new_matrix <- matrix(rep(as.numeric(X0), each = h), nrow = h)
X_new_matrix <- cbind(1, X_new_matrix)

sim_forecasts <- replicate(1000, {
  beta <- mvrnorm(1, coef(model_lm2), Sigma)
  as.numeric(X_new_matrix %*% beta)
})

forecast_matrix <- t(sim_forecasts)
mean_forecast <- colMeans(forecast_matrix)
lower_95 <- apply(forecast_matrix, 2, quantile, 0.025)
upper_95 <- apply(forecast_matrix, 2, quantile, 0.975)

last_price <- tail(df$kghm, 1)
forecast_price <- cumsum(c(last_price, mean_forecast))[-1]
lower_price <- cumsum(c(last_price, lower_95))[-1]
upper_price <- cumsum(c(last_price, upper_95))[-1]

# Adjusted changes
fitted_diff <- fitted(model_lm2)
# Level reconstruction with fitted
start_price <- df$kghm[1]
fitted_level <- cumsum(c(start_price, fitted_diff))[-1]
# Fitted + forecast
fitted_and_forecast <- ts(c(fitted_level, forecast_price),
                          start = start(KGHM),
                          frequency = frequency(KGHM))
# --- Chart ---
# Forecast time
time_start <- time(KGHM)[length(KGHM)]
forecast_time <- time_start + seq(1, h) / frequency(KGHM)

# Chart
plot(KGHM,
     xlim = c(start(KGHM)[1], forecast_time[h]),
     ylim = range(KGHM, lower_price, upper_price),
     main = "Monte Carlo Forecast with Fitted Model and Confidence Bands",
     ylab = "KGHM Price",
     xlab = "Time",
     col = "black",
     lwd = 1)

# Confidence Band
polygon(c(forecast_time, rev(forecast_time)),
        c(upper_price, rev(lower_price)),
        col = rgb(0, 0, 1, 0.2), border = NA)

# Line fitted + forecast
lines(fitted_and_forecast, col = "blue", lwd = 2)

# Actual data
lines(KGHM, col = "black")

# Legend
legend("topleft",
       legend = c("Original Data", "Fitted + Forecast", "95% CI (Monte Carlo)"),
       col = c("black", "blue", rgb(0, 0, 1, 0.2)),
       lwd = c(1, 2, NA),
       pch = c(NA, NA, 15),
       pt.cex = 2,
       bty = "n")

future_dates <- seq(
  from = max(df$Data) + lubridate::period(months = 1),
  by = "month",
  length.out = h
)


wyniki_mc <- data.frame(
  Data          = as.Date(future_dates),
  Prognoza_PLN  = round(as.numeric(forecast_price), 2),
  Dolna_95_PLN  = round(as.numeric(lower_price), 2),
  Gorna_95_PLN  = round(as.numeric(upper_price), 2)
)

cat("Prognoza obejmuje:", format(min(wyniki_mc$Data), "%Y-%m"),
    "→", format(max(wyniki_mc$Data), "%Y-%m"), "\n")

## Prognoza obejmuje: 2025-09 → 2026-01

print(wyniki_mc, row.names = FALSE)

##        Data Prognoza_PLN Dolna_95_PLN Gorna_95_PLN
##  2025-09-01       131.54       130.21       132.81
##  2025-10-01       131.73       129.07       134.26
##  2025-11-01       131.92       127.92       135.72
##  2025-12-01       132.11       126.78       137.18
##  2026-01-01       132.30       125.64       138.64

Summary

The analysis conducted provides a comprehensive assessment of the short-term determinants and forecasting potential of KGHM stock prices over the period 2015–2025. Preliminary stationarity tests, including the Augmented Dickey–Fuller and KPSS procedures, revealed that all variables in levels (KGHM, WIG20, copper, silver, and gold) were non-stationary, exhibiting stochastic trends inconsistent with the assumptions of standard econometric models. After first differencing, however, all series achieved stationarity, confirming that they are integrated of order one, I(1), and thus suitable for further modeling.

The ordinary least squares (OLS) regressions highlighted that the most important drivers of KGHM’s short-term stock returns are the domestic equity index WIG20 and the prices of copper and silver. These variables exhibited both statistical and economic significance, while gold prices were insignificant, despite KGHM’s role as a producer of this metal. This finding suggests that investors primarily value the company through its exposure to copper markets and overall equity market conditions, with silver serving as a secondary but relevant factor. The refined linear model incorporating WIG20, copper, and silver explained approximately 58% of the variation in returns, which is a relatively high explanatory power for financial return regressions. Forecasts derived from this model displayed short-term volatility but suggested a gradual upward tendency in price levels, reaching around 134–135 PLN over a five-month horizon.

In parallel, time-series forecasting methods were employed. Within the ARIMA framework, the ARIMA(1,1,2) specification proved superior, producing the lowest information criteria and error measures while showing no significant autocorrelation in residuals. This model’s forecasts indicated a stabilization of KGHM prices around 122–124 PLN, though accompanied by wide confidence intervals, reflecting the inherent uncertainty of financial markets. The ETS(A,N,N) model, based on exponential smoothing, suggested a stable price level around 131 PLN with gradually widening forecast intervals, consistent with the assumption of no underlying trend or seasonality. Furthermore, the Monte Carlo simulation applied to the linear regression model provided a probabilistic outlook by generating 1,000 simulated trajectories of future prices. The resulting distribution pointed to a central scenario of moderate appreciation, with prices reaching approximately 132 PLN by early 2026 and a 95% confidence interval spanning 126–139 PLN.

Diagnostic testing of the linear model confirmed its robustness. The Durbin–Watson test showed no evidence of serial correlation in residuals, while the Jarque–Bera and Shapiro–Wilk tests did not reject normality, suggesting that deviations from the Gaussian distribution were minor. The Breusch–Pagan test, however, indicated heteroskedasticity, meaning that the variance of residuals is not constant. This issue was mitigated by the use of heteroskedasticity-robust standard errors, ensuring the reliability of statistical inference. Graphical diagnostics supported these results, showing no major violations of model assumptions, aside from the presence of a few influential points with moderate leverage.

Overall, the findings demonstrate that KGHM’s short-term price dynamics are predominantly shaped by copper prices and domestic equity market performance, with silver playing a complementary role and gold remaining largely irrelevant. Forecasts from multiple approaches converge on a picture of relative stability, with prices fluctuating around 122–132 PLN, though subject to considerable uncertainty. The linear regression and Monte Carlo approaches point to a moderate upward trajectory, while ARIMA and ETS emphasize mean-reverting dynamics around stable levels. Taken together, the results underscore the importance of combining market and macro indicators with robust statistical techniques to generate meaningful short-term forecasts in the context of financial time-series volatility.

Short-Term Forecasting of KGHM Stock Prices: An Econometric and Simulation-Based Approach (2015–2025)

Piotr Łukowski

2025-08-25