This project was not only an academic exercise but also a reflection of my personal interests as a retail investor. I have been active in the markets for some time, and ORLEN has often caught my attention as one of the largest and most important companies in Poland. It plays a key role in the domestic economy, is exposed to international oil and gas markets, and at the same time strongly reflects the movements of the Polish stock exchange. Because of this, I started wondering whether ORLEN might be a good investment for me personally – and this became the starting point of the entire analysis.
The forecast horizon was set for September 2025, a time frame that is long enough to capture meaningful dynamics but short enough to be relevant from an investment perspective. The years 2022–2025 were chosen because they cover a period of high turbulence: geopolitical tensions, large swings in oil and gas prices, and rapid changes in global equity markets. For an investor, this is both a challenge and an opportunity – understanding how a stock like ORLEN reacts to such shocks can make the difference between a good and a bad decision.
That’s why I decided to use a wide range of statistical and econometric tools, not just the “standard” ones. Stationarity tests, regressions with lags, block significance analysis, conditional volatility models like ARIMA–GARCH, and stability checks such as Chow or Bai–Perron all serve a purpose: they help me see whether the observed relationships are real, reliable, and consistent over time. My assumption was simple – ORLEN should move very closely with the Polish equity index (WIG20), oil prices should matter but with a delay, and gas might have some effect, though probably weaker.
By systematically testing these ideas, I hoped to answer two questions: what truly drives ORLEN’s price, and can these relationships be used to forecast its value in the near future? The results, as shown later in the report, give not only statistical insight but also a practical hint for me – and maybe for other investors – when thinking about whether to buy ORLEN in 2025.
Orlenn <- readxl::read_excel("C:/Users/tomas/Desktop/Orlenfolder/Orlenn.xlsx")
WiG20_ <- readxl::read_excel("C:/Users/tomas/Desktop/Orlenfolder/WiG20..xlsx")
Gaz <- readxl::read_excel("C:/Users/tomas/Desktop/Orlenfolder/Gaz.xlsx")
Ropa <- readxl::read_excel("C:/Users/tomas/Desktop/Orlenfolder/Ropa.xlsx")
list(
ORLEN = head(Orlenn),
WIG20 = head(WiG20_),
GAS = head(Gaz),
OIL = head(Ropa)
)
## $ORLEN
## # A tibble: 6 × 2
## Data Cena
## <dttm> <dbl>
## 1 2025-08-01 00:00:00 22.8
## 2 2025-07-01 00:00:00 22.9
## 3 2025-06-01 00:00:00 22.4
## 4 2025-05-01 00:00:00 20.1
## 5 2025-04-01 00:00:00 18.6
## 6 2025-03-01 00:00:00 18.6
##
## $WIG20
## # A tibble: 6 × 2
## Data Cena
## <dttm> <dbl>
## 1 2025-08-01 00:00:00 792.
## 2 2025-07-01 00:00:00 806.
## 3 2025-06-01 00:00:00 777.
## 4 2025-05-01 00:00:00 757.
## 5 2025-04-01 00:00:00 748.
## 6 2025-03-01 00:00:00 735.
##
## $GAS
## # A tibble: 6 × 2
## Data Cena
## <dttm> <dbl>
## 1 2025-08-01 00:00:00 2.84
## 2 2025-07-01 00:00:00 3.11
## 3 2025-06-01 00:00:00 3.46
## 4 2025-05-01 00:00:00 3.45
## 5 2025-04-01 00:00:00 3.33
## 6 2025-03-01 00:00:00 4.12
##
## $OIL
## # A tibble: 6 × 2
## Data Cena
## <dttm> <dbl>
## 1 2025-08-01 00:00:00 66.9
## 2 2025-07-01 00:00:00 72.5
## 3 2025-06-01 00:00:00 67.6
## 4 2025-05-01 00:00:00 63.9
## 5 2025-04-01 00:00:00 63.1
## 6 2025-03-01 00:00:00 74.7
This section manages packages and libraries needed for the analysis. It starts by creating a list of all required packages (for data manipulation, visualization, time handling, and statistical modeling). Next, it checks which of them are not installed on the current machine and installs the missing ones with dependencies. Finally, it loads every package into the R session.
need <- c("tidyverse","lubridate","zoo","scales","ggplot2","stringr",
"tseries","urca","lmtest","sandwich","car","broom",
"forecast","ggcorrplot","ggrepel","readr","purrr")
to_install <- setdiff(need, rownames(installed.packages()))
if(length(to_install)) install.packages(to_install, dependencies=TRUE)
invisible(lapply(need, library, character.only=TRUE))
## Warning: package 'tidyverse' was built under R version 4.4.3
## Warning: package 'ggplot2' was built under R version 4.4.3
## Warning: package 'readr' was built under R version 4.4.3
## Warning: package 'purrr' was built under R version 4.4.3
## Warning: package 'dplyr' was built under R version 4.4.3
## Warning: package 'lubridate' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'zoo'
##
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Warning: package 'scales' was built under R version 4.4.3
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
## Warning: package 'tseries' was built under R version 4.4.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Warning: package 'lmtest' was built under R version 4.4.3
## Warning: package 'car' was built under R version 4.4.3
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some
## Warning: package 'forecast' was built under R version 4.4.3
Sys.setenv(LANG='en_US.UTF-8')
try(suppressWarnings(Sys.setlocale(category='LC_ALL', locale='C')), silent=TRUE)
## [1] "C"
It creates output directories (outputs and outputs/plots) if they do not already exist, ensuring that all results and plots will be saved in an organized way. The variable target_month specifies the forecast horizon, here set to September 2025. The usd_lab function sets up a formatting style for dollar values -rounded to 1 unit. Threshold values: alpha_wig20, alpha_ropa, alpha_gaz are defined for statistical significance tests, meaning they will be used later to decide if the effects of WIG20, oil, or gas are statistically significant.
dirs <- c("outputs","outputs/plots")
invisible(lapply(dirs, function(d) if(!dir.exists(d)) dir.create(d, recursive=TRUE)))
target_month <- "2025-09"
usd_lab <- scales::label_dollar(accuracy=1)
alpha_wig20 <- 0.05; alpha_ropa <- 0.10; alpha_gaz <- 0.05 # significance thresholds
This part of the code introduces several helper functions that will be reused throughout the analysis. They cover common tasks such as cleaning raw data, calculating returns, formatting output, and saving plots. For example, parse_num ensures that numeric values are properly converted even if they use different decimal separators or contain spaces, while to_date makes sure that date values are correctly recognized regardless of whether they are stored as numbers, strings, or already in date format. The function mk_ret computes logarithmic returns in percentage points, which is a standard measure in financial time series. To streamline reporting, show_and_save and safe_png allow plots to be displayed and saved directly to files in a safe way, avoiding errors if something goes wrong. The header function improves console readability by printing clear section dividers. Finally, clean_prices standardizes input price datasets by detecting the appropriate date and price columns, converting them to monthly values, and returning a tidy dataset ready for analysis.
parse_num <- function(x){
s <- gsub("\\s","",as.character(x))
both <- grepl(",",s)&grepl("\\.",s); s[both] <- gsub("\\.","",s[both])
s <- gsub(",",".",s)
suppressWarnings(as.numeric(s))
}
to_date <- function(x){
if(inherits(x,"Date")) return(x)
if(is.numeric(x)) return(as.Date(x, origin="1899-12-30"))
suppressWarnings(as.Date(as.character(x)))
}
mk_ret <- function(x) c(NA, diff(log(x))*100) # log-zwroty (p.p.)
show_and_save <- function(p, path, w=1100, h=700){print(p); png(path,width=w,height=h); print(p); dev.off(); message("Saved: ", path)}
safe_png <- function(path, w=1100, h=700, expr){png(path,width=w,height=h); on.exit(dev.off(), add=TRUE); force(expr)}
header <- function(txt){cat("\n", paste(rep("=", nchar(txt)+4), collapse=""), "\n ", txt, "\n", paste(rep("=", nchar(txt)+4), collapse=""), "\n", sep="")}
clean_prices <- function(df){
nms <- names(df)
dcol <- nms[stringr::str_which(tolower(nms),"date|data|ym|month")][1]
pcol <- nms[stringr::str_which(tolower(nms),"cena|price|close|zamkn")][1]
if(is.na(dcol)||is.na(pcol)) stop("Brak kolumny daty/ceny.")
tibble(Date=to_date(df[[dcol]]), price_raw=parse_num(df[[pcol]])) |>
drop_na() |> mutate(ym=zoo::as.yearmon(Date)) |>
group_by(ym) |> summarize(price=last(na.omit(price_raw)), .groups="drop") |>
arrange(ym)
}
This section defines robust versions of the Breusch–Pagan and White tests, which are used later to check for heteroskedasticity in the regression models of ORLEN returns against factors like WIG20, oil, and gas. The Breusch–Pagan test verifies whether the variance of residuals depends on the explanatory variables, while the White test is a more general version that also accounts for non-linearities and interactions. In this context, both tests help assess whether model errors are homoskedastic; if they are not, standard OLS inference may be unreliable. This ensures that the later forecasts are based on models whose statistical assumptions have been properly validated.
bp_test_safe <- function(model){
mf <- model.frame(model); X <- model.matrix(model, data=mf); u2 <- residuals(model)^2
Z <- X[, setdiff(colnames(X),"(Intercept)"), drop=FALSE]; Z <- cbind(Intercept=1, Z)
fit <- lm.fit(x=Z, y=u2)
rss <- sum(fit$residuals^2); tss <- sum( (u2 - mean(u2))^2 )
R2 <- 1 - rss/tss; df <- fit$rank - 1; LM <- length(u2) * R2
structure(list(statistic=c(LM=LM), parameter=c(df=df),
p.value=pchisq(LM, df=df, lower.tail=FALSE),
method="Breusch–Pagan LM (aux: u^2 ~ X)",
data.name=deparse(formula(model))), class="htest")
}
white_test_safe <- function(model, interactions=TRUE){
mf <- model.frame(model); X <- model.matrix(model, data=mf); u2 <- residuals(model)^2
X1 <- X[, setdiff(colnames(X),"(Intercept)"), drop=FALSE]
if(ncol(X1)==0) stop("No regressors.")
Z <- X1
Z <- cbind(Z, X1^2)
if(interactions && ncol(X1)>=2){
for(i in 1:(ncol(X1)-1)) for(j in (i+1):ncol(X1)) Z <- cbind(Z, X1[,i]*X1[,j])
}
keep <- apply(Z, 2, function(v) all(is.finite(v)) && sd(v)!=0); Z <- Z[, keep, drop=FALSE]
Z <- cbind(Intercept=1, Z)
fit <- lm.fit(x=Z, y=u2)
rss <- sum(fit$residuals^2); tss <- sum( (u2 - mean(u2))^2 )
R2 <- 1 - rss/tss; df <- fit$rank - 1; LM <- length(u2) * R2
structure(list(statistic=c(LM=LM), parameter=c(df=df),
p.value=pchisq(LM, df=df, lower.tail=FALSE),
method=paste0("White LM (", if(interactions) "z interakcjami" else "bez interakcji", ")"),
data.name=deparse(formula(model))), class="htest")
}
This section defines functions for testing the stationarity of time series using the Augmented Dickey–Fuller (ADF) and Phillips–Perron (PP) tests. The adf_grid function runs the ADF test under different model specifications and lag lengths, choosing the best option, while pp_info runs the Phillips–Perron test as a complementary check. In this analysis, these tests are applied to ORLEN, WIG20, oil, and gas returns to determine whether the series are stationary. This step is essential because many econometric models, including those later used for forecasting, require stationarity for valid inference.
adf_grid <- function(x, types=c("none","drift"), max_lag=NULL){
xx <- na.omit(x); n <- length(xx); if(n < 8) return(list(reject=NA,tstat=NA,cv5=NA,method="too_short"))
if(is.null(max_lag)) max_lag <- min(10, max(0, floor(n^0.25)))
best <- NULL
for(tp in types){
key <- switch(tp, none="tau1", drift="tau2", trend="tau3")
for(lg in 0:max_lag){
u <- try(suppressWarnings(urca::ur.df(xx, type=tp, lags=lg)), silent=TRUE)
if(inherits(u,"try-error")) next
t <- suppressWarnings(as.numeric(u@teststat[key])); c5 <- suppressWarnings(as.numeric(u@cval[key,"5pct"]))
rej <- is.finite(t)&&is.finite(c5)&&(t<c5)
cand <- list(reject=rej, tstat=t, cv5=c5, method=sprintf("ur.df(%s,lags=%s)",tp,lg))
if(is.null(best) || (rej && (!best$reject || t < best$tstat))) best <- cand
}
}
if(is.null(best)) return(list(reject=NA,tstat=NA,cv5=NA,method="fail"))
best
}
pp_info <- function(x){
out <- try(suppressWarnings(tseries::pp.test(na.omit(x))), silent=TRUE)
if(inherits(out,"try-error")) return(list(p=NA, reject=NA))
list(p=out$p.value, reject=out$p.value<0.05)
}
In this step the raw datasets for ORLEN, WIG20, GAS and OIL get cleaned and put into the same shape. The function clean_prices() picks out the right date and price columns, fixes number formats, and converts everything to monthly frequency so the series line up properly. First, the code builds a combined dataset of price levels and makes a line chart called “Monthly Levels (USD)”. This lets us see how each series has moved over time and compare their overall scales — for example, WIG20 values are much higher than the others. That chart is saved as levels_ALL_USD.png. Then the script switches to returns, which are calculated with mk_ret() as month-over-month log changes. These are stored in a tidy table and also exported to CSV. A second plot, “Logarithmic Returns (m/m, p.p.)”, shows how volatile each series is, with GAS in particular showing strong swings. That chart is saved as returns_ALL.png.
stopifnot(all(c("Gaz","Orlenn","WiG20_","Ropa") %in% ls()))
ORLEN <- clean_prices(Orlenn); WIG20 <- clean_prices(WiG20_); GAZ <- clean_prices(Gaz); ROPA <- clean_prices(Ropa)
## Levels – plot
lev_long <- list(
ORLEN = ORLEN |> transmute(ym, Seria="ORLEN", Poziom=price),
WIG20 = WIG20 |> transmute(ym, Seria="WIG20", Poziom=price),
GAZ = GAZ |> transmute(ym, Seria="GAZ", Poziom=price),
ROPA = ROPA |> transmute(ym, Seria="ROPA", Poziom=price)
) |> bind_rows()
show_and_save(
ggplot(lev_long, aes(as.Date(ym), Poziom, colour=Seria))+
geom_line(linewidth=.9)+
labs(title="Monthly Levels (USD)", x=NULL, y="USD")+
scale_y_continuous(labels=usd_lab)+theme_minimal(),
"outputs/plots/levels_ALL_USD.png", 1200, 720
)
## Saved: outputs/plots/levels_ALL_USD.png
## consolidated levels and returns
levels_all <- list(
ORLEN = select(ORLEN, ym, ORLEN=price),
WIG20 = select(WIG20, ym, WIG20=price),
GAZ = select(GAZ, ym, GAZ=price),
ROPA = select(ROPA, ym, ROPA=price)
) |> reduce(~ full_join(.x,.y,by="ym")) |> arrange(ym)
df <- levels_all |>
mutate(rORLEN=mk_ret(ORLEN), rWIG20=mk_ret(WIG20), rGAZ=mk_ret(GAZ), rROPA=mk_ret(ROPA)) |>
drop_na()
write_csv(df, "outputs/_returns_ORLEN_WIG20_ROPA_GAZ_USD.csv")
## Returns – plot
ret_long <- df |> select(ym, rORLEN, rWIG20, rGAZ, rROPA) |>
pivot_longer(-ym, names_to="Seria", values_to="Zwrot")
show_and_save(
ggplot(ret_long, aes(as.Date(ym), Zwrot, colour=Seria))+
geom_line(linewidth=.8)+
labs(title="Logarithmic Returns (m/m, p.p.)", x=NULL, y="p.p.")+
theme_minimal(),
"outputs/plots/returns_ALL.png", 1200, 720
)
## Saved: outputs/plots/returns_ALL.png
Stationarity table and heatmap The results show that ORLEN, WIG20, and oil returns behave well for modeling (they are stationary), while gas does not. The heatmap makes this instantly visible: three green blocks and one brown for gas. We know we can safely use WIG20 and oil in models with ORLEN, while gas is unreliable in its raw form and probably should be left out. -Scatterplots The ORLEN vs WIG20 scatter is very tight – points align around an upward line, proving ORLEN strongly follows the index. With oil, the relationship exists but is weaker, points are more spread out. With gas, the scatter is basically random. WIG20 is clearly the main driver of ORLEN. Oil adds some influence, but gas is not useful for prediction here.
Correlation heatmap The numbers back this up: ORLEN–WIG20 correlation is 0.74 (very strong), with oil it drops to 0.25, with gas 0.22. It quantifies the story – WIG20 is essential, oil is optional, gas is negligible. This stage filters the variables and highlights what really matters. Instead of guessing, we now see that ORLEN’s price moves mostly with WIG20 and somewhat with oil, while gas contributes almost nothing. That clarity sets the direction for the forecasting model and increases trust in the later prediction.
rows_stat <- list(
list(nm="rWIG20", x=df$rWIG20),
list(nm="rROPA", x=df$rROPA),
list(nm="rORLEN", x=df$rORLEN),
list(nm="rGAZ", x=levels_all$GAZ[match(df$ym, levels_all$ym)]) # poziom GAZ
)
stationarity_tbl <- purrr::map_dfr(rows_stat, function(o){
A <- adf_grid(o$x); PP <- pp_info(o$x)
tibble(Variable=o$nm, ADF_stat=A$tstat, ADF_cv5=A$cv5, ADF_reject=A$reject,
PP_p=PP$p, PP_reject=PP$reject,
ADF_PP_stationary = ifelse(is.na(A$reject)&is.na(PP$reject), NA, (isTRUE(A$reject)|isTRUE(PP$reject))))
})
header("STATIONARITY (ADF + PP) – results"); print(stationarity_tbl, n=Inf)
##
## ============================================
## STATIONARITY (ADF + PP) <U+2013> results
## ============================================
## # A tibble: 4 x 7
## Variable ADF_stat ADF_cv5 ADF_reject PP_p PP_reject ADF_PP_stationary
## <chr> <dbl> <dbl> <lgl> <dbl> <lgl> <lgl>
## 1 rWIG20 NA -1.95 FALSE 0.01 TRUE TRUE
## 2 rROPA NA -1.95 FALSE 0.01 TRUE TRUE
## 3 rORLEN NA -1.95 FALSE 0.01 TRUE TRUE
## 4 rGAZ NA -1.95 FALSE 0.526 FALSE FALSE
write_csv(stationarity_tbl, "outputs/stationarity_ADF_PP_USD.csv")
p_stat <- stationarity_tbl |>
mutate(stat = if_else(ADF_PP_stationary,"YES","NO",missing="NO"),
Variable=factor(Variable, levels=c("rWIG20","rROPA","rORLEN","rGAZ")))
show_and_save(
ggplot(p_stat, aes("ADF+PP", Variable, fill=stat))+
geom_tile(color="white")+geom_text(aes(label=stat))+
scale_fill_manual(values=c("YES"="#7AC943","NO"="#C86B00"))+
labs(title="Stationarity (ADF OR PP)", x=NULL, y=NULL)+theme_minimal(),
"outputs/plots/stationarity_heatmap.png", 840, 560
)
## Saved: outputs/plots/stationarity_heatmap.png
# ACF/PACF
acf_pacf <- function(vec, name){
safe_png(paste0("outputs/plots/acf_pacf_",name,".png"), 1100, 450, {
par(mfrow=c(1,2)); acf(vec, main=paste("ACF", name)); pacf(vec, main=paste("PACF", name)); par(mfrow=c(1,1))
})
}
acf_pacf(df$rORLEN,"rORLEN"); acf_pacf(df$rWIG20,"rWIG20"); acf_pacf(df$rROPA,"rROPA"); acf_pacf(df$rGAZ,"rGAZ")
# Scatter + smooth
for(v in c("rWIG20","rROPA","rGAZ")){
p <- ggplot(df, aes_string(v, "rORLEN"))+geom_point(alpha=.8)+geom_smooth(method="lm", se=FALSE)+
labs(title=paste("rORLEN vs", v), x=v, y="rORLEN (p.p.)")+theme_minimal()
show_and_save(p, paste0("outputs/plots/scatter_",v,"_vs_rORLEN.png"), 900, 650)
}
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## i Please use tidy evaluation idioms with `aes()`.
## i See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## Saved: outputs/plots/scatter_rWIG20_vs_rORLEN.png
## `geom_smooth()` using formula = 'y ~ x'`geom_smooth()` using formula = 'y ~ x'
## Saved: outputs/plots/scatter_rROPA_vs_rORLEN.png
## `geom_smooth()` using formula = 'y ~ x'`geom_smooth()` using formula = 'y ~ x'
## Saved: outputs/plots/scatter_rGAZ_vs_rORLEN.png
# Correlations
Rmat <- df |> select(rORLEN, rWIG20, rROPA, rGAZ) |> cor()
show_and_save(ggcorrplot::ggcorrplot(Rmat, lab=TRUE, type="lower", title="Return correlations (%)"),
"outputs/plots/corr_returns.png", 820, 560)
## Saved: outputs/plots/corr_returns.png
The ARIMA-GARCH(1,1) analysis, complemented by the ARCH-LM test, was introduced to explicitly account for conditional heteroskedasticity, a defining feature of financial time series often missed by standard regression models. While OLS and dynamic specifications captured mean relationships between ORLEN returns, WIG20, and oil prices, they assumed constant error variance, which is unrealistic in financial markets characterized by volatility clustering. The ARCH-LM test applied to ORLEN returns did not initially indicate strong ARCH effects, yet the GARCH(1,1) model confirmed significant persistence in volatility, with ARCH (α ≈ 0.20) and GARCH (β ≈ 0.85) parameters both highly significant and summing close to one, implying long memory in volatility shocks.
By incorporating WIG20 and oil returns into the mean equation, the model aligned volatility dynamics with broader market forces, producing a path where conditional volatility gradually declined from above 5.3 (p.p.) in 2022 toward 4.86 (p.p.) by 2025, indicating stabilization in ORLEN’s risk environment. At the same time, fitted ARMA mean dynamics revealed short-term oscillations but within progressively narrower volatility bounds. Diagnostic tests supported the robustness of the model, with Ljung–Box and post-GARCH ARCH-LM tests confirming no residual autocorrelation or heteroskedasticity, and stability checks ruling out structural misspecifications. This stage therefore added a crucial risk dimension to the project: beyond point forecasts of price levels, it quantified the evolving uncertainty surrounding them, showing that ORLEN’s volatility is both persistent and market-driven, yet projected to stabilize over the medium term, strengthening the credibility and practical value of the overall forecasting framework.
# Requirements
need_garch <- c("rugarch","FinTS")
to_install <- setdiff(need_garch, rownames(installed.packages()))
if(length(to_install)) install.packages(to_install, dependencies=TRUE)
invisible(lapply(need_garch, library, character.only=TRUE))
## Warning: package 'rugarch' was built under R version 4.4.3
## Loading required package: parallel
##
## Attaching package: 'rugarch'
## The following object is masked from 'package:purrr':
##
## reduce
## Warning: package 'FinTS' was built under R version 4.4.3
##
## Attaching package: 'FinTS'
## The following object is masked from 'package:forecast':
##
## Acf
header("CONDITIONAL VOLATILITY – ARIMA-GARCH(1,1) + ARCH-LM")
##
## ==============================================================
## CONDITIONAL VOLATILITY <U+2013> ARIMA-GARCH(1,1) + ARCH-LM
## ==============================================================
## ARCH-LM test on ORLEN returns
rt <- na.omit(df$rORLEN)
lags_arch <- max(4L, floor(log(length(rt))*4L))
header("ARCH-LM TEST (Engle)")
##
## ========================
## ARCH-LM TEST (Engle)
## ========================
arch_lm <- FinTS::ArchTest(rt, lags = lags_arch)
print(arch_lm)
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: rt
## Chi-squared = 16.529, df = 15, p-value = 0.3478
## Mean model selection for GARCH
suppressWarnings({
arma_sel <- forecast::auto.arima(rt, seasonal = FALSE, stepwise = FALSE, approximation = FALSE)
})
p <- forecast::arimaorder(arma_sel)[1] # AR order
q <- forecast::arimaorder(arma_sel)[3] # MA order
cat(sprintf("\nSelected mean model for GARCH: ARMA(%d,%d)\n", p, q))
##
## Selected mean model for GARCH: ARMA(0,0)
##GARCH(1,1)
use_exog <- TRUE
xreg <- NULL
if(use_exog){
tmp <- df |> dplyr::select(rORLEN, rWIG20, rROPA) |> tidyr::drop_na()
rt2 <- tmp$rORLEN
xreg <- as.matrix(tmp[, c("rWIG20","rROPA")])
# Re-fit ARMA order on aligned series (optional, keeps orders consistent)
suppressWarnings({
arma_sel <- forecast::auto.arima(rt2, xreg = xreg, seasonal = FALSE, stepwise = FALSE, approximation = FALSE)
})
p <- forecast::arimaorder(arma_sel)[1]
q <- forecast::arimaorder(arma_sel)[3]
cat(sprintf("Aligned sample size: %d | ARMA(%d,%d) with exogenous regressors\n", length(rt2), p, q))
}
## Aligned sample size: 43 | ARMA(3,0) with exogenous regressors
## GARCH -> ARMA(p,q) in the mean, sGARCH(1,1) in the variance
spec <- rugarch::ugarchspec(
variance.model = list(model = "sGARCH", garchOrder = c(1,1)),
mean.model = list(armaOrder = c(p, q), include.mean = TRUE, external.regressors = xreg),
distribution.model = "norm" # you may try "std" or "ged" if tails look heavy
)
if(isTRUE(use_exog) && !is.null(xreg)){
fit <- rugarch::ugarchfit(spec, data = rt2, solver = "hybrid")
} else {
fit <- rugarch::ugarchfit(spec, data = rt, solver = "hybrid")
}
## Warning in .sgarchfit(spec = spec, data = data, out.sample = out.sample, :
## ugarchfit-->waring: using less than 100 data
## points for estimation
header("GARCH(1,1) – Model summary")
##
## =====================================
## GARCH(1,1) <U+2013> Model summary
## =====================================
show(fit)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(3,0,0)
## Distribution : norm
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu -0.006857 1.10615 -0.006199 0.995054
## ar1 -0.221413 0.15121 -1.464234 0.143130
## ar2 0.098248 0.15832 0.620556 0.534892
## ar3 0.365378 0.14987 2.437945 0.014771
## mxreg1 1.040049 0.11359 9.155917 0.000000
## mxreg2 0.217586 0.11846 1.836747 0.066247
## omega 3.543582 2.83289 1.250872 0.210981
## alpha1 0.000000 0.03799 0.000000 1.000000
## beta1 0.850071 0.09851 8.629264 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu -0.006857 1.569625 -0.004369 0.996514
## ar1 -0.221413 0.184533 -1.199858 0.230194
## ar2 0.098248 0.093606 1.049593 0.293905
## ar3 0.365378 0.141884 2.575188 0.010019
## mxreg1 1.040049 0.149927 6.937018 0.000000
## mxreg2 0.217586 0.110042 1.977296 0.048008
## omega 3.543582 4.456649 0.795123 0.426542
## alpha1 0.000000 0.021010 0.000000 1.000000
## beta1 0.850071 0.204354 4.159792 0.000032
##
## LogLikelihood : -132.4715
##
## Information Criteria
## ------------------------------------
##
## Akaike 6.5801
## Bayes 6.9487
## Shibata 6.5111
## Hannan-Quinn 6.7160
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.0231 0.8792
## Lag[2*(p+q)+(p+q)-1][8] 3.6798 0.9169
## Lag[4*(p+q)+(p+q)-1][14] 7.7165 0.4225
## d.o.f=3
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.08228 0.7742
## Lag[2*(p+q)+(p+q)-1][5] 0.66668 0.9292
## Lag[4*(p+q)+(p+q)-1][9] 1.66775 0.9407
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.01758 0.500 2.000 0.8945
## ARCH Lag[5] 0.34260 1.440 1.667 0.9287
## ARCH Lag[7] 1.18103 2.315 1.543 0.8824
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 2.3395
## Individual Statistics:
## mu 0.15213
## ar1 0.66499
## ar2 0.18789
## ar3 0.05828
## mxreg1 0.27096
## mxreg2 0.06501
## omega 0.07564
## alpha1 0.24648
## beta1 0.07387
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 2.1 2.32 2.82
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 1.0408 0.3046
## Negative Sign Bias 0.1035 0.9181
## Positive Sign Bias 0.9882 0.3293
## Joint Effect 1.6458 0.6491
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 17.93 0.5271
## 2 30 23.28 0.7636
## 3 40 47.23 0.1716
## 4 50 41.88 0.7545
##
##
## Elapsed time : 0.09179091
sigma_t <- as.numeric(rugarch::sigma(fit))
mu_t <- as.numeric(rugarch::fitted(fit))
resid_t <- as.numeric(rugarch::residuals(fit, standardize = TRUE)) # standardized
if(isTRUE(use_exog) && !is.null(xreg)){
# Use rows that survived drop_na above
aligned_idx <- df$ym[stats::complete.cases(df[, c("rORLEN","rWIG20","rROPA")])]
} else {
aligned_idx <- df$ym[!is.na(df$rORLEN)]
}
vol_tbl <- tibble::tibble(
ym = aligned_idx,
mu = mu_t,
sigma = sigma_t,
z = resid_t
)
readr::write_csv(vol_tbl, "outputs/garch_orlen_volatility.csv")
# ---------- E) Plots ----------
## Volatility path
show_and_save(
ggplot(vol_tbl, aes(as.Date(ym), sigma)) +
geom_line(linewidth = 0.9, colour = "steelblue") +
labs(title = "ORLEN – Conditional volatility from GARCH(1,1)",
x = NULL, y = "σ_t") +
theme_minimal(),
"outputs/plots/garch_volatility_path.png", 1200, 720
)
## Saved: outputs/plots/garch_volatility_path.png
## Actual vs fitted mean (ARMA part)
show_and_save(
ggplot(vol_tbl, aes(as.Date(ym))) +
geom_line(aes(y = mu), colour = "steelblue", linewidth = 0.9) +
labs(title = "ORLEN – Fitted mean (ARMA) from ARIMA-GARCH",
x = NULL, y = "Fitted mean (p.p.)") +
theme_minimal(),
"outputs/plots/garch_fitted_mean.png", 1100, 650
)
## Saved: outputs/plots/garch_fitted_mean.png
## ACF + QQ
safe_png("outputs/plots/garch_stdresid_acf.png", 1000, 500, {
par(mfrow = c(1,2))
acf(vol_tbl$z, main = "ACF of standardized residuals")
acf(vol_tbl$z^2, main = "ACF of squared standardized residuals")
par(mfrow = c(1,1))
})
safe_png("outputs/plots/garch_stdresid_qq.png", 800, 600, {
qqnorm(vol_tbl$z, main = "QQ-plot: standardized residuals")
qqline(vol_tbl$z, col = "steelblue")
})
## Post-fit ARCH-LM on standardized residuals
header("ARCH-LM on standardized residuals (post-GARCH)")
##
## ==================================================
## ARCH-LM on standardized residuals (post-GARCH)
## ==================================================
post_arch <- FinTS::ArchTest(vol_tbl$z, lags = lags_arch)
print(post_arch)
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: vol_tbl$z
## Chi-squared = 12.259, df = 15, p-value = 0.6593
# ---------- F) One-step-ahead volatility nowcast (last point) ----------
last_sigma <- tail(vol_tbl$sigma, 1)
cat(sprintf("\nLatest conditional volatility (σ_t) from GARCH: %.4f (p.p.)\n", last_sigma))
##
## Latest conditional volatility (<U+03C3>_t) from GARCH: 4.8623 (p.p.)
Candidate Models- TOP 20 by BIC. The table lists various combinations of lags for WIG20 (pW) and oil (pR), along with the number of observations, the Bayesian Information Criterion -BIC, and significance values. Lower BIC indicates a better trade-off between explanatory power and model simplicity.
Final Model Selection The script identifies the optimal model that meets the significance thresholds. The chosen specification includes:
WIG20 with a lag of 2 periods,
Oil with a lag of 6 periods. For this model, the significance levels are strong:
pW_min ~ 4.33e-10 -WIG20 highly significant
pR_min ~ 0.026 - oil significant at the 5% level
These results confirm that ORLEN’s stock price is not immediately affected by movements in WIG20 and oil markets; instead, the influence materializes with a time lag. WIG20 shows a relatively quick effect (2 periods), while oil exerts a more delayed but persistent impact (6 periods). For the project, this step is critical because it transforms earlier exploratory findings into a robust econometric model. It not only demonstrates that WIG20 and oil are key drivers but also quantifies when their impact is observable. This provides a reliable foundation for forecasting ORLEN’s future values. This stage establishes a concrete model specification: ORLEN returns = f(WIG20 (lag 2), Oil (lag 6)). It directly feeds into the forecasting process and ensures that the prediction for September 2025 is grounded in statistically validated dynamics.
Lw_max <- 3; Lr_max <- 6
dfL <- df
for(v in c("rWIG20","rROPA","rORLEN")){
for(L in 1:max(Lw_max, Lr_max)) dfL[[paste0(v,"_L",L)]] <- dplyr::lag(dfL[[v]], L)
}
robust_block_p <- function(mod, terms){
if(length(terms)==0) return(NA_real_)
vcv <- list(HC3 = sandwich::vcovHC(mod, type="HC3"),
NW = sandwich::NeweyWest(mod, prewhite=TRUE),
HAC = sandwich::vcovHAC(mod, prewhite=TRUE))
pp <- sapply(vcv, function(V){
out <- try(car::linearHypothesis(mod, paste0(terms," = 0"), vcov.=V), silent=TRUE)
if(inherits(out,"try-error")) NA_real_ else as.numeric(out[2,"Pr(>F)"])
})
suppressWarnings(min(pp, na.rm=TRUE))
}
cand <- list()
for(pW in 0:Lw_max){
for(pR in 0:Lr_max){
termsW <- c("rWIG20", if(pW>=1) paste0("rWIG20_L",1:pW) else NULL)
termsR <- c("rROPA", if(pR>=1) paste0("rROPA_L",1:pR) else NULL)
form <- as.formula(paste("rORLEN ~ rGAZ +", paste(c("rORLEN_L1", termsW, termsR), collapse=" + ")))
need_cols <- unique(c("rORLEN","rGAZ","rORLEN_L1",termsW,termsR))
dat <- dfL |> select(all_of(need_cols)) |> drop_na()
if(nrow(dat) < 25) next
fit <- try(lm(form, data=dat), silent=TRUE); if(inherits(fit,"try-error")) next
b <- BIC(fit)
pWmin <- robust_block_p(fit, termsW)
pRmin <- robust_block_p(fit, termsR)
cand[[length(cand)+1]] <- tibble(pW=pW, pR=pR, BIC=b, pW_min=pWmin, pR_min=pRmin,
n=nrow(dat), model=list(fit), data=list(dat),
termsW=list(termsW), termsR=list(termsR))
}
}
cand_df <- bind_rows(cand) |> arrange(BIC)
header("CANDIDATE OVERVIEW (TOP 20 by BIC)")
##
## ======================================
## CANDIDATE OVERVIEW (TOP 20 by BIC)
## ======================================
print(cand_df |> select(pW, pR, n, BIC, pW_min, pR_min) |> head(20), n=20)
## # A tibble: 20 x 6
## pW pR n BIC pW_min pR_min
## <int> <int> <int> <dbl> <dbl> <dbl>
## 1 2 6 37 264. 4.33e-10 0.0260
## 2 0 6 37 267. 1.47e- 9 0.0128
## 3 1 6 37 267. 3.67e-10 0.155
## 4 3 6 37 267. 2.20e-10 0.260
## 5 2 5 38 270. 1.92e- 8 0.0626
## 6 0 5 38 271. 1.54e- 9 0.207
## 7 1 5 38 273. 2.29e- 8 0.286
## 8 0 4 39 273. 6.70e-10 0.0273
## 9 3 5 38 274. 1.25e- 8 0.119
## 10 3 0 40 274. 1.58e-11 0.433
## 11 2 4 39 275. 5.96e-11 0.00279
## 12 1 4 39 275. 3.91e-11 0.00216
## 13 3 1 40 276. 4.77e-11 0.352
## 14 0 3 40 277. 2.20e-12 0.142
## 15 1 3 40 278. 4.92e-13 0.00961
## 16 3 4 39 278. 3.03e-11 0.0549
## 17 2 3 40 280. 7.16e-11 0.0504
## 18 3 2 40 280. 1.46e-10 0.473
## 19 2 0 41 281. 9.09e-10 0.0795
## 20 2 1 41 283. 1.83e-12 0.126
ok <- cand_df |> filter(pW_min < alpha_wig20, pR_min < alpha_ropa)
if(nrow(ok) > 0){
best <- ok |> arrange(BIC) |> slice(1)
cat("\nSelected model meeting significance WIG20 & ROPA.\n")
} else {
best <- cand_df |> arrange(pR_min, BIC) |> slice(1)
cat("\nWarning: no model with both WIG20 and OIL jointly significant at the specified thresholds.\nSelected the best compromise (minimum p-value for OIL).\n")
}
##
## Selected model meeting significance WIG20 & ROPA.
m_dyn <- best$model[[1]]; dat_dyn <- best$data[[1]]
termsW_best <- best$termsW[[1]]; termsR_best <- best$termsR[[1]]
cat("\nLags of the selected model – WIG20:", best$pW, " ROPA:", best$pR, "\n")
##
## Lags of the selected model <U+2013> WIG20: 2 ROPA: 6
cat("BIC:", best$BIC, " pW_min:", signif(best$pW_min,4), " pR_min:", signif(best$pR_min,4), " n:", best$n, "\n")
## BIC: 264.3382 pW_min: 4.334e-10 pR_min: 0.02605 n: 37
# Simultaneous (for comparison)
m_sim <- lm(rORLEN ~ rWIG20 + rROPA + rGAZ, data=df)
The analysis employed OLS models in two variants – simultaneous and dynamic – to quantify the relationships between ORLEN’s returns and the market (WIG20) and commodity factors (oil, gas). The use of classical and robust standard deviations (HC3, Newey–West, HAC) allowed for control of potential heteroscedasticity and autocorrelation, ensuring greater reliability of the conclusions. The results of the simultaneous model clearly indicate that the WIG20 is the main determinant of ORLEN’s short-term returns – the index coefficient is strong and highly statistically significant, with the explained variance reaching approximately 56%. Despite the positive signs of the parameters, oil and gas do not achieve significance, meaning their impact is not immediate. Consequently, the simultaneous model accurately demonstrates ORLEN’s strong link with the broader market, but it does not account for delayed effects and loses some of its predictive power.
In the dynamic model, which introduces lags for the WIG20 and crude oil and additionally controls for the autoregression of ORLEN returns, the fit improves significantly – the adjusted coefficient of determination increases to approximately 66%, and standard errors decrease. It turns out that the WIG20’s impact is not limited to the current period but also manifests itself with a two-month lag, which is significant at the 5% level. Although individual lags remain insignificant, crude oil, in the block approach, increases the model’s stability and provides additional information about the market environment, while gas has a marginal and uncertain impact. These findings confirm that the ORLEN price is strongly coupled with the Polish stock market, and the mechanism of effect transmission extends over time, which increases the importance of dynamic analysis for forecasting.
Incorporating this stage into the project is significant because it provides empirical evidence of the dominant role of the WIG20 in shaping ORLEN prices and simultaneously demonstrates that the dynamic model better reflects reality, capturing lagged effects and a greater portion of volatility. For the forecast for September 2025, this means relying on statistically justified relationships, and not just a simple adjustment of momentary correlations, which increases the reliability and practical usefulness of the forecast.
ols_report <- function(mod, name){
header(paste("OLS –", name, "(summary)")); print(summary(mod))
header(paste("OLS –", name, "ANOVA")); print(anova(mod))
cat("\nR^2:", summary(mod)$r.squared, " Adj.R^2:", summary(mod)$adj.r.squared,
" AIC:", AIC(mod), " BIC:", BIC(mod), "\n")
cat("\n95% confidence intervals (classical SE):\n"); print(confint(mod))
vcv <- list(Classic = vcov(mod),
HC3 = sandwich::vcovHC(mod, type="HC3"),
NeweyWest = sandwich::NeweyWest(mod, prewhite=TRUE),
HAC = sandwich::vcovHAC(mod, prewhite=TRUE))
all_tab <- purrr::imap_dfr(vcv, function(V, tag){
lmtest::coeftest(mod, vcov.=V) |> broom::tidy() |> mutate(SE_type=tag)
})
header(paste("p-values wg SE –", name))
print(all_tab |> select(SE_type, term, estimate, std.error, statistic, p.value), n=Inf)
write_csv(all_tab, paste0("outputs/ols_pvalues_", gsub(" ","_",name), ".csv"))
hc3 <- all_tab |> filter(SE_type=="HC3", term!="(Intercept)") |>
mutate(conf.low = estimate - 1.96*std.error,
conf.high= estimate + 1.96*std.error)
show_and_save(
ggplot(hc3, aes(reorder(term, estimate), estimate))+
geom_point(size=2)+
geom_errorbar(aes(ymin=conf.low, ymax=conf.high), width=.15)+
coord_flip()+ labs(title=paste0("Coef-plot (HC3) – ", name), x=NULL, y="β (p.p.)")+ theme_minimal(),
paste0("outputs/plots/coefplot_HC3_", gsub(" ","_",name), ".png"), 900, 600
)
}
ols_report(m_sim, "simultaneous model")
##
## =============================================
## OLS <U+2013> simultaneous model (summary)
## =============================================
##
## Call:
## lm(formula = rORLEN ~ rWIG20 + rROPA + rGAZ, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.2384 -3.1832 -0.1425 3.7954 10.3659
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03133 0.90933 -0.034 0.973
## rWIG20 0.96598 0.14226 6.790 4.17e-08 ***
## rROPA 0.18402 0.14318 1.285 0.206
## rGAZ 0.06592 0.04432 1.487 0.145
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.879 on 39 degrees of freedom
## Multiple R-squared: 0.5898, Adjusted R-squared: 0.5583
## F-statistic: 18.69 on 3 and 39 DF, p-value: 1.125e-07
##
##
## =========================================
## OLS <U+2013> simultaneous model ANOVA
## =========================================
## Analysis of Variance Table
##
## Response: rORLEN
## Df Sum Sq Mean Sq F value Pr(>F)
## rWIG20 1 1796.75 1796.75 51.9805 1.106e-08 ***
## rROPA 1 65.30 65.30 1.8892 0.1771
## rGAZ 1 76.48 76.48 2.2125 0.1449
## Residuals 39 1348.07 34.57
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R^2: 0.5898286 Adj.R^2: 0.558277 AIC: 280.1735 BIC: 288.9795
##
## 95% confidence intervals (classical SE):
## 2.5 % 97.5 %
## (Intercept) -1.8706279 1.8079705
## rWIG20 0.6782278 1.2537333
## rROPA -0.1055824 0.4736260
## rGAZ -0.0237192 0.1555529
##
## ==============================================
## p-values wg SE <U+2013> simultaneous model
## ==============================================
## # A tibble: 16 x 6
## SE_type term estimate std.error statistic p.value
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Classic (Intercept) -0.0313 0.909 -0.0345 0.973
## 2 Classic rWIG20 0.966 0.142 6.79 0.0000000417
## 3 Classic rROPA 0.184 0.143 1.29 0.206
## 4 Classic rGAZ 0.0659 0.0443 1.49 0.145
## 5 HC3 (Intercept) -0.0313 1.02 -0.0307 0.976
## 6 HC3 rWIG20 0.966 0.194 4.99 0.0000131
## 7 HC3 rROPA 0.184 0.150 1.22 0.228
## 8 HC3 rGAZ 0.0659 0.0509 1.29 0.203
## 9 NeweyWest (Intercept) -0.0313 0.819 -0.0383 0.970
## 10 NeweyWest rWIG20 0.966 0.134 7.23 0.0000000104
## 11 NeweyWest rROPA 0.184 0.0948 1.94 0.0595
## 12 NeweyWest rGAZ 0.0659 0.0409 1.61 0.115
## 13 HAC (Intercept) -0.0313 0.725 -0.0432 0.966
## 14 HAC rWIG20 0.966 0.148 6.53 0.0000000956
## 15 HAC rROPA 0.184 0.106 1.73 0.0911
## 16 HAC rGAZ 0.0659 0.0377 1.75 0.0886
## Saved: outputs/plots/coefplot_HC3_simultaneous_model.png
ols_report(m_dyn, "dynamic model (selected)")
##
## ===================================================
## OLS <U+2013> dynamic model (selected) (summary)
## ===================================================
##
## Call:
## lm(formula = form, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7805 -1.6582 -0.2001 3.4982 8.2095
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.80651 1.18391 -1.526 0.1401
## rGAZ 0.09204 0.05455 1.687 0.1045
## rORLEN_L1 -0.25031 0.17923 -1.397 0.1753
## rWIG20 1.14321 0.17734 6.446 1.15e-06 ***
## rWIG20_L1 0.44811 0.29242 1.532 0.1385
## rWIG20_L2 0.36685 0.17219 2.131 0.0436 *
## rROPA 0.13631 0.16431 0.830 0.4149
## rROPA_L1 0.13929 0.17883 0.779 0.4436
## rROPA_L2 -0.16948 0.18096 -0.937 0.3583
## rROPA_L3 0.03087 0.16154 0.191 0.8500
## rROPA_L4 0.11074 0.17100 0.648 0.5234
## rROPA_L5 -0.08837 0.19296 -0.458 0.6511
## rROPA_L6 0.07228 0.18196 0.397 0.6947
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.4 on 24 degrees of freedom
## Multiple R-squared: 0.7733, Adjusted R-squared: 0.66
## F-statistic: 6.824 on 12 and 24 DF, p-value: 3.533e-05
##
##
## ===============================================
## OLS <U+2013> dynamic model (selected) ANOVA
## ===============================================
## Analysis of Variance Table
##
## Response: rORLEN
## Df Sum Sq Mean Sq F value Pr(>F)
## rGAZ 1 61.37 61.37 2.1042 0.15984
## rORLEN_L1 1 3.78 3.78 0.1296 0.72201
## rWIG20 1 2027.45 2027.45 69.5163 1.507e-08 ***
## rWIG20_L1 1 90.17 90.17 3.0918 0.09144 .
## rWIG20_L2 1 94.93 94.93 3.2548 0.08378 .
## rROPA 1 13.99 13.99 0.4796 0.49524
## rROPA_L1 1 26.76 26.76 0.9176 0.34765
## rROPA_L2 1 40.18 40.18 1.3777 0.25200
## rROPA_L3 1 0.01 0.01 0.0005 0.98312
## rROPA_L4 1 18.27 18.27 0.6264 0.43644
## rROPA_L5 1 6.72 6.72 0.2304 0.63558
## rROPA_L6 1 4.60 4.60 0.1578 0.69472
## Residuals 24 699.96 29.17
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R^2: 0.7733426 Adj.R^2: 0.6600138 AIC: 241.7854 BIC: 264.3382
##
## 95% confidence intervals (classical SE):
## 2.5 % 97.5 %
## (Intercept) -4.24999209 0.6369672
## rGAZ -0.02054577 0.2046318
## rORLEN_L1 -0.62022794 0.1196132
## rWIG20 0.77719320 1.5092290
## rWIG20_L1 -0.15540742 1.0516364
## rWIG20_L2 0.01147160 0.7222256
## rROPA -0.20281068 0.4754356
## rROPA_L1 -0.22979073 0.5083796
## rROPA_L2 -0.54297214 0.2040137
## rROPA_L3 -0.30252009 0.3642647
## rROPA_L4 -0.24218155 0.4636535
## rROPA_L5 -0.48661145 0.3098702
## rROPA_L6 -0.30326417 0.4478161
##
## ====================================================
## p-values wg SE <U+2013> dynamic model (selected)
## ====================================================
## # A tibble: 52 x 6
## SE_type term estimate std.error statistic p.value
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Classic (Intercept) -1.81 1.18 -1.53 0.140
## 2 Classic rGAZ 0.0920 0.0546 1.69 0.105
## 3 Classic rORLEN_L1 -0.250 0.179 -1.40 0.175
## 4 Classic rWIG20 1.14 0.177 6.45 0.00000115
## 5 Classic rWIG20_L1 0.448 0.292 1.53 0.138
## 6 Classic rWIG20_L2 0.367 0.172 2.13 0.0436
## 7 Classic rROPA 0.136 0.164 0.830 0.415
## 8 Classic rROPA_L1 0.139 0.179 0.779 0.444
## 9 Classic rROPA_L2 -0.169 0.181 -0.937 0.358
## 10 Classic rROPA_L3 0.0309 0.162 0.191 0.850
## 11 Classic rROPA_L4 0.111 0.171 0.648 0.523
## 12 Classic rROPA_L5 -0.0884 0.193 -0.458 0.651
## 13 Classic rROPA_L6 0.0723 0.182 0.397 0.695
## 14 HC3 (Intercept) -1.81 1.63 -1.11 0.279
## 15 HC3 rGAZ 0.0920 0.102 0.905 0.374
## 16 HC3 rORLEN_L1 -0.250 0.159 -1.58 0.128
## 17 HC3 rWIG20 1.14 0.200 5.71 0.00000702
## 18 HC3 rWIG20_L1 0.448 0.320 1.40 0.175
## 19 HC3 rWIG20_L2 0.367 0.320 1.15 0.263
## 20 HC3 rROPA 0.136 0.163 0.834 0.412
## 21 HC3 rROPA_L1 0.139 0.276 0.505 0.618
## 22 HC3 rROPA_L2 -0.169 0.295 -0.574 0.571
## 23 HC3 rROPA_L3 0.0309 0.198 0.156 0.877
## 24 HC3 rROPA_L4 0.111 0.335 0.331 0.744
## 25 HC3 rROPA_L5 -0.0884 0.330 -0.267 0.791
## 26 HC3 rROPA_L6 0.0723 0.205 0.353 0.727
## 27 NeweyWest (Intercept) -1.81 0.723 -2.50 0.0197
## 28 NeweyWest rGAZ 0.0920 0.0602 1.53 0.139
## 29 NeweyWest rORLEN_L1 -0.250 0.0998 -2.51 0.0193
## 30 NeweyWest rWIG20 1.14 0.146 7.84 0.0000000448
## 31 NeweyWest rWIG20_L1 0.448 0.201 2.22 0.0358
## 32 NeweyWest rWIG20_L2 0.367 0.159 2.30 0.0302
## 33 NeweyWest rROPA 0.136 0.0835 1.63 0.116
## 34 NeweyWest rROPA_L1 0.139 0.187 0.746 0.463
## 35 NeweyWest rROPA_L2 -0.169 0.149 -1.14 0.265
## 36 NeweyWest rROPA_L3 0.0309 0.110 0.282 0.781
## 37 NeweyWest rROPA_L4 0.111 0.165 0.671 0.509
## 38 NeweyWest rROPA_L5 -0.0884 0.202 -0.436 0.666
## 39 NeweyWest rROPA_L6 0.0723 0.0886 0.816 0.422
## 40 HAC (Intercept) -1.81 0.912 -1.98 0.0593
## 41 HAC rGAZ 0.0920 0.0744 1.24 0.228
## 42 HAC rORLEN_L1 -0.250 0.125 -2.01 0.0563
## 43 HAC rWIG20 1.14 0.184 6.23 0.00000196
## 44 HAC rWIG20_L1 0.448 0.254 1.76 0.0905
## 45 HAC rWIG20_L2 0.367 0.192 1.91 0.0687
## 46 HAC rROPA 0.136 0.0987 1.38 0.180
## 47 HAC rROPA_L1 0.139 0.236 0.590 0.561
## 48 HAC rROPA_L2 -0.169 0.188 -0.903 0.375
## 49 HAC rROPA_L3 0.0309 0.130 0.238 0.814
## 50 HAC rROPA_L4 0.111 0.205 0.539 0.595
## 51 HAC rROPA_L5 -0.0884 0.245 -0.361 0.721
## 52 HAC rROPA_L6 0.0723 0.110 0.656 0.518
## Saved: outputs/plots/coefplot_HC3_dynamic_model_(selected).png
# Effect of Oil Lags – bars
ropa_terms <- grep("^rROPA(_L\\d+)?$", names(coef(m_dyn)), value=TRUE)
if(length(ropa_terms)){
b <- broom::tidy(m_dyn) |> filter(term %in% ropa_terms)
b <- b |> mutate(lag = ifelse(grepl("_L", term), as.integer(sub(".*_L","",term)), 0)) |> arrange(lag)
show_and_save(
ggplot(b, aes(factor(lag), estimate))+
geom_col()+
geom_errorbar(aes(ymin=estimate-1.96*std.error, ymax=estimate+1.96*std.error), width=.2)+
labs(title="OIL – distribution of lagged effects", x="Lag (months)", y="β (p.p.)")+
theme_minimal(),
"outputs/plots/ropa_lag_effects.png", 900, 600
)
}
## Warning: There was 1 warning in `mutate()`.
## i In argument: `lag = ifelse(...)`.
## Caused by warning in `ifelse()`:
## ! NAs introduced by coercion
## Saved: outputs/plots/ropa_lag_effects.png
The results clearly show that the WIG20 index is significant in the block approach (p < 0.001), confirming its key role in shaping the ORLEN share price. Oil also reaches significance (p ≈ 0.026), although its effect is diffuse and difficult to capture through individual lags – this indicates that oil exerts a longer-term, cumulative impact on prices. Gas, on the other hand, did not reach significance (p ≈ 0.37), indicating that its volatility does not systematically translate into changes in ORLEN share prices.
The presented heatmap visualization facilitates quick comparisons – significant factors are marked in green, and insignificant ones in brown. This step provides the project with empirical confirmation of the hierarchy of factors: the stock market (WIG20) is the most important source of volatility, oil plays a secondary but significant role, and gas can be omitted from the forecast. The collective materiality analysis thus strengthens the credibility of the final forecast for September 2025, basing it on factors that actually determine the performance of the ORLEN share price.
pW <- robust_block_p(m_dyn, termsW_best)
pR <- robust_block_p(m_dyn, termsR_best)
get_p_gaz <- function(mod){
vc <- list(HC3=sandwich::vcovHC(mod, type="HC3"),
NW =sandwich::NeweyWest(mod, prewhite=TRUE),
HAC=sandwich::vcovHAC(mod, prewhite=TRUE))
pp <- sapply(vc, function(V){
tt <- lmtest::coeftest(mod, vcov.=V) |> broom::tidy()
tt$p.value[tt$term=="rGAZ"][1]
})
max(pp, na.rm=TRUE)
}
pG <- get_p_gaz(m_dyn)
sig_tbl <- tibble(Variable=c("rWIG20","rROPA","rGAZ"),
p_value=c(pW,pR,pG),
Significant=c(pW<alpha_wig20, pR<alpha_ropa, pG<alpha_gaz)) |>
mutate(Significant=ifelse(Significant,"YES","NO"))
header("SIGNIFICANCE (block-wise, model selected)"); print(sig_tbl, n=Inf)
##
## =============================================
## SIGNIFICANCE (block-wise, model selected)
## =============================================
## # A tibble: 3 x 3
## Variable p_value Significant
## <chr> <dbl> <chr>
## 1 rWIG20 4.33e-10 YES
## 2 rROPA 2.60e- 2 YES
## 3 rGAZ 3.74e- 1 NO
write_csv(sig_tbl, "outputs/significance_selected_model.csv")
show_and_save(
ggplot(sig_tbl, aes("HC3/NW/HAC (blok)", Variable, fill=Significant))+
geom_tile(color="white")+geom_text(aes(label=Significant))+
scale_fill_manual(values=c("YES"="#7AC943","NO"="#C86B00"))+
labs(title="Statistical significance (block-wise, model selected)", x=NULL, y=NULL)+theme_minimal(),
"outputs/plots/significance_selected_heatmap.png", 820, 540
)
## Saved: outputs/plots/significance_selected_heatmap.png
Breusch–Pagan Test and White’s Test Both tests detect heteroscedasticity, or the variable variance of residuals. Heteroscedasticity underestimates or overestimates standard errors, which can lead to a false assessment of variable significance. The results of both tests (p-value > 0.25 in the SIM model and > 0.42 in the DYN model) did not provide grounds for rejecting the null hypothesis of homoscedasticity. This means that the variance of residuals is stable, and conclusions about parameter significance are credible.
Durbin–Watson Test This test examines the autocorrelation of residuals, which, if present, would indicate the omission of significant dynamics in the model. The DW statistic values in both models (2.31 in SIM, 1.94 in DYN) and high p-values (> 0.41) indicate that autocorrelation of residuals does not occur. This means that the dynamics of the variables were well captured, especially in the dynamic model.
Normality Tests of the Residual Distribution (Jarque–Bera, Shapiro–Wilk) Normality of the residuals is significant for classical significance tests and the construction of confidence intervals. The results of both tests for both models (p-value ~0.54 for SIM, ~0.44–0.69 for DYN) do not provide grounds for rejecting the hypothesis of normality. The residuals are therefore consistent with theoretical expectations, which strengthens confidence in the drawn conclusions.
Influence Analysis (Hat Values, Cook’s Distance, Influence Plots) Influence analysis was used to identify potential outliers or points of significant influence. Although several observations showed elevated diagnostic statistics, none were dominant enough to significantly distort the model results.
Summary of Diagnostic Results Both models—the simultaneous and dynamic—meet the key assumptions of classical linear regression: no heteroscedasticity, no residual autocorrelation, and a normal error distribution. However, the dynamic model achieves a better fit (higher R² and lower BIC) and more realistically reflects the lagged impact of the WIG20 and crude oil on the ORLEN price. The diagnostics confirm that forecasts based on this model are statistically reliable and can be used in further forecasting analysis.
safe_png("outputs/plots/ols_diag_SIM.png", 1200, 500, {
par(mfrow=c(1,3)); plot(m_sim, which=1); plot(m_sim, which=2); acf(residuals(m_sim), main="ACF reszt (SIM)"); par(mfrow=c(1,1))
})
safe_png("outputs/plots/ols_diag_DYN.png", 1200, 500, {
par(mfrow=c(1,3)); plot(m_dyn, which=1); plot(m_dyn, which=2); acf(residuals(m_dyn), main="ACF reszt (DYN)"); par(mfrow=c(1,1))
})
suppressWarnings(safe_png("outputs/plots/avPlots_SIM.png", 1100, 850, { car::avPlots(m_sim) }))
suppressWarnings(safe_png("outputs/plots/influence_SIM.png", 900, 700, { car::influencePlot(m_sim, id.method="identify", main="Influence plot (SIM)") }))
## StudRes Hat CookD
## 3 2.0588023 0.20569936 0.25337966
## 22 -2.2695200 0.06920027 0.08652391
## 32 -2.1480431 0.12353727 0.14879967
## 39 0.5773768 0.20011173 0.02121239
header("DIAGNOSTYKA testy – SIM")
##
## ==================================
## DIAGNOSTYKA testy <U+2013> SIM
## ==================================
print(bp_test_safe(m_sim)); print(white_test_safe(m_sim, TRUE))
##
## Breusch<U+2013>Pagan LM (aux: u^2 ~ X)
##
## data: rORLEN ~ rWIG20 + rROPA + rGAZ
## LM = 4.0452, df = 3, p-value = 0.2566
##
## White LM (z interakcjami)
##
## data: rORLEN ~ rWIG20 + rROPA + rGAZ
## LM = 10.494, df = 9, p-value = 0.312
print(lmtest::dwtest(m_sim)); print(tseries::jarque.bera.test(residuals(m_sim))); print(shapiro.test(residuals(m_sim)))
##
## Durbin-Watson test
##
## data: m_sim
## DW = 2.3189, p-value = 0.8488
## alternative hypothesis: true autocorrelation is greater than 0
##
## Jarque Bera Test
##
## data: residuals(m_sim)
## X-squared = 1.1851, df = 2, p-value = 0.5529
##
## Shapiro-Wilk normality test
##
## data: residuals(m_sim)
## W = 0.97727, p-value = 0.5429
print(tryCatch(car::vif(m_sim), error=function(e) "VIF: n/d"))
## rWIG20 rROPA rGAZ
## 1.028777 1.026285 1.010859
header("DIAGNOSTYKA testy – DYN")
##
## ==================================
## DIAGNOSTYKA testy <U+2013> DYN
## ==================================
print(bp_test_safe(m_dyn)); print(white_test_safe(m_dyn, TRUE))
##
## Breusch<U+2013>Pagan LM (aux: u^2 ~ X)
##
## data: rORLEN ~ rGAZ + rORLEN_L1 + rWIG20 + rWIG20_L1 + rWIG20_L2 + rROPA + rROPA_L1 + rROPA_L2 + rROPA_L3 + rROPA_L4 + rROPA_L5 + rROPA_L6
## LM = 5.1562, df = 12, p-value = 0.9526
##
## White LM (z interakcjami)
##
## data: rORLEN ~ rGAZ + rORLEN_L1 + rWIG20 + rWIG20_L1 + rWIG20_L2 + rROPA + rROPA_L1 + rROPA_L2 + rROPA_L3 + rROPA_L4 + rROPA_L5 + rROPA_L6
## LM = 37, df = 36, p-value = 0.4226
print(lmtest::dwtest(m_dyn)); print(tseries::jarque.bera.test(residuals(m_dyn))); print(shapiro.test(residuals(m_dyn)))
##
## Durbin-Watson test
##
## data: m_dyn
## DW = 1.9434, p-value = 0.4161
## alternative hypothesis: true autocorrelation is greater than 0
##
## Jarque Bera Test
##
## data: residuals(m_dyn)
## X-squared = 0.73232, df = 2, p-value = 0.6934
##
## Shapiro-Wilk normality test
##
## data: residuals(m_dyn)
## W = 0.97102, p-value = 0.4368
print(tryCatch(car::vif(m_dyn), error=function(e) "VIF: n/d"))
## rGAZ rORLEN_L1 rWIG20 rWIG20_L1 rWIG20_L2 rROPA rROPA_L1 rROPA_L2
## 1.293976 3.496704 1.399024 3.774111 1.397844 1.339042 1.548098 1.531447
## rROPA_L3 rROPA_L4 rROPA_L5 rROPA_L6
## 1.248412 1.405639 1.544980 1.474249
The stability of the estimated parameters was formally evaluated using three complementary procedures: the Chow test, the CUSUM test based on recursive residuals, and Bai–Perron multiple structural break tests. These methods are designed to detect potential parameter instability or structural changes in the underlying regression model, which could undermine the reliability of inference and forecasting.
The Chow test reported a test statistic of F = 1.2547 with a p-value of 0.6885. As the p-value is well above conventional significance thresholds (0.01–0.10), the null hypothesis of parameter stability cannot be rejected. This indicates that the estimated regression coefficients are stable across subsamples, with no evidence of structural shifts in the underlying relationships.
The CUSUM test, which is based on the cumulative sum of recursive residuals, also confirmed parameter stability. The reported statistic (S = 0.67764, p = 0.2809) showed no significant deviations from the null, supporting the view that the regression coefficients evolve consistently over time. Graphical inspection of the CUSUM plot further corroborated the absence of systematic drifts or breaks in the recursive residuals.
Finally, the Bai–Perron multiple breakpoint procedure was employed to detect possible structural breaks endogenously. Based on the Bayesian Information Criterion (BIC), the optimal number of breaks was selected as m = 0, meaning that no statistically significant breakpoints were identified in the dynamic model. Consequently, confidence intervals for break dates were not computed.
Taken together, the results provide strong evidence in favor of parameter stability. Neither the Chow nor the CUSUM test suggested instability, and the Bai–Perron procedure confirmed the absence of statistically relevant structural breaks. This outcome strengthens the credibility of the estimated model by ensuring that the regression coefficients remain consistent over the sample period, thereby improving the robustness of inference and the reliability of subsequent forecasts.
library(strucchange)
## Warning: package 'strucchange' was built under R version 4.4.3
##
## Attaching package: 'strucchange'
## The following object is masked from 'package:stringr':
##
## boundary
# Chow test
header("CHOW TEST – Stability of parameters")
##
## ==============================================
## CHOW TEST <U+2013> Stability of parameters
## ==============================================
chow_test <- sctest(m_dyn, type = "Chow")
print(chow_test)
##
## M-fluctuation test
##
## data: m_dyn
## f(efp) = 1.2547, p-value = 0.6885
# CUSUM test
header("CUSUM TEST – Recursive residuals")
##
## ===========================================
## CUSUM TEST <U+2013> Recursive residuals
## ===========================================
cusum_test <- efp(formula(m_dyn), data = dat_dyn, type = "Rec-CUSUM")
print(sctest(cusum_test))
##
## Recursive CUSUM test
##
## data: cusum_test
## S = 0.67764, p-value = 0.2809
safe_png("outputs/plots/cusum_test.png", 1000, 600, {
plot(cusum_test, main = "CUSUM Test – recursive residuals")
})
# Breakpoints (Bai–Perron)
header("BREAKPOINTS – Bai–Perron multiple structural breaks")
##
## =====================================================================
## BREAKPOINTS <U+2013> Bai<U+2013>Perron multiple structural breaks
## =====================================================================
# helper: count regressors and choose h
num_regressors <- function(form, data){
X <- model.matrix(form, data)
max(0L, ncol(X) - 1L)
}
choose_h <- function(form, data, frac_min = 0.15, buffer = 2L){
n <- nrow(na.omit(model.frame(form, data)))
k <- num_regressors(form, data)
h_by_k <- k + 1L + buffer
h_by_frac <- ceiling(frac_min * n)
h <- max(h_by_k, h_by_frac)
h <- min(h, n - (k + 2L))
if(!is.finite(h) || h <= k) h <- k + 2L
list(h = h, n = n, k = k)
}
# full dynamic model
form_full <- formula(m_dyn)
ch <- choose_h(form_full, dat_dyn, frac_min = 0.15, buffer = 2L)
cat(sprintf("Full model: n=%d, k=%d, chosen h=%d\n", ch$n, ch$k, ch$h))
## Full model: n=37, k=12, chosen h=15
bp_full <- try(breakpoints(form_full, data = dat_dyn, h = ch$h), silent = TRUE)
use_compact <- inherits(bp_full, "try-error")
# compact model if full fails
if(use_compact){
cat("Too many regressors for breakpoints (full model). Trying compact specification…\n")
form_compact <- as.formula("rORLEN ~ rORLEN_L1 + rWIG20 + rWIG20_L1 + rROPA")
ch2 <- choose_h(form_compact, dat_dyn, frac_min = 0.15, buffer = 2L)
cat(sprintf("Compact model: n=%d, k=%d, chosen h=%d\n", ch2$n, ch2$k, ch2$h))
bp_try <- try(breakpoints(form_compact, data = dat_dyn, h = ch2$h), silent = TRUE)
if(!inherits(bp_try, "try-error")){
bp <- bp_try; used <- "compact"
} else {
bp <- NULL
}
} else {
bp <- bp_full; used <- "dynamic_full"
}
# fallback – breakpoints in residual mean
if(is.null(bp)){
cat("Falling back to breaks in residual mean (intercept-only on residuals)…\n")
u <- residuals(m_dyn)
dat_res <- tibble(u = as.numeric(u))
h_res <- max(5L, ceiling(0.15 * nrow(dat_res)))
bp <- breakpoints(u ~ 1, data = dat_res, h = h_res)
used <- "residual_mean"
}
# reporting + plots
report_breaks <- function(bp, label = used, dates = NULL){
bic_vals <- BIC(bp)
m_star <- which.min(bic_vals) - 1L
header(sprintf("BREAKPOINTS SUMMARY (%s)", label))
cat("BIC values by number of breaks (m):",
paste0(seq_along(bic_vals)-1, ":", round(bic_vals,1), collapse=" "), "\n")
cat("Selected number of breaks (m*):", m_star, "\n\n")
safe_png(paste0("outputs/plots/breakpoints_", label, "_BIC.png"), 1100, 650, {
plot(bp, main = paste("Breakpoints – BIC profile (", label, ")"))
})
if(m_star > 0 && !all(is.na(bp$breakpoints))){
bp_star <- breakpoints(bp$model, breaks = m_star, h = bp$h, data = bp$call$data)
ci <- try(confint(bp_star), silent = TRUE)
safe_png(paste0("outputs/plots/breakpoints_", label, "_CI.png"), 1100, 650, {
plot(bp_star)
if(!inherits(ci,"try-error")) lines(ci, col = "steelblue")
title(paste("Breakpoints with 95% CIs (", label, ")"))
})
br_idx <- if(inherits(ci,"try-error")) bp$breakpoints else ci$breakpoints
cat("Breakpoints at observations:", paste(na.omit(br_idx), collapse=", "), "\n")
if(!is.null(dates) && length(dates) >= max(na.omit(br_idx),0)){
cat("Breakpoints at dates:", paste(format(dates[na.omit(br_idx)]), collapse=", "), "\n")
}
} else {
cat("No structural breaks selected by BIC (m* = 0). Confidence intervals not computed.\n")
}
}
dates_dyn <- dat_dyn$ym %||% dat_dyn$Date %||% NULL
## Warning: Unknown or uninitialised column: `ym`.
## Warning: Unknown or uninitialised column: `Date`.
report_breaks(bp, label = used, dates = dates_dyn)
##
## ======================================
## BREAKPOINTS SUMMARY (dynamic_full)
## ======================================
## BIC values by number of breaks (m): 0:264.3
## Selected number of breaks (m*): 0
## No structural breaks selected by BIC (m* = 0). Confidence intervals not computed.
In the final stage of the analysis, an ARIMA model was used to forecast the ORLEN share price in USD through September 2025. The time series was constructed based on monthly quotations, and then an automated ARIMA model was fitted, which optimally accounts for both the trend, seasonality, and internal data dynamics. The forecast was performed for a six-month futures horizon, allowing for an estimate of the value for the point in time we are interested in – September 2025.
orlen_ts <- ts(levels_all$ORLEN,
start=c(as.numeric(format(as.Date(min(levels_all$ym)),"%Y")),
as.numeric(format(as.Date(min(levels_all$ym)),"%m"))),
frequency=12)
m_arima <- suppressWarnings(forecast::auto.arima(orlen_ts, stepwise=FALSE, approximation=FALSE))
fc_h <- max(0, 12*(as.integer(substr(target_month,1,4)) -
as.integer(format(as.Date(max(levels_all$ym)),"%Y"))) +
(as.integer(substr(target_month,6,7)) -
as.integer(format(as.Date(max(levels_all$ym)),"%m"))))
fc <- if(fc_h>0) forecast::forecast(m_arima, h=fc_h) else forecast::forecast(m_arima, h=1)
future_dates <- seq(from = as.Date(max(levels_all$ym)) %m+% months(1),
by="month", length.out = length(fc$mean))
fc_tbl <- tibble(Month = zoo::as.yearmon(future_dates),
ORLEN_fc = round(as.numeric(fc$mean), 2),
Lo95 = round(as.numeric(fc$lower[,"95%"]), 2),
Hi95 = round(as.numeric(fc$upper[,"95%"]), 2))
write_csv(fc_tbl, "outputs/ORLEN_forecast_to_target_USD.csv")
header("PROGNOZA ORLEN (USD) – tabela"); print(fc_tbl, n=Inf)
##
## ========================================
## PROGNOZA ORLEN (USD) <U+2013> tabela
## ========================================
## # A tibble: 1 x 4
## Month ORLEN_fc Lo95 Hi95
## <yearmon> <dbl> <dbl> <dbl>
## 1 Sep 2025 21.8 19.0 24.7
show_and_save(
ggplot()+
geom_line(data=tibble(ym=levels_all$ym, ORLEN=levels_all$ORLEN),
aes(as.Date(ym), ORLEN, group=1), colour="black", linewidth=.9)+
geom_line(data=fc_tbl, aes(as.Date(Month), ORLEN_fc, group=1),
colour="steelblue", linewidth=1)+
labs(title="ORLEN (USD) – historia i prognoza do IX-2025 (ARIMA)", x=NULL, y="USD")+
scale_y_continuous(labels=usd_lab)+theme_minimal(),
"outputs/plots/forecast_ORLEN_path_USD.png", 1200, 720
)
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
## Saved: outputs/plots/forecast_ORLEN_path_USD.png
target_y <- fc_tbl |> filter(Month==zoo::as.yearmon(target_month)) |> pull(ORLEN_fc)
show_and_save(
ggplot()+
geom_line(data=tibble(ym=levels_all$ym, ORLEN=levels_all$ORLEN),
aes(as.Date(ym), ORLEN, group=1), colour="grey30", linewidth=.8)+
{ if(length(target_y)) geom_vline(xintercept = as.Date(zoo::as.yearmon(target_month)),
linetype="dashed", colour="firebrick") }+
{ if(length(target_y)) geom_point(aes(x=as.Date(zoo::as.yearmon(target_month)), y=target_y),
colour="firebrick", size=3) }+
{ if(length(target_y)) ggrepel::geom_label_repel(
data = tibble(x=as.Date(zoo::as.yearmon(target_month)), y=target_y,
lab=sprintf("ORLEN IX-2025: %s USD",
format(target_y, big.mark=" ", decimal.mark=","))),
aes(x=x, y=y, label=lab),
colour="firebrick", fill="white", segment.color="firebrick", seed=1) }+
labs(title="ORLEN – punktowa prognoza na SEPTEMBER 2025 (USD)", x=NULL, y="USD")+
scale_y_continuous(labels=usd_lab)+theme_minimal(),
"outputs/plots/forecast_ORLEN_sep2025_USD.png", 1050, 680
)
## Saved: outputs/plots/forecast_ORLEN_sep2025_USD.png
if(length(target_y)){
cat("\nPrognozowany ORLEN (USD) na SEPTEMBER 2025:",
format(target_y, big.mark=" ", decimal.mark=","), "USD\n")
} else {
cat("\nSeptember 2025 is already in the data or beyond the forecast horizon – check 'ORLEN_forecast_to_target_USD.csv'.\n")
}
##
## Prognozowany ORLEN (USD) na SEPTEMBER 2025: 21,85 USD
message("Done ✅ Reports and plots saved in 'outputs/' oraz 'outputs/plots/'.")
## Done <U+2705> Reports and plots saved in 'outputs/' oraz 'outputs/plots/'.
The Monte Carlo simulation was added as the final step of the project to go beyond a single-point ARIMA forecast and show the full range of possible futures for ORLEN’s price. A point forecast on its own – even with a 95% confidence band – can feel overly rigid, while in reality financial markets are full of randomness and unexpected shocks. Monte Carlo addresses this by generating thousands of alternative paths consistent with the model and its error structure, letting us see not just the “average” prediction but also how wide the uncertainty really is. This makes it a very important addition, because as an investor I don’t just want to know one number, I want to know the distribution of risks around it.
The results fit neatly into the story built throughout the project. The distribution of September 2025 outcomes centers around ~22 USD, very close to the ARIMA point forecast of 21.85 USD, which reassures that the baseline model is consistent. At the same time, the 5–95% Monte Carlo range 19.5–23.9 USD provides a more intuitive picture of plausible scenarios, showing that most outcomes remain in a relatively narrow corridor. Interestingly, the simulation suggests a slightly higher probability of finishing above the ARIMA forecast ~56%, which tilts the balance modestly to the upside. Almost all simulated paths (97.5%) stayed within the ARIMA confidence band, confirming that the earlier inference was statistically sound.
In the broader context of the project, Monte Carlo ties everything together: the stationarity tests and regressions identified the true drivers -mainly WIG20, with oil as a delayed but real factor, the diagnostics confirmed the models are stable and reliable, the ARIMA forecast set the baseline, and now Monte Carlo translates all of this into a probability distribution investors can actually use. It bridges the gap between econometrics and practice – showing not only what the most likely price will be, but also how much uncertainty surrounds that outcome. For me, as someone considering whether ORLEN is worth buying, this stage is crucial: it doesn’t just tell me 21.85 USD in 2025, it tells me the risk profile of that forecast, which is ultimately what investing decisions are built on.
stopifnot(exists("m_arima"), exists("fc_tbl"), nrow(fc_tbl) > 0)
set.seed(123)
n_sim <- 5000
fc_h <- nrow(fc_tbl)
mu <- fc_tbl$ORLEN_fc
sd95 <- pmax(1e-8, (fc_tbl$Hi95 - mu) / 1.96)
sim_fun_try <- try(getS3method("simulate", "Arima", envir = asNamespace("forecast")),
silent = TRUE)
use_fallback <- inherits(sim_fun_try, "try-error")
# one Monte Carlo path
sim_one <- function() {
if (!use_fallback) {
out <- try(as.numeric(sim_fun_try(m_arima,
nsim = fc_h,
future = TRUE,
bootstrap = TRUE)),
silent = TRUE)
if (!inherits(out, "try-error") && length(out) == fc_h &&
any(is.finite(out))) return(out)
}
rnorm(fc_h, mean = mu, sd = sd95)
}
## --- Robust simulation matrix (works also when fc_h == 1) ---
sim_list <- replicate(n_sim, sim_one(), simplify = FALSE)
sim_mat <- do.call(cbind, sim_list) # [fc_h x n_sim] even if fc_h==1
colnames(sim_mat) <- paste0("sim", seq_len(n_sim))
## Tidy outputs
mc_paths <- tibble::tibble(Month = fc_tbl$Month) |>
dplyr::bind_cols(as.data.frame(sim_mat))
final_vals <- as.numeric(sim_mat[fc_h, , drop = TRUE])
final_tbl <- tibble::tibble(value = final_vals)
## Headline metrics
q_vec <- stats::quantile(final_vals, probs = c(.05, .50, .95), na.rm = TRUE)
p_above <- mean(final_vals > mu[fc_h], na.rm = TRUE)
p_inside <- mean(final_vals >= fc_tbl$Lo95[fc_h] &
final_vals <= fc_tbl$Hi95[fc_h], na.rm = TRUE)
header("MONTE CARLO – Final (Sep 2025) distribution")
##
## ======================================================
## MONTE CARLO <U+2013> Final (Sep 2025) distribution
## ======================================================
cat("Quantiles 5% | 50% | 95%:", paste(round(q_vec, 2), collapse = " | "), "\n")
## Quantiles 5% | 50% | 95%: 19.5 | 21.99 | 23.88
cat("P[MC > ARIMA point]:", round(100 * p_above, 1), "%\n")
## P[MC > ARIMA point]: 55.6 %
cat("P[MC within ARIMA 95% band]:", round(100 * p_inside, 1), "%\n\n")
## P[MC within ARIMA 95% band]: 97.5 %
readr::write_csv(mc_paths, "outputs/montecarlo_paths_ORLEN.csv")
readr::write_csv(final_tbl, "outputs/montecarlo_final_ORLEN.csv")
# Plots
## Histogram for September
show_and_save(
ggplot(final_tbl, aes(value)) +
geom_histogram(bins = 50, alpha = .8) +
geom_vline(xintercept = mu[fc_h], linetype = "dashed") +
geom_vline(xintercept = as.numeric(q_vec[c(1,3)]), linetype = "dotted") +
labs(title = "Monte Carlo – Final Value Distribution (Sep 2025)",
subtitle = sprintf("Point=%.2f, 5%%=%.2f, 95%%=%.2f",
mu[fc_h], q_vec[1], q_vec[3]),
x = "ORLEN price (USD)", y = "Frequency") +
theme_minimal(),
"outputs/plots/montecarlo_histogram_sep2025.png", 1100, 650
)
## Saved: outputs/plots/montecarlo_histogram_sep2025.png
if (fc_h >= 2) {
## 50 random paths vs ARIMA mean (multi-step only)
set.seed(123)
show_n <- min(50, n_sim)
pick <- sample(seq_len(n_sim), show_n)
paths_long <- mc_paths |>
tidyr::pivot_longer(cols = tidyselect::all_of(colnames(sim_mat)[pick]),
names_to = "path", values_to = "value")
show_and_save(
ggplot() +
geom_line(data = paths_long,
aes(x = as.Date(Month), y = value, group = path),
alpha = .25) +
geom_line(data = tibble::tibble(Month = fc_tbl$Month, mean = mu),
aes(as.Date(Month), mean), linewidth = 1) +
labs(title = "Monte Carlo – Simulated Paths vs ARIMA Mean",
x = NULL, y = "USD") +
theme_minimal(),
"outputs/plots/montecarlo_paths_vs_arima.png", 1200, 720
)
## Fan chart 5–95%
q_by_t <- apply(sim_mat, 1, stats::quantile, probs = c(.05, .50, .95), na.rm = TRUE)
fan_df <- tibble::tibble(
Month = fc_tbl$Month,
q05 = q_by_t[1, ], q50 = q_by_t[2, ], q95 = q_by_t[3, ],
mean = mu
)
show_and_save(
ggplot(fan_df, aes(x = as.Date(Month))) +
geom_ribbon(aes(ymin = q05, ymax = q95), alpha = .25) +
geom_line(aes(y = q50)) +
geom_line(aes(y = mean), linetype = "dashed") +
labs(title = "Monte Carlo – Forecast Fan (5%–95%)",
x = NULL, y = "USD",
subtitle = "Solid: MC median | Dashed: ARIMA mean") +
theme_minimal(),
"outputs/plots/montecarlo_fan.png", 1200, 720
)
}
This project provides a rigorous econometric assessment of the short-term determinants and long-term forecasting potential of ORLEN’s stock price in USD. The analysis covered the period 2022–2025 and combined stationarity testing, regression modeling, block significance evaluation, conditional volatility analysis, parameter stability checks, and advanced forecasting techniques in order to build a reliable basis for inference and prediction. Empirical results consistently identified the WIG20 index as the central factor shaping ORLEN’s performance. Both contemporaneous and lagged effects of the domestic equity market were statistically and economically significant, underscoring the strong co-movement between ORLEN and overall market conditions in Poland. Oil prices exerted a secondary, delayed influence, becoming relevant when considered in aggregate across multiple lags, while natural gas did not provide systematic explanatory power. The explanatory capacity of the models reflected these relationships: while the static specification accounted for roughly 56% of return variability, the dynamic model incorporating lag structures achieved close to 66%, demonstrating the necessity of capturing temporal spillovers to adequately describe ORLEN’s dynamics. Block significance testing corroborated this hierarchy of drivers, confirming the dominant role of equity market dynamics and the secondary but non-negligible contribution of oil. Extensive diagnostic checks further verified the robustness of the estimated models: no evidence of heteroscedasticity, serial correlation, or severe deviations from residual normality was detected, while influence analysis confirmed that no single observation biased the results. The analysis of conditional volatility using ARIMA–GARCH(1,1) revealed that volatility clustering was not a persistent feature in ORLEN’s returns. The ARCH-LM test before and after GARCH estimation consistently failed to reject the null hypothesis of no ARCH effects, indicating that conditional heteroscedasticity was limited. The fitted GARCH model nevertheless provided insight into the short-term volatility path, which displayed a gradual decline over the forecast horizon, suggesting a stabilization of return dynamics. Tests of parameter stability (Chow, CUSUM, Bai–Perron breakpoints) confirmed that the estimated model was stable across the entire sample. Both Chow and CUSUM statistics were insignificant, and the Bai–Perron procedure selected no structural breaks according to BIC, reinforcing confidence in the temporal consistency of estimated coefficients. Finally, the forecasting stage, based on ARIMA projections, produced a central scenario in which ORLEN’s stock price is expected to reach USD 21.85 in September 2025, with a 95% confidence interval spanning USD 18.96 to USD 24.74. This projection implies a moderate recovery from earlier volatility, aligned with the dominant role of equity market conditions and the delayed influence of oil. Taken together, the findings emphasize that ORLEN’s valuation is primarily driven by domestic equity market dynamics, with global commodity prices playing a complementary role. In the end, this project not only highlights the key drivers of ORLEN’s stock price but also shows how statistical tools can be applied to real financial problems. The results give a clearer picture of the risks and opportunities, and they may serve as a useful hint when considering whether investing in ORLEN could be a good idea in the near future.