1. Wprowadzenie

W ramach niniejszej analizy prognozujemy ceny ropy Brent przy użyciu różnych metod modelowania szeregów czasowych.

ARIMA (AutoRegressive Integrated Moving Average) to klasyczny model statystyczny używany do analizy i prognozowania danych szeregów czasowych.

Model ARIMA łączy trzy komponenty:

Prophet to model opracowany przez Facebooka, zaprojektowany specjalnie z myślą o prognozowaniu danych z wyraźną sezonowością dzienną, tygodniową i roczną. Prophet jest odporny na brakujące dane i nietypowe wartości (outliery) oraz umożliwia uwzględnianie dodatkowych zmiennych wyjaśniających (regresorów) oraz świąt (holidays).

Komponenty Propheta
Komponenty Propheta

W projekcie wykorzystano m.in. modele Prophet, Prophet z dodatkowymi zmiennymi, ARIMA oraz KNN. Oceniono ich dokładność i przeprowadzono predykcję na przyszłe 30 dni.


2. Pobranie danych

brent_prices <- tq_get("BZ=F")
brent_prices %>% plot_time_series(date, close, .smooth = T,
                                  .interactive = T,
                                  .title = "Brent Oil price (USD)")

3. Dodanie zmiennych objaśniających

dolar_index <- tq_get("DX-Y.NYB")

brent_prices <- brent_prices %>% 
  mutate(date = as.Date(date)) %>%
  left_join(dolar_index %>% mutate(date = as.Date(date)), by = "date") %>%
  select(date, close = close.x, dxy = close.y) %>%
  drop_na()

4. Stacjonarność szeregu

adf.test(brent_prices$close, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  brent_prices$close
## Dickey-Fuller = -2.3628, Lag order = 13, p-value = 0.4247
## alternative hypothesis: stationary
# Okres pandemii COVID-19 (marzec 2020 - czerwiec 2021)
covid_dates <- tibble(
  holiday = 'covid',
  ds = seq(as.Date("2020-03-01"), as.Date("2021-06-30"), by = "day"),
  lower_window = 0,
  upper_window = 0
)

# Okres wojny w Ukrainie (luty 2022 - grudzień 2022)
war_dates <- tibble(
  holiday = 'war',
  ds = seq(as.Date("2022-02-20"), as.Date("2022-12-31"), by = "day"),
  lower_window = 0,
  upper_window = 0
)

# Połączenie wszystkich wydarzeń w jedną tabelę
holidays_df <- bind_rows(covid_dates, war_dates)

5. Inżynieria cech: analiza techniczna

MACD explained
MACD explained
Bollinger Bands explained
Bollinger Bands explained
brent_prices <- brent_prices %>%
  arrange(date) %>%
  mutate(
    macd_full = MACD(close, nFast = 12, nSlow = 26, nSig = 9, maType = EMA),
    macd = macd_full[, "macd"],
    signal = macd_full[, "signal"],
    sma_14 = SMA(close, n = 14),
    macd_hist = macd - signal
  ) %>%
  mutate(
    bb = BBands(close, n = 20, sd = 2),
    bb_up = bb[, "up"],
    bb_dn = bb[, "dn"],
    bb_mavg = bb[, "mavg"],
    bb_pctb  = (close - bb_dn) / (bb_up - bb_dn)
  ) %>%
  select(-macd_full, -bb) %>%
  drop_na()

6. Podział na dane treningowe i testowe

splits <- brent_prices %>% time_series_split(assess = "1 month", cumulative = TRUE)
## Using date_var: date
splits %>% tk_time_series_cv_plan() %>% plot_time_series_cv_plan(date, close)

7. Modelowanie

model_prophet <- prophet_reg(mode = "regression", growth = "linear", season = "additive") %>%
  set_engine("prophet", weekly.seasonality = TRUE) %>%
  fit(close ~ date, training(splits))
## Warning: The following arguments cannot be manually modified and were removed:
## weekly.seasonality.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_prophet_with_holidays <- prophet_reg(mode = "regression", growth = "linear", season = "additive", prior_scale_holidays = 20) %>%
  set_engine("prophet", weekly.seasonality = TRUE, holidays = holidays_df) %>%
  fit(close ~ date, training(splits))
## Warning: The following arguments cannot be manually modified and were removed:
## weekly.seasonality.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_prophet_with_dxy <- prophet_reg(mode = "regression", growth = "linear", season = "additive") %>%
  set_engine("prophet", weekly.seasonality = TRUE) %>%
  fit(close ~ date + dxy, training(splits))
## Warning: The following arguments cannot be manually modified and were removed:
## weekly.seasonality.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_prophet_with_MACD <- prophet_reg(mode = "regression", growth = "linear", season = "additive") %>%
  set_engine("prophet", weekly.seasonality = TRUE) %>%
  fit(close ~ date + dxy + macd_hist, training(splits))
## Warning: The following arguments cannot be manually modified and were removed:
## weekly.seasonality.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_prophet_with_SMA14 <- prophet_reg(mode = "regression", growth = "linear", season = "additive") %>%
  set_engine("prophet", weekly.seasonality = TRUE) %>%
  fit(close ~ date + dxy + sma_14, training(splits))
## Warning: The following arguments cannot be manually modified and were removed:
## weekly.seasonality.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_prophet_with_MACD_SMA <- prophet_reg(mode = "regression", growth = "linear", season = "additive") %>%
  set_engine("prophet", weekly.seasonality = TRUE) %>%
  fit(close ~ date + dxy + macd_hist + sma_14, training(splits))
## Warning: The following arguments cannot be manually modified and were removed:
## weekly.seasonality.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_prophet_with_bb <- prophet_reg(mode = "regression", growth = "linear", season = "additive") %>%
  set_engine("prophet", weekly.seasonality = TRUE) %>%
  fit(close ~ date + dxy + macd_hist + sma_14 + bb_pctb, training(splits))
## Warning: The following arguments cannot be manually modified and were removed:
## weekly.seasonality.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_arima <- arima_reg(mode = "regression", seasonal_period = 12) %>%
  set_engine("auto_arima", stepwise = FALSE, approximation = FALSE) %>%
  fit(close ~ date, training(splits))

model_knn <- nearest_neighbor(mode = "regression", neighbors = 5) %>%
  set_engine("kknn") %>%
  fit(close ~ date, training(splits))

# Budowa modelu Prophet z uwzględnieniem holidays

8. Symulacja przyszłych wartości zmiennych

simulate_regressor_prophet <- function(data, future_dates, reg_name, periods = NULL) {
  if (is.null(periods)) { periods <- nrow(future_dates) }
  reg_data <- data %>% select(ds = date, y = !!sym(reg_name)) %>% drop_na()
  model <- prophet(reg_data, daily.seasonality = TRUE)
  future <- make_future_dataframe(model, periods = periods)
  forecast <- predict(model, future)
  forecast_tail <- tail(forecast$yhat, periods)
  return(forecast_tail)
}

future_dates <- tibble(date = seq.Date(from = max(brent_prices$date) + 1, by = "day", length.out = 30))

simulated_dxy <- simulate_regressor_prophet(brent_prices, future_dates, "dxy")
simulated_macd <- simulate_regressor_prophet(brent_prices, future_dates, "macd_hist")
simulated_sma14 <- simulate_regressor_prophet(brent_prices, future_dates, "sma_14")
simulated_bb_pctb <- simulate_regressor_prophet(brent_prices, future_dates, "bb_pctb")

future_data <- future_dates %>%
  mutate(dxy = simulated_dxy, macd_hist = simulated_macd, sma_14 = simulated_sma14,bb_pctb = simulated_bb_pctb)

9. Kalibracja i porównanie modeli

models_table <- modeltime_table(
  model_prophet,
  model_arima,
  model_prophet_with_holidays,
  model_prophet_with_dxy,
  model_prophet_with_MACD,
  model_prophet_with_SMA14,
  model_prophet_with_MACD_SMA,
  model_prophet_with_bb,
  model_knn
)

models_table <- update_model_description(models_table, 3, "Prophet + Holidays")
models_table <- update_model_description(models_table, 4, "Prophet + DXY")
models_table <- update_model_description(models_table, 5, "Prophet + DXY + MACD")
models_table <- update_model_description(models_table, 6, "Prophet + DXY + SMA14")
models_table <- update_model_description(models_table, 7, "P. + DXY + MACD + SMA14")
models_table <- update_model_description(models_table, 8, "P. + DXY + MACD + SMA14 + BB")

calibration_table <- models_table %>%
  modeltime_calibrate(testing(splits))

calibration_table %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy()
## Warning: There was 1 warning in `dplyr::mutate()`.
## ℹ In argument: `.nested.col = purrr::map(...)`.
## Caused by warning:
## ! A correlation computation is required, but `estimate` is constant and has 0
## standard deviation, resulting in a divide by 0 error. `NA` will be returned.
library(kableExtra)
## Warning: pakiet 'kableExtra' został zbudowany w wersji R 4.4.3
## 
## Dołączanie pakietu: 'kableExtra'
## Następujący obiekt został zakryty z 'package:dplyr':
## 
##     group_rows
acc_table <- calibration_table %>%
  modeltime_accuracy()
## Warning: There was 1 warning in `dplyr::mutate()`.
## ℹ In argument: `.nested.col = purrr::map(...)`.
## Caused by warning:
## ! A correlation computation is required, but `estimate` is constant and has 0
## standard deviation, resulting in a divide by 0 error. `NA` will be returned.
# Ładna tabelka z wyśrodkowanym tytułem i widocznymi liniami
acc_table %>%
  knitr::kable(
    digits = 3,
    caption = "<center><strong>Podsumowanie wyników modeli - Miary dokładności</strong></center>",
    format = "html"
  ) %>%
  kable_styling(
    full_width = FALSE,
    position = "center",
    font_size = 14,
    bootstrap_options = c("striped", "hover"),
    fixed_thead = TRUE  # Wyrównanie nagłówka
  ) %>%
  row_spec(0, bold = TRUE, background = "#f2f2f2") %>%
  row_spec(1:nrow(acc_table), extra_css = "border-bottom: 2px solid black;") %>%
  column_spec(1:ncol(acc_table), extra_css = "border-right: 1px solid black; border-left: 1px solid black;") %>%
  column_spec(1, width = "1in")  # Wyrównanie szerokości pierwszej kolumny
Podsumowanie wyników modeli - Miary dokładności
.model_id .model_desc .type mae mape mase smape rmse rsq
1 PROPHET Test 10.261 15.598 6.779 14.255 11.220 0.728
2 ARIMA(0,1,5) Test 6.601 10.068 4.361 9.465 7.398 0.449
3 Prophet + Holidays Test 11.862 17.972 7.837 16.279 12.684 0.722
4 Prophet + DXY Test 9.674 14.713 6.391 13.506 10.606 0.784
5 Prophet + DXY + MACD Test 9.523 14.440 6.291 13.315 10.239 0.023
6 Prophet + DXY + SMA14 Test 4.091 6.247 2.703 5.937 5.324 0.051
7 P. + DXY + MACD + SMA14 Test 1.705 2.580 1.127 2.526 2.377 0.713
8 P. + DXY + MACD + SMA14 + BB Test 1.528 2.278 1.009 2.267 1.689 0.850
9 KKNN Test 6.331 9.647 4.182 9.100 7.054 NA

Wyjaśnienie metryk

Oceny wydajności predykcyjnej zastosowanych modeli dokonano w oparciu o następujące metryki:

  • Średni Błąd Bezwzględny (MAE): Mierzy średnią wartość bezwzględnych różnic między rzeczywistymi a prognozowanymi wartościami.

    Wzór: \[ MAE = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t| \]

  • Średni Procentowy Błąd Bezwzględny (MAPE): Oblicza średnią procentową różnicę między rzeczywistymi a prognozowanymi wartościami.

    Wzór: \[ MAPE = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100 \]

  • Skalowany Średni Błąd Bezwzględny (MASE): Skaluje błąd w stosunku do błędu naiwnego modelu.

    Wzór: \[ MASE = \frac{MAE}{\frac{1}{n-1} \sum_{t=2}^{n} |y_t - y_{t-1}|} \]

  • Symetryczny Średni Procentowy Błąd Bezwzględny (SMAPE): Jest to wersja MAPE, która uwzględnia symetrię w obliczaniu błędów.

    Wzór: \[ SMAPE = \frac{1}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{\frac{|y_t| + |\hat{y}_t|}{2}} \times 100 \]

  • Pierwiastek Średniego Błędu Kwadratowego (RMSE): Mierzy rozrzut błędów prognozy, biorąc pod uwagę kwadraty różnic.

    Wzór: \[ RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2} \]

  • Współczynnik Determinacji (R²): Mierzy, jak dobrze model wyjaśnia zmienność danych.

    Wzór: \[ R^2 = 1 - \frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{n} (y_t - \bar{y})^2} \]

10. Prognoza

Testowy forecast

calibration_table %>%
  modeltime_forecast(new_data = testing(splits), actual_data = brent_prices) %>%
  plot_modeltime_forecast(.interactive = TRUE, .legend_max_width = 25) 

Finalny forecast

calibration_table %>%
  modeltime_refit(brent_prices) %>%
  modeltime_forecast(new_data = future_data, actual_data = brent_prices) %>%
  plot_modeltime_forecast(.interactive = TRUE, .plotly_slider = TRUE)
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

11. Podsumowanie


12. Dalsze kroki