Download

Licença

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

License: CC BY-SA 4.0

License: CC BY-SA 4.0

Citação

Sugestão de citação: FIGUEIREDO, Adriano Marcos Rodrigues. Séries Temporais com R: Modelo híbrido em opera. Campo Grande-MS,Brasil: RStudio/Rpubs, 2019. Disponível em http://rpubs.com/amrofi/Time_Series_Hybrid_opera.

1 Apresentação

Este artigo procura ilustrar os procedimentos para trabalhar com métodos de séries temporais para Modelo híbrido em opera. Baseado em postagem de Rob J Hyndman (HYNDMAN, 2016).

A combinação de forecasts, ou também chamado de modelo híbrido de previsão, tem por ideia básica a combinação de forecasts de diferentes métodos. Por exemplo, na postagem de Peter Ellis (ELLIS, 2015), mostra-se que os modelos com combinação de forecasts podem ter melhores resultados que os melhores modelos da competição M3 de forecasts (competição de Spyros Makridakis e Michele Hibon, em que testaram vários métodos de previsão para 3.003 séries temporais) (Figura 1).

Comparação de métodos de previsão da competição M3. Fonte: ELLIS, 2015

Comparação de métodos de previsão da competição M3. Fonte: ELLIS, 2015

O nome do pacote opera significa “Online Prediction by ExpeRt Aggregation” (Previsão Online por Agregação por ExpeRt). Foi escrito por Pierre Gaillard e Yannig Goude e será aqui considerado para uma previsão simples de série temporal univariada mensal de dados de consumo no varejo de São Paulo de Morettin e Toloi (2005).

2 Dados

library(forecast)
library(ggplot2)
library(readxl)
library(seasonal)
dados <- structure(list(obs = structure(c(441763200, 444441600, 446947200, 449625600, 
    452217600, 454896000, 457488000, 460166400, 462844800, 465436800, 468115200, 
    470707200, 473385600, 476064000, 478483200, 481161600, 483753600, 486432000, 
    489024000, 491702400, 494380800, 496972800, 499651200, 502243200, 504921600, 
    507600000, 510019200, 512697600, 515289600, 517968000, 520560000, 523238400, 
    525916800, 528508800, 531187200, 533779200, 536457600, 539136000, 541555200, 
    544233600, 546825600, 549504000, 552096000, 554774400, 557452800, 560044800, 
    562723200, 565315200, 567993600, 570672000, 573177600, 575856000, 578448000, 
    581126400, 583718400, 586396800, 589075200, 591667200, 594345600, 596937600, 
    599616000, 602294400, 604713600, 607392000, 609984000, 612662400, 615254400, 
    617932800, 620611200, 623203200, 625881600, 628473600, 631152000, 633830400, 
    636249600, 638928000, 641520000, 644198400, 646790400, 649468800, 652147200, 
    654739200, 657417600, 660009600, 662688000, 665366400, 667785600, 670464000, 
    673056000, 675734400, 678326400, 681004800, 683683200, 686275200, 688953600, 
    691545600, 694224000, 696902400, 699408000, 702086400, 704678400, 707356800, 
    709948800, 712627200, 715305600, 717897600, 720576000, 723168000, 725846400, 
    728524800, 730944000, 733622400, 736214400, 738892800, 741484800, 744163200, 
    746841600, 749433600, 752112000, 754704000, 757382400, 760060800, 762480000, 
    765158400, 767750400, 770428800, 773020800, 775699200, 778377600, 780969600, 
    783648000, 786240000, 788918400, 791596800, 794016000, 796694400, 799286400, 
    801964800, 804556800, 807235200, 809913600, 812505600, 815184000, 817776000, 
    820454400, 823132800, 825638400, 828316800, 830908800, 833587200, 836179200, 
    838857600, 841536000, 844128000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    t = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
        20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 
        37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 
        54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 
        71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 
        88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 
        104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 
        118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 
        132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 
        146, 147, 148, 149, 150, 151, 152, 153, 154), consumo = c(114.13, 110.79, 
        116.46, 111.57, 120.66, 121.15, 121.27, 127.02, 129.04, 133.3, 130.6, 
        179.39, 120.64, 114.05, 130.6, 118.26, 145.54, 135.13, 153.35, 159.95, 
        150.01, 164.93, 170.37, 220.96, 134.26, 133.11, 147.84, 164.46, 181.86, 
        170.44, 186.64, 174.21, 181.62, 194.16, 181.9, 232.01, 140.16, 130.78, 
        119.04, 120.73, 129.81, 111.04, 122.75, 133.95, 125.41, 132.05, 129.54, 
        176.37, 110.09, 113.25, 124.03, 110.63, 116.72, 124.63, 124.38, 130.27, 
        119.87, 115.75, 122.44, 162.43, 105.89, 115.59, 147, 131.7, 131.32, 
        136.66, 126.43, 134.88, 128.26, 125.32, 124.61, 166.11, 116.25, 96.93, 
        89.27, 101.87, 125.57, 113.31, 109.39, 127.33, 120.56, 117.73, 113.81, 
        147.25, 100.15, 95.11, 112.26, 109.39, 114.2, 113.8, 126.47, 128.36, 
        115.71, 116.09, 99.53, 127.27, 87.08, 85.67, 82.02, 98.2, 96.44, 90.23, 
        97.15, 95.08, 94, 93, 96.09, 129.21, 75.39, 77.7, 97.34, 84.97, 87.55, 
        86.64, 90.52, 95.4, 95.2, 95.8, 101.23, 128.49, 85.63, 82.77, 96.55, 
        81.33, 96.91, 83.76, 90.19, 114.84, 108.4, 106.05, 109.71, 143.86, 99.12, 
        99.28, 114.75, 106.13, 110.02, 108.07, 112.52, 113.87, 107.84, 112.12, 
        112.03, 139.37, 92.24, 93.56, 107.37, 102.89, 114.78, 102.88, 118.41, 
        119.23, 117.36, 122.06)), row.names = c(NA, -154L), class = c("tbl_df", 
    "tbl", "data.frame"))
attach(dados)
dados2 <- matrix(consumo)
dados.ts <- ts(dados2, start = c(1984, 1), frequency = 12)

train <- window(dados.ts, end = c(1994, 3))
test <- window(dados.ts, start = c(1994, 4))
h <- length(test)
# construir forecasts dos experts
ETS <- forecast(ets(train), h = h)
ARIMA <- forecast(auto.arima(train, stepwise = FALSE, approximation = FALSE), 
    h = h)
STL <- stlf(train, lambda = 0, h = h)


# ajuste calendário brasileiro
library(RCurl)
usingR_url_wd <- getURL("https://raw.githubusercontent.com/pedrocostaferreira/Articles/master/using-R-to-teach-seasonal-adjustment/work_days.csv")
wd <- read.csv2(text = usingR_url_wd)
head(wd)
Workdays_ok <- ts(wd[, 5], start = c(1970, 1), freq = 12)

# construção feriados
usingR_url_mh <- getURL("https://raw.githubusercontent.com/pedrocostaferreira/Articles/master/using-R-to-teach-seasonal-adjustment/moving_holidays.csv")
mh <- read.csv2(text = usingR_url_mh)
head(mh)  # datas de Pascoa e Carnaval
# Pascoa (DOMINGO)
dates.easter <- as.Date(as.character(mh$Easter), "%d/%m/%Y")
easter <- genhol(dates.easter, start = -8, end = 1, frequency = 12)

# Carnaval (TERCA-FEIRA DE CARNAVAL)
dates.carnival <- as.Date(as.character(mh$Carnival), "%d/%m/%Y")
carnival <- genhol(dates.carnival, start = -3, end = 1, frequency = 12, center = "calendar")
# carnival <- genhol(dates.carnival, start = -3, end = 1, frequency = 12)
regs <- na.omit(cbind(Workdays_ok, easter, carnival))
x13 <- series(seas(train, xreg = regs, regression.aictest = NULL, regression.usertype = "holiday", 
    forecast.save = "forecasts"), c("forecast.forecasts"))

# agregar experts
X <- cbind(ETS = ETS$mean, ARIMA = ARIMA$mean, STL = STL$mean, x13 = x13[1:31, 
    1])
df <- cbind(dados.ts, X)
colnames(df) <- c("Dados", "ETS", "ARIMA", "STL", "x13")
autoplot(df) + xlab("Ano") + ylab(expression("Consumo no varejo de São Paulo"))

Vou olhar o oráculo das previsões.

require(opera)
oracle.convex <- oracle(Y = test, experts = X, loss.type = "square", model = "convex")
plot(oracle.convex)

oracle.convex
Call:
oracle.default(Y = test, experts = X, model = "convex", loss.type = "square")

Coefficients:
 ETS ARIMA STL       x13
   0     1   0 -1.68e-16

                       rmse   mape
Best expert oracle:    8.71 0.0716
Uniform combination:  11.20 0.0962
Best convex oracle:    8.71 0.0716

Existe algum expert individualmente melhor ao longo do tempo? Existem quebras?

require(opera)
oracle.shift <- oracle(Y = test, experts = X, loss.type = "percentage", model = "shifting")
plot(oracle.shift)

oracle.shift
Call:
oracle.default(Y = test, experts = X, model = "shifting", loss.type = "percentage")

      0 shifts 7 shifts 15 shifts 23 shifts 30 shifts
mape:   0.0716   0.0656    0.0671     0.069    0.0734

3 opera

A função de mistura do pacote opera calcula pesos ao combinar as previsões com base no desempenho até esse período de tempo.

library(opera)
# iniciar regra de agregação ou mistura
MLpol0 <- mixture(model = "MLpol", loss.type = "square")
m1.MLpol <- MLpol0
for (i in 1:length(test)) {
    m1.MLpol <- predict(m1.MLpol, newexperts = X[i, ], newY = test[i])
}
# fazer agregação diretamente
m3.MLpol <- mixture(Y = test, experts = X, model = "MLpol", loss.type = "square")

weights <- predict(m1.MLpol, X, test, type = "weights")
(weights)
      ETS ARIMA STL x13
 [1,]   0     1   0   0
 [2,]   0     1   0   0
 [3,]   0     1   0   0
 [4,]   0     1   0   0
 [5,]   0     1   0   0
 [6,]   0     1   0   0
 [7,]   0     1   0   0
 [8,]   0     1   0   0
 [9,]   0     1   0   0
[10,]   0     1   0   0
[11,]   0     1   0   0
[12,]   0     1   0   0
[13,]   0     1   0   0
[14,]   0     1   0   0
[15,]   0     1   0   0
[16,]   0     1   0   0
[17,]   0     1   0   0
[18,]   0     1   0   0
[19,]   0     1   0   0
[20,]   0     1   0   0
[21,]   0     1   0   0
[22,]   0     1   0   0
[23,]   0     1   0   0
[24,]   0     1   0   0
[25,]   0     1   0   0
 [ reached getOption("max.print") -- omitted 6 rows ]

Ele começa com a ponderação já igual a 1 para o ARIMA e segue até o fim, sem misturar.

Aqui estão as previsões resultantes:

z <- ts(predict(m3.MLpol, X, test, type = "response"), start = c(1994, 4), freq = 12)
df <- cbind(dados.ts, z)
colnames(df) <- c("Dados", "Mistura")
autoplot(df) + xlab("Ano") + ylab(expression("Consumo no varejo de São Paulo"))

Referências

ELLIS, Peter. X13-SEATS-ARIMA as an automated forecasting tool. 2015. Disponível em: https://www.r-bloggers.com/x13-seats-arima-as-an-automated-forecasting-tool/ e http://freerangestats.info/blog/2015/12/21/m3-and-x13.

GAILLARD, Pierre; GOUDE, Yannig. opera: Online Prediction by Expert Aggregation. R package version 1.0. 2016. Disponível em: https://CRAN.R-project.org/package=opera.

HYNDMAN, Rob J. R packages for forecast combinations. 2016. Disponível em:https://robjhyndman.com/hyndsight/forecast-combinations/

SHAUB, David; ELLIS, Peter. forecastHybrid: Convenient Functions for Ensemble Time Series Forecasts. R package version 3.0.14. 2018. Disponível em: https://CRAN.R-project.org/package=forecastHybrid.

SHAUB, David; ELLIS, Peter.forecastHybrid: vignette. 2016. Disponível em: https://cran.r-project.org/web/packages/forecastHybrid/vignettes/forecastHybrid.html.