This data tell us about sales of Monthly gasoline demand Ontario gallon millions 1960 – 1975, and will compare some model also evaluation model.
library(pacman)
p_load("tidyverse", "lubridate", "forecast", "TTR", "MLmetrics", "tseries", "fpp", "TSstudio", "xfun")
Para esto usamos la base de datos llamada “Time series data library”
# Para instalar tsdl
# install.packages("devtools")
# devtools::install_github("FinYang/tsdl")
library(tsdl)
tsdl
## Time Series Data Library: 648 time series
##
## Frequency
## Subject 0.1 0.25 1 4 5 6 12 13 52 365 Total
## Agriculture 0 0 37 0 0 0 3 0 0 0 40
## Chemistry 0 0 8 0 0 0 0 0 0 0 8
## Computing 0 0 6 0 0 0 0 0 0 0 6
## Crime 0 0 1 0 0 0 2 1 0 0 4
## Demography 1 0 9 2 0 0 3 0 0 2 17
## Ecology 0 0 23 0 0 0 0 0 0 0 23
## Finance 0 0 23 5 0 0 20 0 2 1 51
## Health 0 0 8 0 0 0 6 0 1 0 15
## Hydrology 0 0 42 0 0 0 78 1 0 6 127
## Industry 0 0 9 0 0 0 2 0 1 0 12
## Labour market 0 0 3 4 0 0 17 0 0 0 24
## Macroeconomic 0 0 18 33 0 0 5 0 0 0 56
## Meteorology 0 0 18 0 0 0 17 0 0 12 47
## Microeconomic 0 0 27 1 0 0 7 0 1 0 36
## Miscellaneous 0 0 4 0 1 1 3 0 1 0 10
## Physics 0 0 12 0 0 0 4 0 0 0 16
## Production 0 0 4 14 0 0 28 1 1 0 48
## Sales 0 0 10 3 0 0 24 0 9 0 46
## Sport 0 1 1 0 0 0 0 0 0 0 2
## Transport and tourism 0 0 1 1 0 0 12 0 0 0 14
## Tree-rings 0 0 34 0 0 0 1 0 0 0 35
## Utilities 0 0 2 1 0 0 8 0 0 0 11
## Total 1 1 300 64 1 1 240 3 16 21 648
sales <- tsdl[[4]]
autoplot(sales)
anyNA(sales)
## [1] FALSE
Esto signigica que no tengo ningún valor faltante
sales_dc <- sales %>%
decompose(type="multiplicative") %>%
autoplot()
sales_dc
1. Using Exponential Smoothing Model
test_sales <- tail(sales, 42) #20% para pruebas
train_sales <- head(sales, length(sales)-42) #80% para entrenamiento
Based on decompose result, sales data has a trend and seasonality. for the next step, will using Triple Exponential Smooting/Holt Winters Exponential Smoothing
sales_ses <- HoltWinters(train_sales, seasonal = "multiplicative",)
sales_ses
## Holt-Winters exponential smoothing with trend and multiplicative seasonal component.
##
## Call:
## HoltWinters(x = train_sales, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha: 0.01480706
## beta : 0.2034893
## gamma: 0.3541218
##
## Coefficients:
## [,1]
## a 1.916298e+05
## b 8.376888e+02
## s1 1.185041e+00
## s2 1.151593e+00
## s3 1.061319e+00
## s4 1.056244e+00
## s5 9.994617e-01
## s6 1.019175e+00
## s7 8.875402e-01
## s8 8.736523e-01
## s9 9.472530e-01
## s10 9.455824e-01
## s11 1.069419e+00
## s12 1.095952e+00
Forecasting
sales_forecast <- forecast(object = sales_ses, h=36)
Visualizando resultados
sales %>%
autoplot(series = "actual") +
autolayer(sales_forecast$fitted, series = "train") +
autolayer(sales_forecast$mean, series = "test") +
theme_minimal()
## Warning: Removed 12 row(s) containing missing values (geom_path).
eval_ses <- accuracy(sales_forecast, test_sales)
eval_ses
## ME RMSE MAE MPE MAPE MASE
## Training set 892.4036 5559.481 4202.427 0.6126137 2.791996 0.5465397
## Test set 3778.5320 10318.195 8266.382 1.6463648 3.797144 1.0750707
## ACF1 Theil's U
## Training set -0.2021575 NA
## Test set -0.1576150 0.5358628
De acuerdo con la prueba de precision el modelo tiene un error medio es de: 3.7%
Los modelos SARIMA captan el comportamiento puramente estacional de una serie, en forma similar, como hemos visto, se realiza para la componente regular o no estacional. Una serie con influencia solamente por la componente estacional puede ser descrito por un modelo SARIMA (P,D,Q)
Empezaremos haciendo una prueba al conjunto de datos
adf.test(sales)
## Warning in adf.test(sales): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: sales
## Dickey-Fuller = -10.096, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
Deacuerdo con el valor de P, los datos no son estacionarios
Analisis de ajuste
sales_diff <- sales %>%
diff(lag=12) %>%
diff(lag=1)
sales_diff %>%
autoplot()
Analizando la serie de tiempo por medio de un modelo ARIMA
sales_auto <- auto.arima(y = train_sales, seasonal = T)
summary(sales_auto)
## Series: train_sales
## ARIMA(3,1,1)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 ar3 ma1 sma1
## -0.3517 -0.1234 0.4087 -0.8320 -0.7736
## s.e. 0.1243 0.1352 0.1160 0.0878 0.0957
##
## sigma^2 estimated as 19522851: log likelihood=-1348.41
## AIC=2708.81 AICc=2709.46 BIC=2726.33
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 682.6604 4144.889 3074.124 0.342796 2.014285 0.3998001 -0.02807681
Visualizacion grafica de los datos reales versus el modelo SARIMA
train_sales %>%
autoplot(series= "actual") +
autolayer (sales_auto$fitted, series = "SARIMA auto") +
theme_minimal()
Analisis de los cambios diferenciales en la serie de tiempo
tsdisplay(sales_diff)