TimeSeries

Dr. Roberto Chang López
2021-09-07


JuveYell

Source: Aptech

Click here to view the MIT Machine Learning certificate: https://www.credential.net/4dd365ea-ea5a-46a2-a72e-539e70545c6e

Click here to view the Columbia Python for Managers certificate: https://certificates.emeritus.org/0a2e1de7-add2-4710-ad49-417d1dadfb61#gs.4a92hv

Contact:

Some of the dashboards built are:

For the stock exchange: https://rchang.shinyapps.io/rchang-stock-exchange/

For weather conditions: https://rchang.shinyapps.io/rchang-app_clima_ho/

For machine learning: https://rchang.shinyapps.io/rchang-app/

For business and industrial use: https://rchang.shinyapps.io/rchang-app_final_emp/

For dashboards with log in: https://rchang.shinyapps.io/clase_3-shiny-2/_w_ae4e775f/_w_f249a9a1/?page=sign_in

and for Geographic Information Systems

Modeltime H2O provides an H2O backend to the Modeltime Forecasting Ecosystem. The main algorithm is H2O AutoML, an automatic machine learning library that is built for speed and scale.

Getting Started with Modeltime H2O

Forecasting with modeltime.h2o made easy! This short tutorial shows how you can use:

H2O AutoML for forecasting, implemented via automl_reg(). This function trains and cross-validates multiple machine learning and deep learning models (XGBoost GBM, GLMs, Random Forest, GBMs…) and then trains two Stacked Ensemble models, one built from all the models and one built from only the best model of each kind. Finally, the best model is selected based on a stopping metric. And we take care of all this for you!

Save & Load Models functionality to ensure the persistence of your models.
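
To make the stacking idea in the first item concrete, here is a minimal base R sketch. It is not H2O's implementation: two simple base learners produce out-of-fold predictions under 5-fold cross-validation, and a GLM metalearner is then fit on those predictions. The simulated data, the choice of base learners, and all variable names are assumptions made for illustration.

```r
set.seed(786)

# Simulated regression data (illustrative only)
n   <- 200
x   <- runif(n, 0, 10)
y   <- 3 + 2 * x + 4 * sin(x) + rnorm(n)
dat <- data.frame(x = x, y = y)

# Assign each row to one of 5 cross-validation folds
k       <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))

# Collect out-of-fold predictions from two base learners
oof <- data.frame(linear = numeric(n), cubic = numeric(n))
for (f in seq_len(k)) {
  train_rows <- dat[fold_id != f, ]
  hold_rows  <- dat[fold_id == f, ]

  fit_linear <- lm(y ~ x, data = train_rows)
  fit_cubic  <- lm(y ~ poly(x, 3), data = train_rows)

  oof$linear[fold_id == f] <- predict(fit_linear, newdata = hold_rows)
  oof$cubic[fold_id == f]  <- predict(fit_cubic,  newdata = hold_rows)
}

# Metalearner: a Gaussian GLM that weights the base learners' predictions
meta <- glm(y ~ linear + cubic, data = cbind(oof, y = dat$y), family = gaussian())
coef(meta)
```

The metalearner coefficients indicate how much weight each base learner receives in the combined prediction.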

Collect data and split into training and test sets

Next, we load the walmart_sales_weekly dataset that comes with timetk. It contains 7 time series, which we visualize using the timetk::plot_time_series() function.
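
The code in this tutorial does not show which packages it attaches. As a hedged sketch, the block below assumes the four packages used in the upstream modeltime.h2o getting-started guide (tidymodels, modeltime.h2o, tidyverse, timetk) and attaches them only if all of them are installed. The variable names are illustrative.

```r
# Packages this tutorial's code appears to rely on (an assumption; adjust as needed)
required_pkgs <- c("tidymodels", "modeltime.h2o", "tidyverse", "timetk")

# Check which packages are installed without attaching them
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach every package so functions such as automl_reg() are on the search path
  invisible(lapply(required_pkgs, library, character.only = TRUE))
}
```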

data_tbl <- walmart_sales_weekly %>%
    dplyr::select(id, Date, Weekly_Sales)

data_tbl %>% 
  group_by(id) %>% 
  plot_time_series(
      .date_var    = Date,
      .value       = Weekly_Sales,
      .facet_ncol  = 2,
      .smooth      = FALSE,
      .interactive = FALSE
  )

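Then, we separate the data with the time_series_split() function, holding out the last 3 months as the test set and using everything before it as the training set. We also build a recipe that adds calendar features derived from Date and apply it to both sets.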

splits <- time_series_split(data_tbl, assess = "3 month", cumulative = TRUE)

recipe_spec <- recipe(Weekly_Sales ~ ., data = training(splits)) %>%
    step_timeseries_signature(Date) 

train_tbl <- training(splits) %>% bake(prep(recipe_spec), .)
test_tbl  <- testing(splits) %>% bake(prep(recipe_spec), .)
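
As a rough illustration of what a cumulative time-based split with a 3-month assessment window does, the base R sketch below holds out the most recent three months of a toy weekly series and trains on everything earlier. The toy data and the exact cutoff rule are assumptions; time_series_split() may place the boundary slightly differently.

```r
# Toy weekly series (illustrative data, not the Walmart dataset)
toy_tbl <- data.frame(
  Date         = seq(as.Date("2010-02-05"), by = "week", length.out = 143),
  Weekly_Sales = seq_len(143)
)

# Cutoff: three calendar months before the last observed date
last_date <- max(toy_tbl$Date)
cutoff    <- seq(last_date, by = "-3 months", length.out = 2)[2]

# Training uses all history up to the cutoff; testing is the final 3 months
toy_train <- toy_tbl[toy_tbl$Date <= cutoff, ]
toy_test  <- toy_tbl[toy_tbl$Date >  cutoff, ]

range(toy_train$Date)
range(toy_test$Date)
```

Because the split is by date rather than at random, no observation in the test set precedes any observation in the training set.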

Model specification, training and prediction

In order to correctly use modeltime.h2o, it is necessary to connect to an H2O cluster through the h2o.init() function. You can find more information on how to set up the cluster by typing ?h2o.init or by visiting the official site.

h2o.init(
    nthreads = -1,
    ip       = 'localhost',
    port     = 54321
)
 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         17 minutes 31 seconds 
    H2O cluster timezone:       America/Regina 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.32.1.3 
    H2O cluster version age:    3 months and 18 days !!! 
    H2O cluster name:           H2O_started_from_R_ladylee_lvm109 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.98 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  4 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         Amazon S3, Algos, AutoML, Core V3, TargetEncoder, Core V4 
    R Version:                  R version 4.1.1 (2021-08-10) 

# Optional - Turn off progress indicators during training runs
h2o.no_progress()

Now comes the fun part! We define our model specification with the automl_reg() function and pass the arguments through the engine:

model_spec <- automl_reg(mode = 'regression') %>%
    set_engine(
         engine                     = 'h2o',
         max_runtime_secs           = 5, 
         max_runtime_secs_per_model = 3,
         max_models                 = 3,
         nfolds                     = 5,
         exclude_algos              = c("DeepLearning"),
         verbosity                  = NULL,
         seed                       = 786
    ) 

model_spec
H2O AutoML Model Specification (regression)

Engine-Specific Arguments:
  max_runtime_secs = 5
  max_runtime_secs_per_model = 3
  max_models = 3
  nfolds = 5
  exclude_algos = c("DeepLearning")
  verbosity = NULL
  seed = 786

Computational engine: h2o 

Next, let’s train the model!

model_fitted <- model_spec %>%
    fit(Weekly_Sales ~ ., data = train_tbl)
                                             model_id
1 StackedEnsemble_BestOfFamily_AutoML_20210907_174932
2    StackedEnsemble_AllModels_AutoML_20210907_174932
3                        DRF_1_AutoML_20210907_174932
4                        GBM_1_AutoML_20210907_174932
5                        GLM_1_AutoML_20210907_174932
  mean_residual_deviance     rmse        mse       mae     rmsle
1              127322018 11283.71  127322018  7448.604 0.3167796
2              127426423 11288.33  127426423  7420.237 0.3483023
3              151356413 12302.70  151356413  7753.125 0.3010581
4              720727604 26846.37  720727604 21732.328 0.6672956
5             1314471116 36255.64 1314471116 31032.413 0.8352711

[5 rows x 6 columns] 
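
The leaderboard columns are standard regression error metrics. The base R sketch below defines three of them on made-up vectors (the helper names and data are illustrative, not part of H2O) and checks that the reported rmse is the square root of the reported mse, using the top row of the leaderboard above.

```r
# Illustrative helper functions for common regression metrics
mse  <- function(actual, pred) mean((actual - pred)^2)
rmse <- function(actual, pred) sqrt(mse(actual, pred))
mae  <- function(actual, pred) mean(abs(actual - pred))

# Made-up observations and predictions
actual <- c(100, 200, 300, 400)
pred   <- c(110, 190, 330, 380)

mse(actual, pred)   # mean of the squared errors 100, 100, 900, 400
rmse(actual, pred)
mae(actual, pred)   # mean of the absolute errors 10, 10, 30, 20

# Leaderboard row 1: the reported rmse matches sqrt(mse) up to rounding
leaderboard_mse  <- 127322018
leaderboard_rmse <- 11283.71
abs(sqrt(leaderboard_mse) - leaderboard_rmse) < 0.01
```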

We can check out the trained H2O AutoML model.

model_fitted
parsnip model object

Fit time:  14.2s 

H2O AutoML - Stackedensemble
--------
Model: Model Details:
==============

H2ORegressionModel: stackedensemble
Model ID:  StackedEnsemble_BestOfFamily_AutoML_20210907_174932 
Number of Base Models: 3

Base Models (count by algorithm type):

drf gbm glm 
  1   1   1 

Metalearner:

Metalearner algorithm: glm
Metalearner cross-validation fold assignment:
  Fold assignment scheme: AUTO
  Number of folds: 5
  Fold column: NULL
Metalearner hyperparameters: 


H2ORegressionMetrics: stackedensemble
** Reported on training data. **

MSE:  56164199
RMSE:  7494.278
MAE:  4793.973
RMSLE:  0.2215232
Mean Residual Deviance :  56164199



H2ORegressionMetrics: stackedensemble
** Reported on cross-validation data. **
** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **

MSE:  127322018
RMSE:  11283.71
MAE:  7448.604
RMSLE:  0.3167796
Mean Residual Deviance :  127322018

Finally, we predict on the test dataset:

predict(model_fitted, test_tbl)
# A tibble: 84 x 1
     .pred
     <dbl>
 1  17622.
 2  34585.
 3  33541.
 4  35729.
 5  75500.
 6  79359.
 7 141289.
 8  17949.
 9  27627.
10  34538.
# ... with 74 more rows

Modeltime Workflow

Once we have our fitted model, we can follow the Modeltime Workflow:

Add fitted models to a Model Table.

Calibrate the models to a testing set.

Perform Testing Set Forecast Assessment & Accuracy Evaluation.

Refit the models to the Full Dataset & Forecast Forward.

Add fitted models to a Model Table

First, we create the model table:

modeltime_tbl <- modeltime_table(
    model_fitted
) 

modeltime_tbl
# Modeltime Table
# A tibble: 1 x 3
  .model_id .model   .model_desc                 
      <int> <list>   <chr>                       
1         1 <fit[+]> H2O AUTOML - STACKEDENSEMBLE

Calibrate & Testing Set Forecast & Accuracy Evaluation

Next, we calibrate to the testing set and visualize the forecasts:

modeltime_tbl %>%
    modeltime_calibrate(test_tbl) %>%
    modeltime_forecast(
        new_data    = test_tbl,
        actual_data = data_tbl,
        keep_data   = TRUE
    ) %>%
    group_by(id) %>%
    plot_modeltime_forecast(
        .facet_ncol = 2, 
        .interactive = FALSE
    )

Refit to Full Dataset & Forecast Forward

Before refitting, let’s prepare our data. We create data_prepared_tbl, which represents the complete dataset (the union of train and test) with the variables created by the recipe named recipe_spec. We then create future_prepared_tbl, which extends each series one year into the future and contains the same required variables.

data_prepared_tbl <- bind_rows(train_tbl, test_tbl)

future_tbl <- data_prepared_tbl %>%
    group_by(id) %>%
    future_frame(.length_out = "1 year") %>%
    ungroup()

future_prepared_tbl <- bake(prep(recipe_spec), future_tbl)
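
Conceptually, future_frame() generates new timestamps that continue after the last observed date of each series. A minimal base R sketch of that idea for a single weekly series follows. The toy dates, the id value, and the variable names are assumptions for illustration; the real function infers the frequency and handles the groups for you.

```r
# Toy observed weekly dates for one series (illustrative)
observed_dates <- seq(as.Date("2012-08-03"), by = "week", length.out = 13)
last_date      <- max(observed_dates)

# One year of weekly timestamps, starting one week after the last observation
future_dates <- seq(last_date + 7, by = "week", length.out = 52)

# A future table holds the series id and the new dates, but no target values yet
future_tbl_sketch <- data.frame(id = "1_1", Date = future_dates)
head(future_tbl_sketch, 3)
```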

Finally, we use modeltime_refit() to re-train our model on the entire dataset. Refitting on all available data before forecasting forward is a best practice for improving forecast results.

refit_tbl <- modeltime_tbl %>%
    modeltime_refit(data_prepared_tbl)
                                             model_id
1 StackedEnsemble_BestOfFamily_AutoML_20210907_174949
2    StackedEnsemble_AllModels_AutoML_20210907_174949
3                        DRF_1_AutoML_20210907_174949
4                        GBM_1_AutoML_20210907_174949
5                        GLM_1_AutoML_20210907_174949
  mean_residual_deviance      rmse        mse       mae     rmsle
1               73463632  8571.093   73463632  5993.595       NaN
2               73779236  8589.484   73779236  6012.089       NaN
3              121155103 11007.048  121155103  6730.014 0.2714179
4              373989304 19338.803  373989304 15816.111 0.5350841
5             1314752626 36259.518 1314752626 31017.107 0.8289850

[5 rows x 6 columns] 
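
The NaN values in the rmsle column for the two stacked ensembles are worth a note. RMSLE takes the logarithm of one plus each prediction, so it is undefined when a prediction falls below -1. A plausible explanation, offered as an assumption rather than something the output confirms, is that the ensembles produced some negative predictions. The base R sketch below reproduces the effect on made-up numbers; the helper name and data are illustrative.

```r
rmsle <- function(actual, pred) {
  # log1p(x) computes log(1 + x); it returns NaN for x < -1
  sqrt(mean((log1p(actual) - log1p(pred))^2))
}

# Made-up observations
actual <- c(1500, 2300, 800)

# All predictions positive: the metric is a finite number
rmsle(actual, c(1400, 2500, 900))

# One prediction below -1: the metric becomes NaN
suppressWarnings(rmsle(actual, c(1400, 2500, -50)))
```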

Let’s visualize the final forecast

We can quickly visualize the final forecast with modeltime_forecast() and its plotting utility function, plot_modeltime_forecast().

refit_tbl %>%
    modeltime_forecast(
        new_data    = future_prepared_tbl,
        actual_data = data_prepared_tbl,
        keep_data   = TRUE
    ) %>%
    group_by(id) %>%
    plot_modeltime_forecast(
        .facet_ncol  = 2,
        .interactive = FALSE
    )

We could likely do better by training longer, but this is a really good result for a quick example!

Saving and Loading Models

H2O models need to be “serialized” (a fancy word for saved to a directory that contains the recipe for recreating the models). To save the models, use save_h2o_model().

Provide a directory where you want to save the model. This saves the model file in the directory.

model_fitted %>% 
    save_h2o_model(path = "../model_fitted", overwrite = TRUE)

You can reload the model into R using load_h2o_model().

model_h2o <- load_h2o_model(path = "../model_fitted/")