Introduction

Automatic forecasts of large numbers of time series are often needed in business and other contexts.

Why are businesses demanding automated time series forecasting?

Retail or B2B:

Weekly Sales Forecasts by Store. Many companies could have hundreds or thousands of stores. One of our previous employers operated 1,500 pet stores in the United States, Canada, and Puerto Rico.
Daily or Monthly Units Sold by SKU (item). Many retail and B2B companies could have tens of thousands to hundreds of thousands of SKUs. At a previous employer of one of the Remix Institute co-founders, they had over 650,000 SKUs. Imagine having to do that many item demand forecasts.

eCommerce:

Daily or Weekly Visits by Channel, Source, and/or Medium using Google Analytics data. This could have thousands of combinations depending on how complex your digital marketing operation is.
Daily Customers, New Customers, Revenue, and Units Sold by Channel. This could also have hundreds of combinations to forecast for.

What are the challenges of making automated forecasts?

Lack of Automation.

Many current forecasting processes at companies require one or more people to update an ugly, complicated Excel spreadsheet with multiple tabs and formulas. This process is error-prone.

Scalability.

Often, forecasting at companies relies on someone’s own non-statistical methodology, and that someone usually leaves no documentation for how to update it, reverse engineer it, or integrate it into current business processes.

Computation and Turnaround Time.

Let’s face it. Producing thousands or hundreds of thousands of forecasts takes a long time, especially if they’re manual. At past companies, we’ve seen this process take several hours and sometimes days. Meanwhile, managers, VPs, and business stakeholders need it done yesterday and treat a missed, often arbitrary, deadline as a big deal.

Lack of Resources and Personnel.

Several people could be involved in creating forecasts for thousands of stores or SKUs, and it becomes an even bigger challenge if those people need to be quantitative experts.

Bias and Lack of Accuracy.

Oftentimes, there’s too much manual intervention: people put “guard rails” on the forecasts with no documentation of why they were put in place. These undocumented adjustments can introduce bias and inflate the forecast “error”, which is the difference between the actual and the predicted value.
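Forecast error, the difference between actual and predicted values, can be computed directly once actuals arrive. The sketch below uses made-up weekly sales for a single hypothetical store; the numbers and variable names are illustrative assumptions, not data from this article. The mean error indicates bias (systematic over- or under-forecasting), while the mean absolute error summarizes accuracy.

```r
# Hypothetical actuals and forecasts for one store over five weeks.
actual    <- c(120, 135, 128, 150, 142)
predicted <- c(118, 140, 125, 155, 140)

# Forecast error: actual minus predicted, one value per period.
forecast_error <- actual - predicted

# Mean error reveals bias; mean absolute error (MAE) measures accuracy.
mean_error <- mean(forecast_error)
mae        <- mean(abs(forecast_error))

print(data.frame(mean_error = mean_error, mae = mae))
```

Tracking both numbers before and after a manual adjustment is one simple way to check whether a “guard rail” actually helps.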

Our solution to automated forecasting problems

We deliver cutting-edge, fully automated data forecasting services. Our solution models and forecasts time series (time-stamped) data at daily, weekly, monthly, and quarterly frequencies without manual intervention. A fully integrated Artificial Neural Network (ANN) complements the statistical forecasting technology. Acclaimed Labs owns and operates the platform, a proprietary central forecasting solution built in-house on open source software (Python and R). The platform is scalable and offers statistically based forecasts to you and your clients. Applications include health, retail, financial, and airline data, including airline revenue forecasting. By drawing on an expansive model repository, the platform addresses the shortage of expert statistical modeling skills, removes human bias, and improves speed.

Case Study: Weather Forecasting from Historical Buoy Measurements

Purpose

The purpose of this case study is to provide an overview of forecasting multiple time series with machine learning techniques. The benefits of modeling multiple time series in one go, with a single model or an ensemble of models, include (a) modeling simplicity, (b) potentially more robust results from pooling data across time series, and (c) solving the cold-start problem when few data points are available for a given time series.

To illustrate forecasting with multiple time series, we’ll use the data_buoy dataset, which consists of daily sensor measurements of several environmental conditions collected by 14 buoys in Lake Michigan from 2012 through 2018. The data were obtained from NOAA’s National Data Buoy Center, available at https://www.ndbc.noaa.gov/, using the rnoaa package.

Outcome: Average daily wind speed in Lake Michigan.
Forecast horizon: Daily, 1 to 30 days into the future, which is essentially January 2019 for this dataset.
Time series: 14 outcome time series collected from buoys throughout Lake Michigan.
Model: A single gradient boosted tree model with xgboost for each of 3 direct forecast horizons.

Overview of the Dataset

data_buoy_gaps consists of:

date: A date column which will be removed for modeling.
wind_spd: The outcome which is treated as a lagged feature by default.
lat and lon: Latitude and longitude which are features that are static or unchanging through time.
day and year: Dynamic features which won’t be lagged but whose future values will be filled in when forecasting.
air_temperature and sea_surface_temperature: Data collected from the buoys through time (lagged features).
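Dynamic features such as day and year are known in advance, so their future values can be filled in for the forecast period. A minimal base R sketch of that idea follows. It assumes that day means day of the month and that the training data ends on 2018-12-31; the data frame name future is hypothetical.

```r
# Assumed last observed date in the training data.
last_date <- as.Date("2018-12-31")

# The 30 dates covered by the longest forecast horizon.
future_dates <- seq(last_date + 1, by = "1 day", length.out = 30)

# Dynamic features are derived from the calendar, so no lagging is needed.
future <- data.frame(
  date = future_dates,
  day  = as.integer(format(future_dates, "%d")),  # day of month (assumption)
  year = as.integer(format(future_dates, "%Y"))
)

print(head(future, 3))
```

Lagged features such as air_temperature cannot be treated this way, because their future values are unknown when the forecast is made.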

Visualisation of Wind Speed Outcome

Machine Learning Modelling and Forecasting

Model Setup

# All defaults
# 'data' is assumed to be the buoy data frame described above.

outcome_col <- 1  # The column position of our 'wind_spd' outcome.
horizons <- c(1, 7, 30)  # Forecast 1, 1:7, and 1:30 days into the future.
lookback <- c(1:30, 360:370)  # Features from 1 to 30 days in the past and annually.

dates <- data$date  # Grouped time series forecasting requires dates.
data$date <- NULL  # Dates, however, don't need to be in the input data.

frequency <- "1 day"  # A string that works in base::seq(..., by = frequency).
dynamic_features <- c("day", "year")  # Features that change through time but which will not be lagged.
groups <- "buoy_id"  # 1 forecast for each group or buoy.
static_features <- c("lat", "lon")  # Features that do not change through time.
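To make the lookback and groups settings concrete, the sketch below builds lagged features by hand in base R on a toy data frame. It is an illustration under stated assumptions, not the forecasting package's implementation: the toy data, the helper lag_vec, and the shortened lookback_toy are all hypothetical.

```r
# Toy data: two hypothetical buoys with five daily readings each.
toy <- data.frame(
  buoy_id  = rep(c(1, 2), each = 5),
  wind_spd = c(3, 4, 5, 6, 7, 10, 11, 12, 13, 14)
)

# Shift a vector back by k steps, padding the start with NA.
lag_vec <- function(x, k) c(rep(NA, k), x)[seq_along(x)]

# Lag within each buoy so that values never leak from one group into another.
lookback_toy <- 1:2
for (k in lookback_toy) {
  toy[[paste0("wind_spd_lag_", k)]] <-
    ave(toy$wind_spd, toy$buoy_id, FUN = function(x) lag_vec(x, k))
}

print(toy)
```

Note that a direct model for horizon h can only use lags of at least h steps, since more recent values are not yet observed when the forecast is made.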

Training dataset

We have 3 datasets for training models that forecast 1, 1 to 7, and 1 to 30 days into the future. We’ll view a sample of 1-day-ahead training data below.

Cross-Validation (CV) Setup

We’ll model with 3 validation datasets. Given that our measurements are taken daily, we’ll set the skip = 730 argument to skip 2 years between validation datasets.
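The effect of the skip argument can be sketched in base R. The helper make_windows below is hypothetical and only approximates how validation windows might be laid out; the 365-day window length is an assumption, since the text above specifies only skip = 730.

```r
# Hypothetical helper: validation windows of `window_length` days, with
# `skip` days left between the end of one window and the start of the next.
make_windows <- function(dates, window_length, skip) {
  starts <- seq(min(dates), max(dates), by = window_length + skip)
  stops  <- starts + window_length - 1
  keep   <- stops <= max(dates)  # drop a trailing partial window
  data.frame(start = starts[keep], stop = stops[keep])
}

# Daily dates spanning the 2012 through 2018 data range.
dates_toy   <- seq(as.Date("2012-01-01"), as.Date("2018-12-31"), by = "1 day")
windows_toy <- make_windows(dates_toy, window_length = 365, skip = 730)

print(windows_toy)
```

Under these assumptions the seven years of daily data yield three windows, each about a year long and separated by two-year gaps.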

Model training

Training our ‘3 forecast horizons’ * ‘3 validation datasets’ = 9 models should take about 1 minute.
These models can be trained in parallel on any OS. To avoid nested parallelization, models are trained in parallel either across forecast horizons or across validation windows, whichever is more numerous (when equal, the default is parallel across forecast horizons).
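The horizon-by-window training grid can be illustrated with a small base R loop. This is a sketch under stated assumptions: the simulated series, the validation row indices, and the helper make_direct_data are hypothetical, and lm() stands in for the xgboost model purely to keep the example dependency-free.

```r
set.seed(1)
n <- 300
y <- sin(seq_len(n) / 10) + rnorm(n, sd = 0.2)  # hypothetical daily series

horizons_toy <- c(1, 7, 30)
windows_toy  <- list(101:150, 201:250, 251:300)  # hypothetical validation rows

# Pair the outcome at time t with the value observed h days earlier.
make_direct_data <- function(y, h) {
  n <- length(y)
  data.frame(row = (h + 1):n, outcome = y[(h + 1):n], lag_h = y[1:(n - h)])
}

# One row per model: 3 horizons x 3 validation windows = 9 models.
results <- expand.grid(horizon = horizons_toy, window = seq_along(windows_toy))
results$mae <- NA_real_

for (i in seq_len(nrow(results))) {
  d <- make_direct_data(y, results$horizon[i])
  valid_rows <- d$row %in% windows_toy[[results$window[i]]]
  fit  <- lm(outcome ~ lag_h, data = d[!valid_rows, ])  # stand-in for xgboost
  pred <- predict(fit, newdata = d[valid_rows, ])
  results$mae[i] <- mean(abs(d$outcome[valid_rows] - pred))
}

print(results)
```

Because each iteration is independent of the others, the loop body is a natural unit for parallel execution across either horizons or windows.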

summary(model_results_cv$horizon_1$window_1$model)

                Length Class              Mode       
handle               1 xgb.Booster.handle externalptr
raw             318357 -none-             raw        
best_iteration       1 -none-             numeric    
best_ntreelimit      1 -none-             numeric    
best_score           1 -none-             numeric    
niter                1 -none-             numeric    
evaluation_log       3 data.table         list       
call                10 -none-             call       
params               5 -none-             list       
callbacks            2 -none-             list       
feature_names      128 -none-             character  
nfeatures            1 -none-             numeric

Historical model fit

Overview of All Sites

A Closer Look at buoy_id: 1, 2, and 3

Forecasting

Plot all forecasts.

Visualisation of a single forecast for buoy_id: 10

Automated Time Series Forecasting Showcase