Automatic forecasts of large numbers of time series are often needed in business and other contexts.
Many current forecasting processes at companies require one or more people to update an unwieldy, complicated Excel spreadsheet with multiple tabs and formulas. This process is error-prone.
Scalability. Often, forecasting at companies relies on one person’s own non-statistical methodology, and that person usually leaves no documentation on how to update it, reverse engineer it, or integrate it into current business processes.
Let’s face it: producing thousands or hundreds of thousands of forecasts takes a long time, especially when they’re manual. At past companies, we’ve seen this process take several hours and sometimes days. Meanwhile, managers, VPs, and business stakeholders need it done yesterday and treat any missed deadline, however arbitrary, as a crisis.
Several people could be involved in creating forecasts for thousands of stores or SKUs, and it becomes an even bigger challenge if those people need to be quantitative experts.
Oftentimes, there’s too much manual intervention adding “guard rails” to the forecasts, with no documentation of why they were put in place. Undocumented manual adjustments like these can add to forecast “error”, which is the difference between the actual and the predicted value.
Acclaimed Labs delivers cutting-edge, fully automated data forecasting services. Our solution models and forecasts time-series (time-stamped) data at daily, weekly, monthly, and quarterly frequencies without manual intervention, and a fully integrated Artificial Neural Network (ANN) complements the statistical forecasting technology. Our proprietary, in-house central forecasting solution is built on open source software (OSS) in Python and R. The scalable platform provides statistically based forecasts to you and your clients across health, retail, financial, and airline data, including airline revenue forecasting. By drawing on an expansive model repository, the platform addresses the shortage of expert statistical modelers, reduces human bias, and improves speed.
The purpose of this case study is to provide an overview of forecasting multiple time series with machine learning techniques. The benefits of modeling multiple time series in one go with a single model or ensemble of models include (a) modeling simplicity, (b) potentially more robust results from pooling data across time series, and (c) solving the cold-start problem when few data points are available for a given time series.
To illustrate forecasting with multiple time series, we’ll use the data_buoy dataset, which consists of daily sensor measurements of several environmental conditions collected by 14 buoys in Lake Michigan from 2012 through 2018. The data were obtained from NOAA’s National Data Buoy Center, available at https://www.ndbc.noaa.gov/, using the rnoaa package.
Outcome: Average daily wind speed in Lake Michigan.
Forecast horizon: Daily, 1 to 30 days into the future, which is essentially January 2019 for this dataset.
Time series: 14 outcome time series collected from buoys throughout Lake Michigan.
Model: A single gradient boosted tree model with xgboost for each of 3 direct forecast horizons.
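To make the three direct horizons concrete, here is a minimal base R sketch of how a 30-day forecast period could be divided among the horizon-specific models. It assumes the observed data end on 2018-12-31 and that each forecast day is served by the shortest horizon that reaches it; that assignment rule is one common convention for direct forecasting, not something the case study specifies.

```r
# Direct forecast horizons from the case study; each gets its own model.
horizons <- c(1, 7, 30)

# Assumption: the observed data end on 2018-12-31, so the forecast
# period starts on 2019-01-01.
forecast_dates <- seq(as.Date("2019-01-01"), by = "1 day",
                      length.out = max(horizons))
days_ahead <- seq_along(forecast_dates)

# For each day ahead, pick the shortest horizon that still reaches that day.
model_horizon <- vapply(
  days_ahead,
  function(d) min(horizons[horizons >= d]),
  numeric(1)
)

coverage <- data.frame(
  date = forecast_dates,
  days_ahead = days_ahead,
  model_horizon = model_horizon
)

# Number of forecast days served by each horizon-specific model.
table(coverage$model_horizon)
```

Under this rule the 1-day model covers only the first day, the 7-day model covers days 2 through 7, and the 30-day model covers the rest.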
data_buoy_gaps consists of:
date: A date column which will be removed for modeling.
wind_spd: The outcome which is treated as a lagged feature by default.
lat and lon: Latitude and longitude which are features that are static or unchanging through time.
day and year: Dynamic features which won’t be lagged but whose future values will be filled in when forecasting.
air_temperature and sea_surface_temperature: Data collected from the buoys through time (lagged features).
# All defaults
outcome_col <- 1 # The column position of our 'wind_spd' outcome (after removing the 'date' column).
horizons <- c(1, 7, 30) # Forecast 1, 1:7, and 1:30 days into the future.
lookback <- c(1:30, 360:370) # Features from 1 to 30 days in the past and annually.
dates <- data$date # Grouped time series forecasting requires dates.
data$date <- NULL # Dates, however, don't need to be in the input data.
frequency <- "1 day" # A string that works in base::seq(..., by = "frequency").
dynamic_features <- c("day", "year") # Features that change through time but which will not be lagged.
groups <- "buoy_id" # 1 forecast for each group or buoy.
static_features <- c("lat", "lon") # Features that do not change through time.

We have 3 datasets for training models that forecast 1, 1 to 7, and 1 to 30 days into the future. We’ll view a sample of 1-day-ahead training data below.
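To make the roles of lagged, static, and dynamic features concrete, here is a minimal base R sketch on made-up data for two hypothetical buoys. The helpers `lag_by_group()` and `make_lagged()` are illustrative names, not forecastML functions, and the sketch only shows the general direct-forecasting idea: lags are computed within each group, and lags shorter than the horizon are dropped because they would not be known at forecast time.

```r
# Toy data: 2 hypothetical buoys, 10 daily observations each (values are made up).
set.seed(1)
toy <- data.frame(
  date     = rep(seq(as.Date("2018-01-01"), by = "1 day", length.out = 10), times = 2),
  buoy_id  = rep(c("A", "B"), each = 10),
  wind_spd = round(runif(20, 0, 10), 1),
  lat      = rep(c(42.7, 45.3), each = 10),  # static feature, constant per buoy
  stringsAsFactors = FALSE
)
toy$day <- as.integer(format(toy$date, "%d"))  # dynamic feature, never lagged

# Shift a vector k steps back in time within each group, padding with NA.
lag_by_group <- function(x, group, k) {
  ave(x, group, FUN = function(v) c(rep(NA, k), v)[seq_along(v)])
}

# Build a training frame for one direct horizon. Lags shorter than the
# horizon are dropped: they would not be known yet at forecast time.
make_lagged <- function(df, horizon, lookback) {
  lags <- lookback[lookback >= horizon]
  out <- df[, c("buoy_id", "date", "wind_spd", "lat", "day")]
  for (k in lags) {
    out[[paste0("wind_spd_lag_", k)]] <- lag_by_group(df$wind_spd, df$buoy_id, k)
  }
  out
}

train_h1 <- make_lagged(toy, horizon = 1, lookback = 1:3)  # keeps lags 1, 2, 3
train_h3 <- make_lagged(toy, horizon = 3, lookback = 1:3)  # keeps lag 3 only
head(train_h1, 4)
```

The first rows of each buoy contain NA lags because no earlier observations exist for that group; a real pipeline would have to decide whether to drop or keep those rows.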
We’ll model with 3 validation datasets. Given that our measurements are taken daily, we’ll set the skip = 730 argument to skip 2 years between validation datasets.
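As a rough illustration of how skipping rows between validation windows plays out on a daily calendar, the base R sketch below lays out contiguous windows separated by 730 skipped days. The helper `make_windows()` is hypothetical, the date range is taken from the dataset description rather than from the lagged training data, and `window_length = 365` is an assumption; only `skip = 730` comes from the text.

```r
# Assumption: daily dates spanning 2012 through 2018, as in the dataset description.
dates <- seq(as.Date("2012-01-01"), as.Date("2018-12-31"), by = "1 day")

# Hypothetical helper: contiguous validation windows of `window_length` rows,
# separated by `skip` rows that are never held out. Partial windows at the
# end of the series are not created.
make_windows <- function(dates, window_length, skip) {
  n <- length(dates)
  starts <- seq(1, n - window_length + 1, by = window_length + skip)
  data.frame(
    window = seq_along(starts),
    start  = dates[starts],
    stop   = dates[starts + window_length - 1]
  )
}

# window_length = 365 is an assumption; only skip = 730 comes from the text.
windows <- make_windows(dates, window_length = 365, skip = 730)
windows
```

With these assumed settings the calendar yields three one-year windows, which matches the three validation datasets mentioned above.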
Training should take about 1 minute for our ‘3 forecast horizons’ * ‘3 validation datasets’ = 9 models.
These models could be trained in parallel on any OS. To avoid nested parallelization, models are trained in parallel across either forecast horizons or validation windows, whichever is more numerous (when equal, the default is parallel across forecast horizons).
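The model count and the rule for picking a parallel dimension can be sketched in a few lines of base R. The helper `parallel_dimension()` is a hypothetical name that simply mirrors the rule described above; it is not the package's internal code.

```r
horizons <- c(1, 7, 30)
n_windows <- 3  # 3 validation datasets, as in the text

# One model fit per (horizon, validation window) pair.
fits <- expand.grid(horizon = horizons, window = seq_len(n_windows))
nrow(fits)  # 9 model fits in total

# Hypothetical helper mirroring the rule described above: parallelize over
# the dimension with more elements, preferring horizons on a tie.
parallel_dimension <- function(n_horizons, n_windows) {
  if (n_windows > n_horizons) "windows" else "horizons"
}

parallel_dimension(length(horizons), n_windows)
```

Parallelizing over only one dimension keeps the number of workers bounded by that dimension's length, which is the point of avoiding nested parallelization.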
                Length Class              Mode
handle               1 xgb.Booster.handle externalptr
raw             318357 -none-             raw
best_iteration       1 -none-             numeric
best_ntreelimit      1 -none-             numeric
best_score           1 -none-             numeric
niter                1 -none-             numeric
evaluation_log       3 data.table         list
call                10 -none-             call
params               5 -none-             list
callbacks            2 -none-             list
feature_names      128 -none-             character
nfeatures            1 -none-             numeric
Overview of All Sites
A Closer Look at buoy_id: 1, 2, and 3
Plot all forecasts.
Visualisation of a single forecast for buoy_id: 10