1. Environment Setup & Parallel Processing

The libraries and features used in the nested forecast analysis differ from those in the ensembled blueprint. This approach is considered a lighter and faster alternative to the ensemble technique. Instead of building a super learner, the aim here is to select the best-performing model for each time series group.

2. Pre-processing Pipeline

This step is identical to the ensembled blueprint.

## Rows: 110,710
## Columns: 8
## $ account_number  <dbl> 3433920, 5051374, 9098558, 8692010, 5728512, 5728512, ~
## $ stock_ticker    <chr> "AML", "AML", "AML", "AML", "AML", "AML", "AML", "AML"~
## $ country         <chr> "UK", "UK", "UK", "UK", "UK", "UK", "UK", "UK", "UK", ~
## $ filled_quantity <dbl> -150.000000, -10.000000, 550.000000, -8.000000, -10.00~
## $ status          <chr> "FILLED", "FILLED", "FILLED", "FILLED", "FILLED", "FIL~
## $ filled_price    <dbl> 17.975, 17.960, 18.300, 17.975, 20.620, 20.460, NA, 20~
## $ time_created    <dttm> 2021-04-28 16:56:00, 2021-05-13 23:25:00, 2021-05-14 ~
## $ time_executed   <dttm> 2021-07-19 11:34:00, 2021-07-19 11:34:00, 2021-07-08 ~

2.1. Wrangle & Transform

The same assumptions about categories and groups apply here. The data is aggregated on a daily basis, as this proved to be the most accurate way to forecast this particular dataset.

## # A tibble: 109,567 x 3
## # Groups:   time_executed, stock_ticker [44,428]
##    stock_ticker filled_quantity time_executed      
##    <fct>                  <dbl> <dttm>             
##  1 AML                      150 2021-07-19 11:34:00
##  2 AML                       10 2021-07-19 11:34:00
##  3 AML                      550 2021-07-08 10:05:00
##  4 AML                        8 2021-07-19 11:34:00
##  5 AML                       10 2021-08-06 15:32:00
##  6 AML                       10 2021-08-06 10:25:00
##  7 AML                       10 2021-08-12 10:04:00
##  8 AML                        8 2021-07-08 17:02:00
##  9 AML                       10 2021-07-20 16:02:00
## 10 AML                       20 2021-08-06 10:02:00
## # ... with 109,557 more rows

## # A tibble: 258 x 3
##    stock_ticker time_executed       filled_quantity
##    <fct>        <dttm>                        <dbl>
##  1 AML          2021-07-01 00:00:00           5011.
##  2 AML          2021-07-02 00:00:00           4305.
##  3 AML          2021-07-05 00:00:00           3983.
##  4 AML          2021-07-06 00:00:00           6367.
##  5 AML          2021-07-07 00:00:00           8725.
##  6 AML          2021-07-08 00:00:00           6576.
##  7 AML          2021-07-09 00:00:00          13319.
##  8 AML          2021-07-12 00:00:00          10934.
##  9 AML          2021-07-13 00:00:00           6543.
## 10 AML          2021-07-14 00:00:00           4206.
## # ... with 248 more rows
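The two outputs above suggest a two-step transformation: signed order quantities become absolute volumes, and the volumes are then rolled up to one row per ticker per day. The following is a minimal sketch of that idea in plain Python, not the report's own code. The function name `aggregate_daily` is hypothetical, and summing as the daily aggregate is an assumption, since the report does not show which aggregation was applied.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_daily(orders):
    """Sum absolute filled quantities per (ticker, calendar day).

    `orders` is an iterable of (ticker, filled_quantity, time_executed).
    Summation is an assumption; the report does not state the aggregate used.
    """
    totals = defaultdict(float)
    for ticker, quantity, executed_at in orders:
        day = executed_at.date()                # floor the timestamp to the day
        totals[(ticker, day)] += abs(quantity)  # buys and sells both count as volume
    return dict(sorted(totals.items()))


# The first four AML orders shown in the outputs above.
orders = [
    ("AML", -150.0, datetime(2021, 7, 19, 11, 34)),
    ("AML", -10.0, datetime(2021, 7, 19, 11, 34)),
    ("AML", 550.0, datetime(2021, 7, 8, 10, 5)),
    ("AML", -8.0, datetime(2021, 7, 19, 11, 34)),
]
daily = aggregate_daily(orders)
for (ticker, day), volume in daily.items():
    print(ticker, day, volume)
```

Only four orders are included, so the totals printed here are far smaller than the daily figures in the report, which cover every order for the day.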

One major drawback of nested forecasting is that external regressors are very difficult to introduce into the modeling workflow, so none have been added for the time being.

2.2. Inspect temporal dynamics

The series contain apparent events (spikes) that can drastically worsen the predictions. These spikes reflect seasonal patterns and result from periodic orders placed on those days. Some are more pronounced and more regular than others.

2.3. Summary Diagnostics

## # A tibble: 1 x 12
##   n.obs start               end                 units scale  tzone diff.minimum
##   <int> <dttm>              <dttm>              <chr> <chr>  <chr>        <dbl>
## 1   258 2021-07-01 00:00:00 2021-08-31 00:00:00 secs  second UTC       -5270400
## # ... with 5 more variables: diff.q1 <dbl>, diff.median <dbl>, diff.mean <dbl>,
## #   diff.q3 <dbl>, diff.maximum <dbl>

2.4. TS Diag - analyze seasonal patterns

ACF - Autocorrelation between a target variable and lagged versions of itself.

PACF - Partial autocorrelation removes the dependence of lags on other lags, highlighting key seasonalities.

CCF - Cross-correlation shows how lagged predictors can be used to predict a target variable.
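To make the ACF definition concrete, the sketch below computes the sample autocorrelation of a series at a given lag in plain Python. It is an illustration only and not the diagnostic code used in the report. The function name `autocorrelation` and the example series are hypothetical, and PACF and CCF are not covered.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(values) - 1")
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numer = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return numer / denom


# Hypothetical series with a strict 5-step cycle and one spike per cycle.
series = [1.0, 2.0, 3.0, 2.0, 10.0] * 6
acf = {lag: autocorrelation(series, lag) for lag in range(11)}
for lag, value in acf.items():
    print(f"lag {lag:2d}: {value:+.3f}")
```

In this toy series the autocorrelation peaks at lags 5 and 10, which is the kind of pattern an ACF plot exposes when orders recur on a fixed schedule.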

2.5. Rename columns to “date”, “value” and “item_id”

## # A tibble: 258 x 3
##    item_id date                 value
##    <fct>   <dttm>               <dbl>
##  1 AML     2021-07-01 00:00:00  5011.
##  2 AML     2021-07-02 00:00:00  4305.
##  3 AML     2021-07-05 00:00:00  3983.
##  4 AML     2021-07-06 00:00:00  6367.
##  5 AML     2021-07-07 00:00:00  8725.
##  6 AML     2021-07-08 00:00:00  6576.
##  7 AML     2021-07-09 00:00:00 13319.
##  8 AML     2021-07-12 00:00:00 10934.
##  9 AML     2021-07-13 00:00:00  6543.
## 10 AML     2021-07-14 00:00:00  4206.
## # ... with 248 more rows

3. Nested Time Series Data Tibble

## # A tibble: 6 x 4
##   item_id .actual_data      .future_data      .splits        
##   <fct>   <list>            <list>            <list>         
## 1 AML     <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]>
## 2 CCL     <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]>
## 3 HSBA    <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]>
## 4 INRG    <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]>
## 5 JDW     <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]>
## 6 OCDO    <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]>
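The nested table above holds one row per ticker, each with 43 actual observations split into 29 training and 14 testing points. The sketch below shows the grouping and hold-out logic in plain Python. It is not the report's own code, and the function name `nest_and_split` and the integer day index are hypothetical. Building the 14-row future frame is left out.

```python
from collections import defaultdict


def nest_and_split(rows, assess):
    """Group rows by item_id, then hold out the last `assess` points per group."""
    groups = defaultdict(list)
    for item_id, date, value in rows:
        groups[item_id].append((date, value))

    nested = {}
    for item_id, series in groups.items():
        series.sort()  # chronological order within each group
        if len(series) <= assess:
            raise ValueError(f"{item_id}: too few observations to hold out {assess}")
        nested[item_id] = {
            "train": series[:-assess],  # earliest observations
            "test": series[-assess:],   # most recent observations
        }
    return nested


# Hypothetical data: 43 daily observations for each of two tickers,
# with an integer day index standing in for the date column.
rows = [(ticker, day, float(day)) for ticker in ("AML", "CCL") for day in range(43)]
nested = nest_and_split(rows, assess=14)
for item_id, split in nested.items():
    print(item_id, len(split["train"]), len(split["test"]))
```

Holding out the most recent points, instead of a random sample, keeps the test set strictly after the training set in time.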

4. Modeling

Recipe Specification

## # A tibble: 29 x 52
##     value date_index.num date_month date_day date_wday date_mday date_qday
##     <dbl>          <dbl>      <int>    <int>     <int>     <int>     <int>
##  1  5011.     1625097600          7        1         5         1         1
##  2  4305.     1625184000          7        2         6         2         2
##  3  3983.     1625443200          7        5         2         5         5
##  4  6367.     1625529600          7        6         3         6         6
##  5  8725.     1625616000          7        7         4         7         7
##  6  6576.     1625702400          7        8         5         8         8
##  7 13319.     1625788800          7        9         6         9         9
##  8 10934.     1626048000          7       12         2        12        12
##  9  6543.     1626134400          7       13         3        13        13
## 10  4206.     1626220800          7       14         4        14        14
## # ... with 19 more rows, and 45 more variables: date_yday <int>,
## #   date_mweek <int>, date_week <int>, date_week2 <int>, date_week3 <int>,
## #   date_week4 <int>, date_mday7 <int>, date_sin2_K1 <dbl>, date_cos2_K1 <dbl>,
## #   date_sin2_K2 <dbl>, date_sin4_K1 <dbl>, date_cos4_K1 <dbl>,
## #   date_sin4_K2 <dbl>, date_cos4_K2 <dbl>, date_sin7_K1 <dbl>,
## #   date_cos7_K1 <dbl>, date_sin7_K2 <dbl>, date_cos7_K2 <dbl>,
## #   date_sin14_K1 <dbl>, date_cos14_K1 <dbl>, date_sin14_K2 <dbl>, ...
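Column names such as `date_sin7_K1` and `date_cos7_K2` indicate Fourier terms: sine and cosine waves with a given period (here 7) and order K. The sketch below shows how such features can be derived from a day index in plain Python. It is an assumption-laden illustration: the function name `fourier_features` is hypothetical, and the recipe step used in the report may scale the time index differently.

```python
import math


def fourier_features(t, period, orders=(1, 2)):
    """Sine and cosine terms for time index `t`, one pair per order K."""
    feats = {}
    for k in orders:
        angle = 2.0 * math.pi * k * t / period
        feats[f"sin{period}_K{k}"] = math.sin(angle)
        feats[f"cos{period}_K{k}"] = math.cos(angle)
    return feats


# Weekly (period 7) features for a hypothetical run of 15 consecutive days.
features = [fourier_features(day, period=7) for day in range(15)]
for day, row in enumerate(features[:3]):
    print(day, {name: round(value, 3) for name, value in row.items()})
```

Because the terms repeat every `period` steps, a tree-based model can use them to learn cyclical effects without a separate indicator for every position in the cycle.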

Using a combination of XGBoost, Random Forest & Prophet Boost for the workflow

## # Nested Modeltime Table
##   # A tibble: 6 x 5
##   item_id .actual_data      .future_data      .splits         .modeltime_tables 
##   <fct>   <list>            <list>            <list>          <list>            
## 1 AML     <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [3 ~
## 2 CCL     <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [3 ~
## 3 HSBA    <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [3 ~
## 4 INRG    <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [3 ~
## 5 JDW     <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [3 ~
## 6 OCDO    <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [3 ~

4.2. Check for Errors

## # A tibble: 0 x 4
## # ... with 4 variables: item_id <fct>, .model_id <int>, .model_desc <chr>,
## #   .error_desc <chr>

4.3. Review Test Accuracy

4.4. Visualize Test Forecast

5.0 Select the Best Models

## # Nested Modeltime Table
##   # A tibble: 6 x 5
##   item_id .actual_data      .future_data      .splits         .modeltime_tables 
##   <fct>   <list>            <list>            <list>          <list>            
## 1 AML     <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [1 ~
## 2 CCL     <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [1 ~
## 3 HSBA    <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [1 ~
## 4 INRG    <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [1 ~
## 5 JDW     <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [1 ~
## 6 OCDO    <tibble [43 x 2]> <tibble [14 x 2]> <split [29|14]> <mdl_time_tbl [1 ~
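After selection, each ticker keeps a single model in place of the three that were fitted. A minimal sketch of this pick-the-best step in plain Python follows. It is not the report's own code: the function names and all numbers are hypothetical, and RMSE on the test window is an assumed selection metric, since the report does not state which metric was used.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def select_best(test_actuals, test_forecasts):
    """For each item, keep the model with the lowest test RMSE (assumed metric)."""
    best = {}
    for item_id, forecasts in test_forecasts.items():
        scores = {
            model: rmse(test_actuals[item_id], preds)
            for model, preds in forecasts.items()
        }
        best[item_id] = min(scores, key=scores.get)
    return best


# Hypothetical test-window values; they do not come from the report.
test_actuals = {"AML": [10.0, 12.0, 11.0], "CCL": [5.0, 7.0, 6.0]}
test_forecasts = {
    "AML": {
        "XGBOOST": [14.0, 9.0, 15.0],
        "RANGER": [10.5, 11.5, 11.0],
        "PROPHET": [8.0, 13.0, 12.0],
    },
    "CCL": {
        "XGBOOST": [6.0, 9.0, 4.0],
        "RANGER": [3.0, 8.0, 8.0],
        "PROPHET": [5.0, 6.5, 6.0],
    },
}
best = select_best(test_actuals, test_forecasts)
print(best)
```

Selecting per item means different tickers can end up with different model types.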

5.1. Visualize the Best Models

The fit of the best model for each time series on the testing dataset is not great in absolute terms, but it is reasonable given the limited data.

6.0 Refit

6.1. Check for Errors

## # A tibble: 0 x 4
## # ... with 4 variables: item_id <fct>, .model_id <int>, .model_desc <chr>,
## #   .error_desc <chr>

6.2. Visualize Future Forecast

For the most part, the future forecast appears to provide a reasonable approximation of future values, staying within the range of variation observed across the time series.

7. Save and export forecast results

## # A tibble: 6 x 5
##   item_id .model_desc .conf_hi .conf_lo predicted
##   <fct>   <chr>          <dbl>    <dbl>     <dbl>
## 1 AML     RANGER        12149.    1123.     6636.
## 2 CCL     PROPHET       11266.    -960.     5153.
## 3 HSBA    PROPHET       51815.   -7055.    22380.
## 4 INRG    PROPHET       20101.    1435.    10768.
## 5 JDW     PROPHET        3157.    -846.     1155.
## 6 OCDO    RANGER         9203.   -1990.     3607.

8.0 Backup code with hyperparameters

Equity Trading Forecast Analysis

Nested Forecasting Approach - Pick the Best

Metodi Simeonov

October, 2021