Background

This is my Machine Learning Capstone Project for Algoritma Data Science Academy Batch 4 (Cohort Deragon).

For more information about Algoritma Data Science Academy, please see https://algorit.ma/

This is a project about a ride-sharing business that operating in several big cities in Turkey. The company provide motorcycles ride-sharing service for Turkey’s citizen, and really value the efficiency in traveling through the traffic–the apps even give some reference to Star Trek “beam me up” to their order buttons. In this project, we are going to help them and forecast the driver demand for the next 5 working days.

Data Preprocessing

We input the Beam_Me dataset into R :

These are the library we are going to use :

Let’s peek into the original Beam_Me dataset :

## Observations: 366,784
## Variables: 10
## $ timeStamp        <chr> "2017-11-03 19:02:31", "2017-10-01 17:45:56", "…
## $ driverID         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ riderID          <chr> "5941083dc01c9f3eeeaac726", "59d0d4363d32b86176…
## $ orderStatus      <chr> "nodrivers", "nodrivers", "nodrivers", "nodrive…
## $ confirmedTimeSec <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ srcGeohash       <chr> "6", "7", "7", "7", "7", "7", "7", "7", "7", "7…
## $ srcLong          <dbl> -60.87779, 0.00000, 0.00000, 0.00000, 0.00000, …
## $ srcLat           <dbl> -23.04455, 0.00000, 0.00000, 0.00000, 0.00000, …
## $ destLong         <dbl> 40.98718, 40.99750, 40.99750, 41.03570, 41.0449…
## $ destLat          <dbl> 29.02819, 28.85056, 28.85056, 29.06256, 29.0070…

As timeStamp is the date and hour of order, we change the class into Date or POSIXct with ymd_hms() funtion from lubridate library.

## [1] "POSIXct" "POSIXt"

The dataset includes some “cancelled” and duplicated “nodrivers” transactions, which is not representing a real demand, and should not be included in the time series analysis and forecast model. Based on this condition, mfirst we preprocess the data into a proper time series data that represent the true Beam_Me’s demands.

Based on customer behaviour, we can say that there are 3 scenarios of the orderStatus that will be counted as 1 legitimate order : 1. Costumer ordering and get “confirmed” status. 2. Customer ordering, get “nodrivers” (and it can be more than 1 time) and keep ordering until he/she get “confirmed” status. 3. Customer ordering, get “nodrivers” and “cancelled” his/her order.

We’d like to throw the “cancelled” data, as the “cancelled” order is previously presented as “nodrivers”. We add Date and Hour columns, as we want to predict the Hourly Demand and group our dataset per riderID, Date, and Hour.

Then we’d like to throw away the “nodrivers” duplicates. We add prevOrder to change the first “nodrivers” status into “NA” with lag() function, dupOrder to see if prevOrder is the same as orderStatus, and orderStatus to see if the orderStatus == “nodrivers” and dupOrder == “Duplicates” we will change it to “NA”, then we can filter our dataset, throwing the all “NA” data from orderStatus.

## # A tibble: 6 x 15
##   timeStamp           driverID riderID orderStatus confirmedTimeSec
##   <dttm>              <chr>    <chr>   <chr>                  <int>
## 1 2017-11-18 13:18:06 599eff5… 58c420… confirmed                 31
## 2 2017-11-25 21:48:23 596bb16… 58c52e… confirmed                 87
## 3 2017-10-01 14:19:09 58c4320… 58c6e4… confirmed                  6
## 4 2017-10-08 14:51:13 null     58c6e4… nodrivers                  0
## 5 2017-10-08 18:28:59 5998156… 58c6e4… confirmed                  9
## 6 2017-10-11 13:16:24 59c9a6f… 58c6e4… confirmed                 21
## # … with 10 more variables: srcGeohash <chr>, srcLong <dbl>, srcLat <dbl>,
## #   destLong <dbl>, destLat <dbl>, Date <chr>, Hour <int>,
## #   prevOrder <chr>, dupOrder <chr>, nextOrder <chr>

We subset the dataset into Beam_Me_data.

## # A tibble: 6 x 6
##   timeStamp           riderID          orderStatus srcGeohash Date     Hour
##   <dttm>              <chr>            <chr>       <chr>      <chr>   <int>
## 1 2017-11-18 13:18:06 58c420322eb67e3… confirmed   s          2017-1…    13
## 2 2017-11-25 21:48:23 58c52e464a1df94… confirmed   s          2017-1…    21
## 3 2017-10-01 14:19:09 58c6e4502776161… confirmed   s          2017-1…    14
## 4 2017-10-08 14:51:13 58c6e4502776161… nodrivers   s          2017-1…    14
## 5 2017-10-08 18:28:59 58c6e4502776161… confirmed   s          2017-1…    18
## 6 2017-10-11 13:16:24 58c6e4502776161… confirmed   s          2017-1…    13

These are the sum of “confirmed” and “nodrivers” orders which represent the real demand from 2017-10-01 to 2017-12-01.

## 
## confirmed nodrivers 
##    212224     14956

We calculate the demand of Beam_Me drivers, using summarise() and n(), then complete the hour category ( there are some hour that are missing in the data, means there is no order at that Hour, so we change the Demand into 0).

## Classes 'tbl_df', 'tbl' and 'data.frame':    1488 obs. of  3 variables:
##  $ Date  : chr  "2017-10-01" "2017-10-01" "2017-10-01" "2017-10-01" ...
##  $ Hour  : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Demand: num  0 0 0 67 31 18 8 10 7 5 ...
## # A tibble: 6 x 4
##   Date        Hour Demand timeStamp          
##   <chr>      <int>  <dbl> <dttm>             
## 1 2017-10-01     0      0 2017-10-01 00:00:00
## 2 2017-10-01     1      0 2017-10-01 01:00:00
## 3 2017-10-01     2      0 2017-10-01 02:00:00
## 4 2017-10-01     3     67 2017-10-01 03:00:00
## 5 2017-10-01     4     31 2017-10-01 04:00:00
## 6 2017-10-01     5     18 2017-10-01 05:00:00
## # A tibble: 6 x 4
##   Date        Hour Demand timeStamp          
##   <chr>      <int>  <dbl> <dttm>             
## 1 2017-12-01    18    498 2017-12-01 18:00:00
## 2 2017-12-01    19    728 2017-12-01 19:00:00
## 3 2017-12-01    20    462 2017-12-01 20:00:00
## 4 2017-12-01    21    654 2017-12-01 21:00:00
## 5 2017-12-01    22    568 2017-12-01 22:00:00
## 6 2017-12-01    23    376 2017-12-01 23:00:00

Cleaned Data Used for Time Series Analysis and Forecasting Future Demands

## Observations: 1,488
## Variables: 4
## $ Date      <date> 2017-10-01, 2017-10-01, 2017-10-01, 2017-10-01, 2017-…
## $ Hour      <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ Demand    <dbl> 0, 0, 0, 67, 31, 18, 8, 10, 7, 5, 8, 23, 31, 19, 56, 4…
## $ timeStamp <dttm> 2017-10-01 00:00:00, 2017-10-01 01:00:00, 2017-10-01 …
## [1] "2017-10-01 00:00:00 +03" "2017-12-01 23:00:00 +03"
## [1] "Sunday"
## [1] "Friday"
##      Date      Hour    Demand timeStamp 
##         0         0         0         0

Time Series Model of Beam_Me Drivers Demand

Based on the service offered, the Beam_Me’s demands should have a seasonality. Let’s try to see the seasonality:

The seasonality is showing a strong additive pattern, but we also have to consider the trend pattern. If you look up closely, you could see that sometimes the trend showing a pattern too. This pattern in trend might be sourced from uncaptured extra seasonality from higher natural period–in this case, weekly or even monthly seasonality. To solve this complex seasonality, we need to convert the data into msts() object which accept multiple frequency setting. Let’s start by trying another higher natural period: a weekly period. Using msts() object, we can use mstl() to decompose multiple seasonalities:

Now we can see a clearer trend and could confirm the daily and weekly seasonality for the data.

Another way to see this in clear view is to plot the Demand distribution between every hours and weekdays:

From the distribution, we can confirm that not every weekdays have the same hour seasonality pattern.

Forecasting Beam_Me’s Demands per Hour

Based on previous trend and seasonality analysis, we must consider forecast model that could accept multiple seasonality. There are some options for this kind of model: 1. Use mstl() for seasonality adjustment, then use ets() to forecast the trend–this can be done simultaneously using stlm() 2. Use tbats() for complex combination of ets() and modelling the error with auto.arima() with complex seasonality setting

Another thing that need to be considered is the zero values. With so many zero value, we can not do log transformation for the data. One technique to solve this problem is adding a constant value to our data, and readjust the following forecast by substracting the same value of constant.

I will use a function to do the substraction for a forecast class objects:

Let’s split the dataset into train and test set. As one of the seasonal periods is 7 days and started at Sunday, we want to end the train set in a full loop and cut it to the last Saturday, which we have the last 6 days in the test set.

## Time Series:
## Start = c(17440, 1) 
## End = c(17495, 24) 
## Frequency = 24 
##    [1]    1    1    1   68   32   19    9   11    8    6    9   24   32
##   [14]   20   57   44   75   76   66  106   59   72   34   38   48   47
##   [27]   43   38   27   13    4    1    4   27   82  189  130   92   77
##   [40]   89  115  112  159  168  208  309  222  138   80   90   87   77
##   [53]   18   19    3    3    3   11   95  232  197  136   93  107  124
##   [66]  120  139  190  241  282  231  146  128   67   84   80   41   16
##   [79]    9    8    6   13  104  235  177  113  120  126  108  136  201
##   [92]  187  255  319  273  179  115  115   90   80   44   21   12   10
##  [105]    5   15  123  221  214  109  119  117  145  136  181  220  241
##  [118]  348  295  185  178  134   98   76   41   26    9    6    6   15
##  [131]  118  267  188  124  134  135  128  165  227  243  331  377  367
##  [144]  198  177  175  160  125   83   47   30   17    8   13   46  135
##  [157]  113  103  128  147  205  187  219  224  232  248  289  207  157
##  [170]  166  201  148  114   64   27   28    9    2    9   24   34   54
##  [183]   58   81  106  101  105  121  110  137  141  119   96   94   63
##  [196]   59   40   20   10    3    7   20  129  258  188  125  118  110
##  [209]  122  157  173  207  264  303  264  152  132  120   73   74   42
##  [222]   22   18    5    5   17  131  287  197  132  128  156  169  161
##  [235]  185  221  266  298  309  191  140  146  111   77   39   21   10
##  [248]    7    9   17  126  298  241  191  166  153  173  193  184  212
##  [261]  286  344  294  206  180  159   98  101   51   35   16   12    7
##  [274]   26  150  298  228  147  161  177  148  192  163  230  298  375
##  [287]  283  184  131  125   97   61   41   13   21   11    3   11  117
##  [300]  221  194  118  121  139  141  188  229  310  380  454  436  260
##  [313]  195  187  170  110   68   52   34   18    8   20   67  157  129
##  [326]   96  109  143  149  204  207  197  234  296  272  188  148  185
##  [339]  160  144  112   51   37   22   12   11   28   53   74   85   83
##  [352]  113  142  194  189  214  205  229  198  164  172  188  122   98
##  [365]   45   23   20   15    8   20  155  335  247  145  140  147  136
##  [378]  166  172  211  286  362  263  176  168  143   86   68   29   20
##  [391]   20    6    2   12  127  281  216  117  135  137  178  165  205
##  [404]  243  310  395  345  223  166  108  119   88   49   22    5    8
##  [417]    5   17  146  311  234  156  134  143  199  187  216  246  302
##  [430]  384  308  207  151  128  113  107   61   27   17    4    5   16
##  [443]  119  307  236  163  141  165  175  146  193  236  322  454  302
##  [456]  196  176  150  121  100   39   28   14   11    9   15  151  289
##  [469]  193  146  137  172  168  200  286  332  378  447  453  257  198
##  [482]  155  167  138   84   55   18   11    9   13   64  138  125   85
##  [495]  150  166  214  264  286  254  271  296  304  229  194  206  174
##  [508]  143   84   68   44   24   18   10   17   45   80   91  111  113
##  [521]  167  192  203  193  186  256  173  111  170  178  125   67   59
##  [534]   27   14    5    8   20  144  274  192  134  124  125  137  219
##  [547]  188  240  306  389  305  198  144  181  125   84   49   22   12
##  [560]    9    5   16  124  176  133   69   55   77   69   89  101  178
##  [573]  225  279  236  181  113  110  104   58   55   15   12    6    2
##  [586]   19  123  232  134  111   88  118  126  131  157  144  142  146
##  [599]   71   47   32   25   16   27   20   13    7    5    2    9   81
##  [612]  146  106   66   69   74  101  114  128  172  235  253  208  124
##  [625]  119   81   91   83   37   24   14    8    5   13  115  231  196
##  [638]  116  132  129  162  185  219  237  398  449  414  213  170  146
##  [651]  147  111   70   36   23   16    8   16   59   98  132  116   99
##  [664]  143  170  201  208  196  209  214  199  156  139  121   66   69
##  [677]   31   19   18   10    6    4    9   45   41   58   76   79  116
##  [690]  111  140  137  141  160  147  117   93   95   72   66   39   17
##  [703]   10    3    4   10   82  220  172   91  115  108  107  128  144
##  [716]  185  234  347  234  158  110   79   81   33   25   11    9    1
##  [729]    2   11   85  197  176   95  122  113  125  140  157  173  242
##  [742]  255  209  144  103   88   65   57   29   24    7    3    5   13
##  [755]   97  214  194   99  125  102  136  139  186  239  289  400  333
##  [768]  142  122  160   99   77   41   17   11    5    3   10   84  289
##  [781]  190  144  134  143  123  127  180  217  265  370  256  136  128
##  [794]  107   82   66   45   27   14    5    4    9  104  287  177  165
##  [807]  137  154  143  352  572  623  728 1160  809  468  331  319  260
##  [820]  169   64   27   31    6    4   10   22   99  131  132   90  133
##  [833]  169  201  181  134  128  175  228  156  124  117  120  106   76
##  [846]   69   33   10   10    3   12   60   80   69   82  108  155  146
##  [859]  172  147  128  146  149  142  148  122  130   93   40   27    7
##  [872]    5    7   12  141  283  207  155  116  133  163  174  219  257
##  [885]  288  354  274  136  176  118  104   75   42   23   24   14    5
##  [898]   12  139  301  204  142  132  289  243  269  292  347  440  687
##  [911]  449  229  212  186  163  104   71   24    8   18    3   13  111
##  [924]  301  236  178  171  183  258  276  284  385  436  608  579  289
##  [937]  193  182  170   90   78    7    1    1    1    1   33  395  274
##  [950]  177  169  175  224  236  319  329  416  617  566  326  257  215
##  [963]  190   96   72   25   19    7    9   21  142  449  337  202  155
##  [976]  269  274  306  382  472  579  835  724  397  270  244  251  161
##  [989]  103   60   60   42   20   19   51  161  154  149  184  220  270
## [1002]  345  283  333  326  392  299  248  263  246  194  154  123   82
## [1015]   36   40   27   23   25   81   99   92  154  175  292  333  274
## [1028]  202  198  220  227  190  170  172  139   95   53   43   11    4
## [1041]   10    6  103  439  255  163  151  170  171  215  288  303  356
## [1054]  472  355  236  156  131  147  125  110   23   10   17   11   18
## [1067]  140  454  254  198  153  227  260  280  291  353  414  656  491
## [1080]  222  234  169  195   77   87   26   18    6    5   11  180  352
## [1093]  296  196  169  249  248  261  319  328  545  593  473  321  248
## [1106]  207  139  129   70   57   17    7    6    9  128  378  280  258
## [1119]  175  187  211  211  252  323  389  483  363  234  144  182  115
## [1132]  113   57   58   30   22   13   12  140  384  260  201  169  217
## [1145]  317  321  397  517  622  834  709  336  249  224  238  195  107
## [1158]   78   66   42   12   13   45  180  204  155  185  202  267  310
## [1171]  293  300  320  385  367  205  191  203  210  180  162  111   68
## [1184]   33   22   11   23   63  111  104  116  135  192  171  193  148
## [1197]  220  218  140   79   86   65   33   37   15   26   12    6    3
## [1210]    5  101  262  160  107   72   58   80   77  115  110   77  157
## [1223]  119  105   63   58   29   19   26   13    2    2    2    6   73
## [1236]  221  175  105  102  163  167  155  197  262  400  568  379  155
## [1249]  147  174   85   72   54   35   17   10    8   11   96  254  194
## [1262]  129   89  121  125  143  202  210  252  339  242  147   96   84
## [1275]   78   85   33   21   19    9    3    6   80  246  170  111  136
## [1288]  142  191  187  285  247  283  470  306  174  165  144  147  134
## [1301]   56   50   23    5    6   18  127  307  251  212  222  336  339
## [1314]  442  483  622  840 1115 1039  576  416  381  372  194  101   56
## [1327]   38   26    8   12   27  168  170  140  194  226  385  359  407
## [1340]  332  369  357  370  231
## Multi-Seasonal Time Series:
## Start: 1 1
## Seasonal Periods: 24 168
## Data:
##    [1]    1    1    1   68   32   19    9   11    8    6    9   24   32
##   [14]   20   57   44   75   76   66  106   59   72   34   38   48   47
##   [27]   43   38   27   13    4    1    4   27   82  189  130   92   77
##   [40]   89  115  112  159  168  208  309  222  138   80   90   87   77
##   [53]   18   19    3    3    3   11   95  232  197  136   93  107  124
##   [66]  120  139  190  241  282  231  146  128   67   84   80   41   16
##   [79]    9    8    6   13  104  235  177  113  120  126  108  136  201
##   [92]  187  255  319  273  179  115  115   90   80   44   21   12   10
##  [105]    5   15  123  221  214  109  119  117  145  136  181  220  241
##  [118]  348  295  185  178  134   98   76   41   26    9    6    6   15
##  [131]  118  267  188  124  134  135  128  165  227  243  331  377  367
##  [144]  198  177  175  160  125   83   47   30   17    8   13   46  135
##  [157]  113  103  128  147  205  187  219  224  232  248  289  207  157
##  [170]  166  201  148  114   64   27   28    9    2    9   24   34   54
##  [183]   58   81  106  101  105  121  110  137  141  119   96   94   63
##  [196]   59   40   20   10    3    7   20  129  258  188  125  118  110
##  [209]  122  157  173  207  264  303  264  152  132  120   73   74   42
##  [222]   22   18    5    5   17  131  287  197  132  128  156  169  161
##  [235]  185  221  266  298  309  191  140  146  111   77   39   21   10
##  [248]    7    9   17  126  298  241  191  166  153  173  193  184  212
##  [261]  286  344  294  206  180  159   98  101   51   35   16   12    7
##  [274]   26  150  298  228  147  161  177  148  192  163  230  298  375
##  [287]  283  184  131  125   97   61   41   13   21   11    3   11  117
##  [300]  221  194  118  121  139  141  188  229  310  380  454  436  260
##  [313]  195  187  170  110   68   52   34   18    8   20   67  157  129
##  [326]   96  109  143  149  204  207  197  234  296  272  188  148  185
##  [339]  160  144  112   51   37   22   12   11   28   53   74   85   83
##  [352]  113  142  194  189  214  205  229  198  164  172  188  122   98
##  [365]   45   23   20   15    8   20  155  335  247  145  140  147  136
##  [378]  166  172  211  286  362  263  176  168  143   86   68   29   20
##  [391]   20    6    2   12  127  281  216  117  135  137  178  165  205
##  [404]  243  310  395  345  223  166  108  119   88   49   22    5    8
##  [417]    5   17  146  311  234  156  134  143  199  187  216  246  302
##  [430]  384  308  207  151  128  113  107   61   27   17    4    5   16
##  [443]  119  307  236  163  141  165  175  146  193  236  322  454  302
##  [456]  196  176  150  121  100   39   28   14   11    9   15  151  289
##  [469]  193  146  137  172  168  200  286  332  378  447  453  257  198
##  [482]  155  167  138   84   55   18   11    9   13   64  138  125   85
##  [495]  150  166  214  264  286  254  271  296  304  229  194  206  174
##  [508]  143   84   68   44   24   18   10   17   45   80   91  111  113
##  [521]  167  192  203  193  186  256  173  111  170  178  125   67   59
##  [534]   27   14    5    8   20  144  274  192  134  124  125  137  219
##  [547]  188  240  306  389  305  198  144  181  125   84   49   22   12
##  [560]    9    5   16  124  176  133   69   55   77   69   89  101  178
##  [573]  225  279  236  181  113  110  104   58   55   15   12    6    2
##  [586]   19  123  232  134  111   88  118  126  131  157  144  142  146
##  [599]   71   47   32   25   16   27   20   13    7    5    2    9   81
##  [612]  146  106   66   69   74  101  114  128  172  235  253  208  124
##  [625]  119   81   91   83   37   24   14    8    5   13  115  231  196
##  [638]  116  132  129  162  185  219  237  398  449  414  213  170  146
##  [651]  147  111   70   36   23   16    8   16   59   98  132  116   99
##  [664]  143  170  201  208  196  209  214  199  156  139  121   66   69
##  [677]   31   19   18   10    6    4    9   45   41   58   76   79  116
##  [690]  111  140  137  141  160  147  117   93   95   72   66   39   17
##  [703]   10    3    4   10   82  220  172   91  115  108  107  128  144
##  [716]  185  234  347  234  158  110   79   81   33   25   11    9    1
##  [729]    2   11   85  197  176   95  122  113  125  140  157  173  242
##  [742]  255  209  144  103   88   65   57   29   24    7    3    5   13
##  [755]   97  214  194   99  125  102  136  139  186  239  289  400  333
##  [768]  142  122  160   99   77   41   17   11    5    3   10   84  289
##  [781]  190  144  134  143  123  127  180  217  265  370  256  136  128
##  [794]  107   82   66   45   27   14    5    4    9  104  287  177  165
##  [807]  137  154  143  352  572  623  728 1160  809  468  331  319  260
##  [820]  169   64   27   31    6    4   10   22   99  131  132   90  133
##  [833]  169  201  181  134  128  175  228  156  124  117  120  106   76
##  [846]   69   33   10   10    3   12   60   80   69   82  108  155  146
##  [859]  172  147  128  146  149  142  148  122  130   93   40   27    7
##  [872]    5    7   12  141  283  207  155  116  133  163  174  219  257
##  [885]  288  354  274  136  176  118  104   75   42   23   24   14    5
##  [898]   12  139  301  204  142  132  289  243  269  292  347  440  687
##  [911]  449  229  212  186  163  104   71   24    8   18    3   13  111
##  [924]  301  236  178  171  183  258  276  284  385  436  608  579  289
##  [937]  193  182  170   90   78    7    1    1    1    1   33  395  274
##  [950]  177  169  175  224  236  319  329  416  617  566  326  257  215
##  [963]  190   96   72   25   19    7    9   21  142  449  337  202  155
##  [976]  269  274  306  382  472  579  835  724  397  270  244  251  161
##  [989]  103   60   60   42   20   19   51  161  154  149  184  220  270
## [1002]  345  283  333  326  392  299  248  263  246  194  154  123   82
## [1015]   36   40   27   23   25   81   99   92  154  175  292  333  274
## [1028]  202  198  220  227  190  170  172  139   95   53   43   11    4
## [1041]   10    6  103  439  255  163  151  170  171  215  288  303  356
## [1054]  472  355  236  156  131  147  125  110   23   10   17   11   18
## [1067]  140  454  254  198  153  227  260  280  291  353  414  656  491
## [1080]  222  234  169  195   77   87   26   18    6    5   11  180  352
## [1093]  296  196  169  249  248  261  319  328  545  593  473  321  248
## [1106]  207  139  129   70   57   17    7    6    9  128  378  280  258
## [1119]  175  187  211  211  252  323  389  483  363  234  144  182  115
## [1132]  113   57   58   30   22   13   12  140  384  260  201  169  217
## [1145]  317  321  397  517  622  834  709  336  249  224  238  195  107
## [1158]   78   66   42   12   13   45  180  204  155  185  202  267  310
## [1171]  293  300  320  385  367  205  191  203  210  180  162  111   68
## [1184]   33   22   11   23   63  111  104  116  135  192  171  193  148
## [1197]  220  218  140   79   86   65   33   37   15   26   12    6    3
## [1210]    5  101  262  160  107   72   58   80   77  115  110   77  157
## [1223]  119  105   63   58   29   19   26   13    2    2    2    6   73
## [1236]  221  175  105  102  163  167  155  197  262  400  568  379  155
## [1249]  147  174   85   72   54   35   17   10    8   11   96  254  194
## [1262]  129   89  121  125  143  202  210  252  339  242  147   96   84
## [1275]   78   85   33   21   19    9    3    6   80  246  170  111  136
## [1288]  142  191  187  285  247  283  470  306  174  165  144  147  134
## [1301]   56   50   23    5    6   18  127  307  251  212  222  336  339
## [1314]  442  483  622  840 1115 1039  576  416  381  372  194  101   56
## [1327]   38   26    8   12   27  168  170  140  194  226  385  359  407
## [1340]  332  369  357  370  231

We will also include a basic ets() with log transformation as a benchmark model. In modelling the log transformation, we can use lambda = 0 in stlm() setting. But tbats() function can not accept initial lambda = 0 value, so we need to convert to log before passing into tbats(), and reset the lambda value manually:

## Warning in InvBoxCox(y.forecast, object$lambda, biasadj, list(level =
## level, : biasadj information not found, defaulting to FALSE.

Let’s check the accuracy for each forecast models on our test dataset :

## Warning: 'tidy.matrix' is deprecated.
## See help("Deprecated")
## # A tibble: 3 x 8
##   models      ME  RMSE   MAE   MPE  MAPE  ACF1 Theil.s.U
##   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
## 1 ets    -117.   183.  124.   -Inf   Inf 0.872         0
## 2 stlm     23.6   80.0  43.2  -Inf   Inf 0.751         0
## 3 tbats     3.36  76.0  44.7  -Inf   Inf 0.728         0

It seems the accuracy of stlm() is showing the best MAE results, and tbats() being the second. But if we analyze the forecast using graph:

The forecast from tbats() is good because it follow the two seasonality very well. Meanwhile, the forecast from ets() is giving the highest error, too overfitting compared to the actual data. The last but not least, stlm(), showing a better seasonality modelling, eventhough the stlm() function following the same notion of ets(), the model choose to take a constant mean, but instead of one, the model give a constant mean for two seasonality–daily and weekly seasonality:

Notes from our teaching assistant : This is why stlm() results is slightly differ from tbats(), which modelling multiple seasonality using arima with fourier terms–we could say it like smoothing the seasonalities pattern. In other words, a tbats() model would trying to fit multiple seasonality by trying the best value to replicating the patterns, instead of using mean value produced from seasonality decomposition:

## Warning in InvBoxCox(y.forecast, object$lambda, biasadj, list(level =
## level, : biasadj information not found, defaulting to FALSE.

Notes from our teaching assistant : You can notice that the seasonalities are smoother than the forecast from stlm(). But don’t be deceived by the smoothing property of tbats(), because if look back at the principle of “believing the mean”, then tbats() forecast would be more fluctuative in term of seasonalities mean; recall that it does the smoothing process using arima, which is by nature seek a better fit to the fluctuation, instead of a better fit to the mean.

So we are faced with a common trade-off in forecast problems: trusting a “safe” constant mean model, the ets(), or take a “risk” on a more fluctuative model, tbats(). Turns out, the answer is “in between” of the two: stlm().

Key-advice from our teaching assistant : First, refit the models on full train dataset. Recall that the actual evaluation dataset is on the next 5 business days, so we need to forecast to 7 days ahead, then subset to only the last 5 days:

## Warning in InvBoxCox(y.forecast, object$lambda, biasadj, list(level =
## level, : biasadj information not found, defaulting to FALSE.
##             timeStamp etsForecast stlmForecast tbatsForecast
## 1 2017-12-04 00:00:00         271          126           126
## 2 2017-12-04 01:00:00         237          113           116
## 3 2017-12-04 02:00:00         189           98           109
## 4 2017-12-04 03:00:00         152           82            92
## 5 2017-12-04 04:00:00          88           34            59
## 6 2017-12-04 05:00:00          49           25            33

Based on previous modelings, we will choose the model with the lowest MAE, the stlmForecast, as the result of the demand forecasting for the next 5 business days (Monday, 2017/12/04 to Friday Friday, 2017/12/08).

##             timeStamp nOrder
## 1 2017-12-04 00:00:00    126
## 2 2017-12-04 01:00:00    113
## 3 2017-12-04 02:00:00     98
## 4 2017-12-04 03:00:00     82
## 5 2017-12-04 04:00:00     34
## 6 2017-12-04 05:00:00     25
##               timeStamp nOrder
## 115 2017-12-08 18:00:00    396
## 116 2017-12-08 19:00:00    497
## 117 2017-12-08 20:00:00    538
## 118 2017-12-08 21:00:00    733
## 119 2017-12-08 22:00:00    626
## 120 2017-12-08 23:00:00    356

The blue line : the actual data of Beam_Me driver demand from Oct 1, 2017 to Dec 1, 2017. The red line : the forecast for the next 5 working days, from Dec 4, 2017 to Dec 8, 2017. There’s a gap in the data, it means the weekends after Dec 1, 2017 so they don’t show in the data visualization.