1 Introduction

Scotty is a ride-sharing business operating in several big cities in Turkey. The company provides a motorcycle ride-sharing service for Turkish citizens and places a high value on travelling efficiently through traffic.

Scotty has provided us with a real-time transaction dataset. Using this dataset, we are going to help them solve some forecasting and classification problems in order to improve their business processes.

1.1 Business Question

In this analysis we build a forecasting model to help Scotty prepare for end-of-2017 demand. Unfortunately, Scotty is not old enough to have data for last year's December, so we cannot look back at past demand to prepare a forecast for this December. As an investment in the business' future, we also need an automated forecasting framework so that we no longer have to meddle with model selection by hand.
Build an automated forecasting model for hourly demand that will be evaluated on the next 7 days (Sunday, December 3rd, 2017, to Saturday, December 9th, 2017)!
This analysis forecasts Scotty's hourly demand in 3 sub-areas.

2 Basic Concept : Time Series

2.1 Time Series

A time series is data whose values are indexed by, and affected by, time, and time series analysis studies how those values evolve. Predicting future values based on values from previous periods is called forecasting. Data formatted into a time series (ts) object must have some characteristics (a minimal example follows the list):
* no missing intervals
* no missing values
* data ordered by time
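
For illustration, here is a minimal sketch of building a ts object from an hourly demand vector (toy values, not the Scotty data):

# minimal sketch: a toy hourly demand vector, already ordered by time with
# no missing hours, converted into a ts object with daily seasonality
toy_demand <- c(12, 8, 5, 3, 4, 9, 20, 35, 40, 38, 30, 25,
                22, 20, 24, 28, 33, 41, 45, 39, 31, 26, 20, 15)
toy_ts <- ts(toy_demand, frequency = 24)  # frequency = 24 observations per day
toy_ts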

2.2 Exploratory Data Analysis (EDA)

A ts object can be decomposed into 3 main components, which are then used for forecasting (a decomposition sketch is shown below). These components are:
- trend (T): the global movement of the mean throughout the interval
- seasonal (S): the pattern that repeats within each seasonal interval
- error (E): the remaining pattern/value that cannot be captured by either trend or seasonal
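
As an illustration of these components, a seasonal ts object can be decomposed with decompose(); this is a sketch on simulated data, not the Scotty data:

# decomposition sketch on a simulated hourly series (2 weeks of toy data)
set.seed(100)
sim_ts <- ts(
  20 + 0.05 * (1:336) +               # trend
    10 * sin(2 * pi * (1:336) / 24) + # daily seasonal pattern
    rnorm(336, sd = 2),               # error
  frequency = 24
)
sim_decomp <- decompose(sim_ts)  # additive: observed = trend + seasonal + random
plot(sim_decomp)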

3 Solution

3.1 Import Library

library(readr) #to read data
library(DT) #to visualize data in table
library(lubridate) #to deal with dates
library(tidyverse) #for data wrangling 
library(dplyr) #for data wrangling
library(ggplot2) #for basic EDA
library(TSstudio) #time series library
library(padr) # for padding
library(forecast) # for forecasting
library(tseries) # for adf.test
library(MLmetrics)#for calculating error
library(yardstick) #for measuring forecast performance
library(purrr) #for functional programming
library(tidyquant) #for some ggplot aesthetic
library(recipes) #for data preprocess
library(zoo) #to fill NA values
library(tidyr) #for function spread/nest
library(tibble) #for function enframe
library(tidymodels) #for function rmse_vec
library(png) #for function readPNG
library(grid) #for function grid.raster

3.2 Read Data

scotty <- read_csv("data-train.csv")

3.3 Data Preparation

colSums(is.na(scotty))
##                 id            trip_id          driver_id           rider_id 
##                  0               4676               4676                  0 
##         start_time            src_lat            src_lon           src_area 
##                  0                  0                  0                  0 
##       src_sub_area           dest_lat           dest_lon          dest_area 
##                  0                  0                  0                  0 
##      dest_sub_area           distance             status confirmed_time_sec 
##                  0                  0                  0                  0

The main variables of this analysis are src_sub_area (the area/location where customers request a ride) and start_time (the time at which customers make the request). There are no missing values in those variables, so we can proceed to the next step.

3.4 Data Preprocessing

#check structure of the data
str(scotty)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 90113 obs. of  16 variables:
##  $ id                : chr  "59d005e1ffcfa261708ce9cd" "59d0066affcfa261708ceb11" "59d006a1ffcfa261708ceba4" "59d006d8cb564761a8fe5efd" ...
##  $ trip_id           : chr  "59d005e9cb564761a8fe5d3e" NA "59d006c131e39c618969343d" "59d007193d32b861760d4a77" ...
##  $ driver_id         : chr  "59a892c5568be44b2734f276" NA "599dc0dfa5b4fd5471ad8453" "59a5856573d56e7103b5d51d" ...
##  $ rider_id          : chr  "59ad2d6efba75a581666b506" "59cd704bcf482f6ce2fadfdb" "59bd62cc7e3c3b663924ba86" "59c17cd1f6da2d274e16d0a7" ...
##  $ start_time        : POSIXct, format: "2017-10-01 00:00:17" "2017-10-01 00:02:34" ...
##  $ src_lat           : num  41.1 41.1 41 41.1 41.1 ...
##  $ src_lon           : num  29 29 29 29 29 ...
##  $ src_area          : chr  "sxk9" "sxk9" "sxk9" "sxk9" ...
##  $ src_sub_area      : chr  "sxk9s" "sxk9e" "sxk9s" "sxk9e" ...
##  $ dest_lat          : num  41.1 41.1 41 41.1 41 ...
##  $ dest_lon          : num  29 29 29 29 29.1 ...
##  $ dest_area         : chr  "sxk9" "sxk9" "sxk9" "sxk9" ...
##  $ dest_sub_area     : chr  "sxk9u" "sxk9e" "sxk9e" "sxk9e" ...
##  $ distance          : num  5.38 1.13 4.17 3.36 11.69 ...
##  $ status            : chr  "confirmed" "nodrivers" "confirmed" "confirmed" ...
##  $ confirmed_time_sec: num  8 0 32 65 0 27 32 30 0 24 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   id = col_character(),
##   ..   trip_id = col_character(),
##   ..   driver_id = col_character(),
##   ..   rider_id = col_character(),
##   ..   start_time = col_datetime(format = ""),
##   ..   src_lat = col_double(),
##   ..   src_lon = col_double(),
##   ..   src_area = col_character(),
##   ..   src_sub_area = col_character(),
##   ..   dest_lat = col_double(),
##   ..   dest_lon = col_double(),
##   ..   dest_area = col_character(),
##   ..   dest_sub_area = col_character(),
##   ..   distance = col_double(),
##   ..   status = col_character(),
##   ..   confirmed_time_sec = col_double()
##   .. )

3.4.1 Data Aggregation

  1. Since we want to forecast total demand per hour, we floor the timestamps to the hour.
  2. We then summarise the total demand/orders per hour using n().
  3. The series must be complete and continuous at the hourly level, so we pad the missing timestamps.
  4. We fill demand with 0 for hours in which there was no order.
# Data Aggregation and Padding
scotty_agg <- scotty %>% 
    mutate(datetime = floor_date(start_time, unit = "hours")) %>% #1
    group_by(src_sub_area, datetime) %>% 
    summarise(demand = n()) %>% #2
    pad() %>% #3
    fill_by_value(demand, value = 0) %>% #4
    ungroup()

3.4.1.1 Visualizing total demand of Scotty per sub-area

plot1 <- ggplot(data = scotty_agg, aes(x = datetime, y = demand)) +
  geom_line() +
  labs(x = NULL, y = NULL) +
  facet_wrap(~ src_sub_area, scales = "free", ncol = 1) +
  tidyquant::theme_tq()

plot1

#check the range of the data
range(scotty_agg$datetime)
## [1] "2017-10-01 00:00:00 UTC" "2017-12-02 23:00:00 UTC"

The series runs from 2017-10-01 00:00:00 UTC to 2017-12-02 23:00:00 UTC.

3.5 Cross Validation Scheme

In this step we split the data into a training set and a validation set.

3.5.1 Splitting Data

For this split we do not use a rolling-origin scheme; instead, we hold out the latest 7 days of each sub-area's series as the validation set, because the purpose of this analysis is to forecast total demand for the next 7 days.

# train-test-size
test_size <- 24*7 # the total of data validation is one week (7 days) since the real data test to predict is 7 days also
train_size <- nrow(scotty_agg)/3 - test_size

# get the min-max of the time index for each sample
test_end <- max(scotty_agg$datetime)
test_start <- test_end - hours(test_size) + hours(1)

train_end <- test_start - hours(1)
train_start <- train_end - hours(train_size) + hours(1)

To make them easier to work with, I combined the start and end indices into date intervals:

# get the interval of each samples
intrain <- interval(train_start, train_end)
intest <- interval(test_start, test_end)

intrain
## [1] 2017-10-01 UTC--2017-11-25 23:00:00 UTC
intest
## [1] 2017-11-26 UTC--2017-12-02 23:00:00 UTC

Visualize the train and test series using the intervals:

# plot the train and test
scotty_agg %>% 
  mutate(sample = case_when(
    datetime %within% intrain ~ "train",
    datetime %within% intest ~ "test"
  )) %>% 
  drop_na() %>% 
  mutate(sample = factor(sample, levels = c("train","test"))) %>% 
  ggplot(aes(x=datetime, y = demand, color = sample)) +
    geom_line()+
    labs(x=NULL, y=NULL, color = NULL)+
    facet_wrap(~ src_sub_area, scale = "free", ncol =1)+
    tidyquant::theme_tq()+
    tidyquant::scale_colour_tq()

3.6 Data Processing

Data preprocessing is a crucial step in time series model fitting. In this tutorial, I will use the recipes package for data preprocessing.
Since recipes works column-wise, we first need to convert our data into a wide format:

#converting to wide format
scotty_agg <- scotty_agg %>% 
  spread(src_sub_area, demand)

Then we can define the preprocessing recipe() and bake() our data according to it:

# recipes : 
scotty_recipe <- recipe(formula = ~.,
                        data = scotty_agg) %>% 
  #step
  step_scale(all_numeric()) %>% #do scaling
  #prep
  prep()

# preview the bake results
scotty1 <- bake(scotty_recipe,scotty_agg)

Note: when we use the recipes package, the next step is to create a function that reverts the preprocessed values back to the original scale:

#revert back function
scotty_recipe_revert <- function(vector, recipe, varname){
  
  #store recipe values
  results <- recipe$steps[[1]]$sds[varname]*vector
  
  #add additional adjustment if necessary
  results <- round(results)
  
  #return the results
  results
}
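
As a quick sanity check (an illustrative snippet; sxk9e is one of the sub-area columns seen in the data preview), reverting a baked column should reproduce the original demand counts:

# illustrative check: reverting the scaled column should give back the
# original counts, since the recipe only applied step_scale()
all.equal(
  as.numeric(scotty_recipe_revert(scotty1$sxk9e, scotty_recipe, "sxk9e")),
  scotty_agg$sxk9e
)  # expected to return TRUE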

Now we can convert our data into the long format again :

#convert back to long format
scotty1 <- scotty1 %>% 
  gather(src_sub_area, demand, -datetime)

Add a new variable sample to distinguish the training and test data:

scotty1 <- scotty1 %>% 
  mutate(sample = case_when(
    datetime %within% intrain ~ "train",
    datetime %within% intest ~ "test"
  ))

Nest the data by sub-area so that each sub-area carries its own training and test sets:

scotty_nest <- scotty1 %>% 
  group_by(src_sub_area, sample) %>% 
  nest(.key= "data") %>% 
  pivot_wider(names_from = sample, values_from = data)

Next we need to convert our data into a time series format with ts() or msts(). In this step we build a list of functions named ts_function_list that contains both, with the following seasonalities:
- ts for single seasonality, using daily seasonality (frequency = 24)
- msts for multiple/complex seasonality, using daily (frequency = 24) and weekly (frequency = 24*7) seasonality

# list of functions that will be used to create time-series-format data
ts_function_list <- list(
  ts = function(x) ts(x$demand, frequency = 24),
  msts = function(x) msts(x$demand, seasonal.periods = c(24, 24*7))
)

# combining ts_function_list with each sub area
ts_function_list <- ts_function_list %>% 
  rep(length(unique(scotty_nest$src_sub_area))) %>% 
  enframe("func_name", "func") %>% 
  mutate(
    src_sub_area = sort(rep(unique(scotty_nest$src_sub_area), length(unique(.$func_name))))
  )
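
As an aside, the enframe() call above simply turns a named list into a two-column tibble (here named func_name and func), which is what lets us attach a sub-area to each function. A tiny standalone illustration:

# tiny illustration of enframe(): a named list becomes a two-column tibble,
# with the list elements stored in a list-column
library(tibble)
enframe(list(a = 1, b = 2), name = "func_name", value = "func")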

Finally, join ts_function_list to scotty_nest:

scotty_nest <- scotty_nest %>% 
  left_join(ts_function_list)

4 Modelling Using the Training Data

4.1 Make a list of time series models

To fit models to the time-series-format data, we create a list named model_function_list containing the model-fitting functions.

#list model that will be used in fitting model
model_function_list <- list(
  ets = function(x) ets(x),
  auto.arima = function(x) auto.arima(x),
  stlm = function(x) stlm(x),
  tbats = function(x) tbats(x, use.box.cox = FALSE)
)

#Combining model_function_list with each sub area
model_function_list <- model_function_list %>% 
  rep(length(unique(scotty_nest$src_sub_area))) %>% 
  enframe("model_name", "model") %>% 
  mutate(src_sub_area =
           sort(rep(unique(scotty_nest$src_sub_area), length(unique(.$model_name)))))

The next step is to join model_function_list to scotty_nest. Since ets and auto.arima cannot be applied to multi-seasonal time series, we do not combine them with the msts series:

scotty_nest <- scotty_nest %>% 
  left_join(model_function_list) %>% 
  filter(
    !(model_name == "ets" & func_name == "msts"),
    !(model_name == "auto.arima" & func_name == "msts")
  )

4.2 Modelling

Now we run the data in scotty_nest through the functions and models contained in that data frame. First we apply the functions in func to create the time series objects, then we fit the models in model, storing the results in a new variable fitted:

scotty_nest <- scotty_nest %>% 
  mutate(
    params = map(train, ~ list(x=.x)),
    data = invoke_map(func, params),
    params = map(data, ~list(x=.x)),
    fitted = invoke_map(model,params)
  ) %>% 
  select(-data, -params)
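
Note that invoke_map() has been deprecated in newer purrr releases; if it is not available, the chunk above could be written equivalently (a sketch under that assumption) with map2(), calling each constructor and model function directly on its data:

# equivalent without invoke_map() (sketch): map2() calls each time series
# constructor on its training data, then each model function on the ts object
scotty_nest <- scotty_nest %>% 
  mutate(
    data   = map2(func, train, ~ .x(.y)),  # build ts/msts from the train tibble
    fitted = map2(model, data, ~ .x(.y))   # fit the model on the ts object
  ) %>% 
  select(-data)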

4.3 Compare to Data Test

After fitting the models, we extract the error of each model when applied to the test data:

scotty_nest <- scotty_nest %>% 
  mutate(
    error = map(fitted, ~ forecast(.x, h =24*7)) %>% 
            map2_dbl(test, ~ rmse_vec(truth = .y$demand, estimate = .x$mean))
  ) %>% 
  arrange(src_sub_area, error)

Next, we generate the forecasts so we can compare them with the validation data in a plot.

scotty_test <- scotty_nest %>% 
  mutate(
    forecast = map(fitted, ~ forecast(.x, h=24*7)) %>%
               map2(test, ~tibble(datetime = .y$datetime, demand = as.vector(.x$mean))),
    key = paste(func_name, model_name, sep="-"))

Next, we convert the data from wide (one column per model) to long format:

scotty_test <- scotty_test %>% 
  select(src_sub_area, key, actual = test, forecast) %>% 
  spread(key, forecast) %>% 
  gather(key, value, -src_sub_area)

Since our forecast results are still on the scale produced by scotty_recipe, we need to revert them to the original values using scotty_recipe_revert():

scotty_test <- scotty_test %>% 
  unnest(value) %>% 
  mutate(demand = scotty_recipe_revert(demand, scotty_recipe,src_sub_area))

Visualize the forecast results against the validation data:

scotty_test %>% 
  ggplot(aes(x=datetime, y=demand, colour = key))+
    geom_line()+
    labs(x=NULL, y=NULL, colour = NULL)+
    facet_wrap(~src_sub_area, scale = "free", ncol =1)+
    theme_light()+
    tidyquant::scale_colour_tq()

4.4 Automated Model Selection

From the models built above, we take the model with the lowest error for each sub-area:

scotty_model_selected <- scotty_nest %>% 
  group_by(src_sub_area) %>% 
  filter(error == min(error)) %>% 
  ungroup()

Evaluate the selected models on the test data using MAE:

scotty_mae <- scotty_test %>% 
  filter(
    paste(key,src_sub_area, sep = "-")%in%
      paste(scotty_model_selected$func_name,
            scotty_model_selected$model_name,
            scotty_model_selected$src_sub_area,
            sep = "-")
    | key == "actual"
  ) %>% 
  mutate(
    demand = as.numeric(demand),
    key = if_else(key == "actual", "actual", "forecast")
  ) %>% 
  spread(key, demand)

# MAE per sub area
scotty_mae_by_src_sub_area <- scotty_mae %>% 
  group_by(src_sub_area) %>% 
  nest() %>% 
  mutate(
    tmp = map(data, ~invoke_map_dfc(list(mae=mae_vec),
                                    truth = .x$actual, estimate =.x$forecast))
  ) %>% 
  select(-data) %>% 
  unnest() %>% 
  ungroup() %>% 
  add_row(src_sub_area = "all_sub_area",
          mae = MLmetrics::MAE(scotty_mae$forecast, scotty_mae$actual)) %>% 
  mutate(
    mae = round(mae,4)
  )

5 Improving the Model

Here we try to improve the model through the data preprocessing step. Whereas scotty_recipe only scaled the data, the new recipe scotty_rec_imp applies the following steps:
- square root transformation
- mean centering
- scaling

scotty_rec_imp <- recipe(formula = ~.,
                         data = scotty_agg) %>% 
  #steps
  step_sqrt(all_numeric()) %>% #square root transformation
  step_center(all_numeric()) %>%  #mean centering
  step_scale(all_numeric()) %>%  #scaling
  #prep
  prep()

Next, we apply scotty_rec_imp to scotty_agg using the bake() function:

scotty2 <- bake(scotty_rec_imp, scotty_agg)

After this transformation, we again need a function, scotty_rec_imp_rev, that reverts the values back to the original scale:

scotty_rec_imp_rev <- function(vector, recipe, varname){
  #store recipe values
  recipe_center <- recipe$steps[[2]]$means[varname]
  recipe_scale <- recipe$steps[[3]]$sds[varname]
  
  #convert back based on the recipe
  results <- (vector * recipe_scale + recipe_center)^2
  
  #add additional adjustment if necessary
  results <- round(results)
  
  #return the results
  results
}

Next, we convert the data back to long format using gather() and create a new variable sample to distinguish the training and test data:

scotty2 <- scotty2 %>% 
  gather(src_sub_area, demand, -datetime) %>% 
  mutate(sample = case_when(
    datetime %within% intrain ~ "train",
    datetime %within% intest ~ "test"
  ))

Next, we nest the data by sub-area, split into training and test sets:

scotty_nest_imp <- scotty2 %>% 
  group_by(src_sub_area, sample) %>% 
  nest(.key = "data") %>% 
  pivot_wider(names_from = sample, values_from = data)

Next, we join the ts_function_list we made earlier (the list of time series constructors) to scotty_nest_imp:

scotty_nest_imp <- scotty_nest_imp %>%
  left_join(ts_function_list)

Next, we join model_function_list to scotty_nest_imp:

scotty_nest_imp <- scotty_nest_imp %>% 
  left_join(model_function_list) %>% 
  filter(
    !(model_name == "ets" & func_name == "msts"),
    !(model_name == "auto.arima" & func_name == "msts")
  )

Next, we run all the data in scotty_nest_imp through its functions and models, storing the results in the variable fitted:

scotty_nest_imp <- scotty_nest_imp %>% 
  mutate(
    params = map(train, ~ list(x=.x)),
    data = invoke_map(func, params),
    params = map(data, ~ list(x=.x)),
    fitted = invoke_map(model, params)
  ) %>% 
  select(-data, -params)

Then, from the models built above, we take the model with the lowest error for each sub-area:

scotty_model_imp_sel <- scotty_nest_imp %>% 
  #check error when it's implemented to data val
  mutate(
    error = map(fitted, ~ forecast(.x, h = 24*7)) %>% 
          map2_dbl(test, ~rmse_vec(truth = .y$demand, estimate = .x$mean))
  ) %>% 
  arrange(src_sub_area, error) %>% 
  # take model with lowest error value
  group_by(src_sub_area) %>%
  filter(error == min(error)) %>%
  ungroup()

Applying the selected models to the test data, the MAE values are as follows:

scotty_mae_imp <- scotty_model_imp_sel %>% 
  # calculating forecast value based on the data val
  mutate(
    forecast = map(fitted, ~ forecast(.x, h=24*7)) %>% 
               map2(test, ~ tibble(datetime = .y$datetime, demand = as.vector(.x$mean))),
    key = paste(func_name, model_name, sep = "-")
  ) %>% 
  # Converting column-format-data to row-format-data
  select(src_sub_area, key, actual = test, forecast) %>% 
  spread(key, forecast) %>% 
  gather(key, value, -src_sub_area) %>% 
  # restore origin value using `scotty_rec_imp_rev`
  unnest(value) %>% 
  mutate(demand = scotty_rec_imp_rev(demand, scotty_rec_imp, src_sub_area)) %>%   
  # comparing model that has lowest value error
  filter(
    paste(key, src_sub_area, sep = "-")%in%
      paste(scotty_model_imp_sel$func_name,
            scotty_model_imp_sel$model_name,
            scotty_model_imp_sel$src_sub_area,
            sep="-")
    | key == "actual"
  ) %>%
  mutate(
    demand = as.numeric(demand),
    key = if_else(key == "actual", "actual", "forecast")
  ) %>%
  spread(key, demand)

scotty_mae_imp_by_src_sub_area <- scotty_mae_imp %>% 
  #calculating MAE per sub area
  group_by(src_sub_area) %>% 
  nest() %>% 
  mutate(
    tmp=map(data, ~invoke_map_dfc(list(mae_improve = mae_vec),
                                  truth = .x$actual, estimate = .x$forecast))
  ) %>% 
  select(-data) %>% 
  unnest() %>% 
  ungroup() %>% 
  add_row(src_sub_area = "all-sub-area",
          mae_improve = MLmetrics::MAE(scotty_mae_imp$forecast, scotty_mae_imp$actual)) %>% 
  mutate(
    mae_improve = round(mae_improve,4)
  )

5.1 Final Forecast for Dec-03-2017 to Dec-09-2017

Since our training data only runs until Nov-25-2017, we need to forecast 14 days ahead:

scotty_forecast <- scotty_model_imp_sel %>% 
  select(src_sub_area, train, everything(), -test) %>% 
  mutate(forecast = map(fitted, ~forecast(.x, h = 24*2*7)) %>% 
                    map2(train, ~tibble(datetime = timetk::tk_make_future_timeseries(.y$datetime, 24*7*2),
                                        demand = as.vector(.x$mean))))

Restore the forecast values to the original scale using scotty_rec_imp_rev():

scotty_forecast <- scotty_forecast %>% 
  select(src_sub_area, actual = train, forecast) %>% 
  gather(key,value, -src_sub_area) %>% 
  unnest(value) %>%  
  mutate(demand = scotty_rec_imp_rev(demand,scotty_rec_imp, src_sub_area))

Visualize the forecast results:

scotty_forecast %>% 
  ggplot(aes(x = datetime, y = demand, colour = key))+
  geom_line()+
  labs(x = NULL, y = NULL, colour = NULL)+
  facet_wrap(~src_sub_area, scale = "free", ncol =1)+
  theme_light()+
  tidyquant::scale_colour_tq()

Export the forecast results for Dec-3-2017 to Dec-9-2017 to a CSV file:

# total data forecast
forecast_size <- 24*7

#start and end range for data forecast
forecast_end <- max(scotty_forecast$datetime)
forecast_start <- forecast_end - hours(forecast_size) + hours(1)

#interval for data forecast
inforecast <- interval(forecast_start, forecast_end)

#filter data for interval of forecast data
scotty_forecast <- scotty_forecast %>% 
  mutate(type = case_when(
    datetime %within% inforecast ~ "forecast"
  )) %>% 
  filter(
    type == 'forecast'
  ) %>% 
  select(-c(key, type))

# export to CSV file
write.csv(scotty_forecast, file= "submission-train-data.csv", row.names = FALSE)

6 Creating a Model Using the Combined Training and Validation Data

Combine the training and validation data:

scotty_combine <- scotty_model_imp_sel %>% 
  mutate(fulldata = map2(train, test, ~bind_rows(.x,.y))) %>% 
  select(src_sub_area, fulldata, everything(), -train, -test, -error) %>% 
  select(-fitted)

7 Creating a Model Based on the Full Data

We run all the data in scotty_combine through the functions and models it contains. First, the functions in func build the time series data, then the functions in model fit the models, and the results are stored in the variable fitted:

scotty_combine <- scotty_combine %>% 
  mutate(
    params = map(fulldata, ~list(x=.x)),
    data = invoke_map(func, params),
    params = map(data, ~list(x=.x)),
    fitted = invoke_map(model, params)
  )

7.1 Final Forecast (Dec-3-2017 to Dec-9-2017)

Here we have data until Dec-2-2017, since we have combined the training and validation data, so we only need to forecast 7 days ahead:

scotty_forecast_full <- scotty_combine %>% 
  mutate(forecast =  map(fitted, ~ forecast(.x, h = 24 * 7)) %>%
                     map2(fulldata, ~ tibble(datetime = timetk::tk_make_future_timeseries(.y$datetime, 24 * 7),
                                             demand = as.vector(.x$mean)))
  )

Restore the forecast values to the original scale using scotty_rec_imp_rev():

scotty_forecast_full <- scotty_forecast_full %>% 
  select(src_sub_area, actual = fulldata, forecast) %>%
  gather(key, value, -src_sub_area) %>%
  unnest(value) %>%
  mutate(demand = scotty_rec_imp_rev(demand, scotty_rec_imp, src_sub_area))

Visualizing the forecast results:

scotty_forecast_full %>% 
  ggplot(aes(x=datetime, y= demand, colour = key))+
    geom_line()+
    labs(x=NULL, y=NULL, colour= NULL)+
    facet_wrap(~src_sub_area, scale = "free", ncol=1)+
    theme_light()+
    tidyquant::scale_colour_tq()

Export the forecast results for Dec-3-2017 to Dec-9-2017 to a CSV file:

#filter data for forecast type
scotty_forecast_full <- scotty_forecast_full %>% 
  filter(key == "forecast") %>% 
  select(-key)

#export to csv file
write.csv(scotty_forecast_full, file = "submission-combine-data.csv", row.names = FALSE)

8 Summary

  1. The error values based on MAE are stored in scotty_mae_imp_by_src_sub_area:
scotty_mae_imp_by_src_sub_area
  2. Based on the automated model selection, time series using the complex seasonality pattern (daily and weekly) produce lower error values than time series using the single seasonality pattern (daily).

  3. After the submissions were sent to the scoring dashboard, the forecasting results of the model built from the training data are as follows: