DCD Proposal

Background

The global business landscapes have been rapidly changed due to the growth of technologies that enable business entities to enhance their business processes in many aspects. Online Commerce, or commonly known as e-commerce, has been one of the technologies that majorly influenced the changes as this technology provides a worldwide channel to market the products, connecting directly to the customers, and the ability to foresee the market demands to be fulfilled. The growth of the e-commerce activities is also forecasted to gradually grow as seen in the visualization below. According to eMarketer¹, e-commerce activities have an immense increase in 2019 and 2020 which respectively have a percentage of increase by 20.5 % and 25.7 % from their previous periods. The expected growth would also be seen in 2025 as it is foreseen that e-commerce activities would contribute 24.5% of the total retail sales activities.

With the growth of the e-commerce activities being expected, the prospective growth of main processes that supports the activities would also be foreseen. As one of the support activities, last-mile delivery is one of the crucial aspects of the logistics operations in e-commerce activities. According to World Economic Forum², the global last-mile delivery market would be expected to grow by 78% as of 2030.

By the growth of the last-mile ecosystem being predicted, the importance to optimize the costs that follow within the operation would also be highlighted. This is due to its high contribution towards the overall costs of the logistics process and the high expectations by the end customers to receive fast and reliable deliveries. To attain an optimized solution within the last-mile operation, however, is highly complex due to certain constraints that businesses have towards achieving the goals to optimize their operations.

One of the problems that consider the efficiency and effectiveness of a logistics and last-mile delivery process is the routing plan for the deliveries. The efficient route plans would support the reduction of operational costs and enhancement of customers’ satisfaction levels. This is due to its association with several aspects including the use of resources (human labor and gasoline), the production of carbon emission by the deliveries, and the lead time of deliveries between the order was created and the goods being delivered to the customer’s door.

The delivery route problem has been one of the limelight in the supply chain and logistics world as each of the operations would have different issues and constraints that require to be considered. By this matter, an advanced algorithm would be required to optimize the route plans that would be established for the business’s operations. Several algorithms have been applied for the problems including, but not limited to, Travelling Salesman Problem (TSP), Dynamic Programming, and Genetic Algorithm. Each of these algorithms, however, has several advantages and disadvantages which causes each of the algorithms may not be compatible with each of the unique routing problems.

Problem Statement

In this project, several algorithms would be demonstrated to optimize the solution of a Capacitated Vehicle Routing Problem (CVRP) including Genetic Algorithm, Travelling Salesman Problem, and Nearest Neighbor. The project may benefit the business stakeholders to comprehend the benefits and trade-offs of several algorithms to a specific case of a routing problem. To compare the algorithms that would be deployed in the project, The computational time and the cost evaluation by the total distance for each route plans generated by the algorithms.

Specifically, the cost evaluation would be performed by comparing each of the total distances generated by each algorithms to the upper limit and the lower limit of the cost evaluation. The upper limit would be the total distance cost that would be generated by a naive method, in which a method to generate the route plans randomly without considering the efficiency of the route that would be generated. This would be helpful for the users as it could compare the total distance generated by the algorithms with the total distance generated by any random route plans. Separately, the lower limit would be the true optimal that has been generated from the library data set. The purpose of this limit is to discern whether the algorithm would exceed the performance of the route optimization that has been conducted and provided in the library of the data sets. To catch a glimpse of the of the Capacitated Vehicle Routing Problem Library, the snapshot of the library would be seen as below. Note that the Opt column is the indication whether the total distances generated in UB were the true optimal from the library.

knitr::include_graphics("asset/library_snapshot.PNG")

Purpose of The Project

The output of this project is to establish a dashboard analysis that provides the visualization of a route plan that is developed by each of the utilized algorithms. The trade-offs of the computational time along with the cost evaluation would also be visualized. Further, the users would also be able to determine the input of orders and then simulate the routing plan to see the differences of each algorithm.

Project Scope

The project is scoped by the use of data sets acquired from the Capacitated Vehicle Routing Problem Library³ which is an open-source library to acquire several data sets of CVRP problems. The data sets are mainly divided into two groups including the .vrp files (including the information of coordinates, depot coordinate, vehicle capacity, and demand for each coordinate) and the .sol files (true optimal of the solution). The data set that would be used within this project is under the license by Uchoa et al. (2017)⁴ which consists of several instances of CVRP ranging from 101 nodes to 1001 nodes of locations and would only have a single depot for each instances.

Note that by the scope of the project is focused on Capacitated VRP, other constraints such as time-delivery window, multiple depots, and varieties of products would not be included in the project. Moreover, the Euclidean distances that would be used to measure the total distances generated would also become the limitation of this project. This is due to the measurement of a driving distance in a real case scenario would not be simply measured by a straight line. To attain the accurate measurements of the driving distances, several APIs such as API Key Google Maps⁵ could be used in R software which could be seen in the R Documentation⁶ of gmapsdistance.

Proposed Business Impact

By having an optimized route planning for the goods delivery, it would not only increase efficiency within the logistics operations and the last-mile delivery costs but also create a sustainable operation where the goods are traveled with the shortest distances by the optimization. In other words, the shortest distances of the deliveries would reduce the costs as well as the carbon footprint that has been created by the delivery operations. Thus, it would also align with the three bottom-line aspects of a business including Economy, Social, and Sustainability.

Target User of The Project

The target users for this project are the stakeholders of a business that concerns about the efficiency and the sustainability of their logistics operations. As the logistics problems are highly complex due to their ties to many elements in the business, the stakeholders should certainly comprehend the significance of the logistics efficiency. The benefit that could be acquired within this project is by giving a fundamental understanding of how a route optimization algorithm, such as Vehicle Routing Problem, could be implemented effectively within their business to vamp their operations and meet their bottom-lines

Relevance to other Industries

Despite the limitations of the problem, Capacitated Vehicle Routing Problem is one of the common issue and fundamental to route optimization problems. It could be implemented within various industries including, but not limited to, e-commerce, banking, aquaculture, etc. It also could also be implemented in a real case scenario such as truck delivery planning, KYC operations in banking, and so forth. In other words, the route optimization case that would be presented in the project could be highly implemented across industries

Data Wrangling

Reading the Data and Libraries

Previously, the data has been prepared by extracting the information of the parameter of each CVRP case and the locations and demands for each nodes from .vrp files and .sol files. The extraction code would be shown in the hidden code-chunk below.

## Parameter for n106_k14
#
#n106_k14 <- read.delim("data_input/X-n106-k14.vrp", header = F)
#
#n106_k14_meta <- n106_k14[1:6,] %>% select(-V3) %>% 
#  mutate(V1 = str_replace(V1, ":", "")) %>% 
#  t() %>% as.data.frame() 
#
#rownames(n106_k14_meta) <- NULL
#
#colnames(n106_k14_meta) <- n106_k14_meta[1,]
#
#n106_k14_meta <- slice(n106_k14_meta, 2)
#
## Parameter for n115_k10
#
#n115_k10 <- read.delim("data_input/X-n115-k10.vrp", header = F)
#
#
#compiled_meta <- rbind(n106_k14_meta, n115_k10[1:6, "V2"]) %>% 
#  clean_names()
#
## Parameter for n153_k22 
#
#n153_k22 <- read.delim("data_input/X-n153-k22.vrp", header = F)
#
#compiled_meta[3,] <- n153_k22[1:6, "V2"]
#
## Parameter for n186_k15
#
#n186_k15 <- read.delim("data_input/X-n186-k15.vrp", header = F) 
#
#compiled_meta[4,] <- n186_k15[1:6, "V2"]
#
## Parameter for n190_k8
#
#n190_k8 <- read.delim("data_input/X-n190-k8.vrp", header = F)
#
#compiled_meta[5,] <- n190_k8[1:6, "V2"]
#
## Extracting True Optimal
#
#sol_n106 <- read.delim("data_input/X-n106-k14.sol", header = F)[-c(1:14),] %>% 
#  str_replace("Cost ", "")
#
#sol_n115 <- read.delim("data_input/X-n115-k10.sol", header = F)[-c(1:10),] %>% 
#  str_replace("Cost ", "")
#
#sol_n153 <- read.delim("data_input/X-n153-k22.sol", header = F)[-c(1:23),] %>% 
#  str_replace("Cost ", "")
#
#sol_186 <- read.delim("data_input/X-n186-k15.sol", header = F)[-c(1:15),] %>% 
#  str_replace("Cost ", "")
#
#sol_190 <- read.delim("data_input/X-n190-k8.sol", header = F)[-c(1:8),] %>% 
#  str_replace("Cost ", "")
#
#
#true_optimal <- c(sol_n106, sol_n115, sol_n153, sol_186, sol_190)
#
#compiled_meta <- compiled_meta %>% 
#  mutate(true_optimal = true_optimal)
#
#write.csv(compiled_meta, "data_input/compiled_meta.csv")
#
## Extracting Node and Demand
#
#n106_node <- slice(n106_k14, -c(1:7)) %>% 
#  slice(-c(107:217)) %>% 
#  rename(node = "V1", x1 = "V2", x2 = "V3") %>% 
#  mutate(demand = slice(n106_k14, 115:220)$V2,
#         instance = compiled_meta[1, "name"]) %>% 
#  select(instance, everything()) %>% 
#  imap_dfc(~ if(is.character(.x) & .y != "instance") as.integer(.x) else .x)
#
#n115_node <- n115_k10 %>% slice(-c(1:7)) %>% 
#  slice(-c(116:235)) %>% 
#  rename(node = "V1", x1 = "V2", x2 = "V3") %>% 
#  mutate(demand = slice(n115_k10, 124:238)$V2,
#         instance = compiled_meta[2, "name"]) %>% 
#  select(instance, everything()) %>% 
#  imap_dfc(~ if(is.character(.x) & .y != "instance") as.integer(.x) else .x)
#  
#
#n153_node <- n153_k22 %>% slice(-c(1:7)) %>% 
#  slice(-c(154:311)) %>% 
#  rename(node = "V1", x1 = "V2", x2 = "V3") %>% 
#  mutate(demand = slice(n153_k22, 162:314)$V2,
#         instance = compiled_meta[3, "name"]) %>% 
#  select(instance, everything()) %>% 
#  imap_dfc(~if(is.character(.x) & .y != "instance") as.integer(.x) else .x)
#
#n186_node <- n186_k15 %>% slice(-c(1:7)) %>% 
#  slice(-c(187:377)) %>% 
#  rename(node = "V1", x1 = "V2", x2 = "V3") %>% 
#  mutate(demand = slice(n186_k15, 195:380)$V2,
#         instance = compiled_meta[4, "name"]) %>% 
#  select(instance, everything()) %>% 
#  imap_dfc(~if(is.character(.x) & .y != "instance") as.integer(.x) else .x)
#
#n190_node <- n190_k8 %>% slice(-c(1:7)) %>% 
#  slice(-c(191:385)) %>% 
#  rename(node = "V1", x1 = "V2", x2 = "V3") %>% 
#  mutate(demand = slice(n190_k8, 199:388)$V2,
#         instance = compiled_meta[5, "name"]) %>% 
#  select(instance, everything()) %>% 
#  imap_dfc(~if(is.character(.x) & .y != "instance") as.integer(.x) else .x)
#
#compiled_node <- rbind(n106_node, n115_node, n153_node, n186_node, n190_node) %>% 
#  mutate(instance = as.factor(instance))
#
#write.csv(compiled_node, "data_input/compiled_node.csv")

Note that there are 5 instances that have been selected for this project including X-n106-k14, X-n115-k10, X-n153-k22, X-n186-k15, X-n190-k8. Likewise, the numbers succeeding the ‘k’ in instance names represents the number of vehicle that would be available to each problem. To visualize the number of vehicles that would be used on each problem, the value of ‘k’ would then be extracted from the case names.

library(tidyverse)
library(janitor)
library(gridExtra)
library(scales)

params <- read_csv("data_input/compiled_meta.csv") %>% select(-c(...1, comment))
  
params <- params %>% 
  mutate(num_of_k = str_sub(params$name, start = -2, end = -1)) %>% 
  mutate(num_of_k = as.integer(str_replace(num_of_k, "k", "")))
  
node <- read_csv("data_input/compiled_node.csv") %>% 
  select(-...1)

params

The data set consists of two data frames namely params and node. In params, the information that is included are as follow:

name : the name of the instance/case of CVRP
type : the type of VRP
dimension : number of nodes for each instances
edge_weight_type : the type of distance calculation
capacity : the capacity of each vehicle in each cases
true_optimal : the optimal solutions (by sum of distance) for each cases

node %>% head()

Separately, the node data frame consists of a compiled data of locations and demands for each instances. The descriptions are as follow:

instance : name of the case
node : node number ID
x1 & x2 : node location
demand : number of demand for each locations

Likewise, the node 1 could be considered as the depot location as there are no demand for these nodes.

node %>% filter(node == 1)

Exploratory Data Analysis

In this section, the data would be explored by several visualizations to comprehend the overall insights of the data contents. The visualizations would be divided by the parameters of the data sets and the grid of each instances.

Parameter EDAs

In the parameter of the data sets, the data would be visualized to explore each of the parameters or constraints that would be used in the CVRP problems for each instances. The constraints that are identified are number of vehicles ‘k’, number of dimensions and nodes, the capacity of each vehicles, and the true optimal of the solution based from the .sol files of the data sets.

grid.arrange(
  params %>% ggplot(aes(x = reorder(name, num_of_k), y = num_of_k)) +
    geom_col() + 
    scale_y_continuous(labels = comma) +
    coord_flip() + labs(title = "Number of Vehicle",
                                     x = NULL,
                                     y = "Unit(s)"),
  
  params %>% ggplot(aes(x = reorder(name, dimension), y = dimension)) +
    geom_col() + 
    scale_y_continuous(labels = comma) +
    coord_flip() + labs(title = "Number of Nodes",
                                     x = NULL,
                                     y = "Unit(s)"),
  
  params %>% ggplot(aes(x = reorder(name, capacity), y = capacity)) +
    geom_col() + 
    scale_y_continuous(labels = comma) +
    coord_flip() + labs(title = "Vehicle Capacity",
                                    x = NULL,
                                    y = "Demand Unit(s)"),
  
  params %>% ggplot(aes(x = reorder(name, true_optimal), y = true_optimal)) +
    geom_col() + 
    scale_y_continuous(labels = comma) +
    coord_flip() + labs(title = "True Optimal",
                                     x = NULL,
                                     y = "Units of Distance")
)

The visualizations above are the compilation of information based on the extraction of parameters for each instances. It is shown that the highest numbers of vehicles, dimension, vehicle capacity, and true optimal are 22 vehicles, 190 nodes, 900 (approximation), and 27,000 (approximation) distance unit respectively. These parameters would be used and assigned to each problem to determine the optimized route for delivery case.

Plotting The Demand

In this section, The demand distribution for each nodes on each instances would be visualized by the use of histogram.

node %>% filter(node != 1) %>% 
  ggplot(aes(x = demand)) +
  geom_histogram() +
  facet_wrap(~ instance)

Interestingly, the plots shows different patterns of demand distributions for each instances. For example, the demand distributions for “X-n106-k14” and “X-n186-k15” have a range of demand distribution between 50 and 100. On the contrary, the other instances are more focused on the range between 0 to 12.5. The different demands would likely be contributed to plan of routes that would be established by the algorithm as the capacity of the vehicle would constrain each vehicle to deliver all the goods in a single dispatch.

Plotting The Coordinates

In this section, the location of each nodes for each instances would be visualized. This is performed to explore the overall distribution of locations for each instances. To map the location, the visualization is created by extracting the x1 and x2 variable from the node data frame. Note that node 1 for each instances would be considered as the depot or warehouse for the vehicle to dispatch. This is due to the 0 value of node 1 demand in each instances.

node %>%
  ggplot() +
  geom_point(aes(x1, x2)) +
  geom_point(data = node %>%
               filter(node == 1), aes(x1, x2), colour = "red", size = 3) +
  facet_wrap(~instance)

The plot shows the different patterns for each instances. Interestingly, each of the depot (shown as the red dots) and the distribution of locations are different among the instances. Moreover, the node locations for each instance are divided by the patterns as “X-n186-k15” and “X-n115-k10” have a more dispersed location distribution than the rest of the instances. The distance of each node locations to the depot would likely affect the optimal numbers of the solutions as the solutions for the problem is to find the shortest route for each deliveries.

Proposed Dashboard Design

With the business case and the data sets explained above, this section would further describe the contents of the dashboard that is proposed. The dashboard would consist of 5 main pages including Home, EDAs, Simulation, Evaluation, and About Me sections.

Home

knitr::include_graphics("asset/home.PNG")

In the Home page, a brief descriptions of the Rise of E-Commerce and the Delivery Routing Problem in Supply Chain and Logistics World would be provided. This main page has a purpose to convey a brief introduction towards other pages and to rise the concerns regarding a route optimization.

Descriptive Analysis Page

knitr::include_graphics("asset/eda1.PNG")

knitr::include_graphics("asset/eda2.PNG")

In this page, several exploratory visualizations of the data sets would be provided including the parameter of each cases, the demand distributions, and the location points of each cases. The purpose of this page is to deliver the users regarding the overall information of the data sets that would be used.

CVRP simulation

knitr::include_graphics("asset/simulation.PNG")

In the simulation page, the utilized algorithms for this project would be simulated by each instance. This would give the purpose to convey the users about the route plans that would be generated for each instances by each algorithm. Likewise, the viewers could also filter the route that would be generated to see the specific path for each route.

Evaluation Page

knitr::include_graphics("asset/eval.PNG")

On the evaluation page, the users could then see the evaluation of each algorithm and then be compared. The comparison of the algorithm would be based on the computational time of each algorithm and the errors of the distances generated by each algorithm to the true optimal of each instances.

About Me and Reference

knitr::include_graphics("asset/about_me.PNG")

This page would be the last page of the dashboard as this page would describe several informations about the creator of the dashboard and references of other route optimization problems that could be explored for further studies.

Reference

eMarketer: Worldwide ecommerce continues double-digit growth following pandemic push to online ↩︎
World Economic Forum: The Future of the Last-Mile Ecosystem ↩︎
Capacitated Vehicle Routing Problem Library ↩︎
Uchoa, E., Pecin, D., Pessoa, A., Poggi, M., Vidal, T., & Subramanian, A. (2017). New benchmark instances for the capacitated vehicle routing problem. European Journal of Operational Research, 257(3), 845-858.↩︎
API Key Google Maps ↩︎
R Documentation: ‘gmapsdistance’↩︎