Introduction

All documentation and source code for this article is available at my github repository .

Background

Have you ever wondered how does your favorite snack arrived at your local market? Or how does your brand new shoes that you bought online delivered to your front door? Everything that you see at the store or delivered to you has gone through a long long journey, since your brand new shoes still in form of a pile of leathers and cottons until they come before your eyes in such a beatiful shape. This long journey can be described in a network, called the supply chain. The supply chain describe the flow of materials and goods from the first resource, the supplier, until it reach the customer. There are many variations of supply chain, one example is the following graph.

knitr::include_graphics("asset/supply chain.png")

Let’s say you just bought a new smartphone. Before it became a product, its just a collection of materials and electronics. All materials from various suppliers is collected into a production plant or the factory where the product is made. After the smartphone has been produced and packaged, ready for sale, they have to be delivered into one or more distribution center. This distribution center is functioned as a warehouse to store the product before the they are shipped into various retail store. Only after the product has arrived at the retail store, you can finally see it and buy it. As you can see, there is a lot of process that need to be managed by various parties (suppliers, the producer, the warehouse manager) in order to deliver your favorite smartphone in time so they can make profit and make you happy. What would happened if the delivery processes are not managed properly? The product may come late because the truck that deliver the goods may go through the long path instead of the short one. This can also increase the cost of delivery, which in turn will increase the phone price, which in turn may lower your intention to buy the product and so the producer will not make a profit. Managing the delivery process is one of the most challenging task in supply chain management.

In this article, we will focus on managing the delivery process from a distribution center to various retail stores. How can we distribute all goods to all available shops with the shortest possible amount of distance/time? Since our concern is to reach an optimal amount of distance/time, this problem belongs to an optimization problem. The problem of managing route and visit all customers belong to the Traveling Salesman Problem (TSP). The following figures represent the TSP.

knitr::include_graphics("asset/tsp.png")

Each node or point represent a single location, be it a city or a house or a store and the line represent the route from a point to the next one.

Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?

So, if we have a warehouse in Seattle and want to distribute our product to a list of cities around the United States via land transportation, what is the shortest possible path? However, we have only a single route from the graph, which means that we have only a single truck to deliver all of the product. This seems impractical in a real world scenario where company always have a fleet of trucks to deliver the product. Thus, we should have more than a single route because we have more than a single truck. This variant of TSP belong to a problem called the Vehicle Routing Problem. Vehicle Routing Problem (VRP) involves the service of a set of customer with known demands by a fleet of vehicles from a single distribution center. The objective of the VRP is to minimize the total distance and the number of vehicles which start and end their tours at the central depot. In this article, we will focus on a specific problem of VRP called the Capacitated Vehicle Routing Problem and how to solve it using the Nearest Neighbour Algorithm and Genetic Algorithm.

Objective

This article is dedicated to give my best explanation for:

Capacitated Vehicle Routing Problem (CVRP)
Nearest Neighbour Algorithm and its application in CVRP
Genetic Algorithm and its application in CVRP

Library and Setup

The algorithm is written in R programming language and requires the following package if you wish to reproduce the code.

library(tidyverse)
library(ggrepel)
library(GA)

General Concept

Capacitated Vehicle Routing Problem

Capacitated Vehicle Routing Problem (CVRP) is a variant of VRP where a fleet of homogeneous vehicles has to serve a set of customers. Each customer can be visited more than once contrary to what is usually assumed in the classical vehicle routing problem (VRP). No constraint on the number of available vehicles is considered. There is a single depot for the vehicles and each vehicle has to start and end its tour at the cross depot. The objective is to find a set of vehicle routes that serves all the customers such that the sum of the quantities delivered in each tour does not exceed the capacity of the vehicle and the total distance travelled is minimized.

The following graph is an example of CVRP.

The CVRP has its own unique challenge because we have to manage a number of vehicle with limited carrying capacity and minimize their total travel distance. It is known that the CVRP is an NP-hard problem in which its real-life applications are considerably large in scale and finding the optimal solution of an instance is very hard and requires very long computational time. Therefore, metaheuristics are often more suitable for practical applications and have been applied for CVRP to find a near optimal solution in a reasonable amount of time.

Nearest Neighbour Algorithm

Nearest Neighbour Algorithm is a simple algorithm where it will look for the closest node for the next destination. Instead of enumerating all possibilities, nearest neighbour only concern with the closest neighbour, node that has the shortest distance from the current position. The distance between node can be calculated using Euclidean Distance on the node coordinate (will be discussed further below). For CVRP, there are 2 variant of nearest neighbour algorithm that can be employed to solve the problem¹:

Algorithm A

The first algorithm search for the Hamiltonian Cycle² and then split the route into several sub-route according to the number of vehicle.

A Hamiltonian cycle is a closed loop on a graph where every node (vertex) is visited exactly once. A loop is just an edge that joins a node to itself; so a Hamiltonian cycle is a path traveling from a point back to itself, visiting every node en route.

The step-by-step for this algorithm is as follows:

Use Nearest Neighbour Algorithm to find the giant path (Hamiltonian Cycle), start from the depot and also finish at the depot.
Route new vehicle on giant path serving the various nodes of the path till its capacity exhausted.
Route vehicle back to the depot station.
Calculate the total cost incurred for all vehicles.

Below is the illustration of the first algorithm.

Algorithm B

The second algorithm assign individual vehicles to make a route until its capacity is exhausted and go back to the depot.

The step-by-step for this algorithm is as follows:

Route new vehicle using nearest neighbor strategy, starting from depot till its capacity exhausted.
Route back the vehicle to CD. If the demand is partially fulfilled, it will be visited again by the next vehicles.

3: Calculate the total cost incurred for the vehicle.

Below is the illustration of the second algorithm.

According to the research paper, the second algorithm is proven to be better. We will explore them further in later section.

Genetic Algorithm

To compare the performance of Nearest Neighbour Algorithm, we will use a general purpose, meta-heuristic method called Genetic Algorithm. Compared to a heuristic method, which is only work for specific problem, Genetic Algorithm is more flexible and can handle various problem. Genetic Algorithm has been proven to be able to solve CVRP³.

Genetic Algorithm is an optimization algorithm that use the concept of evolution by natural selection. Evolution by natural selection, as proposed by Charles Darwin, is the mechanism on how many varieties of living things are adapting to their environment to survive, through 2 main principles: natural selection and random mutation. The genetic algorithm (GA) borrowed the concept and use it to gain optimal solutions by selecting the best or the fittest solution alongside the rare and random occurence of mutation. The detail and another illustrattive case of Genetic Algorithm can be found at my another post⁴. Below is the general flow of Genetic Algorithm:

There are several advantages of Genetic Algorithm:

Genetic Algorithm can escape local optima
Can handle complex problem with many decision variables and big solution space
It considers many points in the searching space simultaneously, not a single point, to deal with large parameter spaces.
Can be parallelized since GA consists of multiple chromosomes/solutions
Support multi-objective optimization

Case Study

Data

The data come from Christofides and Eilon⁵. The datasets consists of 13 instances or problems. For illustration, we will load the instance E-nn2-k4, which means that it is from set E and the data consists of 22 nodes/points and 4 vehicles. The first node is the depot/warehouse and the rest is the customer. The location of each node is represented in cartesian coordinate with x1 and x2. The demand for each node is also mentioned.

route <- read.csv("data_input/E-n22-k4.csv")
route

We will visualize the position of the depot and the customer. The red dot represent the depot or station center. Our job is how to distribute our fleet of 4 vehicles into 21 different locations, each with their own respective demand. Our goal is to create distribution routes that will cover all customer with minimum distance.

Objective Function

The objective function for this problem is to minimize the total travel distance.

\[ min\ Z = \Sigma_{i=1}^{j=N} D_{ij}\]

\(D_{ij}\) = distance between point \(i\) to point \(j\)

Distance

Since we want to minimize the distance, we will calculate the distance between locations with Euclidean Distance.

\[D_{ij} = \sqrt{(x_{1i}-x_{1j})^2 + (x_{2i}-x_{2j})^2}\]

Below is the sample of distance matrix between locations.

route_distance <- dist(route[ , 2:3], upper = T, diag = T) %>% as.matrix()
head(route_distance)

##          1         2         3         4         5        6        7         8
## 1  0.00000 49.365980 48.083261 41.785165 40.718546 36.71512 31.01612 31.384710
## 2 49.36598  0.000000  8.544004 23.259407 25.942244 20.80865 18.68154 24.166092
## 3 48.08326  8.544004  0.000000 29.832868 32.280025 14.56022 19.84943 19.104973
## 4 41.78516 23.259407 29.832868  0.000000  2.828427 33.73426 17.88854 33.241540
## 5 40.71855 25.942244 32.280025  2.828427  0.000000 35.35534 18.97367 34.481879
## 6 36.71512 20.808652 14.560220 33.734256 35.355339  0.00000 17.02939  5.385165
##          9       10       11       12       13       14        15       16
## 1 24.18677 27.65863 17.26268 23.34524 11.18034 16.03122  7.071068 20.24846
## 2 26.57066 30.46309 32.14032 40.22437 47.26521 54.62600 56.222771 57.48913
## 3 27.80288 25.31798 31.01612 43.13931 44.10215 55.75841 54.571055 53.23533
## 4 19.20937 37.58989 28.42534 23.08679 45.22168 40.01250 48.703183 57.20140
## 5 19.10497 38.48376 28.28427 21.00000 44.82187 38.01316 47.539457 56.85068
## 6 22.47221 11.00000 21.21320 38.48376 30.80584 47.38143 42.544095 39.01282
##          17       18       19       20       21       22
## 1  9.848858 22.09072 29.06888 30.52868 31.62278 33.54102
## 2 58.855756 71.11259 72.18033 78.16009 79.10120 82.87340
## 3 57.870545 69.05071 68.18358 78.00000 76.10519 81.49233
## 4 49.244289 63.32456 69.83552 65.00769 73.38937 72.56032
## 5 47.801674 61.98387 69.11584 63.00794 72.23573 70.85901
## 6 46.529560 56.32051 54.00926 67.23095 62.51400 69.28925

Optimization

We will now compare the performance of Nearest Neighbour with the Genetic Algorithm.

Nearest Neighbour

Algorithm A

On algorithm A, we need to define the Hamiltonian Cycle first.

source("script/nearest_a.R")

best_route <- list(total_distance = Inf)

for (i in 1:10^4) {
  try_route <- giant_path(demand = route$demand, distance =  route_distance)
  
  if (best_route$total_distance > try_route$total_distance) {
    best_route <- try_route
  }
}

best_route

## $route
##  [1]  1  9  7 11 10  8  6  3  2  4  5 12 14 17 15 13 16 19 21 18 22 20  1
## 
## $total_distance
## [1] 300.8411

Let’s visualize the giant path.

Next, we just need to divide the giant into 4 separate routes that satisfy the capacity constraint.

nearest_result <- assign_vehicle(x = best_route$route, demand = route$demand, capacity = 6000, distance = route_distance)
nearest_result

## $distance
## [1] 449.8543
## 
## $route
##  [1]  1  9  7 11 10  8  6  3  1  2  4  5 12 14  1 17 15 13 16 19  1 21 18 22 20
## [26]  1
## 
## $num_vehicle
## [1] 4

The total distance for this problem according to Algorithm A is 449. We can visualize the route for each vehicle.

The result is still quite messy and perhaps is not the optimal solution. Next, we will try the algorithm B and compare the result.

Algorithm B

We will use the second nearest neighbour algorithm. Remember that in algorithm B, we will directly search the closest path for individual vehicle.

source("script/nearest_b.R")

The starting point is randomized. So, in order to acquire the best solution, we will iterate the algorithm 10,000 times.

best_route <- list(total_distance = Inf)

for (i in 1:10^4) {
  try_route <- nearest_b(demand = route$demand, distance = route_distance, capacity = 6000)
  
  if (best_route$total_distance > try_route$total_distance & try_route$vehicle_num <= 4) {
    best_route <- try_route
  }
}

best_route

## $vehicle_num
## [1] 4
## 
## $route
##  [1]  1 10  8  6  3  2  7  9 11  1 11 13 16 19 21 18  1  4  5 12 14 17  1 18 22
## [26] 20 17 15  1
## 
## $total_distance
## [1] 410.5266

From the result, we can see that the total distance is 410 with 4 vehicles. Thus, algorithm B can create a shorter path compared to algorithm A. Several location is visited twice. We can visualize the route into following graph.

The route is look better than the previous one.

Genetic Algorithm

Before we run the Genetic Algorithm, first we need to setup the parameter. Genetic Algorithm will run with population size of 100 for 100,000 iterations/generations to ensure the solution convergence. The mutation probability is set to 0.1 while crossover probability is 0.8. The mutation is done using swap/exchange mutation method while for the crossover method I use the order crossover.

The encoding for the solution is in Permutation Encoding with the following scheme:

\[ 2, 3, 4, ..., N\]

with \(N\) is the number of all available node.

The encoding will be translated into the traveling routes via the following steps:

Add 1 (the depot) at the start and the end of the route.
Go through the route and calculate the total demand along the way.
If the total demand is more than the vehicle capacity, the vehicle will go back to depot and the next destination will be served by the next vehicle.
Calculate the total distance from all vehicles.

The limitation of this method is that it assumed that each customer can only be visited once. After we acquire the routes, we will calculate the total distances. In order to speed up the convergence, we will insert the giant path from Nearest Neighbor Algorithm as suggestions for the initial population. The population size is 100, so we will also create 100 suggestions in form of matrix.

suggestion_route <- matrix(ncol = (max(route$node)-1), nrow = 100)
  
  for (i in 1:100) {
    try_route <- giant_path(demand = route$demand, distance =  route_distance)
    
    suggestion_route[i, ] <- try_route$route[2:max(route$node)] %>% as.numeric()
  }

head(suggestion_route)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]   18   21   19   16   13   15   17   14   12     9     7    11    10     8
## [2,]   12    9    7   11   10    8    6    3    2     4     5    14    17    15
## [3,]   20   22   18   21   19   16   13   15   17    14    12     9     7    11
## [4,]   20   22   18   21   19   16   13   15   17    14    12     9     7    11
## [5,]   16   13   15   17   18   21   19   22   20    14    12     9     7    11
## [6,]   21   18   22   20   17   15   13   16   19    14    12     9     7    11
##      [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## [1,]     6     3     2     4     5    20    22
## [2,]    13    16    19    21    18    22    20
## [3,]    10     8     6     3     2     4     5
## [4,]    10     8     6     3     2     4     5
## [5,]    10     8     6     3     2     4     5
## [6,]    10     8     6     3     2     4     5

We run the Genetic Algorithm with the mentioned parameter.

source("script/fitness.R")

ga_route <- ga(type = "permutation", fitness = fitness, capacity = 6000, demand = route$demand, distance = route_distance, 
               lower = 2, upper = max(route$node), 
               mutation = gaperm_swMutation, popSize = 100, pmutation = 0.1, maxiter = 5 * 10^4, 
               suggestions = suggestion_route, monitor = F, seed = 123)

summary(ga_route)

## -- Genetic Algorithm ------------------- 
## 
## GA settings: 
## Type                  =  permutation 
## Population size       =  100 
## Number of generations =  50000 
## Elitism               =  5 
## Crossover probability =  0.8 
## Mutation probability  =  0.1 
## Suggestions = 
##       x1 x2 x3 x4 x5 x6 x7 x8 x9 x10  ...  x20 x21
## 1      6  8 10 11  9  7  4  5 12  14         3   2
## 2      8  6 10 11  9  7  4  5 12  14         3   2
## 3     21 18 22 20 17 15 13 16 19  14         4   5
## 4     19 21 18 22 20 17 15 13 16  10         5   4
## 5     16 13 15 17 18 21 19 22 20  14         4   5
## 6      8  6 10 11  9  7  4  5 12  14         3   2
## 7      9  7 11 10  8  6  3  2  4   5        22  20
## 8     15 17 18 21 19 16 13 11  9   7        20  22
## 9     18 21 19 16 13 15 17 14 12   9        20  22
## 10    20 22 18 21 19 16 13 15 17  14         4   5
##  ...                                              
## 99    15 17 18 21 19 16 13 11  9   7        20  22
## 100    2  3  6  8 10 11  9  7  4   5        22  20
## 
## GA results: 
## Iterations             = 50000 
## Fitness function value = -375.2798 
## Solutions = 
##      x1 x2 x3 x4 x5 x6 x7 x8 x9 x10  ...  x20 x21
## [1,] 10  8  6  3  2  7 13 16 19  21         9  11
## [2,] 10  8  6  3  2  7 13 16 19  21         9  11
## [3,]  7  2  3  6  8 10 13 16 19  21         9  11
## [4,]  7  2  3  6  8 10 13 16 19  21         9  11

We can also see the progress on the algorithm by visualizing its fitness value across generations.

plot(ga_route)

The total distance that we can achieve with Genetic Algorithm is 375, far lower than the Nearest Neighbour Algorithm. There are several solutions that can be used to reach the minimum total distance. Let’s check if there is any duplicated solution.

any(ga_route@solution %>% as.matrix() %>% duplicated(.))

## [1] FALSE

There is no duplicated solution, meaning that each solution is equally feasible to be used as the solution. Here is the interpretation of the first solution after we transform the encoding into real travel route.

solution_explain <- fitness_explain(ga_route@solution[1, ], capacity = 6000, 
                                    distance = route_distance, demand = route$demand)

solution_explain

## $route
##  [1]  1 10  8  6  3  2  7  1 13 16 19 21 18  1 17 20 22 15  1 14 12  5  4  9 11
## [26]  1
## 
## $total_distance
## [1] 375.2798
## 
## $vehicle_num
## [1] 4
## 
## $total_demand
## [1] 5600 5900 5600 5400

The route for each vehicle would be:

Vehicle 1: 1 -> 10 -> 8 -> 6 -> 3 -> 2 -> 7 -> 1
Vehicle 2: 1 -> 13 -> 16 -> 19 -> 21 -> 18 -> 1
Vehicle 3: 1 -> 17 -> 20 -> 22 -> 15 -> 1
Vehicle 4: 1 -> 14 -> 12 -> 5 -> 4 -> 9 -> 11 -> 1

The total demand fulfilled for each vehicle is close to the vehicle capacity. We can visualize the result.

Conclusion

Genetic Algorithm can achieve better solution than Nearest Neighbour alone for the first instance problem. However, we should also check the performance on another problem as well. After we have done several testing toward all of the problems instance, the following table shows the performance of each algorithm toward each instance problem compared to the true optimal solution. NA value indicate that no feasible solution is found by the algorithm.

The true optimal shows the actual optimal value and available for each instance problem, which serves as a benchmark for any researcher who wish to create a new method and compare their performance either with the optimal value itself or with other previous methods.

The Nearest Neighbour with Algorithm A (draw the giant path first) is not able to generate feasible solution for most of instances. This is because the algorithm give a solution that violate the total number of vehicles. The closest destination may have higher total demand compared to next closest one, making the vehicle more frequently go back to the depot. Meanwhile, the Nearest Neighbour with Algorithm B is better than the first algorithm, although it still unable to achieve optimal solution. Both algorithms are fast to implement and only took seconds to complete.

Genetic Algorithm with nearest neighbour as the initial population has shown a great performance, achieving near optimality in some solutions even though Genetic Algorithm is a meta-heuristic method and does not specifically designed to solve this problem. With the increase of the problem size, 100,000 iterations may not be enough and need more iterations to converge. However, the main drawback of Genetic Algorithm is its computation time, which is far longer compared to the Nearest Neighbour. The algorithm computation time is heavily depends on the problem size and the number of iterations. This algorithm can still be improved in order to solve CVRP by changing the parameter, such as the selection, crossover and mutation methods, on which some researchers have designed an improved operator⁶. Improving Genetic Algorithm will create a better solution with faster convergence.

Capacitated Vehicle Routing Problem (CVRP) with Nearest Neighbour and Genetic Algorithm

Arga Adyatama

2/14/2020

Introduction

Background

Objective

Library and Setup

General Concept

Capacitated Vehicle Routing Problem

Nearest Neighbour Algorithm

Genetic Algorithm

Case Study

Data

Objective Function

Distance

Optimization

Nearest Neighbour

Algorithm A

Algorithm B

Genetic Algorithm

Conclusion

Reference