It’s one of the oldest problems in the field of optimization: a salesman wants to travel to a number of cities and then return back home. The million-dollar question is: How should he plan his route to minimize his time on the road? At first, this might sound like a trivial problem. Why not just calculate the distance of every possible route and then select the shortest one? Well, with just 20 cities there are already more than a quadrillion possible routes to travel on. And with every additional city the number of routes grows by another factor, so pretty soon you simply cannot go through every single route anymore. Instead of checking every possible route, we will need to work with a good rule of thumb, or heuristic. In this post you will learn about the basics of different heuristics and search algorithms that try to find good solutions in an acceptable amount of time.
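To get a feel for this growth, we can count the routes ourselves. With a fixed home city and symmetric distances there are \((n-1)!/2\) distinct round trips; the helper n_routes() below is just for illustration:
# number of distinct round trips for n cities
# (fixed home city, symmetric distances)
n_routes <- function(n) factorial(n - 1) / 2
n_routes(20)
## [1] 6.082255e+16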
Before we dive into the mathematics, let us write a few functions to simulate a bunch of cities and routes, which we can then plot on a canvas. To keep things simple, all of our cities will be located in a two-dimensional space between (-1,-1) and (1,1). We will name our cities just like Excel names its columns, i.e., the first city is called “A”, the second “B”, and so on until we reach “ZZ”.
# "not in" operator
'%!in%' <- function(x, y) !(x %in% y)
# extend the built-in LETTERS constant to run from "A" to "ZZ"
MORELETTERS <- c(LETTERS, sapply(LETTERS, function(x) paste0(x, LETTERS)))
# example
MORELETTERS[20:30]
## [1] "T" "U" "V" "W" "X" "Y" "Z" "AA" "AB" "AC" "AD"
Now, we will create a matrix with two columns in which we will store the x and y coordinates of our cities.
# function to simulate n cities with random coordinates
simCities <- function(n_cities = 5) {
  cities <- matrix(runif(2 * n_cities, -1, 1), ncol = 2)
  rownames(cities) <- MORELETTERS[1:n_cities]
  colnames(cities) <- c("x", "y")
  return(cities)
}
# example
cities <- simCities(n_cities = 10)
cities
## x y
## A -0.4689827 -0.588050850
## B -0.2557522 -0.646886495
## C 0.1457067 0.374045693
## D 0.8164156 -0.231792564
## E -0.5966361 0.539682840
## F 0.7967794 -0.004601516
## G 0.8893505 0.435237017
## H 0.3215956 0.983812190
## I 0.2582281 -0.239929641
## J -0.8764275 0.554890443
Next, we need a way of visualizing our cities. We can do this by simply calling the plot() function on our city matrix.
# function to plot cities
plotCities <- function(cities,
                       main = "",
                       bg = "white",
                       main_col = "black",
                       point_col = "deepskyblue") {
  par(bg = bg)
  plot(cities,
       pch = 16, cex = 3,
       col = point_col,
       ylim = c(-1, 1.1), xlim = c(-1, 1),
       yaxt = "n", xaxt = "n",
       ylab = "", xlab = "",
       bty = "n",
       main = main, col.main = main_col)
}
# example
plotCities(cities)
Since each city has a name, a route is really just a character vector containing these city names. Let’s write a function to generate random routes that start and end in city “A”.
# function to generate a random route
randomRoute <- function(cities) {
  start <- rownames(cities)[1]
  route <- sample(rownames(cities)[-1]) # visit the other cities in random order
  route <- c(start, route, start)       # return back home
  return(route)
}
# example
route <- randomRoute(cities)
route
## [1] "A" "J" "H" "F" "I" "B" "G" "C" "E" "D" "A"
Now we need a function to visualize this route. We can reuse our earlier plotCities() function and then add a series of line plots on top that show the route. Lastly, we will add a legend in the top left corner of the diagram that shows the distance of the route; more on that later.
# function to plot a route
plotRoute <- function(cities,
                      route,
                      dist = NULL,
                      main = "",
                      bg = "white",
                      main_col = "black",
                      point_col = "deepskyblue",
                      start_col = "red",
                      line_col = "black") {
  # plot cities, highlighting the start city (the first row) in a different color
  city_colors <- c(start_col, rep(point_col, nrow(cities) - 1))
  plotCities(cities,
             bg = bg,
             main = main,
             main_col = main_col,
             point_col = city_colors)
  # plot the route as a series of line segments
  for(i in 2:(nrow(cities) + 1)) {
    lines(cities[c(route[i], route[i - 1]), ],
          col = line_col,
          lwd = 1.5)
  }
  # add the distance of the route in the legend
  if(!is.null(dist)) legend("topleft",
                            bty = "n", # no box around legend
                            legend = round(dist, 4),
                            bg = "transparent",
                            text.col = "black")
}
# example
plotRoute(cities, route)
Our plotRoute() function contains an argument called dist, which we will use to display the distance of a route. However, we haven’t yet defined a function that actually computes this distance. We’ll first define a function that computes the “adjacency matrix” of pairwise distances for our system of cities. With that in hand, we can easily define a function that computes the distance of any given route.
# function to compute the adjacency matrix of pairwise Euclidean distances
compAdjMat <- function(cities) return(as.matrix(dist(cities)))
# example
cities <- simCities(5)
adjmat <- compAdjMat(cities)
adjmat
## A B C D E
## A 0.0000000 0.4258592 0.5226517 1.471118 0.3714963
## B 0.4258592 0.0000000 0.3442195 1.121281 0.6019520
## C 0.5226517 0.3442195 0.0000000 1.388875 0.8386443
## D 1.4711179 1.1212806 1.3888754 0.000000 1.3763971
## E 0.3714963 0.6019520 0.8386443 1.376397 0.0000000
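Note that the matrix is symmetric and has zeros on its diagonal, since the distance from a city to itself is zero. A quick check:
# the distance matrix is symmetric with a zero diagonal
all(adjmat == t(adjmat))
## [1] TRUE
all(diag(adjmat) == 0)
## [1] TRUE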
# function to compute the total distance of a route
distRoute <- function(adjmat, route) {
  d <- 0
  # sum the distances of all legs, including the final leg back home
  for(i in 2:length(route)) {
    d <- d + adjmat[route[i - 1], route[i]]
  }
  return(d)
}
# example
route1 <- c("A","C","E","B","D","A")
distRoute(adjmat, route1)
## [1] 4.555647
route2 <- c("A","E","D","C","B","A")
distRoute(adjmat, route2)
## [1] 3.906848
This means that a round trip from A to C, to E, to B, to D, and back to A has a total distance of 4.56. The alternative route from A to E, to D, to C, to B, and then back to A has a distance of 3.91. The second route is therefore the better one of the two.
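Since our distances are symmetric, traversing a route in reverse should give exactly the same total distance, which makes for a quick sanity check of distRoute():
# sanity check: a route and its reverse are equally long
distRoute(adjmat, rev(route1))
## [1] 4.555647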
As mentioned before, the most obvious algorithm would be to go through every single possible route. In real-world applications this is, of course, not feasible. However, when judging the performance of an algorithm, it is always good to know what the optimal route looks like. And for a small number of cities, e.g., \(n=10\), we can easily work out every single possible route.
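For \(n=10\) cities with a fixed home city, that means \(9! = 362880\) candidate routes for the brute-force search below to evaluate, which R confirms quickly:
# number of candidate routes for 10 cities with a fixed start
factorial(9)
## [1] 362880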
# simulate ten cities
set.seed(1)
cities <- simCities(10)
# brute-force algorithm
bruteForce <- function(cities, sleep=.05) {
  # stop if there are too many cities
  if(nrow(cities) > 10) stop("Please don't use brute force for more than 10 cities!")
  # library for generating permutations
  library(combinat)
  start <- rownames(cities)[1]           # start city
  routes <- permn(rownames(cities)[-1])  # list of every possible route
  # compute adjacency matrix
  adjmat <- compAdjMat(cities)
  # initial route and distance
  best_route <- NULL
  min_d <- Inf
  # go through all possible routes
  for(i in 1:length(routes)) {
    route <- routes[[i]]
    route <- c(start, route, start)
    # distance of the current route
    new_d <- distRoute(adjmat, route)
    # if improvement, update distance and plot
    if(new_d < min_d) {
      min_d <- new_d
      best_route <- route
      plotRoute(cities, route, min_d)
      Sys.sleep(sleep)
    }
  }
  # return
  return(list(distance = min_d,
              route = best_route))
}
# example
bf <- bruteForce(cities)
As you can see, the algorithm starts at a non-optimal solution and then works its way through every possible route. Whenever it finds a shorter route, it updates its previous best and plots the new optimal route. The algorithm finishes once it has gone through every possible route. As we can see in the top left corner of our animation, the optimal route in our case has a length of 4.5775.
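To make that concrete, we can time a run without the animation delay (a quick measurement sketch using base R’s system.time(); the exact numbers depend on your machine):
# time the exhaustive search without the plotting delay
system.time(bruteForce(cities, sleep = 0))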
Obviously, even with just ten cities, going through every single route takes time. A surprisingly simple yet very powerful heuristic is the so-called “nearest neighbor algorithm”. Starting from our home city “A”, it simply travels to the nearest city and keeps doing so until it has visited all cities in the system, after which it returns back home to “A”. Of course, this algorithm will hardly ever yield the optimal solution. It is, however, extremely fast.
# nearest neighbor algorithm
nearestNeighbor <- function(cities) {
  # compute adjacency matrix
  adjmat <- compAdjMat(cities)
  # start of the route
  current_city <- rownames(cities)[1]
  route <- c(current_city)
  # repeatedly add the nearest unvisited city to the route
  for(i in 1:(nrow(cities)-1)) {
    distances <- adjmat[current_city, -which(rownames(cities) %in% route)]
    current_city <- names(which(distances == min(distances)))
    route <- c(route, current_city)
  }
  # add the last remaining city and the start point
  # (R drops the city name once only a single column is left,
  #  so the final city has to be appended separately)
  last <- which(rownames(cities) %!in% route)
  route <- c(route, rownames(cities)[last])
  route <- c(route, rownames(cities)[1])
  best_route <- route # only one route
  # compute the distance of the route
  min_d <- distRoute(adjmat, route)
  # plot route
  plotRoute(cities, route, min_d)
  # return
  return(list(distance = min_d,
              route = best_route))
}
# example
nearestNeighbor(cities)
## $distance
## [1] 4.793531
##
## $route
## [1] "A" "B" "I" "D" "F" "G" "C" "H" "E" "J" "A"
As we can see, the nearest neighbor algorithm finds a route of length 4.7935. This is slightly longer than the optimal route (4.5775), but the nearest neighbor algorithm found its route almost instantaneously.
The nearest neighbor heuristic already worked quite well. It is a so-called “greedy” algorithm: it always does what’s best in the moment. Computationally, this is a big advantage. However, the final route is heavily path-dependent on the algorithm’s first couple of choices, and in certain instances this can lead to very poor overall routes.
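One quick way to see this path dependence is to restart the heuristic from every city in turn. Since nearestNeighbor() always starts from the first row of the city matrix, we can probe this by reordering the rows (a small experiment sketch; the resulting distances depend on the simulated cities):
# run nearest neighbor once per possible start city
for(k in 1:nrow(cities)) {
  reordered <- cities[c(k, setdiff(1:nrow(cities), k)), ]
  res <- nearestNeighbor(reordered)
  cat("start:", rownames(cities)[k], "distance:", round(res$distance, 4), "\n")
}
Let us therefore have a look at a different approach. Let’s simply try a series of random routes and, after a certain number of iterations, choose the one with the shortest distance. Obviously, this will not be the best of algorithms.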
# random search algorithm
randomSearch <- function(cities, n_iter=1e3, sleep=.05) {
  # compute adjacency matrix
  adjmat <- compAdjMat(cities)
  # initial route and distance
  best_route <- NULL
  min_d <- Inf
  # loop
  for(i in 1:n_iter) {
    # random route
    route <- randomRoute(cities)
    # distance
    new_d <- distRoute(adjmat, route)
    # plot
    plotRoute(cities, route, new_d)
    Sys.sleep(sleep)
    # if improvement, update minimum distance and best route
    if(new_d < min_d) {
      min_d <- new_d
      best_route <- route
    }
  }
  # return
  return(list(distance = min_d,
              route = best_route))
}
# example
randomSearch(cities)
Unsurprisingly, the random search algorithm did not perform very well. It did, however, avoid the key problem of the nearest neighbor heuristic: being too path-dependent. Nonetheless, leaving everything up to chance is certainly not much better. So let’s try a second alternative called “2-opt”. This algorithm starts out with a random route and then checks whether it can rewire pairs of cities in its current route to achieve a shorter overall distance. If such a rewiring leads to an improvement, it accepts the new route as its new best and keeps doing this until it cannot find any further improvements. In our algorithm, we will include a vector called track_dist, which tracks how the total distance of our route evolves throughout the search.
# function to rewire a route: reverse the segment between positions i and k
# (the classic 2-opt move)
swap <- function(route, i, k) {
  new_route <- route[1:(i-1)]
  new_route <- c(new_route, route[k:i])                  # reversed segment
  new_route <- c(new_route, route[(k+1):length(route)])  # rest of the route
  return(new_route)
}
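For instance, applying the move to positions 3 through 5 of a route reverses just that stretch:
# example: reverse the segment between positions 3 and 5
swap(c("A","B","C","D","E","F","A"), i = 3, k = 5)
## [1] "A" "B" "E" "D" "C" "F" "A"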
# two-opt algorithm
twoOpt <- function(cities, sleep=.05) {
  # compute adjacency matrix
  adjmat <- compAdjMat(cities)
  # start with a random route
  best_route <- randomRoute(cities)
  min_d <- distRoute(adjmat, best_route)
  # variable tracking
  track_dist <- c()
  # while-loop to perform complete 2-opt swaps
  while(TRUE) {
    # record distance before looping through i and k
    old_d <- min_d
    # for-loops through values for i and k
    # (the last route position holds the return home and stays fixed)
    break_loop <- FALSE
    for(i in 2:(nrow(cities)-1)) {
      for(k in (i+1):nrow(cities)) {
        # perform swap
        route <- swap(best_route, i, k)
        new_d <- distRoute(adjmat, route)
        # update distance and plot
        if(new_d < min_d) {
          min_d <- new_d
          best_route <- route
          plotRoute(cities, route, min_d)
          Sys.sleep(sleep)
          break_loop <- TRUE
          break # break out of inner loop
        } # end if-statement
      } # close inner loop
      # break out of outer loop
      if(break_loop) break
    } # close outer loop
    # update variable tracking
    track_dist <- c(track_dist, min_d)
    # stop once a full pass over i and k yields no improvement
    if(old_d == min_d) break # break out of while loop
  } # close while loop
  # return
  return(list(distance = min_d,
              route = best_route,
              track_dist = track_dist))
}
# example
twoOpt(cities)
The algorithm looks a bit like someone untangling a ball of wool, and, as we can see, it actually manages to find the optimal solution of 4.5775.
Finally, let’s have a look at a technique called “simulated annealing”. The term derives from the process of annealing in metallurgy. Its whole purpose is to avoid bad path dependencies in our search for the optimal route. This is very similar to the issue of local extrema in any sort of optimization problem. Without a probabilistic element, an optimization algorithm like plain vanilla gradient descent will stop once it has converged to a local extremum. A simulated annealing algorithm, by contrast, will keep randomly trying alternative solutions. Depending on how bad an alternative solution is relative to the previously accepted solution, the algorithm will abandon the previously accepted solution with a certain probability.

Besides the difference in quality between the two solutions, the probability of acceptance also depends on how many iterations the algorithm has gone through so far. At first, the acceptance probability will be fairly high, but later on the algorithm typically only accepts alternative solutions if they offer great gains in performance. Within the annealing analogy, the algorithm is said “to cool down”. Let \(\Delta < 0\) be the difference in performance between a worse solution \(j\) and a better solution \(i\). A typical function for the probability of accepting the alternative solution \(j\), despite it being worse than \(i\), is \(p = e^{\Delta/t}\), where \(t\) is the “temperature” of the algorithm, which decreases with every iteration.
# example
delta <- -2
temp <- seq(10, 1, -.5)
p <- exp(delta / temp)
round(p, 2)
## [1] 0.82 0.81 0.80 0.79 0.78 0.77 0.75 0.74 0.72 0.70 0.67 0.64 0.61 0.56 0.51
## [16] 0.45 0.37 0.26 0.14
temp
## [1] 10.0 9.5 9.0 8.5 8.0 7.5 7.0 6.5 6.0 5.5 5.0 4.5 4.0 3.5 3.0
## [16] 2.5 2.0 1.5 1.0
As we can see, at first we accept \(j\) with a chance of \(82\,\%\), but after a few iterations this probability drops to just \(14\,\%\). We are now ready to program our own simulated annealing algorithm. The algorithm will start with a random route. In each iteration, the initial temperature temp will be reduced by a factor called cooling. Throughout the search process, we will keep track not only of the current distance but also of the current temperature and the probability of acceptance. We will store these in three vectors called track_temp, track_prob, and track_dist, respectively.
# simulated annealing algorithm
simAnneal <- function(cities, temp=1e4,
                      cooling=5e-3, break_after=1e2,
                      sleep=0.05) {
  # compute adjacency matrix
  adjmat <- compAdjMat(cities)
  # start with a random route
  best_route <- randomRoute(cities)
  min_d <- distRoute(adjmat, best_route)
  # variable tracking
  track_temp <- c()
  track_prob <- c()
  track_dist <- c()
  # iterative loop: stop after break_after iterations without adjustment
  stable_count <- 0
  while(stable_count < break_after) {
    # conduct a random 2-opt swap
    ik <- sort(sample(2:nrow(cities), 2))
    new_route <- swap(best_route, i=ik[1], k=ik[2])
    new_d <- distRoute(adjmat, new_route)
    # probability of adjusting the route
    improvement <- min_d - new_d
    p_adjust <- ifelse(improvement > 0, 1, exp(improvement/temp))
    # adjust route?
    adjust <- (p_adjust >= runif(1, 0, 1))
    # if adjustment
    if(adjust) {
      best_route <- new_route
      min_d <- new_d
      stable_count <- 0
      plotRoute(cities, best_route, min_d)
      legend("topright", legend=round(temp, 4),
             bg="transparent",
             bty="n",
             text.col="black")
      Sys.sleep(sleep)
    } else {
      stable_count <- stable_count + 1
    }
    # update variable tracking
    track_temp <- c(track_temp, temp)
    track_prob <- c(track_prob, p_adjust)
    track_dist <- c(track_dist, new_d)
    # cool down
    temp <- temp * (1 - cooling)
  } # end of iterative loop
  # return
  return(list(distance = min_d,
              route = best_route,
              track_temp = track_temp,
              track_prob = track_prob,
              track_dist = track_dist))
}
# example
simAnneal(cities)
As before, we manage to find the optimal route.
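Since simAnneal() returns its tracking vectors, we can also inspect the search after the fact, for example by plotting how the temperature, the acceptance probability, and the distance evolved over the iterations (a small diagnostic sketch; the exact curves depend on the random run):
# re-run without the animation delay and plot the tracking vectors
sa <- simAnneal(cities, sleep = 0)
par(mfrow = c(3, 1))
plot(sa$track_temp, type = "l", xlab = "iteration", ylab = "temperature")
plot(sa$track_prob, type = "l", xlab = "iteration", ylab = "acceptance probability")
plot(sa$track_dist, type = "l", xlab = "iteration", ylab = "distance")
par(mfrow = c(1, 1))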