It’s one of the oldest problems in the field of optimization. A salesman wants to travel to a number of cities and then return back home. The million dollar question now is: How should he plan his route to minimize his time on the road? At first, this might sound like trivial problem. Why not just calculate the distance of every possible route and then select the shortest one? Well, with just 20 different cities there are already more than a quadrillion possible routes to travel on. And with every additional city the problem’s complexity increases exponentially And pretty soon you just cannot go through every single possible route anymore. So instead of going through every possible route, we will need to work with a good rule of thumb or heuristic. In this post you will learn about the basics of different heuristics and search algorithms that try to find good solutions in an acceptable amount of time.

Cities

Before we dive into the mathematics let us write a few functions to simulate a bunch of cities and routes which we can then plot on a canvas. To keep things simple, all of our cities will be located in a two-dimensional space between (-1,-1) and (1,1). We will name our cities just like Excel names its spreadsheets, i.e., the first city is called “A”, the second “B”, and so on until we reach “ZZ”.

# "not in" operator
'%!in%' <- function(x,y)!('%in%'(x,y))

# extend "LETTERS" function to run from "A" to "ZZ"
MORELETTERS <- c(LETTERS, sapply(LETTERS, function(x) paste0(x, LETTERS)))

# example
MORELETTERS[20:30]
##  [1] "T"  "U"  "V"  "W"  "X"  "Y"  "Z"  "AA" "AB" "AC" "AD"

Now, we will create a matrix with two columns in which will store the x and y coordinates of our cities.

# function to simulate cities
simCities <- function(n_cities = 5) {
  cities <- matrix(runif(2*n_cities,-1,1),ncol=2)
  rownames(cities) <- MORELETTERS[1:n_cities]
  colnames(cities) <- c("x","y")
  return(cities)
}

# example
cities <- simCities(n = 10)
cities 
##            x            y
## A -0.4689827 -0.588050850
## B -0.2557522 -0.646886495
## C  0.1457067  0.374045693
## D  0.8164156 -0.231792564
## E -0.5966361  0.539682840
## F  0.7967794 -0.004601516
## G  0.8893505  0.435237017
## H  0.3215956  0.983812190
## I  0.2582281 -0.239929641
## J -0.8764275  0.554890443

Next, we need a way of visualizing our cities. We can do this by simply calling the plot() function on our city matrix.

# function to plot cities
plotCities <- function(cities, 
                       main="",
                       bg="white", 
                       main_col="black",
                       point_col="deepskyblue") {
  par(bg=bg)
  plot(cities, 
       pch=16, cex=3,
       col=point_col, 
       ylim=c(-1,1.1), xlim=c(-1,1),
       yaxt="n", xaxt="n",
       ylab="", xlab="",
       bty="n",
       main=main, col.main=main_col)
}

# example
plotCities(cities)

Routes

Since each city has a name, a route is really just a character vector containing these city names. Let’s write a function to generate random routes that start and end in city “A”.

# function to generate a random route
randomRoute <- function(cities) {
  start <- rownames(cities)[1]
  route <- sample(rownames(cities)[-1])
  route <- c(start,route,start) # return back home
  return(route)
}

# example
route <- randomRoute(cities)
route
##  [1] "A" "J" "H" "F" "I" "B" "G" "C" "E" "D" "A"

Now we need a function to visualize this route. We can reuse our earlier plotCities() function and then add a series of line plots on top that show the route. Lastly, we will add a legend in the top left corner of the diagram that shows the distance of the route but more on that later.

# function to plot route
plotRoute <- function(cities, 
                      route, 
                      dist=NULL, 
                      main="",
                      bg="white", 
                      main_col = "black",
                      point_col = "deepskyblue",
                      start_col = "red",
                      line_col = "black") {
  
  # plot cities
  city_colors <- c(start_col, rep(point_col,nrow(cities)))
  plotCities(cities, 
             bg=bg, 
             main=main, 
             main_col=main_col,
             point_col=city_colors[order(route)])
  
  # plot route
  for(i in 2:(nrow(cities)+1)) {
    lines(cities[c(route[i],route[i-1]),],
          col=line_col,
          lwd=1.5)
  }
  
  # add distance in legend
  if(!is.null(dist)) legend("topleft", 
                            bty="n", # no box around legend
                            legend=round(dist,4), 
                            bg="transparent", 
                            text.col="black")
  
}

# example
plotRoute(cities, route)

Distances

Our plotRoute() function contained an argument called dist. We will use this argument to display the distance of a route. However, we haven’t defined a function yet that actually computes this distance. We’ll start by first defining a function that computes the “adjacency matrix” for our system of cities. Thereafter, we can easily define a function that computes the distance of any given route.

# function to compute adjacency matrix
compAdjMat <- function(cities) return(as.matrix(dist(cities)))

# example
cities <- simCities(5)
adjmat <- compAdjMat(cities)
adjmat
##           A         B         C        D         E
## A 0.0000000 0.4258592 0.5226517 1.471118 0.3714963
## B 0.4258592 0.0000000 0.3442195 1.121281 0.6019520
## C 0.5226517 0.3442195 0.0000000 1.388875 0.8386443
## D 1.4711179 1.1212806 1.3888754 0.000000 1.3763971
## E 0.3714963 0.6019520 0.8386443 1.376397 0.0000000
# function to compute the distance of a route
distRoute <- function(adjmat, route) {
  d <- 0
  for(i in 2:nrow(adjmat)) {
    d <- d + adjmat[route[i-1],route[i]]
  }
  return(d)
}

# example
route1 <- c("A","C","E","B","D","A")
distRoute(adjmat, route1)
## [1] 3.084529
route2 <- c("A","E","D","C","B","A")
distRoute(adjmat, route2)
## [1] 3.480988

This means that a round trip from A to C, to E, to B, to D and back to A has a total distance of 3.08. Conversely, the alternative route from A to E, to D, to C, to B and then back to A has a distance of 3.48. This means that the first route would be a better route than the second one.

Brute force

As said before, the most obvious algorithm would be to go through every single route possible. Of course, in real-world applications this is not feasible. However, when judging the performance of an algorithm, it is always good to know what the optimum route looks like. And for a small number of cities, e.g., \(n=10\), we can easily work out every single possible route.

# simulate ten cities
set.seed(1)
cities <- simCities(10)
# brute force algorithm
bruteForce <- function(cities, sleep=.05) {
  
  # stop if too many cities
  if(nrow(cities) > 10) stop("Please don't use Brute Force for n > 10 many cities!")
  
  # library
  library(combinat)
  start <- rownames(cities)[1] # start
  routes <- permn(rownames(cities)[-1]) # list of every possible route
  
  # compute adjacency matrix
  adjmat <- compAdjMat(cities)
  
  # initial route and distance
  best_route <- NULL
  min_d <- Inf
  
  # all possible routes
  for(i in 1:length(routes)) {
    
    route <- routes[[i]]
    route <- c(start,route,start)
    
    # distance
    new_d <- distRoute(adjmat,route)
    
    # update distance and plot
    if(new_d < min_d) {
      min_d <- new_d
      best_route <- route
      plotRoute(cities, route, min_d)
      Sys.sleep(sleep)
    }
  }

  
  # return
  return(list(distance = min_d,
              route = best_route)
  )
  
}
# example
bf <- bruteForce(cities)

As you can see, the algorithm starts at a non-optimal solution and then works itself through every possible route. Whenever it finds a shorter route, it updates its previous best and plots the new optimal route. The algorithm finishes once it has gone through every possible route. As we can see in the top left corner of our animation, in our case, the optimal route is of length 4.5775.

Nearest neighbor

Obviously, even with just ten cities, going through every single route takes time. A surprisingly simple yet very powerful heuristic is the so-called “nearest neighbor algorithm”. Starting from our home city “A” it simply travels to nearest city and keeps on doing that until it reached all cities in the system, after which it returns back home to “A”. Of course, this algorithm will hardly yield the optimal solution. It is, however, extremely fast.

# nearest neighbor algorithm
nearestNeighbor <- function(cities) {

  # compute adjacency matrix
  adjmat <- compAdjMat(cities)
  
  # start of route
  current_city <- rownames(cities)[1]
  route <- c(current_city)
  
  # add nearest neighbor to route
  for(i in 1:(nrow(cities)-1)){
    distances <- adjmat[current_city,-which(rownames(cities) %in% route)]
    current_city <- names(which(distances == min(distances)))
    route <- c(route,current_city)
  }
  
  # add last city and start point
  last <- which(rownames(cities) %!in% route)
  route <- c(route,rownames(cities)[last])
  route <- c(route,rownames(cities)[1])
  best_route <- route # only one route
  
  # compute distance of route
  min_d <- distRoute(adjmat,route)
  
  # plot route
  plotRoute(cities,route,min_d)
  
  # return
  return(list(distance = min_d,
              route = best_route)
  )
  
}

# example
nearestNeighbor(cities)

## $distance
## [1] 4.793531
## 
## $route
##  [1] "A" "B" "I" "D" "F" "G" "C" "H" "E" "J" "A"

As we can see, the nearest neighbor algorithm finds a route which is of length 4.7935. This is slightly longer than the optimal route (4.5775), but the nearest neighbor algorithm found this route almost instantaneously.

2-opt

Unsurprisingly, the random search algorithm did not perform very well. It did avoid, however, the key problem of the nearest neighbor heuristic, which is being too path dependent. Nonetheless, leaving everything up to chance is certainly not much better. So let’s try a second alternative called “2-opt”. This algorithm starts out with a random route and then checks whether it can rewire pairs of cities in its current route to achieve a shorter overall distance. If such a rewiring leads to an improvement, it accepts this new route as its new best and keeps on doing this until it cannot find any further improvements. In our algorithm, we will include a vector called track_dist which tracks how the total distance of our route search evolves throughout our search.

# function to swap two cities within a route (= rewire)
swap <- function(route,i,k) {
  new_route <- route[1:(i-1)]
  new_route <- c( new_route, route[k:(i)] )
  new_route <- c( new_route, route[(k+1):length(route)] )
  return(new_route)
}

# two-opt algorithm
twoOpt <- function(cities, sleep=.05) {
  
  # compute adjacency matrix
  adjmat <- compAdjMat(cities)
  
  # start with random route
  best_route <- randomRoute(cities)
  min_d <- distRoute(adjmat,best_route)
  
  # variable tracking
  track_dist <- c()
  
  # while-loop to perform complete 2-opt swap
  while(T) { 
    
    # record distance before looping through i and k
    old_d <- min_d
  
    # for-loops through values for i and k
    break_loop <- F
    for(i in 2:nrow(cities)) {
      for(k in (i+1):nrow(cities)) {
        
        # perform swap
        route <- swap(best_route,i,k)
        new_d <- distRoute(adjmat,route)
        
        # update distance and plot
        if(new_d < min_d) {
          min_d <- new_d
          best_route <- route
          plotRoute(cities, route, min_d)
          Sys.sleep(sleep)
          break_loop <- T
          break() # break out of inner loop
        } # end outer if-statement
        
      } # close inner loop
      
      # break out of outer loop
      if(break_loop) break()
      
    } # close outer loop
    
    # update on variable tracking
    track_dist <- c(track_dist,new_d)
    
    # check if the for-loops made any improvements
    if(old_d == new_d) break() # break out of while loop
    
  } # close while loop
    
  # return
  return(list(distance = min_d,
              route = best_route,
              track_dist = track_dist)
  )
  
}
# example
twoOpt(cities)

The algorithm kind of looks like someone untangling ball of wool and as we can see the algorithm actually manages to find the optimal solution of 4.5775.

Simulated annealing

Finally, let’s have a look at a technique called “simulated annealing”. The term “simulated annealing” derives from the process of annealing in metallurgy. It’s whole purpose is to avoid bad path dependencies in our process of finding the optimal route. This is very similar to the issue of local extrema in any sort of optimization problem. Without a probabilistic element, an optimization algorithm, like plain vanilla gradient descent, will stop once it converged to a local extremum. Conversely, a simulated annealing algorithm will keep on randomly trying alternative solutions. Depending on “how bad” an alternative solution is in relation to the previously accepted solution, the algorithm will abandon the previously accepted solution with a certain probability. Besides the difference in the quality of the two solutions, the probability of acceptance also depends on how many iterations the algorithm has gone through so far. At first, the acceptance probability will be fairly high, but later on the algorithm typically only accepts alternative solutions if they offer great gains in performance. Within the annealing analogy, the algorithm is said “to cool down”. Let \(\Delta\) be the difference in performance between a better solutions \(i\) and worse solution \(j\). A typical function for the probability of accepting the alternative solution \(j\), despite it being worse than \(i\), is \(p = e^{\Delta/t}\), where \(t\) is the “temperature” of the algorithm which decreases with every iteration.

# exmaple
delta = -2
temp = seq(10,1,-.5)
p = exp(delta/temp)
round(p,2)
##  [1] 0.82 0.81 0.80 0.79 0.78 0.77 0.75 0.74 0.72 0.70 0.67 0.64 0.61 0.56 0.51
## [16] 0.45 0.37 0.26 0.14
temp
##  [1] 10.0  9.5  9.0  8.5  8.0  7.5  7.0  6.5  6.0  5.5  5.0  4.5  4.0  3.5  3.0
## [16]  2.5  2.0  1.5  1.0

As we can see, at first, we accept \(j\) with a chance of \(8\,\%\), but after a few iterations this probability drops to just \(14\,\%\). We are now ready to program our own simulated annealing algorithm. The algorithm will start with a random route. In each iteration the initial temperature temp will be reduced by a factor called cooling. Throughout the search process, we will not only keep track of the current distance, but also the current temperature and the probability of acceptance. We will call store these in three vectors called track_temp, track_prob and track_dist, respectively.

simAnneal <- function(cities, temp=1e4,
                      cooling=5e-3, break_after=1e2,
                      sleep=0.05) {
  
  # compute adjacency matrix
  adjmat <- compAdjMat(cities)
  
  # start with random route
  best_route <- randomRoute(cities)
  min_d <- distRoute(adjmat,best_route)

  # variable tracking
  track_temp <- c()
  track_prob <- c()
  track_dist <- c()
  
  # iterative loop
  stable_count <- 0
  while(stable_count < break_after) {

    # conduct swap
    ik <- sort(sample(2:nrow(cities),2))
    new_route <- swap(best_route, i=ik[1], k=ik[2])
    new_d <- distRoute(adjmat,new_route)
  
    # probability of adjusting route
    improvement <- min_d - new_d
    p_adjust <- ifelse(improvement > 0, 1, exp(improvement/temp))

    # adjust route?
    adjust <- ifelse(p_adjust >= runif(1,0,1), T, F)

    # if adjustment
    if(adjust) {
      best_route <- new_route
      min_d <- new_d
      stable_count <- 0
      plotRoute(cities, best_route, min_d)
      legend("topright", legend=round(temp,4), 
             bg="transparent", 
             bty="n",
             text.col="black")
      Sys.sleep(sleep)
    } else {
      stable_count <- stable_count+1
    }
    
    # update on variable tracking
    track_temp <- c(track_temp,temp)
    track_prob <- c(track_prob,p_adjust)
    track_dist <- c(track_dist,new_d)

    # cool down
    temp <- temp*(1-cooling)

  } # end of iterative loop
  
  # return
  return(list(distance = min_d,
              route = best_route,
              track_temp = track_temp,
              track_prob = track_prob,
              track_dist = track_dist)
  )
  
  
}
# example
simAnneal(cities)

As before we manage to find the optimal route.