During an evening in my local pub I was chatting to a friend who is a local tour guide. During the evening I asked him how he decides to include various venues when planning his route. The answer being ‘…all my years of experience’.

Having recently finished modules on the Johns Hopkins Data Specialisation course I wondered if it would be possible to select venues using the wisdom of the crowd and then create a route using web services underpinned by a little R code.

Eager to practice my newly acquired R skills, I created this code. I must warn you, I have around 6 months experience of R and I’m not a professional coder. In the spirit of continual improvement, I welcome any constructive feedback.

Here’s the rough plan I sketched out on the back of a beer mat.

  1. Query Yelp.com to pull out highly rated venues for a given City.
  2. Run these venues through a ‘Travelling Salesperson’ algorithm to identify the shortest path.
  3. Plot the route and list of waypoints(venues) using Google Maps

Querying Yelp using Oauth authentication

The first challenge I had was authenticating with the Yelp service. OAuth works on the basis of encoding a URL using private keys to create an encrypted signature.Adding this signature along with keys into the URL request allows authentication with no password. Very neat.

For demonstration purposes I’ve selected bars in Chester. Chester is my home town and there seem to be more bars and restaurants than anything else here.

If you wish to pick a different Yelp category you can pick one from this page.

Yelp’s API returns a rich set of data. For my purposes I selected the business name and the latitude/longitude values (for mapping later). Their API returns JSON which is a data structure presented by name, value pairs. I used an R package called RJSONIO along with Regular Expressions to extract the key data into an R data structure. JSON is very easy to work with.

#Yelp API - Search for 10 bars in Chester.

#Add your own keys here from the yelp site
#CONSUMER_KEY = ""
#CONSUMER_SECRET = ""
#TOKEN = ""
#TOKEN_SECRET = ""

yelp.url <- function(category, city,radius,cons_key, cons_sec, tok, tok_sec) {
  
  #Add escape codes to the search parameter
  category1 <- gsub(pattern = " ", replacement = "%2520", x = category)
  category2 <- gsub(pattern = " ", replacement = "+", x = category)
  city1 <- gsub(pattern = " ", replacement = "%2520", x = city)
  city2 <- gsub(pattern = " ", replacement = "+", x = city)
  
  
  #Create random string for use in encoding
  nonce <- paste(sample(x = c(0:9, letters, LETTERS, "_"), size = 32, replace = TRUE), collapse= "")
  
  #Time stamp
  tm <- format(x = Sys.time(), "%s")
  
  #Prepare URL for authenticating     
  #Tried CurlEscape() but needed finer control over escape strings
  api.url <- paste("GET\u0026http%3A%2F%2Fapi.yelp.com%2Fv2%2Fsearch\u0026",
                   "category_filter%3D", category1,"%26",
                   "limit%3D10%26location%3D",city1,"%26",
                   "oauth_consumer_key%3D", cons_key, "%26",
                   "oauth_nonce%3D", nonce, "%26",
                   "oauth_signature_method%3DHMAC-SHA1%26",
                   "oauth_timestamp%3D", tm, "%26",
                   "oauth_token%3D", tok, "%26",
                   "radius_filter%3D",radius,"%26",
                   "sort%3D2",
                   sep = "")
  
  #Encode signature
  signature <- as.character(curlPercentEncode(base64(
    hmac(key=paste(cons_sec, tok_sec, sep="&"), 
    object=api.url, algo="sha1", serialize=FALSE, raw=TRUE))))
 
  
  #Build the URL to be used in the request
  api.url <- paste("http://api.yelp.com/v2/search?",
                   "category_filter=", category2,"&",
                   "limit=10&location=",city2,"&",
                   "oauth_consumer_key=", cons_key, "&",
                   "oauth_nonce=", nonce, "&",
                   "oauth_signature=", signature, "&",
                   "oauth_signature_method=HMAC-SHA1&",
                   "oauth_timestamp=", tm, "&",
                   "oauth_token=", tok, "&",
                   "radius_filter=",radius,"&",
                   "sort=2",
                    sep = "")
    
  return(api.url)
}

yelp.data <- function(category="bars", city="Chester", radius=3500) {
  require(package="digest")
  require(package="RJSONIO")
  require(package="RCurl")
  require(package="stringr")
  require("dplyr")
  #Create the signed URL 
  y.url <- yelp.url(category = category,city=city,
                  radius=radius,
                  cons_key = CONSUMER_KEY,
                  cons_sec = CONSUMER_SECRET,
                  tok = TOKEN,
                  tok_sec = TOKEN_SECRET)
  
  #Post URL to Yelp and process the JSON returned
  x <- getURL(y.url) %>% fromJSON()
  #Extract lat/long pairs and concatenate
  businesses <- unlist(x$businesses)
  business.names <- unname(businesses[grep('^name$',names(businesses))])
  latlong <- businesses[grep('^location.coordinate.',names(businesses))]
  latlong.matrix <- apply(matrix(latlong, ncol = 2, byrow = TRUE), 1, paste, collapse = ",")
  latlong.matrix2 <- matrix(latlong, ncol = 2, byrow = TRUE)
  o.latlong <- paste(latlong.matrix,sep="|",collapse="|")
  df <- data.frame(business.names,latlong.matrix2)
  output <- list(LatLong = o.latlong, "Names" = business.names, df=df)
  
  return(output)
}

We call this function by passing in parameters as follows.

city <- "Chester UK"
radius <- "3500"  #metres
category <- "bars"
#Find the the top 10 bars from Yelp.com
y <- yelp.data(category=category, city=city,radius=radius)
y$df
##        business.names               X1                X2
## 1  Telfords Warehouse       53.1940384        -2.8972299
## 2    Old Harkers Arms        53.193418         -2.881248
## 3       Bear & Billet        53.186744         -2.890165
## 4          Albion Inn       53.1884117          -2.88802
## 5    Ye Olde Boot Inn 53.1905138631287 -2.89043889005541
## 6          Bar Lounge        53.189836         -2.893896
## 7  The Liverpool Arms       53.1932812        -2.8929561
## 8        The Botanist       53.1912196        -2.8902453
## 9       The Mount Inn       53.1912378        -2.8709356
## 10       Amber Lounge       53.1902965        -2.8919373

Travelling Sales Person

Thanks to the R package TSP we can calculate the optimal route between points using different algorithms that are the product of computational research carried out over many years. I didn’t want to concern myself with the details of each algorithm so for the purpose of this experiment I ran each of them and picked the one generating the shortest distance.

Before I could call the TSP package I had to create a symmetrical distance matrix.

I queried the distances between venues using Google Maps Distance API. This is a useful API enabling data acquisition in 1 call.

Much like Yelp, Google returns data in the JSON format. One problem I discovered here is that Google don’t always give back a symmetrical matrix. I guess this is related to their route planning algorithms and 1 way streets. In any case, we can solve this using forceSymmetric().

construct.distance.url <- function(origins, return.call = "json", sensor = "false") {
  root <- "https://maps.googleapis.com/maps/api/distancematrix/"
  u <- paste(root, return.call, "?origins=", origins, "&destinations=",
             origins,"&mode=walking", sep = "")
  return(URLencode(u))
}


distance.matrix <- function(address,verbose=FALSE) {
  require("RCurl")
  require("Matrix")
  require("dplyr")
  if(verbose) cat(address,"\n")
  u <- construct.distance.url(address) %>% getURL(ssl.verifypeer = FALSE) %>% fromJSON()
  if(u$status=="OK") {
    #Turn JSON into 2 dim matrix, measure=Distance Value in Metres
    x2 <- unlist(u$rows)
    output<-matrix(as.numeric(unname(x2[grep('distance.value',names(x2))])),
                   ncol=as.numeric(nrow(y$df))) %>% forceSymmetric()  
    return(output)
  } else {
    return("There was a problem with the web query")
  }
}

x <- distance.matrix(y$LatLong)
x
## 10 x 10 Matrix of class "dsyMatrix"
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    0 1181 1150 1197  784  907  381  701 2026   769
##  [2,] 1181    0 1183  912  774 1018  972  826  849   880
##  [3,] 1150 1183    0  271  486  556  770  612 1788   418
##  [4,] 1197  912  271    0  459  602  816  479 1516   464
##  [5,]  784  774  486  459    0  243  403  126 1415   105
##  [6,]  907 1018  556  602  243    0  527  369 1658   138
##  [7,]  381  972  770  816  403  527    0  320 1767   389
##  [8,]  701  826  612  479  126  369  320    0 1435   231
##  [9,] 2026  849 1788 1516 1415 1658 1767 1435    0  1520
## [10,]  769  880  418  464  105  138  389  231 1520     0

This distance matrix is all we need for TSP. Notice the different TSP methods are being passed to solveTSP() using sapply. After the solutions have been created I pick the route with the shortest distance.

#Calculate the minimal distances using algorithms described in 'methods'
#Returns the shortest route

tsp.route <- function(places,names){
  require("TSP")
  items <- as.numeric(NROW(names))
  city.matrix <- matrix(places,nrow=items, ncol=items, dimnames=list(names,names))
  tsp <- TSP(city.matrix)
  
  methods <- c("nearest_insertion", "farthest_insertion", 
               "cheapest_insertion","arbitrary_insertion","nn", 
               "repetitive_nn", "2-opt")
  
    
  tours <- sapply(methods, FUN = function(m) 
    solve_TSP(tsp,method = m),simplify=FALSE)
  best <- tours[which.min(c(sapply(tours, FUN = attr, "tour_length")))]
  best.route <- names(best[[1]])
  best.distance <- tour_length(tsp,best[[1]])
  
  output <- list(route = best.route, distance.travelled = best.distance)
    
  return(output)            
}

To call this function I pass in the distance matrix along with names of the venue (for displaying on the map)

w.route<-tsp.route(x,y$Names)
w.route
## $route
##  [1] "Albion Inn"         "Bear & Billet"      "Bar Lounge"        
##  [4] "Amber Lounge"       "Ye Olde Boot Inn"   "The Botanist"      
##  [7] "The Liverpool Arms" "Telfords Warehouse" "Old Harkers Arms"  
## [10] "The Mount Inn"     
## 
## $distance.travelled
## [1] 5443

A little function coupling code sneaked in that was needed to sort the Yelp venues using the route returned by TSP. I did this as the Yelp data already has the location data I need for the mapping. Of course, this can be eliminated with a little code refactoring.

#Sort the Yelp data using optimal route (to be refactored)
sorted.route <- y$df[match(w.route$route, y$df$business.names),]

Plotting the route on Google Maps

We’re near the end. The final step is to plot the route on Google Maps. I used the excellent GGMAP package for this. I needed to plot a route, waypoints, and display a legend.

The Route class within GGMAP queries the Google Directions API. We push in the various lat/long pairs and Google conveniently returns coordinates that represent a leg of the route. The Route class combines each leg of the journey, making a route. I then plot the completed route to a map layer. The points are placed on a layer using the lat/long pairs for each venue.

I hit a thorny problem when it came to adding a legend. GGMAP is usually good at this but in my case I didn’t really need to associate the legend with a data series. Thanks to the wizards in the community for suggesting adding a dummy layer and associating the legend with that.

create.map<-function(lst){
  require("ggmap")
  require("plyr")
  require("dplyr")
  
  #Create DF and prevent factors from being created.
  way.points <- data.frame(lapply(lst[,1:3], as.character), stringsAsFactors=FALSE)
  
  #Combine the row number with the business names to related points to legend labels
  way.points <- mutate(way.points, business.names = paste(seq_along(X1), business.names, sep = " "))
  
  #Call Route() in 1 pass 
  rte.from <- apply(way.points[-nrow(way.points),2:3],1,paste,collapse=",")
  rte.to <- apply(way.points[-1,2:3],1,paste,collapse=",")
  rte <- do.call(rbind,
                 mapply(route, rte.from, rte.to, SIMPLIFY=FALSE,
                        MoreArgs=list(mode="walking",output="simple",structure="leg")))
  
  
  #Work out the rough centre point of the map
  map.centre <- c(mean(as.numeric(way.points$X2)),mean(as.numeric(way.points$X1)))
  
  #Load the coordinates from Route() to be used to plot the paths
  coords <- rbind(as.matrix(rte[,7:8]),as.matrix(rte[nrow(rte),9:10])) %>% 
    as.data.frame() 
  
  #Create the Map - first 2 layers are the path and point.  The second geom_point is a
  #a dummy one used to define the legend. 
    
  ggm <- qmap(location=map.centre,zoom = 15,maptype = "road",legend="bottomleft")  
  ggm +
    geom_path(data=coords,aes(x=startLon,y=startLat),color="blue",size=2)+
    geom_point(data=way.points,aes(x=as.numeric(X2),y=as.numeric(X1)),
               size=10,color="yellow")+
    geom_point(data=way.points,
               aes(x=as.numeric(X2),y=as.numeric(X1),color = 
                     factor(business.names, levels=unique(business.names))), 
               alpha = 0) +
    geom_text(data=way.points,
              aes(x=as.numeric(X2),y=as.numeric(X1), label=seq_along(X1)))+
    scale_color_discrete(name = "Venues") +
    labs(title=paste("The optimal route for the top rated Venues to visit in ",city,
                     " within a ",as.numeric(radius)/1000,"km radius",sep=""))+
    theme(legend.key = element_rect(fill = NA),legend.position = c(-0.28, 0.61),
          plot.title = element_text(hjust = 0, vjust = 1, face = c("bold")))
    
}
create.map(sorted.route)

Next Steps

Key Learning

Conclusion

In my day job as a data analyst I’m more accustomed to querying traditional data sources using methods such as SQL, DAX, and MDX. In my opinion, as more and more business applications move to the cloud, we’ll see the need to use techniques demonstrated in this experiment increase.

It’s thanks to the amazing R community (in particular, Stack contributors’ & package authors) that I was able to solve this problem. The R community is a strong foundation of the R eco-system.

Appendix

If you want to reproduce this you’ll need access to the following packages.

You’ll also need to set up a Yelp developer account to source your own keys.