The Problem

Often when working on spatially oriented problems, two questions come up:
1. Where is something?
2. How far is it between two places?

Data may be received as text street addresses. In that format, however, they’re not much more than labels: they are difficult to compare or to use in any meaningful analysis.

Enter geocoding! Geocoding takes an address and returns a coordinate in space. This coordinate, along with other coordinates, can then be meaningfully used and analysed. A coordinate system most people are familiar with is longitude (x) and latitude (y), expressed in decimal degrees.

Once we’re able to locate points in space, we can then ask how far apart things are. One approach is to calculate a straight-line distance between two points. An alternative is to calculate a distance by road, which takes into account physical features (e.g. the existence of roads) and topological features (e.g. waterways).
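As a rough illustration of the straight-line approach, the sketch below uses the haversine formula to estimate the great-circle distance between two coordinates. The function name haversine.km and the mean Earth radius of 6371 km are assumptions made for this example. It treats the Earth as a sphere, so the result is an approximation.

```r
# Hypothetical helper (not from the original vignette): great-circle distance
# between two points given in decimal degrees, using the haversine formula.
# Assumes a spherical Earth with a mean radius of 6371 km.
haversine.km <- function(lat1, lng1, lat2, lng2, radius = 6371) {
  to.rad <- pi / 180
  d.lat <- (lat2 - lat1) * to.rad
  d.lng <- (lng2 - lng1) * to.rad
  a <- sin(d.lat / 2)^2 +
    cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(d.lng / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# One degree of latitude along a meridian, i.e. 6371 * pi / 180 km
haversine.km(0, 0, 1, 0)
```

On a sphere, a straight-line figure like this is a lower bound on any routed distance between the same two points.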

This vignette will guide you through how you can use Google’s APIs to locate points in space, and also return routed distances/durations between them.


The Solution

Begin by loading the RCurl and RJSONIO packages.

# Used for submitting http requests to the API
#install.packages("RCurl")
library(RCurl)
## Warning: package 'RCurl' was built under R version 3.2.5
## Loading required package: bitops
# Used for processing the results returned in JSON format
#install.packages("RJSONIO")
library(RJSONIO)

Here are two helper functions that build the request URL strings submitted to the API, using inputs from our data. These functions will be called from other functions we’ll define below.

# A function to construct a URL for querying Google's geocoding API
# At a minimum, the API requires two parameters, reflected in this function
  # 1. A text address
  # 2. the format of the data to be returned. JSON by default. Option is XML
construct.geocode.url <- function(u_address, return.call = "json") {
  # The base API URL
  root <- "https://maps.googleapis.com/maps/api/geocode/"
  # Join the components of the url together
  u <- paste(root, return.call, "?address=", u_address, sep = "")
  # Return an encoded URL string
  return(URLencode(u))
}

# A function to construct a URL for querying Google's distance matrix API
# At a minimum, the API requires three parameters
  # A 'from' address, 
  # A 'to' address, and  
  # The format to be returned. JSON by default. Option is XML
# There are optional parameters, which may be useful
  # Units (metric, imperial. Default metric)
  # Mode of transport (driving, walking, bicycling, transit. Default driving)
construct.distance.url <- function(u_from, u_to, return.call = "json", u_units = "metric", u_mode = "driving") {
  # The base API URL
  root <- "https://maps.googleapis.com/maps/api/distancematrix/"
  # Join the components of the url together
  u <- paste(root, return.call, "?origins=", u_from, "&destinations=", u_to, "&mode=", u_mode, "&units=", u_units, sep = "")
  # Return an encoded URL string
  return(URLencode(u))
}
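One thing to watch for with the constructors above: URLencode() leaves reserved characters such as & and # untouched by default, so an address containing them can be misread as extra query parameters. The sketch below encodes the address on its own with reserved = TRUE and optionally appends a key parameter. The function name build.geocode.url and the u_key argument are hypothetical additions for illustration, not part of the original code.

```r
# Hypothetical variant of construct.geocode.url (an illustration only).
# The address is percent-encoded separately from the rest of the URL.
build.geocode.url <- function(u_address, u_key = NULL, return.call = "json") {
  # The base API URL
  root <- "https://maps.googleapis.com/maps/api/geocode/"
  # reserved = TRUE also encodes characters such as "&", "#" and ","
  encoded <- utils::URLencode(u_address, reserved = TRUE)
  u <- paste0(root, return.call, "?address=", encoded)
  # Append an API key only when one is supplied
  if (!is.null(u_key)) {
    u <- paste0(u, "&key=", u_key)
  }
  u
}

# The ampersand in the address becomes %26 rather than starting a new parameter
build.geocode.url("A & B")
```

The same idea would apply to the origins and destinations values in the distance matrix URL.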

When given an address, this function will return a latitude and longitude coordinate using Google’s geocoding API.

# This function queries Google's geocoding API
gGeoCode <- function(address, return.type = "json") {
  # Call the function we made earlier to build a URL string
  u <- construct.geocode.url(address, return.type)
  # Request information from the API
  doc <- getURL(u)
  # Parse the JSON into a structured format
  x <- fromJSON(doc,simplify = FALSE)

  # Check the response status from the API
  if(x$status=="OK") {
    # Extract the latitude and longitude coordinates
    lat <- x$results[[1]]$geometry$location$lat
    lng <- x$results[[1]]$geometry$location$lng
    return(paste(lat, lng, sep = ", "))
    
  } else {
    # Return NAs
    return(paste(NA, NA, sep = ", "))
    
  }
}
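For analysis it is often handier to keep the coordinates as numbers rather than as a pasted string. The sketch below shows one way to do that on an already-parsed response. extract.coords is a hypothetical helper, and the two mock lists are hand-built stand-ins shaped like the output of fromJSON(doc, simplify = FALSE), not real API responses.

```r
# Hypothetical helper: pull numeric coordinates out of a parsed geocoding
# response, returning NAs when the request did not succeed.
extract.coords <- function(x) {
  if (!identical(x$status, "OK") || length(x$results) == 0) {
    return(c(lat = NA_real_, lng = NA_real_))
  }
  loc <- x$results[[1]]$geometry$location
  c(lat = loc$lat, lng = loc$lng)
}

# Hand-built lists mimicking a successful and an unsuccessful response
mock.ok <- list(
  status = "OK",
  results = list(list(geometry = list(
    location = list(lat = -33.917347, lng = 151.2312675)
  )))
)
mock.none <- list(status = "ZERO_RESULTS", results = list())

extract.coords(mock.ok)    # named numeric vector with lat and lng
extract.coords(mock.none)  # both values are NA
```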

When given an origin and destination, this function will return the distance and time between the locations according to the mode of transport specified.

# This function queries Google's distance matrix API
gDistanceTime <- function(from, to, return.type = "json", units = "metric", mode = "driving") {
  # Call the function to construct a distance matrix URL
  u <- construct.distance.url(from, to, return.type, units, mode)
  # Request information from the API
  doc <- getURL(u)
  # Parse the JSON into a structured format
  x <- fromJSON(doc,simplify = FALSE)
  
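  # Check the response status from the API. The top-level status can be "OK"
  # even when no route was found, so check the first element's status as well
  if(x$status=="OK" && x$rows[[1]]$elements[[1]]$status=="OK") {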
    # Extract the distance (m) and time (seconds)
    txt.distance <- x$rows[[1]]$elements[[1]]$distance$text
    txt.time <- x$rows[[1]]$elements[[1]]$duration$text
    val.distance <- x$rows[[1]]$elements[[1]]$distance$value
    val.time <- x$rows[[1]]$elements[[1]]$duration$value
    return(paste(txt.distance, " (", val.distance, "m) and ", txt.time, " (", val.time, "s)", sep = ""))
    
  } else {
    # Return a single NA so the result has the same length as a successful call
    return(NA_character_)
    
  }
}

Examples

The first example demonstrates Google’s geocoding API returning coordinates from an address

fromAddr <- "University of New South Wales, Kingsford, NSW, Australia"
fromCoord <- gGeoCode(fromAddr)
print(fromCoord)
## [1] "-33.917347, 151.2312675"
toAddr <- "University of Technology Sydney, Broadway, NSW, Australia"
toCoord <- gGeoCode(toAddr)
print(toCoord)
## [1] "-33.8832376, 151.2004942"

Now let’s try the distance matrix, using the two addresses from before

# Return driving distance and time 
gDistanceTime(fromAddr, toAddr, return.type = "json", mode = "driving")
## [1] "6.0 km (5996m) and 17 mins (1011s)"

Let’s see how this changes when we set the mode to walking

# Return walking distance and time 
gDistanceTime(fromAddr, toAddr, return.type = "json", mode = "walking")
## [1] "5.1 km (5126m) and 1 hour 5 mins (3872s)"

A bit more effort involved!


Limitations and Considerations

There are some things to keep in mind when using Google as a geocoding service. At the time of writing, the API limited free usage to 2,500 requests each day, so if you are working with a large dataset, you may need to be strategic about your usage if you want to stay within the free service. You can register for an API key, which will give you additional access to their services. Google has since made an API key mandatory for these web services, so check the current usage limits and pricing before you start.
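One simple way to be strategic about usage is to avoid sending the same address twice. The sketch below wraps a geocoding function in a small in-memory cache. make.cached.geocoder is a hypothetical helper, and fake.geocoder is a stand-in used so the example runs without calling the API. In practice you would pass in a function such as gGeoCode.

```r
# Hypothetical wrapper: remembers results so each distinct address is only
# sent to the geocoder once. 'geocoder' is any function that takes a single
# non-empty address string.
make.cached.geocoder <- function(geocoder) {
  cache <- new.env()
  function(address) {
    if (!exists(address, envir = cache, inherits = FALSE)) {
      assign(address, geocoder(address), envir = cache)
    }
    get(address, envir = cache, inherits = FALSE)
  }
}

# Stand-in geocoder that counts how often it is called
n.calls <- 0
fake.geocoder <- function(address) {
  n.calls <<- n.calls + 1
  paste("coords for", address)
}

cached <- make.cached.geocoder(fake.geocoder)
addresses <- c("1 Main St", "2 High St", "1 Main St")
results <- vapply(addresses, cached, character(1), USE.NAMES = FALSE)
n.calls  # two requests were made for three addresses
```

The cache here only lives for the current R session. Saving results to disk would be a natural extension for larger jobs.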

Alternative geocoding services you may wish to consider are Bing and OpenStreetMap (OSM). At the time of writing, Bing’s API allowed more free requests than Google’s.

Also, keep in mind the quality of your input data and how it could affect the precision of the results returned. The API is effectively interpreting your address inputs and returning a precise location, so an ambiguous or misspelt address can still produce a confident-looking result. It may be difficult to verify the accuracy of geocoded addresses. My recommendation would be to apply some high-level tests to your data, such as checking for points outside of known bounds.
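A bounds check of that kind might look like the sketch below. outside.bounds is a hypothetical helper, the bounding box is a rough, illustrative box around Sydney rather than an authoritative boundary, and the last two rows of the sample data are made up to show a failed match.

```r
# Hypothetical helper: flags coordinates that are missing or that fall
# outside a bounding box given as c(min, max) ranges.
outside.bounds <- function(lat, lng, lat.range, lng.range) {
  is.na(lat) | is.na(lng) |
    lat < lat.range[1] | lat > lat.range[2] |
    lng < lng.range[1] | lng > lng.range[2]
}

# Two geocoded points from this vignette, one implausible point and one NA
pts <- data.frame(
  lat = c(-33.917347, -33.8832376, 0, NA),
  lng = c(151.2312675, 151.2004942, 0, NA)
)

# Rough, illustrative bounding box around Sydney (an assumption)
pts$suspect <- outside.bounds(pts$lat, pts$lng,
                              lat.range = c(-34.2, -33.5),
                              lng.range = c(150.5, 151.4))
pts[pts$suspect, ]  # rows worth checking by hand
```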

Finally, there is a package called ggmap, which embeds calls to Google’s various APIs for you. While it simplifies this process, the reason for this vignette was to give readers a behind-the-scenes look at how it’s possible to interact with an API directly. This was not intended to be an exhaustive overview of Google’s Maps API, so please refer to the links below for the full detail.

Other Documentation & References

The Google Maps API
RCurl documentation
RJSONIO documentation
ggmap documentation