This tutorial demonstrates how to calculate commute times and road distances between two sets of coordinates using the osrm package for R. The benefit of the osrm package is that it is free and open source, unlike the Google Maps API or the TravelTime API. Documentation for the osrm package is available at the following link: https://cran.r-project.org/web/packages/osrm/osrm.pdf

Libraries and Data

Your data should be structured so that each row contains an ID column plus separate latitude and longitude columns for both points (five columns in total). Your data can have other columns, but these five are the inputs needed to calculate the commute times and distances. Additionally, the coordinate variables must be numeric.

# Load Libraries 
library(tidyverse)
library(osrm)
library(readxl)
# Create Sample DF 
coordinates <- tibble(
  id = 1:30, 
  lat_x = c(38.9602257, 38.9188691, 38.75145699949, 39.1733098860145, 38.9725462,
            38.9245788, 39.0344945557767, 38.9557725, 38.961446617296, 38.9354417,
            38.7959068134916, 38.94608585, 38.85574885, 38.95314385, 39.0663382563416,
            38.9154308, 38.8833787, 38.831872, 38.95056785, 38.9101801, 38.97626435,
            38.944957, 39.0640352563591, 38.9298821, 38.9434917, 38.8570795,
            38.8359819, 38.8282029, 38.8397272309088, 38.64001325), 
  long_x = c(-76.9808986, -77.0337542316473, -77.0912972499416, -77.2134039454291,
              -77.0240985, -77.0380635715856, -77.0393849638859, -77.0237716715352,
              -76.9929210785651, -76.8965550971963, -77.0764954838148, -76.9207195315118,
              -76.8933188484041, -76.8972937258547, -77.06835155214, -77.0176835721485,
              -76.9398380923472, -76.995784, -76.9032617867723, -77.0311444243821,
              -77.0186678063586, -76.9978570454727, -77.0657632722379, -77.0211567813762,
              -76.9295391956639, -76.9895396066425, -77.007232964649, -77.0001848,
              -76.9895726917367, -77.0779929088052), 
  lat_y = c(38.9566327099215, 38.9409817019037, 38.9409817019037, 38.9409817019037,
             38.9409817019037, 38.9193374095254, 38.9409817019037, 38.9409817019037,
             38.9409817019037, 38.9409817019037, 38.9409817019037, 38.9409817019037,
             38.9409817019037, 38.9409817019037, 38.9409817019037, 38.9409817019037,
             38.9097878588907, 38.8284813984227, 38.9563841747171, 38.9563841747171,
             38.9563841747171, 38.9236957448861, 38.9563841747171, 38.9563841747171,
             38.9563841747171, 38.8284813984227, 38.833760373766, 38.8284813984227,
             38.8284813984227, 38.8948545102086), 
  long_y = c(-77.0161909242981, -77.0266813674006, -77.0266813674006, -77.0266813674006,
              -77.0266813674006, -77.0318780635116, -77.0266813674006, -77.0266813674006,
              -77.0266813674006, -77.0266813674006, -77.0266813674006, -77.0266813674006,
              -77.0266813674006, -77.0266813674006, -77.0266813674006, -77.0266813674006,
              -77.0238950764798, -76.9984206731634, -77.0200729928712, -77.0200729928712,
              -77.0200729928712, -76.9671932736695, -77.0200729928712, -77.0200729928712,
              -77.0200729928712, -76.9984206731634, -76.985317912937, -76.9984206731634,
              -76.9984206731634, -77.0199788270182)
)
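
Before routing, it can help to confirm that the data meets the requirements described above. The following is a minimal sketch in base R, not part of the osrm package: `check_coordinates()` is a hypothetical helper, and its defaults assume the column names used in this tutorial (`lat_x`, `long_x`, `lat_y`, `long_y`). The small `sample_df` exists only so the example runs on its own.

```r
# Hypothetical helper (not an osrm function): checks that the ID and the four
# coordinate columns exist, that coordinates are numeric, and that they fall
# in valid ranges. Column names are assumptions taken from this tutorial.
check_coordinates <- function(df,
                              id_col = "id",
                              coord_cols = c("lat_x", "long_x", "lat_y", "long_y")) {
  required <- c(id_col, coord_cols)
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Every coordinate column must be numeric
  is_num <- vapply(df[coord_cols], is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric columns: ", paste(coord_cols[!is_num], collapse = ", "))
  }
  # Latitudes must lie in [-90, 90] and longitudes in [-180, 180]
  lat_cols <- coord_cols[grepl("^lat", coord_cols)]
  lon_cols <- setdiff(coord_cols, lat_cols)
  lat_ok <- all(abs(unlist(df[lat_cols])) <= 90, na.rm = TRUE)
  lon_ok <- all(abs(unlist(df[lon_cols])) <= 180, na.rm = TRUE)
  if (!lat_ok || !lon_ok) {
    stop("Coordinates out of range: latitude must be in [-90, 90], longitude in [-180, 180].")
  }
  invisible(TRUE)
}

# Two rows in the same layout as the sample data
sample_df <- data.frame(
  id = 1:2,
  lat_x = c(38.9602257, 38.9188691),
  long_x = c(-76.9808986, -77.0337542),
  lat_y = c(38.9566327, 38.9409817),
  long_y = c(-77.0161909, -77.0266814)
)
check_coordinates(sample_df)
```

The same call should work on the full `coordinates` tibble, since the checks only index columns by name. Note that a range check cannot detect swapped latitude and longitude when both values happen to be valid in either role.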

Data Wrangling

We need to create two data frames to separate our origin and destination coordinates. We also need to transform these subsets into a form that our distance function will recognize. First, we rename the coordinate variables. Second, we structure the data frames so that the variables are in the order ID, longitude, latitude. Lastly, we convert the data into a data.frame object.

origin <-  coordinates %>% 
  # Subset your origin coordinates and the ID
  select(id, long_x, lat_x) %>% 
  # Rename the Variables 
  rename(lon = long_x, 
         lat = lat_x) %>% 
  # Convert to data.frame object
  as.data.frame()
destination <- coordinates %>% 
  # Subset your destination coordinates
  select(id, long_y, lat_y) %>% 
  # Rename the Variables
  rename(lon = long_y, 
         lat = lat_y) %>% 
  # Convert to data.frame object
  as.data.frame()
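
The two pipelines above differ only in which columns they pick, so the same steps could be folded into one small function. This is a sketch in base R rather than the tidyverse approach used above. `make_points()` is a hypothetical name, and `coords_demo` is a two-row stand-in for the sample data.

```r
# Hypothetical helper: build a plain data.frame with columns id, lon, lat
# (in that order) from the named columns of a wider data set.
make_points <- function(df, id_col, lon_col, lat_col) {
  data.frame(
    id  = df[[id_col]],
    lon = df[[lon_col]],
    lat = df[[lat_col]]
  )
}

# Two-row stand-in for the tutorial's `coordinates` tibble
coords_demo <- data.frame(
  id = 1:2,
  lat_x = c(38.9602257, 38.9188691),
  long_x = c(-76.9808986, -77.0337542),
  lat_y = c(38.9566327, 38.9409817),
  long_y = c(-77.0161909, -77.0266814)
)

origin_demo      <- make_points(coords_demo, "id", "long_x", "lat_x")
destination_demo <- make_points(coords_demo, "id", "long_y", "lat_y")
```

Because the helper names its arguments, it is harder to pass latitude where longitude is expected, which is an easy mistake when selecting columns by position.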

Calculate Commute Distance and Duration

We will use the osrmRoute() function to calculate the distance and commute time between each pair of points. The osrmRoute() function can calculate distance and time for travel by car, bike, or foot. We will use a for loop to iterate over the rows of our data set and build a new data set containing our commute times and distances. Note that each iteration sends a separate routing request, so the loop can take some time to run if you have a large data set to iterate over. Lastly, note that the outputs will be in minutes for time and kilometers for distance.

## Create an empty list to initialize the loop
commute_data <- list()

## Use For Loop to iterate over values and calculate commute distance and times
for (i in seq_len(nrow(coordinates))) {
  commute_data[[i]] <- osrmRoute(
    # Origin Points
    src = origin[i, 2:3], 
    # Destination Points
    dst = destination[i, 2:3],
    # Return only duration and distance, without the route geometry
    overview = FALSE, 
    # Set mode of transportation (car, bike, or foot)
    osrm.profile = "car"
  )
}
## Combine the list output from the for loop into a single data frame
commute_data <- as.data.frame(do.call(rbind, commute_data))

## Add ID variable back in 
commute_data$id <- coordinates$id
commute_data
##    duration distance id
## 1      5.87     3.98  1
## 2      5.11     2.95  2
## 3     35.27    24.33  3
## 4     37.46    38.25  4
## 5      4.11     3.83  5
## 6      2.71     1.56  6
## 7     14.88    11.59  7
## 8      2.61     2.15  8
## 9      4.92     4.26  9
## 10    16.32    15.80 10
## 11    29.68    20.21 11
## 12    15.62    13.33 12
## 13    23.90    18.56 13
## 14    18.29    15.77 14
## 15    20.32    15.83 15
## 16     6.50     3.75 16
## 17    11.64     9.71 17
## 18     1.83     0.82 18
## 19    16.25    14.10 19
## 20     9.30     6.10 20
## 21     4.24     3.12 21
## 22     6.04     4.36 22
## 23    18.63    14.27 23
## 24     4.83     3.82 24
## 25    13.34    11.11 25
## 26     8.87     4.76 26
## 27     5.00     2.88 27
## 28     1.10     0.43 28
## 29     3.58     2.11 29
## 30    39.81    38.21 30
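
In the loop above, a single failed request stops the whole run. One way to make it more tolerant is to wrap each call in tryCatch() and record missing values for the rows that fail. The sketch below rests on some assumptions: `route_all()` and its `route_fn` argument are hypothetical names, and `fake_route()` is a stand-in for osrmRoute() so that the example runs without a network connection.

```r
# Hypothetical wrapper: route_fn is any function that takes (src, dst) and
# returns a named numeric vector c(duration = ..., distance = ...).
route_all <- function(origin, destination, route_fn) {
  results <- vector("list", nrow(origin))
  for (i in seq_len(nrow(origin))) {
    results[[i]] <- tryCatch(
      route_fn(origin[i, c("lon", "lat")], destination[i, c("lon", "lat")]),
      # On failure, keep the row but record missing values
      error = function(e) c(duration = NA_real_, distance = NA_real_)
    )
  }
  out <- as.data.frame(do.call(rbind, results))
  out$id <- origin$id
  out
}

# Stand-in for osrmRoute(): returns fixed values and fails for the second row
fake_route <- function(src, dst) {
  if (src$lon < -77) stop("no route found")
  c(duration = 5.87, distance = 3.98)
}

origin_demo <- data.frame(id = 1:2, lon = c(-76.98, -77.03), lat = c(38.96, 38.92))
destination_demo <- data.frame(id = 1:2, lon = c(-77.02, -77.03), lat = c(38.96, 38.94))

commute_demo <- route_all(origin_demo, destination_demo, fake_route)
```

With the real service, `route_fn` could be something like `function(s, d) osrmRoute(src = s, dst = d, overview = FALSE, osrm.profile = "car")`, which mirrors the call used in the loop above. Rows that fail then show up as NA in the result and can be retried separately.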

The final output “commute_data” should have three variables:
- distance: the distance in kilometers between your origin and destination points via roads.
- duration: the travel time in minutes between your origin and destination points by your chosen means of travel.
- id: the IDs associated with each pair of origin and destination points, taken from the original data frame.
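
Since the id column is shared with the original data, the results can be joined back to it, and the units converted if needed. The following is a minimal sketch using base R's merge(). The `_demo` objects are small stand-ins for `coordinates` and `commute_data`, and the derived column names `distance_miles` and `duration_hours` are illustrative choices.

```r
# Small stand-ins for the tutorial's objects so the sketch runs on its own
coordinates_demo <- data.frame(
  id = 1:3,
  lat_x = c(38.9602257, 38.9188691, 38.7514570),
  long_x = c(-76.9808986, -77.0337542, -77.0912972)
)
commute_demo <- data.frame(
  duration = c(5.87, 5.11, 35.27),
  distance = c(3.98, 2.95, 24.33),
  id = 1:3
)

# Join the routing results back to the input rows on the shared id column
combined <- merge(coordinates_demo, commute_demo, by = "id")

# Derived columns: 1 km = 0.621371 miles; 60 minutes = 1 hour
combined$distance_miles <- round(combined$distance * 0.621371, 2)
combined$duration_hours <- round(combined$duration / 60, 2)
```

Joining by id rather than binding columns side by side keeps each result with the correct row even if either data frame has been reordered or filtered in the meantime.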