Introduction

This tutorial covers obtaining free OpenStreetMap (OSM) data and using it for planning applications. OpenstreetMap (OSM) is a free and open map of the world created largely by voluntary contribution of millions of people around the world. Much like Wikipedia. Since the data is free and open, there are much less restrictions to obtaining and using the data. The only condition of using OSM data is proper attribution to OSM contributors.

Comparison between OSM and Google

Availability of data is reasonably good in both Google and OSM in most parts of the world. However, the availability of Google data is better in places where there is more commercial interest and that in OSM where there is more humanitarian interest.

OSM data is free to download. Overpass can be used free of cost to download small amount of data. For large datasets, use Geofabrik. Google requires users to pay based on volume of data served after a limited daily quota. Policies of Google change frequently, so note that your code will eventually and frequently break.

Downloading data

OSM serves two APIs, namely Main API for editing OSM, and Overpass API for providing OSM data. We will use Overpass API to gather data in this tutorial.

Data can be queried for download using a combination of search criteria like location and type of objects. It helps to understand how OSM data is structured. OSM data is stored as a list of attributes tagged in key - value pairs of geospatial objects (points, lines or polygons). For example, for an architect’s office, the key is “office”, and the value is “architect.” For the name of the office, key is “name” and value is “ABC Design Studio.” Access an extensive list of key-value pairs through OSM Wiki Map features.

Obtaining point locations of restaurants in Durham from OSM

Restaurants are tagged under amenities. Amenities, according to OSM Wiki are facilities used by visitors and residents. Here, ‘key’ is “amenity” and ‘value’ is “restaurant.” Do not forget to look for related amenities such as “pub”, “food court”, “cafe”, “fast food”, etc. Other amenities include: “university”, “music school”, “kindergarten” and the likes in education, “bus station”, “fuel”, “parking” and others in transportation, and much more.

library(osmdata)
library(sf)
library(tidyverse)
library(leaflet) #interactive visualization

A peek into some of the available features to get using osmdata

head(available_features(), 15)
##  [1] "4wd_only"                "abandoned"              
##  [3] "abutters"                "access"                 
##  [5] "addr"                    "addr:city"              
##  [7] "addr:conscriptionnumber" "addr:country"           
##  [9] "addr:county"             "addr:district"          
## [11] "addr:flats"              "addr:full"              
## [13] "addr:hamlet"             "addr:housename"         
## [15] "addr:housenumber"

A peek into some of the available amenities to get using osmdata

head(available_tags("amenity"), 15)
## # A tibble: 15 × 2
##    Key     Value                 
##    <chr>   <chr>                 
##  1 amenity animal_boarding       
##  2 amenity animal_breeding       
##  3 amenity animal_shelter        
##  4 amenity animal_training       
##  5 amenity arts_centre           
##  6 amenity atm                   
##  7 amenity baby_hatch            
##  8 amenity baking_oven           
##  9 amenity bank                  
## 10 amenity bar                   
## 11 amenity bbq                   
## 12 amenity bench                 
## 13 amenity bicycle_parking       
## 14 amenity bicycle_rental        
## 15 amenity bicycle_repair_station

It is helpful to learn about the distinctions between different tags (key-value pairs). For example, the key “landuse” is used to describe the purpose for which an area is being used. Examples of values for the key “landuse” are “commercial”, “retail”, “vineyard”, “cemetery”, “religious”, etc. Landuse tags are more generic than amenities and are only used for area objects while amenities can also be used for point objects. In case of any confusion, refer to Map features.

Various amenities, land-use, roads (e.g. key=“highway”, value = “primary”, “service”, “footway”), natural land features (key=“natural”, value = “grassland”), settlements (key = “place”, value = “suburb”), power (key=“power”, value=“line”, “pole”, “transformer”), etc. may be useful in planning applications.

Downloading OpenStreetMap data

The first step is to define a bounding box of the geographical area we are interested in. It is defined by four geographic coordinates, representing the minimum and maximum latitude and longitude of the area. These coordinates define the corners of the box. The order of the coordinates is usually (min Longitude, min Latitude, max Longitude, max Latitude).

We can use the getbb() function to get bounding box.

getbb("Carrboro, North carolina")
##         min       max
## x -79.12468 -79.06390
## y  35.88796  35.96761

Download OSM data on restaurants using Overpass Query API.

bb <- getbb ("Durham, North carolina")
durham_rst <- opq(bb) %>% #gets bounding box
  add_osm_feature(key = "amenity", value = "restaurant") %>% #searches for restaurants
  osmdata_sf() #download OSM data as sf
  • opq(getbb ("Durham, North carolina")): The getbb() function retrieves the bounding box for a specific geographical location, in this case, Durham, North Carolina. This bounding box describes the rectangle covering the area of interest. The opq() function (OverPass query) creates a new query to the OverPass API, using the bounding box as a constraint.

  • add_osm_feature(key = "amenity", value = "restaurant"): The add_osm_feature() function adds a feature to the OverPass query. In this case, it is looking for features where the amenity key is set to restaurant, indicating it’s a restaurant.

  • osmdata_sf(): This function retrieves the data from the OverPass API and returns it as an sf (simple features) object. sf is a spatial data format used in R.

Looking into the osmdata object

In your RStudio environment, click on the durham_rst object. It’s a list of 8 elements with information about various things, including bounding box ($bbox), points data ($osm_points), and other lines, multilines and multipolygons geometries.

In $osm_points and $osm_polygons you see various information related to restaurants. It’s because the information about restaurants are entered either as restaurants and polygons.

head(durham_rst$osm_points)$name
## [1] "Gonza Tacos y Tequila" "Breugger's Bagel"      "Rue Cler"             
## [4] "The Original Q Shack"  "Sho Nuff Seafood"      NA

Notice the use of $ sign to access name of restaurants.

head(durham_rst$osm_polygons)$name
## [1] "Devil's Pizzeria"       "International Delights" "Blue Corn Cafe"        
## [4] "Cosmic Cantina"         "Elmo's Diner"           "Vin Rouge"
#select name and geometry for restaurants
rst_osm_points <- durham_rst$osm_points %>% #select point data from downloaded OSM data
  select(name, geometry) #for now just selecting the name and geometry to plot

rst_osm_polygons <- durham_rst$osm_polygons %>%
  select(name, geometry)

durham_restaurants <- rbind(rst_osm_points, rst_osm_polygons)

OpenStreetMap found 1005 observations as restaurants.

We will use leaflet visualization to create interact map with background layers.

# install.packages("leaflet")
library(leaflet)

leaflet() %>%
  addPolygons(
    data = durham_rst$osm_polygons,
    label = durham_rst$osm_polygons$name
  ) %>% 
  addCircles(
    data = durham_rst$osm_points,
    label = durham_rst$osm_points$name
  ) %>% 
  addProviderTiles("CartoDB.Positron")

Now that we know the location of the restaurants, we can use it to calculate how many restaurants lie within each Census tracts.

head(durham_restaurants)
## Simple feature collection with 6 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -78.92438 ymin: 35.97457 xmax: -78.89976 ymax: 36.01573
## Geodetic CRS:  WGS 84
##                            name                   geometry
## 299507889 Gonza Tacos y Tequila POINT (-78.90563 35.99995)
## 299508277      Breugger's Bagel POINT (-78.92187 36.00741)
## 299513078              Rue Cler POINT (-78.89977 35.99705)
## 299528024  The Original Q Shack POINT (-78.92438 35.97457)
## 300269657      Sho Nuff Seafood POINT (-78.91903 36.01573)
## 307417766                  <NA> POINT (-78.92196 36.00953)

Getting data from tidycensus

# Load packages
library(tidycensus)
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
options(tigris_use_cache = TRUE)
# Set your Census Bureau API key
#census_api_key("your_census_api_key_here")
# Set the FIPS code for Durham County, North Carolina
state_fips <- "37"
county_fips <- "063"
# Download the data
durham_income <- get_acs(geography = "tract", 
                variables = "B19013_001", 
                state = state_fips, 
                county = county_fips,
                geometry = TRUE)  # Include geography
## Getting data from the 2017-2021 5-year ACS
# Print the data
head(durham_income)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -78.95888 ymin: 35.90875 xmax: -78.88265 ymax: 36.02528
## Geodetic CRS:  NAD83
##         GEOID                                              NAME   variable
## 1 37063000301  Census Tract 3.01, Durham County, North Carolina B19013_001
## 2 37063002022 Census Tract 20.22, Durham County, North Carolina B19013_001
## 3 37063001504 Census Tract 15.04, Durham County, North Carolina B19013_001
## 4 37063000700     Census Tract 7, Durham County, North Carolina B19013_001
## 5 37063001100    Census Tract 11, Durham County, North Carolina B19013_001
## 6 37063000302  Census Tract 3.02, Durham County, North Carolina B19013_001
##   estimate   moe                       geometry
## 1    47045 13537 MULTIPOLYGON (((-78.91938 3...
## 2    62886  8807 MULTIPOLYGON (((-78.93465 3...
## 3    30000  7721 MULTIPOLYGON (((-78.95832 3...
## 4    66949  9594 MULTIPOLYGON (((-78.92617 3...
## 5    27807  8032 MULTIPOLYGON (((-78.8971 35...
## 6    90156 20197 MULTIPOLYGON (((-78.9122 36...

Assign each restaurant to respective Census tracts

Making sue the CRS is same for both data

durham_restaurants <- st_transform(durham_restaurants, st_crs(durham_income))

Using st_join to join the datasets. Since this join is a left join, it will add attributes from durham_income (with the adjacent Census tract IDs) to the data on restaurants.

# Perform the spatial join
joined_data <- st_join(durham_restaurants, durham_income)

Look at the joined_data in the RStudio environment. There are many Restaurants without names. There are some restaurants for which census tracts are not assigned, most likely they do not belong within Durham County. We will ignore those for the time being.

durham_rests_NO_NA <- joined_data %>% 
  filter(!is.na(name)) %>% 
  filter(!is.na(GEOID))

Exercise

Let’s summarize the number of restaurants by census tracts.

durham_rests_NO_NA %>%
  select(name, GEOID) %>% 
  as_tibble() %>%
  group_by(GEOID) %>% 
  summarize(num_rsts = n()) %>% 
  arrange(desc(num_rsts))
## # A tibble: 48 × 2
##    GEOID       num_rsts
##    <chr>          <int>
##  1 37063002200       26
##  2 37063002021       23
##  3 37063000402       22
##  4 37063002029       14
##  5 37063000600       12
##  6 37063002036       11
##  7 37063002015        7
##  8 37063002034        7
##  9 37063002031        6
## 10 37063002037        6
## # ℹ 38 more rows

We can now get data about various spatial objects from OpenStreetMap and combine it with other spatial data to answer planning adjacent questions. Let’s use road network data from OpenStreetMap for another planning analysis.

Comparing walkability of cities

Walkability is a measure of accessibility of places to walking. You can find a variety of Walkability scores/ratings of places created using various methodologies.

Let’s compare the walkability of some counties in North Carolina using some back-of-the-napkin quick and dirty analysis.

First, we have to defined walkability. For this exercise, I define walkability as the length of sidewalks divided by population. We can estimate length of sidewalks from OpenStreetMap data and get population data from tidycensus.

A function to calculate total length of sidewalks for a given geography.

get_length <- function (place) {
  # Get the bounding box of the place
  bb <- opq(bbox = place) 
  
  # Set up the query
  query <- add_osm_feature(bb, key = 'highway', value = 'footway')
  
  # Run the query
  footpaths <- osmdata_sf(query)
  
  # Get the footpaths (lines)
  footpaths_df <- footpaths$osm_lines
  
  with_length <- footpaths_df %>% 
  mutate(length = st_length(geometry))
  
  return(sum(with_length$length))

}

Exercise

Another function to get the population of a given geography.

get_pop <- function(place) {
  
  get_acs(geography = "county", 
                 variables = "B01003_001", 
                 state = "NC", 
                 county = place,
                 survey = "acs5",
                 year = 2019) %>%
    summarise(total_population = sum(estimate)) %>% 
    as.numeric()
}

Exercise

Computing Walkability score for various counties using the above functions.

walkability_orange <- get_length("Orange County") / get_pop("Orange County")
## Getting data from the 2015-2019 5-year ACS
walkability_orange
## 17.79836 [m]
walkability_durham <- get_length("Durham County") / get_pop("Durham County")
## Getting data from the 2015-2019 5-year ACS
walkability_durham
## 6.116302 [m]

Exercise

References