This tutorial covers obtaining free OpenStreetMap (OSM) data and using it for planning applications. OpenstreetMap (OSM) is a free and open map of the world created largely by voluntary contribution of millions of people around the world. Much like Wikipedia. Since the data is free and open, there are much less restrictions to obtaining and using the data. The only condition of using OSM data is proper attribution to OSM contributors.
Availability of data is reasonably good in both Google and OSM in most parts of the world. However, the availability of Google data is better in places where there is more commercial interest and that in OSM where there is more humanitarian interest.
OSM data is free to download. Overpass can be used free of cost to download small amount of data. For large datasets, use Geofabrik. Google requires users to pay based on volume of data served after a limited daily quota. Policies of Google change frequently, so note that your code will eventually and frequently break.
OSM serves two APIs, namely Main API for editing OSM, and Overpass API for providing OSM data. We will use Overpass API to gather data in this tutorial.
Data can be queried for download using a combination of search criteria like location and type of objects. It helps to understand how OSM data is structured. OSM data is stored as a list of attributes tagged in key - value pairs of geospatial objects (points, lines or polygons). For example, for an architect’s office, the key is “office”, and the value is “architect.” For the name of the office, key is “name” and value is “ABC Design Studio.” Access an extensive list of key-value pairs through OSM Wiki Map features.
Restaurants are tagged under amenities. Amenities, according to OSM Wiki are facilities used by visitors and residents. Here, ‘key’ is “amenity” and ‘value’ is “restaurant.” Do not forget to look for related amenities such as “pub”, “food court”, “cafe”, “fast food”, etc. Other amenities include: “university”, “music school”, “kindergarten” and the likes in education, “bus station”, “fuel”, “parking” and others in transportation, and much more.
library(osmdata)
library(sf)
library(tidyverse)
library(leaflet) #interactive visualization
A peek into some of the available features to get using
osmdata
head(available_features(), 15)
## [1] "4wd_only" "abandoned"
## [3] "abutters" "access"
## [5] "addr" "addr:city"
## [7] "addr:conscriptionnumber" "addr:country"
## [9] "addr:county" "addr:district"
## [11] "addr:flats" "addr:full"
## [13] "addr:hamlet" "addr:housename"
## [15] "addr:housenumber"
A peek into some of the available amenities to get using
osmdata
head(available_tags("amenity"), 15)
## # A tibble: 15 × 2
## Key Value
## <chr> <chr>
## 1 amenity animal_boarding
## 2 amenity animal_breeding
## 3 amenity animal_shelter
## 4 amenity animal_training
## 5 amenity arts_centre
## 6 amenity atm
## 7 amenity baby_hatch
## 8 amenity baking_oven
## 9 amenity bank
## 10 amenity bar
## 11 amenity bbq
## 12 amenity bench
## 13 amenity bicycle_parking
## 14 amenity bicycle_rental
## 15 amenity bicycle_repair_station
It is helpful to learn about the distinctions between different tags (key-value pairs). For example, the key “landuse” is used to describe the purpose for which an area is being used. Examples of values for the key “landuse” are “commercial”, “retail”, “vineyard”, “cemetery”, “religious”, etc. Landuse tags are more generic than amenities and are only used for area objects while amenities can also be used for point objects. In case of any confusion, refer to Map features.
Various amenities, land-use, roads (e.g. key=“highway”, value = “primary”, “service”, “footway”), natural land features (key=“natural”, value = “grassland”), settlements (key = “place”, value = “suburb”), power (key=“power”, value=“line”, “pole”, “transformer”), etc. may be useful in planning applications.
The first step is to define a bounding box
of the
geographical area we are interested in. It is defined by four geographic
coordinates, representing the minimum and maximum latitude and longitude
of the area. These coordinates define the corners of the box. The order
of the coordinates is usually (min Longitude, min Latitude, max
Longitude, max Latitude).
We can use the getbb()
function to get bounding box.
getbb("Carrboro, North carolina")
## min max
## x -79.12468 -79.06390
## y 35.88796 35.96761
bb <- getbb ("Durham, North carolina")
durham_rst <- opq(bb) %>% #gets bounding box
add_osm_feature(key = "amenity", value = "restaurant") %>% #searches for restaurants
osmdata_sf() #download OSM data as sf
opq(getbb ("Durham, North carolina"))
: The
getbb()
function retrieves the bounding box for a specific
geographical location, in this case, Durham, North Carolina. This
bounding box describes the rectangle covering the area of interest. The
opq()
function (OverPass query) creates a new query to the
OverPass API, using the bounding box as a constraint.
add_osm_feature(key = "amenity", value = "restaurant")
:
The add_osm_feature()
function adds a feature to the
OverPass query. In this case, it is looking for features where the
amenity key is set to restaurant, indicating it’s a restaurant.
osmdata_sf()
: This function retrieves the data from
the OverPass API and returns it as an sf
(simple features)
object. sf is a spatial data format used in R.
osmdata
objectIn your RStudio environment, click on the durham_rst
object. It’s a list of 8 elements with information about various things,
including bounding box ($bbox
), points data
($osm_points
), and other lines, multilines and
multipolygons geometries.
In $osm_points
and $osm_polygons
you see
various information related to restaurants. It’s because the information
about restaurants are entered either as restaurants and polygons.
head(durham_rst$osm_points)$name
## [1] "Gonza Tacos y Tequila" "Breugger's Bagel" "Rue Cler"
## [4] "The Original Q Shack" "Sho Nuff Seafood" NA
Notice the use of $
sign to access name of
restaurants.
head(durham_rst$osm_polygons)$name
## [1] "Devil's Pizzeria" "International Delights" "Blue Corn Cafe"
## [4] "Cosmic Cantina" "Elmo's Diner" "Vin Rouge"
#select name and geometry for restaurants
rst_osm_points <- durham_rst$osm_points %>% #select point data from downloaded OSM data
select(name, geometry) #for now just selecting the name and geometry to plot
rst_osm_polygons <- durham_rst$osm_polygons %>%
select(name, geometry)
durham_restaurants <- rbind(rst_osm_points, rst_osm_polygons)
OpenStreetMap found 1005 observations as restaurants.
We will use leaflet
visualization to create interact map
with background layers.
# install.packages("leaflet")
library(leaflet)
leaflet() %>%
addPolygons(
data = durham_rst$osm_polygons,
label = durham_rst$osm_polygons$name
) %>%
addCircles(
data = durham_rst$osm_points,
label = durham_rst$osm_points$name
) %>%
addProviderTiles("CartoDB.Positron")
Now that we know the location of the restaurants, we can use it to calculate how many restaurants lie within each Census tracts.
head(durham_restaurants)
## Simple feature collection with 6 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -78.92438 ymin: 35.97457 xmax: -78.89976 ymax: 36.01573
## Geodetic CRS: WGS 84
## name geometry
## 299507889 Gonza Tacos y Tequila POINT (-78.90563 35.99995)
## 299508277 Breugger's Bagel POINT (-78.92187 36.00741)
## 299513078 Rue Cler POINT (-78.89977 35.99705)
## 299528024 The Original Q Shack POINT (-78.92438 35.97457)
## 300269657 Sho Nuff Seafood POINT (-78.91903 36.01573)
## 307417766 <NA> POINT (-78.92196 36.00953)
Getting data from tidycensus
# Load packages
library(tidycensus)
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
options(tigris_use_cache = TRUE)
# Set your Census Bureau API key
#census_api_key("your_census_api_key_here")
# Set the FIPS code for Durham County, North Carolina
state_fips <- "37"
county_fips <- "063"
# Download the data
durham_income <- get_acs(geography = "tract",
variables = "B19013_001",
state = state_fips,
county = county_fips,
geometry = TRUE) # Include geography
## Getting data from the 2017-2021 5-year ACS
# Print the data
head(durham_income)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -78.95888 ymin: 35.90875 xmax: -78.88265 ymax: 36.02528
## Geodetic CRS: NAD83
## GEOID NAME variable
## 1 37063000301 Census Tract 3.01, Durham County, North Carolina B19013_001
## 2 37063002022 Census Tract 20.22, Durham County, North Carolina B19013_001
## 3 37063001504 Census Tract 15.04, Durham County, North Carolina B19013_001
## 4 37063000700 Census Tract 7, Durham County, North Carolina B19013_001
## 5 37063001100 Census Tract 11, Durham County, North Carolina B19013_001
## 6 37063000302 Census Tract 3.02, Durham County, North Carolina B19013_001
## estimate moe geometry
## 1 47045 13537 MULTIPOLYGON (((-78.91938 3...
## 2 62886 8807 MULTIPOLYGON (((-78.93465 3...
## 3 30000 7721 MULTIPOLYGON (((-78.95832 3...
## 4 66949 9594 MULTIPOLYGON (((-78.92617 3...
## 5 27807 8032 MULTIPOLYGON (((-78.8971 35...
## 6 90156 20197 MULTIPOLYGON (((-78.9122 36...
Making sue the CRS is same for both data
durham_restaurants <- st_transform(durham_restaurants, st_crs(durham_income))
Using st_join
to join the datasets. Since this join is a
left join, it will add attributes from durham_income (with the adjacent
Census tract IDs) to the data on restaurants.
# Perform the spatial join
joined_data <- st_join(durham_restaurants, durham_income)
Look at the joined_data
in the RStudio environment.
There are many Restaurants without names. There are some restaurants for
which census tracts are not assigned, most likely they do not belong
within Durham County. We will ignore those for the time being.
durham_rests_NO_NA <- joined_data %>%
filter(!is.na(name)) %>%
filter(!is.na(GEOID))
filter(!is.na(name))
Let’s summarize the number of restaurants by census tracts.
durham_rests_NO_NA %>%
select(name, GEOID) %>%
as_tibble() %>%
group_by(GEOID) %>%
summarize(num_rsts = n()) %>%
arrange(desc(num_rsts))
## # A tibble: 48 × 2
## GEOID num_rsts
## <chr> <int>
## 1 37063002200 26
## 2 37063002021 23
## 3 37063000402 22
## 4 37063002029 14
## 5 37063000600 12
## 6 37063002036 11
## 7 37063002015 7
## 8 37063002034 7
## 9 37063002031 6
## 10 37063002037 6
## # ℹ 38 more rows
We can now get data about various spatial objects from OpenStreetMap and combine it with other spatial data to answer planning adjacent questions. Let’s use road network data from OpenStreetMap for another planning analysis.
Walkability is a measure of accessibility of places to walking. You can find a variety of Walkability scores/ratings of places created using various methodologies.
Let’s compare the walkability of some counties in North Carolina using some back-of-the-napkin quick and dirty analysis.
First, we have to defined walkability. For this exercise, I define walkability as the length of sidewalks divided by population. We can estimate length of sidewalks from OpenStreetMap data and get population data from tidycensus.
A function to calculate total length of sidewalks for a given geography.
get_length <- function (place) {
# Get the bounding box of the place
bb <- opq(bbox = place)
# Set up the query
query <- add_osm_feature(bb, key = 'highway', value = 'footway')
# Run the query
footpaths <- osmdata_sf(query)
# Get the footpaths (lines)
footpaths_df <- footpaths$osm_lines
with_length <- footpaths_df %>%
mutate(length = st_length(geometry))
return(sum(with_length$length))
}
Another function to get the population of a given geography.
get_pop <- function(place) {
get_acs(geography = "county",
variables = "B01003_001",
state = "NC",
county = place,
survey = "acs5",
year = 2019) %>%
summarise(total_population = sum(estimate)) %>%
as.numeric()
}
Computing Walkability score for various counties using the above functions.
walkability_orange <- get_length("Orange County") / get_pop("Orange County")
## Getting data from the 2015-2019 5-year ACS
walkability_orange
## 17.79836 [m]
walkability_durham <- get_length("Durham County") / get_pop("Durham County")
## Getting data from the 2015-2019 5-year ACS
walkability_durham
## 6.116302 [m]