This is a general overview of some useful commands for plotting both static and interactive NYC maps, including shapefiles for Census tracts and neighborhood boundaries, and spatial joins for mapping points into regions. It’s adapted from a great tutorial on using the tigris package for easy access to shapefiles and some other work on maps with ggplot2.
You’ll need to install a number of packages (listed below) to get things working. See the tigris README for tips. Mac users may need to reinstall the sp package to fix broken dependencies.
library(tigris)
library(dplyr)
library(leaflet)
library(sp)
library(ggmap)
library(maptools)
library(broom)
library(httr)
library(rgdal)
There are many different ways to plot maps in R. Here are two of the easiest.
Make basic maps with ggmap by specifying a location (as a string or by lat/long) and zoom level. The geocode function is used to convert a string to lat/long behind the scenes and tiles are pulled from Google maps.
nyc_map <- get_map(location = c(lon = -74.00, lat = 40.71), maptype = "terrain", zoom = 11)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=40.71,-74&zoom=11&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
ggmap(nyc_map)
Or make interactive maps using Leaflet for R in Rstudio.
leaflet() %>%
addTiles() %>%
setView(-74.00, 40.71, zoom = 12)
Leaflet offers a variety of provider tiles for different look/feel.
leaflet() %>%
addTiles() %>%
setView(-74.00, 40.71, zoom = 12) %>%
addProviderTiles("CartoDB.Positron")
The tigris package makes it easy to grab Census shapefiles at various levels, from tracts and zipcodes all the way up to national borders. First we look up the appropriate codes.
lookup_code("New York", "New York")
## [1] "The code for New York is '36' and the code for New York County is '061'."
#lookup_code("New York", "Kings")
#lookup_code("New York", "Queens")
#lookup_code("New York", "Bronx")
#lookup_code("New York", "Richmond?")
Then we download the data and take a look at it. The summary below shows the data associated with each polygon (e.g., the GEOID and NAME). We can access data columns via nyc_tracts@data.
nyc_tracts <- tracts(state = '36', county = c('061','047','081','005','085'))
summary(nyc_tracts)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x -74.25909 -73.70001
## y 40.47740 40.91758
## Is projected: FALSE
## proj4string :
## [+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0]
## Data attributes:
## STATEFP COUNTYFP TRACTCE
## Length:2167 Length:2167 Length:2167
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## GEOID NAME NAMELSAD
## Length:2167 Length:2167 Length:2167
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## MTFCC FUNCSTAT ALAND
## Length:2167 Length:2167 Min. : 0
## Class :character Class :character 1st Qu.: 156383
## Mode :character Mode :character Median : 184733
## Mean : 360310
## 3rd Qu.: 291132
## Max. :18319107
## AWATER INTPTLAT INTPTLON
## Min. : 0 Length:2167 Length:2167
## 1st Qu.: 0 Class :character Class :character
## Median : 0 Mode :character Mode :character
## Mean : 199280
## 3rd Qu.: 0
## Max. :121773132
You can take a quick look at these with the plot command from base R. You can visualize them with ggplot and leaflet as well, but we’ll hold off on that for a moment.
plot(nyc_tracts)
Census tracts are great, but they might be too detailed for many purposes, where neighborhoods might be a more familiar unit of analysis. The PUMA Census neighborhoods for New York City lump together a few distinct areas (e.g., Chelsea, Flatiron, and the West Village), but Pediacities has more fine-grained neighborhood shapefiles available as a GeoJSON file.
r <- GET('http://data.beta.nyc//dataset/0ff93d2d-90ba-457c-9f7e-39e47bf2ac5f/resource/35dd04fb-81b3-479b-a074-a27a37888ce7/download/d085e2f8d0b54d4590b1e7d1f35594c1pediacitiesnycneighborhoods.geojson')
nyc_neighborhoods <- readOGR(content(r,'text'), 'OGRGeoJSON', verbose = F)
## No encoding supplied: defaulting to UTF-8.
summary(nyc_neighborhoods)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x -74.25559 -73.70001
## y 40.49613 40.91553
## Is projected: FALSE
## proj4string :
## [+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0]
## Data attributes:
## neighborhood boroughCode borough
## Jamaica Bay : 16 1:38 Bronx :75
## Pelham Islands : 14 2:74 Brooklyn :71
## Broad Channel : 10 3:67 Manhattan :37
## Marine Park : 3 4:77 Queens :73
## Pelham Bay Park: 3 5:54 Staten Island:54
## Bayswater : 2
## (Other) :262
## X.id
## http://nyc.pediacities.com/Resource/Neighborhood/Jamaica_Bay : 16
## http://nyc.pediacities.com/Resource/Neighborhood/Pelham_Islands : 14
## http://nyc.pediacities.com/Resource/Neighborhood/Broad_Channel : 10
## http://nyc.pediacities.com/Resource/Neighborhood/Marine_Park : 3
## http://nyc.pediacities.com/Resource/Neighborhood/Pelham_Bay_Park: 3
## http://nyc.pediacities.com/Resource/Neighborhood/Bayswater : 2
## (Other) :262
ggplot can plot shapefiles but requires them to be converted from a spatial polygon data frame format to plain old data frames first. broom handles this with its tidy command. (fortify from the sp package also works.)
nyc_neighborhoods_df <- tidy(nyc_neighborhoods)
## Regions defined for each Polygons
ggplot() +
geom_polygon(data=nyc_neighborhoods_df, aes(x=long, y=lat, group=group))
We can clean this up a bit and add tiles from Google maps with ggmap.
ggmap(nyc_map) +
geom_polygon(data=nyc_neighborhoods_df, aes(x=long, y=lat, group=group), color="blue", fill=NA)
Or we can do an interactive version which shows neighborhood names when you click on a polygon.
leaflet(nyc_neighborhoods) %>%
addTiles() %>%
addPolygons(popup = ~neighborhood) %>%
addProviderTiles("CartoDB.Positron")
There are also convenient functions for mapping lat/long pairs into polygons. This is useful if you want to know what neighborhood a certain point is in, for instance.
To see how this works, let’s first generate 10 random points in Manhattan.
set.seed(42)
lats <- 40.7544882 + rnorm(10)/100
lngs <- -73.9879923 + rnorm(10)/200
points <- data.frame(lat=lats, lng=lngs)
points
## lat lng
## 1 40.76820 -73.98147
## 2 40.74884 -73.97656
## 3 40.75812 -73.99494
## 4 40.76082 -73.98939
## 5 40.75853 -73.98866
## 6 40.75343 -73.98481
## 7 40.76960 -73.98941
## 8 40.75354 -74.00127
## 9 40.77467 -74.00019
## 10 40.75386 -73.98139
In order to do the join, we first need to convert the points data frame to a spatial data frame. The coordinates function specifies which columns should be used for positioning, and the proj4string function specifies what type of projection should be used. In this case we just want a projection that’s consistent with the neighborhood shapes, so we copy it over. Finally, we use the matches function to do the spatial join and bind the columns back together.
points_spdf <- points
coordinates(points_spdf) <- ~lng + lat
proj4string(points_spdf) <- proj4string(nyc_neighborhoods)
matches <- over(points_spdf, nyc_neighborhoods)
points <- cbind(points, matches)
points
## lat lng neighborhood boroughCode borough
## 1 40.76820 -73.98147 Central Park 1 Manhattan
## 2 40.74884 -73.97656 Murray Hill 1 Manhattan
## 3 40.75812 -73.99494 Hell's Kitchen 1 Manhattan
## 4 40.76082 -73.98939 Hell's Kitchen 1 Manhattan
## 5 40.75853 -73.98866 Theater District 1 Manhattan
## 6 40.75343 -73.98481 Midtown 1 Manhattan
## 7 40.76960 -73.98941 Hell's Kitchen 1 Manhattan
## 8 40.75354 -74.00127 Chelsea 1 Manhattan
## 9 40.77467 -74.00019 <NA> <NA> <NA>
## 10 40.75386 -73.98139 Midtown 1 Manhattan
## X.id
## 1 http://nyc.pediacities.com/Resource/Neighborhood/Central_Park
## 2 http://nyc.pediacities.com/Resource/Neighborhood/Murray_Hill
## 3 http://nyc.pediacities.com/Resource/Neighborhood/Hell_s_Kitchen
## 4 http://nyc.pediacities.com/Resource/Neighborhood/Hell_s_Kitchen
## 5 http://nyc.pediacities.com/Resource/Neighborhood/Theater_District
## 6 http://nyc.pediacities.com/Resource/Neighborhood/Midtown
## 7 http://nyc.pediacities.com/Resource/Neighborhood/Hell_s_Kitchen
## 8 http://nyc.pediacities.com/Resource/Neighborhood/Chelsea
## 9 <NA>
## 10 http://nyc.pediacities.com/Resource/Neighborhood/Midtown
We can verify the matches by poking around on an interactive map and clicking the polygons and markers to see that proper neighborhoods have been assigned.
leaflet(nyc_neighborhoods) %>%
addTiles() %>%
addPolygons(popup = ~neighborhood) %>%
addMarkers(~lng, ~lat, popup = ~neighborhood, data = points) %>%
addProviderTiles("CartoDB.Positron") %>%
setView(-73.98, 40.75, zoom = 13)
Finally, we can use other data to color our polygons. As an example, we’ll count the number of points in each polygon and map that to a color. The geo_join function makes it easy to join the spatial polygon data frame to a regular data frame. Here’s the plot with Leaflet.
points_by_neighborhood <- points %>%
group_by(neighborhood) %>%
summarize(num_points=n())
map_data <- geo_join(nyc_neighborhoods, points_by_neighborhood, "neighborhood", "neighborhood")
pal <- colorNumeric(palette = "RdBu",
domain = range(map_data@data$num_points, na.rm=T))
leaflet(map_data) %>%
addTiles() %>%
addPolygons(fillColor = ~pal(num_points), popup = ~neighborhood) %>%
addMarkers(~lng, ~lat, popup = ~neighborhood, data = points) %>%
addProviderTiles("CartoDB.Positron") %>%
setView(-73.98, 40.75, zoom = 13)
We can do the same thing with ggplot and ggmap for a different look/feel, but we need to construct the underlying data frame in a slightly different way. First we use broom to convert the spatial data frame to a tidy data frame by neighborhood, then we join the number of points by neighborhood to the tidy data frame, and plot.
plot_data <- tidy(nyc_neighborhoods, region="neighborhood") %>%
left_join(., points_by_neighborhood, by=c("id"="neighborhood")) %>%
filter(!is.na(num_points))
## Warning in left_join_impl(x, y, by$x, by$y): joining factor and character
## vector, coercing into character vector
manhattan_map <- get_map(location = c(lon = -74.00, lat = 40.77), maptype = "terrain", zoom = 12)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=40.77,-74&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
ggmap(manhattan_map) +
geom_polygon(data=plot_data, aes(x=long, y=lat, group=group, fill=num_points), alpha=0.75)