This is a general overview of some useful commands for plotting both static and interactive NYC maps, including shapefiles for Census tracts and neighborhood boundaries, and spatial joins for mapping points into regions. It’s adapted from a great tutorial on using the tigris package for easy access to shapefiles and some other work on maps with ggplot2.

You’ll need to install a number of packages (listed below) to get things working. See the tigris README for tips. Mac users may need to reinstall the sp package to fix broken dependencies.

Load Packages

library(tigris)
library(dplyr)
library(leaflet)
library(sp)
library(ggmap)
library(maptools)
library(broom)
library(httr)
library(rgdal)

Basic maps

There are many different ways to plot maps in R. Here are two of the easiest.

ggmap

Make basic maps with ggmap by specifying a location (as a string or by lat/long) and zoom level. The geocode function is used to convert a string to lat/long behind the scenes and tiles are pulled from Google maps.

nyc_map <- get_map(location = c(lon = -74.00, lat = 40.71), maptype = "terrain", zoom = 11)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=40.71,-74&zoom=11&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
ggmap(nyc_map)

leaflet

Or make interactive maps using Leaflet for R in Rstudio.

leaflet() %>%
  addTiles() %>%
  setView(-74.00, 40.71, zoom = 12)

Leaflet offers a variety of provider tiles for different look/feel.

leaflet() %>%
  addTiles() %>%
  setView(-74.00, 40.71, zoom = 12) %>%
  addProviderTiles("CartoDB.Positron")

Shapefiles

Census tracts from tigris

The tigris package makes it easy to grab Census shapefiles at various levels, from tracts and zipcodes all the way up to national borders. First we look up the appropriate codes.

lookup_code("New York", "New York")
## [1] "The code for New York is '36' and the code for New York County is '061'."
#lookup_code("New York", "Kings")
#lookup_code("New York", "Queens")
#lookup_code("New York", "Bronx")
#lookup_code("New York", "Richmond?")

Then we download the data and take a look at it. The summary below shows the data associated with each polygon (e.g., the GEOID and NAME). We can access data columns via nyc_tracts@data.

nyc_tracts <- tracts(state = '36', county = c('061','047','081','005','085'))

summary(nyc_tracts)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x -74.25909 -73.70001
## y  40.47740  40.91758
## Is projected: FALSE 
## proj4string :
## [+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0]
## Data attributes:
##    STATEFP            COUNTYFP           TRACTCE         
##  Length:2167        Length:2167        Length:2167       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##     GEOID               NAME             NAMELSAD        
##  Length:2167        Length:2167        Length:2167       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##     MTFCC             FUNCSTAT             ALAND         
##  Length:2167        Length:2167        Min.   :       0  
##  Class :character   Class :character   1st Qu.:  156383  
##  Mode  :character   Mode  :character   Median :  184733  
##                                        Mean   :  360310  
##                                        3rd Qu.:  291132  
##                                        Max.   :18319107  
##      AWATER            INTPTLAT           INTPTLON        
##  Min.   :        0   Length:2167        Length:2167       
##  1st Qu.:        0   Class :character   Class :character  
##  Median :        0   Mode  :character   Mode  :character  
##  Mean   :   199280                                        
##  3rd Qu.:        0                                        
##  Max.   :121773132

Base R shapefile plots

You can take a quick look at these with the plot command from base R. You can visualize them with ggplot and leaflet as well, but we’ll hold off on that for a moment.

plot(nyc_tracts)

Neighborhoods

Census tracts are great, but they might be too detailed for many purposes, where neighborhoods might be a more familiar unit of analysis. The PUMA Census neighborhoods for New York City lump together a few distinct areas (e.g., Chelsea, Flatiron, and the West Village), but Pediacities has more fine-grained neighborhood shapefiles available as a GeoJSON file.

r <- GET('http://data.beta.nyc//dataset/0ff93d2d-90ba-457c-9f7e-39e47bf2ac5f/resource/35dd04fb-81b3-479b-a074-a27a37888ce7/download/d085e2f8d0b54d4590b1e7d1f35594c1pediacitiesnycneighborhoods.geojson')
nyc_neighborhoods <- readOGR(content(r,'text'), 'OGRGeoJSON', verbose = F)
## No encoding supplied: defaulting to UTF-8.
summary(nyc_neighborhoods)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x -74.25559 -73.70001
## y  40.49613  40.91553
## Is projected: FALSE 
## proj4string :
## [+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0]
## Data attributes:
##           neighborhood boroughCode          borough  
##  Jamaica Bay    : 16   1:38        Bronx        :75  
##  Pelham Islands : 14   2:74        Brooklyn     :71  
##  Broad Channel  : 10   3:67        Manhattan    :37  
##  Marine Park    :  3   4:77        Queens       :73  
##  Pelham Bay Park:  3   5:54        Staten Island:54  
##  Bayswater      :  2                                 
##  (Other)        :262                                 
##                                                                X.id    
##  http://nyc.pediacities.com/Resource/Neighborhood/Jamaica_Bay    : 16  
##  http://nyc.pediacities.com/Resource/Neighborhood/Pelham_Islands : 14  
##  http://nyc.pediacities.com/Resource/Neighborhood/Broad_Channel  : 10  
##  http://nyc.pediacities.com/Resource/Neighborhood/Marine_Park    :  3  
##  http://nyc.pediacities.com/Resource/Neighborhood/Pelham_Bay_Park:  3  
##  http://nyc.pediacities.com/Resource/Neighborhood/Bayswater      :  2  
##  (Other)                                                         :262

ggplot

ggplot can plot shapefiles but requires them to be converted from a spatial polygon data frame format to plain old data frames first. broom handles this with its tidy command. (fortify from the sp package also works.)

nyc_neighborhoods_df <- tidy(nyc_neighborhoods)
## Regions defined for each Polygons
ggplot() + 
  geom_polygon(data=nyc_neighborhoods_df, aes(x=long, y=lat, group=group))

ggmap

We can clean this up a bit and add tiles from Google maps with ggmap.

ggmap(nyc_map) + 
  geom_polygon(data=nyc_neighborhoods_df, aes(x=long, y=lat, group=group), color="blue", fill=NA)

leaflet

Or we can do an interactive version which shows neighborhood names when you click on a polygon.

leaflet(nyc_neighborhoods) %>%
  addTiles() %>% 
  addPolygons(popup = ~neighborhood) %>%
  addProviderTiles("CartoDB.Positron")

Spatial joins

There are also convenient functions for mapping lat/long pairs into polygons. This is useful if you want to know what neighborhood a certain point is in, for instance.

To see how this works, let’s first generate 10 random points in Manhattan.

set.seed(42)
lats <- 40.7544882 + rnorm(10)/100
lngs <- -73.9879923 + rnorm(10)/200
points <- data.frame(lat=lats, lng=lngs)
points
##         lat       lng
## 1  40.76820 -73.98147
## 2  40.74884 -73.97656
## 3  40.75812 -73.99494
## 4  40.76082 -73.98939
## 5  40.75853 -73.98866
## 6  40.75343 -73.98481
## 7  40.76960 -73.98941
## 8  40.75354 -74.00127
## 9  40.77467 -74.00019
## 10 40.75386 -73.98139

In order to do the join, we first need to convert the points data frame to a spatial data frame. The coordinates function specifies which columns should be used for positioning, and the proj4string function specifies what type of projection should be used. In this case we just want a projection that’s consistent with the neighborhood shapes, so we copy it over. Finally, we use the matches function to do the spatial join and bind the columns back together.

points_spdf <- points
coordinates(points_spdf) <- ~lng + lat
proj4string(points_spdf) <- proj4string(nyc_neighborhoods)
matches <- over(points_spdf, nyc_neighborhoods)
points <- cbind(points, matches)
points
##         lat       lng     neighborhood boroughCode   borough
## 1  40.76820 -73.98147     Central Park           1 Manhattan
## 2  40.74884 -73.97656      Murray Hill           1 Manhattan
## 3  40.75812 -73.99494   Hell's Kitchen           1 Manhattan
## 4  40.76082 -73.98939   Hell's Kitchen           1 Manhattan
## 5  40.75853 -73.98866 Theater District           1 Manhattan
## 6  40.75343 -73.98481          Midtown           1 Manhattan
## 7  40.76960 -73.98941   Hell's Kitchen           1 Manhattan
## 8  40.75354 -74.00127          Chelsea           1 Manhattan
## 9  40.77467 -74.00019             <NA>        <NA>      <NA>
## 10 40.75386 -73.98139          Midtown           1 Manhattan
##                                                                 X.id
## 1      http://nyc.pediacities.com/Resource/Neighborhood/Central_Park
## 2       http://nyc.pediacities.com/Resource/Neighborhood/Murray_Hill
## 3    http://nyc.pediacities.com/Resource/Neighborhood/Hell_s_Kitchen
## 4    http://nyc.pediacities.com/Resource/Neighborhood/Hell_s_Kitchen
## 5  http://nyc.pediacities.com/Resource/Neighborhood/Theater_District
## 6           http://nyc.pediacities.com/Resource/Neighborhood/Midtown
## 7    http://nyc.pediacities.com/Resource/Neighborhood/Hell_s_Kitchen
## 8           http://nyc.pediacities.com/Resource/Neighborhood/Chelsea
## 9                                                               <NA>
## 10          http://nyc.pediacities.com/Resource/Neighborhood/Midtown

We can verify the matches by poking around on an interactive map and clicking the polygons and markers to see that proper neighborhoods have been assigned.

leaflet(nyc_neighborhoods) %>%
  addTiles() %>% 
  addPolygons(popup = ~neighborhood) %>% 
  addMarkers(~lng, ~lat, popup = ~neighborhood, data = points) %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(-73.98, 40.75, zoom = 13)

Finally, we can use other data to color our polygons. As an example, we’ll count the number of points in each polygon and map that to a color. The geo_join function makes it easy to join the spatial polygon data frame to a regular data frame. Here’s the plot with Leaflet.

points_by_neighborhood <- points %>%
  group_by(neighborhood) %>%
  summarize(num_points=n())

map_data <- geo_join(nyc_neighborhoods, points_by_neighborhood, "neighborhood", "neighborhood")

pal <- colorNumeric(palette = "RdBu",
                    domain = range(map_data@data$num_points, na.rm=T))

leaflet(map_data) %>%
  addTiles() %>% 
  addPolygons(fillColor = ~pal(num_points), popup = ~neighborhood) %>% 
  addMarkers(~lng, ~lat, popup = ~neighborhood, data = points) %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(-73.98, 40.75, zoom = 13)

We can do the same thing with ggplot and ggmap for a different look/feel, but we need to construct the underlying data frame in a slightly different way. First we use broom to convert the spatial data frame to a tidy data frame by neighborhood, then we join the number of points by neighborhood to the tidy data frame, and plot.

plot_data <- tidy(nyc_neighborhoods, region="neighborhood") %>%
  left_join(., points_by_neighborhood, by=c("id"="neighborhood")) %>%
  filter(!is.na(num_points))
## Warning in left_join_impl(x, y, by$x, by$y): joining factor and character
## vector, coercing into character vector
manhattan_map <- get_map(location = c(lon = -74.00, lat = 40.77), maptype = "terrain", zoom = 12)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=40.77,-74&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
ggmap(manhattan_map) + 
  geom_polygon(data=plot_data, aes(x=long, y=lat, group=group, fill=num_points), alpha=0.75)