This is a short write-up to demonstrate one way to work with Zip Code Tabulation Areas (ZCTAs) in the tigris package under development by Bob Rudis and myself, which gives R users direct access to Census geographic data. It is in response to a Twitter question from Bryan Skelton that asked how to get a custom set of ZCTAs for a given area, like Memphis, TN in this instance.
One of the great things about tigris is that it gives R users access to (almost, we’re getting there) all of the Census TIGER/Line and cartographic shapefiles, which in turn facilitates analyses of how these geographies relate to one another.
ZCTAs are accessed in tigris with the zctas
function. Users can subset ZCTAs on load by supplying a character vector of prefixes to the starts_with
parameter; in turn, you can return those ZCTAs that begin with certain characters. ZCTAs in the Memphis area begin with 37
, 38
, or 72
, so I’ll ask tigris to limit my SpatialPolygonsDataFrame to these ZCTAs.
library(tigris)
library(sp)
library(rgeos)
library(leaflet)
# Get a SpatialPolygonsDataFrame of ZCTAs that start with 37, 38, or 72
df1 <- zctas(detailed = FALSE, starts_with = c("37", "38", "72"))
plot(df1)
That covers the area, but is way more extensive than is necessary. Certainly, if you had more information about the specific ZCTAs you need, you could pass this to the function and you’d be set. However, let’s say you don’t know exactly which ZCTAs are in the Memphis area. In this scenario, further subsetting will require additional data from tigris and overlay operations using the rgeos package. Let’s first find the boundary of the Memphis urbanized area. Note: this is different from the Memphis metropolitan area, which is county-based and accessible through the core_based_statistical_areas
function in tigris.
uas <- urban_areas(detailed = FALSE)
memphis_ua <- uas[grep("Memphis", uas$NAME10), ]
leaflet(memphis_ua) %>% addTiles() %>% addPolygons(fillOpacity = 0, color = "red")
Great - we’ve got the Memphis urbanized area. Now, let’s extract all those ZCTAs that intersect the urbanized area using the gIntersect
function from rgeos.
mem_zcta1 <- df1[as.vector(gIntersects(df1, memphis_ua, byid = TRUE)), ]
leaflet() %>%
addTiles() %>%
addPolygons(data = mem_zcta1) %>%
addPolygons(data = memphis_ua, fillOpacity = 0, color = "red")
The blue polygons represent the ZCTAs that intersect the Memphis urbanized area, and the red boundary represents the Memphis urbanized area itself. As we can see, there are several ZCTAs that touch the Memphis urbanized area boundary, but are mostly located outside the urbanized area. In turn, let’s retrieve only those ZCTAs that have their centroids in the urbanized area.
centroids <- gCentroid(df1, byid = TRUE)
mem_zcta2 <- df1[as.vector(gContains(memphis_ua, centroids, byid = TRUE)), ]
leaflet() %>%
addTiles() %>%
addPolygons(data = mem_zcta2) %>%
addPolygons(data = memphis_ua, fillOpacity = 0, color = "red")
This certainly subsets the data, but also leaves out some areas that are clearly in the Memphis urbanized area, as we can see from the OpenStreetMap tileset. These neighborhoods are part of larger ZCTAs that extend beyond the urbanized area and in turn have their centroids outside of that area.
Is there a “better” solution among the two, then? It all depends on what problem you want to solve - and it illustrates the importance of pairing technical know-how with domain expertise and knowledge of the Memphis area in this instance. This is especially true when it comes to geographic overlay, which has many, many different options.