Crafting Just the Map You Need

Want to work with maps in R, but not sure how to get started? Maps are a great way to convey potentially dense information very quickly, and with a handful of tools you can create compelling maps in R to share with your stakeholders. In this post, we’ll cover the basics of where to find map files, how to work with them, and how to use R packages to create a map that shows geographic data.

Why Maps?

Maps are an idiom we’re all familiar with, so the consumer of your information can concentrate on what a map shows, rather than spending time figuring out a complex, novel data visualization. Maps are a fantastic way to convey information about things like your customers’ spending trends, the location of your users, or demographic information that might be useful to know before rolling out an outreach program. They’re especially useful for communicating data insights to non-data-science stakeholders who need to consume a lot of information quickly – stakeholders like lawmakers, members of the C-suite, and journalists.

Getting Started

Working with map files can seem daunting, especially for the beginner. Often, you can find map files (shapefiles, say, or geoJSON files) that could be useful, but don’t contain exactly the region you want to map.

Recently, for example, I was asked to help construct a local map for an organization that wanted to map stakeholder addresses to census tracts in the greater Philadelphia area and display that information in a map. It was easy for me to find ready-made map files for the county of Philadelphia, for the state of Pennsylvania, and for the state of New Jersey, but no ready-made maps for the 7-county area that makes up the Philadelphia commuter community. What to do? Combine and filter some map data to get the area I needed!

Here were my steps:

Get a shapefile of census tracts for Pennsylvania, and filter on just the counties I wanted (Philadelphia, Bucks, Delaware, and Montgomery)
Get a shapefile of census tracts for New Jersey, and filter on just the counties I wanted (Burlington, Camden, and Gloucester)
Combine those two shapefiles
Map the polygons of the census tracts
Map the points of the addresses.

Want to follow along? I’ve included the complete code that will allow you to reproduce what I did for my client.

Get Census Bureau Shapefiles

The Census Bureau has a number of “shapefiles” – a commonly used file type that includes not only geographic shapes (lines, points, and polygons), but supplementary data, such as the id of the state, county, and census tract. It can take a bit of digging to find the right file, but the Census Bureau is dedicated to the dissemination of this kind of data, and has a free API that you might want to check out.

Here, I’m going to download and unzip shapefile data for Pennsylvania:

download.file("https://www2.census.gov/geo/tiger/TIGER2017/TRACT/tl_2017_42_tract.zip", "tl_2017_42_tract.zip")
unzip("tl_2017_42_tract.zip", exdir = "tl_2017_42_tract")

We want to bring in the geographic data using the rgdal library and limit our selection to a few counties. County codes can be found at https://www2.census.gov/geo/docs/reference/codes/files/st42_pa_cou.txt

Philadelphia (101)
Bucks (017)
Montgomery (091)
Delaware (045)

library(rgdal)
pa <- readOGR(dsn = "tl_2017_42_tract", verbose = FALSE)
pa_counties <- pa[which(pa$COUNTYFP %in% c("017","045","091", "101")),]

Our pa_counties data is complex, but the most important part for us is in the @data portion. Let’s peek!

head(pa_counties@data)

	STATEFP	COUNTYFP	TRACTCE	GEOID	NAME	NAMELSAD	MTFCC	FUNCSTAT	ALAND	AWATER	INTPTLAT	INTPTLON
0	42	017	100102	42017100102	1001.02	Census Tract 1001.02	G5020	S	7608630	2509728	+40.0745595	-074.9405810
1	42	017	100103	42017100103	1001.03	Census Tract 1001.03	G5020	S	1572548	9817	+40.0687610	-074.9713318
2	42	091	201411	42091201411	2014.11	Census Tract 2014.11	G5020	S	2409072	35484	+40.1444849	-075.2140269
3	42	017	100104	42017100104	1001.04	Census Tract 1001.04	G5020	S	1966895	12199	+40.0918428	-074.9501748
4	42	017	100105	42017100105	1001.05	Census Tract 1001.05	G5020	S	2663937	0	+40.0821350	-074.9478227
5	42	017	100206	42017100206	1002.06	Census Tract 1002.06	G5020	S	4735309	0	+40.1347795	-074.9441709

You may have already noticed, but the state, county, and tract codes (the first few columns) are combined into the GEOID column, which uniquely identifies a single census tract from among the over 11 million census tracts in the United States. The reason why we want to use census tracts is that the Census Bureau has lots of information at the tract level – mean and median household income, the racial and age makeup of the tract, the level of unemployment, the number of renters versus homeowners, and more! It’s a great way for us to get some level of insight into what our customers, users, patients, or other people we’re mapping are like. Even if we don’t have specific information about them, we know what their neighborhood is like in aggregate.

We’re going to do the same thing for New Jersey – download shapefile data, unzip it, open the shapefile and limit our selection to a few counties.

Codes from https://www2.census.gov/geo/docs/reference/codes/files/st34_nj_cou.txt

Gloucester (015)
Burlington (005)
Camden (007)

download.file("https://www2.census.gov/geo/tiger/TIGER2017/TRACT/tl_2017_34_tract.zip", "tl_2017_34_tract.zip")
unzip("tl_2017_34_tract.zip", exdir = "tl_2017_34_tract")
nj <- readOGR(dsn="tl_2017_34_tract", verbose = FALSE)
nj_counties <- nj[which(nj$COUNTYFP %in% c("005","007","015")),]

The nj_counties geographic data is similar to pa_counties:

head(nj_counties@data)

	STATEFP	COUNTYFP	TRACTCE	GEOID	NAME	NAMELSAD	MTFCC	FUNCSTAT	ALAND	AWATER	INTPTLAT	INTPTLON
189	34	007	600400	34007600400	6004	Census Tract 6004	G5020	S	862459	0	+39.9329219	-075.1180881
191	34	007	600700	34007600700	6007	Census Tract 6007	G5020	S	629533	537888	+39.9560193	-075.1255342
192	34	007	600800	34007600800	6008	Census Tract 6008	G5020	S	1109780	260022	+39.9513992	-075.1138309
194	34	007	608303	34007608303	6083.03	Census Tract 6083.03	G5020	S	3780973	22719	+39.8120904	-075.0725353
252	34	007	608001	34007608001	6080.01	Census Tract 6080.01	G5020	S	1799724	0	+39.8319418	-075.0078629
253	34	007	609000	34007609000	6090	Census Tract 6090	G5020	S	4442823	4000	+39.7312887	-074.8799265

Create the Geographic Area We Care About

We can simply stack our two sets of geographic data, since they have the same fields, using rbind, to get a combined dataset that covers the greater Philadelphia area, including neighboring counties:

philly_metro <- rbind(nj_counties, pa_counties)

Sample Data

Let’s create some fabricated latitude and longitude data to represent a handful of geographic points. These could be store locations, customer addresses, crime locations, or any other geographic points of interest. You can easily “geocode” (translate to latitude and longitude) any street addresses you have using a Google Maps API (some cost associated once you reach 40,000 geocodes) or a Census API (free). For our purposes, let’s assume we’ve already done that and have our latitude and longitude coordinates.

point_locations <- data.frame(lat = c(39.9931, 39.7103, 39.8592, 39.9526, 40.4108, 39.9078, 40.2290),
                              lon=c(-74.7879, -75.1078, -75.0144, -75.1652, -75.2479, -75.3879, -75.3879))

Map that Jawn*

* In Philadelphia, ‘jawn’ is a multipurpose term meaning ‘thing’, ‘place’, ‘situation’, etc. Since we’re talking about a map of Philly, it seems fitting to slip in a bit of non-technical jargon!

You can choose to make a static map using ggplot2, if you’re a dedicated user of ggplot, and that’s often what I do. This map is somewhat large, however, and if we were using real data instead of fabricated data, there could be hundreds or even thousands of points that we want to map on top of our census tracts. So instead of a static map, I chose here to use a dynamic map that users can zoom in or out on, hover over to get more details, and so on. leaflet is a very easy to use package for a use case like this.

I’m going to have two layers to my map:

a polygon layer, where I map my census tracts, with helpful labels (see addPolygons), and
a point layer, where I put markers for each point I plotted, with helpful labels (see addMarkers)

Before I map those layers, I set the view of my map to be centered around the middle of my points, and I fiddle with the zoom level to get the right initial zoom into the map (see setView).

In a real project, I would merge in other data into philly_metro@data, such as socioeconomic or demographic data about the tract, and include those data in the labels, and maybe even color-code my census tract fill color to reflect some numeric value in the data.

library(leaflet)

leaflet(philly_metro) %>%
  setView(lng = mean(as.numeric(point_locations$lon), na.rm=TRUE), 
          lat = mean(as.numeric(point_locations$lat), na.rm=TRUE), zoom = 9) %>%
  addPolygons(fillColor = "white",
              fillOpacity = 1,
              weight = 1,
              opacity = 0.5,
              color = "black",
              label = paste(philly_metro$NAMELSAD, 
                            ", GEOID ", 
                            philly_metro$GEOID, 
                            sep=""),
              labelOptions = labelOptions(
                style = list("font-weight" = "normal", 
                             padding = "3px 8px"),
                textsize = "13px",
                direction = "auto")) %>%
  addMarkers(lng = point_locations$lon, 
             lat=point_locations$lat, 
             label = paste("Client at",
                           point_locations$lat, 
                           ",",
                           point_locations$lon))

As you can see, this map would be great to show in a presentation or include on a website, or even take a quick screenshot of if you were in a hurry and didn’t have time to do a static map. With just a pair of tools (rgdal and leaflet), we’re able to create a simple map that can provide mouseover events, lots of customization of look and feel, and a clean, easy to navigate final product.

Want to learn more? I’ll be at ODSC Boston in a few short weeks to lead a workshop in working with geographic data in R. I hope to see you there!

Crafting Just the Map You Need

Joy Payton

2/8/2019

Crafting Just the Map You Need

Why Maps?

Getting Started

Get Census Bureau Shapefiles

Create the Geographic Area We Care About

Sample Data

Map that Jawn*