Maps are a fun way to explore and learn spatial characteristics of data. With modern R packages, it is extremely easy to create and distribute these visualizations. One particular use of a map is to see how demographic/economic characteristics are spatially distributed across geographic regions. I was particularly interested in how median household income was spatially distributed across Northern Virginia, where I live. There is a wealth of demographic and economic data provided by the U.S. Census Bureau at very low geographies. My goal was to create a Leaflet map that allowed me to see the distribution via a color scale and to click on a Census Tract to see the numbers associated with that area. Below is a preview for you to try out before learning how to create the map.
I will go through the following steps to create my map:
Let’s get started by loading the necessary packages.
We will get our data from the United States Census Bureau API, in particular, the information rich American Community Survey (ACS). The American Community Survey is a massive household survey that collects demographic and housing information and produces very detailed population information. The dataset we will use is the ACS 2017 5-year dataset due to its richness. For more information you can go to the ACS website here.
To gain access to the ACS data, we will use the acs package to query and pull ACS statistics from the API. Before you can use the package, you must sign up for a key to use the API. Sign up for a Census API key here.
Once we have a key, we register the key with the acs package using the api.key.install package as below.
#Census API Key, I included mine so you can try it out
#but you can get one of your own at their website
api.key<-"You're API KEY HERE"
#Register Your API key with the package
api.key.install(key=api.key)Our next task will be to select the geographies we are interested in, for our purpose this will be Census Tracts. To find geographies, you can look google search for State/County FIPS codes, or you can use the geo.lookup function provided by the acs package. Below is an example of some of the counties/cities I am particularly interested in.
#First What Geographies do we want?
#Lets pull a few counties of interest from VA
geo.lookup(state="VA",county=c(13,59,510,600))## state state.name county county.name
## 1 51 Virginia NA <NA>
## 2 51 Virginia 13 Arlington County
## 3 51 Virginia 59 Fairfax County
## 4 51 Virginia 510 Alexandria city
## 5 51 Virginia 600 Fairfax city
Once I know what county FIPS I want, I can grab all Census Tracts within these counties/cities using the geo.make function with the list above and submitting the wildcard to the tract argument of the function. We will the returned object to the API to get information from the desired geographies.
#Create Geography Object you Want to Pull
my.geo <- geo.make(state="VA",county=c(13,59,510,600),tract="*")The ACS contains a dizzying number of tabulations. So much that it can be frustrating to find exactly what you want. I have found that the Census Reporter website provides the best interface for looking up what table numbers you want. You can search via keywords in the upper right hand corner of the website. Searching for Median Household Income yields that the table I am interested in is “B19013”. I also supply my geography object to only pull the areas I am interested in.
Furthermore, we want to prepare this data to be linked to our Spatial Polygon Dataframe. I create a GEOID variable to link to the spatial dataset. This variable is a concatenation of the state, county, and tract FIPS code. We also make a variable named “median_income” holding the estimate of median household income.
#Create an object that we will later merge to our spatial data
#NOTE: we need to pad the county FIPS code with 0's so it has width = 3
income_df<- data.frame( GEOID = paste0(income_data@geography$state,
str_pad(income_data@geography$county,
width=3,
side="left",
pad="0"),
income_data@geography$tract
),
median_income = as.numeric(income_data@estimate),
row.names=NULL)To get the shapes for our geographic areas, we will use the tigris package. The tigris package will connect to the Census Bureau’s website, this time however, we will use it to grab shape files containing the geographic boundaries and map information necessary for plotting. We will then merge the information we grabbed in the previous section to it.
We can grab the boundaries of interest using the tigris package’s tracts function. This function will take a state and a set of county FIPS codes as arguments and return a Spatial Polygons Data Frame with the shapes of all Census Tracts in the desired areas.
#download and create shapefile for Arlington, Fairfax and Alexandria
nova.shape <- tracts(state="VA",county=c("013","059","510","600") )We can preview our map simply using the plot function to view its contents
Now that we have our geography, we want to be able to explore the spatial distribution of our ACS data. To do this, we need to join our data to the Spatial Polygon Dataframe nova.shape. Contained within the Spatial Polygon Dataframe is a data.frame containing data about the polygons (tracts) we downloaded. We can preview this by using the @ sign to access this data.frame. We look at the head of the object below.
## STATEFP COUNTYFP TRACTCE GEOID NAME NAMELSAD
## 98 51 510 201801 51510201801 2018.01 Census Tract 2018.01
## 99 51 510 201204 51510201204 2012.04 Census Tract 2012.04
## 100 51 510 200102 51510200102 2001.02 Census Tract 2001.02
## 101 51 510 200105 51510200105 2001.05 Census Tract 2001.05
## 102 51 510 200201 51510200201 2002.01 Census Tract 2002.01
## 110 51 510 200104 51510200104 2001.04 Census Tract 2001.04
## MTFCC FUNCSTAT ALAND AWATER INTPTLAT INTPTLON
## 98 G5020 S 2803690 98412 +38.8240656 -077.0445733
## 99 G5020 S 486129 0 +38.8341132 -077.0559571
## 100 G5020 S 1121587 2390 +38.8336198 -077.1254615
## 101 G5020 S 188743 0 +38.8327451 -077.1134292
## 102 G5020 S 2178440 0 +38.8272503 -077.0986804
## 110 G5020 S 889826 13370 +38.8287066 -077.1212514
You can see that this data frame holds a lot of geographic information, the tigris dataset also gives us nuggets such as land and water area in the tract as well (“ALAND” and “AWATER” variables respectively). We want to link our income data to this data set. We can do this by left joining our income data by the “GEOID” variable.
While it’s nice to see the geographic boundaries, we want for the user to be able to explore the map. Leaflet maps are great for this. They are javascript based maps that allow the user to pan, zoom, and click on map entities for more information. Luckily for us, the leaflet package in R allows use to do this with ease.
To prepare for this, we need to make a color scale. I want dark colors to represent areas with higher median income, while lighting areas represent areas of lower median household income. The leaflet package provides us with the colorNumeric function that will create the gradient we want.
We also want to make the user to be able to click for more information. we do this by creating a snippet of HTML that will popup when the user clicks on an area for more information.
#Create "on-click" popup information for out map
# it simply will display the tract number and the
#median income
tract_popup <- paste0("<strong>Tract: </strong>",
nova.shape@data$TRACTCE,
"<br><strong>",
"Value",
": </strong>",
paste("$",format(nova.shape@data$median_income, big.mark=",",
trim=TRUE),sep=""))It’s finally the moment we have all been waiting for, it’s time to create out map!!!
#finally create map; view and zoom were set by trial and error
leaflet() %>% setView(lng=-77.2876,
lat=38.8041,
zoom = 10) %>%
#Add Map Tiles (the background map)
addProviderTiles("CartoDB.Positron") %>%
#Add our Census Tracts and color them
#using our palette
#Also inclue pop-ups we created above
addPolygons(data = nova.shape,weight=1,color="#000000",
fillColor=~pal((nova.shape@data$median_income)),
fillOpacity = .6,
popup=tract_popup) %>%
#finally include
addLegend("bottomleft",
pal=pal,
values =nova.shape@data$median_income,
title="Median Family Income",labFormat=labelFormat(prefix = "$"))We finally submit the above code and VOILA! we have the map below.
Not too bad right!