I’m currently working on some vignettes to showcase how my new R package for working with Census shapefiles, tigris, can fit within common geospatial workflows in R, in preparation for a CRAN submission. A very common use case for Census data is visualization: showing how some attribute of interest varies geographically.

For this example I thought I’d prepare an interactive map of legislative districts for the Texas State House of Representatives (the lower house), shaded by the party of the representative, with a pop-up that shows a photo of the representative and links to the representative’s profile on the House website. I assumed this sort of thing already existed, but some basic Googling didn’t turn it up, so it doesn’t appear to be readily accessible.

This isn’t the most complicated map in the world, but it still might take some time to put together. Here’s one potential workflow:

  1. Find a legislative districts shapefile for Texas, unzip, and load into a GIS;
  2. Find a file on the web for Texas House members, clean it up in Excel, and load into a GIS;
  3. Join the House members data to the shape data in the GIS;
  4. Publish to a server, or export out as a zipped shapefile or GeoJSON for uploading to a hosting service;
  5. Build the web map with your favorite API (Leaflet, ArcGIS API for JS, D3, etc.).

This is certainly doable, but it takes some time and requires multiple pieces of software. By contrast, the workflow I’ll demonstrate can be completed entirely in RStudio.

To get started, I’ll load the required packages; my tigris package is not yet on CRAN, but it can be installed with devtools::install_github('walkerke/tigris'). Please note that this package has a hard dependency on rgdal.
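For reference, the installation step looks like this (a one-time setup, assuming the devtools package is already available):

# tigris is not yet on CRAN, so install the development version from GitHub
# install.packages("devtools")  # uncomment if devtools is not yet installed
devtools::install_github('walkerke/tigris')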

I’ll also need data on Texas House members. While it would be nice if this information lived in a clean CSV file somewhere on the web, that often isn’t the case. As such, I’ll find the best-formatted HTML table of the data, which in this case is available from the Texas Tribune (which, IMO, is the best source of data journalism in Texas), and scrape the table with the rvest package.

To do this, I specify the XPath of the HTML table, which I can obtain by right-clicking the table in Chrome, selecting “Inspect element”, then right-clicking the element corresponding to the table in the Element Inspector and choosing “Copy XPath”. For more information on how to do this, I recommend reading Cory Nissen’s post on the topic.

Now, I can get the table as an R data frame with the following code.

library(tigris)
library(rvest)
library(leaflet)
library(dplyr)
library(stringr)

url <- "https://www.texastribune.org/directory/"

xpath <- '//*[@id="texas_house_tab"]/div/div/table'

df <- url %>%
  read_html(encoding = "UTF-8") %>% # Read and parse the page
  html_nodes(xpath = xpath) %>% # Select the members table by its XPath
  html_table(fill = TRUE) %>% # Parse the HTML table
  data.frame() # Flatten the one-element list of tables into a data frame

df$id <- as.character(df$District) # String of House district for merging purposes

head(df)
##                       Name District          City Party  id
## 1           Allen, Alma A.      131       Houston     D 131
## 2       Alonzo, Roberto R.      104        Dallas     D 104
## 3          Alvarado, Carol      145       Houston     D 145
## 4           Anchia, Rafael      103        Dallas     D 103
## 5 Anderson, Charles\n"Doc"       56          Waco     R  56
## 6         Anderson, Rodney      105 Grand Prairie     R 105

I’ve got the data on House members; however, I still need a way to get photos of the legislators. Fortunately, some acrobatics with rvest can get this done. While photos of the legislators are available from the Texas House website, they are not named in any consistent format. However, the site does have profiles of all 150 representatives with the same HTML layout, a consistent URL scheme, and the same XPath to the representatives’ photos. As such, I can set up a for loop to walk through all the representatives’ profiles, extract the names of their photos, and then merge this information into my existing data frame.

img_path <- '//*[@id="wrapper"]/div[3]/img'

img_list <- list()

for (district in 1:150) {
  u1 <- paste0('http://www.house.state.tx.us/members/member-page/?district=', as.character(district))
  h1 <- read_html(u1) # Read and parse the representative's profile page
  nodes <- html_nodes(h1, xpath = img_path)
  src <- html_attr(nodes, "src") # Get the src attribute associated with the supplied XPath
  jpg <- str_sub(src, start = -8L) # All images are named "XXXX.jpg", so I extract the last 8 characters
  id <- as.character(district)
  pair <- data.frame(id, jpg, stringsAsFactors = FALSE)
  img_list[[district]] <- pair
}

img_df <- bind_rows(img_list)

names(img_df) <- c("id", "jpg")

df <- left_join(df, img_df, by = "id")
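As a quick sanity check (an illustrative step, not strictly required), I can confirm that every district matched a photo filename:

# Count districts without a matched photo filename; 0 means every profile was scraped successfully
sum(is.na(df$jpg))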

Now, I need a geographic dataset to which I can merge the data. This is where the tigris package comes in. The package doesn’t support all Census geographies yet, but it does give access to quite a few, and it reduces the process of loading Census geography into R to one line of code. First, I need to figure out the FIPS code for Texas, which can be accomplished with tigris’s lookup_code function:

lookup_code("Texas")
## [1] "The code for Texas is '48'."

The FIPS code is 48, so I can plug that into the state_legislative_districts function to get House districts. I supply the argument "lower" to the house parameter to specify that I want districts for the House, not the Senate, and I set detailed to FALSE to request a simplified dataset. I can then merge the data on House members to my downloaded geographic dataset with tigris’s geo_join function, a wrapper around match that helps merge regular data frames to spatial data frames.

districts <- state_legislative_districts("48", house = "lower", detailed = FALSE)

txlege <- geo_join(districts, df, "NAME", "id")
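To make the match-based logic concrete, here is a rough sketch of what such a join does conceptually (illustrative only, not tigris’s exact implementation; it assumes districts is an sp SpatialPolygonsDataFrame with its attribute table in the @data slot):

# Reorder the attribute rows so they line up with the polygons, then bind the columns
matched <- df[match(districts@data$NAME, df$id), ]
txlege_manual <- districts
txlege_manual@data <- data.frame(districts@data, matched)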

My data are now ready for interactive mapping with the leaflet package. I’ve written elsewhere (check my RPubs) about how to create Leaflet maps in R, and I highly encourage you to check out the package documentation, which is quite helpful. Note the rep_url variable, which builds a custom URL for each representative’s Texas House page that is included in the district’s pop-up, as well as the way I use the jpg column to show the photo in the pop-up.

pal <- colorFactor(c("blue", "red"), txlege$Party) # Blue for Democrats, red for Republicans

rep_url <- paste0('http://www.house.state.tx.us/members/member-page/?district=', txlege$id)

popup <- paste(sep = "<br/>", 
               paste0("<img src='http://www.house.state.tx.us/photos/members/", txlege$jpg, "' />"), 
               paste0("<b>Representative: </b>", txlege$Name), 
               paste0("<b>District: </b>", txlege$id), 
               paste0("<a href='", rep_url, "'>Link to website</a>"))

leaflet(txlege) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(fillColor = ~pal(Party), 
              color = "black", 
              popup = popup, 
              fillOpacity = 0.5, 
              weight = 0.5) %>%
  addLegend(position = "bottomleft",
            colors = c("blue", "red"), 
            labels = c("Democrat", "Republican"), 
            opacity = 1, 
            title = "Texas House of Representatives")

I’ve now accomplished entirely in R what was previously a multi-step, multi-software workflow. The map itself shows expected patterns – urban areas and the border regions supporting Democrats, suburbs and rural areas supporting Republicans – though I was a bit surprised by how clear the geographic distinctions are. Of course, legislative districts commonly have peculiar boundaries, as in my city of Fort Worth. For example, District 90 (shown in the image) stretches into west Fort Worth to include Como, a historically African-American neighborhood surrounded by areas with relatively small African-American populations. These decisions around boundary-making are often intensely political; just read this transcript of discussions around redistricting and the Como neighborhood from 2011 and search for “Como”.

[Image: District 90, stretching into west Fort Worth to include the Como neighborhood]

While the rvest and leaflet packages do the heavy lifting here, my tigris package plays a key role and fits in nicely with this particular R visualization workflow. Further, this example just scratches the surface, as it could be wrapped in a full-fledged web application using Shiny with many more features; a minimal sketch of that idea follows.
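Here is a bare-bones sketch of a Shiny wrapper around the map (hypothetical; it assumes the txlege, pal, and popup objects from above are in scope, and a real application would add inputs and reactivity):

library(shiny)
library(leaflet)

# Minimal UI: a title and a single Leaflet map
ui <- fluidPage(
  titlePanel("Texas House of Representatives"),
  leafletOutput("txmap", height = 600)
)

server <- function(input, output, session) {
  output$txmap <- renderLeaflet({
    leaflet(txlege) %>%
      addProviderTiles("CartoDB.Positron") %>%
      addPolygons(fillColor = ~pal(Party),
                  color = "black",
                  popup = popup,
                  fillOpacity = 0.5,
                  weight = 0.5)
  })
}

shinyApp(ui, server)

If you have any comments or questions, please let me know!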

Many thanks to: