Map generation overview

There are three part to creating the map:

  1. Fetch the needed Census shapefiles (from Tigris)
  2. Fetch and process the ACS data
  3. Link the ACS data to the shapefile
  4. Use Leaflet to generate interactive maps.

Make sure that you have the following packages installed: tigris, acs, leaflet, and mapview.

##Load Packages First load the packages using the library function.

library(tigris)
library(acs)
library(leaflet)
library(mapview)
library(ggplot2)
library(maptools)
library(stringr)
library(rgdal)

Fetch a spatial shapefile containing the needed geography

Census data is provided at different geographic scales. Here is a good explanation of them. https://guides.lib.umich.edu/c.php?g=283004&p=1885629#:~:text=U.S.%20Census%20and%20Demographic%20Information,-A%20general%20guide&text=The%20Census%20uses%20its%20own,%2C%22%20which%20divides%20up%20counties.&text=For%20a%20full%20explanation%20of,its%20description%20on%20American%20Factfinder.

For our analysis, We want to limit the map to only include the geography needed in order to make it more manageable. We will limit by the relevant counties in the Boston area.

Now we limit to only the counties around Boston. The counties we will include are Suffolk (25), Norfolk (21), and Middlesex (17), and Essex (09). A list of the counties by state can be found here. https://en.wikipedia.org/wiki/List_of_United_States_FIPS_codes_by_county

We want the shapefile at the census tract level, so we use the tigris command tracts.

countylist <- c('17','21','25', '09')
shapefile <- tracts(state='MA', county=countylist, year=2018, class='sp')
## Warning in proj4string(obj): CRS object has comment, which is lost in output

Fetching ACS Data

Now that we have the shapefile for the census tracts, we need some data to associate with these tracts.

Much like the advanced search in Fact Finder (http://factfinder.census.gov), we need to determine a handful of things before we can fetch the data.

Choose the Census table of interest

https://censusreporter.org/

We want to map the percent of workers who commute by public transit. At censusreporter.org, scroll down, and under Topics click on Commute. Scroll down to the Means of Transportation to Work section and then look for B08301: Means of Transportation to Work and click on it. Enter cambridge,ma into the place box to show results for the city of Cambridge as a whole. Here (for 1-year data from 2018), 25.6% of Cambridge residents commute by public transit. We will download more granular data, at the census tract level.

Toward the right, click on “Switch to Totals.” Unfortunately, the imported data only comes with the totals, we will need to calculate the percentages manually.

We fetch the table of interest and put the results into the variable called mytable.

Choose the span

5 year estimates have the lowest margin of error, so we will specify that by putting the number 5 into the variable myspan for use later.

Choose the end year

We will use the most recent data available from the API, which is 2018, so we put 2018 into the variable myendyear for use later.

Limit the geography for the data download

Now we set the geography that we want to minimize the amount of data we want to download. For this, we will use the geo.make function. We want to pull all the census tracts in the counties we are interested in. The state code for Massachusetts is 25. Earlier, we defined a variable countylist containing the three counties of interest. Small glitch: the geo.make function wants that list as numbers and not text versions of the numbers, so we need to convert it. We want data for all the census tracts in these countires, so we use the wildcard "*" to denote ‘select all’.

# Span
myspan <- 5

# End year
myendyear <- 2018

# Which table
mytable <- acs.lookup(endyear=2018, table.number="B08301")
## Warning in acs.lookup(endyear = 2018, table.number = "B08301"): acs.lookup for endyear>2015: using 2015 variable codes to access 2018 data.
##   (See ?acs.lookup for details)
## Warning in acs.lookup(endyear = 2018, table.number = "B08301"): temporarily downloading and using archived XML variable lookup files;
##   since this is *much* slower, recommend running
##   acs.tables.install()
# Limit the geography
# Convert the list of counties that is in text form to numeric form
countylist2 <- as.numeric(countylist)
# Define the geography to limit the data import
mygeo <- geo.make(state=25, county=countylist2, tract="*")

** Choose the data columns of interest from the table **

Census tables usually have many “columns” of data.

We want column #1 which gives the absolute total and also column #10 which gives the total commute by Public transportation. Later we will calculate the percent. We indicate that we want those two columns by putting them into the variable myvars.

myvars <- mytable[1]+mytable[10] 

Putting it all together and fetch the data

Now we have all the parts to put together the fetch function and put the results into the variable mydata. The line below will always be the same. What changes is the definition of the variables above.

mydata <- acs.fetch(endyear=myendyear, span=myspan, geography=mygeo, variable=myvars)

Cleaning the Data

There are a few cleaning items It’s a lot easier in R than in excel.

First, we need to generate a standardized GEOID for the ACS data so it can be matched to the shapefile which already has a standardized GEOID. To do that, we need to paste together the state code, county code, and tract code. The paste0 function pastes things together (note: paste puts spaces between the things it pastes, while paste0 does not.) Everything you want pasted together goes inside the paste0 function separated by commas (e.g., paste0("27", "A", "0", "99") will return "27A099".)

The function as.character ensures that the state, county, and tract numbers are first turned into characters before concatenating them together

Small problem. ACS does not pad the county numbers with preceeding zeros, so we need to add those extra zero so that the GEOID code we generate here matches the GEOID codes in the shapefile. We use the str_pad function from the stringer package.

mystate <- as.character(mydata@geography$state)

mycounty <- as.character(mydata@geography$county)

# add leading zeros to make the total number of characters 3
mycountypadded <- str_pad(mycounty, 3, pad="0")

mytract <- as.character(mydata@geography$tract)

acsgeoid <- paste0(mystate,mycountypadded,mytract)

Now you are ready to put the fetched data into a usable format for R. The best format is a dataframe object which is in essence a spreadsheet with columns (variables) and rows (data for each census tract).

# Put the dataset into a dataframe format
mydatadf <- data.frame(acsgeoid, mydata@estimate)

Note that the columns are ones with bold headers, that first item that has a long text description of the census tract is actually not a column, but the row number– it would be like changing the actual row number in excel. A little confusing I realize.

So the first column, acsgeoid is the the identifier for the census tract, the second column is the total commuters from that census tract, and the third is the number of transit commuters.

Let’s rename the columns to make them more user-friendly. We choose GEOID specifically because we are going to want to use this column to create a link to the shapefile.

# Rename column names
colnames(mydatadf)=c("GEOID", "total", "transit")

Calculating the percent

The mydatadf dataframe now has a column for the total commuters and a column for the transit commuters. We need to calculate the percent, and make an additional column for that. We will call it “transitpercent”. We multiply the transit percent by 100 to turn it from a fraction into the percentage.

In R, using a $ after a dataframe variable indicates what column you are referring to. In this case, there isn’t a column that already exists named transitpercent, so R knows to create a new one. R knows to do this operation for each row in the dataframe. In Excel, it would be like using a function in a new column and then copying that function for all rows.

# Create new column and calculate the percent commute trips by transit
mydatadf$transitpercent <- mydatadf$transit/mydatadf$total*100

Join the data

You need to join the ACS data with the shapefile. The geo_join function (from the ACS package) merges the shapefile (variable: shapefile) with the fetched data (variable: mydatadf). In other words, we want the value of percent commute trips by transit for each census tract to be associated with the respective shape. The merged data is put into the variable mydatamerged.

We need to tell the function which columns are to be used to match on. In this case we named them the same thing to make it more intuitive.

# Merge the ACS data with the shapefile
mydatamerged <- geo_join(shapefile, mydatadf, "GEOID", "GEOID")

# There is some weirdness with some tracts showing up as 100%, so eliminate those by subsetting the results to excluse them.
mydatamerged <- subset(mydatamerged, mydatamerged@data$transitpercent<100)

Creating the map

Popup This defines how a popup window will appear when you click on a census tract on the map. We will have the popup display the GEOID and the percent commute by transit. The <br> adds an extra carriage return to put the label on the second line of the popup. The round function will round the percent to the tenths place.

mypopup <- paste0("GEOID: ", mydatamerged$GEOID, "<br>", "Commute by Public transit: ", round(mydatamerged$transitpercent,1), "%")

Palette This sets the variation/range of colors. This is similar to how in ArcGIS you would symbolize by an attribute. Here we specify the pallete YlGnBu (note that the second character is a lower-case L). To explore different palettes, you can go here: http://colorbrewer2.org. To find the one we are working with, in Multi-hue, bottom row, click on the fourth one from the left.

mypal <- colorNumeric(
  palette = "YlGnBu",
  domain = NULL
)

LAT/LNG Centerpoint, zoom, and tiles

you need to determine a centerpoint latitude and longitude. To do that, use googlemaps, single click at a specific spot on the map and the lat,long values will appear on the bottom. Cut and paste those values below. I have used a point in Kendall Square. A zoom value of 12 is a good place to start.

You can choose the map background from a variety of sources. You can explore the options here: http://leaflet-extras.github.io/leaflet-providers/preview/. Move the map so you are over Boston. The default one you see is “OpenStreetMap.Mapnik”. Then on the right, you can Scroll to explore options. I chose “CartoDB.Positron” because it does not overly highlight roadways, making it easier to overlay the census data.

# Store values into variables
myLAT <- 42.376476
myLNG <- -71.085388
myZOOM <- 13
myTILES <- "CartoDB.Positron"

Define the map using leaflet

Now we load and create the map and all its parts.

We added a legend, note that we added a % suffix.

To change the color of the outlines of the shapes, you need to use a hex code for the color. You can choose colors here: https://www.w3schools.com/colors/colors_picker.asp

You can investigate the options for leaflet here: https://cran.r-project.org/web/packages/leaflet/leaflet.pdf. Search for “addPolygons” in that PDF.

Adding Transit Layers

# Import subway lines shape file
subway <- readOGR(dsn="transitlayers/mbtarapidtransit", layer="MBTA_ARC")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/callum/Desktop/Transportation Planning/transitlayers/mbtarapidtransit", layer: "MBTA_ARC"
## with 138 features
## It has 4 fields
subway <- spTransform(subway, CRS("+proj=longlat +ellps=GRS80"))

Adding color to the map

# Map colors to the different subway lines
pal <- colorFactor(palette = c('blue','green','orange','red','grey'),
                   domain = c('BLUE','GREEN','ORANGE','RED','SILVER'))

Adding stations to the map

# Define new MBTA icon
mbtaIcon <- makeIcon(
  iconUrl = "https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/MBTA.svg/240px-MBTA.svg.png",
  iconWidth = 15, iconHeight = 15
)

# Load the MBTA stops shapefile
stops <- readOGR(dsn="transitlayers/mbtarapidtransit", layer="MBTA_NODE")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/callum/Desktop/Transportation Planning/transitlayers/mbtarapidtransit", layer: "MBTA_NODE"
## with 166 features
## It has 4 fields
stops <- spTransform(stops, CRS("+proj=longlat +ellps=GRS80"))

Add commuter rail lines to the map

# Import rail lines
trains <- readOGR(dsn="transitlayers/trains", layer="TRAINS_ARC")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/callum/Desktop/Transportation Planning/transitlayers/trains", layer: "TRAINS_ARC"
## with 6595 features
## It has 28 fields
## Integer64 fields read as strings:  TRACK LINE_CODE ASSET_ID
trains <- spTransform(trains, CRS("+proj=longlat +ellps=GRS80"))

# Subset to only include commuter rail
trains <- subset(trains, COMMRAIL == "Y", na.rm=TRUE)

# Import rail stations
trainstops <- readOGR(dsn="transitlayers/trains", layer="TRAINS_NODE")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/callum/Desktop/Transportation Planning/transitlayers/trains", layer: "TRAINS_NODE"
## with 387 features
## It has 6 fields
trainstops <- spTransform(trainstops, CRS("+proj=longlat +ellps=GRS80"))
# Subset to only include commuter rail stations
trainstops <- subset(trainstops, trainstops@data$C_RAILSTAT == "Y", na.rm=TRUE)

# Make a purple circle icon for the commuter rail stations
crIcon <- makeIcon(
  iconUrl = "https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/Eo_circle_purple_circle.svg/240px-Eo_circle_purple_circle.svg.png", iconWidth = 15, iconHeight = 15)

Add bus routes

# Import bus routes shapefile
bus <- readOGR(dsn="transitlayers/mbtabus", layer="MBTABUSROUTES_ARC")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/callum/Desktop/Transportation Planning/transitlayers/mbtabus", layer: "MBTABUSROUTES_ARC"
## with 923 features
## It has 11 fields
## Integer64 fields read as strings:  CTPS_ROU_2
bus <- spTransform(bus, CRS("+proj=longlat +ellps=GRS80"))

# Subset the bus routes into "key routes" and "non-key-routes"
keybuslist <- as.character(c(1,15,22,23,28,32,39,57,66,71,73,77,111,116,117))
bus.key <- subset(bus, MBTA_ROUTE %in% keybuslist, na.rm=TRUE)
bus.nokey <- subset(bus, !(MBTA_ROUTE %in% keybuslist), na.rm=TRUE)

Zoomed in Map of Brickbottom

mymap <- leaflet() %>%
  addProviderTiles(myTILES) %>%
  setView(myLNG, myLAT, zoom = 14) %>%
  addPolylines(data = bus.nokey, color="grey", weight=3) %>%
  addPolylines(data = bus.key, color="yellow", weight=4) %>%
  addPolylines(data = trains, color="purple", weight=5) %>%
  addPolylines(data = subway, color=~pal(LINE), weight=6) %>%
  addMarkers(data = stops, icon=mbtaIcon) %>%
  addMarkers(data = trainstops, icon=crIcon) %>%
  addPolygons(data = mydatamerged, 
              fillColor = ~mypal(transitpercent), # this makes the fill color on the spectrum of the palette
              color = "#b2aeae", # this is the color of the outline of the shapes
              fillOpacity = 0.7, # how see through the shapes are
              weight = 1, 
              smoothFactor = 0.2,
              popup = mypopup) %>%
  addLegend(pal = mypal, 
            values = mydatamerged$transitpercent, 
            position = "topright", 
            title = "Commute by Public Transit",
            labFormat = labelFormat(suffix = "%"),
            bins=5)

# Display the map
mymap