Tutorial outputs

In this Tutorial you will create interactive maps centered on Kendall Square, Cambridge using 2011-2015 5-year ACS data.

Create a map showing the percent of workers who commute by transit.
Create a map showing the percent of “service” workers who commute by transit.
Put those maps side by side.

Map generation overview

There are three part to creating the webmap:

Fetch the needed Census shapefiles (from Tigris)
Fetch and processing the ACS data, and linking to the shapefiles
Use Leaflet to generate interactive maps.

You should make sure that you have the following packages installed: tigris, acs, leaflet, and mapview.

Load Packages

First load the packages using the library function.

library(acs)
library(tigris)
library(leaflet)
library(mapview)

API ‘Key’

You can use Jeff’s API key which is available in Stellar and on Dropbox. Or you can get your own (https://api.census.gov/data/key_signup.htm). If you publish your results on RPub, be sure to set {r, echo=FALSE} so people can’t see your key (like I did below, you can’t see my command installing the key.)

api.key.install(key="INSERT YOUR API KEY HERE")

Fetch a spatial shapefile containing the needed geography

countylist <- c('17','25')
shapefile <- tracts(state='25', county=countylist)

We can visualize the new shapefile we just fetched:

plot(shapefile)

Fetching ACS Data

Now that we have the shapefile for the census tracts, we need some data to associate with these tracts.

Much like the advanced search in Fact Finder (http://factfinder.census.gov), we need to deterine a handful of things before we can fetch the data.

Table and/or variables (this can be a full table or just variety of variables from different tables)
Endyear (the last year in the range you want)
Span (1, 3 or 5 year estimates)
Geography

Choose the Census table of interest

The best way to determine which table(s) you are interested in is to use censusreporter: https://censusreporter.org.

We want to map the percent of workers who commute by transit. At censusreporter.org, scroll down, and click on Commute. Scroll down until you find B08124: Means of Transportation to Work by Occupation and click on it. There are many “columns” in this data. Enter cambridge,ma into the place box to show some sample data.

We want the percent of workers who commute by transit. Scroll down. For Cambridge, 31.4% comute by Public transportation.

Toward the right, click on “Switch to Totals.” These are the values that are imported, not the percentages. So we need to calculate the percentage manually.

Looking at the censusreporter, it’s a little frustrating but the absolute total for Cambridge, what you would expect to be the absolute first line, is missing. You can find that number by clicking on “Download Data” and selecting .csv. Doing this you would find the total for Cambridge to be 62,260. Therefore, if we take the Public transportation number 19,578 and divide it by 62,260 we get the 31.4% number we found above.

So we fetch the table of interest and put the results into the variable called mytable.

mytable <- acs.lookup(endyear=2015, table.number="B08124")

Choose the data columns of interest from the table

Census tables usually have many “columns” of data. We can make a list of all the columns.

results(mytable)$variable.name

##  [1] "Total:"                                                                                                    
##  [2] "Management, business, science, and arts occupations"                                                       
##  [3] "Service occupations"                                                                                       
##  [4] "Sales and office occupations"                                                                              
##  [5] "Natural resources, construction, and maintenance occupations"                                              
##  [6] "Production, transportation, and material moving occupations"                                               
##  [7] "Military specific occupations"                                                                             
##  [8] "Car, truck, or van - drove alone:"                                                                         
##  [9] "Car, truck, or van - drove alone: Management, business, science, and arts occupations"                     
## [10] "Car, truck, or van - drove alone: Service occupations"                                                     
## [11] "Car, truck, or van - drove alone: Sales and office occupations"                                            
## [12] "Car, truck, or van - drove alone: Natural resources, construction, and maintenance occupations"            
## [13] "Car, truck, or van - drove alone: Production, transportation, and material moving occupations"             
## [14] "Car, truck, or van - drove alone: Military specific occupations"                                           
## [15] "Car, truck, or van - carpooled:"                                                                           
## [16] "Car, truck, or van - carpooled: Management, business, science, and arts occupations"                       
## [17] "Car, truck, or van - carpooled: Service occupations"                                                       
## [18] "Car, truck, or van - carpooled: Sales and office occupations"                                              
## [19] "Car, truck, or van - carpooled: Natural resources, construction, and maintenance occupations"              
## [20] "Car, truck, or van - carpooled: Production, transportation, and material moving occupations"               
## [21] "Car, truck, or van - carpooled: Military specific occupations"                                             
## [22] "Public transportation (excluding taxicab):"                                                                
## [23] "Public transportation (excluding taxicab): Management, business, science, and arts occupations"            
## [24] "Public transportation (excluding taxicab): Service occupations"                                            
## [25] "Public transportation (excluding taxicab): Sales and office occupations"                                   
## [26] "Public transportation (excluding taxicab): Natural resources, construction, and maintenance occupations"   
## [27] "Public transportation (excluding taxicab): Production, transportation, and material moving occupations"    
## [28] "Public transportation (excluding taxicab): Military specific occupations"                                  
## [29] "Walked:"                                                                                                   
## [30] "Walked: Management, business, science, and arts occupations"                                               
## [31] "Walked: Service occupations"                                                                               
## [32] "Walked: Sales and office occupations"                                                                      
## [33] "Walked: Natural resources, construction, and maintenance occupations"                                      
## [34] "Walked: Production, transportation, and material moving occupations"                                       
## [35] "Walked: Military specific occupations"                                                                     
## [36] "Taxicab, motorcycle, bicycle, or other means:"                                                             
## [37] "Taxicab, motorcycle, bicycle, or other means: Management, business, science, and arts occupations"         
## [38] "Taxicab, motorcycle, bicycle, or other means: Service occupations"                                         
## [39] "Taxicab, motorcycle, bicycle, or other means: Sales and office occupations"                                
## [40] "Taxicab, motorcycle, bicycle, or other means: Natural resources, construction, and maintenance occupations"
## [41] "Taxicab, motorcycle, bicycle, or other means: Production, transportation, and material moving occupations" 
## [42] "Taxicab, motorcycle, bicycle, or other means: Military specific occupations"                               
## [43] "Worked at home:"                                                                                           
## [44] "Worked at home: Management, business, science, and arts occupations"                                       
## [45] "Worked at home: Service occupations"                                                                       
## [46] "Worked at home: Sales and office occupations"                                                              
## [47] "Worked at home: Natural resources, construction, and maintenance occupations"                              
## [48] "Worked at home: Production, transportation, and material moving occupations"                               
## [49] "Worked at home: Military specific occupations"

We want column #1 which gives the absolute total and also column #22 which gives the total commute by Public transportation. Later we will calcualte the percent.

myvars <- mytable[1]+mytable[22]

Span

5 year estimates have the lowest margin of error, so we will specify that by putting the number 5 into the variable myspan for use later.

myspan <- 5

End Year

We will use the most recent data available from the API, which is 2015, so we put 2015 into the variable myendyear for use later.

myendyear <- 2015

Defining Geography

Now we set the geography that we want. For this, we will use the geo.make function. We want to pull all the census tracts in the county (unfortunately we can’t pull just for a city but we can filter after the fact).

We’ve already found the state cose for Massachusetts (25). Ane earlier on we definaed a variable countylist containing the two counties of interest (17 and 25). Small glitch: the geo.make function wants that list as numbers and not text versions of the numbers.

We use the “*" for tracts to denote ‘select all’. We will store this in the variable mygeo so we can use it later in our function that will fetch the data.

countylist2 <- as.numeric(countylist)
mygeo <- geo.make(state=25, county=countylist2, tract="*")

Putting it all together and fetch the data

Now we have all the parts to put together the fetch function. The line below will always be the same. What changes is the definition of the variables above.

The variable mydata is a complex object containing lots of information about the census table and the data for each census tract.

mydata <- acs.fetch(endyear=myendyear, span=myspan, geography=mygeo, variable=myvars)

Cleaning the Data

As with data when we download it directly from ACS, it comes a bit messy and needs to be cleaned up. It’s a lot easier in R than in excel.

First, we need to generate a standardized GEOID for the ACS data so it can be matched to the shapefile which already has a standardized GEOID. To do that, we need to paste together the state code, county code, and tract code. The paste0 function pastes things together (note: paste puts spaces between the things it pastes, while paste0 does not.) Everything you want pasted together goes inside the paste0 function separated by commas (e.g., paste0("27", "A", "0", "99") will return "27A099".)

The function as.character ensures that the state, county, and tract numbers are first turned into characters before concatenating them together

Remember when we mentioned that ACS does not have an extra zero before the county number? We need to put in that extra zero so that the GEOID code we generate here matches the GEOID codes in the shapefile.

acsgeoid <- paste0(as.character(mydata@geography$state),'0', as.character(mydata@geography$county), as.character(mydata@geography$tract))

Now you are ready to put the fetched data into a usable format for R. The best format is a dataframe object which is in essence a spreadsheet with columns (variables) and rows (data for each census tract). You only need two columns: the GEOID and the data itself (in this case the median income variable which is found in the estimate component of the fetched data, identified by mydata$estimate). Note if you chose more than one variable in the myvars step above, then each of those variables would be a new column in the dataframe.

Once we put the columns into a dataframe, you can rename the column names to make them more user-friendly. In this case, there is only one data column median income.

You can use the head function to show the first 6 rows in a dataframe.

mydatadf <- data.frame(acsgeoid, mydata@estimate)
colnames(mydatadf)=c("GEOID", "total", "transit")
head(mydatadf)

##                                                             GEOID total
## Census Tract 3001, Middlesex County, Massachusetts    25017300100  1674
## Census Tract 3011.01, Middlesex County, Massachusetts 25017301101  2069
## Census Tract 3011.02, Middlesex County, Massachusetts 25017301102  2842
## Census Tract 3101, Middlesex County, Massachusetts    25017310100  2516
## Census Tract 3102, Middlesex County, Massachusetts    25017310200  3427
## Census Tract 3103, Middlesex County, Massachusetts    25017310300  2680
##                                                       transit
## Census Tract 3001, Middlesex County, Massachusetts         14
## Census Tract 3011.01, Middlesex County, Massachusetts       0
## Census Tract 3011.02, Middlesex County, Massachusetts      21
## Census Tract 3101, Middlesex County, Massachusetts        256
## Census Tract 3102, Middlesex County, Massachusetts        133
## Census Tract 3103, Middlesex County, Massachusetts          0

Calculating the percent

The mydatadf dataframe now has a column for the total commuters and a column for the transit commuters. We need to calculate the percent, and make an additional column, which we will call “transitpercent”. We multiply the transit percent by 100 to turn it from a fraction.

mydatadf$transitpercent <- mydatadf$transit/mydatadf$total*100

Map the data using Leaflet

First, you need to join the ACS data with the shapefile. The geo_join function (from the ACS package) takes the shapefile, the fetched data (mydatadf) and the column names of each to match on (in this case we named them the same thing to make it more intuitive).

Join the data

mydatamerged <- geo_join(shapefile, mydatadf, "GEOID", "GEOID")

Prepare for mapping

We need to:

Assign our dataframe to a new variable for simplicity
Define a popup
Set a color pallete
Determine the LAT and LNG for the centerpoint for your map

Dataframe shortcut To make a shortcut here, we’re defining a new variable df and setting it to whatever you called your data, so that we don’t have to change every column reference.

df <- mydatamerged

Popup This defines how a popup window will appear when you click on a census tract on the map. Here, we will have the popup display the GEOID and the percent commute by transit. The <br> adds an extra carriage return to put the label on the second line of the popup. The round function will round the percent to the tenths place.

mypopup <- paste0("GEOID: ", df$GEOID, "<br>", "Commute by Public transit: ", round(df$transitpercent,1), "%")

Pallete This sets the variation/range of colors. This is similar to how in ArcGIS you would symbolize by an attribute. Here we specify the pallete YlGnBu (note that the second character is a lower-case L). To explore different palettes, you can go here: http://colorbrewer2.org. To find the one we are working with, in Multi-hue, bottom row, click on the fourth one from the left).

mypal <- colorNumeric(
  palette = "YlGnBu",
  domain = df$transitpercent
)

** LAT/LNG Centerpoint** You can make your map have a center point for the area of interest. For Kendall Square, you can determine the LAT and LNG: go to google maps in a web browser, type “Kendall Square, Cambridge”, right click on the red pin, select “What’s Here?” and the LAT and LNG will display. Then define those points below.

myLAT <- 42.363951
myLNG <- -71.090011
mycentername <- "Kendall Square"

Define the map using leaflet

Now we load and create the map and all its parts. For the legend, note that we added a % suffix. A zoom of 12 is a good starting point, but you can tweak that value.

mymap<-leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(myLNG, myLAT, zoom = 12) %>%
  addMarkers(lat=myLAT, lng=myLNG, popup=mycentername) %>%
  addPolygons(data = df, 
              fillColor = ~mypal(transitpercent), 
              color = "#b2aeae", # you need to use hex colors
              fillOpacity = 0.7, 
              weight = 1, 
              smoothFactor = 0.2,
              popup = mypopup) %>%
  addLegend(pal = mypal, 
            values = df$transitpercent, 
            position = "bottomright", 
            title = "Commute by Public Transit",
            labFormat = labelFormat(suffix = "%"),
            bins=5)

Finally, we draw it!

mymap

EXTENSION: What percent of service workers commute by transit?

Looking back at the censusreporter.org for Cambridge, there are 5,654 service workers, and of them, 1,719 commute by transit. This is 30.4%.

Looking back at our column numbers, we want column #3 which gives the total number of service workers and also column #24 which gives the total service workers who commute by Public transportation. Later we will calculate the percent.

myvars <- mytable[3]+mytable[24]

Span

myspan <- 5

End Year

myendyear <- 2015

Defining Geography

countylist2 <- as.numeric(countylist)
mygeo <- geo.make(state=25, county=countylist2, tract="*")

Putting it all together and fetch the data

mydata <- acs.fetch(endyear=myendyear, span=myspan, geography=mygeo, variable=myvars)

Cleaning the Data

acsgeoid <- paste0(as.character(mydata@geography$state),'0', as.character(mydata@geography$county), as.character(mydata@geography$tract))

mydatadf <- data.frame(acsgeoid, mydata@estimate)
colnames(mydatadf)=c("GEOID", "total", "transit")
head(mydatadf)

##                                                             GEOID total
## Census Tract 3001, Middlesex County, Massachusetts    25017300100   228
## Census Tract 3011.01, Middlesex County, Massachusetts 25017301101   328
## Census Tract 3011.02, Middlesex County, Massachusetts 25017301102   474
## Census Tract 3101, Middlesex County, Massachusetts    25017310100   555
## Census Tract 3102, Middlesex County, Massachusetts    25017310200   697
## Census Tract 3103, Middlesex County, Massachusetts    25017310300   698
##                                                       transit
## Census Tract 3001, Middlesex County, Massachusetts          0
## Census Tract 3011.01, Middlesex County, Massachusetts       0
## Census Tract 3011.02, Middlesex County, Massachusetts       6
## Census Tract 3101, Middlesex County, Massachusetts         33
## Census Tract 3102, Middlesex County, Massachusetts         17
## Census Tract 3103, Middlesex County, Massachusetts          0

Calculating the percent

mydatadf$transitpercent <- mydatadf$transit/mydatadf$total*100

Map the data using Leaflet

Join the data

mydatamerged <- geo_join(shapefile, mydatadf, "GEOID", "GEOID")

Prepare for mapping

Dataframe shortcut

df <- mydatamerged

Popup

mypopup <- paste0("GEOID: ", df$GEOID, "<br>", "Commute by Public transit by service workers: ", round(df$transitpercent,1), "%")

Pallete

mypal <- colorNumeric(
  palette = "YlGnBu",
  domain = df$transitpercent
)

** LAT/LNG Centerpoint**

myLAT <- 42.363951
myLNG <- -71.090011
mycentername <- "Kendall Square"

Define the map using leaflet

mymap2<-leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(myLNG, myLAT, zoom = 12) %>%
  addMarkers(lat=myLAT, lng=myLNG, popup=mycentername) %>%
  
  addPolygons(data = df, 
              fillColor = ~mypal(transitpercent), 
              color = "#b2aeae", # you need to use hex colors
              fillOpacity = 0.7, 
              weight = 1, 
              smoothFactor = 0.2,
              popup = mypopup) %>%
  addLegend(pal = mypal, 
            values = df$transitpercent, 
            position = "bottomright", 
            title = "Commute by Public Transit<br>by service workers",
            labFormat = labelFormat(suffix = "%"),
            bins=4)

Finally, we draw it!

mymap2

Side by side!

How about putting these two maps side by side? The sync function comes from the mapview library.

sync(mymap, mymap2)

Census Mapping Tutorial: Example 2

Mapping Percent of Workers who commute by transit

Laura Krull lkrull@mit.edu & Jeff Rosenblum jeffreyr@mit.edu

Updated March 19, 2018

Tutorial outputs

Map generation overview

Load Packages

API ‘Key’

Fetch a spatial shapefile containing the needed geography

Fetching ACS Data

Choose the Census table of interest

Choose the data columns of interest from the table

Span

End Year

Defining Geography

Putting it all together and fetch the data

Cleaning the Data

Calculating the percent

Map the data using Leaflet

Join the data

Prepare for mapping

Define the map using leaflet

Finally, we draw it!

EXTENSION: What percent of service workers commute by transit?

Span

End Year

Defining Geography

Putting it all together and fetch the data

Cleaning the Data

Calculating the percent

Map the data using Leaflet

Join the data

Prepare for mapping

Define the map using leaflet

Finally, we draw it!

Side by side!