Tutorial outputs

In this Tutorial you will create interactive maps centered on Kendall Square, Cambridge using 2006-2010 and 2012-2016 5-year ACS data.

  1. Create a map showing the percent change of workers who commute by transit over time.
  2. Create a map showing the percent change of “service” workers who commute by transit over time.
  3. Put those maps side by side.

Map generation overview

There are three part to creating the webmap:

  1. Fetch the needed Census shapefiles (from Tigris)
  2. Fetch and processing the ACS data, and linking to the shapefiles
  3. Use Leaflet to generate interactive maps.

You should make sure that you have the following packages installed: tigris, acs, leaflet, and mapview.

Load Packages

First load the packages using the library function.

library(acs)
library(tigris)
library(leaflet)
library(mapview)

API ‘Key’

You can use Jeff’s API key which is available in Stellar and on Dropbox. Or you can get your own (https://api.census.gov/data/key_signup.htm). If you publish your results on RPub, be sure to set {r, echo=FALSE} so people can’t see your key (like I did below, you can’t see my command installing the key.)

api.key.install(key="INSERT YOUR API KEY HERE")

Fetch a spatial shapefile containing the needed geography

countylist <- c('17','25')
shapefile <- tracts(state='25', county=countylist)

We can visualize the new shapefile we just fetched:

plot(shapefile)

Fetching ACS Data

Now that we have the shapefile for the census tracts, we need the data to associate with these tracts.

Much like the advanced search in Fact Finder (http://factfinder.census.gov), we need to deterine a handful of things before we can fetch the data.

More about the different “spans”: https://www.census.gov/programs-surveys/acs/guidance/estimates.html

More about which years and spans of data are available https://www.census.gov/data/developers/data-sets.html

Note: Do not compare overlapping spans https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data.html

More info on comparing 5-year spans: https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data/2015/5-year-comparison.html

More about the different geographies: https://www.census.gov/programs-surveys/acs/geography-acs/concepts-definitions.html

Choose the Census table of interest

The best way to determine which table(s) you are interested in is to use censusreporter: https://censusreporter.org.

We want to map the percent of workers who commute by transit. At censusreporter.org, scroll down, and click on Commute. Scroll down until you find B08124: Means of Transportation to Work by Occupation and click on it. There are many “columns” in this data. Enter cambridge,ma into the place box to show some sample data.

We want the percent of workers who commute by transit. Scroll down. For Cambridge, 31.4% comute by Public transportation.

Toward the right, click on “Switch to Totals.” These are the values that are imported, not the percentages. So we need to calculate the percentage manually.

Looking at the censusreporter, it’s a little frustrating but the absolute total for Cambridge, what you would expect to be the absolute first line, is missing. You can find that number by clicking on “Download Data” and selecting .csv. Doing this you would find the total for Cambridge to be 62,260. Therefore, if we take the Public transportation number 19,578 and divide it by 62,260 we get the 31.4% number we found above.

So we fetch the table of interest and put the results into the variable called mytable.

mytable <- acs.lookup(endyear=2015, table.number="B08124")

Choose the data columns of interest from the table

Census tables usually have many “columns” of data. We can make a list of all the columns.

results(mytable)$variable.name
##  [1] "Total:"                                                                                                    
##  [2] "Management, business, science, and arts occupations"                                                       
##  [3] "Service occupations"                                                                                       
##  [4] "Sales and office occupations"                                                                              
##  [5] "Natural resources, construction, and maintenance occupations"                                              
##  [6] "Production, transportation, and material moving occupations"                                               
##  [7] "Military specific occupations"                                                                             
##  [8] "Car, truck, or van - drove alone:"                                                                         
##  [9] "Car, truck, or van - drove alone: Management, business, science, and arts occupations"                     
## [10] "Car, truck, or van - drove alone: Service occupations"                                                     
## [11] "Car, truck, or van - drove alone: Sales and office occupations"                                            
## [12] "Car, truck, or van - drove alone: Natural resources, construction, and maintenance occupations"            
## [13] "Car, truck, or van - drove alone: Production, transportation, and material moving occupations"             
## [14] "Car, truck, or van - drove alone: Military specific occupations"                                           
## [15] "Car, truck, or van - carpooled:"                                                                           
## [16] "Car, truck, or van - carpooled: Management, business, science, and arts occupations"                       
## [17] "Car, truck, or van - carpooled: Service occupations"                                                       
## [18] "Car, truck, or van - carpooled: Sales and office occupations"                                              
## [19] "Car, truck, or van - carpooled: Natural resources, construction, and maintenance occupations"              
## [20] "Car, truck, or van - carpooled: Production, transportation, and material moving occupations"               
## [21] "Car, truck, or van - carpooled: Military specific occupations"                                             
## [22] "Public transportation (excluding taxicab):"                                                                
## [23] "Public transportation (excluding taxicab): Management, business, science, and arts occupations"            
## [24] "Public transportation (excluding taxicab): Service occupations"                                            
## [25] "Public transportation (excluding taxicab): Sales and office occupations"                                   
## [26] "Public transportation (excluding taxicab): Natural resources, construction, and maintenance occupations"   
## [27] "Public transportation (excluding taxicab): Production, transportation, and material moving occupations"    
## [28] "Public transportation (excluding taxicab): Military specific occupations"                                  
## [29] "Walked:"                                                                                                   
## [30] "Walked: Management, business, science, and arts occupations"                                               
## [31] "Walked: Service occupations"                                                                               
## [32] "Walked: Sales and office occupations"                                                                      
## [33] "Walked: Natural resources, construction, and maintenance occupations"                                      
## [34] "Walked: Production, transportation, and material moving occupations"                                       
## [35] "Walked: Military specific occupations"                                                                     
## [36] "Taxicab, motorcycle, bicycle, or other means:"                                                             
## [37] "Taxicab, motorcycle, bicycle, or other means: Management, business, science, and arts occupations"         
## [38] "Taxicab, motorcycle, bicycle, or other means: Service occupations"                                         
## [39] "Taxicab, motorcycle, bicycle, or other means: Sales and office occupations"                                
## [40] "Taxicab, motorcycle, bicycle, or other means: Natural resources, construction, and maintenance occupations"
## [41] "Taxicab, motorcycle, bicycle, or other means: Production, transportation, and material moving occupations" 
## [42] "Taxicab, motorcycle, bicycle, or other means: Military specific occupations"                               
## [43] "Worked at home:"                                                                                           
## [44] "Worked at home: Management, business, science, and arts occupations"                                       
## [45] "Worked at home: Service occupations"                                                                       
## [46] "Worked at home: Sales and office occupations"                                                              
## [47] "Worked at home: Natural resources, construction, and maintenance occupations"                              
## [48] "Worked at home: Production, transportation, and material moving occupations"                               
## [49] "Worked at home: Military specific occupations"

We want column #1 which gives the absolute total and also column #22 which gives the total commute by Public transportation. Later we will calcualte the percent.

myvars <- mytable[1]+mytable[22] 

Span

5 year estimates have the lowest margin of error, so we will specify that by putting the number 5 into the variable myspan for use later.

myspan <- 5

End Year

We will want two data sets, one ending in 2010 and one ending in 2016.

myendyear1 <- 2010
myendyear2 <- 2015

Defining Geography

Now we set the geography that we want. For this, we will use the geo.make function. We want to pull all the census tracts in the county (unfortunately we can’t pull just for a city but we can filter after the fact).

We’ve already found the state cose for Massachusetts (25). Ane earlier on we definaed a variable countylist containing the two counties of interest (17 and 25). Small glitch: the geo.make function wants that list as numbers and not text versions of the numbers.

We use the “*" for tracts to denote ‘select all’. We will store this in the variable mygeo so we can use it later in our function that will fetch the data.

countylist2 <- as.numeric(countylist)
mygeo <- geo.make(state=25, county=countylist2, tract="*")

Putting it all together and fetch the data

Now we have all the parts to put together the fetch function. The line below will always be the same. What changes is the definition of the variables above.

The variables mydata1 and mydata2 are complex objects containing lots of information about the census table and the data for each census tract. the first contains the 5-year data ending 2010, and the second 5-year data ending 2016.

mydata1 <- acs.fetch(endyear=myendyear1, span=myspan, geography=mygeo, variable=myvars)
mydata2 <- acs.fetch(endyear=myendyear2, span=myspan, geography=mygeo, variable=myvars)

Cleaning the Data

As with data when we download it directly from ACS, it comes a bit messy and needs to be cleaned up. It’s a lot easier in R than in excel.

First, we need to generate a standardized GEOID for the ACS data so it can be matched to the shapefile which already has a standardized GEOID. To do that, we need to paste together the state code, county code, and tract code. The paste0 function pastes things together (note: paste puts spaces between the things it pastes, while paste0 does not.) Everything you want pasted together goes inside the paste0 function separated by commas (e.g., paste0("27", "A", "0", "99") will return "27A099".)

The function as.character ensures that the state, county, and tract numbers are first turned into characters before concatenating them together

Remember when we mentioned that ACS does not have an extra zero before the county number? We need to put in that extra zero so that the GEOID code we generate here matches the GEOID codes in the shapefile.

acsgeoid1 <- paste0(as.character(mydata1@geography$state),'0', as.character(mydata1@geography$county), as.character(mydata1@geography$tract))
acsgeoid2 <- paste0(as.character(mydata2@geography$state),'0', as.character(mydata2@geography$county), as.character(mydata2@geography$tract))

Now you are ready to put the fetched data into a usable format for R. The best format is a dataframe object which is in essence a spreadsheet with columns (variables) and rows (data for each census tract). You only need two columns: the GEOID and the data itself (in this case the median income variable which is found in the estimate component of the fetched data, identified by mydata$estimate). Note if you chose more than one variable in the myvars step above, then each of those variables would be a new column in the dataframe.

Once we put the columns into a dataframe, you can rename the column names to make them more user-friendly. In this case, there is only one data column median income.

You can use the head function to show the first 6 rows in a dataframe.

mydatadf1 <- data.frame(acsgeoid1, mydata1@estimate)
colnames(mydatadf1)=c("GEOID", "total", "transit")
head(mydatadf1)
##                                                             GEOID total
## Census Tract 3001, Middlesex County, Massachusetts    25017300100  1603
## Census Tract 3011.01, Middlesex County, Massachusetts 25017301101  1831
## Census Tract 3011.02, Middlesex County, Massachusetts 25017301102  2668
## Census Tract 3101, Middlesex County, Massachusetts    25017310100  1868
## Census Tract 3102, Middlesex County, Massachusetts    25017310200  3320
## Census Tract 3103, Middlesex County, Massachusetts    25017310300  2903
##                                                       transit
## Census Tract 3001, Middlesex County, Massachusetts         14
## Census Tract 3011.01, Middlesex County, Massachusetts       0
## Census Tract 3011.02, Middlesex County, Massachusetts      23
## Census Tract 3101, Middlesex County, Massachusetts         92
## Census Tract 3102, Middlesex County, Massachusetts         24
## Census Tract 3103, Middlesex County, Massachusetts         27
mydatadf2 <- data.frame(acsgeoid2, mydata2@estimate)
colnames(mydatadf2)=c("GEOID", "total", "transit")
head(mydatadf2)
##                                                             GEOID total
## Census Tract 3001, Middlesex County, Massachusetts    25017300100  1674
## Census Tract 3011.01, Middlesex County, Massachusetts 25017301101  2069
## Census Tract 3011.02, Middlesex County, Massachusetts 25017301102  2842
## Census Tract 3101, Middlesex County, Massachusetts    25017310100  2516
## Census Tract 3102, Middlesex County, Massachusetts    25017310200  3427
## Census Tract 3103, Middlesex County, Massachusetts    25017310300  2680
##                                                       transit
## Census Tract 3001, Middlesex County, Massachusetts         14
## Census Tract 3011.01, Middlesex County, Massachusetts       0
## Census Tract 3011.02, Middlesex County, Massachusetts      21
## Census Tract 3101, Middlesex County, Massachusetts        256
## Census Tract 3102, Middlesex County, Massachusetts        133
## Census Tract 3103, Middlesex County, Massachusetts          0

Calculating the percent of transit trips

The mydatadf dataframe now has a column for the total commuters and a column for the transit commuters. We need to calculate the percent, and make an additional column, which we will call “transitpercent”. We multiply the transit percent by 100 to turn it from a fraction.

mydatadf1$transitpercent <- mydatadf1$transit/mydatadf1$total*100
mydatadf2$transitpercent <- mydatadf2$transit/mydatadf2$total*100

Also, calculate the change in percent over time

First, we need to merge the two datasets so we can then calculate the delta in percent over time. The merge function does that: merge(dataframe#1, dataframe#2, by = "column name to match entries", incomparables = NA). We can use the head function to take a look at the new dataframe. The transitpercent.x is the percent transit from the first dataframe 2006-2010, and transitpercent.y is from the second 2012-2016. We then make a new column of the difference between the two.

mydatadf3 <- merge(mydatadf1, mydatadf2, by = "GEOID", incomparables = NA)
head(mydatadf3)
##         GEOID total.x transit.x transitpercent.x total.y transit.y
## 1 25017300100    1603        14        0.8733624    1674        14
## 2 25017301101    1831         0        0.0000000    2069         0
## 3 25017301102    2668        23        0.8620690    2842        21
## 4 25017310100    1868        92        4.9250535    2516       256
## 5 25017310200    3320        24        0.7228916    3427       133
## 6 25017310300    2903        27        0.9300723    2680         0
##   transitpercent.y
## 1        0.8363202
## 2        0.0000000
## 3        0.7389163
## 4       10.1748808
## 5        3.8809454
## 6        0.0000000
mydatadf3$deltapercent <- mydatadf3$transitpercent.y-mydatadf3$transitpercent.x
head(mydatadf3)
##         GEOID total.x transit.x transitpercent.x total.y transit.y
## 1 25017300100    1603        14        0.8733624    1674        14
## 2 25017301101    1831         0        0.0000000    2069         0
## 3 25017301102    2668        23        0.8620690    2842        21
## 4 25017310100    1868        92        4.9250535    2516       256
## 5 25017310200    3320        24        0.7228916    3427       133
## 6 25017310300    2903        27        0.9300723    2680         0
##   transitpercent.y deltapercent
## 1        0.8363202  -0.03704225
## 2        0.0000000   0.00000000
## 3        0.7389163  -0.12315271
## 4       10.1748808   5.24982723
## 5        3.8809454   3.15805387
## 6        0.0000000  -0.93007234

Map the data using Leaflet

First, you need to join the ACS data with the shapefile. The geo_join function (from the ACS package) takes the shapefile, the fetched data (mydatadf1 and mydatadf2) and the column names of each to match on (in this case we named them the same thing to make it more intuitive). Note that you can use the same shape

Join the data

mydatamerged1 <- geo_join(shapefile, mydatadf1, "GEOID", "GEOID")
mydatamerged2 <- geo_join(shapefile, mydatadf2, "GEOID", "GEOID")
mydatamerged3 <- geo_join(shapefile, mydatadf3, "GEOID", "GEOID")

Prepare for mapping

We need to:

  • Assign our dataframe to a new variable for simplicity
  • Define a popup
  • Set a color pallete
  • Determine the LAT and LNG for the centerpoint for your map

Dataframe shortcut To make a shortcut here, we’re defining a new variable df and setting it to whatever you called your data, so that we don’t have to change every column reference.

df1 <- mydatamerged1
df2 <- mydatamerged2
df3 <- mydatamerged3

Popup This defines how a popup window will appear when you click on a census tract on the map. Here, we will have the popup display the GEOID and the percent commute by transit. The <br> adds an extra carriage return to put the label on the second line of the popup. The round function will round the percent to the tenths place.

mypopup1 <- paste0("GEOID: ", df1$GEOID, "<br>", "2006-2010 Commute by Public transit: ", round(df1$transitpercent,1), "%")
mypopup2 <- paste0("GEOID: ", df2$GEOID, "<br>", "2012-2016 Commute by Public transit: ", round(df2$transitpercent,1), "%")
mypopup3 <- paste0("GEOID: ", df3$GEOID, "<br>", "Change from 2006-2010 to 2012-2016 Commute by Public transit: ", round(df3$deltapercent,1), "%")

Pallete This sets the variation/range of colors. This is similar to how in ArcGIS you would symbolize by an attribute. Here we specify the pallete YlGnBu (note that the second character is a lower-case L). To explore different palettes, you can go here: http://colorbrewer2.org. To find the one we are working with, in Multi-hue, bottom row, click on the fourth one from the left).

Note that we only use ONE palette for df1 and df2, becuase we want the palette range to be exactly the same between the two maps. I just chose df1.

The palatte for df3, though, needs to be different because the change in percent will have a different range. Also, because the delta is on a scale from - to +, I changed the palette to a range from red to green. Notice that I’m not using domain=df3$deltapercent. The reason is that when I used that, there were some outliers that caused problems with the map visualization which chose -50 to +100. I played around with the domain and settled on -25 to +25. More information on the colorNumeric function here: https://www.rdocumentation.org/packages/leaflet/versions/1.1.0/topics/colorNumeric

mypal <- colorNumeric(
  palette = "YlGnBu",
  domain = df1$transitpercent
)
mypal3 <- colorNumeric(
  palette = "RdYlGn",
  domain = c(-25,+25)
)

** LAT/LNG Centerpoint** You can make your map have a center point for the area of interest. For Kendall Square, you can determine the LAT and LNG: go to google maps in a web browser, type “Kendall Square, Cambridge”, right click on the red pin, select “What’s Here?” and the LAT and LNG will display. Then define those points below.

myLAT <- 42.363951
myLNG <- -71.090011
mycentername <- "Kendall Square"

Define the map using leaflet

Now we load and create the map and all its parts. For the legend, note that we added a % suffix. A zoom of 12 is a good starting point, but you can tweak that value.

mymap1<-leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(myLNG, myLAT, zoom = 12) %>%
  addMarkers(lat=myLAT, lng=myLNG, popup=mycentername) %>%
  addPolygons(data = df1, 
              fillColor = ~mypal(transitpercent), 
              color = "#b2aeae", # you need to use hex colors
              fillOpacity = 0.7, 
              weight = 1, 
              smoothFactor = 0.2,
              popup = mypopup1) %>%
  addLegend(pal = mypal, 
            values = df1$transitpercent, 
            position = "bottomright", 
            title = "2006-2010 Commute by Public Transit",
            labFormat = labelFormat(suffix = "%"),
            bins=5)

mymap2<-leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(myLNG, myLAT, zoom = 12) %>%
  addMarkers(lat=myLAT, lng=myLNG, popup=mycentername) %>%
  addPolygons(data = df2, 
              fillColor = ~mypal(transitpercent), 
              color = "#b2aeae", # you need to use hex colors
              fillOpacity = 0.7, 
              weight = 1, 
              smoothFactor = 0.2,
              popup = mypopup2) %>%
  addLegend(pal = mypal, 
            values = df2$transitpercent, 
            position = "bottomright", 
            title = "2012-2016 Commute by Public Transit",
            labFormat = labelFormat(suffix = "%"),
            bins=5)

Side by side!

Put these two maps side by side? The sync function comes from the mapview library.

sync(mymap1, mymap2)

Now lets map the delta

Map

mymap3<-leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(myLNG, myLAT, zoom = 12) %>%
  addMarkers(lat=myLAT, lng=myLNG, popup=mycentername) %>%
  addPolygons(data = df3, 
              fillColor = ~mypal3(deltapercent), 
              color = "#b2aeae", # you need to use hex colors
              fillOpacity = 0.7, 
              weight = 1, 
              smoothFactor = 0.2,
              popup = mypopup3) %>%
  addLegend(pal = mypal3, 
            values = c(-15,+25), #df3$deltapercent, 
            position = "bottomright", 
            title = "change from 2006-2010 to 2012-2016 Commute by Public Transit",
            labFormat = labelFormat(suffix = "%"),
            bins=5)
## Warning in mypal3(deltapercent): Some values were outside the color scale
## and will be treated as NA
mymap3