In this Tutorial you will create interactive maps centered on Kendall Square, Cambridge using 2006-2010 and 2012-2016 5-year ACS data.
There are three part to creating the webmap:
You should make sure that you have the following packages installed: tigris, acs, leaflet, and mapview.
First load the packages using the library function.
library(acs)
library(tigris)
library(leaflet)
library(mapview)
You can use Jeff’s API key which is available in Stellar and on Dropbox. Or you can get your own (https://api.census.gov/data/key_signup.htm). If you publish your results on RPub, be sure to set {r, echo=FALSE} so people can’t see your key (like I did below, you can’t see my command installing the key.)
api.key.install(key="INSERT YOUR API KEY HERE")
countylist <- c('17','25')
shapefile <- tracts(state='25', county=countylist)
We can visualize the new shapefile we just fetched:
plot(shapefile)
Now that we have the shapefile for the census tracts, we need the data to associate with these tracts.
Much like the advanced search in Fact Finder (http://factfinder.census.gov), we need to deterine a handful of things before we can fetch the data.
More about the different “spans”: https://www.census.gov/programs-surveys/acs/guidance/estimates.html
More about which years and spans of data are available https://www.census.gov/data/developers/data-sets.html
Note: Do not compare overlapping spans https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data.html
More info on comparing 5-year spans: https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data/2015/5-year-comparison.html
More about the different geographies: https://www.census.gov/programs-surveys/acs/geography-acs/concepts-definitions.html
The best way to determine which table(s) you are interested in is to use censusreporter: https://censusreporter.org.
We want to map the percent of workers who commute by transit. At censusreporter.org, scroll down, and click on Commute. Scroll down until you find B08124: Means of Transportation to Work by Occupation and click on it. There are many “columns” in this data. Enter cambridge,ma into the place box to show some sample data.
We want the percent of workers who commute by transit. Scroll down. For Cambridge, 31.4% comute by Public transportation.
Toward the right, click on “Switch to Totals.” These are the values that are imported, not the percentages. So we need to calculate the percentage manually.
Looking at the censusreporter, it’s a little frustrating but the absolute total for Cambridge, what you would expect to be the absolute first line, is missing. You can find that number by clicking on “Download Data” and selecting .csv. Doing this you would find the total for Cambridge to be 62,260. Therefore, if we take the Public transportation number 19,578 and divide it by 62,260 we get the 31.4% number we found above.
So we fetch the table of interest and put the results into the variable called mytable.
mytable <- acs.lookup(endyear=2015, table.number="B08124")
Census tables usually have many “columns” of data. We can make a list of all the columns.
results(mytable)$variable.name
## [1] "Total:"
## [2] "Management, business, science, and arts occupations"
## [3] "Service occupations"
## [4] "Sales and office occupations"
## [5] "Natural resources, construction, and maintenance occupations"
## [6] "Production, transportation, and material moving occupations"
## [7] "Military specific occupations"
## [8] "Car, truck, or van - drove alone:"
## [9] "Car, truck, or van - drove alone: Management, business, science, and arts occupations"
## [10] "Car, truck, or van - drove alone: Service occupations"
## [11] "Car, truck, or van - drove alone: Sales and office occupations"
## [12] "Car, truck, or van - drove alone: Natural resources, construction, and maintenance occupations"
## [13] "Car, truck, or van - drove alone: Production, transportation, and material moving occupations"
## [14] "Car, truck, or van - drove alone: Military specific occupations"
## [15] "Car, truck, or van - carpooled:"
## [16] "Car, truck, or van - carpooled: Management, business, science, and arts occupations"
## [17] "Car, truck, or van - carpooled: Service occupations"
## [18] "Car, truck, or van - carpooled: Sales and office occupations"
## [19] "Car, truck, or van - carpooled: Natural resources, construction, and maintenance occupations"
## [20] "Car, truck, or van - carpooled: Production, transportation, and material moving occupations"
## [21] "Car, truck, or van - carpooled: Military specific occupations"
## [22] "Public transportation (excluding taxicab):"
## [23] "Public transportation (excluding taxicab): Management, business, science, and arts occupations"
## [24] "Public transportation (excluding taxicab): Service occupations"
## [25] "Public transportation (excluding taxicab): Sales and office occupations"
## [26] "Public transportation (excluding taxicab): Natural resources, construction, and maintenance occupations"
## [27] "Public transportation (excluding taxicab): Production, transportation, and material moving occupations"
## [28] "Public transportation (excluding taxicab): Military specific occupations"
## [29] "Walked:"
## [30] "Walked: Management, business, science, and arts occupations"
## [31] "Walked: Service occupations"
## [32] "Walked: Sales and office occupations"
## [33] "Walked: Natural resources, construction, and maintenance occupations"
## [34] "Walked: Production, transportation, and material moving occupations"
## [35] "Walked: Military specific occupations"
## [36] "Taxicab, motorcycle, bicycle, or other means:"
## [37] "Taxicab, motorcycle, bicycle, or other means: Management, business, science, and arts occupations"
## [38] "Taxicab, motorcycle, bicycle, or other means: Service occupations"
## [39] "Taxicab, motorcycle, bicycle, or other means: Sales and office occupations"
## [40] "Taxicab, motorcycle, bicycle, or other means: Natural resources, construction, and maintenance occupations"
## [41] "Taxicab, motorcycle, bicycle, or other means: Production, transportation, and material moving occupations"
## [42] "Taxicab, motorcycle, bicycle, or other means: Military specific occupations"
## [43] "Worked at home:"
## [44] "Worked at home: Management, business, science, and arts occupations"
## [45] "Worked at home: Service occupations"
## [46] "Worked at home: Sales and office occupations"
## [47] "Worked at home: Natural resources, construction, and maintenance occupations"
## [48] "Worked at home: Production, transportation, and material moving occupations"
## [49] "Worked at home: Military specific occupations"
We want column #1 which gives the absolute total and also column #22 which gives the total commute by Public transportation. Later we will calcualte the percent.
myvars <- mytable[1]+mytable[22]
5 year estimates have the lowest margin of error, so we will specify that by putting the number 5 into the variable myspan for use later.
myspan <- 5
We will want two data sets, one ending in 2010 and one ending in 2016.
myendyear1 <- 2010
myendyear2 <- 2015
Now we set the geography that we want. For this, we will use the geo.make function. We want to pull all the census tracts in the county (unfortunately we can’t pull just for a city but we can filter after the fact).
We’ve already found the state cose for Massachusetts (25). Ane earlier on we definaed a variable countylist containing the two counties of interest (17 and 25). Small glitch: the geo.make function wants that list as numbers and not text versions of the numbers.
We use the “*" for tracts to denote ‘select all’. We will store this in the variable mygeo so we can use it later in our function that will fetch the data.
countylist2 <- as.numeric(countylist)
mygeo <- geo.make(state=25, county=countylist2, tract="*")
Now we have all the parts to put together the fetch function. The line below will always be the same. What changes is the definition of the variables above.
The variables mydata1 and mydata2 are complex objects containing lots of information about the census table and the data for each census tract. the first contains the 5-year data ending 2010, and the second 5-year data ending 2016.
mydata1 <- acs.fetch(endyear=myendyear1, span=myspan, geography=mygeo, variable=myvars)
mydata2 <- acs.fetch(endyear=myendyear2, span=myspan, geography=mygeo, variable=myvars)
As with data when we download it directly from ACS, it comes a bit messy and needs to be cleaned up. It’s a lot easier in R than in excel.
First, we need to generate a standardized GEOID for the ACS data so it can be matched to the shapefile which already has a standardized GEOID. To do that, we need to paste together the state code, county code, and tract code. The paste0 function pastes things together (note: paste puts spaces between the things it pastes, while paste0 does not.) Everything you want pasted together goes inside the paste0 function separated by commas (e.g., paste0("27", "A", "0", "99") will return "27A099".)
The function as.character ensures that the state, county, and tract numbers are first turned into characters before concatenating them together
Remember when we mentioned that ACS does not have an extra zero before the county number? We need to put in that extra zero so that the GEOID code we generate here matches the GEOID codes in the shapefile.
acsgeoid1 <- paste0(as.character(mydata1@geography$state),'0', as.character(mydata1@geography$county), as.character(mydata1@geography$tract))
acsgeoid2 <- paste0(as.character(mydata2@geography$state),'0', as.character(mydata2@geography$county), as.character(mydata2@geography$tract))
Now you are ready to put the fetched data into a usable format for R. The best format is a dataframe object which is in essence a spreadsheet with columns (variables) and rows (data for each census tract). You only need two columns: the GEOID and the data itself (in this case the median income variable which is found in the estimate component of the fetched data, identified by mydata$estimate). Note if you chose more than one variable in the myvars step above, then each of those variables would be a new column in the dataframe.
Once we put the columns into a dataframe, you can rename the column names to make them more user-friendly. In this case, there is only one data column median income.
You can use the head function to show the first 6 rows in a dataframe.
mydatadf1 <- data.frame(acsgeoid1, mydata1@estimate)
colnames(mydatadf1)=c("GEOID", "total", "transit")
head(mydatadf1)
## GEOID total
## Census Tract 3001, Middlesex County, Massachusetts 25017300100 1603
## Census Tract 3011.01, Middlesex County, Massachusetts 25017301101 1831
## Census Tract 3011.02, Middlesex County, Massachusetts 25017301102 2668
## Census Tract 3101, Middlesex County, Massachusetts 25017310100 1868
## Census Tract 3102, Middlesex County, Massachusetts 25017310200 3320
## Census Tract 3103, Middlesex County, Massachusetts 25017310300 2903
## transit
## Census Tract 3001, Middlesex County, Massachusetts 14
## Census Tract 3011.01, Middlesex County, Massachusetts 0
## Census Tract 3011.02, Middlesex County, Massachusetts 23
## Census Tract 3101, Middlesex County, Massachusetts 92
## Census Tract 3102, Middlesex County, Massachusetts 24
## Census Tract 3103, Middlesex County, Massachusetts 27
mydatadf2 <- data.frame(acsgeoid2, mydata2@estimate)
colnames(mydatadf2)=c("GEOID", "total", "transit")
head(mydatadf2)
## GEOID total
## Census Tract 3001, Middlesex County, Massachusetts 25017300100 1674
## Census Tract 3011.01, Middlesex County, Massachusetts 25017301101 2069
## Census Tract 3011.02, Middlesex County, Massachusetts 25017301102 2842
## Census Tract 3101, Middlesex County, Massachusetts 25017310100 2516
## Census Tract 3102, Middlesex County, Massachusetts 25017310200 3427
## Census Tract 3103, Middlesex County, Massachusetts 25017310300 2680
## transit
## Census Tract 3001, Middlesex County, Massachusetts 14
## Census Tract 3011.01, Middlesex County, Massachusetts 0
## Census Tract 3011.02, Middlesex County, Massachusetts 21
## Census Tract 3101, Middlesex County, Massachusetts 256
## Census Tract 3102, Middlesex County, Massachusetts 133
## Census Tract 3103, Middlesex County, Massachusetts 0
The mydatadf dataframe now has a column for the total commuters and a column for the transit commuters. We need to calculate the percent, and make an additional column, which we will call “transitpercent”. We multiply the transit percent by 100 to turn it from a fraction.
mydatadf1$transitpercent <- mydatadf1$transit/mydatadf1$total*100
mydatadf2$transitpercent <- mydatadf2$transit/mydatadf2$total*100
First, we need to merge the two datasets so we can then calculate the delta in percent over time. The merge function does that: merge(dataframe#1, dataframe#2, by = "column name to match entries", incomparables = NA). We can use the head function to take a look at the new dataframe. The transitpercent.x is the percent transit from the first dataframe 2006-2010, and transitpercent.y is from the second 2012-2016. We then make a new column of the difference between the two.
mydatadf3 <- merge(mydatadf1, mydatadf2, by = "GEOID", incomparables = NA)
head(mydatadf3)
## GEOID total.x transit.x transitpercent.x total.y transit.y
## 1 25017300100 1603 14 0.8733624 1674 14
## 2 25017301101 1831 0 0.0000000 2069 0
## 3 25017301102 2668 23 0.8620690 2842 21
## 4 25017310100 1868 92 4.9250535 2516 256
## 5 25017310200 3320 24 0.7228916 3427 133
## 6 25017310300 2903 27 0.9300723 2680 0
## transitpercent.y
## 1 0.8363202
## 2 0.0000000
## 3 0.7389163
## 4 10.1748808
## 5 3.8809454
## 6 0.0000000
mydatadf3$deltapercent <- mydatadf3$transitpercent.y-mydatadf3$transitpercent.x
head(mydatadf3)
## GEOID total.x transit.x transitpercent.x total.y transit.y
## 1 25017300100 1603 14 0.8733624 1674 14
## 2 25017301101 1831 0 0.0000000 2069 0
## 3 25017301102 2668 23 0.8620690 2842 21
## 4 25017310100 1868 92 4.9250535 2516 256
## 5 25017310200 3320 24 0.7228916 3427 133
## 6 25017310300 2903 27 0.9300723 2680 0
## transitpercent.y deltapercent
## 1 0.8363202 -0.03704225
## 2 0.0000000 0.00000000
## 3 0.7389163 -0.12315271
## 4 10.1748808 5.24982723
## 5 3.8809454 3.15805387
## 6 0.0000000 -0.93007234
First, you need to join the ACS data with the shapefile. The geo_join function (from the ACS package) takes the shapefile, the fetched data (mydatadf1 and mydatadf2) and the column names of each to match on (in this case we named them the same thing to make it more intuitive). Note that you can use the same shape
mydatamerged1 <- geo_join(shapefile, mydatadf1, "GEOID", "GEOID")
mydatamerged2 <- geo_join(shapefile, mydatadf2, "GEOID", "GEOID")
mydatamerged3 <- geo_join(shapefile, mydatadf3, "GEOID", "GEOID")
We need to:
Dataframe shortcut To make a shortcut here, we’re defining a new variable df and setting it to whatever you called your data, so that we don’t have to change every column reference.
df1 <- mydatamerged1
df2 <- mydatamerged2
df3 <- mydatamerged3
Popup This defines how a popup window will appear when you click on a census tract on the map. Here, we will have the popup display the GEOID and the percent commute by transit. The <br> adds an extra carriage return to put the label on the second line of the popup. The round function will round the percent to the tenths place.
mypopup1 <- paste0("GEOID: ", df1$GEOID, "<br>", "2006-2010 Commute by Public transit: ", round(df1$transitpercent,1), "%")
mypopup2 <- paste0("GEOID: ", df2$GEOID, "<br>", "2012-2016 Commute by Public transit: ", round(df2$transitpercent,1), "%")
mypopup3 <- paste0("GEOID: ", df3$GEOID, "<br>", "Change from 2006-2010 to 2012-2016 Commute by Public transit: ", round(df3$deltapercent,1), "%")
Pallete This sets the variation/range of colors. This is similar to how in ArcGIS you would symbolize by an attribute. Here we specify the pallete YlGnBu (note that the second character is a lower-case L). To explore different palettes, you can go here: http://colorbrewer2.org. To find the one we are working with, in Multi-hue, bottom row, click on the fourth one from the left).
Note that we only use ONE palette for df1 and df2, becuase we want the palette range to be exactly the same between the two maps. I just chose df1.
The palatte for df3, though, needs to be different because the change in percent will have a different range. Also, because the delta is on a scale from - to +, I changed the palette to a range from red to green. Notice that I’m not using domain=df3$deltapercent. The reason is that when I used that, there were some outliers that caused problems with the map visualization which chose -50 to +100. I played around with the domain and settled on -25 to +25. More information on the colorNumeric function here: https://www.rdocumentation.org/packages/leaflet/versions/1.1.0/topics/colorNumeric
mypal <- colorNumeric(
palette = "YlGnBu",
domain = df1$transitpercent
)
mypal3 <- colorNumeric(
palette = "RdYlGn",
domain = c(-25,+25)
)
** LAT/LNG Centerpoint** You can make your map have a center point for the area of interest. For Kendall Square, you can determine the LAT and LNG: go to google maps in a web browser, type “Kendall Square, Cambridge”, right click on the red pin, select “What’s Here?” and the LAT and LNG will display. Then define those points below.
myLAT <- 42.363951
myLNG <- -71.090011
mycentername <- "Kendall Square"
Now we load and create the map and all its parts. For the legend, note that we added a % suffix. A zoom of 12 is a good starting point, but you can tweak that value.
mymap1<-leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
setView(myLNG, myLAT, zoom = 12) %>%
addMarkers(lat=myLAT, lng=myLNG, popup=mycentername) %>%
addPolygons(data = df1,
fillColor = ~mypal(transitpercent),
color = "#b2aeae", # you need to use hex colors
fillOpacity = 0.7,
weight = 1,
smoothFactor = 0.2,
popup = mypopup1) %>%
addLegend(pal = mypal,
values = df1$transitpercent,
position = "bottomright",
title = "2006-2010 Commute by Public Transit",
labFormat = labelFormat(suffix = "%"),
bins=5)
mymap2<-leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
setView(myLNG, myLAT, zoom = 12) %>%
addMarkers(lat=myLAT, lng=myLNG, popup=mycentername) %>%
addPolygons(data = df2,
fillColor = ~mypal(transitpercent),
color = "#b2aeae", # you need to use hex colors
fillOpacity = 0.7,
weight = 1,
smoothFactor = 0.2,
popup = mypopup2) %>%
addLegend(pal = mypal,
values = df2$transitpercent,
position = "bottomright",
title = "2012-2016 Commute by Public Transit",
labFormat = labelFormat(suffix = "%"),
bins=5)
Put these two maps side by side? The sync function comes from the mapview library.
sync(mymap1, mymap2)
Map
mymap3<-leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
setView(myLNG, myLAT, zoom = 12) %>%
addMarkers(lat=myLAT, lng=myLNG, popup=mycentername) %>%
addPolygons(data = df3,
fillColor = ~mypal3(deltapercent),
color = "#b2aeae", # you need to use hex colors
fillOpacity = 0.7,
weight = 1,
smoothFactor = 0.2,
popup = mypopup3) %>%
addLegend(pal = mypal3,
values = c(-15,+25), #df3$deltapercent,
position = "bottomright",
title = "change from 2006-2010 to 2012-2016 Commute by Public Transit",
labFormat = labelFormat(suffix = "%"),
bins=5)
## Warning in mypal3(deltapercent): Some values were outside the color scale
## and will be treated as NA
mymap3