In this Tutorial you will create interactive maps centered on Kendall Square, Cambridge using 2011-2015 5-year ACS data.
There are three part to creating the webmap:
You should make sure that you have the following packages installed: tigris, acs, leaflet, and mapview.
First load the packages using the library function.
library(acs)
library(tigris)
library(leaflet)
library(mapview)
You can use Jeff’s API key which is available in Stellar and on Dropbox. Or you can get your own (https://api.census.gov/data/key_signup.htm). If you publish your results on RPub, be sure to set {r, echo=FALSE} so people can’t see your key (like I did below, you can’t see my command installing the key.)
api.key.install(key="INSERT YOUR API KEY HERE")
countylist <- c('17','25')
shapefile <- tracts(state='25', county=countylist)
We can visualize the new shapefile we just fetched:
plot(shapefile)
Now that we have the shapefile for the census tracts, we need some data to associate with these tracts.
Much like the advanced search in Fact Finder (http://factfinder.census.gov), we need to deterine a handful of things before we can fetch the data.
The best way to determine which table(s) you are interested in is to use censusreporter: https://censusreporter.org.
We want to map the percent of workers who commute by transit. At censusreporter.org, scroll down, and click on Commute. Scroll down until you find B08124: Means of Transportation to Work by Occupation and click on it. There are many “columns” in this data. Enter cambridge,ma into the place box to show some sample data.
We want the percent of workers who commute by transit. Scroll down. For Cambridge, 31.4% comute by Public transportation.
Toward the right, click on “Switch to Totals.” These are the values that are imported, not the percentages. So we need to calculate the percentage manually.
Looking at the censusreporter, it’s a little frustrating but the absolute total for Cambridge, what you would expect to be the absolute first line, is missing. You can find that number by clicking on “Download Data” and selecting .csv. Doing this you would find the total for Cambridge to be 62,260. Therefore, if we take the Public transportation number 19,578 and divide it by 62,260 we get the 31.4% number we found above.
So we fetch the table of interest and put the results into the variable called mytable.
mytable <- acs.lookup(endyear=2015, table.number="B08124")
Census tables usually have many “columns” of data. We can make a list of all the columns.
results(mytable)$variable.name
## [1] "Total:"
## [2] "Management, business, science, and arts occupations"
## [3] "Service occupations"
## [4] "Sales and office occupations"
## [5] "Natural resources, construction, and maintenance occupations"
## [6] "Production, transportation, and material moving occupations"
## [7] "Military specific occupations"
## [8] "Car, truck, or van - drove alone:"
## [9] "Car, truck, or van - drove alone: Management, business, science, and arts occupations"
## [10] "Car, truck, or van - drove alone: Service occupations"
## [11] "Car, truck, or van - drove alone: Sales and office occupations"
## [12] "Car, truck, or van - drove alone: Natural resources, construction, and maintenance occupations"
## [13] "Car, truck, or van - drove alone: Production, transportation, and material moving occupations"
## [14] "Car, truck, or van - drove alone: Military specific occupations"
## [15] "Car, truck, or van - carpooled:"
## [16] "Car, truck, or van - carpooled: Management, business, science, and arts occupations"
## [17] "Car, truck, or van - carpooled: Service occupations"
## [18] "Car, truck, or van - carpooled: Sales and office occupations"
## [19] "Car, truck, or van - carpooled: Natural resources, construction, and maintenance occupations"
## [20] "Car, truck, or van - carpooled: Production, transportation, and material moving occupations"
## [21] "Car, truck, or van - carpooled: Military specific occupations"
## [22] "Public transportation (excluding taxicab):"
## [23] "Public transportation (excluding taxicab): Management, business, science, and arts occupations"
## [24] "Public transportation (excluding taxicab): Service occupations"
## [25] "Public transportation (excluding taxicab): Sales and office occupations"
## [26] "Public transportation (excluding taxicab): Natural resources, construction, and maintenance occupations"
## [27] "Public transportation (excluding taxicab): Production, transportation, and material moving occupations"
## [28] "Public transportation (excluding taxicab): Military specific occupations"
## [29] "Walked:"
## [30] "Walked: Management, business, science, and arts occupations"
## [31] "Walked: Service occupations"
## [32] "Walked: Sales and office occupations"
## [33] "Walked: Natural resources, construction, and maintenance occupations"
## [34] "Walked: Production, transportation, and material moving occupations"
## [35] "Walked: Military specific occupations"
## [36] "Taxicab, motorcycle, bicycle, or other means:"
## [37] "Taxicab, motorcycle, bicycle, or other means: Management, business, science, and arts occupations"
## [38] "Taxicab, motorcycle, bicycle, or other means: Service occupations"
## [39] "Taxicab, motorcycle, bicycle, or other means: Sales and office occupations"
## [40] "Taxicab, motorcycle, bicycle, or other means: Natural resources, construction, and maintenance occupations"
## [41] "Taxicab, motorcycle, bicycle, or other means: Production, transportation, and material moving occupations"
## [42] "Taxicab, motorcycle, bicycle, or other means: Military specific occupations"
## [43] "Worked at home:"
## [44] "Worked at home: Management, business, science, and arts occupations"
## [45] "Worked at home: Service occupations"
## [46] "Worked at home: Sales and office occupations"
## [47] "Worked at home: Natural resources, construction, and maintenance occupations"
## [48] "Worked at home: Production, transportation, and material moving occupations"
## [49] "Worked at home: Military specific occupations"
We want column #1 which gives the absolute total and also column #22 which gives the total commute by Public transportation. Later we will calcualte the percent.
myvars <- mytable[1]+mytable[22]
5 year estimates have the lowest margin of error, so we will specify that by putting the number 5 into the variable myspan for use later.
myspan <- 5
We will use the most recent data available from the API, which is 2015, so we put 2015 into the variable myendyear for use later.
myendyear <- 2015
Now we set the geography that we want. For this, we will use the geo.make function. We want to pull all the census tracts in the county (unfortunately we can’t pull just for a city but we can filter after the fact).
We’ve already found the state cose for Massachusetts (25). Ane earlier on we definaed a variable countylist containing the two counties of interest (17 and 25). Small glitch: the geo.make function wants that list as numbers and not text versions of the numbers.
We use the “*" for tracts to denote ‘select all’. We will store this in the variable mygeo so we can use it later in our function that will fetch the data.
countylist2 <- as.numeric(countylist)
mygeo <- geo.make(state=25, county=countylist2, tract="*")
Now we have all the parts to put together the fetch function. The line below will always be the same. What changes is the definition of the variables above.
The variable mydata is a complex object containing lots of information about the census table and the data for each census tract.
mydata <- acs.fetch(endyear=myendyear, span=myspan, geography=mygeo, variable=myvars)
As with data when we download it directly from ACS, it comes a bit messy and needs to be cleaned up. It’s a lot easier in R than in excel.
First, we need to generate a standardized GEOID for the ACS data so it can be matched to the shapefile which already has a standardized GEOID. To do that, we need to paste together the state code, county code, and tract code. The paste0 function pastes things together (note: paste puts spaces between the things it pastes, while paste0 does not.) Everything you want pasted together goes inside the paste0 function separated by commas (e.g., paste0("27", "A", "0", "99") will return "27A099".)
The function as.character ensures that the state, county, and tract numbers are first turned into characters before concatenating them together
Remember when we mentioned that ACS does not have an extra zero before the county number? We need to put in that extra zero so that the GEOID code we generate here matches the GEOID codes in the shapefile.
acsgeoid <- paste0(as.character(mydata@geography$state),'0', as.character(mydata@geography$county), as.character(mydata@geography$tract))
Now you are ready to put the fetched data into a usable format for R. The best format is a dataframe object which is in essence a spreadsheet with columns (variables) and rows (data for each census tract). You only need two columns: the GEOID and the data itself (in this case the median income variable which is found in the estimate component of the fetched data, identified by mydata$estimate). Note if you chose more than one variable in the myvars step above, then each of those variables would be a new column in the dataframe.
Once we put the columns into a dataframe, you can rename the column names to make them more user-friendly. In this case, there is only one data column median income.
You can use the head function to show the first 6 rows in a dataframe.
mydatadf <- data.frame(acsgeoid, mydata@estimate)
colnames(mydatadf)=c("GEOID", "total", "transit")
head(mydatadf)
## GEOID total
## Census Tract 3001, Middlesex County, Massachusetts 25017300100 1674
## Census Tract 3011.01, Middlesex County, Massachusetts 25017301101 2069
## Census Tract 3011.02, Middlesex County, Massachusetts 25017301102 2842
## Census Tract 3101, Middlesex County, Massachusetts 25017310100 2516
## Census Tract 3102, Middlesex County, Massachusetts 25017310200 3427
## Census Tract 3103, Middlesex County, Massachusetts 25017310300 2680
## transit
## Census Tract 3001, Middlesex County, Massachusetts 14
## Census Tract 3011.01, Middlesex County, Massachusetts 0
## Census Tract 3011.02, Middlesex County, Massachusetts 21
## Census Tract 3101, Middlesex County, Massachusetts 256
## Census Tract 3102, Middlesex County, Massachusetts 133
## Census Tract 3103, Middlesex County, Massachusetts 0
The mydatadf dataframe now has a column for the total commuters and a column for the transit commuters. We need to calculate the percent, and make an additional column, which we will call “transitpercent”. We multiply the transit percent by 100 to turn it from a fraction.
mydatadf$transitpercent <- mydatadf$transit/mydatadf$total*100
First, you need to join the ACS data with the shapefile. The geo_join function (from the ACS package) takes the shapefile, the fetched data (mydatadf) and the column names of each to match on (in this case we named them the same thing to make it more intuitive).
mydatamerged <- geo_join(shapefile, mydatadf, "GEOID", "GEOID")
We need to:
Dataframe shortcut To make a shortcut here, we’re defining a new variable df and setting it to whatever you called your data, so that we don’t have to change every column reference.
df <- mydatamerged
Popup This defines how a popup window will appear when you click on a census tract on the map. Here, we will have the popup display the GEOID and the percent commute by transit. The <br> adds an extra carriage return to put the label on the second line of the popup. The round function will round the percent to the tenths place.
mypopup <- paste0("GEOID: ", df$GEOID, "<br>", "Commute by Public transit: ", round(df$transitpercent,1), "%")
Pallete This sets the variation/range of colors. This is similar to how in ArcGIS you would symbolize by an attribute. Here we specify the pallete YlGnBu (note that the second character is a lower-case L). To explore different palettes, you can go here: http://colorbrewer2.org. To find the one we are working with, in Multi-hue, bottom row, click on the fourth one from the left).
mypal <- colorNumeric(
palette = "YlGnBu",
domain = df$transitpercent
)
** LAT/LNG Centerpoint** You can make your map have a center point for the area of interest. For Kendall Square, you can determine the LAT and LNG: go to google maps in a web browser, type “Kendall Square, Cambridge”, right click on the red pin, select “What’s Here?” and the LAT and LNG will display. Then define those points below.
myLAT <- 42.363951
myLNG <- -71.090011
mycentername <- "Kendall Square"
Now we load and create the map and all its parts. For the legend, note that we added a % suffix. A zoom of 12 is a good starting point, but you can tweak that value.
mymap<-leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
setView(myLNG, myLAT, zoom = 12) %>%
addMarkers(lat=myLAT, lng=myLNG, popup=mycentername) %>%
addPolygons(data = df,
fillColor = ~mypal(transitpercent),
color = "#b2aeae", # you need to use hex colors
fillOpacity = 0.7,
weight = 1,
smoothFactor = 0.2,
popup = mypopup) %>%
addLegend(pal = mypal,
values = df$transitpercent,
position = "bottomright",
title = "Commute by Public Transit",
labFormat = labelFormat(suffix = "%"),
bins=5)
mymap
Looking back at the censusreporter.org for Cambridge, there are 5,654 service workers, and of them, 1,719 commute by transit. This is 30.4%.
Looking back at our column numbers, we want column #3 which gives the total number of service workers and also column #24 which gives the total service workers who commute by Public transportation. Later we will calculate the percent.
myvars <- mytable[3]+mytable[24]
myspan <- 5
myendyear <- 2015
countylist2 <- as.numeric(countylist)
mygeo <- geo.make(state=25, county=countylist2, tract="*")
mydata <- acs.fetch(endyear=myendyear, span=myspan, geography=mygeo, variable=myvars)
acsgeoid <- paste0(as.character(mydata@geography$state),'0', as.character(mydata@geography$county), as.character(mydata@geography$tract))
mydatadf <- data.frame(acsgeoid, mydata@estimate)
colnames(mydatadf)=c("GEOID", "total", "transit")
head(mydatadf)
## GEOID total
## Census Tract 3001, Middlesex County, Massachusetts 25017300100 228
## Census Tract 3011.01, Middlesex County, Massachusetts 25017301101 328
## Census Tract 3011.02, Middlesex County, Massachusetts 25017301102 474
## Census Tract 3101, Middlesex County, Massachusetts 25017310100 555
## Census Tract 3102, Middlesex County, Massachusetts 25017310200 697
## Census Tract 3103, Middlesex County, Massachusetts 25017310300 698
## transit
## Census Tract 3001, Middlesex County, Massachusetts 0
## Census Tract 3011.01, Middlesex County, Massachusetts 0
## Census Tract 3011.02, Middlesex County, Massachusetts 6
## Census Tract 3101, Middlesex County, Massachusetts 33
## Census Tract 3102, Middlesex County, Massachusetts 17
## Census Tract 3103, Middlesex County, Massachusetts 0
mydatadf$transitpercent <- mydatadf$transit/mydatadf$total*100
mydatamerged <- geo_join(shapefile, mydatadf, "GEOID", "GEOID")
Dataframe shortcut
df <- mydatamerged
Popup
mypopup <- paste0("GEOID: ", df$GEOID, "<br>", "Commute by Public transit by service workers: ", round(df$transitpercent,1), "%")
Pallete
mypal <- colorNumeric(
palette = "YlGnBu",
domain = df$transitpercent
)
** LAT/LNG Centerpoint**
myLAT <- 42.363951
myLNG <- -71.090011
mycentername <- "Kendall Square"
mymap2<-leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
setView(myLNG, myLAT, zoom = 12) %>%
addMarkers(lat=myLAT, lng=myLNG, popup=mycentername) %>%
addPolygons(data = df,
fillColor = ~mypal(transitpercent),
color = "#b2aeae", # you need to use hex colors
fillOpacity = 0.7,
weight = 1,
smoothFactor = 0.2,
popup = mypopup) %>%
addLegend(pal = mypal,
values = df$transitpercent,
position = "bottomright",
title = "Commute by Public Transit<br>by service workers",
labFormat = labelFormat(suffix = "%"),
bins=4)
mymap2
How about putting these two maps side by side? The sync function comes from the mapview library.
sync(mymap, mymap2)