Mapping ACS data

This script will extract, graph and map county-level broadband access estimates from the U.S. Census Bureau’s American Community Survey for counties in the Nashville, Tennessee, area. It can be easily modified to do the same with other ACS estimates for other areas and geographic levels.

Below the script, you’ll find a Script output and notes section explaining what each code block in the script does. To use the script, you’ll need a Census API key, which you can get for free by visiting:

https://api.census.gov/data/key_signup.html

Farther below, you’ll find some modified versions of the code that will show you how flexible the script is:

Fun mod #1: Broadband in New York
Fun mod #2: Median household income in New York
Fun mod #3: Custom map colors and pop-up window
Fun mod #4: Switching to the county subdivision level

The example script

Here is the example script. It will fetch, graph and map county-level American Community Survey estimates of broadband access in the Nashville area.

# Installing and loading required packages

if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidyverse")
if (!require("sf")) install.packages("tidyverse")
if (!require("mapview")) install.packages("tidyverse")

library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)

# Transmitting API key

census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")

# Fetching ACS codebooks

DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)

# Double checking target variables

ChosenVars <- filter(ProfileTables,name == "DP02_0154P"|
                                     name == "DP02_0001")
print(ChosenVars$name)
print(ChosenVars$label)
print(ChosenVars$concept)

# Specifying target variables

VariableList = 
  c(Broadband_ = "DP02_0154P",
    Households_ = "DP02_0001")

# Fetching data

mydata <- get_acs(
  geography = "county",
  state = "TN",
  variables = VariableList,
  year = 2022,
  survey = "acs5",
  output = "wide",
  geometry = TRUE)

# Reformatting data

mydata <-
  separate_wider_delim(mydata,
                       NAME,
                       delim = " County, ",
                       names = c("County", "State"))

# Filtering data

filtereddata <- mydata %>% 
  filter(County == "Davidson"|
           County == "Rutherford"|
           County == "Williamson"|
           County == "Cheatham"|
           County == "Robertson"|
           County == "Sumner"|
           County == "Wilson")

# Plotting data

ggplot(filtereddata, aes(x = Broadband_E, y = reorder(County, Broadband_E))) + 
  geom_errorbarh(aes(xmin = Broadband_E - Broadband_M, xmax = Broadband_E + Broadband_M)) + 
  geom_point(size = 3, color = "darkblue") + 
  theme_minimal(base_size = 12.5) + 
  labs(title = "Pct. households with broadband", 
       subtitle = "Nashville-area counties. Brackets show error margins.", 
       x = "2018-2022 ACS estimate", 
       y = "")

# Mapping data

mapdata <- filtereddata %>% 
  rename(Broadband = Broadband_E,
         Households = Households_E)

mapdata <- st_as_sf(mapdata)

mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Broadband",
        layer.name = "Pct. with broadband",
        popup = TRUE)

# Exporting data in .csv format

CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)

Script output and notes

As usual, the script begins by activating the required packages after checking to see whether any need to be installed. The tidyverse package adds a range of data wrangling features, while tidycensus provides a set of tools for accessing data and maps from the U.S. Census Bureau’s server. Finally, the sf and mapview packages give R the ability to make online, interactive maps.

if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidyverse")
if (!require("sf")) install.packages("tidyverse")
if (!require("mapview")) install.packages("tidyverse")

library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)

Fetching some Census data

The Census Bureau’s annual American Community Survey datasets estimate thousands of demographic, cultural and economic variables. The tidycensus package makes it easy to extract them at the national, state or local level via the Census Bureau’s application programming interface, or API. Before you can use the API, you must complete and submit the Census Bureau’s API request form:

https://api.census.gov/data/key_signup.html

… and activate the key by clicking the activation link that will be included in your key’s delivery e-mail. Usually, the delivery e-mail arrives within a few minutes of your submitting the form. There is no charge for a key. You should not share your key with anyone else.

Using your Census API key

You must use your key to unlock the Census API at the start of every computing session. Running the tidycensus package’s census_api_key() function will do the job, but only if you first copy and paste your API key between the quote marks in this code, in place of the PasteYourAPIKeyBetweenTheseQuoteMarks text.

census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")

Deciding which variable(s) to extract

Figuring out exactly which variable or variables you want to extract from the ACS can get complicated. But one manageable way to go about it is to use this code to import the ACS’s three codebooks into R as data frames, then poke around in each data frame with the filter feature in R’s data frame viewer.

Possible customizations of the code include:

Changing 2022 to whatever year you prefer
Changing acs5 to acs1 if you want the codebook for a single-year ACS dataset. Note that variable names sometimes change from year to year. Also, single-year ACS data are available only for geographic areas with 60,000 or more residents.

DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)

An example: County-level broadband access.

Running this code after plugging the variable names you choose and the codebook the variables came from is a way of verifying that you got the variable names right and double checking your understanding of each variable’s definition. Strictly speaking, though, this code is optional.

This example focuses on the DP02_0154P and DP02_0001 variables from the ProfileTables codebook.

ChosenVars <- filter(ProfileTables,name == "DP02_0154P"|
                                     name == "DP02_0001")
print(ChosenVars$name)

## [1] "DP02_0001"  "DP02_0154P"

print(ChosenVars$label)

## [1] "Estimate!!HOUSEHOLDS BY TYPE!!Total households"                                               
## [2] "Percent!!COMPUTERS AND INTERNET USE!!Total households!!With a broadband Internet subscription"

print(ChosenVars$concept)

## [1] "Selected Social Characteristics in the United States"
## [2] "Selected Social Characteristics in the United States"

As you can see from the output:

The DP02_0154P variable estimates the percentage of households that have broadband internet access.
The DP02_0001 variable estimates the total number of households.

Fetching the data from the Census API

With the variable names confirmed, it’s time to extract them from the Census server via the Census API, using the code below. The code includes several parts that you can customize:

DP02_0154P and DP02_0001 are the names of the variables extractedf or this example. You can change them, and even use fewer or more, to extract other variables.
In the extracted data frame, Broadband_ and Households_ will become variable names for the DP02_0154P and DP02_0001 variables, respectively. If you choose different variables to extract, be sure you use variable names that match the meanings of the variables you choose.
The geography = "county" and state = TN arguments tell R to extract county-level estimates for the specified variables, and for all counties in Tennessee. You can specify other geographic levels and groupings, though. For directions, see the “Geography in tidycensus” section at: https://walker-data.com/tidycensus/articles/basic-usage.html.
Change 2022 to some other year if you want data for a year other than 2022.
Change acs5 to acs1 if you want single-year data rather than five-year data.
The geometry = TRUE argument tells R to retrieve mapping information for the specified geographic area (county, in this example). If for some reason you want only the data and not the map info, change TRUE to FALSE.

VariableList = 
  c(Broadband_ = "DP02_0154P",
    Households_ = "DP02_0001")

mydata <- get_acs(
  geography = "county",
  state = "TN",
  variables = VariableList,
  year = 2022,
  survey = "acs5",
  output = "wide",
  geometry = TRUE)

## Getting data from the 2018-2022 5-year ACS

## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

## Using the ACS Data Profile

## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |======================================================================| 100%

Formatting and filtering the data

The above code will extract the data your asked for and store it in a data frame called mydata. Looking at the data frame, you will see two variables for each one variable you extracted, one ending in _E, and one ending in _M. The _E variable contains the estimate, while the _M variable contains the margin of error, meaning the range below and above in which the actual population figure likely falls.

To illustrate what that means, the estimated broadband access percentage for Rutherford County in the example, as reported in Broadband_E, is 93.1 percent. The estimate’s margin of error in Rutherford County, as reported in Broadband_M, is 0.6. Accordingly, Rutherford County’s actual percentage of households with broadband access could be anywhere from 93.1 - 0.6 = 92.5 percent to 93.1 + 0.6 = 94.2 percent.

If (as in this example) you extracted county-level data, the name of each county will end up in a variable called name and will have a format like Rutherford County, Tennessee. It’s not especially handy to have the county name combined with the phrase, County, Tennessee, though. This code takes the name variable apart, discards the name variable and the County, portion (including the blank space at the start and end), and saves the Rutherford and Tennessee parts in new variables called County and State, respectively.

mydata <-
  separate_wider_delim(mydata,
                       NAME,
                       delim = " County, ",
                       names = c("County", "State"))

You can skip this code if you want to keep data for all of the geographic areas you extracted. But if you want data for only some areas, this code will let you specify which ones. This example keeps data for only Davidson County and the counties with which it shares a border.

Note that the filtered data get stored in a new data frame called filtereddata, leaving the original mydata data frame as is.

filtereddata <- mydata %>% 
  filter(County == "Davidson"|
           County == "Rutherford"|
           County == "Williamson"|
           County == "Cheatham"|
           County == "Robertson"|
           County == "Sumner"|
           County == "Wilson")

Graphing the data

How about a nice chart showing each county’s Broadband_E estimate and margin of error? This code will do the job. Customizable parts include:

Broadband_E and Broadband_M, which you can change to the _E and _M names of whatever variable you want the chart to show.
County, which is involved only if you are dealing with county-level data, and the County variable appears in the data frame.
darkblue, which you can change to some other color, if you like.
Pct. households with broadband, which you can change to a title suitable for whatever variable you are graphing.
Nashville-area counties. Brackets show error margins, which you can change to a more suitable subtitle, if needed.
2018-2022 ACS estimate, which you can change to a more suitable X axis label, if needed.

ggplot(filtereddata, aes(x = Broadband_E, y = reorder(County, Broadband_E))) + 
  geom_errorbarh(aes(xmin = Broadband_E - Broadband_M, xmax = Broadband_E + Broadband_M)) + 
  geom_point(size = 3, color = "darkblue") + 
  theme_minimal(base_size = 12.5) + 
  labs(title = "Pct. households with broadband", 
       subtitle = "Nashville-area counties. Brackets show error margins.", 
       x = "2018-2022 ACS estimate", 
       y = "")

Mapping the data

Meanwhile, this code will make a map of the variable you choose. This example maps the Broadband_E variable, after renaming it to just Broadband. Note the mapdata <- st_as_sf(mapdata) function, which restores mapdata’s ability to be displayed as a map. The file lost that ability by default during the earlier reformatting and filtering steps.

Customizable parts of this code include:

Broadband = Broadband_E and Households = Households_E, which rename the Broadband_E and Households_E variables as simply Broadband and Households. Both will look better in the map’s pop-up window as a result. If you are using different variables, you will need to substitute your variable name(s).
The zcol = "Broadband" argument tells R to shade the map’s areas according to the Broadband variable’s values. Change Broadband to whatever data frame variable you want to shade the map’s regions by.
Blues shades the map in hues of blue. For options, see: https://r-graph-gallery.com/38-rcolorbrewers-palettes.html.
Changing popup = TRUE to popup = FALSE will keep the detail window from opening when a user clicks a map region. This can be handy if you don’t like the look of the default detail window. With more advanced code, it is possible to customize the detail window.

mapdata <- filtereddata %>% 
  rename(Broadband = Broadband_E,
         Households = Households_E)

mapdata <- st_as_sf(mapdata)

mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Broadband",
        layer.name = "Pct. with broadband",
        popup = TRUE)

Exporting the data in .csv format

In case you would like to work with your map’s data in Google Sheets or some other data application, this code exports the mapdata data frame as a comma-separated value file called mydata.csv and stores it on your computer, in the same directory as this script. The exported file will not include the “geometry” column, which can be difficult to work with in a spreadsheet environment.

CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)

Fun mod #1: Broadband in New York

Try this mod to show broadband access in the counties around New York City, rather than Nashville.

# Fetching data
# Note the state = "NY" mod ##############################

mydata <- get_acs(
  geography = "county",
  state = "NY",
  variables = VariableList,
  year = 2022,
  survey = "acs5",
  output = "wide",
  geometry = TRUE)

# Reformatting data

mydata <-
  separate_wider_delim(mydata,
                       NAME,
                       delim = " County, ",
                       names = c("County", "State"))

# Filtering the data
# Note the list of New York counties ##############################

filtereddata <- mydata %>% 
  filter(County == "Suffolk"|
           County == "Nassau"|
           County == "Queens"|
           County == "Kings"|
           County == "Richmond"|
           County == "New York"|
           County == "Bronx"|
           County == "Westchester"|
           County == "Rockland")

# Plotting data
# Note the subtitle wording change ##############################

ggplot(filtereddata, aes(x = Broadband_E, y = reorder(County, Broadband_E))) + 
  geom_errorbarh(aes(xmin = Broadband_E - Broadband_M, xmax = Broadband_E + Broadband_M)) + 
  geom_point(size = 3, color = "darkblue") + 
  theme_minimal(base_size = 12.5) + 
  labs(title = "Pct. households with broadband", 
       subtitle = "New York-area counties. Brackets show error margins.", 
       x = "2018-2022 ACS estimate", 
       y = "")

# Mapping data

mapdata <- filtereddata %>% 
  rename(Broadband = Broadband_E,
         Households = Households_E)

mapdata <- st_as_sf(mapdata)

mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Broadband",
        layer.name = "Pct. with broadband",
        popup = TRUE)

Fun mod #2: Median household income in New York

Try this mod to show median household income in the counties of New York City.

# Double checking target variables
# Note the DP03_0062 variable name  ##############################

ChosenVars <- filter(ProfileTables,name == "DP03_0062"|
                       name == "DP02_0001")
print(ChosenVars$name)
print(ChosenVars$label)
print(ChosenVars$concept)

# Specifying target variables

VariableList = 
  c(MedianIncome_ = "DP03_0062",
    Households_ = "DP02_0001")

# Fetching data
# Note the state = "NY" mod

mydata <- get_acs(
  geography = "county",
  state = "NY",
  variables = VariableList,
  year = 2022,
  survey = "acs5",
  output = "wide",
  geometry = TRUE)

# Reformatting data

mydata <-
  separate_wider_delim(mydata,
                       NAME,
                       delim = " County, ",
                       names = c("County", "State"))

# Filtering data
# Note the list of New York counties ##############################

filtereddata <- mydata %>% 
  filter(County == "Suffolk"|
           County == "Nassau"|
           County == "Queens"|
           County == "Kings"|
           County == "Richmond"|
           County == "New York"|
           County == "Bronx"|
           County == "Westchester"|
           County == "Rockland")

# Plotting data
# Note the subtitle change ##############################

ggplot(filtereddata, aes(x = MedianIncome_E, y = reorder(County, MedianIncome_E))) + 
  geom_errorbarh(aes(xmin = MedianIncome_E - MedianIncome_M, xmax = MedianIncome_E + MedianIncome_M)) + 
  geom_point(size = 3, color = "darkblue") + 
  theme_minimal(base_size = 12.5) + 
  labs(title = "Median HH Income", 
       subtitle = "New York-area counties. Brackets show error margins.", 
       x = "2018-2022 ACS estimate", 
       y = "")

# Mapping data

mapdata <- filtereddata %>% 
  rename(Income = MedianIncome_E,
         Households = Households_E)

mapdata <- st_as_sf(mapdata)

mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Income",
        layer.name = "Median income",
        popup = TRUE)

Fun mod #3: Custom map colors and pop-up window

Finally, some extra packages and code can help you customize the map colors and streamline the pop-up window.

The mypalette = colorRampPalette(c('lightgray', 'darkgreen')) code defines a color palette ranging from light gray to dark green. You can pick other colors, if you like.
The col.regions = mypalette, at = seq(40000, 145000, 15000) applies the color palette to a scale ranging from 40,000 to 145,000, in steps of 15,000. The scale range is designed to match the scale of the variable being represented: Income.
The zcol = c("County", "State", "Income", "Households") code lists the mapdata variables you want to show in the detail window. Variables not listed will not show.

Try it out, using this code:

# Installing and activating some additional packages

if (!require("leaflet")) install.packages("leaflet")
if (!require("leaflet.extras2")) install.packages("leaflet.extras2")

library(leaflet)
library(leaflet.extras2)
library(leafpop)

# Defining a custome color palette for the map

mypalette = colorRampPalette(c('lightgray', 'darkgreen'))

# Generating the map

mapview(mapdata, zcol = "Income",
        col.regions = mypalette, at = seq(40000, 145000, 15000),
        layer.name = "Median HH income",
        popup = popupTable(mapdata,
                           feature.id = FALSE,
                           row.numbers = FALSE,
                           zcol = c("County",
                                    "State",
                                    "Income",
                                    "Households")))

Fun mod #4: Switching to the county subdivision level

With a few more tweaks of the code, we can switch from the county level to the county subdivision level.

# Double checking target variables

ChosenVars <- filter(ProfileTables,name == "DP03_0062"|
                       name == "DP02_0001")
print(ChosenVars$name)
print(ChosenVars$label)
print(ChosenVars$concept)

# Specifying target variables

VariableList = 
  c(MedianIncome_ = "DP03_0062",
    Households_ = "DP02_0001")

# Fetching data
# Note the geography = "county subdivision" mod ####################

mydata <- get_acs(
  geography = "county subdivision",
  state = "NY",
  variables = VariableList,
  year = 2022,
  survey = "acs5",
  output = "wide",
  geometry = TRUE)

# Reformatting data
# Note the ###########################################
# change in the delimiter to a comma and a space
# and the addition of "Division"  

mydata <-
  separate_wider_delim(mydata,
                       NAME,
                       delim = ", ",
                       names = c("Division", "County", "State"))

# Filtering data

filtereddata <- mydata %>% 
  filter(County == "Suffolk County"|
           County == "Nassau County"|
           County == "Queens County"|
           County == "Kings County"|
           County == "Richmond County"|
           County == "New York County"|
           County == "Bronx County"|
           County == "Westchester County"|
           County == "Rockland County")

# Mapping data

# Defining a custome color palette for the map

mypalette = colorRampPalette(c('lightgray', 'darkgreen'))

# Generating the map
# Note the palette range revision ##############################


mapview(mapdata, zcol = "Income",
        col.regions = mypalette, at = seq(40000, 275000, 15000),
        layer.name = "Median HH income",
        popup = popupTable(mapdata,
                           feature.id = FALSE,
                           row.numbers = FALSE,
                           zcol = c("County",
                                    "Division",
                                    "State",
                                    "Income",
                                    "Households")))