This script will extract, graph and map county-level broadband access estimates from the U.S. Census Bureau’s American Community Survey for counties in the Nashville, Tennessee, area. It can be easily modified to do the same with other ACS estimates for other areas and geographic levels.
Below the script, you’ll find a Script output and notes section explaining what each code block in the script does. To use the script, you’ll need a Census API key, which you can get for free by visiting:
https://api.census.gov/data/key_signup.html
Farther below, you’ll find some modified versions of the code that will show you how flexible the script is:
Here is the example script. It will fetch, graph and map county-level American Community Survey estimates of broadband access in the Nashville area.
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidyverse")
if (!require("sf")) install.packages("tidyverse")
if (!require("mapview")) install.packages("tidyverse")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# Transmitting API key
census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
# Double checking target variables
ChosenVars <- filter(ProfileTables,name == "DP02_0154P"|
name == "DP02_0001")
print(ChosenVars$name)
print(ChosenVars$label)
print(ChosenVars$concept)
# Specifying target variables
VariableList =
c(Broadband_ = "DP02_0154P",
Households_ = "DP02_0001")
# Fetching data
mydata <- get_acs(
geography = "county",
state = "TN",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = " County, ",
names = c("County", "State"))
# Filtering data
filtereddata <- mydata %>%
filter(County == "Davidson"|
County == "Rutherford"|
County == "Williamson"|
County == "Cheatham"|
County == "Robertson"|
County == "Sumner"|
County == "Wilson")
# Plotting data
ggplot(filtereddata, aes(x = Broadband_E, y = reorder(County, Broadband_E))) +
geom_errorbarh(aes(xmin = Broadband_E - Broadband_M, xmax = Broadband_E + Broadband_M)) +
geom_point(size = 3, color = "darkblue") +
theme_minimal(base_size = 12.5) +
labs(title = "Pct. households with broadband",
subtitle = "Nashville-area counties. Brackets show error margins.",
x = "2018-2022 ACS estimate",
y = "")
# Mapping data
mapdata <- filtereddata %>%
rename(Broadband = Broadband_E,
Households = Households_E)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Broadband",
layer.name = "Pct. with broadband",
popup = TRUE)
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
As usual, the script begins by activating the required packages after
checking to see whether any need to be installed. The
tidyverse
package adds a range of data wrangling features,
while tidycensus
provides a set of tools for accessing data
and maps from the U.S. Census Bureau’s server. Finally, the
sf
and mapview
packages give R the ability to
make online, interactive maps.
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidyverse")
if (!require("sf")) install.packages("tidyverse")
if (!require("mapview")) install.packages("tidyverse")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
The Census Bureau’s annual American Community Survey datasets estimate thousands of demographic, cultural and economic variables. The tidycensus package makes it easy to extract them at the national, state or local level via the Census Bureau’s application programming interface, or API. Before you can use the API, you must complete and submit the Census Bureau’s API request form:
https://api.census.gov/data/key_signup.html
… and activate the key by clicking the activation link that will be included in your key’s delivery e-mail. Usually, the delivery e-mail arrives within a few minutes of your submitting the form. There is no charge for a key. You should not share your key with anyone else.
You must use your key to unlock the Census API at the start of every
computing session. Running the tidycensus
package’s
census_api_key() function will do the job, but only if you first copy
and paste your API key between the quote marks in this code, in place of
the PasteYourAPIKeyBetweenTheseQuoteMarks
text.
census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
Figuring out exactly which variable or variables you want to extract from the ACS can get complicated. But one manageable way to go about it is to use this code to import the ACS’s three codebooks into R as data frames, then poke around in each data frame with the filter feature in R’s data frame viewer.
Possible customizations of the code include:
Changing 2022
to whatever year you prefer
Changing acs5
to acs1
if you want the
codebook for a single-year ACS dataset. Note that variable names
sometimes change from year to year. Also, single-year ACS data are
available only for geographic areas with 60,000 or more
residents.
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
Running this code after plugging the variable names you choose and the codebook the variables came from is a way of verifying that you got the variable names right and double checking your understanding of each variable’s definition. Strictly speaking, though, this code is optional.
This example focuses on the DP02_0154P
and
DP02_0001
variables from the ProfileTables
codebook.
ChosenVars <- filter(ProfileTables,name == "DP02_0154P"|
name == "DP02_0001")
print(ChosenVars$name)
## [1] "DP02_0001" "DP02_0154P"
print(ChosenVars$label)
## [1] "Estimate!!HOUSEHOLDS BY TYPE!!Total households"
## [2] "Percent!!COMPUTERS AND INTERNET USE!!Total households!!With a broadband Internet subscription"
print(ChosenVars$concept)
## [1] "Selected Social Characteristics in the United States"
## [2] "Selected Social Characteristics in the United States"
As you can see from the output:
The DP02_0154P
variable estimates the percentage of
households that have broadband internet access.
The DP02_0001
variable estimates the total number of
households.
With the variable names confirmed, it’s time to extract them from the Census server via the Census API, using the code below. The code includes several parts that you can customize:
DP02_0154P
and DP02_0001
are the names
of the variables extractedf or this example. You can change them, and
even use fewer or more, to extract other variables.
In the extracted data frame, Broadband_
and
Households_
will become variable names for the
DP02_0154P
and DP02_0001
variables,
respectively. If you choose different variables to extract, be sure you
use variable names that match the meanings of the variables you
choose.
The geography = "county"
and state = TN
arguments tell R to extract county-level estimates for the specified
variables, and for all counties in Tennessee. You can specify other
geographic levels and groupings, though. For directions, see the
“Geography in tidycensus” section at: https://walker-data.com/tidycensus/articles/basic-usage.html.
Change 2022
to some other year if you want data for
a year other than 2022.
Change acs5
to acs1
if you want
single-year data rather than five-year data.
The geometry = TRUE
argument tells R to retrieve
mapping information for the specified geographic area
(county
, in this example). If for some reason you want only
the data and not the map info, change TRUE
to
FALSE
.
VariableList =
c(Broadband_ = "DP02_0154P",
Households_ = "DP02_0001")
mydata <- get_acs(
geography = "county",
state = "TN",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
## Getting data from the 2018-2022 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Using the ACS Data Profile
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|===== | 6%
|
|===== | 7%
|
|====== | 8%
|
|====== | 9%
|
|======= | 9%
|
|======= | 10%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 18%
|
|============== | 21%
|
|=============== | 21%
|
|=============== | 22%
|
|================= | 24%
|
|================= | 25%
|
|================== | 26%
|
|==================== | 28%
|
|====================== | 32%
|
|======================= | 33%
|
|========================= | 36%
|
|============================ | 40%
|
|=============================== | 45%
|
|================================= | 48%
|
|===================================== | 52%
|
|======================================= | 56%
|
|======================================== | 57%
|
|========================================= | 58%
|
|================================================= | 70%
|
|=================================================== | 73%
|
|===================================================== | 76%
|
|======================================================= | 79%
|
|========================================================= | 81%
|
|========================================================== | 82%
|
|=========================================================== | 84%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|=============================================================== | 90%
|
|==================================================================== | 97%
|
|======================================================================| 100%
The above code will extract the data your asked for and store it in a
data frame called mydata
. Looking at the data frame, you
will see two variables for each one variable you extracted, one ending
in _E
, and one ending in _M
. The
_E
variable contains the estimate, while the
_M
variable contains the margin of error, meaning the range
below and above in which the actual population figure likely falls.
To illustrate what that means, the estimated broadband access
percentage for Rutherford County in the example, as reported in
Broadband_E
, is 93.1 percent. The estimate’s margin of
error in Rutherford County, as reported in Broadband_M
, is
0.6. Accordingly, Rutherford County’s actual percentage of households
with broadband access could be anywhere from 93.1 - 0.6 = 92.5 percent
to 93.1 + 0.6 = 94.2 percent.
If (as in this example) you extracted county-level data, the name of
each county will end up in a variable called name
and will
have a format like Rutherford County, Tennessee
. It’s not
especially handy to have the county name combined with the phrase,
County, Tennessee
, though. This code takes the name
variable apart, discards the name
variable and the
County,
portion (including the blank space at the start and
end), and saves the Rutherford
and Tennessee
parts in new variables called County
and
State
, respectively.
mydata <-
separate_wider_delim(mydata,
NAME,
delim = " County, ",
names = c("County", "State"))
You can skip this code if you want to keep data for all of the geographic areas you extracted. But if you want data for only some areas, this code will let you specify which ones. This example keeps data for only Davidson County and the counties with which it shares a border.
Note that the filtered data get stored in a new data frame called
filtereddata
, leaving the original mydata
data
frame as is.
filtereddata <- mydata %>%
filter(County == "Davidson"|
County == "Rutherford"|
County == "Williamson"|
County == "Cheatham"|
County == "Robertson"|
County == "Sumner"|
County == "Wilson")
How about a nice chart showing each county’s Broadband_E estimate and margin of error? This code will do the job. Customizable parts include:
Broadband_E
and Broadband_M
, which you
can change to the _E
and _M
names of whatever
variable you want the chart to show.
County
, which is involved only if you are dealing
with county-level data, and the County
variable appears in
the data frame.
darkblue
, which you can change to some other color,
if you like.
Pct. households with broadband
, which you can change
to a title suitable for whatever variable you are graphing.
Nashville-area counties. Brackets show error margins
,
which you can change to a more suitable subtitle, if needed.
2018-2022 ACS estimate
, which you can change to a
more suitable X axis label, if needed.
ggplot(filtereddata, aes(x = Broadband_E, y = reorder(County, Broadband_E))) +
geom_errorbarh(aes(xmin = Broadband_E - Broadband_M, xmax = Broadband_E + Broadband_M)) +
geom_point(size = 3, color = "darkblue") +
theme_minimal(base_size = 12.5) +
labs(title = "Pct. households with broadband",
subtitle = "Nashville-area counties. Brackets show error margins.",
x = "2018-2022 ACS estimate",
y = "")
Meanwhile, this code will make a map of the variable you choose. This
example maps the Broadband_E
variable, after renaming it to
just Broadband
. Note the
mapdata <- st_as_sf(mapdata) function
, which restores
mapdata’s ability to be displayed as a map. The file lost that ability
by default during the earlier reformatting and filtering steps.
Customizable parts of this code include:
Broadband = Broadband_E
and
Households = Households_E
, which rename the
Broadband_E
and Households_E
variables as
simply Broadband
and Households
. Both will
look better in the map’s pop-up window as a result. If you are using
different variables, you will need to substitute your variable
name(s).
The zcol = "Broadband"
argument tells R to shade the
map’s areas according to the Broadband
variable’s values.
Change Broadband
to whatever data frame variable you want
to shade the map’s regions by.
Blues
shades the map in hues of blue. For options,
see: https://r-graph-gallery.com/38-rcolorbrewers-palettes.html.
Changing popup = TRUE
to popup = FALSE
will keep the detail window from opening when a user clicks a map
region. This can be handy if you don’t like the look of the default
detail window. With more advanced code, it is possible to customize the
detail window.
mapdata <- filtereddata %>%
rename(Broadband = Broadband_E,
Households = Households_E)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Broadband",
layer.name = "Pct. with broadband",
popup = TRUE)
In case you would like to work with your map’s data in Google Sheets or some other data application, this code exports the mapdata data frame as a comma-separated value file called mydata.csv and stores it on your computer, in the same directory as this script. The exported file will not include the “geometry” column, which can be difficult to work with in a spreadsheet environment.
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
Try this mod to show broadband access in the counties around New York City, rather than Nashville.
# Fetching data
# Note the state = "NY" mod ##############################
mydata <- get_acs(
geography = "county",
state = "NY",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = " County, ",
names = c("County", "State"))
# Filtering the data
# Note the list of New York counties ##############################
filtereddata <- mydata %>%
filter(County == "Suffolk"|
County == "Nassau"|
County == "Queens"|
County == "Kings"|
County == "Richmond"|
County == "New York"|
County == "Bronx"|
County == "Westchester"|
County == "Rockland")
# Plotting data
# Note the subtitle wording change ##############################
ggplot(filtereddata, aes(x = Broadband_E, y = reorder(County, Broadband_E))) +
geom_errorbarh(aes(xmin = Broadband_E - Broadband_M, xmax = Broadband_E + Broadband_M)) +
geom_point(size = 3, color = "darkblue") +
theme_minimal(base_size = 12.5) +
labs(title = "Pct. households with broadband",
subtitle = "New York-area counties. Brackets show error margins.",
x = "2018-2022 ACS estimate",
y = "")
# Mapping data
mapdata <- filtereddata %>%
rename(Broadband = Broadband_E,
Households = Households_E)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Broadband",
layer.name = "Pct. with broadband",
popup = TRUE)
Try this mod to show median household income in the counties of New York City.
# Double checking target variables
# Note the DP03_0062 variable name ##############################
ChosenVars <- filter(ProfileTables,name == "DP03_0062"|
name == "DP02_0001")
print(ChosenVars$name)
print(ChosenVars$label)
print(ChosenVars$concept)
# Specifying target variables
VariableList =
c(MedianIncome_ = "DP03_0062",
Households_ = "DP02_0001")
# Fetching data
# Note the state = "NY" mod
mydata <- get_acs(
geography = "county",
state = "NY",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = " County, ",
names = c("County", "State"))
# Filtering data
# Note the list of New York counties ##############################
filtereddata <- mydata %>%
filter(County == "Suffolk"|
County == "Nassau"|
County == "Queens"|
County == "Kings"|
County == "Richmond"|
County == "New York"|
County == "Bronx"|
County == "Westchester"|
County == "Rockland")
# Plotting data
# Note the subtitle change ##############################
ggplot(filtereddata, aes(x = MedianIncome_E, y = reorder(County, MedianIncome_E))) +
geom_errorbarh(aes(xmin = MedianIncome_E - MedianIncome_M, xmax = MedianIncome_E + MedianIncome_M)) +
geom_point(size = 3, color = "darkblue") +
theme_minimal(base_size = 12.5) +
labs(title = "Median HH Income",
subtitle = "New York-area counties. Brackets show error margins.",
x = "2018-2022 ACS estimate",
y = "")
# Mapping data
mapdata <- filtereddata %>%
rename(Income = MedianIncome_E,
Households = Households_E)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Income",
layer.name = "Median income",
popup = TRUE)
Finally, some extra packages and code can help you customize the map colors and streamline the pop-up window.
The
mypalette = colorRampPalette(c('lightgray', 'darkgreen'))
code defines a color palette ranging from light gray to dark green. You
can pick other colors, if you like.
The
col.regions = mypalette, at = seq(40000, 145000, 15000)
applies the color palette to a scale ranging from 40,000 to 145,000, in
steps of 15,000. The scale range is designed to match the scale of the
variable being represented: Income
.
The zcol =
c("County", "State", "Income", "Households")
code lists the
mapdata
variables you want to show in the detail window.
Variables not listed will not show.
Try it out, using this code:
# Installing and activating some additional packages
if (!require("leaflet")) install.packages("leaflet")
if (!require("leaflet.extras2")) install.packages("leaflet.extras2")
library(leaflet)
library(leaflet.extras2)
library(leafpop)
# Defining a custome color palette for the map
mypalette = colorRampPalette(c('lightgray', 'darkgreen'))
# Generating the map
mapview(mapdata, zcol = "Income",
col.regions = mypalette, at = seq(40000, 145000, 15000),
layer.name = "Median HH income",
popup = popupTable(mapdata,
feature.id = FALSE,
row.numbers = FALSE,
zcol = c("County",
"State",
"Income",
"Households")))
With a few more tweaks of the code, we can switch from the county level to the county subdivision level.
# Double checking target variables
ChosenVars <- filter(ProfileTables,name == "DP03_0062"|
name == "DP02_0001")
print(ChosenVars$name)
print(ChosenVars$label)
print(ChosenVars$concept)
# Specifying target variables
VariableList =
c(MedianIncome_ = "DP03_0062",
Households_ = "DP02_0001")
# Fetching data
# Note the geography = "county subdivision" mod ####################
mydata <- get_acs(
geography = "county subdivision",
state = "NY",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
# Note the ###########################################
# change in the delimiter to a comma and a space
# and the addition of "Division"
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("Division", "County", "State"))
# Filtering data
filtereddata <- mydata %>%
filter(County == "Suffolk County"|
County == "Nassau County"|
County == "Queens County"|
County == "Kings County"|
County == "Richmond County"|
County == "New York County"|
County == "Bronx County"|
County == "Westchester County"|
County == "Rockland County")
# Mapping data
# Defining a custome color palette for the map
mypalette = colorRampPalette(c('lightgray', 'darkgreen'))
# Generating the map
# Note the palette range revision ##############################
mapview(mapdata, zcol = "Income",
col.regions = mypalette, at = seq(40000, 275000, 15000),
layer.name = "Median HH income",
popup = popupTable(mapdata,
feature.id = FALSE,
row.numbers = FALSE,
zcol = c("County",
"Division",
"State",
"Income",
"Households")))