tidycensus()
This file will support R users with the basics of mapping and spatial
analysis with the U.S. Census data using the tidycensus()
package. The slides for this file are available here.
For mapping tasks, we use the American Community Survey (ACS). This
is an annual survey that covers topics not available in the decennial US
Census data, such as income, education, etc. Estimates are available for
both 1- and 5-year periods. The default for the getacs()
function is the 5-year ACS estimates.
To change to the 1-year estimates add survey = "acs1"
below the geometry = T
input line. Walker recommends,
however, that you stick with the 5-year ACS file to get spatial
coherence and coherence.
The data are delivered as estimates characterized by margins of error.
tidycensus()
The tidycensus()
package created by Walker
“automatically downloads and merges Censsus geometries to data for
mapping [and] includes a variety of analytic tools to support common
Census workflows.”
The states and counties can be requested by name, so there is no need to look up FIPS codes1.
There are multiple packages that can be used for cartography. Some of
the popular packages that can be used are ggplot2
,
tmap
, and mapsf
. Walker, however, built a new
package called mapgl
.
mapgl is a new R package written by Walker for high-performance interactive mapping.
Start by installing/loading packages (as needed) and libraries:
# Install packages as needed
# install.packages(c("tidycensus", "tidyverse", "mapview", "mapgl", "quarto"))
# Load necessary libraries
library(tidycensus)
library(dplyr)
library(ggplot2)
library(sf)
We’ll start by gathering the data using the get_acs()
function.
Given the size of DC, we will use
geography = "tract"
.
dc_income <- get_acs(
geography = "tract",
variables = "B19013_001",
state = "DC",
year = 2023
)
dc_income
## # A tibble: 206 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 11001000101 Census Tract 1.01; District of Columbia;… B19013_… 135708 43153
## 2 11001000102 Census Tract 1.02; District of Columbia;… B19013_… 159583 70430
## 3 11001000201 Census Tract 2.01; District of Columbia;… B19013_… NA NA
## 4 11001000202 Census Tract 2.02; District of Columbia;… B19013_… 152059 60482
## 5 11001000300 Census Tract 3; District of Columbia; Di… B19013_… 174470 21374
## 6 11001000400 Census Tract 4; District of Columbia; Di… B19013_… 188929 71412
## 7 11001000501 Census Tract 5.01; District of Columbia;… B19013_… 109116 18172
## 8 11001000502 Census Tract 5.02; District of Columbia;… B19013_… 157344 32376
## 9 11001000600 Census Tract 6; District of Columbia; Di… B19013_… 183421 33830
## 10 11001000702 Census Tract 7.02; District of Columbia;… B19013_… 87750 16226
## # ℹ 196 more rows
For NC, we’ll use geography = "county"
.
nc_income <- get_acs(
geography = "county",
variables = "B19013_001",
state = "NC",
year = 2023
)
nc_income
## # A tibble: 100 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 37001 Alamance County, North Carolina B19013_001 64445 2226
## 2 37003 Alexander County, North Carolina B19013_001 65268 5441
## 3 37005 Alleghany County, North Carolina B19013_001 44272 3954
## 4 37007 Anson County, North Carolina B19013_001 44245 3895
## 5 37009 Ashe County, North Carolina B19013_001 50827 3707
## 6 37011 Avery County, North Carolina B19013_001 57657 4054
## 7 37013 Beaufort County, North Carolina B19013_001 57997 3520
## 8 37015 Bertie County, North Carolina B19013_001 45931 6859
## 9 37017 Bladen County, North Carolina B19013_001 44528 4772
## 10 37019 Brunswick County, North Carolina B19013_001 74034 2420
## # ℹ 90 more rows
We’ll now add the spatial data to our income data using the simple
features sf
conditions via geometry = T
. Take
note that we’ll use the same code above but simply add a new last line
of code.
We’ll use geometry=T
and update the name of our
df
.
Gathering spatial data for DC at the tract level.
dc_income_sf <- get_acs(
geography = "tract",
variables = "B19013_001",
state = "DC",
year = 2023,
geometry = T
)
dc_income_sf
## Simple feature collection with 206 features and 5 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -77.11976 ymin: 38.79165 xmax: -76.9094 ymax: 38.99511
## Geodetic CRS: NAD83
## First 10 features:
## GEOID NAME
## 1 11001000201 Census Tract 2.01; District of Columbia; District of Columbia
## 2 11001010300 Census Tract 103; District of Columbia; District of Columbia
## 3 11001002801 Census Tract 28.01; District of Columbia; District of Columbia
## 4 11001004002 Census Tract 40.02; District of Columbia; District of Columbia
## 5 11001006700 Census Tract 67; District of Columbia; District of Columbia
## 6 11001007707 Census Tract 77.07; District of Columbia; District of Columbia
## 7 11001008803 Census Tract 88.03; District of Columbia; District of Columbia
## 8 11001009302 Census Tract 93.02; District of Columbia; District of Columbia
## 9 11001009509 Census Tract 95.09; District of Columbia; District of Columbia
## 10 11001002802 Census Tract 28.02; District of Columbia; District of Columbia
## variable estimate moe geometry
## 1 B19013_001 NA NA POLYGON ((-77.07902 38.9126...
## 2 B19013_001 112778 40979 POLYGON ((-77.03636 38.9748...
## 3 B19013_001 86029 16177 POLYGON ((-77.03645 38.9349...
## 4 B19013_001 167102 45557 POLYGON ((-77.04627 38.9166...
## 5 B19013_001 166250 29702 POLYGON ((-76.99496 38.8898...
## 6 B19013_001 59512 32728 POLYGON ((-76.94486 38.8790...
## 7 B19013_001 91317 14597 POLYGON ((-77.00173 38.9099...
## 8 B19013_001 106364 29915 POLYGON ((-76.99494 38.9239...
## 9 B19013_001 116613 28202 POLYGON ((-77.00201 38.9510...
## 10 B19013_001 82675 14933 POLYGON ((-77.03671 38.9271...
Gathering spatial data for NC at the county level.
nc_income_sf <- get_acs(
geography = "county",
variables = "B19013_001",
state = "NC",
year = 2023,
geometry = T
)
We can use our spatial data frames to generate a plot using the
plot()
function.
We’ll use the DC income data to plot. Recall that we neeed to use our
sf
df.
plot(dc_income_sf)
Note here that we get multiple maps, which is not likely what you want. Each of the maps contains information for the variables in our data set. We’ll need to specify that we want the estimates.
plot(dc_income_sf['estimate'])
We’ll do the same for NC.
plot(nc_income_sf['estimate'])
At the state level, it is very clear what the purpose of our analysis would be. We’ll likely want to follow a specific research question from this point.
I will modify the code to look at a specific county in NC, Mecklenburg County.
Take note of the new rows added: - geography = tract - state = NC - county = “Mecklenburg”
nc_mecklenburg_income_sf <- get_acs(
geography = "tract",
state = "NC",
county = "Mecklenburg",
variables = "B19013_001",
year = 2023,
geometry = T
)
We then view the data.
nc_mecklenburg_income_sf
## Simple feature collection with 305 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -81.05803 ymin: 35.00162 xmax: -80.55035 ymax: 35.51479
## Geodetic CRS: NAD83
## First 10 features:
## GEOID NAME
## 1 37119005200 Census Tract 52; Mecklenburg County; North Carolina
## 2 37119004800 Census Tract 48; Mecklenburg County; North Carolina
## 3 37119001507 Census Tract 15.07; Mecklenburg County; North Carolina
## 4 37119004400 Census Tract 44; Mecklenburg County; North Carolina
## 5 37119003201 Census Tract 32.01; Mecklenburg County; North Carolina
## 6 37119001605 Census Tract 16.05; Mecklenburg County; North Carolina
## 7 37119003108 Census Tract 31.08; Mecklenburg County; North Carolina
## 8 37119005713 Census Tract 57.13; Mecklenburg County; North Carolina
## 9 37119005915 Census Tract 59.15; Mecklenburg County; North Carolina
## 10 37119003203 Census Tract 32.03; Mecklenburg County; North Carolina
## variable estimate moe geometry
## 1 B19013_001 39688 18884 MULTIPOLYGON (((-80.84089 3...
## 2 B19013_001 33725 10600 MULTIPOLYGON (((-80.85648 3...
## 3 B19013_001 53383 5797 MULTIPOLYGON (((-80.75249 3...
## 4 B19013_001 46768 8288 MULTIPOLYGON (((-80.89605 3...
## 5 B19013_001 79335 22713 MULTIPOLYGON (((-80.87936 3...
## 6 B19013_001 45307 10145 MULTIPOLYGON (((-80.78052 3...
## 7 B19013_001 48762 11738 MULTIPOLYGON (((-80.87627 3...
## 8 B19013_001 123438 15165 MULTIPOLYGON (((-80.68104 3...
## 9 B19013_001 79338 9571 MULTIPOLYGON (((-80.9407 35...
## 10 B19013_001 52359 17710 MULTIPOLYGON (((-80.86911 3...
We then plot the data.
plot(nc_mecklenburg_income_sf['estimate'])
This graph is at a more focused geographic level, similar to the DC map.
We can also modify more visually appealing maps.
Let’s add the vidris
library to enhance the
visualization.
library(viridis)
Then we’ll produce the map using ggplot2
.
ggplot(nc_mecklenburg_income_sf) +
geom_sf(aes(fill = estimate), color = "white", size = 0.1) +
scale_fill_viridis(option = "magma", name = "Median Household Income") +
labs(
title = "Median Household Income in Mecklenburg County, NC",
subtitle = "By Census Tract (2023)",
caption = "Data: American Community Survey 5-Year Estimates"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 12),
axis.title = element_blank(),
legend.position = "right"
)
We can also make interactive maps with various specifications.
We’ll first need to load the mapview
library to make an
interactive map.
library(mapview)
Then we’ll call both our data frame with the sf
option
and specify that we want the estimate.
mapview(
dc_income_sf,
zcol = "estimate"
)
We’ll do the same for NC.
mapview(
nc_income_sf,
zcol = "estimate"
)
Then finally for Mecklenburg County, NC.
mapview(
nc_mecklenburg_income_sf,
zcol = "estimate"
)
Walker continues from here to explore various advanced options with mapping.
This file may be updated in the future; else, please see the Walker files.
Walker made a note about using FIPS codes based on a change made in 2022. It will be important to review the FIPS codes as some counties have been changed (names) and may be included twice, so be aware of your goals when using the FIPS codes.↩︎