The tidycensus package in R allows for quick and easy retrieval and cleaning of US Census data. In this lab, I will show some of the functionality of the package, specifically the “get_acs()” function.
First, I will load the necessary R packages:
library(tidycensus)
library(tidyverse)
library(scales)
library(plotly)
library(ggiraph)
library(htmlwidgets)
library(mapview)
library(sf)
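For my first example, I will examine the percentage of Michigan residents aged 25 and over who hold a graduate or professional degree, by county (Data Profile variable “DP02_0066P”). The data come from the 2017-2021 5-year American Community Survey (ACS), which is what “get_acs()” returns by default for year = 2021.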
mi_grad_degree <- get_acs(
geography = "county",
state = "MI",
variables = "DP02_0066P",
year = 2021
)
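To examine the counties with the highest and lowest rates of graduate degree attainment, I will sort the data with the “arrange()” function. Printing the sorted tibble shows its first 10 rows.
# sort from highest to lowest (the printout shows the 10 highest counties)
arrange(mi_grad_degree, desc(estimate))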
## # A tibble: 83 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 26161 Washtenaw County, Michigan DP02_0066P 30.3 0.7
## 2 26125 Oakland County, Michigan DP02_0066P 21.3 0.3
## 3 26089 Leelanau County, Michigan DP02_0066P 19.6 1.6
## 4 26065 Ingham County, Michigan DP02_0066P 18.4 0.6
## 5 26083 Keweenaw County, Michigan DP02_0066P 16.1 3.3
## 6 26077 Kalamazoo County, Michigan DP02_0066P 15.8 0.6
## 7 26047 Emmet County, Michigan DP02_0066P 15.6 1.4
## 8 26055 Grand Traverse County, Michigan DP02_0066P 14.8 1
## 9 26061 Houghton County, Michigan DP02_0066P 14.8 1.4
## 10 26111 Midland County, Michigan DP02_0066P 14.7 1.1
## # ℹ 73 more rows
# top 10 lowest counties
arrange(mi_grad_degree, estimate)
## # A tibble: 83 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 26085 Lake County, Michigan DP02_0066P 3.7 0.7
## 2 26117 Montcalm County, Michigan DP02_0066P 4.3 0.6
## 3 26067 Ionia County, Michigan DP02_0066P 4.4 0.6
## 4 26113 Missaukee County, Michigan DP02_0066P 4.4 0.8
## 5 26131 Ontonagon County, Michigan DP02_0066P 4.4 0.7
## 6 26157 Tuscola County, Michigan DP02_0066P 4.4 0.6
## 7 26023 Branch County, Michigan DP02_0066P 4.5 0.6
## 8 26129 Ogemaw County, Michigan DP02_0066P 4.5 0.8
## 9 26135 Oscoda County, Michigan DP02_0066P 4.6 1.3
## 10 26011 Arenac County, Michigan DP02_0066P 4.7 0.7
## # ℹ 73 more rows
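Because “arrange()” sorts all 83 rows and the printout simply stops at 10, dplyr's “slice_max()” and “slice_min()” are a more direct way to get exactly the top or bottom n counties. A minimal sketch, assuming a small stand-in tibble built from rows copied out of the printed output above (with the real data you would pass mi_grad_degree instead):

```r
library(dplyr)

# Stand-in for mi_grad_degree: a few rows copied from the printed output above
grad_sample <- tibble(
  NAME = c("Washtenaw County, Michigan", "Oakland County, Michigan",
           "Leelanau County, Michigan", "Lake County, Michigan",
           "Montcalm County, Michigan"),
  estimate = c(30.3, 21.3, 19.6, 3.7, 4.3),
  moe = c(0.7, 0.3, 1.6, 0.7, 0.6)
)

# Exactly the n highest and n lowest counties, already sorted
top_counties <- slice_max(grad_sample, estimate, n = 2)
bottom_counties <- slice_min(grad_sample, estimate, n = 2)

top_counties
bottom_counties
```

One design note: “slice_max()” keeps ties by default (with_ties = TRUE), so with several Michigan counties tied at 4.4%, a request for the bottom 5 on the full data could return more than 5 rows.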
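To visualize the estimates together with their uncertainty, I will create a margin of error plot, showing each county's estimate as a point and its margin of error as a horizontal error bar. Because Michigan has 83 counties, this kind of chart can become cluttered and hard to read.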
mi_grad_plot <- ggplot(mi_grad_degree, aes(x = estimate,
y = reorder(NAME, estimate))) +
geom_point(color = "darkred", size = 1) +
scale_y_discrete(labels = function(x) str_remove(x, " County, Michigan")) +
scale_x_continuous(labels = label_percent(scale = 1, suffix = "%")) +
geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
width = 0.5, linewidth = 0.5) +
labs(title = "Percentage of residents with a graduate degree, 2017-2021 ACS",
subtitle = "Counties in Michigan",
caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimate",
x = "ACS estimate",
y = "") +
theme_minimal(base_size = 5)
mi_grad_plot
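The error bars are easy to over-read. ACS margins of error are published at the 90% confidence level, so dividing an MOE by 1.645 gives a standard error, and two county estimates can be compared with a z-test on their difference (treating the counties as independent samples). A minimal sketch using values from the printed output above; the helper name acs_differ() is hypothetical. tidycensus also includes a “significance()” function intended for this comparison, which this sketch does not use.

```r
# Hypothetical helper: do two ACS estimates differ significantly?
# Assumes both MOEs are published at the 90% confidence level (the ACS default).
acs_differ <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / 1.645                       # convert each 90% MOE to a standard error
  se2 <- moe2 / 1.645
  z <- abs(est1 - est2) / sqrt(se1^2 + se2^2)
  z > z_crit                                # TRUE if significant at the chosen level
}

# Leelanau (19.6 +/- 1.6) vs. Ingham (18.4 +/- 0.6), ranked 3rd and 4th
acs_differ(19.6, 1.6, 18.4, 0.6)  # FALSE

# Washtenaw (30.3 +/- 0.7) vs. Oakland (21.3 +/- 0.3), ranked 1st and 2nd
acs_differ(30.3, 0.7, 21.3, 0.3)  # TRUE
```

So counties that sit next to each other in the chart are not necessarily distinguishable once sampling error is taken into account.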
Using the ggiraph package, I can turn this static chart into an interactive plot: hovering over a point highlights it and displays its estimate as a tooltip.
mi_plot_ggiraph <- ggplot(mi_grad_degree, aes(x = estimate,
y = reorder(NAME, estimate),
tooltip = estimate,
data_id = GEOID)) +
geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
width = 0.25, linewidth = 0.25) +
geom_point_interactive(color = "darkred", size = 0.75) +
scale_y_discrete(labels = function(x) str_remove(x, " County, Michigan")) +
scale_x_continuous(labels = label_percent(scale = 1, suffix = "%")) +
labs(title = "Percent of individuals with a graduate degree or higher, 2017-2021 ACS",
subtitle = "Counties in Michigan",
caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimate",
x = "ACS estimate",
y = "") +
theme_minimal(base_size = 5)
girafe(ggobj = mi_plot_ggiraph) %>%
girafe_options(opts_hover(css = "fill:cyan;"))
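The tooltip above shows only the bare estimate. Since ggiraph's tooltip aesthetic accepts any character vector, one option is to precompute a label that includes the county name and margin of error. A minimal base R sketch of building such a label; the helper make_tooltip() and the column name tooltip_text are hypothetical, and the values are copied from the output above:

```r
# Hypothetical helper: readable tooltip text from the columns get_acs() returns
make_tooltip <- function(name, estimate, moe) {
  county <- sub(" County, Michigan", "", name, fixed = TRUE)  # shorten the county name
  sprintf("%s: %.1f%% (+/- %.1f)", county, estimate, moe)     # e.g. "Lake: 3.7% (+/- 0.7)"
}

make_tooltip(c("Washtenaw County, Michigan", "Lake County, Michigan"),
             c(30.3, 3.7), c(0.7, 0.7))
```

With the real data, this could be added via mutate(mi_grad_degree, tooltip_text = make_tooltip(NAME, estimate, moe)) and then mapped with tooltip = tooltip_text inside aes().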
tidycensus also supports spatial analysis. To find a variable of interest, I will use the “load_variables()” function to browse the available variables.
vars <- load_variables(2021, "acs5")
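The resulting vars table has one row per variable, with columns including name, label, and concept, and it is far too long to scroll through by hand. (Data Profile variables such as DP02_0066P are listed separately, via load_variables(2021, "acs5/profile").) A minimal sketch of a keyword search, assuming a three-row stand-in table whose rows are illustrative rather than copied from the real output:

```r
# Illustrative stand-in for the table returned by load_variables(2021, "acs5")
vars_demo <- data.frame(
  name = c("B07001_001", "B07001_017", "B01003_001"),
  label = c("Estimate!!Total:",
            "Estimate!!Total:!!Same house 1 year ago:",
            "Estimate!!Total"),
  concept = c(rep("GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY AGE FOR CURRENT RESIDENCE IN THE UNITED STATES", 2),
              "TOTAL POPULATION"),
  stringsAsFactors = FALSE
)

# Case-insensitive keyword search across the variable labels
hits <- vars_demo[grepl("same house", vars_demo$label, ignore.case = TRUE), ]
hits$name
```

With the tidyverse loaded, the equivalent search on the real table is filter(vars, str_detect(label, regex("same house", ignore_case = TRUE))).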
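For this example, I will use “B07001_017”, which estimates the number of people who lived in the same house one year ago (that is, who did not move in the past year). This can serve as an indicator of geographic mobility, although as a raw count it also largely reflects each county's population size.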
mi_mobility <- get_acs(
geography = "county",
state = "MI",
variables = c(num_in_home_1_year_ago = "B07001_017"),
year = 2021,
geometry = TRUE # return geometry to enable spatial analysis
)
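Because a count of non-movers mostly tracks population size, a share is usually a better mobility measure. “get_acs()” accepts a summary_var argument (for example summary_var = "B07001_001", the table total) that adds summary_est and summary_moe columns, and tidycensus's “moe_prop()” derives the margin of error of the resulting proportion. A minimal base R sketch of the Census Bureau approximation for the MOE of a derived proportion; the helper prop_with_moe() is hypothetical and the numbers are made up, not Michigan data:

```r
# Hypothetical helper: proportion and its MOE when the numerator is a subset
# of the denominator (Census Bureau approximation for derived ACS proportions)
prop_with_moe <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  radicand <- moe_num^2 - p^2 * moe_denom^2
  # If the radicand is negative, fall back to the ratio formula (plus sign)
  radicand <- ifelse(radicand < 0, moe_num^2 + p^2 * moe_denom^2, radicand)
  data.frame(prop = p, moe = sqrt(radicand) / denom)
}

# Made-up county: 90,000 +/- 1,200 non-movers out of 100,000 +/- 500 people
prop_with_moe(90000, 100000, 1200, 500)
```

With real data, moe_prop(estimate, summary_est, moe, summary_moe) can be called directly inside mutate() instead of a hand-written helper.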
The mapview package offers a quick way to view this data on an interactive map:
mapview(mi_mobility, zcol = "estimate")
Additionally, I can use skills learned from last week’s lab to create a graduated symbol map in ggplot.
# create county centroids
mi_centroids <- st_centroid(mi_mobility)
## Warning: st_centroid assumes attributes are constant over geometries
st_geometry_type(mi_centroids, by_geometry = FALSE)
## [1] POINT
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
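“st_centroid()” returns the center of area of each county polygon. For intuition, a minimal planar sketch of that calculation for one simple polygon, using the standard shoelace formula; this is not sf's implementation, and polygon_centroid() is a hypothetical name. For irregular or multi-part counties (several Michigan counties include islands), the centroid can fall outside the county itself, in which case sf's “st_point_on_surface()” guarantees a point inside it.

```r
# Planar centroid of a simple (non-self-intersecting) polygon via the shoelace formula.
# x, y: vertex coordinates in order, without repeating the first vertex at the end.
polygon_centroid <- function(x, y) {
  x_next <- c(x[-1], x[1])           # pair each vertex with the next one (wrapping around)
  y_next <- c(y[-1], y[1])
  cross <- x * y_next - x_next * y   # signed cross products of consecutive vertices
  area <- sum(cross) / 2             # signed area; the sign cancels below
  c(x = sum((x + x_next) * cross) / (6 * area),
    y = sum((y + y_next) * cross) / (6 * area))
}

# Right triangle with vertices (0, 0), (3, 0), (0, 3): centroid at (1, 1)
polygon_centroid(c(0, 3, 0), c(0, 0, 3))
```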
ggplot() +
geom_sf(data = mi_centroids, aes(size = estimate)) +
geom_sf(data = mi_mobility, fill = NA) +
scale_size_continuous(name = "number of people in current home for at least one year", range = c(1,10)) +
theme_minimal() +
labs(title = "Mobility of Michiganders, 2017-2021 ACS",
caption = "Data acquired with R and tidycensus. Symbols are placed at county centroids")
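In a graduated symbol map, the way values become symbol sizes matters. By default, ggplot2's “scale_size_continuous()” rescales values to 0-1, takes the square root, and maps the result onto range, so symbol area (roughly) rather than diameter grows with the value. A minimal base R sketch of that mapping with range = c(1, 10), as in the map above; size_for() is a hypothetical helper that mirrors, rather than calls, ggplot2's internals:

```r
# Hypothetical helper mirroring ggplot2's default area-based size mapping
size_for <- function(values, range = c(1, 10)) {
  scaled <- (values - min(values)) / (max(values) - min(values))  # rescale to 0-1
  range[1] + sqrt(scaled) * (range[2] - range[1])                  # square root, then onto range
}

size_for(c(0, 25, 100))  # 1.0 5.5 10.0
```

Ignoring the minimum-size offset, a county whose rescaled value is four times another's gets a symbol about twice as wide, which keeps symbol area, not width, proportional to the value.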
Hopefully these simple examples show the power of tidycensus!