Basic usage of the {tidycensus} package.

tidycensus lets R users query selected US Census Bureau data APIs, including the decennial Census and the American Community Survey. It returns data frames that work directly with the tidyverse and can optionally attach feature geometry for mapping.

Here is an example of how to use this package.

  1. Load libraries.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidycensus)
  1. Get a U.S. Census data API key from https://api.census.gov/data/key_signup.html.
    Then pass your alphanumeric key to the census_api_key() function.

Here is an example of the call to add to your script: census_api_key("x2772902ex7ea12x13x671x12b76ab93f229x8xx")
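Hard-coding a key in a script makes it easy to leak when the script is shared. As a minimal sketch, the key can instead be read from the CENSUS_API_KEY environment variable, which is where census_api_key(key, install = TRUE) stores it in .Renviron. The helper get_census_key() below is hypothetical and not part of tidycensus.

```r
# Hypothetical helper (not part of tidycensus): read the key from an
# environment variable and fail early with a clear message if it is missing.
get_census_key <- function(env_var = "CENSUS_API_KEY") {
  key <- Sys.getenv(env_var, unset = "")
  if (!nzchar(key)) {
    stop("No Census API key found in ", env_var,
         "; run census_api_key(\"<your key>\", install = TRUE) once.")
  }
  key
}

# Usage (needs tidycensus and a real key, so it is left commented out):
# library(tidycensus)
# census_api_key(get_census_key())
```

With the key installed once, later sessions pick it up from the environment and the key never needs to appear in the script.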

  1. Now, let’s load the variable lists for the American Community Survey (ACS), the Demographic and Housing Characteristics file (DHC), and the redistricting data file (PL), all published by the United States Census Bureau.
# Getting variables from the ACS
acs5_v20 <- load_variables(2020, "acs5", cache = TRUE)

View(acs5_v20)

# Getting variables from the DHC
dhc_v20 <- load_variables(2020, "dhc", cache = TRUE)

View(dhc_v20)

# Getting variables from the PL
pl_v20 <- load_variables(2020, "pl", cache = TRUE)

View(pl_v20)
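Scrolling through View() works, but the variable lists are long, so a text search is often quicker. The sketch below assumes the data frame returned by load_variables() has name, label, and concept columns. To stay runnable without an API call, it searches a small hand-typed stand-in table rather than the real acs5_v20. The labels and concepts are abbreviated illustrations, and find_vars() is a hypothetical helper.

```r
# Hand-typed stand-in for a few rows of a load_variables() result.
# The labels and concepts are abbreviated illustrations, not exact copies.
vars <- data.frame(
  name    = c("B19013_001", "B03001_004", "B03001_005"),
  label   = c("Estimate!!Median household income",
              "Estimate!!Total:!!Hispanic or Latino:!!Mexican",
              "Estimate!!Total:!!Hispanic or Latino:!!Puerto Rican"),
  concept = c("MEDIAN HOUSEHOLD INCOME",
              "HISPANIC OR LATINO ORIGIN BY SPECIFIC ORIGIN",
              "HISPANIC OR LATINO ORIGIN BY SPECIFIC ORIGIN"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose label or concept matches a pattern,
# ignoring case.
find_vars <- function(vars, pattern) {
  hit <- grepl(pattern, vars$label, ignore.case = TRUE) |
    grepl(pattern, vars$concept, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

find_vars(vars, "puerto rican")$name
```

The same call should work on the real tables, for example find_vars(acs5_v20, "median household income").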
  1. For this example, let’s look at the estimated number of Connecticut Hispanic householders (variable name: H11_004N) by county, using data from the 2020 Demographic and Housing Characteristics file (DHC).
ct_hisp_county <- get_decennial(geography = "county", 
                       variables = "H11_004N", 
                       year = 2020,
                       state = "CT",
                       sumfile = "dhc")
## Getting data from the 2020 decennial Census
## Using the Demographic and Housing Characteristics File
## Note: 2020 decennial Census data use differential privacy, a technique that
## introduces errors into data to preserve respondent confidentiality.
## ℹ Small counts should be interpreted with caution.
## ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
## This message is displayed once per session.
head(ct_hisp_county)
## # A tibble: 6 × 4
##   GEOID NAME                           variable value
##   <chr> <chr>                          <chr>    <dbl>
## 1 09001 Fairfield County, Connecticut  H11_004N 21571
## 2 09003 Hartford County, Connecticut   H11_004N 17019
## 3 09005 Litchfield County, Connecticut H11_004N  2065
## 4 09007 Middlesex County, Connecticut  H11_004N  1567
## 5 09009 New Haven County, Connecticut  H11_004N 17406
## 6 09011 New London County, Connecticut H11_004N  3111
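The GEOID column is a character string built from FIPS codes: the first two digits identify the state (09 for Connecticut) and the next three the county. The tract-level GEOIDs that appear later in this document add a six-digit tract code. As a small sketch, the hypothetical helper split_geoid() below, which is not part of tidycensus, separates those parts using base R only.

```r
# Hypothetical helper: split county (5-digit) or tract (11-digit) GEOIDs
# into their FIPS components. GEOIDs must stay character so that the
# leading zero is preserved.
split_geoid <- function(geoid) {
  data.frame(
    geoid  = geoid,
    state  = substr(geoid, 1, 2),
    county = substr(geoid, 3, 5),
    tract  = ifelse(nchar(geoid) == 11, substr(geoid, 6, 11), NA_character_),
    stringsAsFactors = FALSE
  )
}

# County GEOIDs from the output above, plus one tract GEOID
split_geoid(c("09001", "09003", "09005", "09003524700"))
```

Keeping GEOID as text matters when joining to other files, because converting it to a number drops the leading zero.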
  1. Here is a plot of the estimated number of Connecticut Hispanic householders by county.
ct_hisp_county %>%
  ggplot(aes(x = value, y = reorder(NAME, value))) + 
  geom_point() +
  labs(title = "Estimated Number of Connecticut Hispanic Householders by County",
       subtitle = "2020 Census Demographic and Housing Characteristics file (DHC)")

Spatial data in the {tidycensus} package.

  1. The following example uses the spatial features of {tidycensus} to retrieve median household income (variable name: B19013_001) from the 2016-2020 5-year ACS for census tracts in Hartford County, together with the tract geometry.
hartford <- get_acs(
  state = "CT",
  county = "Hartford",
  geography = "tract",
  variables = "B19013_001",
  geometry = TRUE,
  year = 2020
)
## Getting data from the 2016-2020 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
head(hartford)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -72.90295 ymin: 41.59734 xmax: -72.51552 ymax: 41.83613
## Geodetic CRS:  NAD83
##         GEOID                                               NAME   variable
## 1 09003524700    Census Tract 5247, Hartford County, Connecticut B19013_001
## 2 09003420700    Census Tract 4207, Hartford County, Connecticut B19013_001
## 3 09003430602 Census Tract 4306.02, Hartford County, Connecticut B19013_001
## 4 09003514700    Census Tract 5147, Hartford County, Connecticut B19013_001
## 5 09003416500    Census Tract 4165, Hartford County, Connecticut B19013_001
## 6 09003471200    Census Tract 4712, Hartford County, Connecticut B19013_001
##   estimate   moe                       geometry
## 1    73550 16658 MULTIPOLYGON (((-72.71513 4...
## 2    82250 18694 MULTIPOLYGON (((-72.8654 41...
## 3   108632 23767 MULTIPOLYGON (((-72.90295 4...
## 4    31006 11568 MULTIPOLYGON (((-72.53319 4...
## 5    54220  6207 MULTIPOLYGON (((-72.82533 4...
## 6    43650 13376 MULTIPOLYGON (((-72.72104 4...
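Unlike the decennial counts, each ACS estimate comes with a margin of error (the moe column), which get_acs() reports at the 90 percent confidence level by default. A common way to judge reliability is the coefficient of variation: divide the margin of error by 1.645 to get the standard error, then divide by the estimate. The sketch below applies this to the six tracts printed above, with the values retyped by hand so it runs without an API call.

```r
# Estimates and margins of error copied from the head(hartford) output above
tracts <- data.frame(
  GEOID    = c("09003524700", "09003420700", "09003430602",
               "09003514700", "09003416500", "09003471200"),
  estimate = c(73550, 82250, 108632, 31006, 54220, 43650),
  moe      = c(16658, 18694, 23767, 11568, 6207, 13376),
  stringsAsFactors = FALSE
)

z90 <- 1.645                                # z-score for a 90% confidence level
tracts$se    <- tracts$moe / z90            # standard error
tracts$cv    <- tracts$se / tracts$estimate # coefficient of variation
tracts$lower <- tracts$estimate - tracts$moe  # 90% interval, lower bound
tracts$upper <- tracts$estimate + tracts$moe  # 90% interval, upper bound

# Least reliable estimates first
tracts[order(-tracts$cv), c("GEOID", "estimate", "moe", "cv")]
```

Cut-offs for an acceptable coefficient of variation differ between analysts, so the sketch only ranks the tracts and does not apply a threshold.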
  1. Now, let’s visualize the median household income from the 2016-2020 ACS by Hartford County census tract.
hartford %>%
  ggplot(aes(fill = estimate)) + 
  geom_sf(color = NA) + 
  scale_fill_viridis_c(option = "magma") + 
  labs(fill = "Median household income\nby census tracts,\nHartford County\n(2016-2020 ACS)")

  1. Next, let’s create faceted maps by census tract for four Hispanic subgroup populations in Hartford County, Connecticut.
# Select Hispanic subgroups
hispvars <- c(`Puerto Ricans` = "B03001_005", 
              Mexicans = "B03001_004", 
              Cubans = "B03001_006",
              Peruvians = "B03001_023")

# Hartford County Hispanic subgroups by census tract
hartford_county_hispanics <- get_acs(
  state = "CT",
  county = "Hartford",
  geography = "tract",
  variables = hispvars, # list of Hispanic groups
  geometry = TRUE,
  year = 2020
)
## Getting data from the 2016-2020 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
head(hartford_county_hispanics)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -72.86621 ymin: 41.65109 xmax: -72.70292 ymax: 41.7435
## Geodetic CRS:  NAD83
##         GEOID                                            NAME      variable
## 1 09003524700 Census Tract 5247, Hartford County, Connecticut      Mexicans
## 2 09003524700 Census Tract 5247, Hartford County, Connecticut Puerto Ricans
## 3 09003524700 Census Tract 5247, Hartford County, Connecticut        Cubans
## 4 09003524700 Census Tract 5247, Hartford County, Connecticut     Peruvians
## 5 09003420700 Census Tract 4207, Hartford County, Connecticut      Mexicans
## 6 09003420700 Census Tract 4207, Hartford County, Connecticut Puerto Ricans
##   estimate moe                       geometry
## 1        0  13 MULTIPOLYGON (((-72.71513 4...
## 2     1586 253 MULTIPOLYGON (((-72.71513 4...
## 3      140 144 MULTIPOLYGON (((-72.71513 4...
## 4      170 140 MULTIPOLYGON (((-72.71513 4...
## 5        0  13 MULTIPOLYGON (((-72.8654 41...
## 6      189 147 MULTIPOLYGON (((-72.8654 41...
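The result is in long format, with one row per tract and subgroup, which suits facet_wrap(). For a side-by-side table it can help to spread the subgroups into columns. get_acs() can do this directly with output = "wide". The sketch below shows the same idea in base R on the six rows printed above, retyped by hand so that it runs without an API call.

```r
# The six rows from the head(hartford_county_hispanics) output above
long <- data.frame(
  GEOID    = c(rep("09003524700", 4), rep("09003420700", 2)),
  variable = c("Mexicans", "Puerto Ricans", "Cubans", "Peruvians",
               "Mexicans", "Puerto Ricans"),
  estimate = c(0, 1586, 140, 170, 0, 189),
  stringsAsFactors = FALSE
)

# One row per tract, one column per subgroup. Combinations missing from
# the six printed rows stay NA instead of being filled with zero.
wide <- tapply(long$estimate, list(long$GEOID, long$variable), sum)
wide
```

tapply() is used here instead of xtabs() because xtabs() would report the combinations that were not printed as zero, which would misstate the data.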
  1. Now, let’s view the faceted maps for the four Hispanic subgroup populations in Hartford County.
# Plot the Hispanic subgroups
hartford_county_hispanics %>%
  ggplot(aes(fill = estimate)) +
  facet_wrap(~variable) +
  geom_sf(color = NA) +
  theme_void() + 
  scale_fill_viridis_c() + 
  labs(fill = "Hispanic subgroups\nestimated populations\nby census tract,\nHartford County\n(2016-2020 5-year ACS)")

Disclaimer. All presentation contents are the responsibility of the author and do not represent the official views of any organization.

A.M.D.G.

ite, inflammate omnia