Mapping Race in R with the 2020 Census Redistricting File (PL)

Kara Joyner

Pre-Assignment Assessment: What is this showing?

Mark all that apply

a. The percent of each state’s population comprised of non-Hispanic whites
b. The distribution of the non-Hispanic white population across states
c. The percent of the non-Hispanic white population that resides in different states
d. The rate at which people in each state identify as non-Hispanic white

Learning Goals

You will see examples of how the racial and ethnic makeup of the US is depicted in maps.
You will learn how to sign up for a U.S. Census Data API Key.
You will learn how to read US Census data directly into R.
You will learn different ways to map the racial/ethnic makeup of a county.
You will learn to publish your document on Quarto Pub.

Maps of Race/Ethnic Composition

Most prevalent racial/ethnic group: Examining the Racial and Ethnic Diversity of Adults and Children
Percent of population identifying with a particular racial/ethnic group: Demographics of Asian Americans
Number of people from a particular racial/ethnic group: Race and Ethnicity across the Nation
Distribution of a particular racial/ethnic group: Key Facts about Asian Americans

Geography in tidycensus

Information on available geographies, and how to specify them, can be found in the tidycensus documentation

Today’s Data Visualizations

Most prevalent racial/ethnic group in each census tract of Bexar County
Percent of population in each tract of Bexar County identifying with a particular racial/ethnic group
Number of people from each racial/ethnic group across Bexar County tracts
Distribution of Hispanics across states

Mapping Resources

Kyle Walker’s new book
- Walker, K. (2023). Analyzing US Census Data: Methods, Maps, and Models in R. Chapman and Hall/CRC.
Kyle Walker’s videos
- Analyzing 2020 Decennial US Census Data in R : 2024 Webinar Series Part 2
- Doing “GIS” and making maps with US Census Data in R : 2024 Webinar Series Part 3

Listing Libraries

library(haven)
library(janitor)
library(plyr)   # load plyr before dplyr so plyr does not mask dplyr verbs such as mutate()
library(dplyr)
library(terra)
library(tmap)
library(tidycensus)
library(utils)
library(mapview)
library(ggplot2)
library(posterior)
library(sf)
options(tigris_use_cache = TRUE)  # cache downloaded Census shapefiles between sessions

Using the Census Data API Key

The API key gives you access to raw data from the US Census
Obtain a key here: Request a U.S. Census Data API Key
Use the code below to set the key for the current session (replace YOUR_API_KEY with your own key; do not share it in a published document).
To store the key for future sessions, run the commented line once; add overwrite = TRUE to replace a key that is already stored.

census_api_key(key = "YOUR_API_KEY")
# census_api_key(key = "YOUR_API_KEY", install = TRUE)

Scanning Variables in the 2020 Census Redistricting Data

This code creates an object that includes all the variable names.

vars <- load_variables(year=2020, dataset = "pl")
head(vars)

# A tibble: 6 × 3
  name    label                                             concept         
  <chr>   <chr>                                             <chr>           
1 H1_001N " !!Total:"                                       OCCUPANCY STATUS
2 H1_002N " !!Total:!!Occupied"                             OCCUPANCY STATUS
3 H1_003N " !!Total:!!Vacant"                               OCCUPANCY STATUS
4 P1_001N " !!Total:"                                       RACE            
5 P1_002N " !!Total:!!Population of one race:"              RACE            
6 P1_003N " !!Total:!!Population of one race:!!White alone" RACE
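
Once the variable table is loaded, it can be filtered on its columns to find the variables you need. The sketch below rebuilds the six rows printed above as a small data frame (vars_toy is a made-up name, used so the example runs without an API key) and subsets it by concept and by a keyword in the label. The same subsetting should apply to the real vars object, which is much longer.

```r
# Toy copy of the six rows printed by head(vars); the real table comes from load_variables()
vars_toy <- data.frame(
  name = c("H1_001N", "H1_002N", "H1_003N", "P1_001N", "P1_002N", "P1_003N"),
  label = c(" !!Total:", " !!Total:!!Occupied", " !!Total:!!Vacant",
            " !!Total:", " !!Total:!!Population of one race:",
            " !!Total:!!Population of one race:!!White alone"),
  concept = c(rep("OCCUPANCY STATUS", 3), rep("RACE", 3)),
  stringsAsFactors = FALSE
)

# Keep only the variables that belong to the RACE table
race_vars <- vars_toy[vars_toy$concept == "RACE", ]

# Search the labels for a keyword, ignoring case
white_vars <- vars_toy[grepl("white", vars_toy$label, ignore.case = TRUE), ]

print(race_vars$name)
print(white_vars$name)
```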

Reading the 2020 Census Redistricting Data into R

This code is explained on page 129 of Kyle Walker’s book.

bexar_race <- get_decennial(
  geography = "tract",
  state = "TX",
  county = "Bexar",
  variables = c(
    Hispanic = "P2_002N",
    White = "P2_005N",
    Black = "P2_006N",
    Native = "P2_007N",
    Asian = "P2_008N"
  ),
  summary_var = "P2_001N",
  year = 2020,
  geometry = TRUE) %>%
  mutate(percent = 100 * (value / summary_value))

Becoming Acquainted with the Data

What is this showing?

nrow(bexar_race)

[1] 1875

head(bexar_race)

Simple feature collection with 6 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -98.62401 ymin: 29.40333 xmax: -98.55362 ymax: 29.49398
Geodetic CRS:  NAD83
# A tibble: 6 × 7
  GEOID     NAME  variable value summary_value                  geometry percent
  <chr>     <chr> <chr>    <dbl>         <dbl>        <MULTIPOLYGON [°]>   <dbl>
1 48029171… Cens… Hispanic  3862          4284 (((-98.62398 29.41355, -…  90.1  
2 48029171… Cens… White      183          4284 (((-98.62398 29.41355, -…   4.27 
3 48029171… Cens… Black      150          4284 (((-98.62398 29.41355, -…   3.50 
4 48029171… Cens… Native      17          4284 (((-98.62398 29.41355, -…   0.397
5 48029171… Cens… Asian       17          4284 (((-98.62398 29.41355, -…   0.397
6 48029180… Cens… Hispanic  4306          5617 (((-98.58517 29.47894, -…  76.7
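
The percent column can be checked by hand. The sketch below copies the counts for the first tract in the output above and repeats the calculation 100 * (value / summary_value). It also adds up the five groups: they need not sum to 100 percent, because other categories in table P2 (such as people reporting two or more races) were not requested.

```r
# Counts for the first tract shown in head(bexar_race)
value <- c(Hispanic = 3862, White = 183, Black = 150, Native = 17, Asian = 17)
summary_value <- 4284  # total tract population (P2_001N)

# Same formula as the mutate() step
percent <- 100 * value / summary_value
print(round(percent, 1))

# Percent of the tract's population covered by the five requested groups
covered <- 100 * sum(value) / summary_value
print(round(covered, 1))
```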

Filtering the Data

Above we see that for each tract there are 5 rows that correspond to 5 different racial/ethnic groups.
The code below keeps, for each tract, the row with the largest count (value).
This enables us to produce a map that shows the most common race/ethnicity in each tract.

bexar_new <- bexar_race %>%
  group_by(GEOID) %>%
  slice(which.max(value))
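
To see what the group-and-keep-the-maximum step does, here is a minimal base R sketch on made-up data (the tract IDs "A" and "B" and the counts for tract B are hypothetical). It uses split() and which.max() rather than dplyr, so it is an illustration of the idea and not the code used above. Note that which.max() returns the first maximum, so a tied tract is assigned to whichever group is listed first.

```r
# Toy data: two tracts, three groups each
toy <- data.frame(
  GEOID = rep(c("A", "B"), each = 3),
  variable = rep(c("Hispanic", "White", "Black"), times = 2),
  value = c(3862, 183, 150,   # tract A: clear plurality
            500, 500, 200),   # tract B: Hispanic and White are tied
  stringsAsFactors = FALSE
)

# For each tract, keep the row with the largest count
by_tract <- split(toy, toy$GEOID)
top_rows <- do.call(rbind, lapply(by_tract, function(d) d[which.max(d$value), ]))

print(top_rows)
```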

Viewing the Filtered Data

nrow(bexar_new)

[1] 375

head(bexar_new)

Simple feature collection with 6 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -98.5152 ymin: 29.40677 xmax: -98.46093 ymax: 29.44854
Geodetic CRS:  NAD83
# A tibble: 6 × 7
# Groups:   GEOID [6]
  GEOID     NAME  variable value summary_value                  geometry percent
  <chr>     <chr> <chr>    <dbl>         <dbl>        <MULTIPOLYGON [°]>   <dbl>
1 48029110… Cens… Hispanic  1758          3812 (((-98.50131 29.4258, -9…    46.1
2 48029110… Cens… Hispanic  1589          2401 (((-98.48895 29.41608, -…    66.2
3 48029110… Cens… Hispanic  1982          2369 (((-98.51479 29.41527, -…    83.7
4 48029110… Cens… Hispanic  4763          7284 (((-98.51358 29.42691, -…    65.4
5 48029110… Cens… Hispanic   933          1210 (((-98.51508 29.44412, -…    77.1
6 48029111… Cens… Hispanic  1403          2356 (((-98.47861 29.43792, -…    59.6

Creating a Map Using the Filtered Data

What is a limitation of this map?

mapview(bexar_new, zcol = "variable")

Creating a Map for Each Racial/Ethnic Group

This code reuses the first data set produced earlier (bexar_race).
The ggplot function maps, for each racial/ethnic group, the percent of each tract’s population in that group.

faceted_choro <- ggplot(bexar_race, aes(fill = percent)) + 
  geom_sf(color = NA) + 
  theme_void() + 
  scale_fill_viridis_c(option = "rocket") + 
  facet_wrap(~variable) + 
  labs(title = "Race / ethnicity by Census tract",
       subtitle = "Bexar County, Texas",
       fill = "Census value (%)",
       caption = "2020 Census Redistricting Data")

Displaying the Maps

What is a limitation of this way of showing racial/ethnic concentration?

faceted_choro

Preparing the Data for a Dot Map

This code creates dots that represent the number of people in each tract who identify with each racial/ethnic group.
You must specify the number of people that each dot represents.

bexar_dots <- bexar_race %>%
  as_dot_density(
    value = "value",
    values_per_dot = 100,
    group = "variable"
  )
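
The effect of values_per_dot can be previewed with simple arithmetic. The sketch below assumes that the number of dots drawn for a group in a tract is roughly its count divided by values_per_dot, rounded to a whole number; the exact rounding used inside as_dot_density() is an assumption here. The counts are those of the first tract shown in head(bexar_race).

```r
# Counts for the first tract shown in head(bexar_race)
value <- c(Hispanic = 3862, White = 183, Black = 150, Native = 17, Asian = 17)

# Approximate number of dots per group when one dot stands for 100 people
dots_100 <- round(value / 100)
print(dots_100)

# A smaller values_per_dot gives more dots, so small groups become visible
dots_10 <- round(value / 10)
print(dots_10)
```

Under this assumption, a group with far fewer people than values_per_dot in a tract may receive no dot there at all, which is worth keeping in mind when choosing the value.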

Creating a Dot Map

This code produces a dot map.

dot_density_map <- ggplot() + 
  geom_sf(data = bexar_race, color = "lightgrey", fill = "white") + 
  geom_sf(data = bexar_dots, aes(color = variable), size = 0.01) + 
  scale_color_brewer(palette = "Set1") + 
  guides(color = guide_legend(override.aes = list(size = 3))) + 
  theme_void() + 
  labs(color = "Race / ethnicity",
       caption = "2020 Redistricting Data | 1 dot = approximately 100 people")

Displaying the Dot Map

Where are the dots placed?

dot_density_map

Re-reading the 2020 Census Redistricting Data into R

This code is similar but requests counts at the national and state levels rather than for tracts.

Hispanics_all <- get_decennial(
  geography = "us",
  variables = c("P2_002N"),
  year = 2020)

Hispanics_state <- get_decennial(
  geography = "state",
  variables = c(Hispanic = "P2_002N"),
  year = 2020)

# Each state's share of the national Hispanic population
Hispanics_state$proportion <- Hispanics_state$value / Hispanics_all$value
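
The proportion computed here divides each state's count by the national count, which is a different measure from the percent of a state's own population in the group. The sketch below contrasts the two on made-up numbers (the three states and all counts are hypothetical, not Census figures).

```r
# Hypothetical counts for three made-up states
toy <- data.frame(
  state = c("A", "B", "C"),
  hispanic = c(600, 300, 100),
  total_pop = c(2000, 400, 5000)
)
national_hispanic <- sum(toy$hispanic)

# Share of the national Hispanic population living in each state
# (the same kind of quantity as `proportion` above)
toy$proportion <- toy$hispanic / national_hispanic

# Percent of each state's own population that is Hispanic (a different measure)
toy$pct_of_state <- 100 * toy$hispanic / toy$total_pop

print(toy)
```

In this toy example the shares sum to 1 across states, and the state with the largest share (A) is not the state with the highest within-state percent (B).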

Joining to Spatial Data

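library(urbnmapr)    # installed from GitHub (UrbanInstitute/urbnmapr), not CRAN
library(urbnthemes)  # installed from GitHub (UrbanInstitute/urbnthemes), not CRAN

# The Urban Institute map data uses state_name as its key, so copy NAME into that column
Hispanics_state$state_name <- Hispanics_state$NAME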

spatial_data <- left_join(Hispanics_state, get_urbn_map(map = "states", sf = TRUE), by = "state_name")
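
A left join keeps every row of the left table, so any row whose key has no match in the map data ends up with missing shape information. The base R sketch below shows one way to check for this on made-up tables (the names, values, and the shape_id column are hypothetical); whether any rows in the real data go unmatched depends on which areas each source includes.

```r
# Hypothetical left table (counts) and right table (map shapes)
counts <- data.frame(
  state_name = c("Texas", "Ohio", "Puerto Rico"),
  value = c(10, 20, 30),
  stringsAsFactors = FALSE
)
shapes <- data.frame(
  state_name = c("Texas", "Ohio"),
  shape_id = c(1, 2),
  stringsAsFactors = FALSE
)

# Keys present in the counts but absent from the shapes
unmatched <- setdiff(counts$state_name, shapes$state_name)
print(unmatched)

# Base R equivalent of a left join: all.x = TRUE keeps every row of counts
joined <- merge(counts, shapes, by = "state_name", all.x = TRUE)
print(joined)
```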

Creating a Map

Here is the code for producing a map based on the Hispanic data.

Hispanics <- ggplot() +
  geom_sf(data = spatial_data,
          mapping = aes(fill = proportion, geometry = geometry),
          color = "#ffffff", size = 0.25) +
  scale_fill_gradientn(labels = scales::percent) +
  coord_sf(datum = NA) +
  labs(fill = "Percent",
       title = "Distribution of Hispanics across States",
       caption = "Source: 2020 Redistricting File")

Displaying the Map

What is this showing?

Hispanics

Publishing Your Document in Quarto Pub

Render your document to find any problems.
See the instructions here: Quarto Pub
Sign up here: Publish and Share
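In RStudio, set the working directory to the source file location (Session > Set Working Directory > To Source File Location), then open a new terminal (Tools > Terminal > New Terminal).
In the terminal, type: quarto publish quarto-pub name.qmd (replace name.qmd with the name of your file).
Choose yes when prompted.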

Post-Assignment Assessment: What is this showing?

Mark all that apply

a. The percent of each county’s population comprised of non-Hispanic whites
b. The distribution of the non-Hispanic white population across counties
c. The percent of the non-Hispanic white population that resides in different counties
d. The rate at which people in each county identify as non-Hispanic white