Mapping Ancestry in R with 2020 Census Data (DDHCA)

Kara Joyner

Learning Goals

You will see examples of how ancestry is depicted in maps.
You will work with a recently released 2020 Census file, the Detailed Demographic and Housing Characteristics File A (DDHCA).
You will learn different ways to map the concentration of an ancestry group in a county.
You will learn about the challenges of measuring ancestry groups.

Maps of Ancestry

Most prevalent ancestry group: Ancestry Map of the United States
Number of people from a particular ancestry group: Mapping Immigrant America
A few recent examples

Today’s Data Visualizations

Number and percent of population in each tract identifying as Chinese
Number of people identifying as Chinese across Bexar County
Distribution of people identifying as Chinese across states

Resources

Reporting issues
- Hout, M., & Goldstein, J. R. (1994). How 4.5 million Irish immigrants became 40 million Irish Americans: Demographic and subjective aspects of the ethnic composition of white Americans. American Sociological Review, 64-82.
- Gullickson, A. (2016). Essential measures: Ancestry, race, and social difference. American Behavioral Scientist, 60(4), 498-518.

Listing Libraries

library(haven)
library(janitor)
library(plyr)   # load before dplyr so plyr does not mask dplyr verbs
library(dplyr)
library(terra)
library(tmap)
library(tidycensus)
library(utils)
library(mapview)
library(ggplot2)
library(posterior)
library(sf)
options(tigris_use_cache = TRUE)

Using the Census Data API Key

The API key gives you access to raw data from the US Census Bureau.
Obtain a key here: Request a U.S. Census Data API Key
Use the code below to install (first time) or overwrite (subsequent times) the key.

# First time: census_api_key(key = "YOUR_CENSUS_API_KEY", install = TRUE)
census_api_key(key = "YOUR_CENSUS_API_KEY", overwrite = TRUE)

Scanning Variables in the 2020 Census File with Ancestry Groups (DDHCA)

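This code creates an object that lists every variable in the 2020 DHC file, which supplies the tract totals used below. To list the DDHCA variables instead, set dataset = "ddhca".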

vars <- load_variables(year=2020, dataset = "dhc") 
head(vars)

# A tibble: 6 × 3
  name     label                                                         concept
  <chr>    <chr>                                                         <chr>  
1 H10_001N " !!Total:"                                                   TENURE…
2 H10_002N " !!Total:!!Owner occupied:"                                  TENURE…
3 H10_003N " !!Total:!!Owner occupied:!!Householder who is White alone"  TENURE…
4 H10_004N " !!Total:!!Owner occupied:!!Householder who is Black or Afr… TENURE…
5 H10_005N " !!Total:!!Owner occupied:!!Householder who is American Ind… TENURE…
6 H10_006N " !!Total:!!Owner occupied:!!Householder who is Asian alone"  TENURE…
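
Scanning thousands of variable rows by eye is slow, so one option is to search the label column for a keyword. The sketch below uses a hypothetical three-row stand-in for vars (rows copied from the output above) so that it runs without an API key; with the real object, the same grepl() call would apply to vars$label.

```r
# Small stand-in for the table returned by load_variables()
vars_toy <- data.frame(
  name  = c("H10_001N", "H10_002N", "H10_006N"),
  label = c(" !!Total:",
            " !!Total:!!Owner occupied:",
            " !!Total:!!Owner occupied:!!Householder who is Asian alone"),
  stringsAsFactors = FALSE
)

# Keep rows whose label mentions "Asian", ignoring case
hits <- vars_toy[grepl("asian", vars_toy$label, ignore.case = TRUE), ]
print(hits$name)
```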

Reading in Tract Totals from the 2020 Census (DHC)

This code is explained in Kyle Walker’s video:
- Doing “GIS” and making maps with US Census Data in R : 2024 Webinar Series Part 3

bexar_all <- get_decennial(
  geography = "tract",
  state = "TX",
  county = "Bexar",
  variables = "P10_001N",
  year = 2020,
  sumfile = "dhc",
  geometry = TRUE)

Viewing the New Data

How can you find the codes for different ancestry groups?

bexar_all

Simple feature collection with 375 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -98.80655 ymin: 29.11444 xmax: -98.1169 ymax: 29.76071
Geodetic CRS:  NAD83
# A tibble: 375 × 5
   GEOID       NAME                     variable value                  geometry
   <chr>       <chr>                    <chr>    <dbl>        <MULTIPOLYGON [°]>
 1 48029171601 Census Tract 1716.01; B… P10_001N  2912 (((-98.62398 29.41355, -…
 2 48029180604 Census Tract 1806.04; B… P10_001N  4521 (((-98.58517 29.47894, -…
 3 48029151600 Census Tract 1516; Bexa… P10_001N  5467 (((-98.50388 29.34895, -…
 4 48029150502 Census Tract 1505.02; B… P10_001N  2766 (((-98.53415 29.37502, -…
 5 48029181712 Census Tract 1817.12; B… P10_001N  2897 (((-98.66604 29.48512, -…
 6 48029180102 Census Tract 1801.02; B… P10_001N  1549 (((-98.54874 29.46871, -…
 7 48029121508 Census Tract 1215.08; B… P10_001N  3772 (((-98.37162 29.5104, -9…
 8 48029121404 Census Tract 1214.04; B… P10_001N  3749 (((-98.4041 29.47818, -9…
 9 48029191505 Census Tract 1915.05; B… P10_001N  1761 (((-98.56706 29.55566, -…
10 48029190504 Census Tract 1905.04; B… P10_001N  1940 (((-98.516 29.46644, -98…
# ℹ 365 more rows

Reading in the Data for People Identifying as Chinese

Specify the variable name (variables) and the population group code (pop_group).

bexar_chinese <- get_decennial(
  geography = "tract",
  variables = "T01001_001N",
  state = "TX",
  county = "Bexar",
  year = 2020,
  sumfile = "ddhca",
  pop_group = "3822",
  pop_group_label = TRUE,
  geometry = TRUE)

Creating a Map

Which tracts have no color?

mapview(bexar_chinese, zcol = "value")

Calculating Percentages

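The data set above contains only the number of people in each tract who identify as Chinese.
To obtain a percent, we need to join it to the earlier file of tract totals.
Check that the denominator covers the same universe as the count: in the DHC file, P1_001N is the total population, so use load_variables() to confirm what P10_001N measures.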

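# Rename the count column so it does not collide with the Chinese count
bexar_all$total <- bexar_all$value
bexar_all <- subset(bexar_all, select = c("GEOID", "total"))
# Keep only the tract ID and count; drop geometry so the join carries one geometry column
bexar_chinese_new <- subset(bexar_chinese, select = c("GEOID", "value"))
bexar_chinese_new <- st_drop_geometry(bexar_chinese_new)
# Keep every tract; tracts missing from the DDHCA pull get NA, set here to 0
combined <- left_join(bexar_all, bexar_chinese_new, by = "GEOID")
combined <- replace(combined, is.na(combined), 0)
# Proportion (0 to 1); tracts where 0 / 0 gives NaN are reset to 0
combined$percent <- combined$value / combined$total
combined <- replace(combined, is.na(combined), 0)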
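
To see what the join and the two zero-fill steps do, here is a minimal sketch on made-up tract data in base R. The GEOIDs and counts are hypothetical, and merge(all.x = TRUE) stands in for left_join().

```r
# Hypothetical totals for three tracts (not real census values)
totals <- data.frame(GEOID = c("A", "B", "C"), total = c(2000, 0, 4000))
# Hypothetical group counts; tract "C" is absent from this table
group <- data.frame(GEOID = c("A", "B"), value = c(50, 0))

# Keep every tract from the totals table (base-R analogue of a left join)
toy <- merge(totals, group, by = "GEOID", all.x = TRUE)

# Tracts with no match get NA; treat them as zero
toy$value[is.na(toy$value)] <- 0

# Divide; 0 / 0 yields NaN, which is.na() also flags
toy$percent <- toy$value / toy$total
toy$percent[is.na(toy$percent)] <- 0

print(toy)
```

Two cautions apply when reading the resulting map. The column named percent holds a proportion, so multiply by 100 before labeling it as a percent. Setting a missing tract to zero is also a simplification: a tract absent from the file may reflect a count that was not published rather than a true zero.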

Creating a Map

What is this showing?

mapview(combined, zcol = "percent")

Preparing the Data for a Dot Map

This code creates dots that represent the number of people in each tract who identify as Chinese.
It specifies that 1 dot represents 100 people.

bexar_dots <- as_dot_density(
  bexar_chinese,
  value = "value",
  values_per_dot = 100)

Producing a Dot Map

mapview(bexar_dots, cex = 0.01, layer.name = "Chinese population 1 dot = 100 people",
col.regions = "navy", color = "navy")

Re-reading the Data into R

This code pulls the count for the nation (geography = "us") and for each state, then computes each state's share of the national total.

Chinese_all <- get_decennial(
  geography = "us",
  variables = "T01001_001N",
  year = 2020,
  sumfile = "ddhca",
  pop_group = "3822",
  pop_group_label = TRUE)
Chinese_state <- get_decennial(
  geography = "state",
  variables = "T01001_001N",
  year = 2020,
  sumfile = "ddhca",
  pop_group = "3822",
  pop_group_label = TRUE)
Chinese_state$proportion<-(Chinese_state$value)/(Chinese_all$value)
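
The division works because the national table has a single row, so R recycles its value across every state. A minimal sketch with made-up numbers (the state names and counts are hypothetical) shows that the result is each state's share of the national group total, not the group's share of each state's own population.

```r
# Hypothetical national and state counts for one group
national <- data.frame(NAME = "United States", value = 1000)
states <- data.frame(NAME = c("State A", "State B", "State C"),
                     value = c(600, 300, 100))

# The single national value is recycled across all state rows
states$proportion <- states$value / national$value

print(states)
# Shares sum to 1 only when the state rows cover the whole national total
print(sum(states$proportion))
```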

Joining to Spatial Data

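# urbnmapr and urbnthemes are installed from GitHub (UrbanInstitute), not CRAN
library(urbnmapr)
library(urbnthemes)
# get_urbn_map() names its state column state_name, so create a matching key
Chinese_state$state_name <- Chinese_state$NAME
spatial_data <- left_join(Chinese_state, get_urbn_map(map = "states", sf = TRUE), by = "state_name")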

Creating a Map

Here is the code for producing a map of each state's share of the national Chinese population.

Chinese <- ggplot() +
  geom_sf(spatial_data,
          mapping = aes(fill = proportion, geometry = geometry),
          color = "#ffffff", size = 0.25) +
  scale_fill_gradientn(labels = scales::percent) +
  coord_sf(datum = NA) +
  labs(fill = "Percent", title = "Chinese",
       caption = "Source: 2020 Census DDHC-A")

Displaying the Map

What is this showing?

Chinese