Spark Talk

Author

Coda Rayo-Garza

3 Super Easy Visuals Using Tidycensus

1. A Basic Map

Here, we map median household income (variable B19013_001) from the 2021 American Community Survey 5-year estimates for census tracts in Bexar County, Texas.

library(tidycensus)
library(tidyverse)

options(tigris_use_cache = TRUE)

bexar <- get_acs(
  state = "TX",
  county = "Bexar",
  geography = "tract",
  variables = "B19013_001",
  geometry = TRUE,
  year = 2021)

bexar %>%
  ggplot(aes(fill = estimate)) + 
  geom_sf(color = NA) + 
  scale_fill_viridis_c(option = "magma")

A More Detailed Map: Foreign Born Children Mapped By County + Tract

library(tidyverse)
library(tidycensus)
library(sf)
library(scales)
library(viridis)

# load all acs variables
acs2011720 <- load_variables(2021, "acs5", cache = T)

raw_two_par <- get_acs(geography = "tract", 
                        variables = c(child_nav_born = "B05009_004",
                                      child_for_born = "B05009_005",
                                      child_tot_pop= "B05009_002"), 
                        state='TX',
                        county = 'Bexar',
                        geometry = T, 
                        year = 2021,
                        output = "wide")

for_bor_under <- raw_two_par %>% 
  mutate(pct_for_bor = child_for_bornE / child_tot_popE,  # divide by the estimate (E), not the margin of error (M)
         pct_nav_bor = child_nav_bornE / child_tot_popE)


library(RColorBrewer)
ggplot()  + 
  geom_sf(data = for_bor_under, mapping = aes(fill = pct_for_bor), 
          color = "white", 
          lwd = 1) + # draws the census tract outlines in white
  theme_void() +
  scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
                       direction = 1,
                       palette = "RdPu",
                       na.value = "transparent",
                       name="Percent Children Under 6 Foreign Born (%)",
                       labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Bexar County, Texas Foreign Born Children", 
    subtitle= "Children  Under 6 Foreign Born w/Two Parent by Census Tract",
    caption = "Source: American Community Survey, 2021 5 Yr")
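
With output = "wide", tidycensus appends E to estimate columns and M to margin-of-error columns, so a share is normally computed as one estimate divided by another. Here is a minimal sketch of that calculation in base R. The data frame toy_tracts, its GEOIDs, and its counts are all made up for illustration; they are not ACS data.

```r
# Hypothetical tract-level counts shaped like get_acs(output = "wide") results.
# Columns ending in E are estimates; columns ending in M are margins of error.
toy_tracts <- data.frame(
  GEOID           = c("48029000001", "48029000002", "48029000003"),
  child_for_bornE = c(10, 0, 0),
  child_tot_popE  = c(200, 150, 0),
  child_tot_popM  = c(40, 35, 12)
)

# Share of children who are foreign born: estimate divided by estimate
toy_tracts$pct_for_bor <- toy_tracts$child_for_bornE / toy_tracts$child_tot_popE

# A tract with no children gives 0 / 0 = NaN; store it as NA instead
toy_tracts$pct_for_bor[is.nan(toy_tracts$pct_for_bor)] <- NA

print(toy_tracts[, c("GEOID", "pct_for_bor")])
```

Tracts left as NA pick up the na.value color of the fill scale when mapped.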

2. Population Pyramids!

First, load the required packages. The last option line tells the tigris package to cache the shapefiles it downloads so they are not fetched again on later runs.

# library(tidycensus)
# library(tidyverse)
# library(tigris)
# options(tigris_use_cache = TRUE)

# us_components <- get_estimates(geography = "state", product = "components")
# 
# us_components
# 
# unique(us_components$variable)

Next, get the estimates.

bx_age_hisp <- get_estimates(geography = "county", 
                             product = "characteristics", 
                             breakdown = c("SEX", "AGEGROUP", "HISP"),  
                             breakdown_labels = TRUE, 
                             state = "TX",
                             county = "Bexar",
                             year = 2019)  # pin the year so the data match the 2019 label on the plot

bx_age_hisp

# A tibble: 210 × 6
   GEOID NAME                  value SEX        AGEGROUP           HISP         
   <chr> <chr>                 <dbl> <chr>      <fct>              <chr>        
 1 48029 Bexar County, Texas 2003554 Both sexes All ages           Both Hispani…
 2 48029 Bexar County, Texas  787766 Both sexes All ages           Non-Hispanic 
 3 48029 Bexar County, Texas 1215788 Both sexes All ages           Hispanic     
 4 48029 Bexar County, Texas  138705 Both sexes Age 0 to 4 years   Both Hispani…
 5 48029 Bexar County, Texas   45433 Both sexes Age 0 to 4 years   Non-Hispanic 
 6 48029 Bexar County, Texas   93272 Both sexes Age 0 to 4 years   Hispanic     
 7 48029 Bexar County, Texas  141277 Both sexes Age 5 to 9 years   Both Hispani…
 8 48029 Bexar County, Texas   46037 Both sexes Age 5 to 9 years   Non-Hispanic 
 9 48029 Bexar County, Texas   95240 Both sexes Age 5 to 9 years   Hispanic     
10 48029 Bexar County, Texas  142589 Both sexes Age 10 to 14 years Both Hispani…
# ℹ 200 more rows

Now, we do a little data preparation: keep the individual age groups, broken out by sex and Hispanic/non-Hispanic origin, and make the male values negative so their bars extend to the left. This saves the data into a new data frame that is ready to use in a population pyramid viz.

compare <- filter(bx_age_hisp, str_detect(AGEGROUP, "^Age"), 
                  HISP != "Both Hispanic Origins", 
                  SEX != "Both sexes") %>%
  mutate(value = ifelse(SEX == "Male", -value, value))
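
To see what this step does, here is a small base R sketch of the same idea on invented numbers. The data frame toy_pop and its counts are hypothetical; only the column names mirror the get_estimates() output.

```r
# Hypothetical population counts (not Census data) in the same long layout
toy_pop <- data.frame(
  AGEGROUP = c("All ages", "Age 0 to 4 years", "Age 0 to 4 years", "Age 0 to 4 years"),
  SEX      = c("Both sexes", "Both sexes", "Male", "Female"),
  value    = c(1000, 120, 62, 58)
)

# Keep single age groups and drop the "Both sexes" totals
keep <- grepl("^Age", toy_pop$AGEGROUP) & toy_pop$SEX != "Both sexes"
toy_compare <- toy_pop[keep, ]

# Flip the sign for males so their bars point left of zero
toy_compare$value <- ifelse(toy_compare$SEX == "Male",
                            -toy_compare$value,
                            toy_compare$value)

print(toy_compare)
```

Only the Male and Female rows for the single age group remain, and the male count is now negative.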

Finally, this code creates the visualization.

ggplot(compare, aes(x = AGEGROUP, y = value, fill = SEX)) + 
  geom_bar(stat = "identity", width = 1) + 
  theme_minimal() + 
  scale_y_continuous(labels = function(y) paste0(abs(y / 1000), "k")) + 
  scale_x_discrete(labels = function(x) gsub("Age | years", "", x)) + 
  scale_fill_manual(values = c("mediumorchid", "yellowgreen")) + 
  coord_flip() + 
  facet_wrap(~HISP) + 
  labs(x = "", 
       y = "2019 Census Bureau population estimate", 
       title = "Population structure by Hispanic origin", 
       subtitle = "Bexar County, TX", 
       fill = "", 
       caption = "Data source: US Census Bureau population estimates & tidycensus R package")
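
The two label functions in the scales are easier to follow when run on their own. In this sketch they are given names (fmt_thousands and strip_age are hypothetical names, and the inputs are sample values) so they can be tried outside the plot:

```r
# The same anonymous functions used in the scales above, given names
fmt_thousands <- function(y) paste0(abs(y / 1000), "k")  # -50000 becomes "50k"
strip_age     <- function(x) gsub("Age | years", "", x)  # drop "Age " and " years"

# Negative (male) values are shown as positive thousands on the axis
print(fmt_thousands(c(-50000, 0, 75000)))

# Age group labels are shortened for the axis
print(strip_age(c("Age 0 to 4 years", "Age 85 years and older")))
```

The abs() call is what lets the male bars extend left while the axis still reads as positive counts.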

3. Neighborhood Deprivation Index Example

library(sfdep)
library(spdep)
library(gridExtra)
library(grid)
library(ndi)
library(sf)
options(scipen=999)

This one is a little more complicated than the other two examples because it goes beyond mapping a single variable: the index is calculated from several ACS variables.

Get the NDI Scores

The chunk below computes the scores with the ndi package, which has the built-in formula for calculating the neighborhood deprivation index, gets the census tract shapefiles we need from tigris, and joins the two by GEOID.

# Compute the NDI (Messer) values (2017-2021 5-year ACS) for TX census tracts
# census_api_key("YOUR_CENSUS_API_KEY")  # set your own key; do not publish it
TX2021messer <- ndi::messer(state = "TX", year = 2021)

# Obtain the 2021 census tracts from the "tigris" package
tract2021TX <- tigris::tracts(state = "TX", year = 2021, cb = TRUE)

# Join the NDI (Messer) values to the census tract geometry
TX2021messer <- merge(tract2021TX, TX2021messer$ndi, by = "GEOID")

TX2021messer  #we can see the scores

Simple feature collection with 6885 features and 26 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007
Geodetic CRS:  NAD83
First 10 features:
         GEOID STATEFP COUNTYFP TRACTCE             AFFGEOID    NAME
1  48001950100      48      001  950100 1400000US48001950100    9501
2  48001950401      48      001  950401 1400000US48001950401 9504.01
3  48001950402      48      001  950402 1400000US48001950402 9504.02
4  48001950500      48      001  950500 1400000US48001950500    9505
5  48001950600      48      001  950600 1400000US48001950600    9506
6  48001950700      48      001  950700 1400000US48001950700    9507
7  48001950800      48      001  950800 1400000US48001950800    9508
8  48001950901      48      001  950901 1400000US48001950901 9509.01
9  48001950902      48      001  950902 1400000US48001950902 9509.02
10 48001951001      48      001  951001 1400000US48001951001 9510.01
               NAMELSAD STUSPS      NAMELSADCO STATE_NAME LSAD     ALAND
1     Census Tract 9501     TX Anderson County      Texas   CT 483306613
2  Census Tract 9504.01     TX Anderson County      Texas   CT  16509268
3  Census Tract 9504.02     TX Anderson County      Texas   CT  71134275
4     Census Tract 9505     TX Anderson County      Texas   CT  23132052
5     Census Tract 9506     TX Anderson County      Texas   CT  20653881
6     Census Tract 9507     TX Anderson County      Texas   CT   6720934
7     Census Tract 9508     TX Anderson County      Texas   CT  10429389
8  Census Tract 9509.01     TX Anderson County      Texas   CT 290214322
9  Census Tract 9509.02     TX Anderson County      Texas   CT 441347126
10 Census Tract 9510.01     TX Anderson County      Texas   CT 358736667
    AWATER state          county   tract           NDI               NDIQuart
1  7864313 Texas Anderson County    9501  0.0049611245 3-AboveAvg deprivation
2   298419 Texas Anderson County 9504.01  0.0034580352 3-AboveAvg deprivation
3  2626492 Texas Anderson County 9504.02           NaN        9-NDI not avail
4    99223 Texas Anderson County    9505  0.0286627270 3-AboveAvg deprivation
5   329641 Texas Anderson County    9506  0.0821374170     4-Most deprivation
6     6724 Texas Anderson County    9507  0.0418867367     4-Most deprivation
7    92101 Texas Anderson County    9508  0.0018040321 3-AboveAvg deprivation
8  4738880 Texas Anderson County 9509.01 -0.0181186979 2-BelowAvg deprivation
9  4984901 Texas Anderson County 9509.02 -0.0009733503 3-AboveAvg deprivation
10 3015204 Texas Anderson County 9510.01  0.0053607285 3-AboveAvg deprivation
           OCC        CWD        POV        FHH        PUB       U30        EDU
1  0.047619048 0.06472847 0.15633571 0.08008777 0.13768513 0.2397148 0.09868421
2  0.000000000 0.14814815 0.00000000 0.22222222 0.03703704 0.0000000 0.31136482
3          NaN        NaN        NaN        NaN        NaN       NaN 0.28819444
4  0.007231405 0.07179115 0.19796954 0.13052937 0.19869471 0.3176215 0.11333333
5  0.000000000 0.02866076 0.34966128 0.06253257 0.43460135 0.4752475 0.31087533
6  0.000000000 0.06965944 0.17801858 0.14705882 0.15325077 0.4256966 0.21677803
7  0.022050717 0.08348933 0.07937102 0.10557844 0.18120554 0.2036690 0.08281325
8  0.081652257 0.02723971 0.09624697 0.06779661 0.06476998 0.2536320 0.15808470
9  0.022706630 0.06492843 0.08895706 0.05112474 0.17177914 0.2530675 0.18296169
10 0.020157756 0.01073729 0.15032212 0.05225483 0.21188261 0.2705798 0.08664260
           EMP                       geometry
1  0.075047801 MULTIPOLYGON (((-95.69483 3...
2  0.000000000 MULTIPOLYGON (((-95.84761 3...
3          NaN MULTIPOLYGON (((-95.98345 3...
4  0.028537455 MULTIPOLYGON (((-95.68779 3...
5  0.033364662 MULTIPOLYGON (((-95.70758 3...
6  0.034448819 MULTIPOLYGON (((-95.64951 3...
7  0.040482574 MULTIPOLYGON (((-95.62881 3...
8  0.040043884 MULTIPOLYGON (((-95.88044 3...
9  0.005847953 MULTIPOLYGON (((-95.70584 3...
10 0.084521385 MULTIPOLYGON (((-95.58587 3...
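
One thing to keep in mind about the join: by default merge() keeps only the rows whose GEOID appears in both tables. Here is a minimal sketch with plain data frames standing in for the tract geometry and the NDI table. The names toy_geo and toy_ndi and all values are hypothetical.

```r
# Hypothetical stand-ins: a tract table and a scores table keyed by GEOID
toy_geo <- data.frame(GEOID = c("001", "002", "003"), NAME = c("A", "B", "C"))
toy_ndi <- data.frame(GEOID = c("001", "002", "004"), NDI = c(0.01, -0.02, 0.05))

# Default merge: only GEOIDs present in both tables are kept
inner <- merge(toy_geo, toy_ndi, by = "GEOID")

# all.x = TRUE keeps every row of the first table and fills gaps with NA
left <- merge(toy_geo, toy_ndi, by = "GEOID", all.x = TRUE)

print(inner)
print(left)
```

Comparing the row counts before and after a join is a quick way to check whether any tracts were dropped.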

Visualize the NDI

This is the continuous index:

# Visualize the NDI (Messer) values for TX, U.S.A., census tracts
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = TX2021messer, 
                   ggplot2::aes(fill = NDI),
                   size = 0.20) +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2021 5 Year estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)",
                   subtitle = "TX counties as the referent")

# Same map, now with dark blue tract outlines
ggplot() + geom_sf(data = TX2021messer, 
                   ggplot2::aes(fill = NDI),
                   size = 0.20, color="darkblue") +
  theme_minimal() + 
  scale_fill_viridis_c() +
  labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2021 5 Year estimates") +
  ggtitle("Neighborhood Deprivation Index (Messer)",
                   subtitle = "TX counties as the referent")

This is what it looks like when we rerun it as a categorical index (quartiles):

TX2021messer$NDIQuartNA <- factor(replace(as.character(TX2021messer$NDIQuart), 
                                            TX2021messer$NDIQuart == "9-NDI not avail", NA),
                                         c(levels(TX2021messer$NDIQuart)[-5], NA))

txndi<-ggplot2::ggplot() + 
  geom_sf(data = TX2021messer, 
                   ggplot2::aes(fill = NDIQuartNA),
                   size = 0.20,
                   color = "darkblue") +
  theme_minimal() + 
  scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2021 5-Year estimates. Author: Coda Rayo-Garza") +
  ggtitle("Texas Neighborhood Deprivation Index (Messer) Quartiles",
                   subtitle = "TX counties as the referent") 

txndi
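
The factor() line at the top of that chunk is dense, so here is a small base R sketch of the same idiom on a toy vector. The vector ndi_quart is made up, and the "1-Least deprivation" label is an assumed name for the first level, which does not appear in the printed output above.

```r
# Hypothetical quartile labels, including the "not available" category
ndi_quart <- factor(c("1-Least deprivation", "2-BelowAvg deprivation",
                      "3-AboveAvg deprivation", "4-Most deprivation",
                      "9-NDI not avail", "4-Most deprivation"))

# Turn "9-NDI not avail" into NA, then rebuild the factor without that level.
# factor() drops NA from the levels by default, so four levels remain.
ndi_quart_na <- factor(
  replace(as.character(ndi_quart), ndi_quart == "9-NDI not avail", NA),
  c(levels(ndi_quart)[-5], NA)
)

print(levels(ndi_quart_na))
print(sum(is.na(ndi_quart_na)))
```

Because the unavailable tracts are now true NA values, the fill scale draws them with its na.value color instead of giving them their own legend entry.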

Make it Interactive!

# library(plotly)
# 
# ggplotly(txndi) %>%
#   highlight(
#     "plotly_hover",
#     selected = attrs_selected(line = list(color = "black"))
# )

There are so many easy and cool things we can do with the Tidycensus package to visualize our data!

Other Useful Stuff

When we call the variables, it can get a little taxing to sort through them to identify the ones we need. It is not a big deal if we do it often enough, because we can mostly recall them. However, I like lists. They make my life easier. So, here is a list of some of the most common socio-demographic variables that I find useful.

First, the code below shows you how to search for specific variables by name. The second line (all_vars) saves all of the variables, which you can then sort through manually.

tx_vars <- load_variables(2021, "acs1", cache = TRUE) %>%
  filter(name %in% c("B01003_001", "B01001_003", "B01001_004", "B01001_005", "B01001_006", "B01001_027", "B01001_028", "B01001_029", "B01001_030"  ))

all_vars <- load_variables(2021, "acs1", cache = TRUE)
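
If you do not know the variable codes yet, searching the labels is another option. Here is a minimal base R sketch on a three-row stand-in for the variables table. The data frame toy_vars is hypothetical, and its label and concept text is abridged for illustration; the real table returned by load_variables() has name, label, and concept columns.

```r
# Hypothetical three-row stand-in for the table load_variables() returns
toy_vars <- data.frame(
  name    = c("B01001_003", "B01001_027", "B19013_001"),
  label   = c("Estimate!!Total:!!Male:!!Under 5 years",
              "Estimate!!Total:!!Female:!!Under 5 years",
              "Estimate!!Median household income"),
  concept = c("SEX BY AGE", "SEX BY AGE", "MEDIAN HOUSEHOLD INCOME"),
  stringsAsFactors = FALSE
)

# Look up variables by their codes
by_name <- toy_vars[toy_vars$name %in% c("B01001_003", "B19013_001"), ]

# Or search the labels for a phrase
by_label <- toy_vars[grepl("Under 5 years", toy_vars$label, fixed = TRUE), ]

print(by_name$name)
print(by_label$name)
```

The same two patterns should carry over to the full table with filter() and str_detect() from the tidyverse.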

List of Variables

List of Most Common Variables for Children/Youth + Other Socio-Dem Characteristics

AGE x SEX

B01003_001: Total

B01001_003: Estimate!!Total:!!Male:!!Under 5 years

B01001_004: Estimate!!Total:!!Male:!!5 to 9 years

B01001_005: Estimate!!Total:!!Male:!!10 to 14 years

B01001_006: Estimate!!Total:!!Male:!!15 to 17 years

B01001_027: Estimate!!Total:!!Female:!!Under 5 years

B01001_028: Estimate!!Total:!!Female:!!5 to 9 years

B01001_029: Estimate!!Total:!!Female:!!10 to 14 years

B01001_030: Estimate!!Total:!!Female:!!15 to 17 years

SOCIOECONOMIC

B19013_001: Median Household Income

B19013B_001: MHI Black or African American Alone

B19013C_001: MHI American Indian and Alaska Native Alone

B19013D_001: MHI Asian Alone

B19013E_001: MHI Native Hawaiian and Other Pacific Islander Alone

B19013F_001: MHI Some Other Race Alone

B19013G_001: MHI Two or More Races

B19013H_001: MHI White Alone, Not Hispanic or Latino

B19013I_001: MHI Hispanic or Latino

LANGUAGE

AGE BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED ENGLISH SPEAKING HOUSEHOLDS

B16003_002: Estimate!!Total:!!5 to 17 years:

B16003_003: Estimate!!Total:!!5 to 17 years:!!Speak only English

B16003_004: Estimate!!Total:!!5 to 17 years:!!Speak Spanish

I would like to work on creating code that combines estimates across different variables to generate percentages. For example, if you wanted to know the percentage of people under 18 years old, you would add the age-by-sex variables listed above (B01001_003 through B01001_006 and B01001_027 through B01001_030) and divide by the total population estimate (B01003_001). This code is in the works. Stay tuned to see how much % code I can generate!

For example, this code (theoretically) creates a single variable for the share of all persons (f/m) under 18 years old. It assumes a data frame, here called tx_acs, pulled with get_acs(..., output = "wide"), where each estimate column ends in E. Note that tx_vars only holds variable names and labels, not counts.

Under_18 <- with(tx_acs, (B01001_003E + B01001_004E + B01001_005E + B01001_006E +
                          B01001_027E + B01001_028E + B01001_029E + B01001_030E) / B01003_001E) # PER_UNDER_18
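
To check the arithmetic without calling the Census API, here is a minimal sketch on a one-row table with invented counts. The data frame toy_wide and the helper vector under_18_cols are hypothetical; only the column naming (variable code plus E) follows the wide output format.

```r
# Hypothetical one-row "wide" table; the counts are invented for illustration
toy_wide <- data.frame(
  B01003_001E = 1000,
  B01001_003E = 30, B01001_004E = 32, B01001_005E = 33, B01001_006E = 20,
  B01001_027E = 29, B01001_028E = 31, B01001_029E = 32, B01001_030E = 18
)

# Build the eight under-18 column names: males _003 to _006, females _027 to _030
under_18_cols <- c(paste0("B01001_00", 3:6, "E"),
                   paste0("B01001_0", 27:30, "E"))

# Sum the under-18 estimates in each row and divide by the total population
toy_wide$per_under_18 <- rowSums(toy_wide[, under_18_cols]) / toy_wide$B01003_001E

print(toy_wide$per_under_18)
```

Building the column names in a vector keeps the formula short and makes it harder to mistype one of the eight codes.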

…Until Next Time!