Spark Talk
3 Super Easy Visuals Using Tidycensus
1. A Basic Map
Here, we map median household income by census tract in Bexar County, Texas, using the American Community Survey 2021 5-Year Estimates.
library(tidycensus)
library(tidyverse)
options(tigris_use_cache = TRUE)
# Download tract-level median household income (B19013_001) with geometry
bexar <- get_acs(
  state = "TX",
  county = "Bexar",
  geography = "tract",
  variables = "B19013_001",
  geometry = TRUE,
  year = 2021)
# Map the estimates, hiding the tract outlines
bexar %>%
  ggplot(aes(fill = estimate)) +
  geom_sf(color = NA) +
  scale_fill_viridis_c(option = "magma")
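Each `estimate` returned by get_acs() comes with a margin of error in the `moe` column, and tract-level income estimates can be noisy. The sketch below shows one way to screen for that before mapping. It uses a small invented table in place of the real download, and the 20% cutoff is an assumption for illustration, not an official threshold.

```r
# Toy stand-in for the tract table returned by get_acs(); GEOIDs and values are invented.
tracts <- data.frame(
  GEOID    = c("48029000001", "48029000002", "48029000003"),
  estimate = c(52000, 31000, 87000),
  moe      = c(6000, 12000, 9000)
)

# ACS margins of error are published at the 90% confidence level,
# so dividing by 1.645 recovers the standard error.
tracts$se <- tracts$moe / 1.645

# Coefficient of variation (%): higher values mean a noisier estimate.
tracts$cv <- 100 * tracts$se / tracts$estimate

# Flag tracts above a hypothetical 20% cutoff.
tracts$unreliable <- tracts$cv > 20
tracts
```

Flagged tracts could be greyed out or footnoted on the map rather than dropped.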
A More Detailed Map: Foreign Born Children Mapped By County + Tract
library(tidyverse)
library(tidycensus)
library(sf)
library(scales)
library(viridis)
# load all acs variables
acs2011720 <- load_variables(2021, "acs5", cache = T)
# children under 6 (B05009_002) and those living with two parents,
# split into native (B05009_004) and foreign born (B05009_005)
raw_two_par <- get_acs(geography = "tract",
                       variables = c(child_nav_born = "B05009_004",
                                     child_for_born = "B05009_005",
                                     child_tot_pop = "B05009_002"),
                       state = 'TX',
                       county = 'Bexar',
                       geometry = T,
                       year = 2021,
                       output = "wide")
# divide by the estimate column (E), not the margin of error column (M)
for_bor_under <- raw_two_par %>%
  mutate(pct_for_bor = child_for_bornE/child_tot_popE,
         pct_nav_bor = child_nav_bornE/child_tot_popE)
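With output = "wide", tidycensus appends E to estimate columns and M to margin-of-error columns, so a share should be built from the E columns. The sketch below illustrates that on an invented table, and adds a hypothetical helper (`safe_share`, not part of tidycensus) that returns NA for tracts with no children instead of dividing by zero.

```r
# Toy stand-in for the wide get_acs() output; all values are invented.
toy <- data.frame(
  child_for_bornE = c(10, 0, 5),
  child_nav_bornE = c(190, 0, 45),
  child_tot_popE  = c(400, 0, 100),  # estimate (E)
  child_tot_popM  = c(60, 12, 30)    # margin of error (M), not a denominator
)

# Hypothetical helper: divide, but return NA when the denominator is zero.
safe_share <- function(num, den) ifelse(den > 0, num / den, NA_real_)

toy$pct_for_bor <- safe_share(toy$child_for_bornE, toy$child_tot_popE)
toy$pct_nav_bor <- safe_share(toy$child_nav_bornE, toy$child_tot_popE)
toy
```

The NA values line up with the `na.value = "transparent"` setting in the map, so empty tracts simply are not filled.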
library(RColorBrewer)
ggplot() +
geom_sf(data = for_bor_under, mapping = aes(fill = pct_for_bor),
color = "white",
lwd = 1) + # draws white census tract outlines (use color = NA to remove them)
theme_void() +
scale_fill_distiller(breaks=c(0, .2, .4, .6, .8, 1),
direction = 1,
palette = "RdPu",
na.value = "transparent",
name="Percent Children Under 6 Foreign Born (%)",
labels=percent_format(accuracy = 1L)) +
labs(
title = "Bexar County, Texas Foreign Born Children",
subtitle= "Children Under 6 Foreign Born w/Two Parent by Census Tract",
caption = "Source: American Community Survey, 2021 5 Yr")
2. Population Pyramids!
First, load the required packages. The options(tigris_use_cache = TRUE) line caches the shapefiles downloaded by the tigris package so they are not fetched again on later runs.
# library(tidycensus)
# library(tidyverse)
# library(tigris)
# options(tigris_use_cache = TRUE)
# us_components <- get_estimates(geography = "state", product = "components")
#
# us_components
#
# unique(us_components$variable)
Next, get the estimates.
bx_age_hisp <- get_estimates(geography = "county",
                             product = "characteristics",
                             breakdown = c("SEX", "AGEGROUP", "HISP"),
                             breakdown_labels = TRUE,
                             state = "TX",
                             county = "Bexar",
                             year = 2019) # pinned to match the 2019 label used in the plot below
bx_age_hisp
# A tibble: 210 × 6
GEOID NAME value SEX AGEGROUP HISP
<chr> <chr> <dbl> <chr> <fct> <chr>
1 48029 Bexar County, Texas 2003554 Both sexes All ages Both Hispani…
2 48029 Bexar County, Texas 787766 Both sexes All ages Non-Hispanic
3 48029 Bexar County, Texas 1215788 Both sexes All ages Hispanic
4 48029 Bexar County, Texas 138705 Both sexes Age 0 to 4 years Both Hispani…
5 48029 Bexar County, Texas 45433 Both sexes Age 0 to 4 years Non-Hispanic
6 48029 Bexar County, Texas 93272 Both sexes Age 0 to 4 years Hispanic
7 48029 Bexar County, Texas 99002 Both sexes Age 10 to 14 years Hispanic
8 48029 Bexar County, Texas 141277 Both sexes Age 5 to 9 years Both Hispani…
9 48029 Bexar County, Texas 46037 Both sexes Age 5 to 9 years Non-Hispanic
10 48029 Bexar County, Texas 95240 Both sexes Age 5 to 9 years Hispanic
# ℹ 200 more rows
Now, we do a little analysis to calculate the estimates by sex and Hispanic/non-Hispanic origin. This saves the data into a new data frame, reorganized and ready to use in a population pyramid viz.
# keep single age groups, drop the combined totals,
# and make male counts negative so they plot to the left of zero
compare <- filter(bx_age_hisp, str_detect(AGEGROUP, "^Age"),
                  HISP != "Both Hispanic Origins",
                  SEX != "Both sexes") %>%
  mutate(value = ifelse(SEX == "Male", -value, value))
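To see what this reshaping does, here is a minimal sketch on a four-row invented table, written in base R so it runs without any downloads. The column names mirror the get_estimates() output shown earlier; the numbers are made up.

```r
# Toy rows in the same shape as the population estimates; values are invented.
pop <- data.frame(
  AGEGROUP = c("All ages", "Age 0 to 4 years", "Age 0 to 4 years", "Age 0 to 4 years"),
  SEX      = c("Both sexes", "Both sexes", "Male", "Female"),
  value    = c(1000, 100, 52, 48)
)

# Keep single age bands and drop the "Both sexes" totals.
pyr <- pop[grepl("^Age", pop$AGEGROUP) & pop$SEX != "Both sexes", ]

# Flip male counts negative so their bars extend left of zero.
pyr$value <- ifelse(pyr$SEX == "Male", -pyr$value, pyr$value)
pyr
```

Only the Male and Female rows for the age band survive, and the male count becomes negative, which is what lets a single bar chart read as a two-sided pyramid.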
Finally, this code creates the visualization.
ggplot(compare, aes(x = AGEGROUP, y = value, fill = SEX)) +
geom_bar(stat = "identity", width = 1) +
theme_minimal() +
scale_y_continuous(labels = function(y) paste0(abs(y / 1000), "k")) +
scale_x_discrete(labels = function(x) gsub("Age | years", "", x)) +
scale_fill_manual(values = c("mediumorchid", "yellowgreen")) +
coord_flip() +
facet_wrap(~HISP) +
labs(x = "",
y = "2019 Census Bureau population estimate",
title = "Population structure by Hispanic origin",
subtitle = "Bexar County, TX",
fill = "",
caption = "Data source: US Census Bureau population estimates & tidycensus R package")
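The two label functions passed to scale_y_continuous() and scale_x_discrete() are easy to check on their own. The sketch below pulls them out under made-up names (`label_k` and `strip_age`) and runs them on a few sample values.

```r
# Same anonymous functions as in the plot, given hypothetical names for testing.
label_k   <- function(y) paste0(abs(y / 1000), "k")
strip_age <- function(x) gsub("Age | years", "", x)

# Negative (male) values lose their sign and are shown in thousands.
label_k(c(-50000, 0, 75000))

# The "Age " prefix and " years" suffix are removed from the age group labels.
strip_age(c("Age 0 to 4 years", "Age 85 years and older"))
```

Because of abs(), the axis shows positive counts on both sides even though the male values are stored as negatives.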
3. Neighborhood Deprivation Index Example
library(sfdep)
library(spdep)
library(gridExtra)
library(grid)
library(ndi)
library(sf)
options(scipen=999)
This next map is a little more complicated than the other two examples because it uses more than basic variable analysis.
Get the NDI Scores
The chunk below computes the scores with the ndi package, which has a built-in formula for calculating the neighborhood deprivation index, then gets the tract shapefiles we need and joins the two together.
# Compute the NDI (Messer) values (2017-2021 5-year ACS) for TX census tracts
# census_api_key("YOUR_CENSUS_API_KEY")
TX2021messer <- ndi::messer(state = "TX", year = 2021)
# Obtain the 2021 census tracts from the "tigris" package
tract2021TX <- tigris::tracts(state = "TX", year = 2021, cb = TRUE)
# Join the NDI (Messer) values to the census tract geometry
TX2021messer <- merge(tract2021TX, TX2021messer$ndi, by = "GEOID")
# we can see the scores
TX2021messer
Simple feature collection with 6885 features and 26 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007
Geodetic CRS: NAD83
First 10 features:
GEOID STATEFP COUNTYFP TRACTCE AFFGEOID NAME
1 48001950100 48 001 950100 1400000US48001950100 9501
2 48001950401 48 001 950401 1400000US48001950401 9504.01
3 48001950402 48 001 950402 1400000US48001950402 9504.02
4 48001950500 48 001 950500 1400000US48001950500 9505
5 48001950600 48 001 950600 1400000US48001950600 9506
6 48001950700 48 001 950700 1400000US48001950700 9507
7 48001950800 48 001 950800 1400000US48001950800 9508
8 48001950901 48 001 950901 1400000US48001950901 9509.01
9 48001950902 48 001 950902 1400000US48001950902 9509.02
10 48001951001 48 001 951001 1400000US48001951001 9510.01
NAMELSAD STUSPS NAMELSADCO STATE_NAME LSAD ALAND
1 Census Tract 9501 TX Anderson County Texas CT 483306613
2 Census Tract 9504.01 TX Anderson County Texas CT 16509268
3 Census Tract 9504.02 TX Anderson County Texas CT 71134275
4 Census Tract 9505 TX Anderson County Texas CT 23132052
5 Census Tract 9506 TX Anderson County Texas CT 20653881
6 Census Tract 9507 TX Anderson County Texas CT 6720934
7 Census Tract 9508 TX Anderson County Texas CT 10429389
8 Census Tract 9509.01 TX Anderson County Texas CT 290214322
9 Census Tract 9509.02 TX Anderson County Texas CT 441347126
10 Census Tract 9510.01 TX Anderson County Texas CT 358736667
AWATER state county tract NDI NDIQuart
1 7864313 Texas Anderson County 9501 0.0049611245 3-AboveAvg deprivation
2 298419 Texas Anderson County 9504.01 0.0034580352 3-AboveAvg deprivation
3 2626492 Texas Anderson County 9504.02 NaN 9-NDI not avail
4 99223 Texas Anderson County 9505 0.0286627270 3-AboveAvg deprivation
5 329641 Texas Anderson County 9506 0.0821374170 4-Most deprivation
6 6724 Texas Anderson County 9507 0.0418867367 4-Most deprivation
7 92101 Texas Anderson County 9508 0.0018040321 3-AboveAvg deprivation
8 4738880 Texas Anderson County 9509.01 -0.0181186979 2-BelowAvg deprivation
9 4984901 Texas Anderson County 9509.02 -0.0009733503 3-AboveAvg deprivation
10 3015204 Texas Anderson County 9510.01 0.0053607285 3-AboveAvg deprivation
OCC CWD POV FHH PUB U30 EDU
1 0.047619048 0.06472847 0.15633571 0.08008777 0.13768513 0.2397148 0.09868421
2 0.000000000 0.14814815 0.00000000 0.22222222 0.03703704 0.0000000 0.31136482
3 NaN NaN NaN NaN NaN NaN 0.28819444
4 0.007231405 0.07179115 0.19796954 0.13052937 0.19869471 0.3176215 0.11333333
5 0.000000000 0.02866076 0.34966128 0.06253257 0.43460135 0.4752475 0.31087533
6 0.000000000 0.06965944 0.17801858 0.14705882 0.15325077 0.4256966 0.21677803
7 0.022050717 0.08348933 0.07937102 0.10557844 0.18120554 0.2036690 0.08281325
8 0.081652257 0.02723971 0.09624697 0.06779661 0.06476998 0.2536320 0.15808470
9 0.022706630 0.06492843 0.08895706 0.05112474 0.17177914 0.2530675 0.18296169
10 0.020157756 0.01073729 0.15032212 0.05225483 0.21188261 0.2705798 0.08664260
EMP geometry
1 0.075047801 MULTIPOLYGON (((-95.69483 3...
2 0.000000000 MULTIPOLYGON (((-95.84761 3...
3 NaN MULTIPOLYGON (((-95.98345 3...
4 0.028537455 MULTIPOLYGON (((-95.68779 3...
5 0.033364662 MULTIPOLYGON (((-95.70758 3...
6 0.034448819 MULTIPOLYGON (((-95.64951 3...
7 0.040482574 MULTIPOLYGON (((-95.62881 3...
8 0.040043884 MULTIPOLYGON (((-95.88044 3...
9 0.005847953 MULTIPOLYGON (((-95.70584 3...
10 0.084521385 MULTIPOLYGON (((-95.58587 3...
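The join is an ordinary merge() on the GEOID column. The sketch below uses two invented plain data frames (not sf objects, and with placeholder GEOIDs) to show what the default does: merge() keeps only rows found in both tables, and all.x = TRUE keeps every row of the first table instead.

```r
# Toy tables with placeholder GEOIDs; values are invented.
tracts <- data.frame(GEOID = c("A", "B", "C"),
                     NAME  = c("9501", "9504.01", "9504.02"))
scores <- data.frame(GEOID = c("A", "B"),
                     NDI   = c(0.005, 0.003))

# Default merge: only GEOIDs present in both tables are kept.
inner <- merge(tracts, scores, by = "GEOID")

# all.x = TRUE keeps every tract and fills missing scores with NA.
left <- merge(tracts, scores, by = "GEOID", all.x = TRUE)

nrow(inner)
nrow(left)
```

Comparing the row count before and after a join like this is a quick way to spot tracts that silently dropped out.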
Visualize the NDI
This is the continuous index
# Visualize the NDI (Messer) values for TX, U.S.A., census tracts
## Continuous Index
ggplot2::ggplot() +
  ggplot2::geom_sf(data = TX2021messer,
                   ggplot2::aes(fill = NDI),
                   size = 0.20) +
  ggplot2::theme_minimal() +
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2021 5 Year estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)",
                   subtitle = "TX census tracts as the referent")
# Same map with dark blue tract outlines
ggplot() + geom_sf(data = TX2021messer,
                   ggplot2::aes(fill = NDI),
                   size = 0.20, color = "darkblue") +
  theme_minimal() +
  scale_fill_viridis_c() +
  labs(fill = "Index (Continuous)",
       caption = "Source: U.S. Census ACS 2021 5 Year estimates") +
  ggtitle("Neighborhood Deprivation Index (Messer)",
          subtitle = "TX census tracts as the referent")
This is what it looks like when we rerun as categorical
# Recode "9-NDI not avail" as NA so those tracts get the na.value color
TX2021messer$NDIQuartNA <- factor(replace(as.character(TX2021messer$NDIQuart),
                                          TX2021messer$NDIQuart == "9-NDI not avail", NA),
                                  c(levels(TX2021messer$NDIQuart)[-5], NA))
txndi <- ggplot2::ggplot() +
  geom_sf(data = TX2021messer,
          ggplot2::aes(fill = NDIQuartNA),
          size = 0.20,
          color = "darkblue") +
  theme_minimal() +
  scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                       na.value = "grey80") +
  labs(fill = "Index (Categorical)",
       caption = "Source: U.S. Census ACS 2021 5-Year estimates. Author: Coda Rayo-Garza") +
  ggtitle("Texas Neighborhood Deprivation Index (Messer) Quartiles",
          subtitle = "TX census tracts as the referent")
txndi
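The recoding step before the plot is what turns the "9-NDI not avail" category into true missing values. Here is a minimal sketch of the same idea on a three-value invented factor. The "1-Least deprivation" label is an assumption; the other level names are taken from the printed scores earlier in the post.

```r
# Level names: "1-Least deprivation" is assumed, the rest appear in the printed output.
lv <- c("1-Least deprivation", "2-BelowAvg deprivation",
        "3-AboveAvg deprivation", "4-Most deprivation", "9-NDI not avail")
quart <- factor(c("4-Most deprivation", "9-NDI not avail", "2-BelowAvg deprivation"),
                levels = lv)

# Swap the placeholder for NA, then rebuild the factor without its level.
quart_na <- factor(
  replace(as.character(quart), quart == "9-NDI not avail", NA),
  levels = lv[-5]
)

levels(quart_na)
is.na(quart_na)
```

With the placeholder gone from the levels, the legend lists only the four quartiles and the unavailable tracts fall back to the na.value color.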
Make it Interactive!
library(plotly)
ggplotly(txndi) %>%
highlight(
"plotly_hover",
selected = attrs_selected(line = list(color = "black"))
)
There are so many easy and cool things we can do with the Tidycensus package to visualize our data!
Other Useful Stuff
When we call the variables, it can get a little taxing to sort through them to identify the ones we need. It is not a big deal if we do it often enough, since we can mostly recall them. However, I like lists. They make my life easier. So, here is a list of some of the most common socio-demographic variables that I find useful.
First, the code below shows you how to search for specific variables. The second line (all_vars) saves all of the variables, which you can then sort through manually.
# look up a specific set of variables by name
tx_vars <- load_variables(2021, "acs1", cache = TRUE) %>%
  filter(name %in% c("B01003_001", "B01001_003", "B01001_004", "B01001_005", "B01001_006", "B01001_027", "B01001_028", "B01001_029", "B01001_030"))
# keep the full variable list for manual browsing
all_vars <- load_variables(2021, "acs1", cache = TRUE)
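When the variable names are not known in advance, searching the label or concept text can be quicker than scrolling. The sketch below uses a three-row hand-typed table with the same column names that load_variables() returns (name, label, concept); the rows are abbreviated examples, not a real download.

```r
# Hand-typed stand-in for the load_variables() lookup table.
vars <- data.frame(
  name    = c("B01001_003", "B01001_027", "B19013_001"),
  label   = c("Estimate!!Total:!!Male:!!Under 5 years",
              "Estimate!!Total:!!Female:!!Under 5 years",
              "Estimate!!Median household income"),
  concept = c("SEX BY AGE", "SEX BY AGE", "MEDIAN HOUSEHOLD INCOME")
)

# Match an exact phrase in the label column.
under5 <- vars[grepl("Under 5 years", vars$label, fixed = TRUE), ]

# Case-insensitive match on the table concept.
income <- vars[grepl("income", vars$concept, ignore.case = TRUE), ]

under5$name
income$name
```

The same grepl() calls should work on all_vars, or inside filter() with str_detect() if you prefer the tidyverse style used elsewhere in this post.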
List of Variables
List of Most Variables for Children/Youth + Other Socio-Dem Characteristics
AGE x SEX
B01003_001: Total
B01001_003: Estimate!!Total:!!Male:!!Under 5 years
B01001_004: Estimate!!Total:!!Male:!!5 to 9 years
B01001_005: Estimate!!Total:!!Male:!!10 to 14 years
B01001_006: Estimate!!Total:!!Male:!!15 to 17 years
B01001_027: Estimate!!Total:!!Female:!!Under 5 years
B01001_028: Estimate!!Total:!!Female:!!5 to 9 years
B01001_029: Estimate!!Total:!!Female:!!10 to 14 years
B01001_030: Estimate!!Total:!!Female:!!15 to 17 years
SOCIOECONOMIC
B19013_001: Median Household Income
B19013B_001: MHI Black or African American Alone
B19013C_001: MHI American Indian and Alaska Native Alone
B19013D_001: MHI Asian Alone
B19013E_001: MHI Native Hawaiian and Other Pacific Islander Alone
B19013F_001: MHI Some Other Race Alone
B19013G_001: MHI Two or More Races
B19013H_001: MHI White Alone, Not Hispanic or Latino
B19013I_001: MHI Hispanic or Latino
LANGUAGE
AGE BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED ENGLISH SPEAKING HOUSEHOLDS
B16003_002: Estimate!!Total:!!5 to 17 years:
B16003_003: Estimate!!Total:!!5 to 17 years:!!Speak only English
B16003_004: Estimate!!Total:!!5 to 17 years:!!Speak Spanish
I would like to work on creating code that combines estimates across different variables to generate percentages. For example, if you wanted to know the percentage of people under 18 years old, you would add the male and female age-group variables listed above and divide by the total population estimate. This code is in the works. Stay tuned to see how much % code I can generate!
For example, this code (theoretically) creates a single variable for all persons (f/m) under 18 years old.
# Assumes tx_vars holds the estimates with one column per variable (not the load_variables() lookup table above)
Under_18 <- (tx_vars$B01001_003 + tx_vars$B01001_004 + tx_vars$B01001_005 + tx_vars$B01001_006 + tx_vars$B01001_027 + tx_vars$B01001_028 + tx_vars$B01001_029 + tx_vars$B01001_030) / tx_vars$B01003_001 # PER_UNDER_18
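As a sketch of how that calculation might look once the estimates are in hand, the example below assumes a wide table with one estimate column per variable, named with the E suffix that tidycensus uses for output = "wide". The table here is a single invented row, and `tx` is a hypothetical name.

```r
# One invented row in the shape of a wide get_acs() result.
tx <- data.frame(
  NAME = "Example area",
  B01003_001E = 1000,
  B01001_003E = 30, B01001_004E = 32, B01001_005E = 33, B01001_006E = 20,
  B01001_027E = 29, B01001_028E = 31, B01001_029E = 32, B01001_030E = 18
)

# Build the eight under-18 column names instead of typing each one.
under18_vars <- paste0("B01001_",
                       c("003", "004", "005", "006", "027", "028", "029", "030"),
                       "E")

# Sum across the columns, then divide by the total population.
tx$under_18     <- rowSums(tx[, under18_vars])
tx$per_under_18 <- tx$under_18 / tx$B01003_001E
tx[, c("NAME", "under_18", "per_under_18")]
```

Building the column names with paste0() avoids the kind of typo that is easy to make when writing out eight nearly identical variable names by hand.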
…Until Next Time!