Harold Nelson
2022-10-23
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
We’ll use the 2020 five year ACS Survey. Let’s get all the variable names.
## [1] 27850
## # A tibble: 6 × 4
## name label concept geography
## <chr> <chr> <chr> <chr>
## 1 B01001_001 Estimate!!Total: SEX BY AGE block group
## 2 B01001_002 Estimate!!Total:!!Male: SEX BY AGE block group
## 3 B01001_003 Estimate!!Total:!!Male:!!Under 5 years SEX BY AGE block group
## 4 B01001_004 Estimate!!Total:!!Male:!!5 to 9 years SEX BY AGE block group
## 5 B01001_005 Estimate!!Total:!!Male:!!10 to 14 years SEX BY AGE block group
## 6 B01001_006 Estimate!!Total:!!Male:!!15 to 17 years SEX BY AGE block group
We can see that the total population is given with the variable name B010001_001.
For this exercise, I want to find the proportion of Bachelor’s degree holders in the population of each Washington county.
rows in v20 where the name contains “B15012”. I know that the count of bachelor’s degree holders is in this table because of an exploration I did with Census Reporter. See https://censusreporter.org/
## # A tibble: 16 × 4
## name label concept geogr…¹
## <chr> <chr> <chr> <chr>
## 1 B15012_001 Estimate!!Total: TOTAL … block …
## 2 B15012_002 Estimate!!Total:!!Science and Engineering!!Comput… TOTAL … block …
## 3 B15012_003 Estimate!!Total:!!Science and Engineering!!Biolog… TOTAL … block …
## 4 B15012_004 Estimate!!Total:!!Science and Engineering!!Physic… TOTAL … block …
## 5 B15012_005 Estimate!!Total:!!Science and Engineering!!Psycho… TOTAL … block …
## 6 B15012_006 Estimate!!Total:!!Science and Engineering!!Social… TOTAL … block …
## 7 B15012_007 Estimate!!Total:!!Science and Engineering!!Engine… TOTAL … block …
## 8 B15012_008 Estimate!!Total:!!Science and Engineering!!Multid… TOTAL … block …
## 9 B15012_009 Estimate!!Total:!!Science and Engineering Related… TOTAL … block …
## 10 B15012_010 Estimate!!Total:!!Business TOTAL … block …
## 11 B15012_011 Estimate!!Total:!!Education TOTAL … block …
## 12 B15012_012 Estimate!!Total:!!Arts, Humanities, and Other!!Li… TOTAL … block …
## 13 B15012_013 Estimate!!Total:!!Arts, Humanities, and Other!!Li… TOTAL … block …
## 14 B15012_014 Estimate!!Total:!!Arts, Humanities, and Other!!Vi… TOTAL … block …
## 15 B15012_015 Estimate!!Total:!!Arts, Humanities, and Other!!Co… TOTAL … block …
## 16 B15012_016 Estimate!!Total:!!Arts, Humanities, and Other!!Ot… TOTAL … block …
## # … with abbreviated variable name ¹geography
Use get_acs() to get data from the 2020 acs. We want the total number of bachelor’s degrees and the population for each county in Washington.
options(tigris_use_cache = TRUE)
raw_data = get_acs(year = 2020,
variables = c(Bachelors = "B15012_001",
Pop = "B01001_001"),
state = "WA",
geography = "county",
output = "wide",
geometry = TRUE)
## Getting data from the 2016-2020 5-year ACS
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -123.0237 ymin: 45.54354 xmax: -117.0398 ymax: 49.0007
## Geodetic CRS: NAD83
## GEOID NAME BachelorsE BachelorsM PopE PopM
## 1 53053 Pierce County, Washington 180718 3399 891862 NA
## 2 53057 Skagit County, Washington 26949 1189 127442 NA
## 3 53047 Okanogan County, Washington 6619 610 42080 NA
## 4 53063 Spokane County, Washington 122374 2889 513402 53
## 5 53035 Kitsap County, Washington 70380 2043 268945 NA
## 6 53011 Clark County, Washington 112141 2683 481950 NA
## geometry
## 1 MULTIPOLYGON (((-122.6406 4...
## 2 MULTIPOLYGON (((-122.5377 4...
## 3 MULTIPOLYGON (((-120.8821 4...
## 4 MULTIPOLYGON (((-117.8233 4...
## 5 MULTIPOLYGON (((-122.5049 4...
## 6 MULTIPOLYGON (((-122.796 45...
We want a file named bs_rate. It contains two variables.
bs_rate = raw_data %>%
mutate(bs_rate = BachelorsE / PopE,
name = str_replace(NAME, " County, Washington",""))%>%
select(name, bs_rate)
head(bs_rate)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -123.0237 ymin: 45.54354 xmax: -117.0398 ymax: 49.0007
## Geodetic CRS: NAD83
## name bs_rate geometry
## 1 Pierce 0.2026300 MULTIPOLYGON (((-122.6406 4...
## 2 Skagit 0.2114609 MULTIPOLYGON (((-122.5377 4...
## 3 Okanogan 0.1572956 MULTIPOLYGON (((-120.8821 4...
## 4 Spokane 0.2383590 MULTIPOLYGON (((-117.8233 4...
## 5 Kitsap 0.2616892 MULTIPOLYGON (((-122.5049 4...
## 6 Clark 0.2326818 MULTIPOLYGON (((-122.796 45...
Modify the last slide in Walker’s first chapter to get a Cleveland Dotplot of bs_rate
dotplot <- ggplot(bs_rate, aes(x = bs_rate, y = reorder(name, bs_rate))) +
geom_point(color = "navy", size = 4) +
theme_minimal(base_size = 12) +
labs(x ="Ratio of Bachelors Degrees to Population",
y = "",
title = "Bachelors Degrees/Population for WA Counties")
dotplot
Modify what Walker did at the end of his last chapter to make a choropleth of this data.
Try it yourself and then see what I did.
# Set the color guide to FALSE and add a subtitle and caption to your map
ggplot(bs_rate, aes(fill = bs_rate), color = "white") +
geom_sf() +
scale_fill_viridis_c() +
theme_minimal() +
coord_sf(crs = 26911, datum = NA) +
labs(title = "Bachelors Degrees/Population for WA Counties",
caption = "Data source: 2020 ACS.\nData acquired with the R tidycensus package.",
fill = "ACS estimate")