Cleveland and Choropleth

Harold Nelson

2022-10-23

Setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(tidycensus)
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
#census_api_key("Your Census API  Key",install = TRUE)

Get ACS Variables

We’ll use the 2020 five year ACS Survey. Let’s get all the variable names.

v20 = load_variables(year = 2020,
                    "acs5", 
                    cache = TRUE)
nrow(v20)
## [1] 27850

Check the Head

Solution

head(v20)
## # A tibble: 6 × 4
##   name       label                                   concept    geography  
##   <chr>      <chr>                                   <chr>      <chr>      
## 1 B01001_001 Estimate!!Total:                        SEX BY AGE block group
## 2 B01001_002 Estimate!!Total:!!Male:                 SEX BY AGE block group
## 3 B01001_003 Estimate!!Total:!!Male:!!Under 5 years  SEX BY AGE block group
## 4 B01001_004 Estimate!!Total:!!Male:!!5 to 9 years   SEX BY AGE block group
## 5 B01001_005 Estimate!!Total:!!Male:!!10 to 14 years SEX BY AGE block group
## 6 B01001_006 Estimate!!Total:!!Male:!!15 to 17 years SEX BY AGE block group

We can see that the total population is given with the variable name B010001_001.

For this exercise, I want to find the proportion of Bachelor’s degree holders in the population of each Washington county.

Find

rows in v20 where the name contains “B15012”. I know that the count of bachelor’s degree holders is in this table because of an exploration I did with Census Reporter. See https://censusreporter.org/

Solution

v20 %>% filter(str_detect(name,"B15012"))
## # A tibble: 16 × 4
##    name       label                                              concept geogr…¹
##    <chr>      <chr>                                              <chr>   <chr>  
##  1 B15012_001 Estimate!!Total:                                   TOTAL … block …
##  2 B15012_002 Estimate!!Total:!!Science and Engineering!!Comput… TOTAL … block …
##  3 B15012_003 Estimate!!Total:!!Science and Engineering!!Biolog… TOTAL … block …
##  4 B15012_004 Estimate!!Total:!!Science and Engineering!!Physic… TOTAL … block …
##  5 B15012_005 Estimate!!Total:!!Science and Engineering!!Psycho… TOTAL … block …
##  6 B15012_006 Estimate!!Total:!!Science and Engineering!!Social… TOTAL … block …
##  7 B15012_007 Estimate!!Total:!!Science and Engineering!!Engine… TOTAL … block …
##  8 B15012_008 Estimate!!Total:!!Science and Engineering!!Multid… TOTAL … block …
##  9 B15012_009 Estimate!!Total:!!Science and Engineering Related… TOTAL … block …
## 10 B15012_010 Estimate!!Total:!!Business                         TOTAL … block …
## 11 B15012_011 Estimate!!Total:!!Education                        TOTAL … block …
## 12 B15012_012 Estimate!!Total:!!Arts, Humanities, and Other!!Li… TOTAL … block …
## 13 B15012_013 Estimate!!Total:!!Arts, Humanities, and Other!!Li… TOTAL … block …
## 14 B15012_014 Estimate!!Total:!!Arts, Humanities, and Other!!Vi… TOTAL … block …
## 15 B15012_015 Estimate!!Total:!!Arts, Humanities, and Other!!Co… TOTAL … block …
## 16 B15012_016 Estimate!!Total:!!Arts, Humanities, and Other!!Ot… TOTAL … block …
## # … with abbreviated variable name ¹​geography

Get Raw Data

Use get_acs() to get data from the 2020 acs. We want the total number of bachelor’s degrees and the population for each county in Washington.

Solution

options(tigris_use_cache = TRUE)

raw_data = get_acs(year = 2020,
        variables = c(Bachelors = "B15012_001",
                      Pop = "B01001_001"),
        state = "WA",
        geography = "county",
        output = "wide",
        geometry = TRUE)
## Getting data from the 2016-2020 5-year ACS
head(raw_data)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -123.0237 ymin: 45.54354 xmax: -117.0398 ymax: 49.0007
## Geodetic CRS:  NAD83
##   GEOID                        NAME BachelorsE BachelorsM   PopE PopM
## 1 53053   Pierce County, Washington     180718       3399 891862   NA
## 2 53057   Skagit County, Washington      26949       1189 127442   NA
## 3 53047 Okanogan County, Washington       6619        610  42080   NA
## 4 53063  Spokane County, Washington     122374       2889 513402   53
## 5 53035   Kitsap County, Washington      70380       2043 268945   NA
## 6 53011    Clark County, Washington     112141       2683 481950   NA
##                         geometry
## 1 MULTIPOLYGON (((-122.6406 4...
## 2 MULTIPOLYGON (((-122.5377 4...
## 3 MULTIPOLYGON (((-120.8821 4...
## 4 MULTIPOLYGON (((-117.8233 4...
## 5 MULTIPOLYGON (((-122.5049 4...
## 6 MULTIPOLYGON (((-122.796 45...

Preparation

We want a file named bs_rate. It contains two variables.

Solution

bs_rate = raw_data %>% 
  mutate(bs_rate = BachelorsE / PopE,
         name = str_replace(NAME, " County, Washington",""))%>% 
  select(name, bs_rate)

head(bs_rate)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -123.0237 ymin: 45.54354 xmax: -117.0398 ymax: 49.0007
## Geodetic CRS:  NAD83
##       name   bs_rate                       geometry
## 1   Pierce 0.2026300 MULTIPOLYGON (((-122.6406 4...
## 2   Skagit 0.2114609 MULTIPOLYGON (((-122.5377 4...
## 3 Okanogan 0.1572956 MULTIPOLYGON (((-120.8821 4...
## 4  Spokane 0.2383590 MULTIPOLYGON (((-117.8233 4...
## 5   Kitsap 0.2616892 MULTIPOLYGON (((-122.5049 4...
## 6    Clark 0.2326818 MULTIPOLYGON (((-122.796 45...

Cleveland Dotplot

Modify the last slide in Walker’s first chapter to get a Cleveland Dotplot of bs_rate

Solution

dotplot <- ggplot(bs_rate, aes(x = bs_rate, y = reorder(name, bs_rate))) + 
  geom_point(color = "navy", size = 4) +
  theme_minimal(base_size = 12) +

  labs(x ="Ratio of Bachelors Degrees to Population", 
       y = "", 
       title = "Bachelors Degrees/Population for WA Counties")
  
dotplot

Choropleth

Modify what Walker did at the end of his last chapter to make a choropleth of this data.

Try it yourself and then see what I did.

Solution

# Set the color guide to FALSE and add a subtitle and caption to your map
ggplot(bs_rate, aes(fill = bs_rate), color = "white") + 
  geom_sf() + 
  scale_fill_viridis_c() +  
  theme_minimal() + 
  coord_sf(crs = 26911, datum = NA) + 
  labs(title = "Bachelors Degrees/Population for WA Counties", 
       caption = "Data source: 2020 ACS.\nData acquired with the R tidycensus package.", 
       fill = "ACS estimate")