Introduction

The purpose of this lab is to work with US Census Data in the RStudio environment. Using the tidycensus package to easily extract up to data American Community Survey data directly from the US Census Bureau data portal. This lab works with ggplotly and ggirafe packages to create interactive visualizations for the census data. The sf and leaflet packages were utilized to create interactive maps with census data.

# install.packages(c("tidycensus", "tidyverse"))

# install.packages(c("mapview", "plotly", "ggiraph", "survey", "srvyr"))

library(tidycensus)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(ggiraph)
library(sf)
## Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
library(leaflet)
library(mapview)
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

Part A - Non-spatial analysis and data visualization with tidycensus

Part 1.1 Accessing American Community Survey (ACS) data

# Part 1. The get_acs() was used to fetch data on the percentage of the population that have a graduate degree ("DP02_0066P") at the county level for Virginia.
grad_pop <- get_acs(
  geography = "county",
  variables = "DP02_0066P",
  state = "VA"
  
) 
## Getting data from the 2019-2023 5-year ACS
## Warning: • You have not set a Census API key. Users without a key are limited to 500
## queries per day and may experience performance limitations.
## ℹ For best results, get a Census API key at
## http://api.census.gov/data/key_signup.html and then supply the key to the
## `census_api_key()` function to use it throughout your tidycensus session.
## This warning is displayed once per session.
## Using the ACS Data Profile
 # Part 1.1. Which counties in the selected state have the largest percentages of graduate degree holders?
highest_grad_pop <- head(arrange(grad_pop, desc(estimate)))
highest_grad_pop
## # A tibble: 6 × 5
##   GEOID NAME                           variable   estimate   moe
##   <chr> <chr>                          <chr>         <dbl> <dbl>
## 1 51610 Falls Church city, Virginia    DP02_0066P     49.8   3.5
## 2 51013 Arlington County, Virginia     DP02_0066P     41.4   1  
## 3 51510 Alexandria city, Virginia      DP02_0066P     34.4   1.2
## 4 51540 Charlottesville city, Virginia DP02_0066P     33.4   2.3
## 5 51059 Fairfax County, Virginia       DP02_0066P     32.4   0.4
## 6 51678 Lexington city, Virginia       DP02_0066P     32.4   8

Part 1.2 Data navigation and retrieving percentages of graduate degree holders

  # Part 1.2. Finding which county have the smallest percentage of graduate degree holders?
lowest_grad_pops <- head(arrange(grad_pop, estimate))
lowest_grad_pops
## # A tibble: 6 × 5
##   GEOID NAME                         variable   estimate   moe
##   <chr> <chr>                        <chr>         <dbl> <dbl>
## 1 51051 Dickenson County, Virginia   DP02_0066P      1.9   0.8
## 2 51105 Lee County, Virginia         DP02_0066P      3.6   1  
## 3 51580 Covington city, Virginia     DP02_0066P      3.6   1.6
## 4 51111 Lunenburg County, Virginia   DP02_0066P      3.7   1.2
## 5 51141 Patrick County, Virginia     DP02_0066P      3.7   1.2
## 6 51081 Greensville County, Virginia DP02_0066P      3.8   1.3

Part 1.3 Visualization of data with a margin of error plot

  # Part 1.3. Making a margin of error plot for Virginia. This method does not work well for Virginia due to the large amount of counties. This work provides a good example of how to project target data with a margin of error and a high amount of control over labelling. 
moe_grad_plot <- ggplot(grad_pop, aes(x = estimate, y = reorder(NAME, estimate))) +
  geom_col(color = "darkblue") +
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe), width = .2, color = "red") +
  scale_x_continuous(labels = scales::number) +
  # Can use the "str_remove" function in scale_y_discrete to remove unwanted string values from the names of fields
  scale_y_discrete(labels = function(x) str_remove(x, " County Virginia|, Virginia")) +
  labs(
    title = "Graduate Degree Holders by County in Virginia",
    x = "Percentages of Graduate Degree Holders",
    y = "County",
    caption = "Data Source: U.S Census Bureau ACS 2019-2023"
  )
moe_grad_plot

Part 2. Making interactive charts with plotly

# Part 2. Using the plotly package to convert the data created and plotted in Part 1.3 into an interactive chart with margin of error included.
grad_plot <- ggplot(data = grad_pop, aes(x = estimate,
                                 y = reorder(NAME, estimate))) + 
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
                width = 0.5, linewidth = 0.5) +
  geom_point(color = "darkred", size = 2) + 
  scale_x_continuous(labels = label_number()) + # the label_number function is from the scales() package
  scale_y_discrete(labels = function(x) str_remove(x, " County, Virginia|, Virginia")) + 
  labs(title = "Median graduate degree holder, 2019-2023 ACS",
       subtitle = "Counties in Virginia",
       caption = "Data acquired with R and tidycensus",
       x = "ACS estimate",
       y = "")

# ggplotly takes the created plot and adds visualization components automatically
# A user can click around the graph to zoom in on specific data. It makes it easier to interact with the underlying information.
ggplotly(grad_plot)

Part 3 Making interactive charts with ggiraph

# Part 3. This optional activity includes using the ggiraph package to make the plot created in step 1 & 2 interactive with a different interactive package.
# ggiraph is an alternative approach for making ggplot2 graphics interactive, it includes *_interactive() versions of ggplot2 geoms.
grad_plot_girafe <- ggplot(grad_pop, aes(x = estimate,
                                  y = reorder(NAME, estimate))) + 
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
                width = 0.5, linewidth = 0.5) +
  geom_point_interactive(color = "darkred", size = 2) + 
  scale_x_continuous(labels = label_number()) + 
  scale_y_discrete(labels = function(x) str_remove(x, " County, Virginia|, Virginia")) + 
  labs(title = "Median graduate degree holder, 2019-2023 ACS",
       subtitle = "Counties in Virginia",
       caption = "Data acquired with R and tidycensus",
       x = "ACS estimate",
       y = "") +
  theme_minimal(base_size= 12)

girafe(ggobj = grad_plot_girafe) %>%
  girafe_options(opts_hover(css = "fill:cyan;"))

Part B - Spatial analysis and visualization with tidycensus

Story

This part of the project used get_acs() to fetch spatial ACS data of Virginia census tracts. This focused on the five year estimates from ACS 2019-2023. This work flow was informed by a legal requirement my work at the Virginia Department of Energy is tasked to complete every 3 years. The Virginia Clean Economy Act requires my agency to perform an impact analysis of the legislation on “Historically Economically Disadvantaged Communities” (HEDCs). HEDCs are defined as a census tract where the population is a community of color or a low income community. The current workflow uses a python scripted ArcGIS Pro tool to update the data. But this is game changing for pulling the data in from the web and visualizing in real time.

Part 1. Using the load_variables function to understand the ACS variable list

# Part 1. Using the load_variables() function to find the variables for the latest ACS data.
vars <- load_variables(2023, "acs5")

### From this list I was able to search for "total population" and "total white population" using the "var" object to identify B02001_001 and B02001_002 for race data. These datasets were used this to calculate the number of communities that qualified as communities of color in Virginia

Part 2. Calculating the population percentage of Virginia census tracts for individuals who identify as people of color

# Step 2 looks to calculate the population of individuals who identify as people of color in Virginia. This will inform the reoccurring impact assessment my agency performs for the Virginia Clean Economy Act. 
# To calculate the census tracts that are communities of color I need to obtain the total population and subtract by the total white population
pop <- get_acs(
  geography = "tract",
  variables = c(total_pop = "B02001_001", white_pop = "B02001_002"),
  state = "VA",
  geometry = TRUE
)
## Getting data from the 2019-2023 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
##   |                                                                              |                                                                      |   0%  |                                                                              |=                                                                     |   1%  |                                                                              |=                                                                     |   2%  |                                                                              |==                                                                    |   3%  |                                                                              |==                                                                    |   4%  |                                                                              |===                                                                   |   4%  |                                                                              |====                                                                  |   5%  |                                                                              |====                                                                  |   6%  |                                                                              |=====                                                                 |   7%  |                                                                              |======                                                                |   8%  |                                                                              |=======                                                               |  10%  |                                                                              |========                                                              |  11%  |                                                                              |========                                                              |  12%  |                                                                              |=========                                                             |  13%  |                                                                              |==========                                                            |  15%  |                                                                              |===========                                                           |  16%  |                                                                              |=============                                                         |  18%  |                                                                              |=============                                                         |  19%  |                                                                              |=================                                                     |  24%  |                                                                              |===================                                                   |  28%  |                                                                              |====================                                                  |  29%  |                                                                              |=====================                                                 |  30%  |                                                                              |======================                                                |  32%  |                                                                              |========================                                              |  34%  |                                                                              |===========================                                           |  39%  |                                                                              |=============================                                         |  41%  |                                                                              |==============================                                        |  43%  |                                                                              |===============================                                       |  44%  |                                                                              |=================================                                     |  47%  |                                                                              |==================================                                    |  48%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================                                  |  51%  |                                                                              |=====================================                                 |  53%  |                                                                              |======================================                                |  55%  |                                                                              |=======================================                               |  56%  |                                                                              |========================================                              |  58%  |                                                                              |=========================================                             |  59%  |                                                                              |==========================================                            |  60%  |                                                                              |=============================================                         |  65%  |                                                                              |==============================================                        |  66%  |                                                                              |================================================                      |  69%  |                                                                              |==================================================                    |  71%  |                                                                              |======================================================                |  77%  |                                                                              |========================================================              |  79%  |                                                                              |=========================================================             |  81%  |                                                                              |=========================================================             |  82%  |                                                                              |===========================================================           |  84%  |                                                                              |=============================================================         |  87%  |                                                                              |==============================================================        |  89%  |                                                                              |===============================================================       |  90%  |                                                                              |=================================================================     |  92%  |                                                                              |=================================================================     |  93%  |                                                                              |===================================================================   |  96%  |                                                                              |====================================================================  |  97%  |                                                                              |======================================================================| 100%
# transforming the data from long to wide in order to make the "total_pop" and "white_pop" columns
pop_wide <- pop %>%
  select(GEOID, NAME, variable, estimate) %>%
  pivot_wider(names_from = variable, values_from = estimate)

# mutating the total_pop and "white_pop" columns together to create a "Community of color population" or "coc_pop" column.
pop_wide <- pop_wide %>%
  group_by(GEOID) %>%
  mutate(coc_pop = total_pop - white_pop) %>%
  mutate(coc_perc = (coc_pop / total_pop) * 100)

Part 3. Displaying the data in mapview

# Part 3. Using mapview() to  quickly display the data interactively.
mapview(pop_wide)

Part 4. Making a choropleth map of communities of color by population percentage

# Part 4. Using ggplot to create choropleth map of community of color population percentage. A choropleth maps was chosen due to the use of percentages. It was chosen over a graduated symbols maps as raw counts were not displayed.
ggplot(data = pop_wide, aes(fill = coc_perc)) +
  geom_sf() +
  theme_void() +
  scale_fill_viridis_c(option = "rocket") +
  labs(title = "Census Tract Population Percent of People who Identify as People of Color",
       fill = "ACS estimate",
       caption = "2019-2023 ACS | tidycensus R package")

### Results The resulting choropleth map displays the distribution of communities of color. Which are defined in the Virginia Clean Economy Act as census tracts where more than 50% of the population are individuals who identify as people of color. White population was subtracted from total population ACS data to obtain an estimate for people of color per census tract. The population percentage was calculated and displayed using the “rocket” color scheme to show the census tracts with higher communities of color. The resulting map sharply displays a higher collection of communities of color of the eastern areas of Virginia and collected in the urban centers of the state. With notable rural communities of color being seen in Brunswick, Greensville, and Nottoway County.