Student: Lidiia Iavorivska

Session A - non-spatial Census

Load libraries for getting and processing Census data

library(tidycensus)
library(tidyverse)
library(scales)
library(plotly)
library(mapview)
library(viridisLite)

Get the Census data

Fetch the percentage of the population that has a graduate degree (“DP02_0066P”) in Vermont at the county level from the 2017–2021 5-year American Community Survey (ACS 2021).

vt_graduate <- get_acs(
  geography = "county", 
  variables = "DP02_0066P", 
  state = "VT",
  year = 2021
)
## Getting data from the 2017-2021 5-year ACS
## Using the ACS Data Profile
vt_graduate  # display the data
## # A tibble: 14 × 5
##    GEOID NAME                       variable   estimate   moe
##    <chr> <chr>                      <chr>         <dbl> <dbl>
##  1 50001 Addison County, Vermont    DP02_0066P     16.9   1.2
##  2 50003 Bennington County, Vermont DP02_0066P     16.3   1.4
##  3 50005 Caledonia County, Vermont  DP02_0066P     11.6   0.9
##  4 50007 Chittenden County, Vermont DP02_0066P     22.3   0.9
##  5 50009 Essex County, Vermont      DP02_0066P      8.5   1.5
##  6 50011 Franklin County, Vermont   DP02_0066P      9.5   1.1
##  7 50013 Grand Isle County, Vermont DP02_0066P     16.6   2.5
##  8 50015 Lamoille County, Vermont   DP02_0066P     15.3   1.9
##  9 50017 Orange County, Vermont     DP02_0066P     16.3   1.5
## 10 50019 Orleans County, Vermont    DP02_0066P     10.7   1.4
## 11 50021 Rutland County, Vermont    DP02_0066P     11.8   1  
## 12 50023 Washington County, Vermont DP02_0066P     18.6   1.2
## 13 50025 Windham County, Vermont    DP02_0066P     17.6   1.4
## 14 50027 Windsor County, Vermont    DP02_0066P     17     1.2

Question 1.1: Which counties in Vermont have the largest percentages of graduate degree holders?

arrange(vt_graduate, desc(estimate))
## # A tibble: 14 × 5
##    GEOID NAME                       variable   estimate   moe
##    <chr> <chr>                      <chr>         <dbl> <dbl>
##  1 50007 Chittenden County, Vermont DP02_0066P     22.3   0.9
##  2 50023 Washington County, Vermont DP02_0066P     18.6   1.2
##  3 50025 Windham County, Vermont    DP02_0066P     17.6   1.4
##  4 50027 Windsor County, Vermont    DP02_0066P     17     1.2
##  5 50001 Addison County, Vermont    DP02_0066P     16.9   1.2
##  6 50013 Grand Isle County, Vermont DP02_0066P     16.6   2.5
##  7 50003 Bennington County, Vermont DP02_0066P     16.3   1.4
##  8 50017 Orange County, Vermont     DP02_0066P     16.3   1.5
##  9 50015 Lamoille County, Vermont   DP02_0066P     15.3   1.9
## 10 50021 Rutland County, Vermont    DP02_0066P     11.8   1  
## 11 50005 Caledonia County, Vermont  DP02_0066P     11.6   0.9
## 12 50019 Orleans County, Vermont    DP02_0066P     10.7   1.4
## 13 50011 Franklin County, Vermont   DP02_0066P      9.5   1.1
## 14 50009 Essex County, Vermont      DP02_0066P      8.5   1.5

Question 1.2: Which have the smallest percentages?

arrange(vt_graduate, estimate)
## # A tibble: 14 × 5
##    GEOID NAME                       variable   estimate   moe
##    <chr> <chr>                      <chr>         <dbl> <dbl>
##  1 50009 Essex County, Vermont      DP02_0066P      8.5   1.5
##  2 50011 Franklin County, Vermont   DP02_0066P      9.5   1.1
##  3 50019 Orleans County, Vermont    DP02_0066P     10.7   1.4
##  4 50005 Caledonia County, Vermont  DP02_0066P     11.6   0.9
##  5 50021 Rutland County, Vermont    DP02_0066P     11.8   1  
##  6 50015 Lamoille County, Vermont   DP02_0066P     15.3   1.9
##  7 50003 Bennington County, Vermont DP02_0066P     16.3   1.4
##  8 50017 Orange County, Vermont     DP02_0066P     16.3   1.5
##  9 50013 Grand Isle County, Vermont DP02_0066P     16.6   2.5
## 10 50001 Addison County, Vermont    DP02_0066P     16.9   1.2
## 11 50027 Windsor County, Vermont    DP02_0066P     17     1.2
## 12 50025 Windham County, Vermont    DP02_0066P     17.6   1.4
## 13 50023 Washington County, Vermont DP02_0066P     18.6   1.2
## 14 50007 Chittenden County, Vermont DP02_0066P     22.3   0.9

Question 1.3: Visualizing the data with a plot

Each point represents the estimated percentage of residents with graduate or professional degrees in a Vermont county. The horizontal error bars show the 90% margins of error from the 2017–2021 ACS, indicating the range within which the true population value is likely to fall.

The figure shows considerable variation in the percentage of residents with graduate degrees across Vermont counties. The highest share of graduate degree holders is found in Chittenden County, which is a major urban center in the state and where the University of Vermont is located. Counties such as Washington and Windsor also display relatively high educational attainment compared to the rest of the state.

vt_plot <- ggplot(vt_graduate, aes(x = estimate,
                                   y = reorder(NAME, estimate))) +
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
                width = 0.5, linewidth = 0.5) +
  geom_point(color = "darkred", size = 2) +
  scale_x_continuous(labels = label_percent(scale = 1)) +   # values are already percentages, so skip the default multiplication by 100
  scale_y_discrete(labels = function(x) str_remove(x, " County, Vermont|, Vermont")) +
  labs(title = "Percent of population with graduate degrees across counties in Vermont,\n2017-2021 American Community Survey",
       caption = "Error bars represent margins of error associated with data estimates. Data acquired with R and tidycensus",
       x = "ACS estimate",
       y = "") +
  theme_minimal(base_size = 10)

vt_plot  # display the plot
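
The error bars can also be read numerically. The following is a minimal sketch, not part of the original analysis: it rebuilds four rows of the table printed earlier in base R, computes the bounds that the error bars draw, and applies the usual two-estimate z-test for ACS data, converting each 90% margin of error to a standard error by dividing by 1.645. The helper name `is_significant` is hypothetical.

```r
# Four rows copied from the vt_graduate output above (estimate and moe in percent)
vt <- data.frame(
  county   = c("Chittenden", "Washington", "Windham", "Windsor"),
  estimate = c(22.3, 18.6, 17.6, 17.0),
  moe      = c(0.9, 1.2, 1.4, 1.2)
)

# Bounds drawn by geom_errorbar(): estimate minus/plus the 90% margin of error
vt$lower <- vt$estimate - vt$moe
vt$upper <- vt$estimate + vt$moe

# Hypothetical helper: TRUE when two ACS estimates differ at the 90% level.
# ACS margins of error are published at 90% confidence, so SE = MOE / 1.645.
is_significant <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se_diff <- sqrt((moe1 / 1.645)^2 + (moe2 / 1.645)^2)
  abs(est1 - est2) / se_diff > z_crit
}

# Chittenden vs. Washington, then Windham vs. Windsor
chit_vs_wash <- is_significant(vt$estimate[1], vt$moe[1], vt$estimate[2], vt$moe[2])
wind_vs_wins <- is_significant(vt$estimate[3], vt$moe[3], vt$estimate[4], vt$moe[4])

print(vt)
print(c(chit_vs_wash = chit_vs_wash, wind_vs_wins = wind_vs_wins))
```

Under these assumptions, Chittenden's lead over Washington is statistically distinguishable, while the gap between Windham and Windsor is not, so the ordering of the mid-ranked counties should be read with caution.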

Question 3: Interactive plot

Hover over the data points to reveal the exact estimated percentage of the population with graduate degrees in each Vermont county.

ggplotly(vt_plot, tooltip = "x")

Session B - spatial Census

View a list of variables from the Census

vars <- load_variables(2021, "acs5")
View(vars)
vars  # this displays a partial list from a lengthy full table
## # A tibble: 27,886 × 4
##    name        label                                    concept        geography
##    <chr>       <chr>                                    <chr>          <chr>    
##  1 B01001A_001 Estimate!!Total:                         SEX BY AGE (W… tract    
##  2 B01001A_002 Estimate!!Total:!!Male:                  SEX BY AGE (W… tract    
##  3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years   SEX BY AGE (W… tract    
##  4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years    SEX BY AGE (W… tract    
##  5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years  SEX BY AGE (W… tract    
##  6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years  SEX BY AGE (W… tract    
##  7 B01001A_007 Estimate!!Total:!!Male:!!18 and 19 years SEX BY AGE (W… tract    
##  8 B01001A_008 Estimate!!Total:!!Male:!!20 to 24 years  SEX BY AGE (W… tract    
##  9 B01001A_009 Estimate!!Total:!!Male:!!25 to 29 years  SEX BY AGE (W… tract    
## 10 B01001A_010 Estimate!!Total:!!Male:!!30 to 34 years  SEX BY AGE (W… tract    
## # ℹ 27,876 more rows
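
Scrolling through a table of 27,886 variables is impractical, so it usually helps to filter it by keyword. The sketch below is an illustration rather than part of the original session: the helper `find_vars` is a hypothetical name, and the small data frame reproduces three rows of the printout above so that the example runs without a call to the Census API. With the real table, pass `vars` instead of `vars_sample`.

```r
# Three rows copied from the load_variables() printout above
vars_sample <- data.frame(
  name  = c("B01001A_001", "B01001A_002", "B01001A_003"),
  label = c("Estimate!!Total:",
            "Estimate!!Total:!!Male:",
            "Estimate!!Total:!!Male:!!Under 5 years"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose label matches a case-insensitive pattern
find_vars <- function(variable_table, pattern) {
  hits <- grepl(pattern, variable_table$label, ignore.case = TRUE)
  variable_table[hits, , drop = FALSE]
}

male_vars <- find_vars(vars_sample, "male")
print(male_vars$name)
```

Data Profile variables such as DP02_0066P are, as far as I know, not listed in the "acs5" table; they should appear under `load_variables(2021, "acs5/profile")` instead.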

Examine median monthly housing costs in Pennsylvania

I chose to explore the variation in housing costs across Pennsylvania at the county subdivision level (primarily townships), using the variable “Monthly median housing costs in dollars” (B25105_001) from the 2021 American Community Survey. The data are displayed below in an interactive map. Zoom in or out to examine different areas, and click on points of interest to view the subdivision name, the exact estimate, and the margin of error associated with that estimate.

Interactive map

The interactive map shows that the highest monthly housing costs (in dollars) are clustered in townships of the state’s major metropolitan areas around Philadelphia and Pittsburgh. The Philadelphia suburbs in the southeast stand out in particular; costs there diminish in an almost band-like pattern as one moves west and toward the center of the state. Centre County, in the middle of the state, forms a pocket of elevated housing costs relative to the mostly rural areas that surround it. This part of the state is home to Penn State University, where a large student population and university-driven economic activity push housing costs higher.

# Fetch the data from the Census database for a variable of interest
pa_housing <- get_acs(
  geography = "county subdivision",  # township level
  variables = "B25105_001",
  state = "PA",
  year = 2021,
  geometry = TRUE,
  progress_bar = FALSE   # hide the download progress (0–100%) text bar
)

# Plot the data in an interactive map

colors <- inferno(n = 100)

mapview(pa_housing, zcol = "estimate", 
        layer.name = "Median monthly housing cost ($)<br/>across Pennsylvania county subdivisions,<br/>data from 2017-2021 ACS",
        col.regions = colors)

Static map

This choropleth map shows the regional patterns in the distribution of median monthly housing costs across townships in Pennsylvania (similar to the interactive map above).

ggplot(pa_housing, aes(fill = estimate)) + 
  geom_sf() + 
  theme_void() + 
  scale_fill_viridis_c(option = "plasma", n.breaks = 6) +    # continuous data
  labs(title = "Median monthly housing cost ($) by county subdivisions",
       subtitle = "Pennsylvania",
       fill = "ACS estimate",
       caption = "2017-2021 American Community Survey | tidycensus R package.\nGray polygons have no data associated with them.")
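
The gray polygons mentioned in the caption correspond to subdivisions whose `estimate` is `NA`. A minimal sketch for counting them before mapping follows. It assumes the `NAME` and `estimate` columns that `get_acs()` returns; the toy data frame is entirely hypothetical (invented names and values) so that the example runs offline, and the helper name `summarize_missing` is likewise an assumption.

```r
# Hypothetical stand-in for pa_housing: names and values are invented
toy_housing <- data.frame(
  NAME     = c("Subdivision A", "Subdivision B", "Subdivision C", "Subdivision D"),
  estimate = c(950, NA, 1200, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count and name the rows that have no estimate
summarize_missing <- function(acs_data) {
  missing <- is.na(acs_data$estimate)
  list(
    n_missing     = sum(missing),
    share_missing = mean(missing),
    names_missing = acs_data$NAME[missing]
  )
}

missing_info <- summarize_missing(toy_housing)
print(missing_info)
```

On the real data, `summarize_missing(pa_housing)` would report how many polygons render gray and which subdivisions they are.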