Introduction

In this report I use the tidycensus package to access and analyze ACS Census data. I explore two variables in two states. Part A analyzes the percentage of the population with a graduate degree by county in Virginia. Part B analyzes and maps the place of birth for people with an income of over $75,000 in North Carolina. The Part B data is mapped both by county and by census tract to give a finer look at the data.

#Load needed libraries
library(tidyverse)
library(tidycensus)
library(scales)
library(plotly)
library(ggiraph)
library(mapview)
library(sf)

Part A

Data Preparation

To collect the data for this part I use the get_acs function from the tidycensus package to pull, for each county in Virginia, the percentage of the population with a graduate degree in 2021. To make the data easier to read and to see which counties have the highest and lowest prevalence of graduate degrees, I kept only the two variables I wanted to show, the county name and the ACS estimate, and sorted the data in descending and ascending order.

#Grab ACS census data
va_edu <- get_acs(geography = "county", 
                  state = "VA", 
                  variables = c(percent_graduate = "DP02_0066P"), #Percent of population with graduate degrees
                  year = 2021)

top_edu_grad <- va_edu %>%
  arrange(desc(estimate)) %>% #Sort the percent graduate degree in descending order to grab highest %
  select(c(NAME, estimate)) %>%
  head() # Only show the top of the dataset

bot_edu_grad <- va_edu %>% #Sort the percent graduate degree in ascending order to grab lowest %
  arrange(estimate) %>%
  select(c(NAME, estimate)) %>%
  head()
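
As a side note, the sort-then-head pattern above can also be written with dplyr's slice_max and slice_min. The following is only a minimal sketch on a made-up table: toy_edu and its values are hypothetical and are not ACS output. One difference worth knowing is that slice_max and slice_min keep tied rows by default, so with_ties = FALSE is set here to mimic the fixed row count that head() returns.

```r
library(dplyr)

# Hypothetical values for illustration only; not real ACS estimates
toy_edu <- tibble(
  NAME     = c("County A", "County B", "County C", "County D"),
  estimate = c(12.0, 3.5, 41.0, 7.2)
)

# Two highest estimates, returned in descending order
top_2 <- toy_edu %>%
  slice_max(estimate, n = 2, with_ties = FALSE) %>%
  select(NAME, estimate)

# Two lowest estimates, returned in ascending order
bot_2 <- toy_edu %>%
  slice_min(estimate, n = 2, with_ties = FALSE) %>%
  select(NAME, estimate)

top_2
bot_2
```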

Analysis of Graduate Degrees by County

This table shows the 6 counties in Virginia with the highest percentage of the population holding a graduate degree. There are also cities on this list because Virginia has a number of independent cities that operate separately from any county and are treated as counties in the census. Something interesting I noticed is that the top 5 are all cities and suburban counties just outside Washington, DC. The 6th, Lexington city, is a very small city that largely consists of Washington and Lee University.

top_edu_grad
## # A tibble: 6 × 2
##   NAME                        estimate
##   <chr>                          <dbl>
## 1 Falls Church city, Virginia     49.1
## 2 Arlington County, Virginia      41  
## 3 Alexandria city, Virginia       34.1
## 4 Fairfax County, Virginia        32  
## 5 Fairfax city, Virginia          31  
## 6 Lexington city, Virginia        30.6

This table shows the counties with the lowest percentage of the population with a graduate degree. Something I noticed about these counties is that they are mostly rural. Most are either in the far southwest of the state or a great distance from any major city center.

bot_edu_grad
## # A tibble: 6 × 2
##   NAME                         estimate
##   <chr>                           <dbl>
## 1 Dickenson County, Virginia        2  
## 2 Lee County, Virginia              2.4
## 3 Hopewell city, Virginia           3.2
## 4 Russell County, Virginia          3.5
## 5 Greensville County, Virginia      3.9
## 6 Lunenburg County, Virginia        3.9

Margin of Error

This margin of error plot shows the ACS estimate of the percentage of the population with a graduate degree for each county in Virginia, along with its margin of error. Because Virginia has 133 counties and independent cities, the plot becomes quite cluttered and may not be the most useful way to portray this information. To counteract this I set the check.overlap argument of guide_axis inside scale_y_discrete to drop any labels that overlap. While this cleans up the y axis labels, it also removes the ability to associate every estimate and error bar with a county.

#Margin of error plot
moe_plot <- ggplot(va_edu, aes(x = estimate, y=reorder(NAME, estimate))) + 
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe), width = 0.5, linewidth = 0.5) + #Take estimate and subtract and add moe to find boundaries
  geom_point(color = "red", size = 1) + # Add estimate as red dot
  scale_y_discrete(guide = guide_axis(check.overlap = TRUE), #Remove overlapping y axis labels
                   labels = function(x) str_remove(x, " County, Virginia|, Virginia")) + # Strip "County" and ", Virginia" from y axis labels
  labs(title = "Percentage of Population with Graduate Degrees in Virginia",
       caption = "Plot 1: Static plot showing the margin of error for the % of the population with a graduate degree in Virginia.",
       x = "ACS Estimate % Graduate Degree", 
       y="County")

moe_plot
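
The error bars in the plot run from estimate - moe to estimate + moe. As a small worked example of that arithmetic, the sketch below computes the same bounds on a made-up table. The toy_acs values are hypothetical and not real ACS output; clamping the bounds to the 0 to 100 range is my own assumption for a percentage variable and is not something the plot code above does. ACS margins of error are published at the 90% confidence level, which is also the get_acs default.

```r
library(dplyr)

# Hypothetical values for illustration only; not real ACS output
toy_acs <- tibble(
  NAME     = c("County A", "County B", "County C"),
  estimate = c(12.0, 3.5, 1.0),
  moe      = c(1.5, 0.8, 1.4)
)

toy_bounds <- toy_acs %>%
  mutate(
    lower = pmax(estimate - moe, 0),   # assumption: a percentage cannot fall below 0
    upper = pmin(estimate + moe, 100)  # assumption: a percentage cannot exceed 100
  )

toy_bounds
```

For County C the raw lower bound would be negative, which is why the clamp matters for small estimates with relatively large margins of error.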

Margin of Error Interactive Plot

The interactive plot offers a better solution to the label problem I had with the static ggplot2 plot. Along with the ability to zoom in and pan around the visualization, ggplotly also offers the dynamicTicks argument. When set to TRUE, this argument regenerates the axis ticks as you zoom, so the county label for each bar becomes visible in a way that is both visually pleasing and informative.

#Interactive plot version of margin of error plot
ggplotly(moe_plot, tooltip = c("x", "y"), dynamicTicks = TRUE)

Plot 2: Interactive plot showing the margin of error for the % of the population with a graduate degree in Virginia.

Part B

Data Preparation

In Part B the data I use is the place of birth for people with an income of over $75,000 in North Carolina. This data allows us to investigate how our place of birth impacts our lives. To grab the data I used the get_acs function again, this time requesting data for 2023 along with the geometry for each county and census tract.

# Grab data for year 2023 by census tract
nc_inc_tract <- get_acs(geography = "tract", 
                  variables = c(pob_inc = "B06010_011"), # Place of birth for people with annual income over $75,000
                  state = "NC",
                  year = 2023,
                  geometry = TRUE)
# Grab data by county
nc_inc_cnty <- get_acs(geography = "county", 
                        variables = c(pob_inc = "B06010_011"), 
                        state = "NC",
                        year = 2023,
                        geometry = TRUE)

Mapview Analysis by Census Tract

Using the mapview library and its mapview function I can map the places of birth for people with an income of over $75,000 by census tract. Based on the map, most of the tracts near major cities have larger than average values, with the highest-value tracts located outside Raleigh near Apex and outside Charlotte near the South Carolina border.

mapview(nc_inc_tract, zcol = "estimate")

Analysis by County

I have also created another map using the same data, but by county instead of by census tract. This map uses graduated symbols to show the total count for each county. Two counties stand out with a much larger than average number of people: Wake County, home to Raleigh, and Mecklenburg County, home to Charlotte. It makes sense that these counties have the highest counts because they are the two most populated counties in the state and are home to many large businesses.

centroids <- st_centroid(nc_inc_cnty) # Create centroids to be used in graduated symbols plot

ggplot() + 
  geom_sf(data = nc_inc_cnty, color = "black", fill = "lightgrey") + # North Carolina basemap and county boundaries
  geom_sf(data = centroids, aes(size = estimate), alpha = 0.7, col="blue") + #Centroids with size based on count of people
  scale_size_continuous(name = "Count of People", breaks = pretty_breaks(n=4)) + #Use pretty_breaks function to have 4 breaks in size for the graduated symbols
  labs(title = "Place of Birth of People with Income over $75,000",
       subtitle = "North Carolina (2023)") + 
  theme_void()

Map 2: Graduated symbols plot showing the place of birth for people with an annual income of over $75,000 by county
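
To show what the st_centroid step does before the symbols are drawn, here is a minimal sketch on two made-up unit squares. The toy_polys object, the make_square helper, and the estimate values are all hypothetical and stand in for the county polygons returned by get_acs. When called on an sf object with attribute columns, st_centroid warns that it assumes the attributes are constant over each geometry; that is expected here.

```r
library(sf)

# Hypothetical helper: build a 1 x 1 square with its lower-left corner at (x0, y0)
make_square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Hypothetical polygons and counts for illustration only
toy_polys <- st_sf(
  name     = c("A", "B"),
  estimate = c(100, 400),
  geometry = st_sfc(make_square(0, 0), make_square(2, 0))
)

# One point per polygon; the attribute columns are carried along
toy_centroids <- st_centroid(toy_polys)

st_coordinates(toy_centroids)
```

For irregularly shaped polygons a centroid can fall outside the polygon itself. If that matters for a map, sf's st_point_on_surface is an alternative that returns a point guaranteed to lie on the polygon.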