Lab 5 RMarkdown Report

Introduction

In this week’s lab we learned how to access, read, and analyze US Census data with an R package called tidycensus. Tidycensus requests Census data, both attribute and spatial, through the Census API and returns it in a tidy format ready for analysis in R. Using the package, we request data of interest, prepare that data, and explore it visually with charts and plots. This report is split into two sections: a non-spatial Census data analysis (Part A) and a spatial Census data analysis (Part B).

R Environment Preparation

As always, we begin by loading the packages needed for our analysis:

#load packages
library(tidycensus)
library(tidyverse)
library(dplyr)
library(scales)
library(plotly)
library(ggiraph)
library(sf)
library(mapview)
library(RColorBrewer)
library(tigris)

The packages above can be grouped by use case. The first group, data manipulation and preparation, contains tidyverse and dplyr (dplyr is also loaded as part of tidyverse, so the separate call is redundant but harmless). The second group, data visualization, contains scales, plotly, RColorBrewer, and ggiraph. The third group handles the spatial components of our data and includes sf and mapview. Lastly, the Census group includes tidycensus and tigris.

An additional preparatory step when working with Census data is setting up access to the Census API. A key is not strictly necessary in all cases, but it is a useful step if we plan on querying a lot of Census data, because requests made without a key are subject to tighter limits. The key is registered with the function census_api_key(), shown in the code block below; setting install = TRUE saves it to the .Renviron file so it is available in future sessions. Our call is commented out because we already have a key installed in our R environment.

#census_api_key("YOUR_API_KEY", install = TRUE)
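
A quick way to confirm that a key is already installed is to look for the CENSUS_API_KEY environment variable, which is where census_api_key() with install = TRUE stores it. The sketch below is a minimal check; the object name key and the message wording are ours, not part of tidycensus.

```r
# census_api_key(..., install = TRUE) writes CENSUS_API_KEY to the .Renviron file,
# so an installed key shows up as an environment variable in new sessions
key <- Sys.getenv("CENSUS_API_KEY")

if (nzchar(key)) {
  message("Census API key found (", nchar(key), " characters).")
} else {
  message("No Census API key found; request one and run census_api_key().")
}
```

Checking the environment variable avoids printing the key itself, which is worth keeping out of a knitted report.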

Part A: Non-Spatial

In this section we focus on the Census variable code “DP02_0066P”, which represents the percentage of the population that holds a graduate-level degree. We narrow our scope to the state of California. With this data we hope to achieve three main objectives:

Objective 1: Utilize get_acs() and a margin of error plot to understand our census data

The get_acs() function allows us to pull data from the American Community Survey (ACS) through the API. In the code block below we call this function and assign the result to an object called CA_Grads. We give get_acs() three arguments. The geography argument specifies the level of Census geography we are pulling; here we specify county, meaning we want one row per county. The state argument restricts the request to California. Lastly, the variables argument specifies the variable code for the percentage of the population holding a graduate degree, and the name on the left of the equals sign (percent_grad) becomes the label shown in the variable column.

CA_Grads <- get_acs(
  geography = "county",
  state = "CA",
  variables = c(percent_grad = "DP02_0066P"))
## Getting data from the 2019-2023 5-year ACS
## Using the ACS Data Profile

The messages below the code block confirm that we are accessing the Census data through the API and that, by default, we are pulling from the most recent five-year ACS period: 2019-2023. To keep the report reproducible as newer releases appear, the period can be pinned with the year argument of get_acs().

#examine our CA_Grads object
glimpse(CA_Grads)
## Rows: 58
## Columns: 5
## $ GEOID    <chr> "06001", "06003", "06005", "06007", "06009", "06011", "06013"…
## $ NAME     <chr> "Alameda County, California", "Alpine County, California", "A…
## $ variable <chr> "percent_grad", "percent_grad", "percent_grad", "percent_grad…
## $ estimate <dbl> 22.8, 19.7, 7.0, 11.0, 7.5, 3.2, 17.5, 8.2, 13.1, 8.2, 4.1, 1…
## $ moe      <dbl> 0.3, 9.3, 1.1, 0.7, 1.4, 1.2, 0.3, 1.8, 0.7, 0.3, 1.0, 0.9, 0…

Now that we have our data stored as an object in our R environment, we can answer a couple of questions:

1) What counties have the highest percentage?

#Counties with the highest percentage of graduates
arrange(CA_Grads, desc(estimate))
## # A tibble: 58 × 5
##    GEOID NAME                             variable     estimate   moe
##    <chr> <chr>                            <chr>           <dbl> <dbl>
##  1 06085 Santa Clara County, California   percent_grad     27.7   0.3
##  2 06041 Marin County, California         percent_grad     27.2   0.8
##  3 06075 San Francisco County, California percent_grad     25.1   0.5
##  4 06081 San Mateo County, California     percent_grad     23.9   0.4
##  5 06001 Alameda County, California       percent_grad     22.8   0.3
##  6 06113 Yolo County, California          percent_grad     21.2   0.8
##  7 06003 Alpine County, California        percent_grad     19.7   9.3
##  8 06087 Santa Cruz County, California    percent_grad     18.3   0.7
##  9 06013 Contra Costa County, California  percent_grad     17.5   0.3
## 10 06073 San Diego County, California     percent_grad     16.5   0.2
## # ℹ 48 more rows

By arranging our CA_Grads object in descending order of the estimate field, we can see that the five counties with the highest percentage of graduate degree holders are Santa Clara, Marin, San Francisco, San Mateo, and Alameda.

2) What counties have the lowest percentage?

#counties with the lowest percentage of graduates
arrange(CA_Grads,estimate)
## # A tibble: 58 × 5
##    GEOID NAME                        variable     estimate   moe
##    <chr> <chr>                       <chr>           <dbl> <dbl>
##  1 06011 Colusa County, California   percent_grad      3.2   1.2
##  2 06035 Lassen County, California   percent_grad      3.3   0.7
##  3 06021 Glenn County, California    percent_grad      4.1   1  
##  4 06047 Merced County, California   percent_grad      4.4   0.4
##  5 06031 Kings County, California    percent_grad      4.5   0.6
##  6 06025 Imperial County, California percent_grad      4.6   0.5
##  7 06039 Madera County, California   percent_grad      5.5   0.7
##  8 06103 Tehama County, California   percent_grad      5.6   1  
##  9 06107 Tulare County, California   percent_grad      5.7   0.4
## 10 06033 Lake County, California     percent_grad      6     0.8
## # ℹ 48 more rows

By arranging in ascending order (the default), we can see that the five counties with the lowest percentage of graduate degree holders are Colusa, Lassen, Glenn, Merced, and Kings.
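
As an alternative to sorting the whole table, dplyr's slice_max() and slice_min() return only the top or bottom rows. The sketch below uses a six-county stand-in built from values printed above rather than the full CA_Grads object; the names grads_sample, top2, and bottom2 are ours for illustration.

```r
library(dplyr)

# Stand-in for CA_Grads: six rows copied from the output above, not all 58 counties
grads_sample <- tibble(
  NAME = c("Santa Clara County, California", "Marin County, California",
           "Alpine County, California", "Colusa County, California",
           "Lassen County, California", "Glenn County, California"),
  estimate = c(27.7, 27.2, 19.7, 3.2, 3.3, 4.1),
  moe = c(0.3, 0.8, 9.3, 1.2, 0.7, 1.0)
)

# Rows with the two highest estimates, highest first
top2 <- slice_max(grads_sample, estimate, n = 2)

# Rows with the two lowest estimates, lowest first
bottom2 <- slice_min(grads_sample, estimate, n = 2)

top2
bottom2
```

On the real object the same calls would be slice_max(CA_Grads, estimate, n = 5) and slice_min(CA_Grads, estimate, n = 5). Both functions keep ties by default, so they can return more than n rows.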

Margin of Error Plot

As you may have noticed when we glimpsed our census data in the code blocks above, there is a field called moe. The estimate field holds the estimate of the variable we are interested in (the percentage of the population holding a graduate degree), and the moe field holds the margin of error of that estimate. Because the ACS is a sample survey, counties with small populations can have wide margins; Alpine County, for example, has an estimate of 19.7 with a margin of error of 9.3. To understand our data better, we can plot each county's estimate together with its margin of error.
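
To make the margin of error concrete, the sketch below computes the range implied by estimate plus or minus moe for two counties whose values appear in the output above. It also derives a standard error, using the fact that ACS margins of error are published at the 90 percent confidence level (so the standard error is the margin divided by 1.645). The object and column names (moe_sample, lower, upper, se, cv_pct) are ours for illustration.

```r
library(dplyr)

# Two rows copied from the CA_Grads output above
moe_sample <- tibble(
  NAME = c("Alameda County, California", "Alpine County, California"),
  estimate = c(22.8, 19.7),
  moe = c(0.3, 9.3)
)

moe_sample <- moe_sample %>%
  mutate(
    lower = estimate - moe,        # lower bound of the 90% interval
    upper = estimate + moe,        # upper bound of the 90% interval
    se = moe / 1.645,              # standard error implied by a 90% margin
    cv_pct = 100 * se / estimate   # coefficient of variation, in percent
  )

moe_sample
```

Alameda's range is narrow (22.5 to 23.1), while Alpine's runs from about 10.4 to 29.0 and fully contains it, so the two estimates cannot be confidently ranked against each other.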

In the code block below we use the familiar ggplot() to create a visual of our census data. The key piece to look at is the geom_errorbar() layer. In this layer we set the left and right ends of each error bar (xmin and xmax) by subtracting the margin of error from, and adding it to, the central point (our estimate).

MOEPlot <- ggplot(CA_Grads, aes(x = estimate, 
                                y = reorder(NAME, estimate))) + 
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
                width = .5, linewidth = 1, color = "red") +
  geom_point(color = "blue", size = 2, shape = 18) + 
  scale_x_continuous(labels = label_number(accuracy = 1, suffix = "%")) + 
  scale_y_discrete(labels = function(x) str_remove(x, " County, California|, California")) + 
  labs(title = "California Graduate Degree Population Percentage",
       subtitle = "Counties in California",
       caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
       x = "ACS estimate",
       y = "") + 
  theme_minimal()

#call and view our created plot
MOEPlot

Objective 2: Utilize the plotly package to convert our plot into an interactive format

Plotly is an R package that allows us to create interactive plots or convert existing plots into an interactive format. Below we convert the MOEPlot object from the previous code block into an interactive plot with ggplotly(). The resulting plot lets us zoom, pan, and select, along with a host of other interactive tools.

#create our plotly plot
MOEPlotly <- ggplotly(MOEPlot, tooltip = "x", dynamicTicks = TRUE,
         layerData = 1, originalData = FALSE)

#call and display our plotly plot
MOEPlotly
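
With tooltip = "x" the hover label shows only the estimate. As a hedged sketch of one alternative, a custom label can be supplied through a text aesthetic and selected with tooltip = "text". The data frame below is a three-county stand-in built from values printed earlier, and the names plot_sample, p_text, and p_interactive are ours for illustration.

```r
library(ggplot2)
library(plotly)

# Stand-in data copied from the tables above (not the full CA_Grads object)
plot_sample <- data.frame(
  NAME = c("Santa Clara", "Alpine", "Colusa"),
  estimate = c(27.7, 19.7, 3.2),
  moe = c(0.3, 9.3, 1.2)
)

# The text aesthetic is not drawn by ggplot2; ggplotly() reads it for tooltips
p_text <- ggplot(plot_sample,
                 aes(x = estimate, y = reorder(NAME, estimate),
                     text = paste0(NAME, ": ", estimate, "% (MOE ", moe, ")"))) +
  geom_point(color = "blue", size = 2) +
  labs(x = "ACS estimate", y = "")

# tooltip = "text" shows only the custom label on hover
p_interactive <- ggplotly(p_text, tooltip = "text")

# Print p_interactive in an R session or R Markdown chunk to display the widget
```

Putting the county name and margin of error in the tooltip means the reader does not have to trace a point back to the axis label.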

Objective 3: Utilize the ggiraph package to convert our plot into an interactive format

ggiraph is another package that allows us to create interactive plots or convert existing ones. In the code below we use it to build a second interactive plot of our census data. Hovering over a point drawn by geom_point_interactive() highlights that point in yellow and displays its percentage.

plot_ggiraph <- ggplot(CA_Grads, aes(x = estimate, 
                                     y = reorder(NAME, estimate),
                                     tooltip = estimate,
                                     data_id = GEOID)) +
  #error bars span estimate +/- margin of error; linewidth replaces the deprecated size
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe), 
                width = 0.5, linewidth = 0.5, color = "red") + 
  geom_point_interactive(color = "blue", size = 2) +
  #estimates are already percentages, so scale = 1 labels 22.8 as 22.8%, not 2,280%
  scale_x_continuous(labels = label_percent(scale = 1)) + 
  scale_y_discrete(labels = function(x) str_remove(x, " County, California|, California")) + 
  labs(title = "California Graduate Degree Population Percentage",
       subtitle = "Counties in California",
       caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
       x = "ACS estimate",
       y = "") + 
  theme_minimal(base_size = 12)

#render the interactive plot and highlight the hovered point in yellow
girafe(ggobj = plot_ggiraph) %>%
  girafe_options(opts_hover(css = "fill:yellow;stroke:black"))

Part B: Spatial

In Part B we incorporate spatial data with our census data. For this section of the lab we look at median household income across counties in the Bay Area of California.

We do not know the variable code for this data but, fortunately, we can list all of the variables in a given Census dataset with the function load_variables(). In the code below we load the variables for the 2023 five-year American Community Survey ("acs5") to see what is available to analyze.

DataOfInterest <- load_variables(2023, "acs5")

DataOfInterest
## # A tibble: 28,261 × 4
##    name        label                                    concept        geography
##    <chr>       <chr>                                    <chr>          <chr>    
##  1 B01001A_001 Estimate!!Total:                         Sex by Age (W… tract    
##  2 B01001A_002 Estimate!!Total:!!Male:                  Sex by Age (W… tract    
##  3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years   Sex by Age (W… tract    
##  4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years    Sex by Age (W… tract    
##  5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years  Sex by Age (W… tract    
##  6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years  Sex by Age (W… tract    
##  7 B01001A_007 Estimate!!Total:!!Male:!!18 and 19 years Sex by Age (W… tract    
##  8 B01001A_008 Estimate!!Total:!!Male:!!20 to 24 years  Sex by Age (W… tract    
##  9 B01001A_009 Estimate!!Total:!!Male:!!25 to 29 years  Sex by Age (W… tract    
## 10 B01001A_010 Estimate!!Total:!!Male:!!30 to 34 years  Sex by Age (W… tract    
## # ℹ 28,251 more rows
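
Scrolling through more than 28,000 rows is impractical, so a text filter on the label column is one way to narrow the table. The sketch below runs on a small stand-in tibble, not on the real DataOfInterest object: the first two rows are copied from the output above, and the label text given for B19013_001 is our approximation for illustration. The names vars_sample and hits are also ours.

```r
library(dplyr)
library(stringr)

# Stand-in for the load_variables() result (name and label columns only).
# The B19013_001 label is an assumed, approximate wording.
vars_sample <- tibble(
  name = c("B01001A_001", "B01001A_002", "B19013_001"),
  label = c("Estimate!!Total:",
            "Estimate!!Total:!!Male:",
            "Estimate!!Median household income in the past 12 months")
)

# Keep rows whose label mentions the phrase, ignoring case
hits <- vars_sample %>%
  filter(str_detect(label, regex("median household income", ignore_case = TRUE)))

hits
```

Applied to the real table, the same filter would be run on DataOfInterest; it may return several related variables, so the concept column is worth reading before settling on a code.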

Upon searching the table we found that the median household income variable code is “B19013_001”, which we will use in our analysis. Our analysis in this part of the lab consists of three objectives:

Objective 1: Utilize the get_acs() function to grab the spatial data of our choice

For Part B we look at median household income for four of the main counties in the Bay Area of Northern California: Alameda, Contra Costa, Marin, and San Francisco. We use the five-year ACS data ending in 2023. In the code block below we create an object (bayArea_Med_Income) that retrieves the ACS data through the get_acs() function. Within this function we supply several arguments. First, the geography argument requests data at the census tract level. Our variable, “B19013_001”, is the code for median household income. We additionally specify the state and counties of interest. The last argument to note is where our spatial component comes in: by setting geometry to TRUE, we get back an sf object that contains both the attribute and spatial data from the US Census.

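#progress_bar = FALSE is meant to hide the tigris download progress bar in the
#knitted report; this assumes get_acs() forwards extra arguments to tigris
bayArea_Med_Income <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "CA",
  county = c("Alameda", "Contra Costa", "Marin", "San Francisco"),
  geometry = TRUE,
  year = 2023,
  progress_bar = FALSE
)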

Objective 2: Utilize the mapview package to display the data interactively

The mapview package lets us explore our data spatially: it overlays the data returned by our get_acs() call on an interactive map. In the code below we call mapview() to map Bay Area median incomes. The zcol argument names the column used to color the map, col.regions sets the color palette, here taken from the RColorBrewer package, and at controls the breakpoints of the color classes. Because the Bay Area's income is highly skewed, we use quantile breaks: seq(0, 1, 0.1) yields 11 breakpoints (the minimum, the nine deciles, and the maximum), which split the observations into 10 classes of roughly equal size.

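#mapview of bay area incomes ----
mapview(
  bayArea_Med_Income,
  #column used to color the polygons
  zcol = "estimate",
  #10 colors, one per class defined by the 11 breakpoints below
  col.regions = brewer.pal(10, "RdYlGn"),
  #decile breakpoints, rounded to whole dollars
  at = round(quantile(bayArea_Med_Income$estimate, seq(0, 1, 0.1), na.rm = TRUE))
)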
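
To see why quantile breaks suit skewed data, the sketch below compares them with equal-interval breaks. It does not use our Census data: the income vector is synthetic (drawn from a log-normal distribution as a stand-in for a right-skewed variable), and the names income, q_breaks, and e_breaks are made up for illustration.

```r
# Synthetic, right-skewed "incomes" (hypothetical values, not Census data)
set.seed(42)
income <- rlnorm(500, meanlog = 11.5, sdlog = 0.5)

# Decile breaks: 11 breakpoints that define 10 classes
q_breaks <- quantile(income, probs = seq(0, 1, 0.1), na.rm = TRUE)

# Equal-interval breaks over the same range, for comparison
e_breaks <- seq(min(income), max(income), length.out = 11)

# Count the observations that fall in each class under both schemes
q_counts <- table(cut(income, breaks = q_breaks, include.lowest = TRUE))
e_counts <- table(cut(income, breaks = e_breaks, include.lowest = TRUE))

q_counts  # classes hold roughly equal numbers of observations
e_counts  # observations crowd into the lowest classes
```

With equal intervals, a few very high values stretch the range so that most observations share one or two colors; quantile breaks spread the observations evenly across the palette.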

Objective 3: Utilize ggplot to create a choropleth map of our data

For our last plot we return to the ggplot2 package to create a static choropleth map of our Bay Area income data. We again compute quantile breaks so that the color classes reflect the skewed distribution of Bay Area income rather than being dominated by a few very high values.

#compute quantile breaks
Qbreaks <- quantile(bayArea_Med_Income$estimate, seq(0, 1, 0.1), na.rm = TRUE)

#ggplot graph of bay area incomes ----
ggplot(bayArea_Med_Income, aes(fill = estimate)) + 
  geom_sf() + 
  theme_void() + 
  scale_fill_viridis_b(option = "magma", breaks = Qbreaks, labels = dollar_format()) + 
  labs(title = "Median Bay Area Income Distribution in California",
       subtitle = "Alameda, Contra Costa, Marin, and San Francisco Counties",
       fill = "ACS 5 year estimate",
       caption = "2023 ACS | tidycensus R package")
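
For readers without a Census API key, the same binned-choropleth technique can be tried on data that ships with the sf package. This is only a sketch under stated assumptions: it uses the North Carolina demo shapefile bundled with sf and its BIR74 column (births, 1974 to 1978) in place of our income data, quintile rather than decile breaks, and the illustrative names nc, nc_breaks, and p.

```r
library(sf)
library(ggplot2)

# Demo polygons bundled with sf (North Carolina counties), not Census income data
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Quintile breaks on the BIR74 column; unname() drops the "0%", "20%", ... names
nc_breaks <- unname(quantile(nc$BIR74, probs = seq(0, 1, 0.2), na.rm = TRUE))

# Binned viridis fill scale with the quantile breaks as class boundaries
p <- ggplot(nc, aes(fill = BIR74)) +
  geom_sf() +
  theme_void() +
  scale_fill_viridis_b(option = "magma", breaks = nc_breaks) +
  labs(title = "Births by county, North Carolina (sf demo data)",
       fill = "BIR74")

p
```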