library(sf)
library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)
library(RColorBrewer)
library(units)
library(cowplot)
library(here)
library(classInt)
library(leaflet)
library(lubridate)
library(tidycensus)
library(tidyverse)
library(plotly)
library(ggiraph)
library(htmlwidgets)
library(mapview)
library(tigris)
library(scales)
Given that the variable and enumeration unit was prescribed for this section of the assignment, there isn’t much extra information to give for this introduction. We were supposed to use get_acs() to fetch data on the percentage of the population that have a graduate degree (“DP02_0066P”) at the county level and for any state of your interest. I chose Washington state.
To get the percentage of the population of Washington state that have a graduate degree by county, I used default year of 2021 for the most recent 5 year survey. Changing the output to wide moves the variable type from its own column in longform style of tidy data, into a wide format where the name of the variable is moved across the data into the other attributes. I chose to read in the variable with the name, percentgrad, and I chose county geography.
#percentage of the population that have a graduate degree, 5 year
grad_degrees_5yr <- get_acs(
geography = "county",
variables = c(percentgrad = "DP02_0066P"),
state = "WA",
output = "wide", #toggled this during exploration
year = 2021
)
I chose to look at this two ways. First, sort the dataset by descending population estimate, showing the top percentages for each county. Second, by calculating the max percentage, thus finding the county with the largest population percentage of graduate degree holders.
#sort descending
grad_degrees_5yr %>% arrange(desc(percentgradE))
## # A tibble: 39 × 4
## GEOID NAME percentgradE percentgradM
## <chr> <chr> <dbl> <dbl>
## 1 53075 Whitman County, Washington 22.6 1.9
## 2 53033 King County, Washington 22.1 0.3
## 3 53055 San Juan County, Washington 22.1 1.1
## 4 53031 Jefferson County, Washington 18.8 1.4
## 5 53029 Island County, Washington 14.1 1.1
## 6 53073 Whatcom County, Washington 14 0.7
## 7 53067 Thurston County, Washington 13.8 0.7
## 8 53039 Klickitat County, Washington 12.7 2.2
## 9 53005 Benton County, Washington 12.6 0.8
## 10 53035 Kitsap County, Washington 12.2 0.5
## # ℹ 29 more rows
#find county with largest percentage population with graduate degrees
highest <- grad_degrees_5yr %>%
filter(percentgradE == max(percentgradE, na.rm = TRUE))
highest
## # A tibble: 1 × 4
## GEOID NAME percentgradE percentgradM
## <chr> <chr> <dbl> <dbl>
## 1 53075 Whitman County, Washington 22.6 1.9
Similar to above, I chose to look at this two ways. First by sorting the dataset by ascending population estimate, showing the lowest percentages for each county. Second, by calculating the min percentage, thus finding the county with the lowest population percentage of graduate degree holders.
#Which have the smallest percentages?
grad_degrees_5yr %>% arrange((percentgradE))
## # A tibble: 39 × 4
## GEOID NAME percentgradE percentgradM
## <chr> <chr> <dbl> <dbl>
## 1 53025 Grant County, Washington 5.1 0.7
## 2 53015 Cowlitz County, Washington 5.7 0.6
## 3 53001 Adams County, Washington 5.8 1.4
## 4 53027 Grays Harbor County, Washington 6.1 0.8
## 5 53045 Mason County, Washington 6.1 0.9
## 6 53019 Ferry County, Washington 6.4 2.4
## 7 53077 Yakima County, Washington 6.6 0.6
## 8 53041 Lewis County, Washington 6.9 0.8
## 9 53017 Douglas County, Washington 7 1.2
## 10 53021 Franklin County, Washington 7.3 0.9
## # ℹ 29 more rows
lowest <- grad_degrees_5yr %>%
filter(percentgradE == min(percentgradE, na.rm = TRUE))
lowest
## # A tibble: 1 × 4
## GEOID NAME percentgradE percentgradM
## <chr> <chr> <dbl> <dbl>
## 1 53025 Grant County, Washington 5.1 0.7
This plot shows the margin of error for the given estimates in the dataset. The margin of error is the possible range +/- the percentage estimate, this is the shown in the error bar for each county. The estimate of the percentage of the population that have graduate degrees, is visualized by the green circle.
grad_plot_errorbar <- ggplot(grad_degrees_5yr, aes(x = percentgradE,
y = reorder(NAME, percentgradE))) + #sort by estimate
#calculate error bar
geom_errorbar(aes(xmin = percentgradE - percentgradM, xmax = percentgradE + percentgradM),
width = 0.5, linewidth = 0.5) +
geom_point(color = "darkgreen", size = 2) +
scale_x_continuous(labels = function(x) paste0(format(x, nsmall = 0), "%")) + #add percent to label
scale_y_discrete(labels = function(x) str_remove(x, "Blank County, Washington| County, Washington")) +
labs(title = "Percent Population that have Graduate Degrees, 2017-2021 ACS",
subtitle = "Per County in Washington State",
caption = "Figure 1. Percent of population who are graduate degree holders, per county in Washington state. \nData acquired with R and tidycensus for 2017-2021 American Community Survey Data Estimates. \nError bars represent margin of error around estimates.",
x = "ACS estimate",
y = "") +
theme_minimal(base_size = 12)
grad_plot_errorbar
#plotly
ggplotly(grad_plot_errorbar, tooltip = "x")
Figure 2. Percent of population who are graduate degree holders, per county in Washington state, Interactive with Plotly. Data acquired with R and tidycensus for 2017-2021 American Community Survey Data Estimates. Error bars represent margin of error around estimates.
I chose to do the ggiraph option as well for more customization.
#ggiraph
grad_plot_ggiraph <- ggplot(grad_degrees_5yr, aes(x = percentgradE,
y = reorder(NAME, percentgradE), #sort by estimate
tooltip = percentgradE,
data_id = GEOID)) +
geom_errorbar(aes(xmin = percentgradE - percentgradM, xmax = percentgradE + percentgradM), #calculate errorbar
width = 0.5, size = 0.5) +
geom_point_interactive(color = "darkgreen", size = 2) +
scale_x_continuous(labels = function(x) paste0(format(x, nsmall = 0), "%")) + #add percentage to label
#clean up county names
scale_y_discrete(labels = function(x) str_remove(x, "Blank County, Washington| County, Washington")) +
labs(title = "Percent Population that have Graduate Degrees, 2017-2021 ACS",
subtitle = "Per County in Washington State",
caption = "Figure 1. Percent of poulation who are graduate degree holders, per county in Washington state. \nData acquired with R and tidycensus for 2017-2021 American Community Survey Data Estimates. \nError bars represent margin of error around estimates.",
x = "ACS estimate",
y = "") +
theme_minimal(base_size = 12)
girafe(ggobj = grad_plot_ggiraph) %>%
girafe_options(opts_hover(css = "fill:cyan;"))
Figure 3. Percent of population who are graduate degree holders, per county in Washington state, Interactive with GGiraph. Data acquired with R and tidycensus for 2017-2021 American Community Survey Data Estimates. Error bars represent margin of error around estimates.
For my final project I am wanting to analyze my Seattle Fire Department 911 call data with median income, in hopes of quantifying possible disparities in response time calls based on tract median income. In my project, I am examining the greater Seattle metropolitan area, which encompasses most of Western portion of King County Washington.
There were two main median income variables I found that seemed appropriate, one is median individual income and median household income. At first, I also wanted to find this data at the block group level. Median income per individual was only available in the 1 year estimate at the county level so I cannot use that. Median income per household was available at the block group level in the 5 year estimate ACS but after examining it, I learned that several groups were null in the downtown area (a key area in my analysis), many tracts had a null margin of error (moe) and many had way too high of a moe. I learned that a null moe means that there is very little confidence in the estimate. For these reasons, I chose median household income per tract with the 5 year estimate.
To find these variables and examine them, I used the load_variables() function as instructed: vars <- load_variables(2021, “acs5”)
As stated above, I chose median household income in the 5 year estimate. More specifically its the :
Estimate!!Median income in the past 12 months –!!Total : MEDIAN INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) BY GEOGRAPHICAL MOBILITY IN THE PAST YEAR FOR CURRENT RESIDENCE IN THE UNITED STATES
Given large water bodies of the Puget Sound and Lake Washington, I chose to remove water bodies for better aesthetics.
king_income_5 <- get_acs(
geography = "tract", #smallest enumeration use available
variables = c(medianincome = "B07011_001"), #named variable for easy examination
state = "WA",
county = "King", #only want king county for final project
geometry = TRUE
)
##
|
| | 0%
|
|= | 1%
|
|=== | 4%
|
|==== | 5%
|
|===== | 7%
|
|====== | 9%
|
|========= | 12%
|
|========= | 13%
|
|=========== | 15%
|
|=========== | 16%
|
|============= | 19%
|
|============== | 20%
|
|================= | 25%
|
|==================== | 28%
|
|====================== | 31%
|
|======================== | 34%
|
|============================ | 40%
|
|============================== | 44%
|
|=============================== | 45%
|
|=================================== | 50%
|
|========================================== | 59%
|
|=============================================== | 67%
|
|====================================================== | 77%
|
|=========================================================== | 85%
|
|================================================================= | 93%
|
|======================================================================| 100%
# remove water bodies
king_income_5_erase <- erase_water(king_income_5,
area_threshold = 0.9,
year = 2021)
Given the goal was to map median income, I wanted to classify the estimates in this dataset. I chose to take a look at the distribution of income estimates across the total number of tracts.
# see distribution of median income by census tract
hist(king_income_5$estimate, main = "Histogram of King County Median Income Estimates \n by Census Tract", xlab = "Median Income in USD")
Figure 4. Histogram of King County Median Income by Census Tract, 2017-2021 ACS. Data acquired with R and tidycensus.
I wanted to capture that most of the tracts had median income’s between $45k - $65k without compromising the tracts on either end, so I chose to use Jenks Natural Breaks to classify my median income estimate data. I chose 5 classes to capture the most diversity without losing aesthetic distinction between class colors.
#for choropleth map, create class breaks
king_income_5_erase <- king_income_5_erase %>%
mutate(incomeclass = cut(estimate,
breaks = classIntervals(estimate, n = 5, style = "jenks")$brks,
include.lowest = TRUE, dig.lab=10))
We needed to use mapview() to display your data interactively.
#Interactive map view
#mapview(king_income_5_erase,
# layer.name = "Estimated Median Income by Census Tract<br/>King County Washington<br/>2017-2021 ACS",
# zcol = "estimate",
# col.regions = brewer.pal(6, "GnBu"),
# fgb=FALSE)
Link to html mapview because it is not rendering in report: file:///C:/Users/DMesler/OneDrive%20-%20The%20Pennsylvania%20State%20University/spring2_588/lesson6/mapview.html
Figure 5. Estimated Median Income by Census Tract, King County Washington. Data acquired with R and tidycensus for 2017-2021 American Community Survey Data Estimates.
Next we needed to use ggplot to create a static map. I chose choropleth because I was mapping income median household estimates. Though, because of the diversity of incomes in this dataset, I believe I could also choose to use a graduated symbol given these represent household estimate counts.
#map all king
ggplot(king_income_5_erase, aes(fill = incomeclass)) +
geom_sf(aes(fill = incomeclass)) +
theme_void() +
scale_fill_brewer(name = expression("Median Income Range, in USD"),
palette = "Greens") +
#labels = function(x) paste0("$", x)) +
labs(title = "Estimated Median Income by Census Tract",
subtitle = "King County, Washington",
fill = "ACS estimate",
caption = "2017-2021 ACS Estimates | Acquired with tidycensus R package") +
theme(legend.position = "right")
Because my study area is more of the metropolitan part of the county in the western half, I chose to remove the large eastern tracts to better focus on my study area.
# remove large eastern tracts
exclude <- c("53033032800", "53033032706", "53033031502")
king_income_5_erase_partial <- king_income_5_erase %>%
filter(!GEOID %in% exclude)
#map western king
ggplot(king_income_5_erase_partial, aes(fill = incomeclass)) +
geom_sf(aes(fill = incomeclass)) +
theme_void() +
scale_fill_brewer(name = expression("Median Income Range, in USD"),
palette = "Greens") +
#labels = function(x) paste0("$", x)) +
labs(title = "Estimated Median Income by Census Tract",
subtitle = "Western King County, Washington",
fill = "ACS estimate") +
theme(legend.position = "right")
Figure 6. Estimated Median Income by Census Tract, Western King County Washington.Three eastern tracts removed, GEOIDs ‘53033032800’, ‘53033032706’, ‘53033031502’.Data acquired with R and tidycensus for 2017-2021 American Community Survey Data Estimates. Acquired with tidycensus R package.
I think this picture of median income is incomplete as I am not currently showing the moe for these estimates. My goals for this map beyond this lesson and definitely in my final project, is to map the moe in a bivariate map. In the meantime, I tried mapping the moe with a error bar chart but with 491 census tracts, it was not readable. Another goal of mine is to find readable way to graph this moe.
Daniella Mesler