The first set of exercises examined the percentage of the adult population that have obtained a graduate degree in Washington State, using the 2021 ACS 5-year estimates. Estimates, on a per county basis with associated margin of error values are provided. The counties with the highest and lowest graduate degree rates were found through the analysis
library(tidyverse)
library(tidycensus)
library(tigris)
library(janitor)
library(here)
library(sf)
library(plotly)
library(ggiraph)
library(ggthemes)
library(scales)
library(cowplot)
library(mapview)
options(tigris_use_cache = TRUE)
#api_key <- "" # API Key
#census_api_key(api_key,install = TRUE)
Variables used to acquire data using the tidycensus package
variable_graduate <- "DP02_0066P" # Select % of population with graduate degree as variable
geography <- "county" # Geography set to county
state <- "WA" # State Washington
year <- 2021 # Survey year 2021
survey <- "acs5" # ACS 5-year estimates
Data frame created, comprised of percentage of population with graduate degrees by county
graduate_degree <- get_acs(variables = c(percent_grad_deg = variable_graduate),geography = geography,
state = state,year = year,survey = survey) %>% # Load into data frame using input variables
arrange(-estimate) %>% # Arrange in descending order
mutate(NAME = str_remove(NAME,"County, Washington")) # Remove redundant county and state name from columns
First chart is a test of percentage of the population with graduate degrees in Washington state.
ggplot(graduate_degree, aes(x = reorder(NAME,estimate),y = estimate)) + # Graph results of estimates on per county
geom_point(color = "red") + # Color for estimate points
coord_flip() +
theme_economist() + # Set theme to style used by The Economist
labs(title = "Percentage of Population with Graduate Degrees") # Add label
Figure 1: Test chart of results that show percentage of adults with a graduate degree by county in Washington State.
Error bars to indicate margin of error associated with each estimate provided at the county level.
error_plot <- ggplot(graduate_degree,aes(x = reorder(NAME,estimate), y = estimate)) + # Order counties by highest % with graduate degrees
geom_errorbar(aes(ymin = estimate - moe, ymax = estimate + moe), width = 0.25, linewidth = 0.25, color = "navy") + # Add in margin of errors and modify attribute
geom_point(color = "darkred") + # Point color
scale_y_continuous(labels = label_percent(scale = 1, suffix = "%"))+ # Set scale as %
coord_flip() + # Flip axis
theme_economist() + #Set theme to style used by The Economist
labs(title = "Percentage of Population with Graduate Degrees", subtitle = "King County, WA", x = "County", y = "% With Graduate Degree")
error_plot # Show graph
Figure 2: Chart of results that show percentage of adults with a graduate degree by county in Washington State, with associated margin of error bars for each estimate.
Display charting using ggplotly. Chart shown displays percentage of population with graduate degrees and associated margin of error.
ggplotly(error_plot, tooltip = c("y")) # Make chart interactive to allow user to obtain actual underlying estimate value
Figure 3: Interactive chart of results that show percentage of adults with a graduate degree by county in Washington State, with associated margin of error bars for each estimate.
Create new interactive chart to be displayed using girafe library.
error_plot_2 <- ggplot(graduate_degree,aes(x = reorder(NAME,estimate), y = estimate, tooltip = estimate, data_id = GEOID)) + # Repeat process but add in inputs for Girafe
geom_errorbar(aes(ymin = estimate - moe, ymax = estimate + moe), width = 0.25, linewidth = 0.25, color = "navy") +
geom_point_interactive(color = "darkred") +
scale_y_continuous(labels = label_percent(scale = 1, suffix = "%"))+
coord_flip() +
theme_economist() +
labs(title = "Percentage of Population with Graduate Degrees", subtitle = "King County, WA",
x = "County", y = "% With Graduate Degree")
Display interactive chart
girafe(ggobj = error_plot_2) %>% # Create Girafe object
girafe_options(opts_hover(css = "fill:lightgreen;"))
Figure 4: Interactive chart of results that show percentage of adults with a graduate degree by county in Washington State, with associated margin of error bars for each estimate.
After conducting the analysis, there were a few key findings. The counties with the highest rate of graduate degree holders are Whitman, San Juan, and King. The counties with the lowest rates are Adams, Cowlitz, and Grant. That being said, there are a few caveats that need to be addressed. Some of the margin of error ranges are quite large, potentially moving some of the counties higher or lower up the list. While the top three would not change, some counties in the middle would probably have their rankings changed, as well as a few at the bottom. Ferry county is probably the biggest unknown, as it has such a large margin of error it could potentially have one of the lowest percentages.
The following analysis was conducted to gain some initial insight into some housing characteristics for owner occupied housing units in King County, Washington. The length of residency is an important item to understand when studying the housing profile of an area, which is what the analysis starts to explore. The element of interest in this particular case in the period in which housing tenure commenced for owner occupied units. There are six periods used, with the percentage of housing units falling into each period by census tract the result of the analysis that is showing in the following maps.
The libraries are already loaded.
Variables to be used when using the tidycensus package.
Pop_in_housing_by_tenure_all <- "B25026_002" # Total owner occupied housing units used
Pop_in_housing_by_tenure_2019_later <- "B25026_003" # Number of owner occupied units with initial occupancy starting 2019 or later
Pop_in_housing_by_tenure_2015_2018 <- "B25026_004" # Number of owner occupied units with initial occupancy starting between 2015 and 2018
Pop_in_housing_by_tenure_2010_2014 <- "B25026_005" # Number of owner occupied units with initial occupancy starting between 2010 and 2014
Pop_in_housing_by_tenure_2000_2009 <- "B25026_006" # Number of owner occupied units with initial occupancy starting between 2000 and 2009
Pop_in_housing_by_tenure_1990_1990 <- "B25026_007" # Number of owner occupied units with initial occupancy starting between 1990 and 1999
Pop_in_housing_by_tenure_1989_earlier <- "B25026_008" # Number of owner occupied units with initial occupancy starting in 1989 or earlier
Inputs to pass to function to acquire data.
geography <- "tract" # Set geography to census tracts
state <- "WA" # Set state to Washington
county <- "King" # Set county to King
year <- 2021 # Set survey year to 2021
survey <- "acs5" # Set survey type to 5-year estimates
The first analysis steps were acquiring data related to total owned housing units on a per census tract basis for King County, Washington. Some additional wrangling steps were undertaken to prepare the data. The values obtained were used as the denominator to calculate percentages in following steps.
Housing_Units <- get_acs(variables = c(Housing_Tenure_all = Pop_in_housing_by_tenure_all), geography = geography, state = state, year = year, survey = survey, county = county, geometry = T) %>% # Load data using inputs above for total owner occupied housing units with geometry column
rename(Census_Tract = NAME, Tenure = variable, Total_Owned_Housing_Units = estimate) %>% # Rename columns
mutate(Census_Tract = as.numeric(str_extract(Census_Tract,"\\d+\\.*\\d*"))) %>% # Extract census tract number from full name
erase_water(area_threshold = 0.90, year = year) # Remove bodies of water from polygons
Overview of total owned housing units displayed for King County. This provides a summary of where the most owned units are located within the county, along with related attributes.Results can be viewed using the attached file. Sample of code below.
mapview(Housing_Units,zcol = “Total_Owned_Housing_Units”) # Mapview of census tracts and attributes
Total owned housing units obtained without associated geometry. This was done to allow for additional variables to be calculated.
Tenure_all <- get_acs(variables = c(Housing_Tenure_all = Pop_in_housing_by_tenure_all), geography = geography, state = state,
year = year, survey = survey, county = county) %>% # Load data using inputs above for total owner occupied housing units without geometry
rename(Census_Tract = NAME, Tenure = variable) %>% # Rename columns
mutate(Census_Tract = as.numeric(str_extract(Census_Tract,"\\d+\\.*\\d*"))) # Extract census tract number
Housing units with housing tenure starting in 2019 or later obtained as an initial exploratory step.
Tenure_2019 <- get_acs(variables = c(Housing_Tenure_2019 = Pop_in_housing_by_tenure_2019_later), geography = geography, state = state, year = year, survey = survey,county = county) %>% # Load data using inputs above for total owner occupied housing units with housing tenure starting in 2019 and later without geometry
rename(Census_Tract = NAME, Tenure = variable) %>% # Rename columns
mutate(Census_Tract = as.numeric(str_extract(Census_Tract,"\\d+\\.*\\d*"))) # Extract census tract number
Census Tract shapefiles for King County obtained to be joined to data frames in following steps. This layer was called separately so additional values could be calculated before adding in the census tract geometry.
King_County <- tracts(state = state, county = "King") # Load in census tract TIGER/Line shapefiles
Total housing units data frame joined to tenure starting in 2019 or later data frame. The percentage of total owned housing units where housing tenure started in 2019 or later calculated. Census tract shapefiles joined and data frame converted to sf object. At this stage, all the analytical tasks have been carried out, and data is ready for mapping.
Tenure_2019 <- left_join(Tenure_all,Tenure_2019, by = "GEOID") %>% # Join total owner occupied housing unit data frame to tenure in 2019 or later data frame
rename(c(Total_Units = estimate.x, Units_2019 = estimate.y)) %>% # Rename columns
mutate(Pct_2019 = (Units_2019/Total_Units)*100) # Calculate percentage of owner occupied units with housing tenure beginning in 2019 or later
Tenure_2019 <- left_join(Tenure_2019,King_County, by = "GEOID") # Join census boundaries to Tenure data frame
Tenure_2019 <- st_as_sf(Tenure_2019) %>% # Create sf object
erase_water(area_threshold = 0.90, year = year) # Erase bodies of water from polygon
Results displayed, showing percentage of owned housing units were housing tenure commences in 2019 or later shown. AS
Year_2019 <- ggplot(data = Tenure_2019) + # Create choropleth of percentage of owner occupied housing units with initial occupancy starting in 2019 or lather
geom_sf(aes(fill = Pct_2019)) + #
scale_fill_viridis_b(option = "mako", n.breaks = 5, direction = -1) + # Set scale settings
theme_economist_white() + # Set theme to style used by The Economist
theme(legend.position = "right") + # Position legend
labs(title = "Tenure by Year of Occupation (2019 and later)",
subtitle = "King County, WA ",
fill = "Percentage of Owned Units\n by Initial Year of Occupation", caption = "ACS 5-Year Estimates 2021") # Add labels
Year_2019 # Display resulting map
Figure 5: Map of the percentage of owner occupied housing units out of total owner occupied housing units with occupancy starting in 2019 or later.
Following the successful completion of showing one tenure period result, all periods were used. Not only would missing data be provided, but the scale would be the same for each period as well, making cross period comparisons easier. The same steps were taken as before, with the calculated values for each period being what percentage of owned housing units were represented by different beginning housing tenure periods.
Built_Units <- get_acs(geography = geography, state = state, county = county, year = year,
variables = c(Tenure_2019_and_Later = Pop_in_housing_by_tenure_2019_later,
Tenure_2015_to_2018 = Pop_in_housing_by_tenure_2015_2018,
Tenure_2010_to_2014 = Pop_in_housing_by_tenure_2010_2014,
Tenure_2000_to_2009 = Pop_in_housing_by_tenure_2000_2009,
Tenure_1990_to_1999 = Pop_in_housing_by_tenure_1990_1990,
Tenure_1989_and_Earlier = Pop_in_housing_by_tenure_1989_earlier)) %>% # Create data frame with all tenure variables
rename(Census_Tract = NAME, Tenure = variable) %>% # Rename columns
mutate(Census_Tract = as.numeric(str_extract(Census_Tract,"\\d+\\.*\\d*"))) # Extract census tract number
Built_Units <- left_join(Tenure_all,Built_Units, by = "GEOID") %>% # Join total owner occupied housing unit data frame to tenure data frame
rename(c(Total_Units = estimate.x, Units_Year = estimate.y,
Tenure = Tenure.y, Census_Tract = Census_Tract.y)) %>% # Rename columns
mutate(Percent_Year = (Units_Year/Total_Units)*100) %>% # Calculate percentage of owner occupied units for each housing tenure range
select(GEOID,Census_Tract, Percent_Year,Tenure, Units_Year, Total_Units) # Select only relevant columns
Built_Units <- left_join(Built_Units,King_County, by = "GEOID") # Join census boundaries to Tenure data frame
Built_Units <- st_as_sf(Built_Units) %>% # Create sf object
erase_water(area_threshold = 0.90, year = year) # Erase bodies of water from polygon
Results are displayed for all six tenure periods.
ggplot(Built_Units, aes(fill = Percent_Year)) + # Create maps of results by housing tenure type percentages
geom_sf(color = NA) + # Map sf object
geom_sf(data = King_County, fill = NA,color = "lightgrey") + # Census boundaries
scale_fill_viridis_b(option = "mako", n.breaks = 6, direction = -1) + # Set scale options
facet_wrap(~Tenure) + # Create separate map for each tenure range
theme_void() + # Set map theme
theme(legend.position = "bottom") + # position legend
labs(title = "Tenure by Year of Occupation ",
subtitle = "King County, WA ",
fill = "Percentage of Owned Units by Initial Year of Occupation", caption = "ACS 5-Year Estimates 2021") # Add legend
Figure 6: Maps of the percentage of owner occupied housing units out of total owner occupied housing units by year of occupancy for different year ranges.
After conducting the analysis there were a few key takeaways.The counties with occupancy starting in the 2000 to 2009 and 2015 to 2018 periods have the most counties in the 20%-30% range. This suggests that somewhere between 40% and 60% of owned units have been occupied by the same owner somewhere between 10 and 20 years. These periods also have quite a few census tracts with even a larger percentage being represented. Based on this preliminary analysis, the percentage of owners who have lived in their homes for a decade or more is quite high in King County. Some follow up analysis would be to run some basis descriptive statistics to add some additional context to the maps.