Week 6 - Lesson 5 Exercises

Part A

Introduction

The first set of exercises examined the percentage of the adult population that have obtained a graduate degree in Washington State, using the 2021 ACS 5-year estimates. Estimates, on a per county basis with associated margin of error values are provided. The counties with the highest and lowest graduate degree rates were found through the analysis

Data Preparation

Load Libraries

library(tidyverse)
library(tidycensus)
library(tigris)
library(janitor)
library(here)
library(sf)
library(plotly)
library(ggiraph)
library(ggthemes)
library(scales)
library(cowplot)
library(mapview)
options(tigris_use_cache = TRUE)

API Key

#api_key <- "" # API Key
#census_api_key(api_key,install = TRUE)

Input Variables

Variables used to acquire data using the tidycensus package

variable_graduate <- "DP02_0066P" # Select % of population with graduate degree as variable
geography <- "county" # Geography set to county
state <- "WA" # State Washington
year <- 2021 # Survey year 2021
survey <- "acs5" # ACS 5-year estimates 

Get Data

Data frame created, comprised of percentage of population with graduate degrees by county

graduate_degree <- get_acs(variables = c(percent_grad_deg = variable_graduate),geography = geography,
                           state = state,year = year,survey = survey) %>% # Load into data frame using input variables
  arrange(-estimate) %>% # Arrange in descending order
  mutate(NAME = str_remove(NAME,"County, Washington")) # Remove redundant county and state name from columns

Analysis

Graph 1 - Test

First chart is a test of percentage of the population with graduate degrees in Washington state.

ggplot(graduate_degree, aes(x = reorder(NAME,estimate),y = estimate)) + # Graph results of estimates on per county 
  geom_point(color = "red") + # Color for estimate points
  coord_flip() + 
  theme_economist() + # Set theme to style used by The Economist
  labs(title = "Percentage of Population with Graduate Degrees") # Add label
Figure 1: Test chart of results that show percentage of adults with a graduate degree by county in Washington State.

Figure 1: Test chart of results that show percentage of adults with a graduate degree by county in Washington State.

Plot with error bars

Error bars to indicate margin of error associated with each estimate provided at the county level.

error_plot <- ggplot(graduate_degree,aes(x = reorder(NAME,estimate), y = estimate)) + # Order counties by highest % with graduate degrees
  geom_errorbar(aes(ymin = estimate - moe, ymax = estimate + moe), width = 0.25, linewidth = 0.25, color = "navy") + # Add in margin of errors and modify attribute
  geom_point(color = "darkred") + # Point color
  scale_y_continuous(labels = label_percent(scale = 1, suffix = "%"))+ # Set scale as % 
  coord_flip() +  # Flip axis
  theme_economist() +  #Set theme to style used by The Economist
  labs(title = "Percentage of Population with Graduate Degrees", subtitle = "King County, WA", x = "County", y = "% With Graduate Degree")
error_plot # Show graph
Figure 2: Chart  of results that show percentage of adults with a graduate degree by county in Washington State, with associated margin of error bars for each estimate.

Figure 2: Chart of results that show percentage of adults with a graduate degree by county in Washington State, with associated margin of error bars for each estimate.

Make chart interactive with ggplotly

Display charting using ggplotly. Chart shown displays percentage of population with graduate degrees and associated margin of error.

ggplotly(error_plot, tooltip = c("y")) # Make chart interactive to allow user to obtain actual underlying estimate value

Figure 3: Interactive chart of results that show percentage of adults with a graduate degree by county in Washington State, with associated margin of error bars for each estimate.

Interactive Chart using girafe

Create new interactive chart to be displayed using girafe library.

error_plot_2 <- ggplot(graduate_degree,aes(x = reorder(NAME,estimate), y = estimate, tooltip = estimate, data_id = GEOID)) + # Repeat process but add in inputs for Girafe 
  geom_errorbar(aes(ymin = estimate - moe, ymax = estimate + moe), width = 0.25, linewidth = 0.25, color = "navy") +
  geom_point_interactive(color = "darkred") + 
  scale_y_continuous(labels = label_percent(scale = 1, suffix = "%"))+
  coord_flip() + 
  theme_economist() +
  labs(title = "Percentage of Population with Graduate Degrees", subtitle = "King County, WA",
       x = "County", y = "% With Graduate Degree")

Girafe Object

Display interactive chart

girafe(ggobj = error_plot_2) %>% # Create Girafe object
  girafe_options(opts_hover(css = "fill:lightgreen;"))

Figure 4: Interactive chart of results that show percentage of adults with a graduate degree by county in Washington State, with associated margin of error bars for each estimate.

Conclusion

After conducting the analysis, there were a few key findings. The counties with the highest rate of graduate degree holders are Whitman, San Juan, and King. The counties with the lowest rates are Adams, Cowlitz, and Grant. That being said, there are a few caveats that need to be addressed. Some of the margin of error ranges are quite large, potentially moving some of the counties higher or lower up the list. While the top three would not change, some counties in the middle would probably have their rankings changed, as well as a few at the bottom. Ferry county is probably the biggest unknown, as it has such a large margin of error it could potentially have one of the lowest percentages.

Part B

Introduction

The following analysis was conducted to gain some initial insight into some housing characteristics for owner occupied housing units in King County, Washington. The length of residency is an important item to understand when studying the housing profile of an area, which is what the analysis starts to explore. The element of interest in this particular case in the period in which housing tenure commenced for owner occupied units. There are six periods used, with the percentage of housing units falling into each period by census tract the result of the analysis that is showing in the following maps.

Data Preparation

The libraries are already loaded.

Variables

Variables to be used when using the tidycensus package.

Pop_in_housing_by_tenure_all <- "B25026_002" # Total owner occupied housing units used
Pop_in_housing_by_tenure_2019_later <- "B25026_003" # Number of owner occupied units with initial occupancy starting 2019 or later
Pop_in_housing_by_tenure_2015_2018 <- "B25026_004" # Number of owner occupied units with initial occupancy starting between 2015 and 2018
Pop_in_housing_by_tenure_2010_2014 <- "B25026_005" # Number of owner occupied units with initial occupancy starting between 2010 and 2014
Pop_in_housing_by_tenure_2000_2009 <- "B25026_006" # Number of owner occupied units with initial occupancy starting between 2000 and 2009
Pop_in_housing_by_tenure_1990_1990 <- "B25026_007" # Number of owner occupied units with initial occupancy starting between 1990 and 1999
Pop_in_housing_by_tenure_1989_earlier <- "B25026_008" # Number of owner occupied units with initial occupancy starting in 1989 or earlier

Inputs

Inputs to pass to function to acquire data.

geography <- "tract" # Set geography to census tracts
state <- "WA" # Set state to Washington
county <- "King" # Set county to King
year <- 2021 # Set survey year to 2021
survey <- "acs5" # Set survey type to 5-year estimates

Analysis

Total Housing Units

The first analysis steps were acquiring data related to total owned housing units on a per census tract basis for King County, Washington. Some additional wrangling steps were undertaken to prepare the data. The values obtained were used as the denominator to calculate percentages in following steps.

Housing_Units <- get_acs(variables = c(Housing_Tenure_all = Pop_in_housing_by_tenure_all), geography = geography, state = state, year = year, survey = survey, county = county, geometry = T) %>% # Load data using inputs above for total owner occupied housing units with geometry column
  rename(Census_Tract = NAME, Tenure = variable, Total_Owned_Housing_Units = estimate) %>% # Rename columns
  mutate(Census_Tract = as.numeric(str_extract(Census_Tract,"\\d+\\.*\\d*"))) %>% # Extract census tract number from full name
  erase_water(area_threshold = 0.90, year = year) # Remove bodies of water from polygons 

Display Map View

Overview of total owned housing units displayed for King County. This provides a summary of where the most owned units are located within the county, along with related attributes.Results can be viewed using the attached file. Sample of code below.

Code Example

mapview(Housing_Units,zcol = “Total_Owned_Housing_Units”) # Mapview of census tracts and attributes

All Housing Units without Geometry

Total owned housing units obtained without associated geometry. This was done to allow for additional variables to be calculated.

Tenure_all <- get_acs(variables = c(Housing_Tenure_all = Pop_in_housing_by_tenure_all), geography = geography, state = state,
                      year = year, survey = survey, county = county) %>% # Load data using inputs above for total owner occupied housing units without geometry
  rename(Census_Tract = NAME, Tenure = variable) %>% # Rename columns
  mutate(Census_Tract = as.numeric(str_extract(Census_Tract,"\\d+\\.*\\d*"))) # Extract census tract number

2019 and Later

Housing units with housing tenure starting in 2019 or later obtained as an initial exploratory step.

Tenure_2019 <- get_acs(variables = c(Housing_Tenure_2019 = Pop_in_housing_by_tenure_2019_later), geography = geography, state = state, year = year, survey = survey,county = county) %>% # Load data using inputs above for total owner occupied housing units with housing tenure starting in 2019 and later without geometry
  rename(Census_Tract = NAME, Tenure = variable) %>% # Rename columns
  mutate(Census_Tract = as.numeric(str_extract(Census_Tract,"\\d+\\.*\\d*"))) # Extract census tract number

Census Tracts

Census Tract shapefiles for King County obtained to be joined to data frames in following steps. This layer was called separately so additional values could be calculated before adding in the census tract geometry.

King_County <- tracts(state = state, county = "King") # Load in census tract TIGER/Line shapefiles

2019 and Later

Total housing units data frame joined to tenure starting in 2019 or later data frame. The percentage of total owned housing units where housing tenure started in 2019 or later calculated. Census tract shapefiles joined and data frame converted to sf object. At this stage, all the analytical tasks have been carried out, and data is ready for mapping.

Tenure_2019 <- left_join(Tenure_all,Tenure_2019, by = "GEOID") %>% # Join total owner occupied housing unit data frame to tenure in 2019 or later data frame
  rename(c(Total_Units = estimate.x, Units_2019 = estimate.y)) %>% # Rename columns
  mutate(Pct_2019 = (Units_2019/Total_Units)*100) # Calculate percentage of owner occupied units with housing tenure beginning in 2019 or later
Tenure_2019 <- left_join(Tenure_2019,King_County, by = "GEOID") # Join census boundaries to Tenure data frame
Tenure_2019 <- st_as_sf(Tenure_2019) %>% # Create sf object
  erase_water(area_threshold = 0.90, year = year) # Erase bodies of water from polygon

Display

Results displayed, showing percentage of owned housing units were housing tenure commences in 2019 or later shown. AS

Year_2019 <- ggplot(data = Tenure_2019) + # Create choropleth of percentage of owner occupied housing units with initial occupancy starting in 2019 or lather
  geom_sf(aes(fill = Pct_2019)) + # 
  scale_fill_viridis_b(option = "mako", n.breaks = 5, direction = -1) + # Set scale settings
  theme_economist_white() + # Set theme to style used by The Economist
  theme(legend.position = "right") + # Position legend
  labs(title = "Tenure by Year of Occupation (2019 and later)",
       subtitle = "King County, WA ",
       fill = "Percentage of Owned Units\n by Initial Year of Occupation", caption = "ACS 5-Year Estimates 2021") # Add labels
Year_2019 # Display resulting map
Figure 5: Map of the percentage of owner occupied housing units out of total owner occupied housing units with occupancy starting in 2019 or later.

Figure 5: Map of the percentage of owner occupied housing units out of total owner occupied housing units with occupancy starting in 2019 or later.

Combined Analysis

Following the successful completion of showing one tenure period result, all periods were used. Not only would missing data be provided, but the scale would be the same for each period as well, making cross period comparisons easier. The same steps were taken as before, with the calculated values for each period being what percentage of owned housing units were represented by different beginning housing tenure periods.

Built_Units <- get_acs(geography = geography, state = state, county = county, year = year,
                       variables = c(Tenure_2019_and_Later = Pop_in_housing_by_tenure_2019_later, 
                                     Tenure_2015_to_2018 = Pop_in_housing_by_tenure_2015_2018,
                                     Tenure_2010_to_2014 = Pop_in_housing_by_tenure_2010_2014,
                                     Tenure_2000_to_2009 = Pop_in_housing_by_tenure_2000_2009,
                                     Tenure_1990_to_1999 = Pop_in_housing_by_tenure_1990_1990,
                                     Tenure_1989_and_Earlier = Pop_in_housing_by_tenure_1989_earlier)) %>% # Create data frame with all tenure variables
  rename(Census_Tract = NAME, Tenure = variable) %>% # Rename columns
  mutate(Census_Tract = as.numeric(str_extract(Census_Tract,"\\d+\\.*\\d*"))) # Extract census tract number

Built_Units <- left_join(Tenure_all,Built_Units, by = "GEOID") %>% # Join total owner occupied housing unit data frame to tenure data frame
  rename(c(Total_Units = estimate.x, Units_Year = estimate.y,
           Tenure = Tenure.y, Census_Tract = Census_Tract.y)) %>% # Rename columns
  mutate(Percent_Year = (Units_Year/Total_Units)*100) %>% # Calculate percentage of owner occupied units for each housing tenure range
  select(GEOID,Census_Tract, Percent_Year,Tenure, Units_Year, Total_Units) # Select only relevant columns

Built_Units <- left_join(Built_Units,King_County, by = "GEOID") # Join census boundaries to Tenure data frame
Built_Units <- st_as_sf(Built_Units) %>% # Create sf object
  erase_water(area_threshold = 0.90, year = year) # Erase bodies of water from polygon

Display results

Results are displayed for all six tenure periods.

ggplot(Built_Units, aes(fill = Percent_Year)) + # Create maps of results by housing tenure type percentages
  geom_sf(color = NA) + # Map sf object
  geom_sf(data = King_County, fill = NA,color = "lightgrey") + # Census boundaries
  scale_fill_viridis_b(option = "mako", n.breaks = 6, direction = -1) + # Set scale options
  facet_wrap(~Tenure) + # Create separate map for each tenure range
  theme_void() + # Set map theme
  theme(legend.position = "bottom") + # position legend
  labs(title = "Tenure by Year of Occupation ",
       subtitle = "King County, WA ",
       fill = "Percentage of Owned Units by Initial Year of Occupation", caption = "ACS 5-Year Estimates 2021") # Add legend
Figure 6: Maps of the percentage of owner occupied housing units out of total owner occupied housing units by year of occupancy for different year ranges.

Figure 6: Maps of the percentage of owner occupied housing units out of total owner occupied housing units by year of occupancy for different year ranges.

Conclusion

After conducting the analysis there were a few key takeaways.The counties with occupancy starting in the 2000 to 2009 and 2015 to 2018 periods have the most counties in the 20%-30% range. This suggests that somewhere between 40% and 60% of owned units have been occupied by the same owner somewhere between 10 and 20 years. These periods also have quite a few census tracts with even a larger percentage being represented. Based on this preliminary analysis, the percentage of owners who have lived in their homes for a decade or more is quite high in King County. Some follow up analysis would be to run some basis descriptive statistics to add some additional context to the maps.