Median Home Values in King County, Washington

The following analysis provides an overview of median home values by Census Tract for King County, Washington. The data set used was MEDIAN VALUE BY YEAR STRUCTURE BUILT, grouped by period of construction. Some of the techniques demonstrated are not strictly appropriate for these data. For example, it takes medians of values that are already tract-level medians. The analysis still serves as a useful learning exercise. The main objectives are to give a broad overview of the distribution of median home values throughout the county and to provide a complementary map showing where the lowest and highest median home values are found.
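A small base R sketch illustrates the caveat about medians of medians. It uses made-up values for two hypothetical tracts. Tract-level medians discard how many homes stand behind each value, so the median of the medians can differ sharply from the median of all homes pooled together.

```r
# Hypothetical home values (in $1,000s) for two tracts of different sizes
tract_a <- c(300, 320, 340, 360, 380)  # 5 homes, median 340
tract_b <- c(900, 1000, 1100)          # 3 homes, median 1000

# Median of the tract-level medians (all that aggregated data allows)
median_of_medians <- median(c(median(tract_a), median(tract_b)))

# Median of all homes pooled together (requires the unaggregated data)
pooled_median <- median(c(tract_a, tract_b))

median_of_medians  # 670
pooled_median      # 370
```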

Load libraries

library(tidyverse)
library(here)
library(janitor)
library(skimr)
library(magrittr)
library(ggbeeswarm)
library(RColorBrewer)
library(magick)
library(beepr)
library(ggthemes)
library(tidycensus)
library(tigris)
library(sf)
library(leaflet)
library(stringr)

Import dataset from previous lesson

buildings <- read_csv("C:/Users/langs/Dropbox (Personal)/PSU - GIS/GEOG 588 - Analytical Approaches for Spatial Data Science/L3/data/CensusTractBuildingCleaned.csv") # Absolute path to the author's local copy; adjust for your machine

Summary Statistics

Descriptive statistics provide numerical background for the analysis that follows.

summary(buildings) # Obtain summary statistics for median home values
##     Tract               Total             2020+         2010_to_2019    
##  Length:495         Min.   :  26000   Min.   :661500   Min.   :  10000  
##  Class :character   1st Qu.: 462600   1st Qu.:661500   1st Qu.: 635400  
##  Mode  :character   Median : 639100   Median :661500   Median : 867900  
##                     Mean   : 685134   Mean   :661500   Mean   : 982908  
##                     3rd Qu.: 852700   3rd Qu.:661500   3rd Qu.:1166700  
##                     Max.   :2000000   Max.   :661500   Max.   :2000000  
##                     NA's   :10        NA's   :494      NA's   :262      
##   2000_to_2009      1990_to_1999      1980_to_1989      1970_to_1979    
##  Min.   :  94200   Min.   :  83000   Min.   :  56100   Min.   :  18900  
##  1st Qu.: 521800   1st Qu.: 470600   1st Qu.: 417975   1st Qu.: 398500  
##  Median : 669600   Median : 623200   Median : 579650   Median : 562900  
##  Mean   : 791720   Mean   : 727853   Mean   : 662764   Mean   : 629255  
##  3rd Qu.: 920100   3rd Qu.: 885200   3rd Qu.: 839900   3rd Qu.: 774625  
##  Max.   :2000000   Max.   :2000000   Max.   :2000000   Max.   :2000000  
##  NA's   :158       NA's   :178       NA's   :171       NA's   :195      
##   1960_to_1969      1950_to_1959      1940_to_1949         1939-        
##  Min.   : 125500   Min.   : 221900   Min.   : 236800   Min.   : 235200  
##  1st Qu.: 417050   1st Qu.: 459450   1st Qu.: 471900   1st Qu.: 519650  
##  Median : 578000   Median : 635900   Median : 664600   Median : 727100  
##  Mean   : 636026   Mean   : 715223   Mean   : 736044   Mean   : 782192  
##  3rd Qu.: 776550   3rd Qu.: 839050   3rd Qu.: 853500   3rd Qu.: 932350  
##  Max.   :2000000   Max.   :2000000   Max.   :2000000   Max.   :2000000  
##  NA's   :180       NA's   :236       NA's   :325       NA's   :268
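The NA's row shows how unevenly the periods are covered. Of the 495 tracts, only one has a value for 2020+ (494 missing), and only 233 have a value for 2010 to 2019 (262 missing). A minimal base R sketch counts the reporting tracts per column directly. It runs on a small hypothetical data frame; with the real data, `buildings` would take the place of `toy`.

```r
# Hypothetical wide table: one row per tract, one column per built period
toy <- data.frame(
  Tract = c("Tract A", "Tract B", "Tract C"),
  Total = c(500000, 650000, NA),
  `2020+` = c(NA, NA, 661500),
  `2010_to_2019` = c(700000, NA, 900000),
  check.names = FALSE  # keep column names such as "2020+" unchanged
)

# Number of tracts reporting a value in each numeric column
reporting <- colSums(!is.na(toy[, -1]))
reporting
```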

Convert from wide to long format

Converting from wide to long format is required so that the data can be grouped by built period with group_by().

buildings_new <- buildings %>%
  pivot_longer(names_to = "Built_Period", values_to = "Median_Home_Price", Total:`1939-`)
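The sketch below shows what pivot_longer() does to the table, using two hypothetical tracts and two period columns. Each tract-by-column cell becomes its own row, so a period with a missing value still produces a row with NA in Median_Home_Price.

```r
library(tidyr)

# Hypothetical wide table: one row per tract, one column per built period
wide <- data.frame(
  Tract = c("Tract A", "Tract B"),
  Total = c(500000, 650000),
  `1939-` = c(720000, NA),
  check.names = FALSE  # keep the "1939-" column name unchanged
)

# Stack the period columns into Built_Period / Median_Home_Price pairs
long <- pivot_longer(wide, cols = Total:`1939-`,
                     names_to = "Built_Period", values_to = "Median_Home_Price")
long
```

The range Total:`1939-` includes Total, so the overall value is stacked as if it were one more period. This is why Total appears alongside the construction periods in the figures.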

Create histogram of median home values, grouped by period of construction

A histogram of median home values for each built period was generated to provide a broad overview of the characteristics of home value distributions throughout King County.

theme_set(theme_economist_white()) # Set theme to Economist White
buildings_new %>% # Start from the long-format data frame
  na.omit() %>% # Remove Census Tracts with no price value
  group_by(Built_Period) %>% # Group by period built
  mutate(Median_Home_Price = Median_Home_Price/1e6) %>% # Convert median home price to millions of dollars
  ggplot(aes(y = Median_Home_Price)) + # Plot median home prices
  geom_histogram(bins = 15, fill = "lightblue", color = "darkred", alpha = 0.5) + # Draw histogram with 15 bins (bins is a parameter, not an aesthetic)
  coord_flip() + # Flip axes so home values run along the horizontal axis
  facet_wrap(facets = ~Built_Period) + # Plot histograms for all periods
  labs(title = "Median Home Value by Year Built",
       subtitle = "King County, WA", y = "Median Home Value in $ Millions", x = "Number of Census Tracts") # Add labels
Figure 1: Histogram of median home prices by Census Tract in King County, grouped by period built

ggsave("histogram.png") # Save figure
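Each bar in Figure 1 counts the tracts whose value falls inside one bin. The sketch below shows that counting step with base R's hist(..., plot = FALSE), using hypothetical values and hand-picked bin edges. It illustrates binning only and does not reproduce how geom_histogram() places its 15 bins.

```r
# Hypothetical median home values in $ millions for a handful of tracts
values <- c(0.45, 0.52, 0.61, 0.63, 0.68, 0.74, 0.95, 1.40)

# Count how many values fall in each 0.25-wide bin, without drawing anything
h <- hist(values, breaks = seq(0, 1.5, by = 0.25), plot = FALSE)
h$counts  # 0 1 5 1 0 1
```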

Create boxplot to show IQR and outliers

After the initial overview of median home value distributions, box plots were added as a follow-up visual, using the same built period grouping as before. The median values are shown along with the interquartile range and outliers, which summarizes the typical home value for each period of construction.

theme_set(theme_economist()) # Set theme to Economist
buildings_new %>% # Modify data frame
  na.omit() %>% # Remove Census Tracts with no price value
  group_by(Built_Period) %>% # Group by period built
  ggplot(aes(x = Built_Period, y = Median_Home_Price)) + # Plot median home prices grouped by period built
  geom_boxplot() + # Plot boxplot
  coord_flip() + # Flip axes so each period gets its own horizontal box
  labs(title = "Median Home Value by Year Built",
       subtitle = "King County, WA",
       x = "Period of Construction", y = "Median Home Value") # Add labels
Figure 2: Boxplot of median home prices by Census Tract in King County, grouped by period built

ggsave("boxplot.png") # Save figure
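Box plots in ggplot2 flag as outliers any points lying more than 1.5 times the interquartile range beyond the box. The base R sketch below applies that rule by hand to one hypothetical set of tract values. It uses quantile() for the hinges.

```r
# Hypothetical median home values for one built period
values <- c(350000, 420000, 480000, 510000, 560000, 600000, 640000, 700000, 1900000)

q <- quantile(values, c(0.25, 0.5, 0.75))  # lower hinge, median, upper hinge
iqr <- q[[3]] - q[[1]]                     # interquartile range

# Points beyond 1.5 * IQR from the hinges are drawn as outliers
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr
outliers <- values[values < lower_fence | values > upper_fence]
outliers  # 1900000
```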

Create violin plot with individual home value points included

A related violin plot provides additional information not directly shown in the box plot. The width of each violin follows the distribution of values, so groupings of values are easier to see than in the box plot. Points for the median home value of each Census Tract are also shown, with lower values in blue and higher values in red.

theme_set(theme_dark()) # Set theme to dark
buildings_new %>% # Modify data frame
  na.omit() %>% # Remove Census Tracts with no price value
  group_by(Built_Period) %>% # Group by period built
  ggplot(aes(x = Built_Period, y = Median_Home_Price)) + # Plot median home prices grouped by period built
  geom_violin(fill = "lightgrey", alpha = 0.75) +  # Add violin plot
  geom_jitter(aes(color = Median_Home_Price), width = 0.2, height = 0, alpha = 0.35) + # Add one point per Census Tract, jittered horizontally only so values are not shifted
  scale_color_distiller(name = "Median Home Price Range", palette = "Spectral") + # Add color scale
  labs(title = "Median Home Value by Year Built",
       subtitle = "King County, WA",
       x = "Period of Construction", y = "Median Home Value") + # Add labels
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) # Adjust period built labels 
Figure 3: Violin plot of median home prices by Census Tract in King County, grouped by period built. Includes points for the median home value in each Census Tract

ggsave("violin.png") # Save figure
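A violin is a kernel density estimate mirrored around the group's axis. The sketch below uses base R's density() to estimate the density of simulated, hypothetical tract values and then locates the main peak, which is where a violin would be widest. The two-cluster data are an assumption chosen to make the shape visible.

```r
# Hypothetical median home values with a large lower cluster and a smaller upper one
set.seed(1)
values <- c(rnorm(50, mean = 500000, sd = 50000),
            rnorm(20, mean = 1200000, sd = 80000))

# Kernel density estimate; a violin draws this curve mirrored on both sides
d <- density(values)

# Location of the highest peak of the estimated density
peak <- d$x[which.max(d$y)]
peak
```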

Create list of tract numbers

Tract numbers were extracted from the full Tract names so that the home value data can be joined to the Census Tract shapefiles.

Tract_names <- buildings$Tract # Copy the full Tract names into a character vector
Tract_names <- as.numeric(str_extract(Tract_names, "\\d+\\.?\\d*")) # Extract the Census Tract number (digits, optional decimal point, digits)
head(Tract_names) # Examine the first six values
## [1] 1.01 1.02 2.01 2.02 3.00 4.02
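The extraction pattern can be checked on a few sample names. The sketch assumes ACS-style names such as "Census Tract 1.01, King County, Washington"; these are hypothetical examples, not rows from the data. It uses base R's regmatches() and regexpr() in place of str_extract(), and `\\.?` allows at most one decimal point.

```r
# Assumed name format; the real values come from buildings$Tract
tract_names <- c("Census Tract 1.01, King County, Washington",
                 "Census Tract 3, King County, Washington",
                 "Census Tract 327.04, King County, Washington")

# First run of digits, optionally followed by a decimal point and more digits
tract_numbers <- as.numeric(regmatches(tract_names,
                                       regexpr("\\d+\\.?\\d*", tract_names, perl = TRUE)))
tract_numbers  # 1.01 3.00 327.04
```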

Create data frame that only contains overall median home value and Census Tract numbers

A new data frame was created with only two columns: the Tract number and the median home value of each Census Tract for all built periods. This data frame is later joined to the Census Tract shapefiles.

Total_Median_Price <- data.frame(buildings) %>%  # Create new data frame
  select(Tract:Total) %>% # Select only the Tract name and the median home value for all periods
  mutate(Tract_number = as.numeric(str_extract(Tract, "\\d+\\.?\\d*"))) %>% # Convert Census Tract names to Tract numbers
  select(Tract_number, Total) # Keep only the Tract number and overall median value

Use tigris library to import Census Tract shapefiles for King County

The tracts() function from the tigris library was used to obtain the shapefiles for the Census Tracts in King County.

The code below was run, but its output is not displayed.

King_tracts <- tracts("WA", "King") # Obtain shapefiles for Census Tracts in King County
King_tracts <- mutate(King_tracts, Tract_number = as.numeric(NAME)) # Convert Tract names column from text to numeric

Test that the Census Tract shapefiles display correctly

Before the home values were used, a test was run to see how the Census Tract boundaries would be displayed.

ggplot(King_tracts) + # Plot Tract boundaries
  geom_sf() + 
  theme_dark() # Set theme
Figure 4: Test display of Census Tract Shapefiles for King County

Combine overall median home price and Census Tract data frames

After the initial test, the home value data was joined to the Census Tract layer.

King_Combined <- full_join(King_tracts, Total_Median_Price, by = "Tract_number") # Join median home values to Tract boundaries
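full_join() keeps rows from both tables and fills unmatched columns with NA. The base R sketch below uses merge(all = TRUE), which behaves the same way for plain data frames, on two hypothetical tables. In these tables, tract 1.02 has a boundary but no value, and tract 9.99 has a value but no boundary.

```r
# Hypothetical tract attributes (stand-in for the shapefile table)
tracts_tbl <- data.frame(Tract_number = c(1.01, 1.02, 2.01),
                         NAME = c("1.01", "1.02", "2.01"))

# Hypothetical home values; tract 9.99 has no matching boundary
values_tbl <- data.frame(Tract_number = c(1.01, 2.01, 9.99),
                         Total = c(640000, 720000, 500000))

# all = TRUE keeps unmatched rows from both sides, like dplyr::full_join()
joined <- merge(tracts_tbl, values_tbl, by = "Tract_number", all = TRUE)
joined
```

On the map, a tract whose Total is NA is drawn in the fill scale's NA color (grey by default). If only mapped tracts matter, left_join(King_tracts, ...) would keep every boundary and drop value rows that have no matching shape.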

Display median home price values by Census Tract

The final step was to show the median home value for each Census Tract using the same color scheme as the violin plot, so that the locations of lower and higher valued homes are displayed for additional context. The previous steps summarized the values statistically. The map provides the missing link by showing where the Tracts with higher and lower home values are actually located.

ggplot(data = King_Combined, aes(fill = Total)) + # Plot boundaries using home values for each tract
  geom_sf()+ 
  scale_fill_distiller(name = "Median House Value", palette = "Spectral") + # Add color scale for home prices 
  theme_dark() + # Set theme
  labs(title = "Median Home Value by Census Tract Map", subtitle = "King County, WA") # Add labels
Figure 5: Map showing median home prices by Census Tract in King County for all construction periods

ggsave("PriceMap.jpg") # Save figure
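The map shows where the highest and lowest values are, and a sorted table can name those tracts explicitly. Below is a base R sketch on hypothetical values; with the real data, Total_Median_Price would take the place of tract_values.

```r
# Hypothetical overall median values by tract number
tract_values <- data.frame(Tract_number = c(1.01, 2.01, 3, 4.02, 5),
                           Total = c(640000, 1850000, NA, 415000, 2000000))

# Drop tracts with no value, then sort from highest to lowest
ranked <- tract_values[!is.na(tract_values$Total), ]
ranked <- ranked[order(ranked$Total, decreasing = TRUE), ]

head(ranked, 2)  # highest-valued tracts
tail(ranked, 2)  # lowest-valued tracts
```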

Final Results and Conclusion

After performing the prior analysis, there were several main takeaways.

The median home values are broadly similar across most construction periods. The median of the tract values ranges from about $563k (1970 to 1979) to about $727k (1939 and earlier). The 2020 and later period is the outlier: only one Census Tract reports a value for it (494 of 495 are missing). There is simply not enough recent data for its inclusion, so at this time that period should probably be excluded. The 2010 to 2019 construction period has the highest median (about $868k) and the highest upper quartile (about $1.17 million). It also appears to have the highest proportion of the highest-valued homes.

Another interesting finding is that the highest-valued homes are found in roughly the same general area. More thorough techniques would be needed before drawing any defensible conclusion about whether clustering exists. At this stage, however, the overview of the distribution of home values, and of where the highest values are located, is complete.