The following analysis provides an overview of median home values by Census Tract for King County, Washington. The data set used was MEDIAN VALUE BY YEAR STRUCTURE BUILT, grouped by period of construction. Many of the techniques demonstrated are not strictly appropriate for this data (for example, taking medians of values that are already medians), but the analysis serves as a useful learning exercise. The main objectives are to give a broad overview of the distribution of median home values throughout the county, and to provide a complementary visualization of where the lowest and highest median home values are found.
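The caveat about medians of already-aggregated medians can be illustrated with a tiny sketch. The numbers below are made up (they are not the King County data); they only show that the median of two group medians need not equal the median of all the underlying values.

```r
# Hypothetical home values for two small groups (invented, not from the data set)
group_a <- c(100, 200, 300)
group_b <- c(400, 500, 600, 700, 800)

# Median of the two group medians: median(c(200, 600)) = 400
median_of_medians <- median(c(median(group_a), median(group_b)))

# Median of all eight underlying values pooled together: (400 + 500) / 2 = 450
pooled_median <- median(c(group_a, group_b))

print(c(median_of_medians = median_of_medians, pooled_median = pooled_median))
```

Because each Tract value in this data set is already a median, summaries such as the county-wide median of Tract medians are descriptive only and should not be read as the median of all homes.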
library(tidyverse)
library(here)
library(janitor)
library(skimr)
library(magrittr)
library(ggbeeswarm)
library(RColorBrewer)
library(magick)
library(beepr)
library(ggthemes)
library(tidycensus)
library(tigris)
library(sf)
library(leaflet)
library(stringr)
buildings <- read_csv("C:/Users/langs/Dropbox (Personal)/PSU - GIS/GEOG 588 - Analytical Approaches for Spatial Data Science/L3/data/CensusTractBuildingCleaned.csv")
Descriptive statistics were computed to provide numerical background for the following analysis.
summary(buildings) # Obtain summary statistics for median home values
##      Tract              Total             2020+         2010_to_2019
##  Length:495         Min.   :  26000   Min.   :661500   Min.   :  10000
##  Class :character   1st Qu.: 462600   1st Qu.:661500   1st Qu.: 635400
##  Mode  :character   Median : 639100   Median :661500   Median : 867900
##                     Mean   : 685134   Mean   :661500   Mean   : 982908
##                     3rd Qu.: 852700   3rd Qu.:661500   3rd Qu.:1166700
##                     Max.   :2000000   Max.   :661500   Max.   :2000000
##                     NA's   :10        NA's   :494      NA's   :262
##    2000_to_2009      1990_to_1999      1980_to_1989      1970_to_1979
##  Min.   :  94200   Min.   :  83000   Min.   :  56100   Min.   :  18900
##  1st Qu.: 521800   1st Qu.: 470600   1st Qu.: 417975   1st Qu.: 398500
##  Median : 669600   Median : 623200   Median : 579650   Median : 562900
##  Mean   : 791720   Mean   : 727853   Mean   : 662764   Mean   : 629255
##  3rd Qu.: 920100   3rd Qu.: 885200   3rd Qu.: 839900   3rd Qu.: 774625
##  Max.   :2000000   Max.   :2000000   Max.   :2000000   Max.   :2000000
##  NA's   :158       NA's   :178       NA's   :171       NA's   :195
##    1960_to_1969      1950_to_1959      1940_to_1949        1939-
##  Min.   : 125500   Min.   : 221900   Min.   : 236800   Min.   : 235200
##  1st Qu.: 417050   1st Qu.: 459450   1st Qu.: 471900   1st Qu.: 519650
##  Median : 578000   Median : 635900   Median : 664600   Median : 727100
##  Mean   : 636026   Mean   : 715223   Mean   : 736044   Mean   : 782192
##  3rd Qu.: 776550   3rd Qu.: 839050   3rd Qu.: 853500   3rd Qu.: 932350
##  Max.   :2000000   Max.   :2000000   Max.   :2000000   Max.   :2000000
##  NA's   :180       NA's   :236       NA's   :325       NA's   :268
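The NA counts above vary widely between periods. A minimal sketch for counting how many Tracts actually have a value in each column, using a small hypothetical tibble shaped like `buildings` (the Tract names and values are invented for illustration):

```r
library(dplyr)
library(tibble)

# Hypothetical toy data shaped like `buildings`: one row per Tract, one column per period
toy <- tibble(
  Tract = c("Census Tract 1.01", "Census Tract 1.02", "Census Tract 2.01"),
  Total = c(500000, 650000, NA),
  `2020+` = c(NA, 661500, NA),
  `2010_to_2019` = c(700000, NA, 900000)
)

# Count the non-missing values in every column except the Tract name
tract_counts <- toy %>%
  summarise(across(-Tract, ~ sum(!is.na(.x))))

print(tract_counts)
```

Applied to `buildings`, the same `summarise(across(-Tract, ...))` call would return 495 minus each NA count shown in the summary, which makes the single non-missing value in the `2020+` column easy to spot.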
The data were converted from wide to long format so that they could be grouped by built period with group_by().
buildings_new <- buildings %>%
pivot_longer(cols = Total:`1939-`, names_to = "Built_Period", values_to = "Median_Home_Price") # One row per Tract and built period
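To make the reshaping concrete, here is a sketch on a two-Tract hypothetical table (invented values). Each Tract row becomes one row per built period, with the former column name in `Built_Period` and its value in `Median_Home_Price`. Missing values are kept as NA rows, which is why the plotting steps below call `na.omit()`.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical wide table with two Tracts and two period columns
wide <- tibble(
  Tract = c("Census Tract 1.01", "Census Tract 1.02"),
  Total = c(500000, 650000),
  `1939-` = c(720000, NA)
)

# Wide to long: 2 Tracts x 2 period columns = 4 rows
long <- wide %>%
  pivot_longer(cols = Total:`1939-`, names_to = "Built_Period", values_to = "Median_Home_Price")

print(long)
```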
A histogram of median home values for each built period was generated to provide a broad overview of the characteristics of home value distributions throughout King County.
theme_set(theme_economist_white()) # Set theme to Economist White
buildings_new %>% # Modify data frame
na.omit() %>% # Remove Census Tracts with no price value
group_by(Built_Period) %>% # Group by period built
mutate(Median_Home_Price = Median_Home_Price/1e6) %>% # Divide median home price by one million
ggplot(aes(y = Median_Home_Price)) + # Plot median home prices
geom_histogram(bins = 15, fill = "lightblue", color = "darkred", alpha = 0.5) + # 15 bins; bins is a layer parameter, not an aesthetic
coord_flip() + # Flip axis
facet_wrap(facets = ~Built_Period ) + # plot histograms for all periods
labs(title = "Median Home Value by Year Built",
subtitle = "King County, WA", y = "Median Home Value in $ Millions", x = "Number of Census Tracts") # Add labels
Figure 1: Histogram of median home prices by Census Tract in King County, grouped by period built
ggsave("histogram.png") # Save figure
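A histogram simply counts how many Tracts fall into each value interval. The counting step can be sketched without drawing anything using base R's `hist(plot = FALSE)`. The values below are hypothetical (in $ millions), and the fixed 0.5-million breaks are an assumption chosen for readability; ggplot2's `geom_histogram(bins = 15)` instead splits the data range into 15 equal-width bins.

```r
# Hypothetical median home values in $ millions (not the King County data)
values <- c(0.45, 0.52, 0.61, 0.63, 0.70, 0.88, 1.20, 2.00)

# Count values per interval: [0, 0.5], (0.5, 1], (1, 1.5], (1.5, 2]
h <- hist(values, breaks = seq(0, 2, by = 0.5), plot = FALSE)

# Label each count with its interval
counts <- setNames(h$counts, paste(head(h$breaks, -1), tail(h$breaks, -1), sep = "-"))
print(counts)
```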
After this initial overview of the median home value distributions, box plots were used as a follow-up visual, keeping the same built-period grouping. The box plots show the median, the interquartile range, and outliers, summarizing the typical home value for each period of construction.
theme_set(theme_economist()) # Set theme to Economist
buildings_new %>% # Modify data frame
na.omit() %>% # Remove Census Tracts with no price value
group_by(Built_Period) %>% # Group by period built
ggplot(aes(x = Built_Period, y = Median_Home_Price)) + # Plot median home prices grouped by period built
geom_boxplot() + # Plot boxplot
coord_flip() + # Flip axis
labs(title = "Median Home Value by Year Built",
subtitle = "King County, WA",
x = "Period of Construction", y = "Median Home Value") # Add labels
Figure 2: Boxplot of median home prices by Census Tract in King County, grouped by period built
ggsave("boxplot.png") # Save figure
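The box plot elements can be computed by hand. A minimal sketch on hypothetical values, assuming ggplot2's default `geom_boxplot()` settings: hinges at the 25th and 75th percentiles (R's default type 7 quantiles) and outliers as points more than 1.5 times the interquartile range beyond the hinges.

```r
# Hypothetical Tract median values (invented for illustration)
values <- c(380000, 420000, 470000, 520000, 560000, 610000, 650000, 700000, 2000000)

# Lower hinge, median, upper hinge
q <- quantile(values, c(0.25, 0.5, 0.75))
iqr <- q[[3]] - q[[1]]

# Points beyond 1.5 * IQR from the hinges are drawn as individual outliers
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr
outliers <- values[values < lower_fence | values > upper_fence]

print(q)
print(c(lower_fence = lower_fence, upper_fence = upper_fence))
print(outliers)
```

Here the single 2,000,000 value is flagged, similar to how the highest-valued Tracts appear as separate points in Figure 2.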
A related violin plot provides information that the box plot does not show directly. The width of each violin reflects how densely values are concentrated, so groupings of values are easier to see than in the box plot. Points for the median home value of each Census Tract are also overlaid, with lower values in blue and higher values in red.
theme_set(theme_dark()) # Set theme to dark
buildings_new %>% # Modify data frame
na.omit() %>% # Remove Census Tracts with no price value
group_by(Built_Period) %>% # Group by period built
ggplot(aes(x = Built_Period, y = Median_Home_Price)) + # Plot median home prices grouped by period built
geom_violin(fill = "lightgrey", alpha = 0.75) + # Add violin plot
geom_jitter(aes(y = Median_Home_Price, color = Median_Home_Price), alpha = 0.35) + # Add individual home value points for each Census Tract
scale_color_distiller(name = "Median Home Price Range", palette = "Spectral") + # Add color scale
labs(title = "Median Home Value by Year Built",
subtitle = "King County, WA",
x = "Period of Construction", y = "Median Home Value") + # Add labels
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) # Adjust period built labels
Figure 3: Violin plot of median home prices by Census Tract in King County, grouped by period built. Includes points for the median home value in each Census Tract
ggsave("violin.png") # Save figure
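The violin shape comes from a kernel density estimate. A rough sketch with base R's `density()`, assuming ggplot2's `geom_violin()` defaults (Gaussian kernel, `"nrd0"` bandwidth); the values are hypothetical. The estimated density is much higher where several values cluster than around a single isolated value, which is why the violins bulge where Tracts bunch together.

```r
# Hypothetical values in $ millions: five clustered values and one isolated high value
values <- c(0.50, 0.52, 0.55, 0.57, 0.60, 1.50)

# Gaussian kernel density estimate with the default "nrd0" bandwidth
d <- density(values, bw = "nrd0", kernel = "gaussian")

# Interpolate the estimated density at the cluster centre and at the isolated value
dens_at <- approx(d$x, d$y, xout = c(0.55, 1.50))$y
print(setNames(dens_at, c("cluster_0.55", "isolated_1.50")))
```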
Next, the Tract numbers were extracted from the full Tract names so that the home value data could be joined to the Census Tract shapefiles.
Tract_names <- buildings$Tract # Character vector of full Tract names
Tract_names <- as.numeric(str_extract(Tract_names, "\\d+\\.?\\d*")) # Extract the numeric Tract number
head(Tract_names) # Examine the first six values
## [1] 1.01 1.02 2.01 2.02 3.00 4.02
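The regular expression can be checked on a couple of strings. The Tract name format below is hypothetical (the exact text of the `Tract` column is not shown above); the pattern `\\d+\\.?\\d*` takes the first run of digits, an optional decimal point, and any digits after it.

```r
library(stringr)

# Hypothetical full Tract names
tract_names <- c("Census Tract 1.01, King County, Washington",
                 "Census Tract 3, King County, Washington")

# First number in each name, converted to numeric
tract_numbers <- as.numeric(str_extract(tract_names, "\\d+\\.?\\d*"))
print(tract_numbers)
```

Converting to numeric means "3" and "3.00" compare equal, which matters when the key is matched against a differently formatted name column during the join.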
A new data frame was then created containing only two columns: the Tract number and the median home value for all built periods combined (Total). This data frame is later joined to the Census Tract shapefiles.
Total_Median_Price <- buildings %>% # Start from the original wide data
select(Tract, Total) %>% # Keep only the Tract name and the all-periods median home value
mutate(Tract_number = as.numeric(str_extract(Tract, "\\d+\\.?\\d*"))) %>% # Extract the numeric Tract number from the Tract name
select(Tract_number, Total) # Keep only the join key and the median home value
The tigris package was used to obtain the shapefiles for the Census Tracts in King County.
King_tracts <- tracts("WA", "King") # Obtain shapefiles for Census Tracts in King County
King_tracts <- mutate(King_tracts, Tract_number = as.numeric(NAME)) # Convert Tract names column from text to numeric
Before the home values were joined, a test plot was made to check how the Census Tract boundaries would display.
ggplot(King_tracts) + # Plot Tract boundaries
geom_sf() +
theme_dark() # Set theme
Figure 4: Test display of Census Tract Shapefiles for King County
After the initial test, the home value data was joined to the Census Tract layer.
King_Combined <- full_join(King_tracts,Total_Median_Price,by = c("Tract_number" = "Tract_number")) # Join median home values to Tract boundaries
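The choice of join can be sketched on plain hypothetical tibbles (invented Tract numbers, no geometry). A full join also keeps price rows whose Tract number has no matching boundary, while a left join keeps exactly one row per boundary and leaves `Total` as NA where no price is available.

```r
library(dplyr)
library(tibble)

# Hypothetical boundary attributes and hypothetical price table
tracts_tbl <- tibble(Tract_number = c(1.01, 1.02, 2.01), NAME = c("1.01", "1.02", "2.01"))
prices_tbl <- tibble(Tract_number = c(1.01, 2.01, 9.99), Total = c(500000, 700000, 800000))

# full_join keeps all rows from both sides; left_join keeps only the boundary rows
full <- full_join(tracts_tbl, prices_tbl, by = "Tract_number")
left <- left_join(tracts_tbl, prices_tbl, by = "Tract_number")

print(full)
print(left)
```

Counting unmatched keys this way (for example the hypothetical 9.99) is a quick check that the Tract numbers extracted from the two sources line up.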
The final step was to map the median home value for each Census Tract, using the same color scheme as the violin plot. While the previous plots summarized the values statistically, the map shows where the higher- and lower-value Tracts are actually located, providing additional context.
ggplot(data = King_Combined, aes(fill = Total)) + # Plot boundaries using home values for each tract
geom_sf()+
scale_fill_distiller(name = "Median Home Value", palette = "Spectral") + # Add color scale for home prices
theme_dark() + # Set theme
labs(title = "Median Home Value by Census Tract Map", subtitle = "King County, WA") # Add labels
Figure 5: Map showing median home prices by Census Tract in King County for all construction periods
ggsave("PriceMap.jpg") # Save figure
After performing the prior analysis, there were several main takeaways. Median home values are roughly similar across most construction periods, with period medians between about $560,000 and $730,000. The 2020 and later period is the exception: only one of the 495 Tracts has a value for it (494 are NA), so there is simply not enough recent data for its inclusion, and at this time that period should probably be excluded. The 2010 to 2019 construction period also appears to have the highest proportion of high-value Tracts, with the highest median (about $868,000) and upper quartile (about $1,167,000) of any period. Another interesting finding is that the highest-valued Tracts are found roughly in the same general area of the county. More thorough techniques would be needed before drawing any defensible conclusions on whether clustering exists, but at this stage the distribution of home values, and where the highest values are located, has been described.
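The observation about the 2010 to 2019 period could be checked more directly by computing, for each period, the share of Tracts above a cutoff. A minimal sketch on a hypothetical long-format tibble shaped like `buildings_new`; the values are invented and the $1,000,000 cutoff is an arbitrary assumption.

```r
library(dplyr)
library(tibble)

# Hypothetical long-format data shaped like `buildings_new`
long <- tibble(
  Built_Period = c("2010_to_2019", "2010_to_2019", "2010_to_2019",
                   "1970_to_1979", "1970_to_1979", "1970_to_1979"),
  Median_Home_Price = c(1200000, 950000, 1500000, 450000, 1100000, NA)
)

# Per period: number of Tracts with a value, and share above the (assumed) $1M cutoff
share_high <- long %>%
  filter(!is.na(Median_Home_Price)) %>%
  group_by(Built_Period) %>%
  summarise(n_tracts = n(), share_over_1m = mean(Median_Home_Price > 1e6))

print(share_high)
```

On the real data, `buildings_new` could be used in place of `long`, optionally adding `filter(Built_Period != "2020+")` to drop the period with a single value.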