In this document I do some basic analyses on median home value of Maryland counties over time.
For those readers unfamiliar with the R statistical software and the Tidyverse packages I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the tutorial “Getting started with the Tidyverse”.
I use the following packages for the following purposes:
library(tidyverse)   # data manipulation and plotting
library(tidycensus)  # retrieving ACS data via the Census Bureau API
library(tigris)      # retrieving cartographic boundary data
library(sf)          # working with spatial data in simple features form
library(readxl)      # reading Excel spreadsheets
library(viridis)     # color scales for the maps
library(tools)       # computing MD5 checksums
The tidycensus and tigris packages require a Census API key. I previously obtained such a key and stored it for use by the functions included in the tidycensus and tigris packages.
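(For readers who have not yet stored a key, a minimal sketch of doing so with the tidycensus census_api_key() function follows; “YOUR_KEY_HERE” is a placeholder, not a real key.)
census_api_key("YOUR_KEY_HERE", install = TRUE)  # illustrative only; install = TRUE saves the key for future sessions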
I use data from the following sources; see the References section below for more information:
- Median home value estimates from the American Community Survey (ACS) 5-year estimates, retrieved using the tidycensus get_acs() function.
- Cartographic boundaries for Maryland counties, retrieved using the tigris counties() function.
- Consumer Price Index data used to adjust for inflation, read from the spreadsheet allitems.xlsx.

I check to make sure that the versions of the files being used in this analysis are identical to the versions of the files I originally downloaded. I do this by comparing the MD5 checksums of the files against MD5 values I previously computed, and stopping execution if they do not match.
stopifnot(md5sum("allitems.xlsx") == "f890a4aff139ca003246d17491d51272")
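For reference, such a checksum can be computed with the md5sum() function from the tools package:
md5sum("allitems.xlsx")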
I first use the tidycensus get_acs() function to retrieve median home value estimates from the 2010 and 2017 ACS 5-year estimates. This function returns data in tidy format, with the following columns:
- GEOID. A 5-character geographic ID combining the 2-digit FIPS code for Maryland (“24”) with the 3-digit FIPS code for the individual county or county-equivalent within Maryland.
- NAME. The name of the county or county-equivalent.
- variable. References the ACS variable being retrieved (which we here name mhv).
- estimate. The ACS 5-year estimate for the variable’s value.
- moe. The margin of error for the estimate.

I take the table returned by get_acs() and add a new variable year containing the year of the estimate.
mhv_2010 <- get_acs(
geography = "county",
state = "MD",
year = 2010,
survey = "acs5",
variables = c(mhv = "B25077_001")
) %>%
mutate(year = 2010)
mhv_2017 <- get_acs(
geography = "county",
state = "MD",
year = 2017,
survey = "acs5",
variables = c(mhv = "B25077_001")
) %>%
mutate(year = 2017)
The ACS 5-year estimates are expressed in dollars as of the year in question. For example, the 5-year estimates for 2010 are in 2010 dollars. In order to adjust values for inflation I read in an Excel spreadsheet containing consumer price index data from 1978 to 2018. The index values are expressed as a percentage of the December 1977 index (set at 100).
(Note that adjusting home values for inflation is a somewhat problematic exercise since housing costs are themselves a major component of inflation. However I do it here for consistency with other analyses in this series.)
The spreadsheet has two sheets, each of which has values for all twelve months of the year plus an average figure for the entire year. The sheets differ in whether or not the monthly figures are seasonally-adjusted. Since I am doing adjustments for entire years I set col_types to skip over the monthly values, and read only the first sheet (which is not seasonally-adjusted).
cpi_tbl <- read_excel("allitems.xlsx",
sheet = 1,  # first sheet, containing the non-seasonally-adjusted figures
skip = 8,   # skip the rows above the actual data
col_names = c("year", "index"),
col_types = c("numeric", rep("skip", 12), "numeric")  # keep the year and annual average, skip the 12 monthly values
)
I want to express all home values in 2017 dollars. I therefore divide the 2017 index value by the index value for each year to create a new table adj_tbl. The variable adj in this table can be multiplied by home values for each year to express them in 2017 dollars.
index_2017 <- cpi_tbl %>%
filter(year == 2017) %>%
select(index) %>%
as.integer()
adj_tbl <- cpi_tbl %>%
mutate(adj = index_2017 / index) %>%
select(year, adj)
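As a quick sanity check, the adjustment factor for 2017 should be approximately 1 (not exactly 1, since the 2017 index value was truncated to an integer above), and the factors for earlier years should be greater than 1:
adj_tbl %>% filter(year %in% c(2010, 2017))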
The adjustment is done as follows, creating a new table mhv_2010_adj:
- I start with the mhv_2010 table.
- I join it with the adj_tbl table containing inflation adjustment figures for each year, using year as the common variable. This has the effect of adding a new column containing the variable adj. This variable has the value 1 for the year 2017, and for other years contains the adjustment factor needed to convert home values into 2017 dollars.
- I multiply the estimate and moe variables by adj to convert them to 2017 dollars.

mhv_2010_adj <- mhv_2010 %>%
inner_join(adj_tbl, by = "year") %>%
mutate(estimate = estimate * adj, moe = moe * adj)
In order to produce maps of the median home value data I need the cartographic boundary geometry for all 24 Maryland counties and county-equivalents (i.e., Baltimore city). I use the tigris function counties() to return the cartographic boundaries in the form of a table compatible with the sf package. This table has a variable GEOID that is identical to the variable of the same name returned by get_acs().
Since it can take some time to retrieve the boundary data, I cache the returned data.
options(tigris_use_cache = TRUE)
county_sf <- counties(state = "MD", cb = TRUE, class = "sf", progress_bar = FALSE)
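As a simple sanity check, every county GEOID returned by get_acs() should have a matching boundary in county_sf; for example:
stopifnot(all(mhv_2010$GEOID %in% county_sf$GEOID))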
I do analyses to answer the following questions:

- What was the median home value for each Maryland county (in 2017 dollars) in the 2010 and 2017 ACS 5-year estimates?
- How did county median home values change from the 2010 estimates to the 2017 estimates?
Median home value estimates for smaller Maryland counties (with populations under 65,000) are available only in the American Community Survey 5-year estimates. This means, for example, that the 2010 median home value estimates published for these counties actually reflect surveys in the years 2006-2010, the 2011 estimates reflect surveys in the years 2007-2011, and so on.
For this reason the U.S. Census Bureau recommends not comparing 5-year estimates from overlapping sets of years, for example comparing 2011 5-year estimates to 2010 5-year estimates. In order to look at changes over the maximum period of time I therefore choose to compare median home values between the 2010 and 2017 5-year estimates, representing home values over the timeframes 2006-2010 and 2013-2017.
I first do a bar chart showing the (inflation-adjusted) 2010 ACS 5-year estimates for median home value for each county, arranging the bars in order of decreasing home value from left to right. I do this as follows:
- I start with the table mhv_2010_adj from above.
- I shorten the county names by removing the trailing “County” and “Maryland” text from the NAME variable.
- I convert NAME into a categorical variable (or “factor” in R-speak) and order the variable values (the county names) so that they are in order of decreasing median home value.
- I use geom_col to plot one bar for each county, with the height of the bar corresponding to the county’s median home value.

mhv_2010_adj %>%
mutate(NAME = sub("( County)?, Maryland$", "", NAME)) %>%
mutate(NAME = fct_reorder(NAME, -estimate)) %>%
ggplot() +
geom_col(aes(x = NAME, y = estimate)) +
scale_y_continuous(labels = scales::dollar, breaks = seq(0, 600000, 100000), limits = c(0, 600000)) +
xlab("County") +
ylab("Home Value") +
labs(
title = "Maryland Median Home Value by County",
subtitle = "2010 ACS 5-Year Estimates in 2017 Dollars",
caption = "Data source:\n U.S. Census Bureau, American Community Survey, Table B25077\nCreated using the tidyverse and tidycensus R packages"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
theme(axis.title.x = element_text(margin = margin(t = 5))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
To show another view of the same data I also map the counties, coloring each county according to its median home value in the (inflation-adjusted) 2010 5-year estimates. I do this as follows:
- I start with the table county_sf containing the geometry for the county boundaries.
- I join it with the table mhv_2010_adj, using the common variable GEOID.
- I use the estimate variable to determine the fill color for each county.
- I use geom_sf() to draw the county physical boundaries in white.
- I use scale_fill_viridis() to specify the colors for each county along a predefined spectrum of color values, and to create a legend.

county_sf %>%
inner_join(mhv_2010_adj, by = "GEOID") %>%
ggplot() +
geom_sf(aes(fill = estimate, geometry = geometry), color = "white", size = 0.4) +
scale_fill_viridis(option = "plasma", name = "Home Value", labels = scales::dollar, limits = c(100000, 600000)) +
labs(
title = "Maryland Median Home Value by County",
subtitle = "2010 ACS 5-Year Estimates in 2017 Dollars",
caption = "Data sources:\n U.S. Census Bureau, American Community Survey, Table B25077\n U.S. Census Bureau, Cartographic boundaries\nCreated using the tidyverse and tidycensus R packages"
) +
theme(plot.caption = element_text(hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
I next repeat this analysis for the 2017 ACS 5-year estimates, starting with a bar chart. (The estimates are already in 2017 dollars, so they do not need to be adjusted for inflation.) The code is almost identical to that above, except for changing the year where needed.
mhv_2017 %>%
mutate(NAME = sub("( County)?, Maryland$", "", NAME)) %>%
mutate(NAME = fct_reorder(NAME, -estimate)) %>%
ggplot(aes(x = NAME, y = estimate)) +
geom_col() +
scale_y_continuous(labels = scales::dollar, breaks = seq(0, 600000, 100000), limits = c(0, 600000)) +
xlab("County") +
ylab("Home Value") +
labs(
title = "Maryland Median Home Value by County",
subtitle = "2017 ACS 5-Year Estimates in 2017 Dollars",
caption = "Data source:\n U.S. Census Bureau, American Community Survey, Table B25077\nCreated using the tidyverse and tidycensus R packages"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
theme(axis.title.x = element_text(margin = margin(t = 5))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
As with the bar chart for the 2017 ACS 5-year estimates, the code for the map for the 2017 ACS 5-year estimates is almost identical to the code for the 2010 map.
county_sf %>%
inner_join(mhv_2017, by = "GEOID") %>%
ggplot(aes(fill = estimate, geometry = geometry)) +
geom_sf(color = "white", size = 0.4) +
scale_fill_viridis(option = "plasma", name = "Home Value", labels = scales::dollar, limits = c(100000, 600000)) +
labs(
title = "Maryland Median Home Value by County",
subtitle = "2017 ACS 5-Year Estimates in 2017 Dollars",
caption = "Data sources:\n U.S. Census Bureau, American Community Survey, Table B25077\n U.S. Census Bureau, Cartographic boundaries\nCreated using the tidyverse and tidycensus R packages"
) +
theme(plot.caption = element_text(hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
At first glance it looks as if (inflation-adjusted) home values declined between the 2006-2010 and 2013-2017 timeframes. In the next section I look more closely at the changes between the two timeframes.
I next want to look at how the median home value for each Maryland county has changed over time. More specifically, I investigate how the median home values for the various counties changed from 2010 to 2017—or, more correctly, from the 2006-2010 timeframe to the 2013-2017 timeframe.
I create the bar chart as follows:
- I start with the table mhv_2010_adj containing 2010 median home value estimates expressed in 2017 dollars.
- I combine it with the table mhv_2017 containing 2017 median home value estimates.
- I keep only the variables NAME, GEOID, year, and estimate.
- I spread the estimates into separate 2010 and 2017 columns and compute a new variable rel_change containing the relative increase or decrease in median home value from 2010 to 2017, expressed as a fraction (not a percentage) of the 2010 value.
- I keep only the GEOID, NAME, and rel_change variables.
- I shorten the county names, order them by decreasing relative change, and save the result in a new table mhv_change so that I can use it for both a bar chart and a map.

mhv_change <- mhv_2010_adj %>%
bind_rows(mhv_2017) %>%
select(NAME, GEOID, year, estimate) %>%
spread(year, estimate) %>%
mutate(rel_change = (`2017` - `2010`) / `2010`) %>%
select(NAME, GEOID, rel_change) %>%
mutate(NAME = sub("( County)?, Maryland$", "", NAME)) %>%
mutate(NAME = fct_reorder(NAME, -rel_change))
mhv_change %>%
ggplot(aes(x = NAME, y = rel_change)) +
geom_col() +
scale_y_continuous(labels = scales::percent_format(accuracy = 1), breaks = seq(-0.30, 0.05, 0.05)) +
xlab("County") +
ylab("Change") +
labs(
title = "Maryland Median Home Value Changes by County",
subtitle = "2017 vs. 2010 ACS 5-Year Estimates in 2017 Dollars",
caption = "Data source:\n U.S. Census Bureau, American Community Survey, Table B25077\nCreated using the tidyverse and tidycensus R packages"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
theme(axis.title.x = element_text(margin = margin(t = 5))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
All Maryland counties had declining median home values over the timeframe in question, with the rural areas of Allegany and Garrett counties having the least decline, and the suburban D.C. jurisdictions of Prince George’s and Charles counties having the most. Overall Howard County fared relatively well among all counties.
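One way to double-check the ordering shown in the chart is to sort the mhv_change table by relative change, listing counties from the largest decline to the smallest:
mhv_change %>% arrange(rel_change)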
I also create a map showing the same percentage changes, in a similar manner to the maps above.
county_sf %>%
inner_join(mhv_change, by = "GEOID") %>%
ggplot(aes(fill = rel_change, geometry = geometry)) +
geom_sf(color = "white", size = 0.4) +
scale_fill_viridis(option = "plasma", name = "Change", labels = scales::percent_format(accuracy = 1, trim = FALSE), limits = c(-0.30, 0.0)) +
labs(
title = "Maryland Median Home Value Changes by County",
subtitle = "2017 vs. 2010 ACS 5-Year Estimates in 2017 Dollars",
caption = "Data sources:\n U.S. Census Bureau, American Community Survey, Table B19013\n U.S. Census Bureau, Cartographic boundaries\nCreated using the tidyverse and tidycensus R packages"
) +
theme(plot.caption = element_text(hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
Again Allegany, Garrett, Montgomery, and Howard counties show the smallest declines in median home values.
All median home values are estimates based on survey samples, with associated margins of error. For estimates at the county level the associated standard errors can be a few thousand dollars.
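To get a sense of how large these margins are relative to the estimates, one can compare the moe and estimate columns returned by get_acs(); a quick sketch (the moe_frac variable is introduced here just for illustration):
mhv_2017 %>%
mutate(moe_frac = moe / estimate) %>%  # margin of error as a fraction of the estimate
select(NAME, estimate, moe, moe_frac) %>%
arrange(desc(moe_frac))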
As noted above, 5-year estimates identified as being for a certain year actually reflect surveys conducted over the prior five years.
To obtain the ACS data I used the tidycensus package, which provides an easy-to-use interface to U.S. Census Bureau data made available via a set of public APIs. As its name suggests, the tidycensus package is designed to be compatible with the tidyverse approach to representing and manipulating data.
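For example, the variable code B25077_001 used above can be located by searching the ACS variable metadata that tidycensus exposes via its load_variables() function; a quick sketch (v17 is just an illustrative name):
v17 <- load_variables(2017, "acs5", cache = TRUE)  # metadata for the 2017 ACS 5-year variables
v17 %>% filter(str_detect(label, "Median value"))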
To obtain the county boundaries I used the tigris package, which is designed to accompany the tidycensus package and provides access to U.S. Census Bureau cartographic boundaries and TIGER/Line shapefiles.
To adjust for inflation I used the CPI-U-RS data produced by the Bureau of Labor Statistics, specifically the “Updated CPI-U-RS, All items, 1977-2018” Excel spreadsheet, as recommended by the U.S. Census Bureau’s guide to comparing 2017 ACS data to prior years.
When the 2018 ACS 5-year estimates are available the graphs and maps above could be re-done using 2018 data.
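Only the year argument (and the derived year variable) would need to change; a sketch of what that retrieval might look like (mhv_2018 is an illustrative name):
mhv_2018 <- get_acs(
geography = "county",
state = "MD",
year = 2018,
survey = "acs5",
variables = c(mhv = "B25077_001")
) %>%
mutate(year = 2018)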
I used the following R environment in doing the analysis above:
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] tools stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] viridis_0.5.1 viridisLite_0.3.0 readxl_1.3.1
## [4] sf_0.7-5 tigris_0.8.2 tidycensus_0.9.2
## [7] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.1
## [10] purrr_0.3.2 readr_1.3.1 tidyr_0.8.3
## [13] tibble_2.1.3 ggplot2_3.2.0 tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.1 lubridate_1.7.4 lattice_0.20-38
## [4] class_7.3-15 assertthat_0.2.1 digest_0.6.20
## [7] R6_2.4.0 cellranger_1.1.0 backports_1.1.4
## [10] evaluate_0.14 e1071_1.7-2 httr_1.4.0
## [13] pillar_1.4.2 rlang_0.4.0 lazyeval_0.2.2
## [16] curl_3.3 uuid_0.1-2 rstudioapi_0.10
## [19] rmarkdown_1.13 labeling_0.3 rgdal_1.4-4
## [22] foreign_0.8-71 munsell_0.5.0 broom_0.5.2
## [25] compiler_3.6.0 modelr_0.1.4 xfun_0.8
## [28] pkgconfig_2.0.2 htmltools_0.3.6 tidyselect_0.2.5
## [31] gridExtra_2.3 crayon_1.3.4 withr_2.1.2
## [34] rappdirs_0.3.1 grid_3.6.0 nlme_3.1-139
## [37] jsonlite_1.6 gtable_0.3.0 DBI_1.0.0
## [40] magrittr_1.5 units_0.6-3 scales_1.0.0
## [43] KernSmooth_2.23-15 cli_1.1.0 stringi_1.4.3
## [46] sp_1.3-1 xml2_1.2.0 ellipsis_0.2.0.1
## [49] generics_0.0.2 glue_1.3.1 hms_0.4.2
## [52] yaml_2.2.0 colorspace_1.4-1 maptools_0.9-5
## [55] classInt_0.3-3 rvest_0.3.4 knitr_1.23
## [58] haven_2.1.1
You can find the source code for this analysis and others at my hocodata public code repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.