Introduction

In this document I do some basic analyses on median home value of Maryland counties over time.

For those readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the the tutorial “Getting started with the Tidyverse”.

Setup and data preparation

Libraries

I use the following packages for the following purposes:

  • tidyverse: do general data manipulation.
  • tidycensus: retrieve U.S. census data.
  • tigris: retrieve U.S. census shapefiles.
  • sf: manipulate geospatial data.
  • readxl: to read Excel spreadsheets.
  • viridis: obtain special color palettes for graphs.
  • tools: compute MD5 checksums.
library(tidyverse)
library(tidycensus)
library(tigris)
library(sf)
library(readxl)
library(viridis)
library(tools)

The tidycensus and tigris packages require a Census API key. I previously obtained such a key and stored it for use by the functions included in the tidycensus and tigris packages.

Data sources

I use data from the following sources; see the References section below for more information:

  • Median home value estimates are from the American Community Survey table B25077, and are accessed via the tidycensus get_acs() function.
  • County boundaries are from the U.S. Census Bureau cartographic boundaries data, and are accessed via the tigris counties() function.
  • allitems.xlsx. Consumer Price Index data used to adjust for inflation.

I check to make sure that the versions of the files being used in this analysis are identical to the versions of the files I originally downloaded. I do this by comparing the MD5 checksums of the files against MD5 values I previously computed, and stopping execution if they do not match.

stopifnot(md5sum("allitems.xlsx") == "f890a4aff139ca003246d17491d51272")

Reading in and preparing the data

I first use the tidycensus get_acs() function to retrieve median home value estimates from the 2010 and 2017 ACS 5-year estimates. This function returns data in tidy format, with the following columns:

  • GEOID. A 5-character geographic ID combining the 2-digit FIPS code for Maryland (“24”) with the 3-digit FIPS code for the individual county or county-equivalent within Maryland.
  • NAME. The name of the county or county-equivalent.
  • variable. References the ACS variable being retrieved (which we here name mhv).
  • estimate. The ACS 5-year estimate for the variable’s value.
  • moe. The margin of error for the estimate.

I take the table returned by get_acs() and add a new variable year containing the year of the estimate.

mhv_2010 <- get_acs(
  geography = "county",
  state = "MD",
  year = 2010,
  survey = "acs5",
  variables = c(mhv = "B25077_001")
) %>%
  mutate(year = 2010)
mhv_2017 <- get_acs(
  geography = "county",
  state = "MD",
  year = 2017,
  survey = "acs5",
  variables = c(mhv = "B25077_001")
) %>%
  mutate(year = 2017)

The ACS 5-year estimates are expressed in dollars as of the year in question. For example, the 5-year estimates for 2010 are in 2010 dollars. In order to adjust values for inflation I read in an Excel spreadsheet containing consumer price index data from 1978 to 2018. The index values are expressed as a percentage of the 1997 index (set at 100).

(Note that adjusting home values for inflation is a somewhat problematic exercise since housing costs are themselves a major component of inflation. However I do it here for consistency with other analyses in this series.)

The spreadsheet has two sheets, each of which has values for all twelve months of the year plus an average figure for the entire year. The sheets differ in whether or not the monthly figures are seasonally-adjusted. Since I am doing adjustments for entire years I set col_types to skip over the monthly values, and read only the first sheet (which is not seasonally-adjusted).

cpi_tbl <- read_excel("allitems.xlsx",
  sheet = 1,
  skip = 8,
  col_names = c("year", "index"),
  col_types = c("numeric", rep("skip", 12), "numeric")
)

I want to express all home values as 2017 dollars. I therefore take the index value for 2017 and divide all other index values by the 2017 index value to create a new table adj_tbl. The variable adj in this table can be multiplied by home values for each year to express them in 2017 dollars.

index_2017 <- cpi_tbl %>%
  filter(year == 2017) %>%
  select(index) %>%
  as.integer()

adj_tbl <- cpi_tbl %>%
  mutate(adj = index_2017 / index) %>%
  select(year, adj)

The adjustment is done as follows, creating a new table mhv_2010_adj:

  1. I start with the mhv_2010 table.
  2. I then join that table with the adj_tbl table containing inflation adjustment figures for each year, using year as the common variable. This has the effect of adding a new column containing the variable adj. This variable has the value 1 for the year 2017, and for other years contains the adjustment factor needed to convert home values into 2017 dollars.
  3. I modify the estimate and moe variables to convert them to 2017 dollars.
mhv_2010_adj <- mhv_2010 %>%
  inner_join(adj_tbl, by = "year") %>%
  mutate(estimate = estimate * adj, moe = moe * adj)

In order to produce maps of the median home value data I need geometry of the cartographic boundaries for all 24 Maryland counties and county-equivalents (i.e., Baltimore city). I use the tigris function counties() to return the cartographic boundaries in the form of a table compatible with the sf package. This table has a variable GEOID that is identical to the variable of the same name returned by get_acs().

Since it can take some time to retrieve the boundary data, I cache the returned data.

options(tigris_use_cache = TRUE)
county_sf <- counties(state = "MD", cb = TRUE, class = "sf", progress_bar = FALSE)

Analysis

I do analyses to answer the following questions:

Maryland median home value by county

Median home value estimates for smaller Maryland counties (with populations under 65,000) are available only in the American Community Survey 5-year estimates. This means, for example, that the 2010 median home value estimates published for these counties actually reflect surveys in the years 2006-2010, the 2011 estimates reflect surveys in the years 2007-2011, and so on.

For this reason the U.S. Census Bureau recommends not comparing 5-year estimates from overlapping sets of years, for example comparing 2011 5-year estimates to 2010 5-year estimates. In order to look at changes over the maximum period of time I therefore choose to compare median home values between the 2010 and 2017 5-year estimates, representing home values over the timeframes 2006-2010 and 2013-2017.

I first do a bar chart showing the (inflation-adjusted) 2010 ACS 5-year estimates for median home value for each county, arranging the bars in order of decreasing home value from left to right. I do this as follows:

  1. I start with the table mhv_2010_adj from above.
  2. I shorten the names of the counties by removing “, Maryland” from the end of the name, as well as the word “County” if present.
  3. I change NAME into a categorical variable (or “factor” in R-speak) and order the variable values (the county names) so that they are in order of decreasing median home value.
  4. I plot the data with county on the x-axis and median home value on the y-axis.
  5. I use geom_col to plot one bar for each county, with the height of the bar corresponding to the county’s median home value.
  6. I put tick marks on the y-axis corresponding to dollar values for every $100,000 interval from zero to $600,000.
  7. I customize the x-axis and y-axis labels, title, and caption as desired. Among other things, I rotate the county names by 60 degrees to make them more readable.
mhv_2010_adj %>%
  mutate(NAME = sub("( County)?, Maryland$", "", NAME)) %>%
  mutate(NAME = fct_reorder(NAME, -estimate)) %>%
  ggplot() +
  geom_col(aes(x = NAME, y = estimate)) +
  scale_y_continuous(labels = scales::dollar, breaks = seq(0, 600000, 100000), limits = c(0, 600000)) +
  xlab("County") +
  ylab("Home Value") +
  labs(
    title = "Maryland Median Home Value by County",
    subtitle = "2010 ACS 5-Year Estimates in 2017 Dollars",
    caption = "Data source:\n  U.S. Census Bureau, American Community Survey, Table B25077\nCreated using the tidyverse and tidycensus R packages"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

To show another view of the same data I also map the counties, coloring each county according to its median home value in the (inflation-adjusted) 2010 5-year estimates. I do this as follows:

  1. I start with the table county_sf containing the geometry for the county boundaries.
  2. I then join that table with the table mhv_2010_adj, using the common variable GEOID.
  3. I plot the resulting table, using the value of the estimate variable to determine the fill color for each county.
  4. I use geom_sf() to draw the county physical boundaries in white.
  5. I use scale_fill_viridis() to specify the colors for each county along a predefined spectrum of color values, and to create a legend.
  6. I add a title, subtitle, and caption for the map.
  7. Finally I blank out the axis tick marks and axis labels (which would normally show latitude and longitude) and make the background of the graph blank for a cleaner look.
county_sf %>%
  inner_join(mhv_2010_adj, by = "GEOID") %>%
  ggplot() +
  geom_sf(aes(fill = estimate, geometry = geometry), color = "white", size = 0.4) +
  scale_fill_viridis(option = "plasma", name = "Home Value", labels = scales::dollar, limits = c(100000, 600000)) +
  labs(
    title = "Maryland Median Home Value by County",
    subtitle = "2010 ACS 5-Year Estimates in 2017 Dollars",
    caption = "Data sources:\n  U.S. Census Bureau, American Community Survey, Table B25077\n  U.S. Census Bureau, Cartographic boundaries\nCreated using the tidyverse and tidycensus R packages"
  ) +
  theme(plot.caption = element_text(hjust = 0)) +
  theme(axis.ticks = element_blank(), axis.text = element_blank()) +
  theme(panel.background = element_blank())

I next repeat this analysis for the 2017 ACS 5-year estimates, starting with a bar chart. (The estimates are already in 2017 dollars, so they do not need to be adjusted for inflation.) The code is almost identical to that above, except for changing the year where needed.

mhv_2017 %>%
  mutate(NAME = sub("( County)?, Maryland$", "", NAME)) %>%
  mutate(NAME = fct_reorder(NAME, -estimate)) %>%
  ggplot(aes(x = NAME, y = estimate)) +
  geom_col() +
  scale_y_continuous(labels = scales::dollar, breaks = seq(0, 600000, 100000), limits = c(0, 600000)) +
  xlab("County") +
  ylab("Home Value") +
  labs(
    title = "Maryland Median Home Value by County",
    subtitle = "2017 ACS 5-Year Estimates in 2017 Dollars",
    caption = "Data source:\n  U.S. Census Bureau, American Community Survey, Table B25077\nCreated using the tidyverse and tidycensus R packages"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

As with the bar chart for the 2017 ACS 5-year estimates, the code for the map for the 2017 ACS 5-year estimates is almost identical to the code for the 2010 map.

county_sf %>%
  inner_join(mhv_2017, by = "GEOID") %>%
  ggplot(aes(fill = estimate, geometry = geometry)) +
  geom_sf(color = "white", size = 0.4) +
  scale_fill_viridis(option = "plasma", name = "Home Value", labels = scales::dollar, limits = c(100000, 600000)) +
  labs(
    title = "Maryland Median Home Value by County",
    subtitle = "2017 ACS 5-Year Estimates in 2017 Dollars",
    caption = "Data sources:\n  U.S. Census Bureau, American Community Survey, Table B25077\n  U.S. Census Bureau, Cartographic boundaries\nCreated using the tidyverse and tidycensus R packages"
  ) +
  theme(plot.caption = element_text(hjust = 0)) +
  theme(axis.ticks = element_blank(), axis.text = element_blank()) +
  theme(panel.background = element_blank())

At first glance it looks as if (inflation-adjusted) home values declined between the 2006-2010 and 2013-2017 timeframes. In the next section I look more closely at the changes between the two timeframes.

Maryland median home value changes by county

I next want to look at the median home value for each Maryland county has changed over time. More specifically, I investigate how the median home values for the various counties changed from 2010 to 2017—or, more correctly, from the 2006-2010 timeframe to the 2013-2017 timeframe.

I create the bar chart as follows:

  1. I start with the table mhv_2010_adj containing 2010 median home value estimates expressed in 2017 dollars.
  2. I extend the table with rows from the table mhv_2017 containing 2017 median home value estimates.
  3. I retain only the variables of interest: NAME, GEOID, year, and estimate.
  4. I modify the table so that the median home value estimates for each county are reformatted into two columns, one for each year.
  5. I create a new variable rel_change containing the relative increase or decrease in median home value from 2010 to 2017, expressed as a fraction (not a percentage) of the 2010 value.
  6. I retain only the GEOID, NAME, and rel_change variables.
  7. I modify the county names, convert them into factors, and reorder them in order of decreasing change.
  8. I save the resulting table as mhv_change so that I can use it for both a bar chart and a map.
mhv_change <- mhv_2010_adj %>%
  bind_rows(mhv_2017) %>%
  select(NAME, GEOID, year, estimate) %>%
  spread(year, estimate) %>%
  mutate(rel_change = (`2017` - `2010`) / `2010`) %>%
  select(NAME, GEOID, rel_change) %>%
  mutate(NAME = sub("( County)?, Maryland$", "", NAME)) %>%
  mutate(NAME = fct_reorder(NAME, -rel_change))

mhv_change %>%
  ggplot(aes(x = NAME, y = rel_change)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1), breaks = seq(-0.30, 0.05, 0.05)) +
  xlab("County") +
  ylab("Change") +
  labs(
    title = "Maryland Median Home Value Changes by County",
    subtitle = "2017 vs. 2010 ACS 5-Year Estimates in 2017 Dollars",
    caption = "Data source:\n  U.S. Census Bureau, American Community Survey, Table B25077\nCreated using the tidyverse and tidycensus R packages"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

All Maryland counties had declining median home values over the timeframe in question, with the rural areas of Allegany and Garrett counties having the least decline, and the suburban D.C. jurisdictions of Prince George’s and Charles counties having the most. Overall Howard County fared relatively well among all counties.

I also create a map showing the same percentage changes, in a similar manner to the maps above.

county_sf %>%
  inner_join(mhv_change, by = "GEOID") %>%
  ggplot(aes(fill = rel_change, geometry = geometry)) +
  geom_sf(color = "white", size = 0.4) +
  scale_fill_viridis(option = "plasma", name = "Change", labels = scales::percent_format(accuracy = 1, trim = FALSE), limits = c(-0.30, 0.0)) +
  labs(
    title = "Maryland Median Home Value Changes by County",
    subtitle = "2017 vs. 2010 ACS 5-Year Estimates in 2017 Dollars",
    caption = "Data sources:\n  U.S. Census Bureau, American Community Survey, Table B19013\n  U.S. Census Bureau, Cartographic boundaries\nCreated using the tidyverse and tidycensus R packages"
  ) +
  theme(plot.caption = element_text(hjust = 0)) +
  theme(axis.ticks = element_blank(), axis.text = element_blank()) +
  theme(panel.background = element_blank())

Again Allegany, Garrett, Montgomery, and Howard counties show as having the least declines in median home values.

Appendix

Caveats

All median home values are estimates based on survey samples, with associated margins of error. For estimates at the county level the associated standard errors can be a few thousand dollars.

As noted above, 5-year estimates identified as being for a certain year actually reflect surveys conducted over the prior five years.

References

To obtain the ACS data I used the tidycensus package, which provides an easy-to-use interface to U.S. Census Bureau data made available via a set of public APIs. As its name suggests, the tidycensus package is designed to be compatible with the tidyverse approach to representing and manipulating data.

To obtain the county boundaries I used the tigris package, which is designed to accompany the tidycensus package and provides access to U.S. Census Bureau cartographic boundaries and TIGER/Line shapefiles.

To adjust for inflation I used the CPI-U-RS data produced by the Bureau of Labor Statistics, specifically the “Updated CPI-U-RS, All items, 1977-2018” Excel spreadsheet, as recommended by the U.S. Census Bureau’s guide to comparing 2017 ACS data to prior years.

Suggestions for others

When the 2018 ACS 5-year estimates are available the graphs and maps above could be re-done using 2018 data.

Environment

I used the following R environment in doing the analysis above:

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## attached base packages:
## [1] tools     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] viridis_0.5.1     viridisLite_0.3.0 readxl_1.3.1     
##  [4] sf_0.7-5          tigris_0.8.2      tidycensus_0.9.2 
##  [7] forcats_0.4.0     stringr_1.4.0     dplyr_0.8.1      
## [10] purrr_0.3.2       readr_1.3.1       tidyr_0.8.3      
## [13] tibble_2.1.3      ggplot2_3.2.0     tidyverse_1.2.1  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1         lubridate_1.7.4    lattice_0.20-38   
##  [4] class_7.3-15       assertthat_0.2.1   digest_0.6.20     
##  [7] R6_2.4.0           cellranger_1.1.0   backports_1.1.4   
## [10] evaluate_0.14      e1071_1.7-2        httr_1.4.0        
## [13] pillar_1.4.2       rlang_0.4.0        lazyeval_0.2.2    
## [16] curl_3.3           uuid_0.1-2         rstudioapi_0.10   
## [19] rmarkdown_1.13     labeling_0.3       rgdal_1.4-4       
## [22] foreign_0.8-71     munsell_0.5.0      broom_0.5.2       
## [25] compiler_3.6.0     modelr_0.1.4       xfun_0.8          
## [28] pkgconfig_2.0.2    htmltools_0.3.6    tidyselect_0.2.5  
## [31] gridExtra_2.3      crayon_1.3.4       withr_2.1.2       
## [34] rappdirs_0.3.1     grid_3.6.0         nlme_3.1-139      
## [37] jsonlite_1.6       gtable_0.3.0       DBI_1.0.0         
## [40] magrittr_1.5       units_0.6-3        scales_1.0.0      
## [43] KernSmooth_2.23-15 cli_1.1.0          stringi_1.4.3     
## [46] sp_1.3-1           xml2_1.2.0         ellipsis_0.2.0.1  
## [49] generics_0.0.2     glue_1.3.1         hms_0.4.2         
## [52] yaml_2.2.0         colorspace_1.4-1   maptools_0.9-5    
## [55] classInt_0.3-3     rvest_0.3.4        knitr_1.23        
## [58] haven_2.1.1

Source code

You can find the source code for this analysis and others at my hocodata public code repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.