Introduction

In this document I do some basic analyses of the median home value of Howard County, Maryland, over time compared to other local jurisdictions.

For those readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the tutorial “Getting started with the Tidyverse”.

Setup and data preparation

Libraries

I use the tidyverse package of functions for general data manipulation, the readxl package (part of the Tidyverse, but not loaded by default) to read Excel spreadsheets, the knitr package to create an inline table using the kable function, and the tools package to get the md5sum function.

library(tidyverse)
library(readxl)
library(knitr)
library(tools)

Data sources

I use data from the following sources; see the References section below for more information:

  • ACS_05_EST_B25077.csv. Median home value 1-year estimates for the U.S., all states, and all counties (and county-equivalent geographies) for 2005. NOTE: The 1-year estimates do not include counties and county-equivalents with populations less than 65,000 people. Only 5-year estimates are available for these geographies.
  • ACS_06_EST_B25077.csv. Same data for 2006.
  • ACS_07_1YR_B25077.csv through ACS_17_1YR_B25077.csv. Same data for 2007 through 2017.
  • allitems.xlsx. Consumer Price Index data used to adjust for inflation.

I check to make sure that the versions of the files being used in this analysis are identical to the versions of the files I originally downloaded. I do this by comparing the MD5 checksums of the files against MD5 values I previously computed, and stopping execution if they do not match.

stopifnot(md5sum("ACS_05_EST_B25077.csv") == "7d141bdf84cfc135c13b7f1f2596dd99")
stopifnot(md5sum("ACS_06_EST_B25077.csv") == "e7bfed4461f0465bd6304ccb8c54d8d1")
stopifnot(md5sum("ACS_07_1YR_B25077.csv") == "502bf4a5e888c0c8ab6419b4351cb625")
stopifnot(md5sum("ACS_08_1YR_B25077.csv") == "c08a4f17dafe9ddcb098d2d9db31dced")
stopifnot(md5sum("ACS_09_1YR_B25077.csv") == "f14bdf2b8858d03eff079f3e971671d3")
stopifnot(md5sum("ACS_10_1YR_B25077.csv") == "8ff7010b27a4986e0eb16361fba3b663")
stopifnot(md5sum("ACS_11_1YR_B25077.csv") == "97179ebdf6f09418122590d706ff3643")
stopifnot(md5sum("ACS_12_1YR_B25077.csv") == "29d50a0e7a72450e1b2e62515bb3d559")
stopifnot(md5sum("ACS_13_1YR_B25077.csv") == "008b66afb78aeda87603b3c9b28452c5")
stopifnot(md5sum("ACS_14_1YR_B25077.csv") == "e3fef3007c4c8a1c0033134803333e44")
stopifnot(md5sum("ACS_15_1YR_B25077.csv") == "f77325671151497a47951f94895d2742")
stopifnot(md5sum("ACS_16_1YR_B25077.csv") == "06e739edfd15805a907fb51d88ee47ee")
stopifnot(md5sum("ACS_17_1YR_B25077.csv") == "ba9c68f840e15403ab1dde87a32fdee1")
stopifnot(md5sum("allitems.xlsx") == "f890a4aff139ca003246d17491d51272")

Reading in and preparing the data

The CSV files all have the same format: two lines of header information followed by one row for each geography, containing the following variables:

  • Geographic ID for the geography. This is a 9-character value for the U.S. as a whole, an 11-character value for each state, and a 14-character value for each county or county-equivalent.
  • Secondary geographic ID for the geography. This is null for the U.S. as a whole, a two-digit value for each state, and a five-digit value (padded with leading zeroes) for each county or county-equivalent.
  • The estimated median home value for the geography.
  • The margin of error for the median home value estimate.

For the analysis below I want all the data in one big table mhv_tbl, with an additional field for the year value. I do this as follows:

  1. I create a list of all the CSV filenames as mhv_files, and a corresponding list mhv_years of the years for those files.
  2. I create an empty table mhv_tbl with five variables corresponding to the five columns of each CSV file (geo_id, geo_id2, geography, mhv, and moe), as well as an additional variable year. To do this I take advantage of the fact that the read_csv function can take its input from a string literal, in this case a newline-terminated blank line.
  3. I then use an index variable to loop through each year and file. For each year and file I do the following:
    1. Read in the CSV file for that year, skipping the first two lines, assigning each column’s value to one of the first five variables in the mhv_tbl table.
    2. Modify the table just read in to add a new variable year set to the year value corresponding to the CSV file.
    3. Add the rows of the table thus created to the mhv_tbl table.
  4. Note that in the previous step I used conventional function composition rather than the pipe syntax, in order to add new rows to the end of mhv_tbl rather than the beginning.
  5. Some of the geography names include non-ASCII characters, which would cause problems later in this analysis when trying to use geography names as variables. I therefore use the chartr() function to translate the non-ASCII characters into their ASCII equivalents.
mhv_files <- c("ACS_05_EST_B25077.csv",
               "ACS_06_EST_B25077.csv",
               "ACS_07_1YR_B25077.csv",
               "ACS_08_1YR_B25077.csv",
               "ACS_09_1YR_B25077.csv",
               "ACS_10_1YR_B25077.csv",
               "ACS_11_1YR_B25077.csv",
               "ACS_12_1YR_B25077.csv",
               "ACS_13_1YR_B25077.csv",
               "ACS_14_1YR_B25077.csv",
               "ACS_15_1YR_B25077.csv",
               "ACS_16_1YR_B25077.csv",
               "ACS_17_1YR_B25077.csv")

mhv_years <- 2005:2017

mhv_cols <- c("geo_id", "geo_id2", "geography", "mhv", "moe")
mhv_tbl <- read_csv("\n", skip = 2, col_names = c(mhv_cols, "year"), col_types = "cccddd")

for (i in seq_along(mhv_years)) {
  mhv_tbl <- bind_rows(mhv_tbl,
                       mutate(read_csv(mhv_files[[i]],
                                       skip = 2,
                                       col_names = mhv_cols,
                                       col_types = "cccdd",
                                       locale = locale(encoding = "latin1")),
                              year = mhv_years[[i]]))
}
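As an aside, the same result can be obtained without an explicit loop. The following sketch (an alternative approach, not the method used in the rest of this document) uses purrr’s map2_dfr function to read each file, add its year, and bind the resulting tables together in order:

# Read each file, tag it with its year, and row-bind the results in order.
map2_dfr(mhv_files, mhv_years, function(f, y) {
  read_csv(f, skip = 2, col_names = mhv_cols, col_types = "cccdd",
           locale = locale(encoding = "latin1")) %>%
    mutate(year = y)
})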

# Replace non-ASCII geography names with their ASCII equivalents.
mhv_tbl <- mhv_tbl %>%
  mutate(geography = chartr("áéíóúñü", "aeiounu", geography))
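As a quick illustration of what the chartr() translation does, here is a hypothetical example (Doña Ana County is not part of this analysis; it is simply a geography whose name contains a non-ASCII character):

chartr("áéíóúñü", "aeiounu", "Doña Ana County, New Mexico")
## [1] "Dona Ana County, New Mexico"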

The CSV files above contain 1-year estimates that are expressed in dollars as of the year in question. (For example, the 1-year estimates for 2010 are in 2010 dollars.) In order to adjust values for inflation I read in an Excel spreadsheet containing consumer price index data from 1978 to 2018. The index values are expressed as a percentage of the 1977 index (set at 100).

The spreadsheet has two sheets, each of which has values for all twelve months of the year plus an average figure for the entire year. The sheets differ in whether or not the monthly figures are seasonally adjusted. Since I am doing adjustments for entire years I set col_types to skip over the monthly values, and read only the first sheet (which is not seasonally adjusted).

cpi_tbl <- read_excel("allitems.xlsx",
                      sheet = 1,
                      skip = 8,
                      col_names = c("year", "index"),
                      col_types = c("numeric", rep("skip", 12), "numeric"))

I want to express all home values as 2017 dollars. I therefore take the index value for 2017 and divide all other index values by the 2017 index value to create a new table adj_tbl. The variable adj in this table can be multiplied by home values for each year to express them in 2017 dollars.

index_2017 <- cpi_tbl %>%
  filter(year == 2017) %>%
  pull(index)

adj_tbl <- cpi_tbl %>%
  mutate(adj = index_2017 / index) %>%
  select(year, adj)
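As a quick worked example with made-up index values (these are not the actual CPI figures): if the 2017 index were 150 and the 2010 index were 130, then the adjustment factor for 2010 would be 150 / 130, or about 1.154, so a home valued at $400,000 in 2010 dollars would be worth roughly $461,500 in 2017 dollars.

adj_2010 <- 150 / 130  # hypothetical adjustment factor, about 1.154
400000 * adj_2010      # about 461538, i.e., roughly $461,500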

The adjustment is done as follows, creating a new table mhv_adj_tbl:

  1. I start with the mhv_tbl table.
  2. I then join that table with the adj_tbl table containing inflation adjustment figures for each year, using year as the common variable. This has the effect of adding a new column containing the variable adj. This variable has the value 1 for the year 2017, and for other years contains the adjustment factor needed to convert home values into 2017 dollars.
  3. I modify the mhv and moe variables to convert them to 2017 dollars.
mhv_adj_tbl <- mhv_tbl %>%
  inner_join(adj_tbl, by = "year") %>%
  mutate(mhv = mhv * adj, moe = moe * adj)
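To make the effect of the join concrete, here is a minimal sketch using toy tables (toy_mhv and toy_adj are hypothetical names, not part of the analysis):

toy_mhv <- tibble(year = c(2016, 2017), geography = "X", mhv = c(100000, 110000))
toy_adj <- tibble(year = c(2016, 2017), adj = c(1.02, 1.00))
# Each row of toy_mhv gains the adj value matching its year.
inner_join(toy_mhv, toy_adj, by = "year")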

Analysis

I do analyses to answer the following questions: how has the median home value for Howard County changed over time compared to other local jurisdictions, both in current and inflation-adjusted dollars, and how has Howard County ranked against other counties over time?

Howard County median home value

For my first graph I do a simple line plot of how the median home values for Howard County and selected other geographies have changed over time. I don’t attempt to adjust the home values for inflation, so they are in current dollars for the year they were estimated.

I create the graph as follows:

  1. I start with the mhv_tbl table.
  2. I filter out all rows except for those for the geographies of interest.
  3. I use ggplot to create the plot, using geom_line to plot a line for each of the groups of data points (corresponding to the individual geographies). I use color to distinguish between the lines for the different geographies, with a special “colorblind-friendly” palette designed to be more readable by people with various forms of color blindness. (See the comments in the code for more information on how the color assignment is done.)
  4. I specify an x-axis running from 2004 to 2018, with tick marks every two years.
  5. I add a label for the y-axis (the x-axis label is taken from the variable Year), along with a plot title, subtitle, and caption.
  6. I use the theme_minimal theme for a clean look, and then tweak it slightly for readability, displaying the x-axis tick mark labels at an angle, moving the x- and y-axis labels slightly away from the tick mark labels, moving the caption slightly lower, and removing the title on the graph legend.
# Note: The order of geographies in this list determines the order
# in which they are listed in the graph legends. To improve readability
# I list the geographies in order from highest to lowest median
# home value in 2017 (the last year of data graphed).
geographies <- c("District of Columbia",
                 "Fairfax County, Virginia",
                 "Loudoun County, Virginia",
                 "Montgomery County, Maryland",
                 "Howard County, Maryland",
                 "Anne Arundel County, Maryland",
                 "Stafford County, Virginia",
                 "United States",
                 "Baltimore city, Maryland")

# Note: Colors are assigned to geographies based on their
# alphabetical order. To improve readability I assign the colors
# black to Howard County and gray to the United States.
cbPalette <- c("#E69F00",  # Anne Arundel County, Maryland
               "#56B4E9",  # Baltimore city, Maryland
               "#009E73",  # District of Columbia
               "#F0E442",  # Fairfax County, Virginia
               "#000000",  # Howard County, Maryland
               "#0072B2",  # Loudoun County, Virginia
               "#D55E00",  # Montgomery County, Maryland
               "#CC79A7",  # Stafford County, Virginia
               "#999999")  # United States

mhv_tbl %>%
  filter(geography %in% geographies) %>%
  ggplot(aes(x = year, y = mhv, color = geography)) +
  geom_line(size = 0.8) +
  scale_color_manual(values = cbPalette, breaks = geographies) +
  scale_x_continuous(breaks=seq(2004, 2018, 2)) +
  scale_y_continuous(labels = scales::dollar) +
  xlab("Year") +
  ylab("Median Home Value") +
  labs(title="Median Home Value for Howard County vs. Other Jurisdictions",
       subtitle="1-Year Estimates in Current Dollars (Not Inflation-Adjusted)",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B25077") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

For my next graph I repeat the same plot, but add error bars to show the margins of error for the median home value estimates. I use position_dodge() so that bars for two different geographies in the same year don’t display right on top of each other.

mhv_tbl %>%
  filter(geography %in% geographies) %>%
  ggplot() +
  geom_line(aes(x = year, y = mhv, color = geography), size = 0.8) +
  geom_errorbar(aes(x = year, ymin = mhv - moe, ymax = mhv + moe, color = geography), width=.2, position = position_dodge(.2)) +
  scale_color_manual(values = cbPalette, breaks = geographies) +
  scale_x_continuous(breaks = seq(2004, 2018, 2)) +
  scale_y_continuous(labels = scales::dollar) +
  xlab("Year") +
  ylab("Median Home Value") +
  labs(title="Median Home Value for Howard County vs. Other Jurisdictions",
       subtitle="1-Year Estimates in Current Dollars (Not Inflation-Adjusted) Including Margins of Error (90%)",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B25077") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

I then repeat the first plot, this time using constant 2017 dollars. As before, I first filter for the geographies of interest and then plot the graph.

mhv_adj_tbl %>%
  filter(geography %in% geographies) %>%
  ggplot(aes(x = year, y = mhv, color = geography)) +
  geom_line(size = 0.8) +
  scale_color_manual(values = cbPalette, breaks = geographies) +
  scale_x_continuous(breaks=seq(2004, 2018, 2)) +
  scale_y_continuous(labels = scales::dollar) +
  xlab("Year") +
  ylab("Median Home Value") +
  labs(title="Median Home Value for Howard County vs. Other Jurisdictions",
       subtitle="1-Year Estimates in 2017 Dollars",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B25077") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

I then redo the graph to show error bars:

mhv_adj_tbl %>%
  filter(geography %in% geographies) %>%
  ggplot() +
  geom_line(aes(x = year, y = mhv, color = geography), size = 0.8) +
  geom_errorbar(aes(x = year, ymin = mhv - moe, ymax = mhv + moe, color = geography), width=.2, position = position_dodge(.3)) +
  scale_color_manual(values = cbPalette, breaks = geographies) +
  scale_x_continuous(breaks = seq(2004, 2018, 2)) +
  scale_y_continuous(labels = scales::dollar) +
  xlab("Year") +
  ylab("Median Home Value") +
  labs(title="Median Home Value for Howard County vs. Other Jurisdictions",
       subtitle="1-Year Estimates in 2017 Dollars Including Margins of Error (90%)",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B25077") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

Next I plot median home values for the geographies of interest as a percentage of the overall U.S. median home value. I do this as follows:

  1. I start with the mhv_tbl table. (Because I am comparing values relative to one another, it does not matter whether I use inflation-adjusted values or not.)
  2. I retain only the year, geography, and mhv variables.
  3. I use spread to change the table to make each geography into a separate column, with the column value being the mhv value. This produces a table with one row for each year.
  4. I create an anonymous function that will divide a column’s value (the median home value) by the value for the column corresponding to the United States overall and then multiply by 100 to create a percentage. I then apply that function to each column other than the first (which holds the year value). (Note that the . in ncol(.) references the (temporary) table against which mutate_at is executed.)
  5. I then use gather to reverse the effect of spread and create a new table mhv_rel_tbl that has a similar form to mhv_tbl, with each row representing a combination of year, geography, and (relative) median home value.
  6. Finally I use ggplot and related functions to create the actual graph for the geographies of interest.
mhv_rel_tbl <- mhv_tbl %>%
  select(year, geography, mhv) %>%
  spread(geography, mhv) %>%
  mutate_at(2:ncol(.), function(x) 100 * x / .[["United States"]]) %>%
  gather(geography, mhv, -year)
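To see how the spread/mutate_at/gather round trip works, here is a minimal sketch using a toy table (“Somewhere County” is hypothetical):

toy <- tibble(year = 2017,
              geography = c("Somewhere County", "United States"),
              mhv = c(300000, 200000))
toy %>%
  spread(geography, mhv) %>%
  mutate_at(2:ncol(.), function(x) 100 * x / .[["United States"]]) %>%
  gather(geography, mhv, -year)
# Somewhere County becomes 150 (150% of the U.S. value); United States becomes 100.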

mhv_rel_tbl %>%
  filter(geography %in% geographies) %>%
  ggplot(aes(x = year, y = mhv, color = geography)) +
  geom_line(size = 0.8) +
  scale_color_manual(values = cbPalette, breaks = geographies) +
  scale_x_continuous(breaks = seq(2004, 2018, 2)) +
  scale_y_continuous() +
  xlab("Year") +
  ylab("% of U.S. Median Home Value") +
  labs(title="Relative Median Home Value for Howard County, Maryland",
       subtitle="1-Year Estimates as Percentage of U.S. Median Home Value",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B25077") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

Howard County ranking over time

To see how the ranking of Howard County versus other counties has changed over time, I first compute rankings for each year as follows:

  1. I start with the mhv_tbl table created above.
  2. I filter for only counties and equivalent geographies (e.g., Baltimore City) using the first two characters of the geo_id variable.
  3. I group the rows by year, and then use the min_rank function to compute a new variable ranking representing the relative rank of each county or county-equivalent for that year, with rank 1 being the county with the highest median home value. (Note that D.C. is included here as a county-equivalent.)
mhv_rank_tbl <- mhv_tbl %>%
  filter(str_sub(geo_id, 1, 2) == "05") %>%
  group_by(year) %>%
  mutate(ranking = min_rank(desc(mhv)))
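Two quick illustrations of the functions used above. The first shows the “05” prefix on a county-level geo_id, using the 14-character identifier format described earlier (24027 is Howard County’s FIPS code); the second shows how min_rank with desc ranks a set of toy values, with ties sharing the lowest applicable rank:

str_sub("0500000US24027", 1, 2)
## [1] "05"

min_rank(desc(c(500, 300, 500, 100)))
## [1] 1 3 1 4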

I then graph the rankings for Howard County and selected other counties in Maryland and Virginia. Because the list of geographies in the graph is different from that used in the graphs above, I modify the graph’s palette to ensure that the mapping of colors to geographies remains the same as in the other graphs.

# Note: As before, the order of this list determines the order that
# geographies are listed in the legend, from top to bottom. The
# somewhat different name for D.C. is because it is included as a
# county-equivalent jurisdiction and not a state-equivalent
# jurisdiction.
geographies2 <- c("District of Columbia, District of Columbia",
                  "Fairfax County, Virginia",
                  "Loudoun County, Virginia",
                  "Montgomery County, Maryland",
                  "Howard County, Maryland",
                  "Anne Arundel County, Maryland",
                  "Stafford County, Virginia")

# Note: As above, colors are assigned to geographies based on their
# alphabetical order. Since some geographies have been removed from
# the graph I use a modified palette to maintain the same mapping of
# color to geography as in previous graphs.
cbPalette2 <- c("#E69F00",  # Anne Arundel County, Maryland
                "#009E73",  # District of Columbia
                "#F0E442",  # Fairfax County, Virginia
                "#000000",  # Howard County, Maryland
                "#0072B2",  # Loudoun County, Virginia
                "#D55E00",  # Montgomery County, Maryland
                "#CC79A7")  # Stafford County, Virginia

mhv_rank_tbl %>%
  filter(geography %in% geographies2) %>%
  ggplot(aes(x = year, y = ranking, color = geography)) +
  geom_line(size = 0.8) +
  scale_x_continuous(breaks=seq(2004, 2018, 2)) +
  scale_color_manual(values = cbPalette2, breaks = geographies2) +
  scale_y_reverse(breaks = seq(0, 100, 5)) +
  xlab("Year") +
  ylab("Rank") +
  labs(title="Ranking Howard County, Maryland vs. Other Jurisdictions",
       subtitle="Based on 1-Year Estimates of U.S. Median Home Value",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B25077") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

Finally, to compare Howard County with other affluent counties from across the U.S., I create a table top_50_tbl of the highest-ranking counties (or county-equivalent geographies) by median home value in 2017. (Again, this is restricted to counties or county-equivalents with populations over 65,000 that are represented in the ACS 1-year estimates.)

I do this as follows:

  1. I start with the mhv_rank_tbl table created above, which contains rankings grouped by year.
  2. I remove the grouping by year. (This makes it possible to drop the year variable from the table being created.)
  3. I retain only the rows for 2017.
  4. I retain only the rank values, the geography names, and the median home values, renaming the variables for presentation.
  5. I sort the resulting list in ascending order by rank.
  6. I retain the first 50 rows, representing those county-level geographies ranked from 1 to 50. (Since top_n is not given an explicit variable to select by, it defaults to the last column, Median Home Value, as the message below notes.)
top_50_tbl <- mhv_rank_tbl %>%
  ungroup() %>%
  filter(year == 2017) %>%
  select(ranking, geography, mhv) %>%
  rename(Rank = ranking, Jurisdiction = geography, `Median Home Value` = mhv) %>%
  arrange(Rank) %>%
  top_n(50)
## Selecting by Median Home Value

Finally I print top_50_tbl as a formatted table.
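The call that produces it is not shown in the rendered output; something along the following lines (the exact options are an assumption) would generate it using the kable function mentioned above:

kable(top_50_tbl, caption = "Counties with the Highest Median Home Values in 2017")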

Counties with the Highest Median Home Values in 2017
Rank  Jurisdiction  Median Home Value ($)
1 San Francisco County, California 1104100
2 San Mateo County, California 1087100
3 Marin County, California 1014000
4 New York County, New York 976100
5 Santa Clara County, California 971300
6 Alameda County, California 781000
7 Santa Cruz County, California 775000
8 Arlington County, Virginia 735100
9 Kings County, New York 701800
10 Honolulu County, Hawaii 680200
11 Orange County, California 679400
12 Contra Costa County, California 642300
13 Maui County, Hawaii 641500
14 Napa County, California 628500
15 Sonoma County, California 628400
16 District of Columbia, District of Columbia 607200
17 Ventura County, California 592500
18 Los Angeles County, California 588700
19 San Luis Obispo County, California 574400
20 Alexandria city, Virginia 573200
21 Santa Barbara County, California 567600
22 Fairfax County, Virginia 567000
23 San Diego County, California 563800
24 King County, Washington 563600
25 Kauai County, Hawaii 560600
26 Westchester County, New York 547800
27 Queens County, New York 545800
28 Monterey County, California 517200
29 Monroe County, Florida 513700
30 Middlesex County, Massachusetts 511700
31 Loudoun County, Virginia 508800
32 Boulder County, Colorado 504800
33 Richmond County, New York 501300
34 Suffolk County, Massachusetts 497000
35 Nassau County, New York 489900
36 Montgomery County, Maryland 477800
37 Norfolk County, Massachusetts 473400
38 El Dorado County, California 472100
39 Bergen County, New Jersey 470300
40 Douglas County, Colorado 468900
41 Placer County, California 468400
42 Morris County, New Jersey 462400
43 Williamson County, Tennessee 451400
44 Howard County, Maryland 444500
45 Rockland County, New York 439100
46 Nevada County, California 437200
47 Yolo County, California 430400
48 Fairfield County, Connecticut 425900
49 Somerset County, New Jersey 424900
50 Monmouth County, New Jersey 421700

Appendix

Caveats

All values for median home value are estimates based on survey samples, with associated margins of error. For county-level estimates the margins of error are typically a few thousand dollars, as can be seen in the graphs above showing error bars.

As noted above, rankings reflect only larger counties and county-equivalents (with populations over 65,000), since smaller counties and county-equivalents are not represented in the ACS 1-year estimates.

Using inflation-adjusted values for median home values is somewhat suspect, since housing costs are themselves a major component of the Consumer Price Index, including the CPI-U-RS values used in this analysis.

References

To obtain the CSV files used in this analysis I used the American FactFinder site maintained by the U.S. Census Bureau. More specifically, I did the following:

  1. I selected the “Advanced Search” option and clicked on “Show me all”.
  2. On the resulting page I specified the value of the “topic or table name” field as “B25077”, the ACS table for median home value, and clicked “Go”.
  3. From the resulting list I clicked on the link for “2017 ACS 1-year estimates”. This displays the median home value estimate for the United States as a whole (the default geography).
  4. I then clicked the “Add Geographies” button to add estimates for all the geographies of interest:
    1. I first selected “United States - 010” as the geographic type and “United States” as the geography, and then clicked “Add to Your Selections”.
    2. I next selected “State - 040” as the geographic type and “All States within United States and Puerto Rico” as the geographies, and again clicked “Add to Your Selections”.
    3. Finally I repeated the process with “County - 050” as the geographic type and “All Counties within United States and Puerto Rico” as the geographies.
  5. Having selected all the geographies of interest, I clicked on “Show Table” to show the table of estimates and associated margins of error.
  6. The table as displayed has columns for each geography. Because I wanted a row for each geography, I clicked on “Modify Table” and then on “Transpose Rows/Columns”.
  7. I then clicked on “Download”, selected the option “Use the data (e.g., in a spreadsheet or database)”, checked only the box “Include descriptive data element names?” (leaving “Merge the annotations and data into a single file?” unchecked) and then clicked “OK”.
  8. After the file had been prepared I clicked “Download” to download the data. This resulted in downloading a file “ACS_17_1YR_B25077.zip”.
  9. Uncompressing the file produced a directory “ACS_17_1YR_B25077” with several files, including the file “ACS_17_1YR_B25077.csv” containing the actual data of interest.
  10. I then went back and under the section “Versions of this table are available for the following years” I clicked on “2016”. This displayed the median home value for 2016 for all the geographies selected, again with one column per geography. I repeated the process of modifying the table to transpose rows and columns, downloading the data, and uncompressing the zipped file to extract the file “ACS_16_1YR_B25077.csv”.
  11. I repeated the process for each of the other years for which data was available, from 2015 back to 2005.

To adjust for inflation I used the CPI-U-RS data produced by the Bureau of Labor Statistics, specifically the “Updated CPI-U-RS, All items, 1977-2018” Excel spreadsheet, as recommended by the U.S. Census Bureau’s guide to comparing 2017 ACS data to prior years.

Suggestions for others

The plots above could be extended to include additional geographies for comparison. However this would require modifying the color palette used for the line plots, since the colorblind-friendly palette used above has only nine colors, including gray and black.

Environment

I used the following R environment in doing the analysis above:

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## attached base packages:
## [1] tools     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] knitr_1.23      readxl_1.3.1    forcats_0.4.0   stringr_1.4.0  
##  [5] dplyr_0.8.1     purrr_0.3.2     readr_1.3.1     tidyr_0.8.3    
##  [9] tibble_2.1.3    ggplot2_3.1.1   tidyverse_1.2.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1       highr_0.8        cellranger_1.1.0 pillar_1.4.1    
##  [5] compiler_3.6.0   plyr_1.8.4       digest_0.6.19    lubridate_1.7.4 
##  [9] jsonlite_1.6     evaluate_0.14    nlme_3.1-139     gtable_0.3.0    
## [13] lattice_0.20-38  pkgconfig_2.0.2  rlang_0.3.4      cli_1.1.0       
## [17] rstudioapi_0.10  yaml_2.2.0       haven_2.1.0      xfun_0.7        
## [21] withr_2.1.2      xml2_1.2.0       httr_1.4.0       hms_0.4.2       
## [25] generics_0.0.2   grid_3.6.0       tidyselect_0.2.5 glue_1.3.1      
## [29] R6_2.4.0         rmarkdown_1.13   modelr_0.1.4     magrittr_1.5    
## [33] backports_1.1.4  scales_1.0.0     htmltools_0.3.6  rvest_0.3.4     
## [37] assertthat_0.2.1 colorspace_1.4-1 labeling_0.3     stringi_1.4.3   
## [41] lazyeval_0.2.2   munsell_0.5.0    broom_0.5.2      crayon_1.3.4

Source code

You can find the source code for this analysis and others at my hocodata public code repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.