Introduction

In this document I do some basic analyses on median household income of Howard County, Maryland over time compared to other local jurisdictions.

For those readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the the tutorial “Getting started with the Tidyverse”.

Setup and data preparation

Libraries

I use the tidyverse package of functions for general data manipulation, the readxl package (part of the Tidyverse, but not loaded by default) to read Excel spreadsheets, the knitr package to create an inline table using the kable function, and the tools package to get the md5sum function.

library("tidyverse")
library("readxl")
library("knitr")
library("tools")

Data sources

I use data from the following sources; see the References section below for more information:

  • ACS_05_EST_B19013.csv. Median household income 1-year estimates for the U.S., all states, and all counties (and county-equivalent geographies) for 2005. NOTE: The 1-year estimates do not include counties and county-equivalents with populations less than 65,000 people. Only 5-year estimates are available for these geographies.
  • ACS_06_EST_B19013.csv. Same data for 2006.
  • ACS_07_1YR_B19013.csv through ACS_17_1YR_B19013.csv. Same data for 2007 through 2017.
  • allitems.xlsx. Consumer Price Index data used to adjust for inflation.

I check to make sure that the versions of the files being used in this analysis are identical to the versions of the files I originally downloaded. I do this by comparing the MD5 checksums of the files against MD5 values I previously computed, and stopping execution if they do not match.

stopifnot(md5sum("ACS_05_EST_B19013.csv") == "83d2f4e4d941b2df9bbc600b1c581b9a")
stopifnot(md5sum("ACS_06_EST_B19013.csv") == "9814594aa476d72fe34db653f4b812ab")
stopifnot(md5sum("ACS_07_1YR_B19013.csv") == "d7786fa929289a2ab1aef3faaf65bb11")
stopifnot(md5sum("ACS_08_1YR_B19013.csv") == "ee2b141db5b59c6b66b75ca386244e3e")
stopifnot(md5sum("ACS_09_1YR_B19013.csv") == "675e1b729f43ee93f5cc58222b6fe6db")
stopifnot(md5sum("ACS_10_1YR_B19013.csv") == "415bc328bf0f869170b163a73e3ead1e")
stopifnot(md5sum("ACS_11_1YR_B19013.csv") == "de90254b51a547765dd3e4f3575fd51d")
stopifnot(md5sum("ACS_12_1YR_B19013.csv") == "0b0e3ff4f5a6717a9f6e7d8dbc0a95d4")
stopifnot(md5sum("ACS_13_1YR_B19013.csv") == "be67896e52ca070347a3a598ea896de2")
stopifnot(md5sum("ACS_14_1YR_B19013.csv") == "2ba769f819a0fb695a092d2d4f514580")
stopifnot(md5sum("ACS_15_1YR_B19013.csv") == "8ab9bf16870d6c401559afad5bb4506b")
stopifnot(md5sum("ACS_16_1YR_B19013.csv") == "e51169bd90138554447fe81a97f0390f")
stopifnot(md5sum("ACS_17_1YR_B19013.csv") == "a164336c7addb3eecfba09abb1dc2264")
stopifnot(md5sum("allitems.xlsx") == "f890a4aff139ca003246d17491d51272")

Reading in and preparing the data

The CSV files all have the same format: two lines of header information followed by one row for each geography, containing the following variables:

  • Geographic ID for the geography. This is a 9-character value for the U.S. as a whole, an 11-character value for each state, and a 14-character value for each county or county-equivalent.
  • Secondary geographic ID for the geography. This is null for the U.S. as a whole, a two-digit value for each state, and a five-digit value (padded with leading zeroes) for each county or county-equivalent.
  • The estimated median household income for the geography.
  • The margin of error for the median household income estimate.

For the analysis below I want all the data in one big table mhi_tbl, with an additional field for the year value. I do this as follows:

  1. I create a list of all the CSV filenames as mhi_files, and a corresponding list mhi_years of the years for those files.
  2. I create an empty table mhi_tbl with five variables corresponding to the five columns of each CSV file–geo_id, geo_id2, geography, mhi, and moe–as well as an additional variable year. To do this I take advantage of the fact that the read_csv function can take its input from a string literal, in this case a newline-terminated blank line.
  3. I then use an index variable to loop through each year and file. For each year and file I do the following:
    1. Read in the CSV file for that year, skipping the first two lines, assigning each column’s value to one of the first five variables in the mhi_tbl table.
    2. Modify the table just read in to add a new variable year set to the year value corresponding to the CSV file.
    3. Add the rows of the table thus created to the mhi_tbl table.
  4. In the previous step note that I used conventional function composition rather than the pipe syntax in order to add new rows to the end of mhi_tbl rather than the beginning.
  5. Some of the geography names include non-ASCII characters, which would cause problems later in this analysis when trying to use geography names as variables. Also, the name of Dona Ana County in New Mexico is spelled inconsistently in the various CSV files. I therefore define a function fix_nonascii_geography which will take a value of the geo_id variable and for the problematic geographies will return a modified name. (For all other geographies it will simply return the name modified.)
  6. Having defined the fix_nonascii_geography function, I use it to change the value of the geography variable in the mhi_tbl table. Note that this is a vector operation, so the fix_nonascii_geography function has to be able to take a vector as an argument. That’s why it uses the ifelse function, which can work on vectors.
mhi_files <- c("ACS_05_EST_B19013.csv",
               "ACS_06_EST_B19013.csv",
               "ACS_07_1YR_B19013.csv",
               "ACS_08_1YR_B19013.csv",
               "ACS_09_1YR_B19013.csv",
               "ACS_10_1YR_B19013.csv",
               "ACS_11_1YR_B19013.csv",
               "ACS_12_1YR_B19013.csv",
               "ACS_13_1YR_B19013.csv",
               "ACS_14_1YR_B19013.csv",
               "ACS_15_1YR_B19013.csv",
               "ACS_16_1YR_B19013.csv",
               "ACS_17_1YR_B19013.csv")

mhi_years <- 2005:2017

mhi_cols <- c("geo_id", "geo_id2", "geography", "mhi", "moe")
mhi_tbl <- read_csv("\n", skip = 2, col_names = c(mhi_cols, "year"), col_types = "cccddd")

for (i in seq_along(mhi_years)) {
  mhi_tbl <- bind_rows(mhi_tbl,
                       mutate(year = mhi_years[[i]],
                              read_csv(mhi_files[[i]],
                                       skip = 2,
                                       col_names = mhi_cols,
                                       col_types = "cccdd")))
}

# For geographies with non-ASCII or missing names, return a usable name.
fix_nonascii_geography <- function(g, j) {
  ifelse(g == "0500000US72021", "Bayamon Municipio, Puerto Rico",
  ifelse(g == "0500000US72097", "Mayaguez Municipio, Puerto Rico",
  ifelse(g == "0500000US35013", "Dona Ana County, New Mexico",
  j
  )))
}

mhi_tbl <- mhi_tbl %>%
  mutate(geography = fix_nonascii_geography(geo_id, geography))

The CSV files above contain 1-year estimates that are expressed in dollars as of the year in question. (For example, the 1-year estimates for 2010 are in 2010 dollars.) In order to adjust values for inflation I read in an Excel spreadsheet containing consumer price index data from 1978 to 2018. The index values are expressed as a percentage of the 1997 index (set at 100).

The spreadsheet has two sheets, each of which has values for all twelve months of the year plus an average figure for the entire year. The sheets differ in whether or not the monthly figures are seasonally-adjusted. Since I am doing adjustments for entire years I set col_types to skip over the monthly values, and read only the first sheet (which is not seasonally-adjusted).

cpi_tbl <- read_excel("allitems.xlsx",
                      sheet = 1,
                      skip = 8,
                      col_names = c("year", "index"),
                      col_types = c("numeric", rep("skip", 12), "numeric"))

I want to express all income values as 2017 dollars. I therefore take the index value for 2017 and divide all other index values by the 2017 index value to create a new table adj_tbl. The variable adj in this table can be multiplied by income values for each year to express them in 2017 dollars.

index_2017 <- cpi_tbl %>%
  filter(year == 2017) %>%
  select(index) %>%
  as.integer()

adj_tbl <- cpi_tbl %>%
  mutate(adj = index_2017 / index) %>%
  select(year, adj)

Analysis

I do analyses to answer the following questions:

Howard County median household income

For my first graph I do a simple line plot of how the median household income for Howard County and selected other geographies have changed over time. I don’t attempt to adjust the income values for inflation, so they are in current dollars for the year they were estimated.

I create the graph as follows:

  1. I start with the mhi_tbl table.
  2. I filter out all rows except for those for the geographies of interest.
  3. I use ggplot to create the plot, using geom_line to plot a line for each of the groups of data points (corresponding to the individual geographies). I use color to distinguish between the lines for the different geographies, with a special “colorblind-friendly” palette designed to be more readable by people with various forms of color blindess. (See the comments in the code for more information on how the color assignment is done.)
  4. I specify an x-axis running from 2004 to 2018, with tick marks every two years.
  5. I add a label for the y-axis (the x-axis label is taken from the variable Year), along with a plot title, subtitle, and caption.
  6. I use the theme_minimal theme for a clean look, and then tweak it slightly for readability, displaying the x-axis tick mark labels at an angle, moving the x- and y-axis labels slightly away from the tick mark labels, moving the caption slightly lower, and removing the title on the graph legend.
# Note: The order of geographies in this list determines the order
# in which they are listed in the graph legends. To improve readability
# I list the geographies in order from highest to lowest median
# household income in 2017 (the last year of data graphed).
geographies <- c("Loudoun County, Virginia",
                 "Fairfax County, Virginia",
                 "Stafford County, Virginia",
                 "Howard County, Maryland",
                 "Montgomery County, Maryland",
                 "Anne Arundel County, Maryland",
                 "District of Columbia",
                 "United States",
                 "Baltimore city, Maryland")

# Note: Colors are assigned to geographies based on their
# alphabetical order. To improve readability I assign the colors
# black to Howard County and gray to the United States.
cbPalette <- c("#E69F00",  # Anne Arundel County, Maryland
               "#56B4E9",  # Baltimore city, Maryland
               "#009E73",  # District of Columbia
               "#F0E442",  # Fairfax County, Virginia
               "#000000",  # Howard County, Maryland
               "#0072B2",  # Loudoun County, Virginia
               "#D55E00",  # Montgomery County, Maryland
               "#CC79A7",  # Stafford County, Virginia
               "#999999")  # United States

mhi_tbl %>%
  filter(geography %in% geographies) %>%
  ggplot(aes(x = year, y = mhi, color = geography)) +
  geom_line(size = 0.8) +
  scale_color_manual(values = cbPalette, breaks = geographies) +
  scale_x_continuous(breaks=seq(2004, 2018, 2)) +
  scale_y_continuous(labels = scales::dollar) +
  xlab("Year") +
  ylab("Median Household Income") +
  labs(title="Median Household Income for Howard County vs. Other Jurisdictions",
       subtitle="1-Year Estimates in Current Dollars (Not Inflation-Adjusted)",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B19013") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

For my next graph I repeat the same plot, but add error bars to show the margins of error for the median household income estimates. I use position_dodge() so that bars for two different geographies in the same year don’t display right on top of each other.

mhi_tbl %>%
  filter(geography %in% geographies) %>%
  ggplot() +
  geom_line(aes(x = year, y = mhi, color = geography), size = 0.8) +
  geom_errorbar(aes(x = year, ymin = mhi - moe, ymax = mhi + moe, color = geography), width=.2, position = position_dodge(.2)) +
  scale_color_manual(values = cbPalette, breaks = geographies) +
  scale_x_continuous(breaks = seq(2004, 2018, 2)) +
  scale_y_continuous(labels = scales::dollar) +
  xlab("Year") +
  ylab("Median Household Income") +
  labs(title="Median Household Income for Howard County vs. Other Jurisdictions",
       subtitle="1-Year Estimates in Current Dollars (Not Inflation-Adjusted) Including Margins of Error (90%)",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B19013") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

I then repeat the first plot, this time using constant 2017 dollars. The adjustment is done as follows, creating a new table mhi_adj_tbl:

  1. Again I start with the mhi_tbl table.
  2. I then join the resulting table with the adj_tbl table containing inflation adjustment figures for each year, using year as the common variable. This has the effect of adding a new column containing the variable adj. This variable has the value 1 for the year 2017, and for other years contains the adjustment factor needed to convert incomes into 2017 dollars.
  3. I modify the mhi and moe variables to convert them to 2017 dollars.

I then plot the graph as before, first filtering for the geographies of interest.

mhi_adj_tbl <- mhi_tbl %>%
  inner_join(adj_tbl, by = "year") %>%
  mutate(mhi = mhi * adj, moe = moe * adj)

mhi_adj_tbl %>%
  filter(geography %in% geographies) %>%
  ggplot(aes(x = year, y = mhi, color = geography)) +
  geom_line(size = 0.8) +
  scale_color_manual(values = cbPalette, breaks = geographies) +
  scale_x_continuous(breaks=seq(2004, 2018, 2)) +
  scale_y_continuous(labels = scales::dollar) +
  xlab("Year") +
  ylab("Median Household Income") +
  labs(title="Median Household Income for Howard County vs. Other Jurisdictions",
       subtitle="1-Year Estimates in 2017 Dollars",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B19013") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

I then redo the graph to show error bars:

mhi_adj_tbl %>%
  filter(geography %in% geographies) %>%
  ggplot() +
  geom_line(aes(x = year, y = mhi, color = geography), size = 0.8) +
  geom_errorbar(aes(x = year, ymin = mhi - moe, ymax = mhi + moe, color = geography), width=.2, position = position_dodge(.3)) +
  scale_color_manual(values = cbPalette, breaks = geographies) +
  scale_x_continuous(breaks = seq(2004, 2018, 2)) +
  scale_y_continuous(labels = scales::dollar) +
  xlab("Year") +
  ylab("Median Household Income") +
  labs(title="Median Household Income for Howard County vs. Other Jurisdictions",
       subtitle="1-Year Estimates in 2017 Dollars Including Margins of Error (90%)",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B19013") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

Next I plot median household incomes for the geographies of interest as a percentage of the overall U.S. median household income. I do this as follows:

  1. I start with the mhi_tbl table. (Because I am comparing incomes relative to one another, it does not matter whether I use inflation-adjusted values or not.)
  2. I retain only the year, geography, and mhi variables.
  3. I use spread to change the table to make each geography into a separate column, with the column value being the mhi value. This produces a table with one row for each year.
  4. I create an anonymous function that will divide a column’s value (the median household income) by the value for the column corresponding to the United States overall and then multiply by 100 to create a pecentage. I then apply that function to each column other than the first (which holds the year value). (Note that the . in ncol(.) references the (temporary) table against which mutate_at is executed.)
  5. I then use gather to reverse the effect of spread and create a new table mhi_rel_tbl that has a similar form to mhi_tbl, with each row representing a combination of year, geography, and (relative) median household income.
  6. Finally I use ggplot and related functions to create the actual graph for the geographies of interest.
mhi_rel_tbl <- mhi_tbl %>%
  select(year, geography, mhi) %>%
  spread(geography, mhi) %>%
  mutate_at(2:ncol(.), function(x) 100 * x / .[["United States"]]) %>%
  gather(geography, mhi, -year)

mhi_rel_tbl %>%
  filter(geography %in% geographies) %>%
  ggplot(aes(x = year, y = mhi, color = geography)) +
  geom_line(size = 0.8) +
  scale_color_manual(values = cbPalette, breaks = geographies) +
  scale_x_continuous(breaks = seq(1980, 2020, 5)) +
  scale_y_continuous() +
  xlab("Year") +
  ylab("% of U.S. Median Household Income") +
  labs(title="Relative Median Household Income for Howard County, Maryland",
       subtitle="1-Year Estimates as Percentage of U.S. Median Household Income",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B19013") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

Howard County ranking over time

To see how the ranking of Howard County versus other large counties (with populations over 65,000) has changed over time, I first compute rankings for each year as follows:

  1. I start with the mhi_tbl table created above.
  2. I filter for only counties and equivalent geographies (e.g., Baltimore City) using the first two characters of the geo_id variable.
  3. I group the rows by year, and then use themin_rankfunction to compute a new variableranking` representing the relative rank of each county or county-equivalent for that year, with rank 1 being the county with highest median household income.
mhi_rank_tbl <- mhi_tbl %>%
  filter(str_sub(geo_id, 1, 2) == "05") %>%
  group_by(year) %>%
  mutate(ranking = min_rank(desc(mhi)))

I then graph the rankings for Howard County and selected other counties in Maryland and Virginia. Because the list of geographies in the graph is different from that used in the graphs above, I modify the graph’s palette to ensure that the mapping of colors to geographies remains the same as in the other graphs.

# Note: As before, the order of this list determines the order that
# geographies are listed in the legend, from top to bottom.
geographies2 <- c("Loudoun County, Virginia",
                  "Fairfax County, Virginia",
                  "Stafford County, Virginia",
                  "Howard County, Maryland",
                  "Montgomery County, Maryland",
                  "Anne Arundel County, Maryland")

# Note: As below, colors are assigned to geographies based on their
# alphabetical order. Since some geographies have been removed from
# the graph I use a modified palette to maintain the same mapping of
# color to geography as in previous graphs.
cbPalette2 <- c("#E69F00",  # Anne Arundel County, Maryland
                "#F0E442",  # Fairfax County, Virginia
                "#000000",  # Howard County, Maryland
                "#0072B2",  # Loudoun County, Virginia
                "#D55E00",  # Montgomery County, Maryland
                "#CC79A7")  # Stafford County, Virginia

mhi_rank_tbl %>%
  filter(geography %in% geographies2) %>%
  ggplot(aes(x = year, y = ranking, color = geography)) +
  geom_line(size = 0.8) +
  scale_x_continuous(breaks=seq(2004, 2018, 2)) +
  scale_color_manual(values = cbPalette2, breaks = geographies2) +
  scale_y_reverse(breaks = seq(0, 40, 5)) +
  xlab("Year") +
  ylab("Rank") +
  labs(title="Ranking Howard County, Maryland vs. Other Jurisdictions",
       subtitle="Based on 1-Year Estimates of U.S. Median Household Income",
       caption="Data source: U.S. Census Bureau, American Community Survey, Table B19013") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15))) +
  theme(legend.title = element_blank())

Finally, to compare Howard County with other affluent counties from across the U.S., I create a table top_30_tbl of the highest-ranking counties (or county-equivalent geographies) by median household income in 2017. (Again, this is restricted to counties or county-equivalents with ppopulations over 65,000 that are represented in the ACS 1-year estimates.)

I do this as follows:

  1. I start with the mhi_rank_tbl table created above, which contains rankings grouped by year.
  2. I remove the grouping by year. (This makes it possible to remove the year variable from the table being created.)
  3. I retain only the rank values, the geography names, and the median household income values.
  4. I sort the resulting list in ascending order by rank.
  5. I retain the first 30 rows, representing those county-level geographies ranked from 1 to 30.
top_30_tbl <- mhi_rank_tbl %>%
  ungroup() %>%
  filter(year == 2017) %>%
  select(ranking, geography, mhi) %>%
  rename(Rank = ranking, Jurisdiction = geography, `Median Household Income` = mhi) %>%
  arrange(Rank) %>%
  top_n(30)
## Selecting by Median Household Income

Finally I print top_30_tbl as a formatted table.

Counties with the Highest Median Household Incomes in 2017
Rank Jurisdiction Median Household Income
1 Loudoun County, Virginia 135842
2 Santa Clara County, California 119035
3 Fairfax County, Virginia 118279
4 Arlington County, Virginia 117237
5 San Mateo County, California 116653
6 Morris County, New Jersey 114732
7 Marin County, California 113908
8 Stafford County, Virginia 112795
9 Somerset County, New Jersey 112153
10 Hunterdon County, New Jersey 112038
11 Douglas County, Colorado 111482
12 Howard County, Maryland 111473
13 San Francisco County, California 110816
14 Nassau County, New York 108133
15 Delaware County, Ohio 106933
16 Williamson County, Tennessee 105622
17 Montgomery County, Maryland 103235
18 Forsyth County, Georgia 102084
19 Carver County, Minnesota 101271
20 Prince William County, Virginia 100845
21 Norfolk County, Massachusetts 100829
22 Calvert County, Maryland 100590
23 Alexandria city, Virginia 100530
24 Putnam County, New York 99479
25 Fauquier County, Virginia 98585
26 Middlesex County, Massachusetts 98555
27 Scott County, Minnesota 98538
28 Monmouth County, New Jersey 98270
29 Charles County, Maryland 97986
30 Anne Arundel County, Maryland 97085

Appendix

Caveats

All values for median household income are estimates based on survey samples, with associated margins of error. For county-level estimates the associated standard errors are a few thousand dollars, as can be seen in the graph above showing error bars.

As noted above, rankings reflect only larger counties and county-equivalents (with populations over 65,000), since smaller counties and county-equivalents are not represented in the ACS 1-year estimates.

References

To obtain the CSV files used in this analysis I used the American Factfinder site maintained by the U.S. Census Bureau. More specifically, I did the following:

  1. I selected the “Advanced Search” option and clicked on “Show me all”.
  2. On the resulting page I specified the value of the “topic or table name” field as “B19013”, the ACS table for median household income, and clicked “Go”.
  3. From the resulting list I clicked on the link for “2017 ACS 1-year estimates”. This displays the median household income estimate for the United States as a whole (the default geography).
  4. I then clicked the “Add Geographies” button to add estimates for all the geographies of interest:
    1. I first selected “United States - 010” as the geographic type and “United States” as the geography, and then clicked “Add to Your Selections”.
    2. I next selected “State - 040” as the geographic type and “All States within United States and Puerto Rico” as the geographies, and again clicked “Add to Your Selections”.
    3. Finally I repeated the process with “County - 050” as the geographic type and “All Counties within United States and Puerto Rico” as the geographies.
  5. Having selected all the geographies of interest, I clicked on “Show Table” to show the table of estimates and associated margins of error.
  6. The table as displayed has columns for each geography. Because I wanted a row for each geography, I clicked on “Modify Table” and then on “Transpose Rows/Columns”.
  7. I then clicked on “Download”, selected the option “Use the data (e.g., in a spreadsheet or database)”, checked only the box “Include descriptive data element names?” (leaving “Merge the annotations and data into a single file?” unchecked) and then clicked “OK”.
  8. After the file had been prepared I clicked ”Download” to download the data. This resulted in downloading a file “ACS_17_1YR_B19013.zip”.
  9. Uncompressing the file produced a directory “ACS_17_1YR_B19013” with several files, including the file “ACS_17_1YR_B19013.csv” containing the actual data of interest.
  10. I then went back and under the section “Versions of this table are available for the following years” I clicked on “2016”. This displayed the median household income for 2016 for all the geographies selected, again with one column per geography. I repeated the process of modifying the table to transpose rows and columns, downloading the data, and uncompressing the zipped file to extract the file “ACS_17_1YR_B19013.csv”.
  11. I repeated the process for each of the other years for which data was available, from 2015 back to 2005.

To adjust for inflation I used the CPI-U-RS data produced by the Bureau of Labor Statistics, specifically the “Updated CPI-U-RS, All items, 1977-2018” Excel spreadsheet, as recommended by the U.S. Census Bureau’s guide to comparing 2017 ACS data to prior years.

Suggestions for others

The plots above could be extended to include additional geographies for comparison. However this would require modifying the color palette used for the line plots, since the colorblind-friendly palette used above has only nine colors, including gray and black.

Environment

I used the following R environment in doing the analysis above:

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## attached base packages:
## [1] tools     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] knitr_1.22      readxl_1.3.1    forcats_0.4.0   stringr_1.4.0  
##  [5] dplyr_0.8.1     purrr_0.3.2     readr_1.3.1     tidyr_0.8.3    
##  [9] tibble_2.1.1    ggplot2_3.1.1   tidyverse_1.2.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1       highr_0.8        cellranger_1.1.0 pillar_1.4.0    
##  [5] compiler_3.6.0   plyr_1.8.4       digest_0.6.18    lubridate_1.7.4 
##  [9] jsonlite_1.6     evaluate_0.13    nlme_3.1-139     gtable_0.3.0    
## [13] lattice_0.20-38  pkgconfig_2.0.2  rlang_0.3.4      cli_1.1.0       
## [17] rstudioapi_0.10  yaml_2.2.0       haven_2.1.0      xfun_0.7        
## [21] withr_2.1.2      xml2_1.2.0       httr_1.4.0       hms_0.4.2       
## [25] generics_0.0.2   grid_3.6.0       tidyselect_0.2.5 glue_1.3.1      
## [29] R6_2.4.0         rmarkdown_1.12   modelr_0.1.4     magrittr_1.5    
## [33] backports_1.1.4  scales_1.0.0     htmltools_0.3.6  rvest_0.3.4     
## [37] assertthat_0.2.1 colorspace_1.4-1 labeling_0.3     stringi_1.4.3   
## [41] lazyeval_0.2.2   munsell_0.5.0    broom_0.5.2      crayon_1.3.4

Source code

You can find the source code for this analysis and others at my hocodata public code repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.