Maryland Median Household Income Over Time

Introduction

In this document I look at historical trends for Maryland median household income in relation to nearby jurisdictions.

For those readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the the tutorial “Getting started with the Tidyverse”.

Setup and data preparation

Libraries

I use the tidyverse package of functions for general data manipulation, the readxl package (part of the Tidyverse, but not loaded by default) to read Excel files, and the knitr package to produce printed tables. I also use the tools package to get the md5sum function.

library(tidyverse)
library(readxl)
library(knitr)
library(tools)

Data sources

I use data from the following U.S. Census Bureau Excel spreadsheets:

h08.xls: Median household income by state, 1984-2017.
h08a.xls: 2-Year average of median household income by state, 1985-2017.
h08b.xls: 3-Year average of median household income by state, 1986-2017.

I check to make sure that the versions of the files being used in this analysis are identical to the versions of the files I originally downloaded. I do this by comparing the MD5 checksums of the files against MD5 values I previously computed, and stopping execution if they do not match.

stopifnot(md5sum("h08.xls") == "179175185733538c170127ef638a2c78")
stopifnot(md5sum("h08a.xls") == "fe390b065810e9ba42c52d1db797e723")
stopifnot(md5sum("h08b.xls") == "b742ec83105bc3e8fa351b17469be642")

Reading in and preparing the data

I use the read_xls function to read in the Excel spreadsheets, selecting only those cells that contain the data of interest. I use the col_types parameter to further skip over columns not used in this analysis and to provide the data types for those that are used. I also provide a set of new column names.

As read in, the data has one row for each jurisdiction (U.S., state, etc.) and one column for each year. To simplify plotting and other data manipulation I convert the data into so-called “tidy” format, where each row contains the income value for a given year and jurisdiction.

I do this using the gather(Year, Income, -Jurisdiction) function. For each row I take all columns except Jurisdiction, put the value in each column into a new variable Income, and convert the column name to a new variable Year. I then convert the Year values from character strings to integers.

ct <- c("text",
        rep(c("numeric", "skip"), 4),
        "skip", "skip",
        rep(c("numeric", "skip"), 30))

mhi_current <- read_xls("h08.xls",
                        range = "A7:BS58",
                        col_names = c("Jurisdiction", 2017:1984),
                        col_types = ct) %>%
  gather(Year, Income, -Jurisdiction) %>%
  mutate(Year = as.integer(Year))

mhi_constant <- read_xls("h08.xls",
                         range = "A62:BS113",
                         col_names = c("Jurisdiction", 2017:1984),
                         col_types = ct) %>%
  gather(Year, Income, -Jurisdiction) %>%
  mutate(Year = as.integer(Year))

mhi_2ya <- read_xls("h08a.xls",
                    range = "A6:BM57",
                    col_names = c("Jurisdiction", 2017:2015, 2013:1985),
                    col_types = c("text",
                                  rep(c("numeric", "skip"), 32))) %>%
  gather(Year, Income, -Jurisdiction) %>%
  mutate(Year = as.integer(Year))

mhi_3ya <- read_xls("h08b.xls",
                    range = "A6:BK57",
                    col_names = c("Jurisdiction", 2017:2015, 2013:1986),
                    col_types = c("text",
                                  rep(c("numeric", "skip"), 31))) %>%
  gather(Year, Income, -Jurisdiction) %>%
  mutate(Year = as.integer(Year))

Analysis

I do analyses to answer the following two questions:

How does Maryland median household income compare to the median household income for the U.S. as a whole, as well as to the median household incomes for the neighboring jurisdictions of Virginia and the District of Columbia?
What are the states with the highest median household income?

Maryland median household income historical trends

I use ggplot and associated functions to plot a series of graphs showing estimated Maryland median household income in relation to income estimates for D.C. and Virginia, as well as in relation to the United States overall.

The first graph is for 1-year estimates expressed in current dollars (i.e., in dollars as of each year in question, without adjusting for inflation). I create the graph as follows:

I start with the mhi_current table.
I filter out all rows except for those for the jurisdictions of interest.
I use ggplot to create the plot, using geom_line to plot a line for each of the groups of data points (corresponding to the individual jurisdictions). I use color to distinguish between the lines for the different jurisdictions.
I specify an x-axis running from 1980 to 2020, with tick marks every five years.
I add labels for the x- and y-axes, along with a plot title, subtitle, and caption.
I use the theme_minimal theme for a clean look, and then tweak it slightly for readability, displaying the x-axis tick mark labels at an angle, moving the x- and y-axis labels slightly away from the tick mark labels, and moving the caption slightly lower.

cbbPalette <- c("#56B4E9", "#009E73", "#000000", "#E69F00",
                "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

jurisdictions <-  c("Maryland", "D.C.", "Virginia", "United States")

mhi_current %>%
  filter(Jurisdiction %in% jurisdictions) %>%
  ggplot(aes(x = Year, y = Income, color = Jurisdiction)) +
  geom_line() +
  scale_color_manual(values = cbbPalette, breaks = jurisdictions) +
  scale_x_continuous(breaks=seq(1980, 2020, 5)) +
  scale_y_continuous(labels = scales::dollar) +
  ylab("Median Household Income") +
  labs(title="Median Household Income for Maryland vs. Other Jurisdictions",
       subtitle="Estimates in Current Dollars (Not Inflation-Adjusted)",
       caption="Data source: U.S. Census Bureau, Historical Income Tables: Households, Table H-8") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15)))

Maryland has consistently had the highest median household incomes of these jurisdictions, but the District of Columbia has caught up in recent years.

I repeat the same plot, but using inflation-adjusted values.

mhi_constant %>%
  filter(Jurisdiction %in% jurisdictions) %>%
  ggplot(aes(x = Year, y = Income, color = Jurisdiction)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1980, 2020, 5)) +
  scale_y_continuous(labels = scales::dollar) +
  scale_color_manual(values = cbbPalette, breaks = jurisdictions) +
  ylab("Median Household Income") +
  labs(title="Median Household Income for Maryland vs. Other Jurisdictions",
       subtitle="Estimates in 2017 Dollars (Inflation-Adjusted)",
       caption="Data source: U.S. Census Bureau, Historical Income Tables: Households, Table H-8") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15)))

Except for the District of Columbia, median household income has not increased very much in real terms over time: less than $10,000 in 2017 dollars over 30 years, or about $300 per year—-significantly less than 1% per year.

I repeat the previous plot, but this time using 2-year averages.

mhi_2ya %>%
  filter(Jurisdiction %in% jurisdictions) %>%
  ggplot(aes(x = Year, y = Income, color = Jurisdiction)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1980, 2020, 5)) +
  scale_y_continuous(labels = scales::dollar) +
  scale_color_manual(values = cbbPalette, breaks = jurisdictions) +
  xlab("Year") +
  ylab("Median Household Income") +
  labs(title="Median Household Income for Maryland vs. Other Jurisdictions",
       subtitle="2-Year Averages, Estimates in 2017 Dollars (Inflation-Adjusted)",
       caption="Data source: U.S. Census Bureau, Historical Income Tables: Households, Table H-8A") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15)))

I repeat the previous plot, but this time using 3-year averages.

mhi_3ya %>%
  filter(Jurisdiction %in% jurisdictions) %>%
  ggplot(aes(x = Year, y = Income, color = Jurisdiction)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1980, 2020, 5)) +
  scale_y_continuous(breaks = seq(50000, 80000, 10000),
                     labels = scales::dollar) +
  scale_color_manual(values = cbbPalette, breaks = jurisdictions) +
  xlab("Year") +
  ylab("Median Household Income") +
  labs(title="Median Household Income for Maryland vs. Other Jurisdictions",
       subtitle="3-Year Averages, Estimates in 2017 Dollars (Inflation-Adjusted)",
       caption="Data source: U.S. Census Bureau, Historical Income Tables: Households, Table H-8B") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15)))

Finally, I plot median household incomes for the jurisdictions of interest as a percentage of the U.S. median household incomes, using the 3-year averages.

mhi_relative <- mhi_3ya %>%
  spread(Jurisdiction, Income) %>%
  mutate_at(2:52, function(x) 100 * x / .[["United States"]]) %>%
  gather(Jurisdiction, Income, -Year)
  
mhi_relative %>%
  filter(Jurisdiction %in% jurisdictions) %>%
  ggplot(aes(x = Year, y = Income, color = Jurisdiction)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1980, 2020, 5)) +
  scale_y_continuous() +
  scale_color_manual(values = cbbPalette, breaks = jurisdictions) +
  ylab("Median Household Income") +
  ylab("% of U.S. Median Household Income") +
  labs(title="Relative Median Household Income for Maryland and Other Jurisdictions",
       subtitle="3-Year Averages as Percentage of U.S. Median Household Income",
       caption="Data source: U.S. Census Bureau, Historical Income Tables: Households, Table H-8B") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(margin=margin(t=10))) +
  theme(axis.title.y=element_text(margin=margin(r=10))) +
  theme(plot.caption=element_text(margin=margin(t=15)))

Maryland median household income has been about 130% or more of U.S. median household income in the past few years, while Virginia median household income has been about 120% or more.

In contrast the District of Columbia has undergone a startling change from the 1990s, when its median household income was less than 90% of U.S. median household income, to the present day, when its median household income rivals that of Maryland.

Jurisdictions with the highest median household incomes

I next display a list of jurisdictions with the highest estimated median household incomes in 2017, as follows:

I start with the mh_current table.
I filter for data from 2017.
I remove the Year variable (since I no longer need it).
I arrange the resulting table in descending order by the variable Income.
I take the first 20 jurisdictions in the list.

top_20 <- mhi_current %>%
  filter(Year == 2017) %>%
  select(-Year) %>%
  arrange(desc(Income)) %>%
  top_n(20)

## Selecting by Income

I then use the kable function to print the resulting table.

Jurisdictions with the Highest Median Household Incomes in 2017
Jurisdiction	Income
D.C.	83382
Maryland	81084
Washington	75418
New Hampshire	74801
Colorado	74172
Hawaii	73575
Massachusetts	73227
New Jersey	72997
Connecticut	72780
Alaska	72231
Minnesota	71920
Utah	71319
Virginia	71293
California	69759
Rhode Island	66390
Oregon	64610
Illinois	64609
Vermont	63805
Iowa	63481
Wisconsin	63451

Maryland is the state with the highest median household income, but as noted above for 2017 it is slightly outranked by the District of Columbia. Virginia is some ways doen the list.

Appendix

Caveats

All values for median household income are estimates based on survey samples, with associated margins of error. For the 1-year Maryland estimates the associated standard errors are a few thousand dollars. For the multi-year averages the standard errors are smaller.

In 2013 the Census Bureau did two separate surveys using different subsamples. See footnotes 38 and 39 in the Historical Income Table Footnotes for a discussion of the differences. For this analysis I chose the sets of values collected according to footnote 38.

References

National and state data for 1984 through 2017 are from the U.S. Census Bureau “Historical Income Tables: Households”, Table H-8, “Median Household Income by State”. This table includes income values in both constant (2017) dollars and current dollars (i.e., current as of the year in question).

Note that there are two alternative sources for this data: Table H-8A, “Median Household Income by State - 2 Year Average”, and Table H-8B, “Median Household Income by State - 3 Year Average” These tables are in constant 2017 dollars only.

I downloaded all spreadsheets directly from the U.S. Census Bureau web site and did not make any changes to them.

Suggestions for future work

The plots above could be extended to include additional jurisdictions for comparison.

The analysis of the jurisdictions with the highest median household incomes could be extended to include a trend plot showing how the relative rankings of Maryland, D.C., Virginia, and other jurisdictions have changed over time.

Environment

I used the following R environment in doing the analysis above:

sessionInfo()

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] tools     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] knitr_1.22      readxl_1.3.1    forcats_0.4.0   stringr_1.4.0  
##  [5] dplyr_0.8.0.1   purrr_0.3.2     readr_1.3.1     tidyr_0.8.3    
##  [9] tibble_2.1.1    ggplot2_3.1.1   tidyverse_1.2.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1       highr_0.8        cellranger_1.1.0 pillar_1.3.1    
##  [5] compiler_3.6.0   plyr_1.8.4       digest_0.6.18    lubridate_1.7.4 
##  [9] jsonlite_1.6     evaluate_0.13    nlme_3.1-139     gtable_0.3.0    
## [13] lattice_0.20-38  pkgconfig_2.0.2  rlang_0.3.4      cli_1.1.0       
## [17] rstudioapi_0.10  yaml_2.2.0       haven_2.1.0      xfun_0.6        
## [21] withr_2.1.2      xml2_1.2.0       httr_1.4.0       hms_0.4.2       
## [25] generics_0.0.2   grid_3.6.0       tidyselect_0.2.5 glue_1.3.1      
## [29] R6_2.4.0         rematch_1.0.1    rmarkdown_1.12   modelr_0.1.4    
## [33] magrittr_1.5     backports_1.1.4  scales_1.0.0     htmltools_0.3.6 
## [37] rvest_0.3.3      assertthat_0.2.1 colorspace_1.4-1 labeling_0.3    
## [41] stringi_1.4.3    lazyeval_0.2.2   munsell_0.5.0    broom_0.5.2     
## [45] crayon_1.3.4

Source code

You can find the source code for this analysis and others at my hocodata public code repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.