In this document I look at historical trends for Maryland median household income in relation to nearby jurisdictions.
For those readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the the tutorial “Getting started with the Tidyverse”.
I use the tidyverse package of functions for general data manipulation, the readxl package (part of the Tidyverse, but not loaded by default) to read Excel files, and the knitr package to produce printed tables. I also use the tools package to get the md5sum
function.
library(tidyverse)
library(readxl)
library(knitr)
library(tools)
I use data from the following U.S. Census Bureau Excel spreadsheets:
h08.xls
: Median household income by state, 1984-2017.h08a.xls
: 2-Year average of median household income by state, 1985-2017.h08b.xls
: 3-Year average of median household income by state, 1986-2017.I check to make sure that the versions of the files being used in this analysis are identical to the versions of the files I originally downloaded. I do this by comparing the MD5 checksums of the files against MD5 values I previously computed, and stopping execution if they do not match.
stopifnot(md5sum("h08.xls") == "179175185733538c170127ef638a2c78")
stopifnot(md5sum("h08a.xls") == "fe390b065810e9ba42c52d1db797e723")
stopifnot(md5sum("h08b.xls") == "b742ec83105bc3e8fa351b17469be642")
I use the read_xls
function to read in the Excel spreadsheets, selecting only those cells that contain the data of interest. I use the col_types
parameter to further skip over columns not used in this analysis and to provide the data types for those that are used. I also provide a set of new column names.
As read in, the data has one row for each jurisdiction (U.S., state, etc.) and one column for each year. To simplify plotting and other data manipulation I convert the data into so-called “tidy” format, where each row contains the income value for a given year and jurisdiction.
I do this using the gather(Year, Income, -Jurisdiction)
function. For each row I take all columns except Jurisdiction
, put the value in each column into a new variable Income
, and convert the column name to a new variable Year
. I then convert the Year
values from character strings to integers.
ct <- c("text",
rep(c("numeric", "skip"), 4),
"skip", "skip",
rep(c("numeric", "skip"), 30))
mhi_current <- read_xls("h08.xls",
range = "A7:BS58",
col_names = c("Jurisdiction", 2017:1984),
col_types = ct) %>%
gather(Year, Income, -Jurisdiction) %>%
mutate(Year = as.integer(Year))
mhi_constant <- read_xls("h08.xls",
range = "A62:BS113",
col_names = c("Jurisdiction", 2017:1984),
col_types = ct) %>%
gather(Year, Income, -Jurisdiction) %>%
mutate(Year = as.integer(Year))
mhi_2ya <- read_xls("h08a.xls",
range = "A6:BM57",
col_names = c("Jurisdiction", 2017:2015, 2013:1985),
col_types = c("text",
rep(c("numeric", "skip"), 32))) %>%
gather(Year, Income, -Jurisdiction) %>%
mutate(Year = as.integer(Year))
mhi_3ya <- read_xls("h08b.xls",
range = "A6:BK57",
col_names = c("Jurisdiction", 2017:2015, 2013:1986),
col_types = c("text",
rep(c("numeric", "skip"), 31))) %>%
gather(Year, Income, -Jurisdiction) %>%
mutate(Year = as.integer(Year))
I do analyses to answer the following two questions:
I use ggplot
and associated functions to plot a series of graphs showing estimated Maryland median household income in relation to income estimates for D.C. and Virginia, as well as in relation to the United States overall.
The first graph is for 1-year estimates expressed in current dollars (i.e., in dollars as of each year in question, without adjusting for inflation). I create the graph as follows:
mhi_current
table.ggplot
to create the plot, using geom_line
to plot a line for each of the groups of data points (corresponding to the individual jurisdictions). I use color to distinguish between the lines for the different jurisdictions.theme_minimal
theme for a clean look, and then tweak it slightly for readability, displaying the x-axis tick mark labels at an angle, moving the x- and y-axis labels slightly away from the tick mark labels, and moving the caption slightly lower.cbbPalette <- c("#56B4E9", "#009E73", "#000000", "#E69F00",
"#F0E442", "#0072B2", "#D55E00", "#CC79A7")
jurisdictions <- c("Maryland", "D.C.", "Virginia", "United States")
mhi_current %>%
filter(Jurisdiction %in% jurisdictions) %>%
ggplot(aes(x = Year, y = Income, color = Jurisdiction)) +
geom_line() +
scale_color_manual(values = cbbPalette, breaks = jurisdictions) +
scale_x_continuous(breaks=seq(1980, 2020, 5)) +
scale_y_continuous(labels = scales::dollar) +
ylab("Median Household Income") +
labs(title="Median Household Income for Maryland vs. Other Jurisdictions",
subtitle="Estimates in Current Dollars (Not Inflation-Adjusted)",
caption="Data source: U.S. Census Bureau, Historical Income Tables: Households, Table H-8") +
theme_minimal() +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
theme(axis.title.x=element_text(margin=margin(t=10))) +
theme(axis.title.y=element_text(margin=margin(r=10))) +
theme(plot.caption=element_text(margin=margin(t=15)))
Maryland has consistently had the highest median household incomes of these jurisdictions, but the District of Columbia has caught up in recent years.
I repeat the same plot, but using inflation-adjusted values.
mhi_constant %>%
filter(Jurisdiction %in% jurisdictions) %>%
ggplot(aes(x = Year, y = Income, color = Jurisdiction)) +
geom_line() +
scale_x_continuous(breaks = seq(1980, 2020, 5)) +
scale_y_continuous(labels = scales::dollar) +
scale_color_manual(values = cbbPalette, breaks = jurisdictions) +
ylab("Median Household Income") +
labs(title="Median Household Income for Maryland vs. Other Jurisdictions",
subtitle="Estimates in 2017 Dollars (Inflation-Adjusted)",
caption="Data source: U.S. Census Bureau, Historical Income Tables: Households, Table H-8") +
theme_minimal() +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
theme(axis.title.x=element_text(margin=margin(t=10))) +
theme(axis.title.y=element_text(margin=margin(r=10))) +
theme(plot.caption=element_text(margin=margin(t=15)))
Except for the District of Columbia, median household income has not increased very much in real terms over time: less than $10,000 in 2017 dollars over 30 years, or about $300 per year—-significantly less than 1% per year.
I repeat the previous plot, but this time using 2-year averages.
mhi_2ya %>%
filter(Jurisdiction %in% jurisdictions) %>%
ggplot(aes(x = Year, y = Income, color = Jurisdiction)) +
geom_line() +
scale_x_continuous(breaks = seq(1980, 2020, 5)) +
scale_y_continuous(labels = scales::dollar) +
scale_color_manual(values = cbbPalette, breaks = jurisdictions) +
xlab("Year") +
ylab("Median Household Income") +
labs(title="Median Household Income for Maryland vs. Other Jurisdictions",
subtitle="2-Year Averages, Estimates in 2017 Dollars (Inflation-Adjusted)",
caption="Data source: U.S. Census Bureau, Historical Income Tables: Households, Table H-8A") +
theme_minimal() +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
theme(axis.title.x=element_text(margin=margin(t=10))) +
theme(axis.title.y=element_text(margin=margin(r=10))) +
theme(plot.caption=element_text(margin=margin(t=15)))
I repeat the previous plot, but this time using 3-year averages.
mhi_3ya %>%
filter(Jurisdiction %in% jurisdictions) %>%
ggplot(aes(x = Year, y = Income, color = Jurisdiction)) +
geom_line() +
scale_x_continuous(breaks = seq(1980, 2020, 5)) +
scale_y_continuous(breaks = seq(50000, 80000, 10000),
labels = scales::dollar) +
scale_color_manual(values = cbbPalette, breaks = jurisdictions) +
xlab("Year") +
ylab("Median Household Income") +
labs(title="Median Household Income for Maryland vs. Other Jurisdictions",
subtitle="3-Year Averages, Estimates in 2017 Dollars (Inflation-Adjusted)",
caption="Data source: U.S. Census Bureau, Historical Income Tables: Households, Table H-8B") +
theme_minimal() +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
theme(axis.title.x=element_text(margin=margin(t=10))) +
theme(axis.title.y=element_text(margin=margin(r=10))) +
theme(plot.caption=element_text(margin=margin(t=15)))
Finally, I plot median household incomes for the jurisdictions of interest as a percentage of the U.S. median household incomes, using the 3-year averages.
mhi_relative <- mhi_3ya %>%
spread(Jurisdiction, Income) %>%
mutate_at(2:52, function(x) 100 * x / .[["United States"]]) %>%
gather(Jurisdiction, Income, -Year)
mhi_relative %>%
filter(Jurisdiction %in% jurisdictions) %>%
ggplot(aes(x = Year, y = Income, color = Jurisdiction)) +
geom_line() +
scale_x_continuous(breaks = seq(1980, 2020, 5)) +
scale_y_continuous() +
scale_color_manual(values = cbbPalette, breaks = jurisdictions) +
ylab("Median Household Income") +
ylab("% of U.S. Median Household Income") +
labs(title="Relative Median Household Income for Maryland and Other Jurisdictions",
subtitle="3-Year Averages as Percentage of U.S. Median Household Income",
caption="Data source: U.S. Census Bureau, Historical Income Tables: Households, Table H-8B") +
theme_minimal() +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
theme(axis.title.x=element_text(margin=margin(t=10))) +
theme(axis.title.y=element_text(margin=margin(r=10))) +
theme(plot.caption=element_text(margin=margin(t=15)))
Maryland median household income has been about 130% or more of U.S. median household income in the past few years, while Virginia median household income has been about 120% or more.
In contrast the District of Columbia has undergone a startling change from the 1990s, when its median household income was less than 90% of U.S. median household income, to the present day, when its median household income rivals that of Maryland.
I next display a list of jurisdictions with the highest estimated median household incomes in 2017, as follows:
mh_current
table.Year
variable (since I no longer need it).Income
.top_20 <- mhi_current %>%
filter(Year == 2017) %>%
select(-Year) %>%
arrange(desc(Income)) %>%
top_n(20)
## Selecting by Income
I then use the kable
function to print the resulting table.
Jurisdiction | Income |
---|---|
D.C. | 83382 |
Maryland | 81084 |
Washington | 75418 |
New Hampshire | 74801 |
Colorado | 74172 |
Hawaii | 73575 |
Massachusetts | 73227 |
New Jersey | 72997 |
Connecticut | 72780 |
Alaska | 72231 |
Minnesota | 71920 |
Utah | 71319 |
Virginia | 71293 |
California | 69759 |
Rhode Island | 66390 |
Oregon | 64610 |
Illinois | 64609 |
Vermont | 63805 |
Iowa | 63481 |
Wisconsin | 63451 |
Maryland is the state with the highest median household income, but as noted above for 2017 it is slightly outranked by the District of Columbia. Virginia is some ways doen the list.
All values for median household income are estimates based on survey samples, with associated margins of error. For the 1-year Maryland estimates the associated standard errors are a few thousand dollars. For the multi-year averages the standard errors are smaller.
In 2013 the Census Bureau did two separate surveys using different subsamples. See footnotes 38 and 39 in the Historical Income Table Footnotes for a discussion of the differences. For this analysis I chose the sets of values collected according to footnote 38.
National and state data for 1984 through 2017 are from the U.S. Census Bureau “Historical Income Tables: Households”, Table H-8, “Median Household Income by State”. This table includes income values in both constant (2017) dollars and current dollars (i.e., current as of the year in question).
Note that there are two alternative sources for this data: Table H-8A, “Median Household Income by State - 2 Year Average”, and Table H-8B, “Median Household Income by State - 3 Year Average” These tables are in constant 2017 dollars only.
I downloaded all spreadsheets directly from the U.S. Census Bureau web site and did not make any changes to them.
The plots above could be extended to include additional jurisdictions for comparison.
The analysis of the jurisdictions with the highest median household incomes could be extended to include a trend plot showing how the relative rankings of Maryland, D.C., Virginia, and other jurisdictions have changed over time.
I used the following R environment in doing the analysis above:
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] tools stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] knitr_1.22 readxl_1.3.1 forcats_0.4.0 stringr_1.4.0
## [5] dplyr_0.8.0.1 purrr_0.3.2 readr_1.3.1 tidyr_0.8.3
## [9] tibble_2.1.1 ggplot2_3.1.1 tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.1 highr_0.8 cellranger_1.1.0 pillar_1.3.1
## [5] compiler_3.6.0 plyr_1.8.4 digest_0.6.18 lubridate_1.7.4
## [9] jsonlite_1.6 evaluate_0.13 nlme_3.1-139 gtable_0.3.0
## [13] lattice_0.20-38 pkgconfig_2.0.2 rlang_0.3.4 cli_1.1.0
## [17] rstudioapi_0.10 yaml_2.2.0 haven_2.1.0 xfun_0.6
## [21] withr_2.1.2 xml2_1.2.0 httr_1.4.0 hms_0.4.2
## [25] generics_0.0.2 grid_3.6.0 tidyselect_0.2.5 glue_1.3.1
## [29] R6_2.4.0 rematch_1.0.1 rmarkdown_1.12 modelr_0.1.4
## [33] magrittr_1.5 backports_1.1.4 scales_1.0.0 htmltools_0.3.6
## [37] rvest_0.3.3 assertthat_0.2.1 colorspace_1.4-1 labeling_0.3
## [41] stringi_1.4.3 lazyeval_0.2.2 munsell_0.5.0 broom_0.5.2
## [45] crayon_1.3.4
You can find the source code for this analysis and others at my hocodata public code repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.