In this document I do some basic analyses on housing costs for renters and homeowners in Howard County, Maryland, looking at U.S. census data at the census tract level. (A census tract is the largest census-specific geography, above the level of census block groups and census blocks.) For the measures of housing costs I use median gross rent as a percentage of household income (for renters) and median selected costs of housing as a percentage of household income (for homeowners). (See the References section below for discussion of other ways to measure housing costs.)
For those readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the tutorial “Getting started with the Tidyverse”.
I load the following packages:
library(tidyverse)
library(tidycensus)
library(tigris)
library(sf)
library(viridis)
library(knitr)
The tidycensus package requires a Census API key (tigris downloads its boundary files without one). I previously obtained such a key and stored it for use by the functions included in the tidycensus package.
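As a minimal sketch of checking for a stored key before requesting data: this assumes the key lives in the CENSUS_API_KEY environment variable, which is where the tidycensus census_api_key() function puts it when called with install = TRUE. The helper name has_census_key() is hypothetical and is not part of either package.

```r
# Hypothetical helper: TRUE if a Census API key appears to be available.
# Assumption: the key is stored in the CENSUS_API_KEY environment variable.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (!has_census_key()) {
  # census_api_key() is the tidycensus function for storing a key;
  # "YOUR_KEY" is a placeholder, not a real key.
  message(
    "No Census API key found. After obtaining one, store it with ",
    "tidycensus::census_api_key(\"YOUR_KEY\", install = TRUE)."
  )
}
```

Running a check like this at the top of a script makes a missing key fail early with a readable message rather than partway through a download.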
I use data from the following sources; see the References section below for more information:
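Median gross rent as a percentage of household income: American Community Survey 5-year estimates, retrieved with the tidycensus get_acs() function.
Median selected costs of housing as a percentage of household income: American Community Survey 5-year estimates, retrieved with the tidycensus get_acs() function.
Census tract boundaries: U.S. Census Bureau cartographic boundaries, retrieved with the tigris tracts() function.
Howard County roads: U.S. Census Bureau TIGER/Line shapefiles, retrieved with the tigris roads() function.
In this analysis I primarily look at data at the level of a census tract. There are currently 55 census tracts in Howard County. Tract-level data is not available in the American Community Survey 1-year estimates, but is available in the 5-year estimates.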
As noted in previous analyses I’ve done, a typical census tract in Howard County contains about 6,000 people, and is roughly comparable in both size and population to a Columbia village.
I use the tidycensus get_acs() function to retrieve median gross rent as a percentage of household income from the 2017 ACS 5-year estimates for all census tracts in Howard County. This function returns data in tidy format, with the following columns:
GEOID. An 11-character geographic ID for the census tract.
NAME. The name of the census tract.
variable. The ACS variable being retrieved, in this case “B25071_001”.
estimate. The ACS 5-year estimate for the variable’s value.
moe. The margin of error for the estimate.
I use estimate and moe to calculate a coefficient of variation (cv) for the data. The CV can be used to gauge the relative reliability of the estimates, as discussed below.
I similarly use get_acs() to retrieve median selected costs of housing as a percentage of household income for homeowners with and without a mortgage, using variables “B25092_002” and “B25092_003” respectively. Again I calculate CV values for the estimates.
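To make the GEOID column more concrete, here is a small base R sketch that splits a tract-level GEOID into its parts. A tract GEOID is the 2-digit state FIPS code, the 3-digit county FIPS code, and a 6-digit tract code run together. The function name parse_tract_geoid() is hypothetical, and the tract code in the example is made up; only the "24027" prefix (Howard County, Maryland) comes from this analysis.

```r
# Hypothetical helper: split tract GEOIDs into state, county, and tract codes.
parse_tract_geoid <- function(geoid) {
  stopifnot(all(nchar(geoid) == 11))
  data.frame(
    state  = substr(geoid, 1, 2),   # 2-digit state FIPS code
    county = substr(geoid, 3, 5),   # 3-digit county FIPS code
    tract  = substr(geoid, 6, 11),  # 6-digit tract code
    stringsAsFactors = FALSE
  )
}

# "24027" is Howard County, Maryland; the tract code "999999" is made up.
parse_tract_geoid("24027999999")
```

Because the county prefix is embedded in every tract GEOID, the first five characters can also be used to match tract rows to county rows.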
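As a worked example of the CV calculation, the sketch below uses the county-level renter figures reported later in this document (an estimate of 27.8 with a margin of error of 0.9). ACS margins of error are published at the 90 percent confidence level, so dividing by 1.645 converts a margin of error to a standard error. The function name acs_cv() is my own shorthand for illustration.

```r
# Coefficient of variation (in percent) from an ACS estimate and its
# 90 percent margin of error.
acs_cv <- function(estimate, moe) {
  se <- moe / 1.645      # convert the margin of error to a standard error
  100 * se / estimate    # express the standard error relative to the estimate
}

acs_cv(estimate = 27.8, moe = 0.9)  # roughly 2
```

This is the same arithmetic as the mutate() calls that follow, just pulled out so the two steps are visible.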
mgrpi_tr <- get_acs(
year = 2017,
survey = "acs5",
geography = "tract",
state = "MD",
county = "Howard County",
variables = "B25071_001"
) %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
mscohpi_m_tr <- get_acs(
year = 2017,
survey = "acs5",
geography = "tract",
state = "MD",
county = "Howard County",
variables = "B25092_002"
) %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
mscohpi_nm_tr <- get_acs(
year = 2017,
survey = "acs5",
geography = "tract",
state = "MD",
county = "Howard County",
variables = "B25092_003"
) %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
I next get the same measures for Howard County as a whole (and other Maryland counties, in case I want to use them in future analyses), using the same general procedure as above.
mgrpi_co <- get_acs(
year = 2017,
survey = "acs5",
geography = "county", state = "MD", variables = "B25071_001"
) %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
mscohpi_m_co <- get_acs(
year = 2017,
survey = "acs5",
geography = "county",
state = "MD",
variables = "B25092_002"
) %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
mscohpi_nm_co <- get_acs(
year = 2017,
survey = "acs5",
geography = "county",
state = "MD",
variables = "B25092_003"
) %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
Next I get the same measures for Maryland as a whole (along with all other U.S. states, in case I want to use them in future analyses).
mgrpi_st <- get_acs(year = 2017, survey = "acs5", geography = "state", variables = "B25071_001") %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
mscohpi_m_st <- get_acs(year = 2017, survey = "acs5", geography = "state", variables = "B25092_002") %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
mscohpi_nm_st <- get_acs(year = 2017, survey = "acs5", geography = "state", variables = "B25092_003") %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
Finally, I get the same measures for the U.S. as a whole.
mgrpi_us <- get_acs(year = 2017, survey = "acs5", geography = "us", variables = "B25071_001") %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
mscohpi_m_us <- get_acs(year = 2017, survey = "acs5", geography = "us", variables = "B25092_002") %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
mscohpi_nm_us <- get_acs(year = 2017, survey = "acs5", geography = "us", variables = "B25092_003") %>%
mutate(cv = 100 * ((moe / 1.645) / estimate))
## Getting data from the 2013-2017 5-year ACS
In order to produce maps of housing costs I need geometry of the cartographic boundaries for all census tracts in Howard County as of 2017.
I use the tigris function tracts() to return the cartographic boundaries in the form of a table compatible with the sf package. The table as returned for 2017 has a variable GEOID that is identical to the variable of the same name returned by get_decennial() and get_acs(). I don’t need any of the metadata returned with the boundaries, so I retain only GEOID and the geometry itself.
options(tigris_use_cache = TRUE)
map_tr <- tracts(
year = 2017,
state = "MD",
county = "Howard County",
cb = TRUE, class = "sf",
progress_bar = FALSE
) %>%
select(GEOID)
map_co <- counties(
year = 2017,
state = "MD",
cb = TRUE, class = "sf",
progress_bar = FALSE
) %>%
select(GEOID)
To help orient readers as to the locations of the census tracts, I also want the maps to display major roads in Howard County that correspond in whole or in part to census tract boundaries. I use the tigris function roads() to return geometry for all Howard County roads.
Because I don’t need or want to display each and every Howard County road, I use the RTTYP and FULLNAME variables to filter the results to retain only major roads (interstates and U.S. highways) and minor roads (Maryland state routes, and roads with “Pkwy” in their name). I store the geometry for each in separate variables, so that I can plot them at different widths.
all_roads <- roads(state = "MD", county = "Howard County", class = "sf", progress_bar = FALSE)
major_roads_geo <- all_roads %>%
filter(RTTYP %in% c("I", "U")) %>%
st_geometry()
minor_roads_geo <- all_roads %>%
filter(RTTYP == "S" | str_detect(FULLNAME, "Pkwy")) %>%
st_geometry()
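The filtering logic above can be tried out on a toy table without downloading anything. The sketch below uses base R and a handful of illustrative rows; the road names and their formatting are assumptions for illustration, not rows taken from the actual roads() output.

```r
# Toy stand-in for the table returned by roads(); the rows are illustrative.
toy_roads <- data.frame(
  FULLNAME = c("I- 95", "US Hwy 29", "State Rte 100",
               "Little Patuxent Pkwy", "Main St"),
  RTTYP    = c("I", "U", "S", "M", "M"),
  stringsAsFactors = FALSE
)

# Major roads: interstates ("I") and U.S. highways ("U").
is_major <- toy_roads$RTTYP %in% c("I", "U")

# Minor roads: state routes ("S"), or any road with "Pkwy" in its name.
is_minor <- toy_roads$RTTYP == "S" | grepl("Pkwy", toy_roads$FULLNAME)

toy_roads$FULLNAME[is_major]
toy_roads$FULLNAME[is_minor]
```

In this toy table "Main St" falls into neither group, which is the intended effect: ordinary local streets are left off the maps.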
I do analyses to answer the following question:
Before I look at relative housing costs within Howard County, here’s a quick summary of relative housing costs at the county, state, and national level:
The estimated median gross rent as a percentage of household income for the 2013-2017 timeframe was 27.8% for Howard County, 30.2% for Maryland, and 30.3% for the U.S. overall. The corresponding margins of error were 0.9%, 0.3%, and 0.1% respectively.
The estimated median selected costs of housing as a percentage of household income for owners with mortgages in the 2013-2017 timeframe was 21.7% for Howard County, 22.3% for Maryland, and 22% for the U.S. overall. The corresponding margins of error were 0.3%, 0.2%, and 0.3% respectively.
Finally, the estimated median selected costs of housing as a percentage of household income for owners without mortgages in the 2013-2017 timeframe was 10% for Howard County, 11% for Maryland, and 11.6% for the U.S. overall. The margin of error was 0.1% for both Maryland and the U.S.; no margin of error was reported for the Howard County estimate.
As discussed previously, there are currently 55 census tracts in Howard County. For the 2013-2017 timeframe the smallest median gross rent as a percentage of household income for any census tract was 14.5% and the largest 50%. Margins of error for estimates at the census tract level ranged from 1% to 44.1%.
For the 2013-2017 timeframe the smallest median selected costs of housing as a percentage of household income for owners with a mortgage for any census tract was 18.8% and the largest 28%. Margins of error for estimates at the census tract level ranged from 1.3% to 6.9%.
For the 2013-2017 timeframe the smallest median selected costs of housing as a percentage of household income for owners without a mortgage for any census tract was 10% and the largest 18.2%. Margins of error for estimates at the census tract level ranged from 2.2% to 8.8%.
The reliability of American Community Survey estimates can be judged based on the coefficients of variation. As a rule of thumb, estimates with coefficients of variation less than or equal to 12 are considered to have high reliability, estimates with CVs between 12 and 40 are considered to have medium reliability, and estimates with CVs above 40 are considered to have low reliability.
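The rule of thumb can be written as a small classification function. This is a sketch in base R using the thresholds stated above; the function name classify_reliability() is hypothetical.

```r
# Classify coefficients of variation using the rule of thumb:
# CV <= 12 is "high", 12 < CV <= 40 is "medium", CV > 40 is "low".
classify_reliability <- function(cv) {
  cut(
    cv,
    breaks = c(-Inf, 12, 40, Inf),
    labels = c("high", "medium", "low"),
    right = TRUE  # intervals are closed on the right, so 12 counts as "high"
  )
}

as.character(classify_reliability(c(1, 12, 16, 40, 41)))
# "high" "high" "medium" "medium" "low"
```

A missing CV stays missing (NA) under this function, which matches how tracts without valid estimates are treated separately later on.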
For the estimates of median gross rent as a percentage of income, the median coefficient of variation is 16 at the tract level, 4 at the county level within Maryland, and 1 at the state level.
For the estimates of median selected cost of housing as a percentage of income for owners with mortgages, the median coefficient of variation is 6 at the tract level, 1 at the county level within Maryland, and 1 at the state level.
For the estimates of median selected cost of housing as a percentage of income for owners without mortgages, the median coefficient of variation is 19 at the tract level, 4 at the county level within Maryland, and 1 at the state level.
The county- and state-level estimates thus all have high reliability, but the estimates at the census tract level are less reliable. That’s why I decided to do this analysis at the level of census tracts as opposed to census block groups, since reliability at the census block group level would be even worse.
The following histogram shows the distribution of coefficients of variation for the tract-level estimates for renters. (By setting the x-axis limit to 100 I eliminate from the histogram one outlier CV that is over 100.)
mgrpi_tr %>%
filter(!is.na(cv)) %>%
ggplot(mapping = aes(x = cv)) +
geom_histogram(binwidth = 5) +
xlab("Coefficients of Variation") +
ylab("Number of Census Tracts") +
scale_x_continuous(limits = c(0, 100), breaks = seq(0, 100, 10)) +
scale_y_continuous(limits = c(0, 50), breaks = seq(0, 50, 10)) +
labs(
title = "Reliability of Howard County Housing Cost Estimates (Renters)",
subtitle = paste0(
"Coefficients of Variation of ACS 2017 5-Year Estimates for",
"\nCensus Tracts of Median Gross Rent as Percentage of Income"
),
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B25071",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
## Warning: Removed 1 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).
5 of the 55 Howard County census tracts do not have valid estimates and/or margins of error for the median gross rent as a percentage of income. Of those tracts that do have estimates, most estimates appear to be of medium or low reliability.
The following histogram shows the distribution of coefficients of variation for the tract-level estimates for homeowners with mortgages.
mscohpi_m_tr %>%
filter(!is.na(cv)) %>%
ggplot(mapping = aes(x = cv)) +
geom_histogram(binwidth = 5) +
xlab("Coefficients of Variation") +
ylab("Number of Census Tracts") +
scale_x_continuous(limits = c(0, 100), breaks = seq(0, 100, 10)) +
scale_y_continuous(limits = c(0, 50), breaks = seq(0, 50, 10)) +
labs(
title = "Reliability of Howard County Housing Cost Estimates (Owners with Mortgages)",
subtitle = paste0(
"Coefficients of Variation of ACS 2017 5-Year Estimates for Census Tracts of",
"\nMedian Selected Cost of Housing as Percentage of Income (Owners with Mortgages)"
),
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B25092",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
## Warning: Removed 2 rows containing missing values (geom_bar).
All these estimates appear to be of high or medium reliability.
Finally, the following histogram shows the distribution of coefficients of variation for the tract-level estimates for homeowners without mortgages.
mscohpi_nm_tr %>%
filter(!is.na(cv)) %>%
ggplot(mapping = aes(x = cv)) +
geom_histogram(binwidth = 5) +
xlab("Coefficients of Variation") +
ylab("Number of Census Tracts") +
scale_x_continuous(limits = c(0, 100), breaks = seq(0, 100, 10)) +
scale_y_continuous(limits = c(0, 50), breaks = seq(0, 50, 10)) +
labs(
title = "Reliability of Howard County Housing Cost Estimates (Owners w/o Mortgages)",
subtitle = paste0(
"Coefficients of Variation of ACS 2017 5-Year Estimates for Census Tracts of",
"\nMedian Selected Cost of Housing as Percentage of Income (Owners w/o Mortgages)"
),
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B25092",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
## Warning: Removed 2 rows containing missing values (geom_bar).
36 of the 55 Howard County census tracts do not have valid estimates and/or margins of error for the median selected costs of housing as a percentage of income for homeowners without mortgages. Of those tracts that do have estimates, the estimates appear to be of medium reliability.
I now want to look at differing median housing costs relative to income across Howard County. I begin by discarding estimates for any census tracts that have missing coefficients of variation, or for which the CV indicates that the estimate is of low reliability.
mgrpi_tr <- mgrpi_tr %>%
filter(!is.na(cv) & cv < 40)
mscohpi_m_tr <- mscohpi_m_tr %>%
filter(!is.na(cv) & cv < 40)
mscohpi_nm_tr <- mscohpi_nm_tr %>%
filter(!is.na(cv) & cv < 40)
The following histogram shows the distribution of median gross rents as a percentage of household income among the various census tracts in Howard County. This histogram and the following two histograms all share the same scales on the x- and y-axis, in order to allow direct comparisons among the distributions.
mgrpi_hc <- mgrpi_co %>% filter(GEOID == "24027") %>% select(estimate) %>% as.numeric()
mgrpi_tr %>%
ggplot(aes(x = estimate)) +
geom_histogram(boundary = 0, binwidth = 5) +
geom_vline(mapping = aes(xintercept = 30)) +
geom_vline(mapping = aes(xintercept = mgrpi_hc), linetype = "dashed") +
xlab("Median Gross Rent as Percentage of Household Income") +
ylab("Number of Census Tracts") +
scale_x_continuous(limits = c(0, 55), breaks = seq(0, 55, 5)) +
scale_y_continuous(limits = c(0, 50), breaks = seq(0, 50, 10)) +
labs(
title = "2013-2017 Howard County Relative Housing Costs (Renters)",
subtitle = "Median Gross Rents as Percentage of Household Income by Census Tract",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B25071",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
The solid vertical line at 30% shows the level of rent vs. income above which a household is said to be “burdened” with respect to housing costs. Note that there are many census tracts within which the median household is paying well above 30% of household income in gross rents, and is thus considered to be burdened.
The dashed line shows the estimated median gross rent as a percentage of income for all Howard County renter households. Half of those households pay less than this percentage, and half more. This value is less than 30%, so fewer than 50% of Howard County renter households overall are housing cost burdened.
The following histogram shows the distribution of median selected costs of housing as a percentage of household income for owners with mortgages among the various census tracts in Howard County.
mscohpi_m_hc <- mscohpi_m_co %>% filter(GEOID == "24027") %>% select(estimate) %>% as.numeric()
mscohpi_m_tr %>%
ggplot(aes(x = estimate)) +
geom_histogram(boundary = 0, binwidth = 5) +
geom_vline(mapping = aes(xintercept = 30)) +
geom_vline(mapping = aes(xintercept = mscohpi_m_hc), linetype = "dashed") +
xlab("Median Selected Costs of Housing as Percentage of Household Income") +
ylab("Number of Census Tracts") +
scale_x_continuous(limits = c(0, 55), breaks = seq(0, 55, 5)) +
scale_y_continuous(limits = c(0, 50), breaks = seq(0, 50, 10)) +
labs(
title = "2013-2017 Howard County Relative Housing Costs (Owners with Mortgages)",
subtitle = "Median Selected Costs of Housing as Percentage of Household Income by Census Tract",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B25092",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
Based on this graph there are no census tracts in Howard County in which the median homeowner with a mortgage is housing cost burdened. (This of course does not rule out the possibility that some homeowners with a mortgage may be housing cost burdened. I’m referring here to the median or typical homeowner.)
The following histogram shows the distribution of median selected costs of housing as a percentage of household income for owners without mortgages among the various census tracts in Howard County.
mscohpi_nm_hc <- mscohpi_nm_co %>% filter(GEOID == "24027") %>% select(estimate) %>% as.numeric()
mscohpi_nm_tr %>%
ggplot(aes(x = estimate)) +
geom_histogram(boundary = 0, binwidth = 5) +
geom_vline(mapping = aes(xintercept = 30)) +
geom_vline(mapping = aes(xintercept = mscohpi_nm_hc), linetype = "dashed") +
xlab("Median Selected Costs of Housing as Percentage of Household Income") +
ylab("Number of Census Tracts") +
scale_x_continuous(limits = c(0, 55), breaks = seq(0, 55, 5)) +
scale_y_continuous(limits = c(0, 50), breaks = seq(0, 50, 10)) +
labs(
title = "2013-2017 Howard County Relative Housing Costs (Owners w/o Mortgages)",
subtitle = "Median Selected Costs of Housing as Percentage of Household Income by Census tract",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B25092",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
Based on this graph there are no census tracts in Howard County in which the median homeowner without a mortgage is housing cost burdened. (This of course does not rule out the possibility that some homeowners without a mortgage may be housing cost burdened. I’m referring here to the median or typical homeowner.) In almost all census tracts the median homeowning household with no mortgage pays less than 15% of its income on housing costs.
I next show the same data in the form of maps. In order to have a consistent scale for all maps I first calculate suitable lower and upper limits for the scale.
lower_limit <- round(min(
min(mgrpi_tr$estimate, na.rm = TRUE),
min(mscohpi_m_tr$estimate, na.rm = TRUE),
min(mscohpi_nm_tr$estimate, na.rm = TRUE)
) - 4.9, -1)
upper_limit <- round(max(
max(mgrpi_tr$estimate, na.rm = TRUE),
max(mscohpi_m_tr$estimate, na.rm = TRUE),
max(mscohpi_nm_tr$estimate, na.rm = TRUE)
) + 4.9, -1)
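The subtraction and addition of 4.9 before rounding to the nearest 10 is a way of pushing the lower limit down and the upper limit up. A possibly more explicit alternative is to use floor() and ceiling() directly. The sketch below uses made-up values rather than the actual estimates, and the helper names are hypothetical; note that the two approaches can give different results for some inputs, so this is not a drop-in replacement.

```r
# Hypothetical helpers: round down or up to a multiple of `step`.
round_down_to <- function(x, step = 10) floor(x / step) * step
round_up_to   <- function(x, step = 10) ceiling(x / step) * step

toy_estimates <- c(14.5, 27.8, 31.2, 46.0)  # made-up values for illustration

lower <- round_down_to(min(toy_estimates))  # 10
upper <- round_up_to(max(toy_estimates))    # 50

seq(lower, upper, 10)  # candidate legend breaks: 10 20 30 40 50
```

Either way, the point is to get limits that are multiples of 10 so that the legend breaks fall on round numbers shared by all three maps.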
The first map shows the estimated median gross rent as a percentage of household income across Howard County, for the 2013-2017 timeframe.
(I don’t bother trying to label the various Howard County roads because it would be more work and make the map more cluttered—and in any case the intended audience of Howard County residents should be able to identify most if not all of the roads by sight.)
mgrpi_tr %>%
right_join(map_tr, by = "GEOID") %>%
ggplot(aes(fill = estimate, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 1.0, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.5, fill = NA) +
scale_fill_viridis(option = "plasma", name = "% of Income", limits = c(lower_limit, upper_limit), breaks = seq(lower_limit, upper_limit, 10)) +
labs(
title = "2013-2017 Howard County Relative Housing Costs (Renters)",
subtitle = "Median Gross Rent as Percentage of Household Income by Census Tract",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey, Table B25071",
"\n U.S. Census Bureau, 2017 cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages",
"\nGray areas indicate missing or unreliable estimates"
)
) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
Based on the above map there are multiple census tracts across the county where renters are housing cost burdened. (The gray areas of this map and those below correspond to census tracts that were dropped from the analysis based on missing or too high coefficients of variation.)
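The gray areas arise from the join itself: right_join() keeps every tract in the boundary table, so tracts whose estimates were dropped end up with a missing value for the fill, which the color scale draws in gray by default. The base R sketch below shows the same effect with merge() and all.y = TRUE, which is the analogous operation; the GEOIDs and estimates are placeholders, not real data.

```r
# Placeholder tables: estimates exist for tracts "A" and "B" only,
# while the boundary table covers tracts "A", "B", and "C".
estimates <- data.frame(
  GEOID = c("A", "B"),
  estimate = c(25.1, 33.4),
  stringsAsFactors = FALSE
)
boundaries <- data.frame(
  GEOID = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Keep every row of the boundary table, as right_join() does.
joined <- merge(estimates, boundaries, by = "GEOID", all.y = TRUE)
joined
# Tract "C" is kept, with estimate = NA.
```

Had the join kept only matching rows, tract "C" would vanish from the map entirely instead of appearing as a gray area.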
I now create a comparable map for median selected costs of housing as a percentage of household income for owners with mortgages.
mscohpi_m_tr %>%
right_join(map_tr, by = "GEOID") %>%
ggplot(aes(fill = estimate, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 1.0, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.5, fill = NA) +
scale_fill_viridis(option = "plasma", name = "% of Income", limits = c(lower_limit, upper_limit), breaks = seq(lower_limit, upper_limit, 10)) +
labs(
title = "2013-2017 Howard County Relative Housing Costs (Owners with Mortgages)",
subtitle = "Median Selected Costs of Housing as Percentage of Household Income by Census Tract",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey, Table B25092",
"\n U.S. Census Bureau, 2017 cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
This map matches the corresponding histogram above: the median homeowning household with a mortgage is not housing cost burdened in any Howard County census tract.
Finally, I create a comparable map for median selected costs of housing as a percentage of household income for owners without mortgages.
mscohpi_nm_tr %>%
filter(!is.na(cv)) %>%
filter(cv < 40) %>%
right_join(map_tr, by = "GEOID") %>%
ggplot(aes(fill = estimate, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 1.0, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.5, fill = NA) +
scale_fill_viridis(option = "plasma", name = "% of Income", limits = c(lower_limit, upper_limit), breaks = seq(lower_limit, upper_limit, 10)) +
labs(
title = "2013-2017 Howard County Relative Housing Costs (Owners w/o Mortgages)",
subtitle = "Median Selected Costs of Housing as Percentage of Household Income by Census Tract",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey, Table B25092",
"\n U.S. Census Bureau, 2017 cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages",
"\nGray areas indicate missing or unreliable estimates"
)
) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
Again this matches the conclusion from the corresponding histogram above: across Howard County typical homeowners without mortgages spend relatively little on housing.
Values from the American Community Survey are estimates based on survey samples, with associated margins of error. For estimates at the census tract level the associated standard errors can be comparable to or even greater than the estimates themselves.
As noted above, 5-year estimates identified as being for a certain year actually reflect surveys conducted over the five-year period ending in that year. So the 2017 estimates actually correspond to the 2013-2017 time frame.
To obtain the 2013-2017 ACS data I used the tidycensus package, which provides an easy-to-use interface to U.S. Census Bureau data made available via a set of public APIs. As its name suggests, the tidycensus package is designed to be compatible with the tidyverse approach to representing and manipulating data.
To obtain the census tract boundaries I used the tigris package, which is designed to accompany the tidycensus package and provides access to U.S. Census Bureau cartographic boundaries and TIGER/Line shapefiles.
There are at least two different ways to measure relative housing costs for households. The first is to look at the median percentage of households’ incomes taken up by housing costs. That is, for each household first compute the percentage of its income taken up by housing costs. Then compute the median of all such values, such that half of all households spend more than that percentage and half less.
An alternative measure of housing costs is the percentage of households whose housing costs exceed a certain amount, typically taken to be 30% of household income. Such households are said to be “housing cost burdened”. Households spending more than 50% of their income on housing are often referred to as “severely burdened”.
These two measures are distinct but related: If the median household spends 30% of its income on housing, then half of all households spend more than that and half less. This corresponds to 50% of households being housing cost burdened. If the median household spends, say, 40% of its income on housing, then half of all households spend more than 40%, which implies that more than half spend more than 30%. In this case over 50% of households would be housing cost burdened.
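The relationship between the two measures can be illustrated with a toy example in base R. The household values below are invented purely for illustration and do not come from the ACS.

```r
# Invented housing cost shares (percent of income) for nine households.
cost_share <- c(12, 18, 22, 25, 28, 31, 35, 45, 52)

median(cost_share)     # 28: the median household is below the 30% threshold
mean(cost_share > 30)  # share of households that are cost burdened (4 of 9)
mean(cost_share > 50)  # share that are severely burdened (1 of 9)
```

Here the median is below 30%, and correspondingly fewer than half of the households are burdened. The median alone does not say how many are burdened, only which side of 50% that share falls on.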
For this analysis I use the median percentage of housing costs relative to income, mainly because these estimates are directly available in the American Community Survey data. There are multiple estimates of interest:
For more background on government-sponsored housing affordability measures see the U.S. Census Bureau publication “Who Can Afford To Live in a Home?: A look at data from the 2006 American Community Survey”. The definitions of “gross rent” and “selected monthly owner costs” are from the publication “American Community Survey and Puerto Rico Community Survey: 2017 Subject Definitions”.
The 30-percent guideline is discussed in the Department of Housing and Urban Development publication “Rental Burdens: Rethinking Affordability Measures”. For an alternative view on how to measure housing costs, see “What is Housing Affordability? The Case for the Residual Income Approach”, by Michael E. Stone.
For a high-level overview of how to calculate reliability of American Community Survey estimates, see “Calculating American Community Survey (ACS) Reliability”. For a more in-depth discussion of margins of error and coefficients of variation in ACS estimates, see “Understanding the Margin of Errors and the Coefficient of Variance in the American Community Survey”.
If you look closely at the maps above you’ll notice that there’s a gap where MD 100 joins U.S. 29. That appears to be due to an unnamed section of road that is not classified as being part of either highway. I have not yet been able to identify where the geometry for that road section is in the table returned by the roads() function.
I used the following R environment in doing the analysis above:
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.24 viridis_0.5.1 viridisLite_0.3.0
## [4] sf_0.7-7 tigris_0.8.2 tidycensus_0.9.2
## [7] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3
## [10] purrr_0.3.2 readr_1.3.1 tidyr_0.8.3
## [13] tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 xfun_0.8 haven_2.1.1
## [4] lattice_0.20-38 colorspace_1.4-1 generics_0.0.2
## [7] vctrs_0.2.0 htmltools_0.3.6 yaml_2.2.0
## [10] rlang_0.4.0 e1071_1.7-2 pillar_1.4.2
## [13] foreign_0.8-71 glue_1.3.1 withr_2.1.2
## [16] DBI_1.0.0 rappdirs_0.3.1 sp_1.3-1
## [19] uuid_0.1-2 modelr_0.1.5 readxl_1.3.1
## [22] munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0
## [25] rvest_0.3.4 evaluate_0.14 maptools_0.9-5
## [28] class_7.3-15 broom_0.5.2 Rcpp_1.0.2
## [31] KernSmooth_2.23-15 scales_1.0.0 backports_1.1.4
## [34] classInt_0.4-1 jsonlite_1.6 gridExtra_2.3
## [37] hms_0.5.0 digest_0.6.20 stringi_1.4.3
## [40] grid_3.6.0 rgdal_1.4-4 cli_1.1.0
## [43] tools_3.6.0 magrittr_1.5 lazyeval_0.2.2
## [46] crayon_1.3.4 pkgconfig_2.0.2 zeallot_0.1.0
## [49] xml2_1.2.2 lubridate_1.7.4 assertthat_0.2.1
## [52] rmarkdown_1.14 httr_1.4.1 rstudioapi_0.10
## [55] R6_2.4.0 units_0.6-3 nlme_3.1-139
## [58] compiler_3.6.0
You can find the source code for this analysis and others at my hocodata public code repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.