Introduction

In this document I do some basic analyses on the population density of Howard County, Maryland, census block groups over time. (A census block group is a census geography below the level of census tracts and above the level of census blocks.)

For those readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the the tutorial “Getting started with the Tidyverse”.

Setup and data preparation

Libraries

I use the following packages for the following purposes:

  • tidyverse: do general data manipulation.
  • tidycensus: retrieve U.S. census data.
  • tigris: retrieve U.S. census geospatial data.
  • sf: manipulate geospatial data.
  • viridis: obtain special color palettes for graphs.
  • scales: enable formatting numbers in legend with commas.
  • knitr: display tabular data.
library(tidyverse)
library(tidycensus)
library(tigris)
library(sf)
library(viridis)
library(scales)
library(knitr)

The tidycensus and tigris packages require a Census API key. I previously obtained such a key and stored it for use by the functions included in the tidycensus and tigris packages.

Data sources

I use data from the following sources; see the References section below for more information:

  • Population estimates for census block groups in 2010 are from the 2010 decennial census table P1, and are accessed via the tidycensus get_decennial() function.
  • Population estimates for census block groups for 2013-2017 are from the American Community Survey table B01003, and are accessed via the tidycensus get_acs() function.
  • Census block group boundaries are from the U.S. Census Bureau cartographic boundaries data for 2010 and 2017, and are accessed via the tigris block_groups() function.
  • Road geometry is also from the U.S. Census Bureau, and is accessed via the tigris roads() function.

Reading in and preparing the data

My goal is to look at population density across Howard County, at a relatively fine-grained level. (For example, are there particular neighborhoods in the county that have relatively low or relatively high population density?) In previous analyses I looked at data at the level of census tracts, of which there are 55 in Howard County.

In this analysis I primarily look at data at the level of a census block group (CBG), a geography that is one level below a census tract. There are currently 154 census block groups in Howard County. CBG-level data is not available in the American Community Survey 1-year estimates, but is available in the 5-year estimates, at least for the 2017 survey (2013-2017 time frame), which is the latest currently available.

The smallest geography in U.S. Census data is the census block, one level below the census block group. There are currently 4,845 census blocks defined for Howard County. The American Community Survey does not provide estimates at the census block level, but such data is available for the 2010 decennial census.

My primary interest is in population, so the only data I need is total population at the census block group level. In theory I could retrieve this data for 2010 by calling the tidycensus get_decennial() function and asking for the “P001P001” variable for the census block group geography. However, there is an apparent bug in the Census API that prevents this from working.

Instead I use the get_decennial() function to retrieve population counts for all census blocks in Howard County for the 2010 census. This function returns data in so-called “tidy” format (one row per observation), with the following columns:

  • GEOID. A 15-character geographic ID combining the 2-digit FIPS code for Maryland (“24”), the 3-digit FIPS Code for Howard County (“027”), a 6-digit identifier for an individual census tract within Howard County and a 4-digit identifier for the census block within the tract.
  • NAME. The name of the census block.
  • variable. References the census variable being retrieved, in this case “P001001”, the total population of the block.
  • value. The variable’s value. (There is no margin of error provided since the decennial census is a comprehensive count, not an estimate constructed from a sample.)

I retain only the GEOID value and the population count, renamed to population for consistency throughout this analysis.

pop_block_2010 <- get_decennial(year = 2010, geography = "block", state = "MD", county = "Howard County", variables = "P001001") %>%
  select(GEOID, value) %>%
  rename(population = value)
## Getting data from the 2010 decennial Census

I then summarize this data at the block group level to get the total population for each census block group. The GEOID for a census block group is the first 12 characters of the GEOID for a block within the group: the 2-digit state FIPS code plus the 3-digit county FIPS code plus the 6-digit FIPS code for the census tract plus a single digit to identify the block group within the tract.

Again I retain only the GEOID and population count, as GEOID and population respectively, and add a new variable year for later use.

pop_cbg_2010 <- pop_block_2010 %>%
  mutate(cbg = substr(GEOID, 1, 12)) %>%
  group_by(cbg) %>%
  summarize(population = sum(population)) %>%
  rename(GEOID = cbg) %>%
  mutate(year = 2010)

I next use the tidycensus get_acs() function to retrieve population estimates from the 2017 ACS 5-year estimates for all census block groups in Howard County. This function also returns data in tidy format, with the following columns:

  • GEOID. An 12-character geographic ID for the block group, as discussed above.
  • NAME. The name of the census block group.
  • variable. References the ACS variable being retrieved, in this case “B01003_001”).
  • estimate. The ACS 5-year estimate for the variable’s value.
  • moe. The margin of error for the estimate.

Again I retain only the GEOID and population count, as GEOID and population respectively, and add the variable year.

pop_cbg_2017 <- get_acs(year = 2017, survey = "acs5", geography = "block group", state = "MD", county = "Howard County", variables = "B01003_001") %>%
  select(GEOID, estimate) %>%
  rename(population = estimate) %>%
  mutate(year = 2017)
## Getting data from the 2013-2017 5-year ACS

I also get population values for Howard County as a whole for both 2010 and 2017, using the same general procedure as above.

pop_county_2010 <- get_decennial(year = 2010, geography = "county", state = "MD", county = "Howard County", variables = "P001001") %>%
  select(GEOID, value) %>%
  rename(population = value) %>%
  mutate(year = 2010)
## Getting data from the 2010 decennial Census
pop_county_2017 <- get_acs(year = 2017, survey = "acs5", geography = "county", state = "MD", county = "Howard County", variables = "B01003_001") %>%
  select(GEOID, estimate) %>%
  rename(population = estimate) %>%
  mutate(year = 2017)
## Getting data from the 2013-2017 5-year ACS

In order to produce maps of population density I need geometry of the cartographic boundaries for all census block groups in Howard County as of 2010 and 2017.

I use the tigris function block_groups() to return the cartographic boundaries in the form of a table compatible with the sf package. The table as returned for 2017 has a variable GEOID that is identical to the variable of the same name returned by get_decennial() and get_acs(). However, the table as returned for 2010 has instead a variable GEO_ID that contains the prefix “1500000US” for the geographic ids. I remove this prefix in order to create a GEOID variable of the same format as for the 2017 boundaries.

In order to compute population densities I need the areas of the census block groups for both 2010 and 2017. For 2010 the area is returned from the block_groups() function as a single variable CENSUSAREA, with values expressed as square miles. I retain this as the variable area.

For 2017 this data is returned in the form of two variables, ALAND and AWATER representing land area and water area respectively, both expressed in units of square meters. I convert the ALAND value to square miles and save the result as a variable area to match that for the 2010 boundaries. (The U.S. Census Bureau recommends using the land area when calculating population density.)

options(tigris_use_cache = TRUE)

area_cbg_2010 <- block_groups(year = 2010, state = "MD", county = "Howard County", cb = TRUE, class = "sf", progress_bar = FALSE) %>%
  rename(area = CENSUSAREA) %>%
  mutate(GEOID = substr(GEO_ID, 10, 21)) %>%
  select(GEOID, area)

area_cbg_2017 <- block_groups(year = 2017, state = "MD", county = "Howard County", cb = TRUE, class = "sf", progress_bar = FALSE) %>%
  mutate(area = ALAND / 2589988) %>%
  select(GEOID, area)

I now want to verify that the census block group boundaries have not changed from 2010 to 2017. Rather than comparing the boundaries directly, I just look at the census block group areas for each year.

areas_2010 <- area_cbg_2010 %>%
  st_drop_geometry() %>%
  select(GEOID, area)

areas_2017 <- area_cbg_2017 %>%
  st_drop_geometry() %>%
  select(GEOID, area)

The geographic IDs for the census block groups should be the same in each. If not I stop the analysis, because a discrepancy indicates some sort of problem.

stopifnot(sort(areas_2010$GEOID) == sort(areas_2017$GEOID))

I then compare the areas for each census block group.

area_diffs <- areas_2010 %>%
  left_join(areas_2017, by = "GEOID") %>%
  mutate(diff_pct = 100 * (area.x - area.y) / area.x)

The maximum difference between any of the areas for 2010 vs. 2017 is about 3.4%. I conclude that any census block group boundary changes from 2010 to 2017 are minimal, and elect to use the 2017 boundaries for the analyses below.

I repeat this process to get the (land) area for Howard County overall. (Since the county’s boundaries have not changed from 2010 to 2017, it does not matter which year I get the data for.) Since the counties() function returns all counties for a given state (in this case Maryland), I filter the results to retain only the data for Howard County (FIPS code “24027”).

options(tigris_use_cache = TRUE)
area_county <- counties(year = 2017, state = "MD", cb = TRUE, class = "sf", progress_bar = FALSE) %>%
  filter(GEOID == "24027") %>%
  mutate(area = ALAND / 2589988) %>%
  select(GEOID, area)

This calculation produces a land area of 251 square miles for Howard County, which matches the value found in the Howard County Wikipedia article.

To help orient readers as to the locations of the census block groups, I also want any maps generated to also display major roads in Howard County that correspond in whole or in part to census block group boundaries. I use the tigris function roads() to return geometry for all Howard County roads.

Because I don’t need or want to display each and every Howard County road, I use the RTTYP and FULLNAME variables to filter the results to retain only major roads (interstate and U.S. highways) and significant minor roads (Maryland state routes and roads with “Parkway” in their names). I store the geometry for each in separate variables, so that I can plot them at different widths.

all_roads <- roads(state = "MD", county = "Howard County", class = "sf", progress_bar = FALSE)

major_roads_geo <- all_roads %>%
  filter(RTTYP %in% c("I", "U")) %>%
  st_geometry()

minor_roads_geo <- all_roads %>%
  filter(RTTYP == "S" | str_detect(FULLNAME, "Pkwy")) %>%
  st_geometry()

Analysis

I do analyses to answer the following questions:

Summary of Howard County population statistics

Before I look at population density, here’s a quick summary of Howard County population at the county, census block group, and census block level:

The total population of Howard County according to the 2010 census was 287,085. The estimated total population of Howard County according to the ACS 5-year estimates for 2017 was 312,495. (Recall that ACS 5-year estimates for 2017 actually reflect surveys from 2013 through 2017.) This represents population growth of about 9% over roughly half a decade or so.

As discussed previously, there are currently 154 census block groups in Howard County, and 4,845 census blocks. In 2010 the least populated census block group contained 645 people, while the most populated census block group contained 3,632 people. The average population for all block groups was 1,864 people and the median population for all block groups was 1,824 people.

In 2010 the smallest block group covered an area of 0.1 square miles (about 65 acres), while the largest block group covered an area of 13.42 square miles. The average block group area was 1.63 square miles, and the median was 0.67 square miles.

A typical census block group is thus comparable in both size and population to a traditional village or small town. It is large enough to be a recognizable “place”, but small enough to have its own identity distinct from that of other places in the county.

On the other hand, census blocks are very small: in 2010 the average population for a Howard County census block was only

59, and the median population was 0, indicating that over half of the census blocks in the county contained no people at all.

Howard County population density by census block group

I first want to look at the differing population densities among the various census block groups. I start by computing population densities for 2010, including an overall population density for the county itself.

To do this I take the area_cbg_2017 table (which contains the variable area specifying the land area of the block group in square miles), delete the geometry (which I don’t need in this context), and join to it the table pop_cbg_2010 (which contains the variable population). I then create a new variable pop_density containing the population density. The calculation for the county-level density is similar.

density_cbg_2010 <- area_cbg_2017 %>%
  st_drop_geometry() %>%
  left_join(pop_cbg_2010, by = "GEOID") %>%
  mutate(pop_density = population / area)

density_county_2010 <- area_county %>%
  st_drop_geometry() %>%
  left_join(pop_county_2010, by = "GEOID") %>%
  mutate(pop_density = population / area)

Population density in 2010 varied from a low of 151 people per square mile to a high of 15,181 people per square mile, a difference of two orders of magnitude. Overall population density for the county in 2010 was 1,144 people per square mile.

The following histogram shows the distribution of population density among the various block groups for the 2010 census.

density_cbg_2010 %>%
  ggplot(aes(x = pop_density)) +
  geom_histogram(boundary = 0, binwidth = 500) +
  geom_vline(
    mapping = aes(xintercept = median(pop_density)),
    linetype = "dashed"
  ) +
  xlab("Population Density Per Square Mile") +
  ylab("Number of Census Block Groups") +
  scale_x_continuous(breaks = seq(0, 22000, 2000)) +
  labs(
    title = "2010 Howard County Population Density Distribution",
    subtitle = "Based on 2010 Census Values for Census Block Groups",
    caption = paste0(
      "Data sources:",
      "\n  U.S. Census Bureau, 2010 Decennial Census, Table P1",
      "\n  U.S. Census Bureau, Cartographic boundaries",
      "\nCreated using the tidyverse and tidycensus R packages"
    )
  ) +
  theme_minimal() +
  theme(axis.title.x = element_text(margin = margin(t = 10))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

The distribution is heavily skewed to the right, with only a few census block groups having a density over 6,000 people per square mile, and several block groups having population densities under 500 people per square mile. The median density, shown by the dashed line, was 2,419 people per square mile.

I now repeat the analysis for the 2017 ACS 5-year population estimates.

density_cbg_2017 <- area_cbg_2017 %>%
  st_drop_geometry() %>%
  left_join(pop_cbg_2017, by = "GEOID") %>%
  mutate(pop_density = population / area)

density_county_2017 <- area_county %>%
  st_drop_geometry() %>%
  left_join(pop_county_2017, by = "GEOID") %>%
  mutate(pop_density = population / area)

Population density in the 2013-2017 time frame varied from a low of 140 people per square mile to a high of 22,871 people per square mile, a wider range of densities than was present in 2010. Overall population density for the county in 2013-2017 was 1,245 people per square mile.

The following histogram shows the distribution of population density among the various block groups based on the 2017 ACS 5-year population estimates.

density_cbg_2017 %>%
  ggplot(aes(x = pop_density)) +
  geom_histogram(boundary = 0, binwidth = 500) +
  geom_vline(
    mapping = aes(xintercept = median(pop_density)),
    linetype = "dashed"
  ) +
  xlab("Population Density Per Square Mile") +
  ylab("Number of Census Block Groups") +
  scale_x_continuous(breaks = seq(0, 22000, 2000)) +
  labs(
    title = "2013-2017 Howard County Population Density Distribution",
    subtitle = "Based on 2017 ACS 5-Year Estimates for Census Block Groups",
    caption = paste0(
      "Data sources:",
      "\n  U.S. Census Bureau, 2017 American Community Survey, 5-Year Estimates, Table B01003",
      "\n  U.S. Census Bureau, Cartographic boundaries",
      "\nCreated using the tidyverse and tidycensus R packages"
    )
  ) +
  theme_minimal() +
  theme(axis.title.x = element_text(margin = margin(t = 10))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

The distribution is also heavily skewed to the right, with a median density of 2,560 people per square mile, a little increased from 2010.

The final histogram shows both distributions on a single graph, by combining the tables for 2010 and 2017, converting the year variable into a categorical variable (“factor”), and then plotting the bars of the histogram side by side using year to plot them in different colors.

bind_rows(density_cbg_2010, density_cbg_2017) %>%
  mutate(year = as.factor(year)) %>%
  ggplot(aes(x = pop_density, fill = year)) +
  geom_histogram(boundary = 0, binwidth = 500, position = "dodge") +
  xlab("Population Density Per Square Mile") +
  ylab("Number of Census Block Groups") +
  scale_x_continuous(breaks = seq(0, 22000, 2000)) +
  scale_fill_viridis(option = "viridis", name = "Year", discrete = TRUE) +
  labs(
    title = "2010 and 2017 Howard County Population Density Distribution",
    subtitle = "Based on 2010 Census and 2017 ACS 5-Year Estimates for Census Block Groups",
    caption = paste0(
      "Data sources:",
      "\n  U.S. Census Bureau, 2010 Decennial Census, Table P1",
      "\n  U.S. Census Bureau, American Community Survey, Table B01003",
      "\nCreated using the tidyverse and tidycensus R packages"
    )
  ) +
  theme_minimal() +
  theme(axis.title.x = element_text(margin = margin(t = 10))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

You can see the general rightward shift in the distribution in 2013-2017 compared to 2010, with more census blocks at higher densities and higher maximum densities.

Mapping Howard County population density variation

I next show the same data in the form of two maps, one for 2010 and one for 2013-2017. Because the population densities span such a wide range of values, I use a logarithmic scale when choosing the colors, based on powers of 10. I also use the same limits for the scale in both maps, to ensure that the color values are comparable between the maps.

Note that I don’t bother trying to label the various Howard County roads because it would be more work and make the map more cluttered—and in any case the intended audience of Howard County residents should be able to identify most if not all of the roads by sight.

area_cbg_2017 %>%
  left_join(density_cbg_2010, by = "GEOID") %>%
  ggplot(aes(fill = pop_density, geometry = geometry)) +
  geom_sf(size = 0) +
  geom_sf(data = major_roads_geo, color = "white", size = 1.0, fill = NA) +
  geom_sf(data = minor_roads_geo, color = "white", size = 0.5, fill = NA) +
  scale_fill_viridis(trans = "log10", option = "plasma", name = "Population\nPer Sq Mi", limits = c(100, 25000), breaks = c(100, 1000, 10000), labels = comma) +
  labs(title="2010 Howard County Population Density Variations",
       subtitle = "Population Density Per Square Mile for Census Block Groups",
       caption = paste0(
         "Data sources:",
         "\n  U.S. Census Bureau, 2010 Decennial Census, Table P1",
         "\n  U.S. Census Bureau, Cartographic boundaries",
         "\nCreated using the tidyverse and tidycensus R packages"
       )
  ) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0)) +
  theme(axis.ticks = element_blank(), axis.text = element_blank()) +
  theme(panel.background = element_blank())