In this document I do some basic analyses on the population density of Columbia, Maryland, census block groups over time. (A census block group is a census geography below the level of census tracts and above the level of census blocks.) I also look at density at the level of census blocks for 2010, the most recent year for which I have data at that level.
For those readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the the tutorial “Getting started with the Tidyverse”.
I use the following packages for the following purposes:
library(tidyverse)
library(tidycensus)
library(tigris)
library(sf)
library(viridis)
library(scales)
library(knitr)
The tidycensus and tigris packages require a Census API key. I previously obtained such a key and used the tidycensus function census_api_key()
with the install = TRUE
option to store it for use by the functions included in the tidycensus and tigris packages.
I use data from the following sources; see the References section below for more information:
get_decennial()
function.get_acs()
function.block_groups()
function.blocks()
function.roads()
function.My goal is to look at population density in Columbia, Maryland, at a relatively fine-grained level. (For example, are there particular neighborhoods in Columbia that have relatively low or relatively high population density?) In my previous analysis of population density in Howard County I looked at data at the level of census block groups, and will continue that practice for this analysis.
A census block group is a geography that is one level below a census tract and one level above a census block, the smallest geographic subdivision used by the U.S. Census Bureau. While the 2010 census contains population counts for each census block, the American Community Survey data contains population estimates only down to the level of census block groups—and even then these estimates are available only in the 5-year estimates and not in the 1-year estimates.
My primary interest is in population, so the only data I need is total population at the census block group level. In theory I could retrieve this data for 2010 by calling the tidycensus get_decennial()
function and asking for the “P001P001” variable for the census block group geography. However, there is an apparent bug in the Census API that prevents this from working.
Instead I use the get_decennial()
function to retrieve population counts for all census blocks in Howard County for the 2010 census. This function returns data in so-called “tidy” format (one row per observation), with the following columns:
GEOID
. A 15-character geographic ID as discussed above.NAME
. The name of the census block.variable
. References the census variable being retrieved, in this case “P001001”, the total population of the block.value
. The variable’s value. (There is no margin of error provided since the decennial census is a comprehensive count, not an estimate constructed from a sample.)I retain only the GEOID
value and the population count, renamed to population
for consistency throughout this analysis.
hoco_pop_block_2010 <- get_decennial(year = 2010, geography = "block", state = "MD", county = "Howard County", variables = "P001001") %>%
select(GEOID, value) %>%
rename(population = value)
## Getting data from the 2010 decennial Census
I then summarize this data at the block group level to get the total population for each census block group. Again I retain only the GEOID and population count, as GEOID
and population
respectively, and add a new variable year
for later use.
hoco_pop_cbg_2010 <- hoco_pop_block_2010 %>%
mutate(cbg = substr(GEOID, 1, 12)) %>%
group_by(cbg) %>%
summarize(population = sum(population)) %>%
rename(GEOID = cbg) %>%
mutate(year = 2010)
I next use the tidycensus get_acs()
function to retrieve population estimates from the 2017 ACS 5-year estimates for all census block groups in Howard County. This function also returns data in tidy format, with the following columns:
GEOID
. An 12-character geographic ID for the block group, as discussed above.NAME
. The name of the census block group.variable
. References the ACS variable being retrieved, in this case “B01003_001”).estimate
. The ACS 5-year estimate for the variable’s value.moe
. The margin of error for the estimate.Again I retain only the GEOID and population count, as GEOID
and population
respectively, and add the variable year
.
hoco_pop_cbg_2017 <- get_acs(year = 2017, survey = "acs5", geography = "block group", state = "MD", county = "Howard County", variables = "B01003_001") %>%
select(GEOID, estimate) %>%
rename(population = estimate) %>%
mutate(year = 2017)
## Getting data from the 2013-2017 5-year ACS
As discussed above, the hoco_pop_cbg_2010
and hoco_pop_cbg_2017
tables contain data for all census block groups in Howard County. In the sections below I subset these tables to retain data only for census block groups associated with Columbia.
In order to produce maps of population density I need the geometry of the cartographic boundaries for all census block groups associated with Columbia as of 2010 and 2017, as well as the associated (land) areas for each block group.
I use the tigris function block_groups()
to return the cartographic boundaries for all block groups in Howard County in the form of a table compatible with the sf package. The table as returned for 2017 has a variable GEOID
that is identical to the variable of the same name returned by get_decennial()
and get_acs()
. However, the table as returned for 2010 has instead a variable GEO_ID
that contains the prefix “1500000US” for the geographic ids. I remove this prefix in order to create a GEOID
variable of the same format as for the 2017 boundaries.
For 2010 the area is returned from the block_groups()
function as a single variable CENSUSAREA
, with values expressed as square miles. I retain this as the variable area
.
For 2017 this data is returned in the form of two variables, ALAND
and AWATER
representing land area and water area respectively, both expressed in units of square meters. I convert the ALAND
value to square miles and save the result as a variable area
to match that for the 2010 boundaries. (The U.S. Census Bureau recommends using the land area when calculating population density.)
options(tigris_use_cache = TRUE)
hoco_area_cbg_2010 <- block_groups(year = 2010, state = "MD", county = "Howard County", cb = TRUE, class = "sf", progress_bar = FALSE) %>%
rename(area = CENSUSAREA) %>%
mutate(GEOID = substr(GEO_ID, 10, 21)) %>%
select(GEOID, area)
hoco_area_cbg_2017 <- block_groups(year = 2017, state = "MD", county = "Howard County", cb = TRUE, class = "sf", progress_bar = FALSE) %>%
mutate(area = ALAND / 2589988) %>%
select(GEOID, area)
In order to analyze data for Columbia specifically, I need to know which census block groups are associated with Columbia. I start with the so-called “block assignment files” published by the U.S. Census Bureau. These relate census blocks to the higher-level geographic entities with which they’re associated. In this case we want to know which census blocks are part of Columbia, which in census terms is a “census designated place” or CDP—basically a population center that is not an incorporated city.
(In this respect it’s important to note that “Columbia the CDP” is not 100% identical with “Columbia the planned community”. There have always been “outparcels” within Columbia that were not acquired by The Rouse Company and whose owners do not pay Columbia Association dues. There may also be other differences between Columbia’s boundaries as a CDP and as a planned community, based on U.S. Census Bureau decisions on what to include in the CDP.)
To do this I download the archive of block assignment files for Maryland from the 2010 census, unzip the archive, and read in the CSV file that maps census blocks to incorporated entities or CDPs.
Census blocks are uniquely identified by a 15-character ID that combines the 2-digit FIPS code for the state (“24” for Maryland), the 3-digit FIPS code identifying the county or county-equivalent within the state (“027” for Howard County), a 6-digit identifier for an individual census tract within the county or county-equivalent, and a 4-digit identifier for the census block within the tract.
Incorporated entities and CDPs are identified by a 5-digit code that identifies the place within the state. For Columbia that code is “19125”. (The full FIPS code for Columbia, “2419125”, adds the Maryland state code as a prefix.) Areas outside a CDP have missing values, which I replace with the string “none”.
if (!file.exists("BlockAssign_ST24_MD.zip")) {
download.file("https://www2.census.gov/geo/docs/maps-data/data/baf/BlockAssign_ST24_MD.zip", "BlockAssign_ST24_MD.zip", method = "auto")
}
unzip("BlockAssign_ST24_MD.zip", overwrite = TRUE)
md_block_places <- read_csv("BlockAssign_ST24_MD_INCPLACE_CDP.txt", col_types = "cc") %>%
rename(GEOID = BLOCKID) %>%
replace_na(list(PLACEFP = "none"))
The md_block_places
table contains entries for every census block in Maryland. I am interested only in Columbia, and only in census block groups, not individual census blocks. Unfortunately, it is possible for a given census block group to contain census blocks associated with two different places. This in fact the case for Howard County: for example, some census blocks in census block group “240276023022” are in Columbia (place code “19125”), some blocks are in Ellicott City (place code “26000”), and some blocks are not part of any CDP at all.
Which census block groups should I consider to be part of Columbia? I could consider only census block groups for which all census blocks are part of the Columbia CDP. However, this would risk omitting census block groups whose areas primarily fall within the Columbia CDP. Alternatively, I could consider any census block group that contains at least one census block associated with Columbia. However, this would risk including census block groups with only a small portion of their area actually part of Columbia.
For purposes of this analysis I consider a census block group to be part of Columbia only if at least 50% of its (land) area is included in the Columbia CDP. Since a given census block is either in Columbia or not, I can make this determination for a given census block group by comparing the total area of the census block group with the sum of the areas of all census blocks in the group that are associated with the Columbia CDP.
To do this I first need the areas for all census blocks in Howard County. I can get these by using the tigris functions blocks()
. For consistency with other tables I rename the GEOID10
variable to GEOID
and create a new variable area
containing the area of each block in square miles.
hoco_area_blocks_2010 <- blocks(year = 2010, state = "MD", county = "Howard County", class = "sf", progress_bar = FALSE) %>%
rename(GEOID = GEOID10) %>%
mutate(area = ALAND10 / 2589988)
I now determine which census block groups are more than 50% in Columbia by area, as follows:
hoco_area_blocks_2010
containing areas for all Howard County census blocks, and drop any geometry associated with the blocks.md_block_places
containing the place ID associated with each census block, using the variable GEOID
common to both tables. Since I use the left_join
function the resulting table contains only rows corresponding to census blocks in Howard County. Each row contains (among other things) the GEOID
value for a given block, the place associated with the block (in the PLACEFP
variable), and the area of the block.PLACEFP
(“19125”, “26000”, etc.) into their own columns, with the values for the columns being the areas of the census blocks. Since any given census block is associated with only one place, for any given row only one column will include a valid area.GEOID
variable.na.rm
parameter to ignore missing values. The resulting table has one row per census block group and one column for each Howard County place, with the value for that column being the sum of the areas of the blocks associated with that place.rowSums()
function to create a new variable area
containing the total land area of each census block group, summing the areas contained in all columns except the first (which contains the GEOID
value).GEOID
and area
variables.The resulting table cbgs
contains all census block groups considered as part of Columbia for this analysis, along with the total areas for those census block groups (which are not necessarily the areas within Columbia proper).
cbgs <- hoco_area_blocks_2010 %>%
st_drop_geometry() %>%
left_join(md_block_places, by = "GEOID") %>%
select(GEOID, PLACEFP, area) %>%
spread(PLACEFP, area) %>%
mutate(GEOID = substr(GEOID, 1, 12)) %>%
group_by(GEOID) %>%
summarize_all(sum, na.rm = TRUE) %>%
mutate(area = rowSums(.[2:length(.)])) %>%
filter(`19125` / area >= 0.5) %>%
select(GEOID, area)
There are 54 census block groups considered part of Columbia, with a total area of 31.1 square miles. Compare this to the figure of 31.9 square miles of land area given in the Wikipedia article on Columbia. The difference likely represents census block groups for which only a small portion of the groups’ area is in Columbia, block groups that I’ve omitted from this analysis.
I now have a list of census block groups associated with the Columbia CDP. I therefore join the hoco_pop_cbg_2010
table with the cbgs
table to produce a table of population counts for the Columbia-associated census block groups only. (The right_join()
function includes only rows for which the GEOID
values exist in the second table, namely cbgs
.) I do the same for the 2017 population estimates. In both case I remove the area
variable (carried over from the cbgs
table) because it’s not needed.
pop_cbg_2010 <- hoco_pop_cbg_2010 %>%
right_join(cbgs, by = "GEOID") %>%
select(-area)
pop_cbg_2017 <- hoco_pop_cbg_2017 %>%
right_join(cbgs, by = "GEOID") %>%
select(-area)
Again, because I am interested only in the Columbia census block groups I follow a similar procedure as above to produce tables of areas and boundaries restricted to the Columbia CDP.
area_cbg_2010 <- hoco_area_cbg_2010 %>%
right_join(cbgs, by = "GEOID")
area_cbg_2017 <- hoco_area_cbg_2017 %>%
right_join(cbgs, by = "GEOID")
The resulting tables will have two variables representing area, one from the original hoco_area_cbg_2010
and hoco_area_cbg_2017
tables (area.x
) and one from the cbgs
table (area.y
). They should be the same even though they come from different sources. As a confirmation of the calculations above I check to make sure that the areas don’t differ by more than 1% (0.01), and stop the analysis if they do.
stopifnot(max(abs((area_cbg_2010$area.x - area_cbg_2010$area.y) / area_cbg_2010$area.x)) < 0.01)
stopifnot(max(abs((area_cbg_2017$area.x - area_cbg_2017$area.y) / area_cbg_2017$area.x)) < 0.01)
I then retain only one variable for area in both tables, naming it area
.
area_cbg_2010 <- area_cbg_2010 %>%
rename(area = area.x) %>%
select(-area.y)
area_cbg_2017 <- area_cbg_2017 %>%
rename(area = area.x) %>%
select(-area.y)
Now that I have both population counts (for 2010) or estimates (for 2013-2017) along with areas for all of the block groups (wholly or mostly) associated with Columbia, I want to calculate the total population and total area for all Columbia block groups combined for both 2010 and 2017.
pop_col_2010 <- pop_cbg_2010 %>%
summarize(population = sum(population)) %>%
mutate(GEOID = "2419125")
pop_col_2017 <- pop_cbg_2017 %>%
summarize(population = sum(population)) %>%
mutate(GEOID = "2419125")
area_col_2010 <- area_cbg_2010 %>%
summarize(area = sum(area)) %>%
mutate(GEOID = "2419125")
area_col_2017 <- area_cbg_2017 %>%
summarize(area = sum(area)) %>%
mutate(GEOID = "2419125")
Moving down one level from census block groups, for 2010 I have population and area data for individual census blocks within Howard County. In order to map population density at the census block level, I create a table pop_block_2010
containing 2010 population counts for each census block associated with the Columbia CDP.
pop_block_2010 <- hoco_pop_block_2010 %>%
left_join(md_block_places, by = "GEOID") %>%
filter(PLACEFP == "19125") %>%
select(-PLACEFP)
I create another table area_block_2010
containing areas and boundaries for census blocks in the Columbia CDP.
area_block_2010 <- hoco_area_blocks_2010 %>%
left_join(md_block_places, by = "GEOID") %>%
filter(PLACEFP == "19125") %>%
select(-PLACEFP)
To help orient readers as to the locations of the census block groups, I also want any maps generated to also display major roads in Columbia that correspond in whole or in part to census block group boundaries. I use the tigris function roads()
to return geometry for all Howard County roads.
Because I don’t need or want to display each and every Columbia road, I use the RTTYP
and FULLNAME
variables to filter the results to retain only major roads (interstate and U.S. highways) and significant minor roads (Maryland state routes, roads with “Parkway” in their names, and other significant roads in Columbia). I store the geometry for each in separate variables, so that I can plot them at different widths.
hoco_roads <- roads(state = "MD", county = "Howard County", class = "sf", progress_bar = FALSE)
hoco_major_roads_geo <- hoco_roads %>%
filter(RTTYP %in% c("I", "U")) %>%
st_geometry() %>%
st_union()
key_roads <- c(
"Berger Rd",
"Cedar Ln",
"Columbia Rd",
"Columbia Gateway Dr",
"Cradlerock Way",
"Dobbin Rd",
"Eden Brook Dr",
"Eliots Oak Rd",
"Freetown Rd",
"Gerwig Ln",
"Great Star Dr",
"Grey Rock Rd",
"Guilford Rd",
"Harmel Dr",
"Harpers Farm Rd",
"Hickory Ridge Rd",
"High Tor Hill",
"Homespun Dr",
"Kilimanjaro Rd",
"Martin Rd",
"McGaw Rd",
"Murray Hill Rd",
"Oakland Mills Rd",
"Old Columbia Rd",
"Owen Brown Rd",
"Phelps Luck Dr",
"Robert Fulton Dr",
"Rt 29 Sb To South Entrance Rd Rmp",
"S Entrance Rd",
"Saddle Dr",
"Shaker Dr",
"Stevens Forest Rd",
"Ten Mills Rd",
"Trotter Rd",
"Twin Rivers Rd",
"Vollmerhausen Rd",
"Watch Chain Way",
"White Acres Rd"
)
hoco_minor_roads_geo <- hoco_roads %>%
filter(RTTYP == "S" | str_detect(FULLNAME, "Pkwy") | FULLNAME %in% key_roads) %>%
st_geometry() %>%
st_union()
I do analyses to answer the following questions:
Before I look at population density, here’s a quick summary of Columbia population at the CDP, census block group, and census block level:
The total 2010 population of census block groups wholly in Columbia was 99,085. The estimated total population of Columbia-associated block groups according to the ACS 5-year estimates for 2017 was 102,798. (Recall that ACS 5-year estimates for 2017 actually reflect surveys from 2013 through 2017.) This represents population growth of about 4% over roughly half a decade or so.
As discussed previously, there are currently 54 census block groups wholly or mostly in Columbia. In 2010 the least populated census block group contained 645 people, while the most populated census block group contained 3,632 people. The average population for all block groups was 1,835 people and the median population for all block groups was 1,824 people.
In 2010 the smallest block group covered an area of 0.12 square miles (about 79 acres), while the largest block group covered an area of 3.11 square miles. The average block group area was 0.58 square miles, and the median was 0.44 square miles.
A typical census block group is thus comparable in both size and population to a traditional village or small town. It is large enough to be a recognizable “place”, but small enough to have its own identity distinct from that of other places in the county. However a census block is smaller than a Columbia village: on average a Columbia village contains about 5 or 6 census block groups.
There are 1,605 census blocks in the Columbia CDP, over half of which contained no people at all in the 2010 census; the most populated census block contained 2,696 people. The average population for all blocks was 62 people.
In 2010 there were several census blocks that contained no land at all (and hence had zero land area), while the largest block covered an area of 0.84 square miles (about 535 acres). The average census block area was 0.02 square miles (about 13 acres).
I first want to look at the differing population densities among the various census block groups. I start by computing population densities for 2010, including an overall population density for Columbia itself.
To do this I take the area_cbg_2017
table (which contains the variable area
specifying the land area of the block groups in square miles), delete the geometry (which I don’t need in this context), and join to it the table pop_cbg_2010
(which contains the variable population
). I then create a new variable pop_density
containing the population density. I also calculate the population density for Columbia as a whole.
density_cbg_2010 <- area_cbg_2010 %>%
st_drop_geometry() %>%
left_join(pop_cbg_2010, by = "GEOID") %>%
mutate(pop_density = population / area)
density_col_2010 <- pop_col_2010 %>%
left_join(area_col_2010, by = "GEOID") %>%
mutate(pop_density = population / area)
Population density in 2010 varied from a low of 761 people per square mile to a high of 13,285 people per square mile, a difference of two orders of magnitude. Overall population density for Columbia in 2010 was 3,187 people per square mile.
The following histogram shows the distribution of population density among the various block groups for the 2010 census. For consistency I use the same limits and break points as in the histogram shown in the analysis for 2017 below.
density_cbg_2010 %>%
ggplot(aes(x = pop_density)) +
geom_histogram(boundary = 0, binwidth = 500) +
geom_vline(
mapping = aes(xintercept = median(pop_density)),
linetype = "dashed"
) +
xlab("Population Density Per Square Mile") +
ylab("Number of Census Block Groups") +
scale_x_continuous(breaks = seq(0, 22000, 2000), limits = c(0, 22000)) +
labs(
title = "2010 Core Columbia Population Density Distribution",
subtitle = "Based on 2010 Census Values for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2010 Decennial Census, Table P1",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
The distribution is heavily skewed to the right, with only a few census block groups having a density over 6,000 people per square mile, and several block groups having population densities under 500 people per square mile. The median density, shown by the dashed line, was 3,663 people per square mile.
I now repeat the analysis for the 2017 ACS 5-year population estimates.
density_cbg_2017 <- area_cbg_2017 %>%
st_drop_geometry() %>%
left_join(pop_cbg_2017, by = "GEOID") %>%
mutate(pop_density = population / area)
density_col_2017 <- pop_col_2017 %>%
left_join(area_col_2017, by = "GEOID") %>%
mutate(pop_density = population / area)
Population density in the 2013-2017 time frame varied from a low of 805 people per square mile to a high of 22,376 people per square mile, a wider range of densities than was present in 2010. Overall population density for the core Columbia CDP in 2013-2017 was 3,307 people per square mile.
The following histogram shows the distribution of population density among the various block groups based on the 2017 ACS 5-year population estimates.
density_cbg_2017 %>%
ggplot(aes(x = pop_density)) +
geom_histogram(boundary = 0, binwidth = 500) +
geom_vline(
mapping = aes(xintercept = median(pop_density)),
linetype = "dashed"
) +
xlab("Population Density Per Square Mile") +
ylab("Number of Census Block Groups") +
scale_x_continuous(breaks = seq(0, 22000, 2000)) +
labs(
title = "2013-2017 Core Columbia Population Density Distribution",
subtitle = "Based on 2017 ACS 5-Year Estimates for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey, 5-Year Estimates, Table B01003",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
The distribution is also heavily skewed to the right, with a median density of 3,596 people per square mile, a little increased from 2010.
The final histogram shows both distributions on a single graph, by combining the tables for 2010 and 2017, converting the year
variable into a categorical variable (“factor”), and then plotting the bars of the histogram side by side using year
to plot them in different colors.
bind_rows(density_cbg_2010, density_cbg_2017) %>%
mutate(year = as.factor(year)) %>%
ggplot(aes(x = pop_density, fill = year)) +
geom_histogram(boundary = 0, binwidth = 500, position = "dodge") +
xlab("Population Density Per Square Mile") +
ylab("Number of Census Block Groups") +
scale_x_continuous(breaks = seq(0, 22000, 2000)) +
scale_fill_viridis(option = "viridis", name = "Year", discrete = TRUE) +
labs(
title = "2010 and 2013-2017 Core Columbia Population Density Distribution",
subtitle = "Based on 2010 Census and 2017 ACS 5-Year Estimates for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2010 Decennial Census, Table P1",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B01003",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
You can see the general rightward shift in the distribution in 2013-2017 compared to 2010, with more census blocks at higher densities and higher maximum densities.
I next show the same data in the form of two maps, one for 2010 and one for 2013-2017. Because the population densities span such a wide range of values, I use a logarithmic scale when choosing the colors, based on powers of 10. I also use the same limits for the scale in both maps, to ensure that the color values are comparable between the maps.
Note that I don’t bother trying to label the various roads because it would be more work and make the map more cluttered—and in any case the intended audience of Howard County residents should be able to identify most if not all of the roads by sight. I compute the boundary of the area covered by all the census block groups, and then intersect that with the roads geometry to show only roads lying within Columbia.
col_boundary <- st_union(area_cbg_2010)
major_roads_geo <- st_intersection(col_boundary, hoco_major_roads_geo)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
minor_roads_geo <- st_intersection(col_boundary, hoco_minor_roads_geo)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
area_cbg_2010 %>%
left_join(density_cbg_2010, by = "GEOID") %>%
ggplot(aes(fill = pop_density, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 1.0, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.5, fill = NA) +
scale_fill_viridis(trans = "log10", option = "plasma", name = "Population\nPer Sq Mi", limits = c(100, 25000), breaks = c(100, 1000, 10000), labels = comma) +
labs(title="2010 Core Columbia Population Density Variations",
subtitle = "Population Density Per Square Mile for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2010 Decennial Census, Table P1",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
The areas of highest population density appear to be in Owen Brown and Wilde Lake villages.
I now create a comparable map for the 2013-2017 timeframe.
area_cbg_2017 %>%
left_join(density_cbg_2017, by = "GEOID") %>%
ggplot(aes(fill = pop_density, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 0.8, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.4, fill = NA) +
scale_fill_viridis(trans = "log10", option = "plasma", name = "Population\nPer Sq Mi", limits = c(100, 25000), breaks = c(100, 1000, 10000), labels = comma) +
labs(title="2013-2017 Core Columbia Population Density Variations",
subtitle = "Population Density Per Square Mile for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B01003",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
The overall picture is similar to that for 2010—which is only to be expected since only a few years separate the two maps.
I also want to look at the differing population densities among the various census blocks in 2010.
I first create a table density_block_2010
containing per-block population densities. This table is analogous to the previous table density_cbg_2010
and is created in a similar way.
density_block_2010 <- area_block_2010 %>%
st_drop_geometry() %>%
left_join(pop_block_2010, by = "GEOID") %>%
mutate(pop_density = ifelse(area > 0, population / area, 0))
Population density at the block level in 2010 varied from a low of 0 people per square mile (since most blocks contain no people at all) to a high of 610,379 people per square mile, a difference of almost six orders of magnitude. The average density for a block was 4,464.
However, the block with highest density is an outlier, as shown by the following table of the top 20 census blocks by density.
density_block_2010 %>%
select(GEOID, area, population, pop_density) %>%
arrange(desc(pop_density)) %>%
head(20) %>%
kable()
GEOID | area | population | pop_density |
---|---|---|---|
240276067051016 | 0.0001212 | 74 | 610379.34 |
240276054023029 | 0.0016290 | 246 | 151016.13 |
240276068031043 | 0.0015931 | 211 | 132449.70 |
240276066034002 | 0.0001108 | 11 | 99267.83 |
240276054024017 | 0.0124495 | 1175 | 94381.46 |
240276067062003 | 0.0000965 | 9 | 93239.57 |
240276066061001 | 0.0001135 | 10 | 88094.83 |
240276054013007 | 0.0043958 | 370 | 84171.77 |
240276055021007 | 0.0003571 | 27 | 75599.65 |
240276055032004 | 0.0003402 | 21 | 61736.38 |
240276066061020 | 0.0060695 | 348 | 57335.61 |
240276054023008 | 0.0012537 | 71 | 56633.55 |
240276066062026 | 0.0001139 | 6 | 52677.72 |
240276066072009 | 0.0006181 | 31 | 50149.67 |
240276067051001 | 0.0001004 | 5 | 49807.46 |
240276055023015 | 0.0001861 | 9 | 48360.77 |
240276055041046 | 0.0001656 | 8 | 48298.14 |
240276023023045 | 0.0029668 | 139 | 46851.68 |
240276067043005 | 0.0002884 | 13 | 45073.42 |
240276056021044 | 0.0005795 | 26 | 44863.22 |
Since there is only one census block with a density over about 150,000 people per square miles, and several in the range from roughly 50-150,000 per square mile, I adjust the densities to cap them at a maximum value of 150,000. I also change any blocks with a density of zero to have a density of 1 person per square mile. This prevents errors when plotting the density on a logarithmic scale.
density_block_2010 <- density_block_2010 %>%
mutate(pop_density = ifelse(pop_density > 150000, 150000, pop_density)) %>%
mutate(pop_density = ifelse(pop_density < 1, 1, pop_density))
I next show the 2010 population density variations at the census block level. Because the population densities span such a wide range of values, I again use a logarithmic scale when choosing the colors, based on powers of 10. Note that the color values are not comparable between this map and the maps above, because this map spans a considerably wider range of densities.
area_block_2010 %>%
left_join(density_block_2010, by = "GEOID") %>%
ggplot(aes(fill = pop_density, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 1.0, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.5, fill = NA) +
# scale_fill_viridis(trans = "log10", option = "plasma", name = "Population\nPer Sq Mi", limits = c(0, 150000), breaks = c(1, 100, 1000, 10000, 100000)) +
scale_fill_viridis(trans = "log10", option = "plasma", name = "Population\nPer Sq Mi", breaks = c(1, 10, 100, 1000, 10000, 100000), labels = comma) +
labs(title="2010 Core Columbia Population Density Variations",
subtitle = "Population Density Per Square Mile for Census Blocks",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2010 Decennial Census, Table P1",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
This map is most notable for showing areas of Columbia that have very low population densities. These are typically areas like the Mall at Columbia or Columbia Gateway that have office, retail, or industrial buildings but no residential buildings. (Recall that this data is from 2010, before the construction of the apartment buildings next to the mall.)
Note also that some areas appear on this map that were not on the prior maps. These reflect census blocks that are in the Columbia CDP, but that are in census block groups that are mostly not in Columbia. The most notable example of this is the area east of U.S. 29 and north of MD 108, which is split between Columbia and Ellicott City.
I now take another look at the population density changes between 2010 and the 2013-2017 time frame, this time mapping the changes for individual census block groups. As in the previous analysis for Howard County as a whole, I choose to show percentage changes in density rather than absolute changes. The map itself is created in exactly the same way as before.
density_changes <- bind_rows(density_cbg_2010, density_cbg_2017) %>%
select(GEOID, pop_density, year) %>%
spread(year, pop_density) %>%
mutate(pct_change = 100 * (`2017` - `2010`) / `2010`) %>%
select(GEOID, pct_change)
upper_limit <- round(max(density_changes$pct_change) + 10, -1)
lower_limit <- round(min(density_changes$pct_change) - 10, -1)
area_cbg_2017 %>%
left_join(density_changes, by = "GEOID") %>%
ggplot(aes(fill = pct_change, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 0.8, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.4, fill = NA) +
scale_fill_viridis(option = "viridis", name = "% Change", limits = c(lower_limit, upper_limit), breaks = seq(lower_limit, upper_limit, 20)) +
labs(title="Changes in Core Columbia Population Density",
subtitle = "2017 5-Year Estimates vs. 2010 Values for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2010 Decennial Census, Table P1",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B101003",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme(plot.caption = element_text(hjust = 0, margin = margin(t = 15))) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
Population values from the American Community Survey are estimates based on survey samples, with associated margins of error. For estimates at the census block group level the associated standard errors can be 10-30% or more of the estimates themselves.
As noted above, 5-year estimates identified as being for a certain year actually reflect surveys conducted over the prior five years. So the 2017 population estimates actually correspond to the 2013-2017 time frame.
For more information about the structure of geographic identifiers used for census data, see “Understanding Geographic Identifiers (GEOIDs)”.
To obtain the 2010 census and 2017 ACS data I used the tidycensus package, which provides an easy-to-use interface to U.S. Census Bureau data made available via a set of public APIs. As its name suggests, the tidycensus package is designed to be compatible with the tidyverse approach to representing and manipulating data.
To obtain the census block group and census block boundaries I used the tigris package, which is designed to accompany the tidycensus package and provides access to U.S. Census Bureau cartographic boundaries and TIGER/Line shapefiles.
Calculation of population density based on land area is discussed in the U.S. Census Bureau document 2010 Census Summary File 1, 2010 Census of Population and Housing, Technical Documentation. See Appendix A, “Geographic Terms and Concepts”, page A-22, “Population and Housing Unit Density”, and Appendix B, “Definitions of Subject Characteristics”, page B-23, “Area Measurement and Density”.
If you look closely at the maps above you’ll notice that there are gaps in several places between roads where there should be no gaps. I have not yet been able to identify where the geometry for those road sections is in the table returned by the roads()
function.
I used the following R environment in doing the analysis above:
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.23 scales_1.0.0 viridis_0.5.1
## [4] viridisLite_0.3.0 sf_0.7-7 tigris_0.8.2
## [7] tidycensus_0.9.2 forcats_0.4.0 stringr_1.4.0
## [10] dplyr_0.8.3 purrr_0.3.2 readr_1.3.1
## [13] tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.0
## [16] tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 lubridate_1.7.4 lattice_0.20-38
## [4] class_7.3-15 assertthat_0.2.1 zeallot_0.1.0
## [7] digest_0.6.20 R6_2.4.0 cellranger_1.1.0
## [10] backports_1.1.4 evaluate_0.14 e1071_1.7-2
## [13] highr_0.8 httr_1.4.1 pillar_1.4.2
## [16] rlang_0.4.0 lazyeval_0.2.2 curl_4.0
## [19] readxl_1.3.1 uuid_0.1-2 rstudioapi_0.10
## [22] rmarkdown_1.14 labeling_0.3 rgdal_1.4-4
## [25] foreign_0.8-71 munsell_0.5.0 broom_0.5.2
## [28] compiler_3.6.0 modelr_0.1.4 xfun_0.8
## [31] pkgconfig_2.0.2 htmltools_0.3.6 tidyselect_0.2.5
## [34] gridExtra_2.3 crayon_1.3.4 withr_2.1.2
## [37] rappdirs_0.3.1 grid_3.6.0 nlme_3.1-139
## [40] jsonlite_1.6 gtable_0.3.0 DBI_1.0.0
## [43] magrittr_1.5 units_0.6-3 KernSmooth_2.23-15
## [46] cli_1.1.0 stringi_1.4.3 sp_1.3-1
## [49] xml2_1.2.1 vctrs_0.2.0 generics_0.0.2
## [52] tools_3.6.0 glue_1.3.1 hms_0.5.0
## [55] yaml_2.2.0 colorspace_1.4-1 maptools_0.9-5
## [58] classInt_0.4-1 rvest_0.3.4 haven_2.1.1
You can find the source code for this analysis and others at my hocodata public code repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.