In this document I do some basic analyses on the population density of Ellicott City, Maryland, census block groups over time. (A census block group is a census geography below the level of census tracts and above the level of census blocks.) I also look at density at the level of census blocks for 2010, the most recent year for which I have data at that level.
For those readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the the tutorial “Getting started with the Tidyverse”.
I use the following packages for the following purposes:
library(tidyverse)
library(tidycensus)
library(tigris)
library(sf)
library(viridis)
library(scales)
library(knitr)
The tidycensus and tigris packages require a Census API key. I previously obtained such a key and used the tidycensus function census_api_key()
with the install = TRUE
option to store it for use by the functions included in the tidycensus and tigris packages.
I use data from the following sources; see the References section below for more information:
get_decennial()
function.get_acs()
function.block_groups()
function.blocks()
function.roads()
function.My goal is to look at population density in Ellicott City, Maryland, at a relatively fine-grained level. (For example, are there particular neighborhoods in Ellicott City that have relatively low or relatively high population density?) In my previous analysis of population density in Howard County I looked at data at the level of census block groups, and will continue that practice for this analysis.
A census block group is a geography that is one level below a census tract and one level above a census block, the smallest geographic subdivision used by the U.S. Census Bureau. While the 2010 census contains population counts for each census block, the American Community Survey data contains population estimates only down to the level of census block groups—and even then these estimates are available only in the 5-year estimates and not in the 1-year estimates.
My primary interest is in population, so the only data I need is total population at the census block group level. In theory I could retrieve this data for 2010 by calling the tidycensus get_decennial()
function and asking for the “P001P001” variable for the census block group geography. However, there is an apparent bug in the Census API that prevents this from working.
Instead I use the get_decennial()
function to retrieve population counts for all census blocks in Howard County for the 2010 census. This function returns data in so-called “tidy” format (one row per observation), with the following columns:
GEOID
. A 15-character geographic identifier as discussed above.NAME
. The name of the census block.variable
. References the census variable being retrieved, in this case “P001001”, the total population of the block.value
. The variable’s value. (There is no margin of error provided since the decennial census is a comprehensive count, not an estimate constructed from a sample.)I retain only the GEOID
value and the population count, renamed to population
for consistency throughout this analysis.
hoco_pop_block_2010 <- get_decennial(
year = 2010,
geography = "block",
state = "MD",
county = "Howard County",
variables = "P001001"
) %>%
select(GEOID, value) %>%
rename(population = value)
## Getting data from the 2010 decennial Census
I then summarize this data at the block group level to get the total population for each census block group. Again I retain only the geographic identifier and population count, as GEOID
and population
respectively, and add a new variable year
for later use.
hoco_pop_cbg_2010 <- hoco_pop_block_2010 %>%
mutate(cbg = substr(GEOID, 1, 12)) %>%
group_by(cbg) %>%
summarize(population = sum(population)) %>%
rename(GEOID = cbg) %>%
mutate(year = 2010)
I next use the tidycensus get_acs()
function to retrieve population estimates from the 2017 ACS 5-year estimates for all census block groups in Howard County. This function also returns data in tidy format, with the following columns:
GEOID
. An 12-character geographic identifier for the block group, as discussed above.NAME
. The name of the census block group.variable
. References the ACS variable being retrieved, in this case “B01003_001”).estimate
. The ACS 5-year estimate for the variable’s value.moe
. The margin of error for the estimate.Again I retain only the geographic identifier and population count, as GEOID
and population
respectively, and add the variable year
.
hoco_pop_cbg_2017 <- get_acs(
year = 2017,
survey = "acs5",
geography = "block group",
state = "MD",
county = "Howard County",
variables = "B01003_001"
) %>%
select(GEOID, estimate) %>%
rename(population = estimate) %>%
mutate(year = 2017)
## Getting data from the 2013-2017 5-year ACS
As discussed above, the hoco_pop_cbg_2010
and hoco_pop_cbg_2017
tables contain data for all census block groups in Howard County. In the sections below I subset these tables to retain data only for census block groups associated with Ellicott City.
In order to produce maps of population density I need the geometry of the cartographic boundaries for all census block groups associated with Ellicott City as of 2010 and 2017, as well as the associated (land) areas for each block group.
I use the tigris function block_groups()
to return the cartographic boundaries for all block groups in Howard County in the form of a table compatible with the sf package. The table as returned for 2017 has a variable GEOID
that is identical to the variable of the same name returned by get_decennial()
and get_acs()
. However, the table as returned for 2010 has instead a variable GEO_ID
that contains the prefix “1500000US” for the geographic identifiers. I remove this prefix in order to create a GEOID
variable of the same format as for the 2017 boundaries.
For 2010 the area is returned from the block_groups()
function as a single variable CENSUSAREA
, with values expressed as square miles. I retain this as the variable area
.
For 2017 this data is returned in the form of two variables, ALAND
and AWATER
representing land area and water area respectively, both expressed in units of square meters. I convert the ALAND
value to square miles and save the result as a variable area
to match that for the 2010 boundaries. (The U.S. Census Bureau recommends using the land area when calculating population density.)
options(tigris_use_cache = TRUE)
hoco_area_cbg_2010 <- block_groups(
year = 2010,
state = "MD",
county = "Howard County",
cb = TRUE,
class = "sf",
progress_bar = FALSE
) %>%
rename(area = CENSUSAREA) %>%
mutate(GEOID = substr(GEO_ID, 10, 21)) %>%
select(GEOID, area)
hoco_area_cbg_2017 <- block_groups(
year = 2017,
state = "MD",
county = "Howard County",
cb = TRUE,
class = "sf",
progress_bar = FALSE
) %>%
mutate(area = ALAND / 2589988) %>%
select(GEOID, area)
In order to analyze data for Ellicott City specifically, I need to know which census block groups are associated with Ellicott City. I start with the so-called “block assignment files” published by the U.S. Census Bureau. These relate census blocks to the higher-level geographic entities with which they’re associated. In this case we want to know which census blocks are part of Ellicott City, which in census terms is a “census designated place” or CDP—basically a population center that is not an incorporated city.
To do this I download the archive of block assignment files for Maryland from the 2010 census (if it has not already been downloaded), unzip the archive, and read in the CSV file that maps census blocks to incorporated entities or CDPs.
Census blocks are uniquely identified by a 15-character ID that combines the 2-digit FIPS code for the state (“24” for Maryland), the 3-digit FIPS code identifying the county or county-equivalent within the state (“027” for Howard County), a 6-digit identifier for an individual census tract within the county or county-equivalent, and a 4-digit identifier for the census block within the tract.
Incorporated entities and CDPs are identified by a 5-digit code that uniquely identifies the place within the state. For Ellicott City that code is “26000”. (The full FIPS code for Ellicott City, “2426000”, adds the Maryland state code as a prefix.) Areas outside a CDP have missing values, which I replace with the string “none”.
if (!file.exists("BlockAssign_ST24_MD.zip")) {
download.file(
"https://www2.census.gov/geo/docs/maps-data/data/baf/BlockAssign_ST24_MD.zip",
"BlockAssign_ST24_MD.zip",
method = "auto"
)
}
unzip("BlockAssign_ST24_MD.zip", overwrite = TRUE)
md_block_places <- read_csv("BlockAssign_ST24_MD_INCPLACE_CDP.txt", col_types = "cc") %>%
rename(GEOID = BLOCKID) %>%
replace_na(list(PLACEFP = "none"))
The md_block_places
table contains entries for every census block in Maryland. I am interested only in Ellicott City, and only in census block groups, not individual census blocks. Unfortunately, it is possible for a given census block group to contain census blocks associated with two different places. This in fact the case for Howard County: for example, some census blocks in census block group “240276023022” are in Ellicott City (place code “26000”), some blocks are in Columbia (place code “19125”), and some blocks are not part of any CDP at all.
Which census block groups should I consider to be part of Ellicott City? I could consider only census block groups for which all census blocks are part of the Ellicott City CDP. However, this would risk omitting census block groups whose areas primarily fall within the Ellicott City CDP. Alternatively, I could consider any census block group that contains at least one census block associated with Ellicott City. However, this would risk including census block groups with only a small portion of their area actually part of Ellicott City.
For purposes of this analysis I consider a census block group to be part of Ellicott City if and only if at least 50% of its (land) area is included in the Ellicott City CDP. Since a given census block is either in the Ellicott City CDP or not, I can make this determination for a given census block group by comparing the total area of the census block group with the sum of the areas of all census blocks in the group that are associated with the Ellicott City CDP.
To do this I first need the areas for all census blocks in Howard County. I get these by using the tigris functions blocks()
. For consistency with other tables I rename the GEOID10
variable to GEOID
and create a new variable area
containing the area of each block in square miles.
hoco_area_blocks_2010 <- blocks(
year = 2010,
state = "MD",
county = "Howard County",
class = "sf",
progress_bar = FALSE
) %>%
rename(GEOID = GEOID10) %>%
mutate(area = ALAND10 / 2589988)
I now determine which census block groups are more than 50% in Ellicott City by area, as follows:
hoco_area_blocks_2010
containing areas for all Howard County census blocks, and drop any geometry associated with the blocks.md_block_places
containing the place ID associated with each census block, using the variable GEOID
common to both tables. Since I use the left_join
function the resulting table contains only rows corresponding to census blocks in Howard County. Each row contains (among other things) the GEOID
value for a given block, the place associated with the block (in the PLACEFP
variable), and the area of the block.PLACEFP
(“19125”, “26000”, “none”, etc.) into their own columns, with the values for the columns being the areas of the census blocks. Since any given census block is associated with only one place, for any given row only one column will include a valid area.GEOID
variable.na.rm
parameter to ignore missing values. The resulting table has one row per census block group and one column for each Howard County place, with the value for that column being the sum of the areas of the blocks associated with that place.rowSums()
function to create a new variable area
containing the total land area of each census block group, summing the areas contained in all columns except the first (which contains the GEOID
value). (The R syntax here is a bit tricky: the “.” in the rowSums()
calculation refers to the current row being summed. The expression .[2, length(.)]
refers to all columns in the current row from column 2 to the last column, whose index will be the length of the current row.)GEOID
and area
variables.The resulting table cbgs
contains all census block groups considered as part of Ellicott City for this analysis, along with the total areas for those census block groups (which are not necessarily the areas within Ellicott City proper).
cbgs <- hoco_area_blocks_2010 %>%
st_drop_geometry() %>%
left_join(md_block_places, by = "GEOID") %>%
select(GEOID, PLACEFP, area) %>%
spread(PLACEFP, area) %>%
mutate(GEOID = substr(GEOID, 1, 12)) %>%
group_by(GEOID) %>%
summarize_all(sum, na.rm = TRUE) %>%
mutate(area = rowSums(.[2:length(.)])) %>%
filter(`26000` / area >= 0.5) %>%
select(GEOID, area)
There are 34 census block groups considered part of Ellicott City, with a total area of 28.8 square miles. Compare this to the figure of 30.0 square miles of land area given in the Wikipedia article on Ellicott City. The difference likely represents census block groups for which only a small portion of the groups’ area is in Ellicott City, block groups that I’ve omitted from this analysis.
I now have a list of census block groups associated with the Ellicott City CDP. I therefore join the hoco_pop_cbg_2010
table with the cbgs
table to produce a table of population counts for the Ellicott City-associated census block groups only. (The right_join()
function includes only rows for which the GEOID
values exist in the second table, namely cbgs
.) I do the same for the 2017 population estimates. In both cases I remove the area
variable (carried over from the cbgs
table) because it’s not needed.
pop_cbg_2010 <- hoco_pop_cbg_2010 %>%
right_join(cbgs, by = "GEOID") %>%
select(-area)
pop_cbg_2017 <- hoco_pop_cbg_2017 %>%
right_join(cbgs, by = "GEOID") %>%
select(-area)
Again, because I am interested only in the Ellicott City census block groups I follow a similar procedure as above to produce tables of areas and boundaries restricted to the Ellicott City CDP.
area_cbg_2010 <- hoco_area_cbg_2010 %>%
right_join(cbgs, by = "GEOID")
area_cbg_2017 <- hoco_area_cbg_2017 %>%
right_join(cbgs, by = "GEOID")
The resulting tables will have two variables representing area, one from the original hoco_area_cbg_2010
and hoco_area_cbg_2017
tables (area.x
) and one from the cbgs
table (area.y
). They should be the same even though they come from different sources. As a confirmation of the calculations above I check to make sure that the areas don’t differ by more than 1% (0.01), and stop the analysis if they do.
stopifnot(max(abs((area_cbg_2010$area.x - area_cbg_2010$area.y) / area_cbg_2010$area.x)) < 0.01)
stopifnot(max(abs((area_cbg_2017$area.x - area_cbg_2017$area.y) / area_cbg_2017$area.x)) < 0.01)
I then retain only one variable for area in both tables, naming it area
.
area_cbg_2010 <- area_cbg_2010 %>%
rename(area = area.x) %>%
select(-area.y)
area_cbg_2017 <- area_cbg_2017 %>%
rename(area = area.x) %>%
select(-area.y)
Now that I have both population counts (for 2010) or estimates (for 2013-2017) along with areas for all of the block groups (wholly or mostly) associated with Ellicott City, I want to calculate the total population and total area for all Ellicott City block groups combined for both 2010 and 2017.
pop_cdp_2010 <- pop_cbg_2010 %>%
summarize(population = sum(population)) %>%
mutate(GEOID = "2426000")
pop_cdp_2017 <- pop_cbg_2017 %>%
summarize(population = sum(population)) %>%
mutate(GEOID = "2426000")
area_cdp_2010 <- area_cbg_2010 %>%
summarize(area = sum(area)) %>%
mutate(GEOID = "2426000")
area_cdp_2017 <- area_cbg_2017 %>%
summarize(area = sum(area)) %>%
mutate(GEOID = "2426000")
Moving down one level from census block groups, for 2010 I have population and area data for individual census blocks within Howard County. In order to map population density at the census block level, I create a table pop_block_2010
containing 2010 population counts for each census block associated with the Ellicott City CDP.
I do this as follows:
hoco_pop_block_2010
table containing population counts for each census block in Howard County.md_block_places
table that maps Maryland census blocks to CDPs, using the geographic identifier GEOID
as a common variable. Because I use the left_join()
function, the resulting table will contain only rows for which the GEOID
value is in the hoco_pop_block_2010
table.PLACEFP
variable, since it is no longer needed.pop_block_2010 <- hoco_pop_block_2010 %>%
left_join(md_block_places, by = "GEOID") %>%
filter(PLACEFP == "26000") %>%
select(-PLACEFP)
I create another table area_block_2010
containing areas and boundaries for census blocks in the Ellicott City CDP, using a similar process as above.
area_block_2010 <- hoco_area_blocks_2010 %>%
left_join(md_block_places, by = "GEOID") %>%
filter(PLACEFP == "26000") %>%
select(-PLACEFP)
Unfortunately I cannot create similar tables for 2017 because the ACS 5-year estimates do not go down to the census block level.
To help orient readers as to the locations of the census block groups, I also want any maps generated to also display roads in Ellicott City that correspond in whole or in part to census block group boundaries. I use the tigris function roads()
to return geometry for all Howard County roads.
Because I don’t need or want to display each and every Ellicott City road, I use the RTTYP
and FULLNAME
variables to filter the results to retain only major roads (interstate and U.S. highways) and significant minor roads (Maryland state routes and other significant roads in Ellicott City). I store the geometry for each in separate variables, so that I can plot them at different widths.
hoco_roads <- roads(state = "MD", county = "Howard County", class = "sf", progress_bar = FALSE)
hoco_major_roads_geo <- hoco_roads %>%
filter(RTTYP %in% c("I", "U")) %>%
st_geometry() %>%
st_union()
key_roads <- c(
"Bethany Ln",
"Centennial Ln",
"College Ave",
"Columbia Rd",
"Court House Dr",
"Ellicott City Rd",
"Ellicott Mills Dr",
"Frederick Rd",
"Gray Rock Dr",
"Main St",
"Marriottsville Rd",
"Maryland Ave",
"Montgomery Rd",
"New Cut Rd",
"Old Annapolis Rd",
"Old Columbia Pike",
"Old Ellicott City Rd",
"Old Frederick Rd",
"Old Saint Johns Ln",
"Old St Johns Ln",
"Rogers Ave",
"N Rogers Ave",
"Saint Johns Ln",
"St Johns Ln",
"Saint Paul St",
"Turf Valley Rd"
)
hoco_minor_roads_geo <- hoco_roads %>%
filter(RTTYP == "S" | FULLNAME %in% key_roads) %>%
st_geometry() %>%
st_union()
I do analyses to answer the following questions:
Before I look at population density, here’s a quick summary of Ellicott City population at the CDP, census block group, and census block level:
The total 2010 population of census block groups mostly in the Ellicott City CDP was 64,245. The estimated total population of Ellicott City-associated census block groups according to the ACS 5-year estimates for 2017 was 69,324. (Recall that ACS 5-year estimates for 2017 actually reflect surveys from 2013 through 2017.) This represents population growth of about 8% over roughly half a decade or so.
As discussed previously, there are currently 34 census block groups wholly or mostly in Ellicott City. In 2010 the least populated census block group contained 872 people, while the most populated census block group contained 3,251 people. The average population for all block groups was 1,890 people and the median population for all block groups was 1,732 people.
In 2010 the smallest block group covered an area of 0.32 square miles (about 208 acres), while the largest block group covered an area of 1.68 square miles. The average block group area was 0.85 square miles, and the median was 0.8 square miles.
A typical census block group is thus comparable in both size and population to a traditional village or small town. It is large enough to be a recognizable “place”, but small enough to have its own identity distinct from that of other places in the county.
There are 823 census blocks in the Ellicott City CDP, of which 354 contained no people at all in the 2010 census; the most populated census block contained 3,245 people. The average population for all blocks was 80 people.
In 2010 there were 4 census blocks that contained no land at all (i.e., had zero land area), while the largest block covered an area of 0.81 square miles (about 520 acres). The average census block area was 0.04 square miles (about 23 acres).
I first want to look at the differing population densities among the various census block groups. I start by computing population densities for 2010, including an overall population density for Ellicott City itself.
To do this I take the area_cbg_2010
table (which contains the variable area
specifying the land area of the block groups in square miles), delete the geometry (which I don’t need in this context), and join to it the table pop_cbg_2010
(which contains the variable population
). I then create a new variable pop_density
containing the population density. I also calculate the population density for Ellicott City as a whole.
density_cbg_2010 <- area_cbg_2010 %>%
st_drop_geometry() %>%
left_join(pop_cbg_2010, by = "GEOID") %>%
mutate(pop_density = population / area)
density_cdp_2010 <- pop_cdp_2010 %>%
left_join(area_cdp_2010, by = "GEOID") %>%
mutate(pop_density = population / area)
Population density in 2010 varied from a low of 1,028 people per square mile to a high of 9,352 people per square mile, a difference of two orders of magnitude. Overall population density for Ellicott City in 2010 was 2,234 people per square mile.
The following histogram shows the distribution of population density among the various block groups for the 2010 census. For consistency I use the same limits and break points as in the histogram shown in the analysis for 2017 below.
density_cbg_2010 %>%
ggplot(aes(x = pop_density)) +
geom_histogram(boundary = 0, binwidth = 500) +
geom_vline(
mapping = aes(xintercept = median(pop_density)),
linetype = "dashed"
) +
xlab("Population Density Per Square Mile") +
ylab("Number of Census Block Groups") +
scale_x_continuous(breaks = seq(0, 14000, 2000), limits = c(0, 14000)) +
scale_y_continuous(breaks = seq(0, 12, 2), limits = c(0, 12)) +
labs(
title = "2010 Core Ellicott City Population Density Distribution",
subtitle = "Based on 2010 Census Values for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2010 Decennial Census, Table P1",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
There are only 2 census block groups with population density over 4,000 people per square mile, and no census block groups with population densities under 1,000 people per square mile. The median density, shown by the dashed line, was 2,344 people per square mile.
This matches the general perception of Ellicott City: it has less multi-unit housing than Columbia, and fewer large lots with single-family homes than western Howard County.
I now repeat the analysis for the 2017 ACS 5-year population estimates.
density_cbg_2017 <- area_cbg_2017 %>%
st_drop_geometry() %>%
left_join(pop_cbg_2017, by = "GEOID") %>%
mutate(pop_density = population / area)
density_cdp_2017 <- pop_cdp_2017 %>%
left_join(area_cdp_2017, by = "GEOID") %>%
mutate(pop_density = population / area)
Population density in the 2013-2017 time frame varied from a low of 760 people per square mile to a high of 13,756 people per square mile, a wider range of densities than was present in 2010. Overall population density for the core Ellicott City CDP in 2013-2017 was 2,410 people per square mile.
The following histogram shows the distribution of population density among the various block groups based on the 2017 ACS 5-year population estimates.
density_cbg_2017 %>%
ggplot(aes(x = pop_density)) +
geom_histogram(boundary = 0, binwidth = 500) +
geom_vline(
mapping = aes(xintercept = median(pop_density)),
linetype = "dashed"
) +
xlab("Population Density Per Square Mile") +
ylab("Number of Census Block Groups") +
scale_x_continuous(breaks = seq(0, 14000, 2000), limits = c(0, 14000)) +
scale_y_continuous(breaks = seq(0, 12, 2), limits = c(0, 12)) +
labs(
title = "2013-2017 Ellicott City CDP Population Density Distribution",
subtitle = "Based on 2017 ACS 5-Year Estimates for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey, 5-Year Estimates, Table B01003",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
Again there are only 2 census block groups with population density over 4,000 people per square mile, but this time there is one census block group with a population density under 1,000 people per square mile. The median density was 2,545 people per square mile, a little increased from 2010.
The final histogram shows both distributions on a single graph, by combining the tables for 2010 and 2017, converting the year
variable into a categorical variable (“factor”), and then plotting the bars of the histogram side by side using year
to plot them in different colors.
bind_rows(density_cbg_2010, density_cbg_2017) %>%
mutate(year = as.factor(year)) %>%
ggplot(aes(x = pop_density, fill = year)) +
geom_histogram(boundary = 0, binwidth = 500, position = "dodge") +
xlab("Population Density Per Square Mile") +
ylab("Number of Census Block Groups") +
scale_x_continuous(breaks = seq(0, 14000, 2000), limits = c(0, 14000)) +
scale_y_continuous(breaks = seq(0, 12, 2), limits = c(0, 12)) +
scale_fill_viridis(option = "viridis", name = "Year", discrete = TRUE) +
labs(
title = "2010 and 2013-2017 Ellicott City CDP Population Density Distributions",
subtitle = "Based on 2010 Census and 2017 ACS 5-Year Estimates for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2010 Decennial Census, Table P1",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B01003",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme_minimal() +
theme(axis.title.x = element_text(margin = margin(t = 10))) +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
The distribution in 2013-2017 is somewhat flattened and spread out compared to 2010.
I next show the same data in the form of two maps, one for 2010 and one for 2013-2017. Because the population densities span such a wide range of values, I use a logarithmic scale when choosing the colors, based on powers of 10. I also use the same limits for the scale in both maps, to ensure that the color values are comparable between the two maps. (However, the color scale is not comparable to the maps I did for Howard County or Columbia.)
Note that I don’t bother trying to label the various roads because it would be more work and make the map more cluttered—and in any case the intended audience of Howard County residents should be able to identify most if not all of the roads by sight. I compute the boundary of the area covered by all the census block groups, and then intersect that with the roads geometry to show only roads lying within Ellicott City.
col_boundary <- st_union(area_cbg_2010)
major_roads_geo <- st_intersection(col_boundary, hoco_major_roads_geo)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
minor_roads_geo <- st_intersection(col_boundary, hoco_minor_roads_geo)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
area_cbg_2010 %>%
left_join(density_cbg_2010, by = "GEOID") %>%
ggplot(aes(fill = pop_density, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 1.0, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.5, fill = NA) +
scale_fill_viridis(
trans = "log10",
option = "plasma",
name = "Population\nPer Sq Mi",
limits = c(100, 14000),
breaks = c(100, 1000, 10000),
labels = comma
) +
labs(
title = "2010 Ellicott City Population Density Variations",
subtitle = "Population Per Square Mile for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2010 Decennial Census, Table P1",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
The areas of highest population density appear to be north of U.S. 40 and east of U.S. 29.
I now create a comparable map for the 2013-2017 timeframe.
area_cbg_2017 %>%
left_join(density_cbg_2017, by = "GEOID") %>%
ggplot(aes(fill = pop_density, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 0.8, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.4, fill = NA) +
scale_fill_viridis(
trans = "log10",
option = "plasma",
name = "Population\nPer Sq Mi",
limits = c(100, 14000),
breaks = c(100, 1000, 10000),
labels = comma
) +
labs(title="2013-2017 Ellicott City Population Density Variations",
subtitle = "Population Per Square Mile for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B01003",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
The overall picture is similar to that for 2010—which is only to be expected since only a few years separate the two maps.
I also want to look at the differing population densities among the various census blocks in 2010.
I first create a table density_block_2010
containing per-block population densities. This table is analogous to the previous table density_cbg_2010
and is created in a similar way.
density_block_2010 <- area_block_2010 %>%
st_drop_geometry() %>%
left_join(pop_block_2010, by = "GEOID") %>%
mutate(pop_density = ifelse(area > 0, population / area, 0))
Population density at the block level in 2010 varied from a low of 0 people per square mile (since most blocks contain no people at all) to a high of 53,481 people per square mile, a difference of almost six orders of magnitude. The average density for a block was 3,170.
The following table shows the top 20 census blocks by density.
density_block_2010 %>%
select(GEOID, area, population, pop_density) %>%
arrange(desc(pop_density)) %>%
head(20) %>%
kable()
GEOID | area | population | pop_density |
---|---|---|---|
240276023022003 | 0.0001309 | 7 | 53480.58 |
240276023052011 | 0.0001035 | 5 | 48320.67 |
240276023052025 | 0.0004193 | 17 | 40543.09 |
240276023023073 | 0.0035950 | 142 | 39499.33 |
240276023051009 | 0.0001050 | 4 | 38088.06 |
240276023051020 | 0.0001112 | 4 | 35972.06 |
240276023052017 | 0.0001189 | 4 | 33636.21 |
240276027001016 | 0.0029475 | 99 | 33587.74 |
240276023043013 | 0.0001502 | 5 | 33290.33 |
240276023052023 | 0.0002409 | 8 | 33204.97 |
240276023052030 | 0.0001668 | 5 | 29976.71 |
240276023062047 | 0.0004680 | 14 | 29917.35 |
240276029002030 | 0.0114221 | 333 | 29154.11 |
240276023051008 | 0.0001058 | 3 | 28357.53 |
240276027001015 | 0.0025761 | 72 | 27949.51 |
240276027001013 | 0.0022452 | 62 | 27614.66 |
240276029002014 | 0.0057630 | 156 | 27069.42 |
240276023023008 | 0.0042784 | 111 | 25944.29 |
240276023023010 | 0.0035730 | 89 | 24909.11 |
240276022022039 | 0.0331596 | 822 | 24789.19 |
I change any blocks with a density of zero to have a density of 1 person per square mile. This prevents errors when plotting the density on a logarithmic scale.
density_block_2010 <- density_block_2010 %>%
mutate(pop_density = ifelse(pop_density < 1, 1, pop_density))
I next show the 2010 population density variations at the census block level. Because the population densities span such a wide range of values, I again use a logarithmic scale when choosing the colors, based on powers of 10. Note that the color values are not comparable between this map and the maps above, because this map spans a considerably wider range of densities.
area_block_2010 %>%
left_join(density_block_2010, by = "GEOID") %>%
ggplot(aes(fill = pop_density, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 1.0, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.5, fill = NA) +
scale_fill_viridis(
trans = "log10",
option = "plasma",
name = "Population\nPer Sq Mi",
breaks = c(1, 10, 100, 1000, 10000, 100000),
labels = comma
) +
labs(
title = "2010 Core Ellicott City Population Density Variations",
subtitle = "Population Density Per Square Mile for Census Blocks",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2010 Decennial Census, Table P1",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0)) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
This map is most notable for showing areas of Ellicott City that have very low population densities. These include retail areas like the Long Gate shopping center north of MD 100 and east of U.S. 29, park areas including Patapsco State Park and Centennial Park, and golf courses like the one at Turf Valley.
Note also that some areas appear on this map that were not on the prior maps. These reflect census blocks that are in the Ellicott City CDP, but that are in census block groups that are mostly not in Ellicott City. The most notable example of this is the Turf Valley resort.
I now take another look at the population density changes between 2010 and the 2013-2017 time frame, this time mapping the changes for individual census block groups. As in the previous analysis for Howard County as a whole, I choose to show percentage changes in density rather than absolute changes. The map itself is created in exactly the same way as before.
density_changes <- bind_rows(density_cbg_2010, density_cbg_2017) %>%
select(GEOID, pop_density, year) %>%
spread(year, pop_density) %>%
mutate(pct_change = 100 * (`2017` - `2010`) / `2010`) %>%
select(GEOID, pct_change)
upper_limit <- round(max(density_changes$pct_change) + 10, -1)
lower_limit <- round(min(density_changes$pct_change) - 10, -1)
area_cbg_2017 %>%
left_join(density_changes, by = "GEOID") %>%
ggplot(aes(fill = pct_change, geometry = geometry)) +
geom_sf(size = 0) +
geom_sf(data = major_roads_geo, color = "white", size = 0.8, fill = NA) +
geom_sf(data = minor_roads_geo, color = "white", size = 0.4, fill = NA) +
scale_fill_viridis(
option = "viridis",
name = "% Change",
limits = c(lower_limit, upper_limit),
breaks = seq(lower_limit, upper_limit, 20)
) +
labs(title="Changes in Ellicott City Population Density",
subtitle = "2017 5-Year Estimates vs. 2010 Values for Census Block Groups",
caption = paste0(
"Data sources:",
"\n U.S. Census Bureau, 2010 Decennial Census, Table P1",
"\n U.S. Census Bureau, 2017 American Community Survey 5-Year Estimates, Table B101003",
"\n U.S. Census Bureau, Cartographic boundaries",
"\nCreated using the tidyverse and tidycensus R packages"
)
) +
theme(plot.caption = element_text(hjust = 0, margin = margin(t = 15))) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.background = element_blank())
The most notable increases in population density are north of Old Ellicott City and south of U.S. 40. The maximum increase in density was about 100%, the maximum decrease about 40%.
Population values from the American Community Survey are estimates based on survey samples, with associated margins of error. For estimates at the census block group level the associated standard errors can be 10-30% or more of the estimates themselves.
As noted above, 5-year estimates identified as being for a certain year actually reflect surveys conducted over the prior five years. So the 2017 population estimates actually correspond to the 2013-2017 time frame.
For more information about the structure of geographic identifiers used for census data, see “Understanding Geographic Identifiers (GEOIDs)”.
To obtain the 2010 census and 2017 ACS data I used the tidycensus package, which provides an easy-to-use interface to U.S. Census Bureau data made available via a set of public APIs. As its name suggests, the tidycensus package is designed to be compatible with the tidyverse approach to representing and manipulating data.
To obtain the census block group and census block boundaries I used the tigris package, which is designed to accompany the tidycensus package and provides access to U.S. Census Bureau cartographic boundaries and TIGER/Line shapefiles.
Calculation of population density based on land area is discussed in the U.S. Census Bureau document 2010 Census Summary File 1, 2010 Census of Population and Housing, Technical Documentation. See Appendix A, “Geographic Terms and Concepts”, page A-22, “Population and Housing Unit Density”, and Appendix B, “Definitions of Subject Characteristics”, page B-23, “Area Measurement and Density”.
If you look closely at the maps above you’ll notice that there are gaps in several places between roads where there should be no gaps, most noatbly where MD 100 joins U.S. 29. I have not yet been able to identify where the geometry for those road sections is in the table returned by the roads()
function.
I used the following R environment in doing the analysis above:
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.24 scales_1.0.0 viridis_0.5.1
## [4] viridisLite_0.3.0 sf_0.7-7 tigris_0.8.2
## [7] tidycensus_0.9.2 forcats_0.4.0 stringr_1.4.0
## [10] dplyr_0.8.3 purrr_0.3.2 readr_1.3.1
## [13] tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.1
## [16] tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 lubridate_1.7.4 lattice_0.20-38
## [4] class_7.3-15 assertthat_0.2.1 zeallot_0.1.0
## [7] digest_0.6.20 R6_2.4.0 cellranger_1.1.0
## [10] backports_1.1.4 evaluate_0.14 e1071_1.7-2
## [13] httr_1.4.1 highr_0.8 pillar_1.4.2
## [16] rlang_0.4.0 lazyeval_0.2.2 curl_4.0
## [19] readxl_1.3.1 uuid_0.1-2 rstudioapi_0.10
## [22] rmarkdown_1.14 rgdal_1.4-4 foreign_0.8-71
## [25] munsell_0.5.0 broom_0.5.2 compiler_3.6.0
## [28] modelr_0.1.5 xfun_0.8 pkgconfig_2.0.2
## [31] htmltools_0.3.6 tidyselect_0.2.5 gridExtra_2.3
## [34] crayon_1.3.4 withr_2.1.2 rappdirs_0.3.1
## [37] grid_3.6.0 nlme_3.1-139 jsonlite_1.6
## [40] gtable_0.3.0 DBI_1.0.0 magrittr_1.5
## [43] units_0.6-3 KernSmooth_2.23-15 cli_1.1.0
## [46] stringi_1.4.3 sp_1.3-1 xml2_1.2.2
## [49] vctrs_0.2.0 generics_0.0.2 tools_3.6.0
## [52] glue_1.3.1 hms_0.5.0 yaml_2.2.0
## [55] colorspace_1.4-1 maptools_0.9-5 classInt_0.4-1
## [58] rvest_0.3.4 haven_2.1.1
You can find the source code for this analysis and others at my hocodata public code repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.