Howard County Council Expansion, Part 1

Introduction

In this document I create a shapefile to be used for Howard County Council redistricting calculations. The shapefile combines precinct boundaries, precinct populations (including breakdowns by race and ethnicity), and election results for the most recent county council election races.

Redistricting efforts sponsored by political parties (e.g., in attempts to gerrymander districts) use information that is not fully public, including voter registration files kept by election boards. My goal here is rather to do redistricting calculations using only data and software that are freely available to the general public. That allows others to check my work for possible errors and omissions, and to build on it for their own purposes should they wish to do so.

For those readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information check out the various ways to learn more about the Tidyverse.

Setup and data preparation

Libraries

I use the following R packages, each for the purpose noted:

tidyverse: do general data manipulation.
sf: manipulate geospatial data.
tools: compute MD5 checksums.
gghighlight: customize graphs.
scales: customize graphs.
knitr: print tabular data.

library(tidyverse)
library(sf)
library(tools)
library(gghighlight)
library(scales, warn.conflicts = FALSE)
library(knitr)

Data sources

I use data from the following sources; see the References section below for more information:

Population data for Howard County precincts are from the Maryland Department of Planning data used for Congressional redistricting.
Boundaries for Howard County precincts are from the Howard County GIS Division.
Election results for Howard County as a whole and for individual precincts are from the Maryland Board of Elections for the 2018 general election, the most recent election to feature local Howard County races, including for county council.
Information on numbers of registered voters and voter turnout for the 2018 general election is also from the Maryland Board of Elections.
Information on the size and structure of Maryland county and city councils and boards of county commissioners is from the Maryland Manual On-Line published by the Maryland State Archives.

The precinct geographic boundaries are consistent between these various sources, as are the precinct designations (except for minor differences in formatting).

Downloading the data

I download the following files if they have not already been downloaded:

Adjusted2020Blockcsv.zip. A ZIP archive containing a CSV file of block-level population data from the 2020 Maryland redistricting effort.
Voting_Precincts.geojson. A GeoJSON-format file containing Howard County precinct boundaries.
Howard_County_2018_General.csv. A CSV file containing overall Howard County 2018 general election results.
Howard_By_Precinct_2018_General.csv. A CSV file containing Howard County 2018 general election results by precinct, for election day only.
Official_by_Party_and_Precinct.csv. A CSV file containing turnout-related statistics for the 2018 general election results by precinct, for both election day voting and other methods of voting, including early voting.

if (!file.exists("Adjusted2020Blockcsv.zip")) {
  download.file(
    "https://apps.planning.maryland.gov/redistricting/data/Adjusted2020Blockcsv.zip",
    "Adjusted2020Blockcsv.zip",
    method = "auto"
  )
}

if (!file.exists("Voting_Precincts.geojson")) {
  download.file(
    "https://data.howardcountymd.gov/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=general:Voting_Precincts&outputFormat=application/json",
    "Voting_Precincts.geojson",
    method = "auto"
  )
}

if (!file.exists("Howard_County_2018_General.csv")) {
  download.file(
    "https://elections.maryland.gov/elections/2018/election_data/Howard_County_2018_General.csv",
    "Howard_County_2018_General.csv",
    method = "auto"
  )
}

if (!file.exists("Howard_By_Precinct_2018_General.csv")) {
  download.file(
    "https://elections.maryland.gov/elections/2018/election_data/Howard_By_Precinct_2018_General.csv",
    "Howard_By_Precinct_2018_General.csv",
    method = "auto"
  )
}

if (!file.exists("Official_by_Party_and_Precinct.csv")) {
  download.file(
    "https://elections.maryland.gov/elections/2018/turnout/general/Official%20by%20Party%20and%20Precinct.csv",
    "Official_by_Party_and_Precinct.csv",
    method = "auto"
  )
}
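
The five download steps above all follow the same pattern: skip the download when a local copy exists. As a minimal sketch of how that pattern could be factored out, the following defines a helper, download_if_missing, which is a hypothetical name of my own and not part of the original workflow. The demonstration uses a temporary file that already exists, so no network access takes place.

```r
# Hypothetical helper (not part of the original workflow): download a file
# only if no local copy exists, and report whether a download happened.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))
  }
  download.file(url, destfile, method = "auto")
  invisible(TRUE)
}

# Demonstration with a file that already exists, so nothing is downloaded.
# The URL is a placeholder and is never contacted.
existing_file <- tempfile(fileext = ".csv")
writeLines("a,b", existing_file)
downloaded <- download_if_missing("https://example.invalid/data.csv", existing_file)
print(downloaded)
```

Returning a logical value makes it easy to log which files were fetched on a given run.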

I check the MD5 hash values for the files, and stop if the contents are not what is expected.

stopifnot(md5sum("Voting_Precincts.geojson") == "5c732804496e978ff9d0bead4d49d23a")
stopifnot(md5sum("Adjusted2020Blockcsv.zip") == "748adc7c7d717a58bbfb1fc6b1cf7a27")
stopifnot(md5sum("Howard_County_2018_General.csv") == "22f086ede73639a6291fd16d51326c22")
stopifnot(md5sum("Howard_By_Precinct_2018_General.csv") == "5916ceb054fa2035ca6e17557d883d51")
stopifnot(md5sum("Official_by_Party_and_Precinct.csv") == "e97de27d20f077e1327bd7c39c81938b")
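
A bare stopifnot() reports only that the comparison failed. As a sketch of an alternative, the following wraps md5sum() from the tools package in a hypothetical helper, check_md5 (my own name, not something the original code uses), that reports the expected and actual checksums. It is demonstrated on a temporary file containing the five bytes "hello".

```r
library(tools)

# Hypothetical helper: stop with a readable message when a file's MD5
# checksum differs from the expected value. md5sum() returns NA for a
# file that cannot be read, which is also treated as a failure.
check_md5 <- function(path, expected) {
  actual <- unname(md5sum(path))
  if (is.na(actual) || actual != expected) {
    stop("MD5 mismatch for ", path, ": expected ", expected, ", got ", actual)
  }
  invisible(TRUE)
}

# Demonstration on a temporary file containing exactly the bytes "hello".
demo_file <- tempfile()
writeBin(charToRaw("hello"), demo_file)
check_md5(demo_file, "5d41402abc4b2a76b9719d911017c592")
```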

I unzip the CSV file with census block-level population data.

unzip("Adjusted2020Blockcsv.zip", overwrite = TRUE)

Reading in and preparing the data

I first read in the CSV file containing adjusted 2020 populations for all Maryland census blocks, and store that data in the data table md_pop_by_block_tb. I rename the VTD column to Precinct to make the code below clearer. (“VTD” stands for “Voting District”.)

NOTE: The first two columns of the data, the census block number and block group number, have invalid values, possibly because someone somewhere made a mistake and let Excel convert the long numeric values in those columns to scientific notation. The equivalent Excel file has the same problem, albeit only in the census block column. Fortunately I don’t need either of those values for my purposes, so I can ignore this error.

md_pop_by_block_tb <- read_csv("Block.csv", show_col_types = FALSE) %>%
  rename(Precinct = VTD)
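
The kind of damage described in the note above can be reproduced in a few lines. This is only an illustration of the suspected mechanism, not a diagnosis of what actually happened to the file: the block identifier is a made-up 15-digit value, and six significant digits is an assumed display precision.

```r
# Hypothetical 15-digit census block identifier
# (2-digit state + 3-digit county + 6-digit tract + 4-digit block).
block_id <- "240276011011000"

# Simulate what a spreadsheet might do: treat the ID as a number, display it
# with six significant digits in scientific notation, and save that text.
as_displayed <- format(as.numeric(block_id), scientific = TRUE, digits = 6)

# Reading the saved text back in cannot recover the digits that were dropped.
round_tripped <- sprintf("%.0f", as.numeric(as_displayed))

print(as_displayed)
print(round_tripped)
print(identical(round_tripped, block_id))
```

Once identifiers have been saved this way, the lost digits are gone for good, which is why such columns are best handled as text from the start.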

I next read in the GeoJSON data for the Howard County precinct boundaries and store that data in a geospatial data table precinct_sf. I create a variable Precinct with a format matching that of the variable of the same name in the population data just loaded, and a variable District with a format matching that used for county council districts in the election results.

(The chosen format for precincts is sscccpp-ppp, where ss is “24”, the two-digit FIPS code for the state of Maryland, ccc is the 3-digit code for each county within the state, and pp-ppp is the designator for a precinct within a county. The chosen format for council districts is a three-digit number, to match the format used in the election results.)

I don’t need any data from the GeoJSON file other than the precinct and council district, so I retain only the Precinct and District variables, along with the boundary geometry data (which is automatically retained).

I also create a data table precinct_district_tb to map each precinct to its corresponding council district. This is essentially the precinct_sf table with the precinct boundary geometry data removed.

precinct_sf <- st_read("Voting_Precincts.geojson") %>%
  mutate(Precinct = paste("240270",
                          substr(PRECINCT20, 1, 1),
                          "-0",
                          substr(PRECINCT20, 3, 5),
                          sep = ""),
         District = paste("00", CODISTRICT, sep = "")) %>%
  select(Precinct, District)

## Reading layer `Voting_Precincts' from data source 
##   `/home/fhecker/src/hocodata/redistricting/Voting_Precincts.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 118 features and 10 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 1259411 ymin: 523188.7 xmax: 1398344 ymax: 620119.3
## Projected CRS: NAD83 / Maryland (ftUS)

precinct_district_tb <- precinct_sf %>% 
  st_drop_geometry()
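
To make the string manipulation above concrete, here is a small base R sketch that applies the same substr() logic to sample values. The sample inputs are assumptions: I am guessing that PRECINCT20 values look like "1-01" and that CODISTRICT values are single digits, based only on how the code above slices them.

```r
# Hypothetical raw values; the real PRECINCT20 and CODISTRICT formats are
# assumed here to look like "1-01" and 1.
precinct20 <- c("1-01", "6-35")
codistrict <- c(1, 4)

# Build sscccpp-ppp: "24" (Maryland) + "027" (Howard County) + zero-padded
# election district + "-" + zero-padded precinct number.
# paste0(...) is equivalent to paste(..., sep = "").
precinct <- paste0("240270", substr(precinct20, 1, 1), "-0", substr(precinct20, 3, 5))
district <- paste0("00", codistrict)

print(precinct)
print(district)
```

Note that this padding scheme only works for one-digit election districts and two-digit precinct numbers, which is what the code above assumes as well.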

I then read in the 2018 election data for Howard County. I keep only the results for the Democratic and Republican candidates in the county executive race and the five county council races. The table all_votes_by_candidate_tb accounts for all votes no matter how cast. The table en_votes_by_precinct_tb contains only votes cast on election day and recorded on election night. (Maryland does not assign votes to precincts when they are cast by mail or during early voting periods.)

I create a new variable Precinct to designate each precinct using the format described above. I also join the table with the precinct_district_tb table created above, in order to add a new District variable for all precinct-level results. (The Office District variable cannot be used for this, since it has missing values for county-wide races like that for county executive.)

all_votes_by_candidate_tb <- read_csv("Howard_County_2018_General.csv",
                           show_col_types = FALSE) %>%
  filter(`Office Name` %in% c("County Executive", "County Council"),
         Party %in% c("DEM", "REP"))

en_votes_by_precinct_tb <- read_csv("Howard_By_Precinct_2018_General.csv",
                                 show_col_types = FALSE) %>%
  filter(`Office Name` %in% c("County Executive", "County Council"),
         Party %in% c("DEM", "REP")) %>%
  mutate(Precinct = paste("24027",
                          substr(`Election District`, 2, 3),
                          "-",
                          substr(`Election Precinct`, 1, 3),
                          sep = "")) %>%
  left_join(precinct_district_tb, by = "Precinct")
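
A left join silently leaves the District value missing for any precinct whose identifier has no match in the lookup table, for example because of a formatting difference. As a quick sanity check, one could list such precincts before relying on the joined data. The sketch below uses toy data frames and made-up vote counts, not the real tables.

```r
# Toy stand-ins for the precinct-to-district map and the precinct-level results.
precinct_district <- data.frame(
  Precinct = c("2402701-001", "2402701-002"),
  District = c("001", "001")
)
results <- data.frame(
  Precinct = c("2402701-001", "2402701-002", "2402701-099"),
  Votes = c(120, 95, 40)
)

# Precincts in the results that have no entry in the map would end up with
# a missing District after a left join.
unmatched <- setdiff(results$Precinct, precinct_district$Precinct)
print(unmatched)
```

An empty result would indicate that every precinct in the election results maps to a council district.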

I read in the official data on turnout in the 2018 general election by precinct. Note that unlike the precinct-level election results, the precinct-level turnout statistics account for all votes cast no matter the method.

I keep only the data for Howard County (ignoring votes for which the precinct cannot be determined), create a variable Precinct (as distinct from the original variable PRECINCT) formatted to match that of other data tables, and a variable TOTAL_VOTES to hold the total number of votes cast in a given precinct.

turnout_by_precinct_tb <- read_csv("Official_by_Party_and_Precinct.csv",
                                   show_col_types = FALSE) %>%
  filter(LBE == "Howard",
         PRECINCT != "Unable to Determine") %>%
  mutate(Precinct = paste("24027", substr(PRECINCT, 2, 7), sep = ""),
         TOTAL_VOTES = POLLS + EARLY_VOTING + ABSENTEE + PROVISIONAL)

Finally, for each Maryland county I list the number of county council members (for charter counties), county commissioners (for non-charter counties), or city council members (for Baltimore city).

md_council_structure_tb <- tribble(
  ~County, ~Name, ~Type, ~At_Large, ~By_District,
  24001, "Allegany County", "Commissioners", 0, 3,
  24003, "Anne Arundel County", "Council", 0, 7,
  24005, "Baltimore County", "Council", 0, 7,
  24009, "Calvert County", "Commissioners", 2, 3,
  24011, "Caroline County", "Commissioners", 3, 0,
  24013, "Carroll County", "Commissioners", 0, 5,
  24015, "Cecil County", "Council", 0, 5,
  24017, "Charles County", "Commissioners", 1, 4,
  24019, "Dorchester County", "Council", 0, 5,
  24021, "Frederick County", "Council", 2, 5,
  24023, "Garrett County", "Commissioners", 0, 3,
  24025, "Harford County", "Council", 1, 6,
  24027, "Howard County", "Council", 0, 5,
  24029, "Kent County", "Commissioners", 3, 0,
  24031, "Montgomery County", "Council", 4, 5,
  24033, "Prince George’s County", "Council", 2, 9,
  24035, "Queen Anne’s County", "Commissioners", 1, 4,
  24037, "St. Mary’s County", "Commissioners", 1, 4,
  24039, "Somerset County", "Commissioners", 0, 5,
  24041, "Talbot County", "Council", 5, 0,
  24043, "Washington County", "Commissioners", 5, 0,
  24045, "Wicomico County", "Council", 2, 5,
  24047, "Worcester County", "Commissioners", 0, 7,
  24510, "Baltimore city", "Council", 1, 14
)

Which data should be used for redistricting?

My goal in doing this redistricting exercise is to be able to draw Howard County Council district boundaries in a way that promotes overall fairness between the two dominant political parties and their voters and among the various racial and ethnic groups in the county.

However in doing so there are various questions I need to resolve:

How many council members should there be? In particular, how does Howard County compare to other Maryland jurisdictions?
At what level should redistricting be done? Should calculations be performed at the census block level (as the available data might be seen as allowing) and new precinct lines drawn based on that? Or should redistricting calculations be done at the precinct level, with precincts simply moved between council districts but otherwise kept as is?
Which races and ethnic groups should be considered in redistricting? All of them identified in the census data? Or only a subset? If the latter, should I include multiracial people as a distinct group, or try to lump them in with one or more other groups?
How should the relative strengths of the two main political parties be calculated? Based on registration statistics? Election results? And if election results are to be used, exactly which election(s)?
In drawing the actual council district lines, which factors should be given more weight? As but one example, should I put more weight on having more compact (i.e., not oddly-shaped) districts, even if that conflicts with having district populations be as equal as possible?

There are also various obstacles I face:

The 2020 census block-level data is not necessarily accurate, since the US Census Bureau deliberately introduced random “noise” into the data in an attempt to provide enhanced privacy for personal demographic information.
Unlike the 2018 general election precinct-level turnout statistics, which account for all methods of voting, the precinct-level 2018 general election results include only votes cast on election day.
Not all precincts have Republican vote totals for the 2018 general election for county council, since there was no Republican candidate in District 3.

The following sections discuss how I resolved the questions above and tried to work around the various obstacles.

Deciding on an appropriate council size

Currently the Howard County Council has five members. It has not increased in size in the over fifty years since Howard County became a charter county and replaced its board of county commissioners with a county council. How does the number of people per council member in Howard County compare to the values for other Maryland counties and Baltimore city?

I create a data table md_pop_per_member_tb using the Maryland adjusted population per census block, first summing over the census blocks in each county, then adding columns for the county names and the sizes of their respective councils or boards of county commissioners, and then dividing the county (adjusted) populations by those sizes.

md_pop_by_county_tb <- md_pop_by_block_tb %>%
  ungroup() %>%
  select(County, Adj_Population) %>%
  group_by(County) %>%
  summarize(Adj_Population = sum(Adj_Population))

md_pop_per_member_tb <- md_pop_by_county_tb %>%
  left_join(md_council_structure_tb, by = "County") %>%
  mutate(Size = At_Large + By_District,
         Pop_Per_Member = round(Adj_Population / Size, 0)) %>%
  select(County, Name, Adj_Population, Size, Pop_Per_Member)

ratio_val <- md_pop_per_member_tb %>%
  filter(Name == "Howard County") %>%
  select(Pop_Per_Member) %>%
  as.numeric()

county_with_max_ratio_tb <- md_pop_per_member_tb %>%
  arrange(desc(Pop_Per_Member)) %>%
  head(1)

county_with_min_ratio_tb <- md_pop_per_member_tb %>%
  arrange(Pop_Per_Member) %>%
  head(1)

I then plot the population per council member or commissioner, arranging the counties in rank order by population per member.

md_pop_per_member_tb %>%
  mutate(Name = sub("( County)?$", "", Name)) %>%
  mutate(Name = fct_reorder(Name, -Pop_Per_Member)) %>%
  ggplot() +
  geom_col(aes(x = Name, y = Pop_Per_Member)) +
  gghighlight(Name == "Howard") +
  scale_y_continuous(breaks = seq(0, 140000, 20000), labels = comma) +
  xlab("County") +
  ylab("Population per Member") +
  labs(
    title = "Population per Council Member or County Commissioner",
    subtitle = "Includes Members Elected At-Large and by District",
    caption = "Data sources:\n  Maryland Department of Planning, 2020 Redistricting Data\n  Maryland Manual On-Line, Local Government Information\nCreated using the tidyverse R package"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

It’s apparent from the graph that Howard County has a fairly high ratio of people per council member, about 66,000 people per member. This compares to an average ratio of 36,000 people per member and a median of 22,000. Baltimore County has the highest ratio, about 122,000 people per council member, while Somerset County has the lowest ratio, about 4,000 people per county commissioner.

My proposal is to expand the Howard County Council from five members to fifteen members. This would lower the ratio of people per council member to about 22,000 people per member, right at the current median value for Maryland counties.
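
The arithmetic behind this proposal is simple enough to check directly. The sketch below uses the county's adjusted population rounded to 332,000 and the rounded median of 22,000 people per member, so the results are approximate.

```r
# Approximate adjusted population of Howard County.
howard_pop <- 332000

# People per member for the current and the proposed council sizes.
council_sizes <- c(current = 5, proposed = 15)
pop_per_member <- round(howard_pop / council_sizes)
print(pop_per_member)

# Council size that would bring the ratio closest to the statewide median
# of about 22,000 people per member.
median_ratio <- 22000
print(round(howard_pop / median_ratio))
```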

With this change the new ranking of counties would be as in the following graph.

md_pop_per_member_tb %>%
  mutate(Pop_Per_Member = ifelse(Name == "Howard County",
                                 Pop_Per_Member / 3,
                                 Pop_Per_Member)) %>%
  mutate(Name = sub("( County)?$", "", Name)) %>%
  mutate(Name = fct_reorder(Name, -Pop_Per_Member)) %>%
  ggplot() +
  geom_col(aes(x = Name, y = Pop_Per_Member)) +
  gghighlight(Name == "Howard") +
  scale_y_continuous(breaks = seq(0, 140000, 20000), labels = comma) +
  xlab("County") +
  ylab("Population per Member") +
  labs(
    title = "Population per Council Member or County Commissioner",
    subtitle = "After Proposed Howard County Council Expansion",
    caption = "Data sources:\n  Maryland Department of Planning, 2020 Redistricting Data\n  Maryland Manual On-Line, Local Government Information\nCreated using the tidyverse R package"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

Accounting for census block data quality

As briefly noted above, the published 2020 Census block-level data was manipulated in various ways by the US Census Bureau in an attempt to preserve the privacy of people’s personal demographic data. The basic idea is that for a given census block the published data will contain more or fewer people than are present in the original count.

(See “Differential Privacy and the 2020 Census” for the official justification for doing this, a blog post by Jessica Hullman discussing some of the controversy around this practice, and articles on the RedistrictingOnline.org site discussing how it might affect redistricting efforts.)

pop_val <- md_pop_by_block_tb %>%
  filter(County == "24027") %>%
  summarize(Adj_Population = sum(Adj_Population)) %>%
  as.numeric()

num_blocks_val <- md_pop_by_block_tb %>%
  filter(County == "24027") %>%
  summarize(n = n()) %>%
  as.numeric()

num_precincts_val <- precinct_district_tb %>%
  summarize(n = n()) %>%
  as.numeric()

avg_pop_per_block_val <- pop_val / num_blocks_val
avg_pop_per_precinct_val <- pop_val / num_precincts_val

This deliberately introduced noise is particularly problematic when it comes to block-level data on race and ethnicity. Howard County has 2,204 census blocks and a total (adjusted) population of about 332,000 people, so the number of people per census block is about 150 on average. If only a few people of a given race or ethnicity live in a given census block, the noise may greatly distort their numbers as published in the adjusted population data for redistricting.

On the other hand Howard County has only 118 precincts, so the average adjusted population per precinct is about 2,800. This number is large enough so that hopefully the noise introduced at the block level will get cancelled out at the precinct level, and I can treat the precinct-level numbers on race and ethnicity as being at least somewhat close to the actual values.
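
The hope that noise cancels out under aggregation can be illustrated with a toy simulation. To be clear about the assumptions: the noise model here (independent, zero-mean noise with a standard deviation of 5 added to every block) is my own simplification and is not the Census Bureau's actual algorithm, the "true" counts are randomly generated for a small group averaging about 7.5 people per block, and blocks are assigned to precincts at random.

```r
set.seed(2020)

n_blocks <- 2204
n_precincts <- 118

# Toy "true" counts for a small group: about 7.5 people per block on average.
true_count <- rpois(n_blocks, lambda = 7.5)

# Assumed noise model (NOT the Census Bureau's actual algorithm):
# independent, zero-mean, integer-valued noise added to every block.
noise <- round(rnorm(n_blocks, mean = 0, sd = 5))
published_count <- true_count + noise

# Assign each block to one of the precincts at random, in near-equal numbers.
precinct <- sample(rep_len(seq_len(n_precincts), n_blocks))

# Total absolute error relative to the total true count, at each level.
block_error <- sum(abs(published_count - true_count)) / sum(true_count)
precinct_true <- tapply(true_count, precinct, sum)
precinct_published <- tapply(published_count, precinct, sum)
precinct_error <- sum(abs(precinct_published - precinct_true)) / sum(precinct_true)

print(c(block = block_error, precinct = precinct_error))
```

Under these assumptions the relative error at the precinct level comes out well below the block-level error. If the real noise is biased or correlated across blocks, the cancellation would be weaker than this sketch suggests.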

I thus create a second data table md_pop_by_precinct_tb that aggregates the population data by precinct (i.e., for all census blocks associated with each precinct). Note that this also eliminates the census block, census block group, and census tract variables from the resulting data table.

md_pop_by_precinct_tb <- md_pop_by_block_tb %>%
  group_by(State, County, Precinct) %>%
  summarize(across(Adj_Population:Adj_NH18__2_Races, sum))

## `summarise()` has grouped output by 'State', 'County'. You can override using the `.groups` argument.

Working with precincts rather than census blocks also simplifies the redistricting calculations and drawing the redistricting maps, since manipulating population, electoral, and geospatial data for 118 precincts is considerably less work than doing so for 2,204 census blocks.

This is also why I use the Howard County GIS data for precinct boundaries instead of the block boundaries in the published Maryland data. It’s possible to combine the block boundary lines to calculate boundaries for the precincts of which they’re part, but it’s much simpler just to use the already-created precinct boundary data.

Accounting for races and ethnic groups

One of my goals in redistricting is to ensure that racial and ethnic groups are not disadvantaged in the drawing of district boundaries. A key question in redistricting is therefore how to account for the various racial and ethnic populations.

To a large degree this has already been done for us by the US Census Bureau’s policy. In particular, the data published by the Census Bureau for redistricting purposes (and subsequently adjusted by the Maryland Department of Planning) identifies Hispanic people as a unitary ethnic group, regardless of their race, and then groups all non-Hispanic people into seven racial groups: White, Black, Asian (which includes both East Asians and South Asians), American Indians, native Hawaiians, people identifying as some other race, and multiracial people (i.e., people identifying as belonging to two or more races).

groups_tb <- md_pop_by_precinct_tb %>%
  filter(County == "24027") %>%
  summarize(across(Adj_Population:Adj_NH18__2_Races, sum))

## `summarise()` has grouped output by 'State'. You can override using the `.groups` argument.

For Howard County as a whole the breakdown of groups is as follows: 8% persons of Hispanic origin, with the remaining (non-Hispanic) population 47% White, 20% Asian, 19% Black, and 5% multiracial, with the number of American Indians, native Hawaiians, and people of other races negligible. (The breakdown for those 18 years of age or older is similar but leans slightly whiter: 7% Hispanic origin, 51% White, 19% Asian, 18% Black, 3% multiracial, and the remaining groups negligible.)

(Note that these numbers cannot necessarily be compared with racial and ethnic breakdowns in previous census data, because the US Census Bureau has changed its questions regarding race over time. See the article “This Is How The White Population Is Actually Changing Based On New Census Data” for more background on this.)

Hispanic, White, Black, and Asian people have traditionally been targeted by politicians as separate interest groups, so it’s appropriate to take their numbers into account in redistricting. At the other end of the spectrum the populations of American Indians, native Hawaiians, and people of other races in Howard County are so small that it doesn’t make sense to factor them into the calculations.

That leaves multiracial people as an interesting corner case. It’s not clear to what degree they have unified political interests, but at the same time their numbers are so large that it feels wrong to leave them out of consideration in redistricting. I’ve therefore chosen to combine the multiracial population with the other smaller groups to form a “rest of the population” group separate from the main White, Asian, Black, and Hispanic groups.

A second question is whether to consider only people age 18 or older, or to consider everyone, including children who can’t vote. I’ve chosen to include everyone: since redrawn district boundaries are typically in effect until the next decennial census, by the time of the 2030 county council races two-thirds of those under 18 in 2018 will be eligible to vote. (Even in the 2022 county council races there will be many young people eligible to vote who were not eligible in 2018.)
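
The two-thirds figure can be checked with a back-of-the-envelope calculation. The sketch assumes, purely for simplicity, that there are equal numbers of people at each age from 0 to 17 and that nobody moves into or out of the county.

```r
# Single-year ages of people under 18 in the base year.
ages <- 0:17

# Share who reach age 18 by a later election year, assuming (for simplicity)
# equal numbers of people at each age and ignoring migration.
share_eligible <- function(base_year, election_year) {
  mean(ages + (election_year - base_year) >= 18)
}

print(share_eligible(2018, 2030))
print(share_eligible(2018, 2022))
```

Under these assumptions everyone aged 6 to 17 in 2018 (12 of the 18 single-year age groups) is old enough to vote by 2030, while those aged 14 to 17 are already eligible by 2022.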

Based on the above considerations, I use the data table md_pop_by_precinct_tb to create a new data table pop_by_precinct_tb. I filter the data to include only Howard County (dispensing with the State and County variables after first removing the grouping that includes them), combine the American Indian, Hawaiian, and “other race” categories with the “two or more races” category to create a single “rest of the non-Hispanic population” category, and discard all the variables for the 18-and-over population.

pop_by_precinct_tb <- md_pop_by_precinct_tb %>%
  ungroup() %>%
  filter(County == "24027") %>%
  mutate(Adj_NH_Rest = Adj_NH_Ind + Adj_NH_Hwn + Adj_NH_Oth + Adj_NH_2__Races) %>%
  select(Precinct, Adj_Population, Adj_Hispanic_Origin,
         Adj_NH_Wht, Adj_NH_Blk, Adj_NH_Asn, Adj_NH_Rest)
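
One sanity check that could be run on a table like this is to confirm that the five groups add up to the total population in every precinct. This rests on an assumption I have not verified against the real data, namely that the Hispanic and non-Hispanic categories partition the adjusted population exactly. The sketch uses the column names from the code above with made-up counts.

```r
# Toy rows using the column names from pop_by_precinct_tb; the counts are made up.
pop_by_precinct <- data.frame(
  Precinct = c("2402701-001", "2402701-002"),
  Adj_Population = c(2800, 3100),
  Adj_Hispanic_Origin = c(220, 250),
  Adj_NH_Wht = c(1320, 1450),
  Adj_NH_Blk = c(530, 600),
  Adj_NH_Asn = c(560, 620),
  Adj_NH_Rest = c(170, 180)
)

group_cols <- c("Adj_Hispanic_Origin", "Adj_NH_Wht", "Adj_NH_Blk",
                "Adj_NH_Asn", "Adj_NH_Rest")

# Difference between the reported total and the sum of the five groups;
# zero everywhere means the groups account for the whole population.
gap <- pop_by_precinct$Adj_Population - rowSums(pop_by_precinct[group_cols])
print(gap)
```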

Accounting for political parties: registration vs. votes

Another of my goals in redistricting is to ensure that new district boundaries do not unduly disadvantage one or the other of the two main political parties. (The other parties are insignificant in terms of their electoral impact, so they can safely be ignored when it comes to redistricting. However note that this may change in future if Howard County adopts ranked choice voting for the county council.)

turnout_tb <- turnout_by_precinct_tb %>%
  summarize(across(c(POLLS:ELIGIBLE_VOTERS, TOTAL_VOTES), sum))

dem_turnout_tb <- turnout_by_precinct_tb %>%
  filter(PARTY == "DEMOCRAT") %>%
  summarize(across(c(POLLS:ELIGIBLE_VOTERS, TOTAL_VOTES), sum))

rep_turnout_tb <- turnout_by_precinct_tb %>%
  filter(PARTY == "REPUBLICAN") %>%
  summarize(across(c(POLLS:ELIGIBLE_VOTERS, TOTAL_VOTES), sum))

The first possible way to measure relative party strength is based on voter registration. As of the 2018 general election there were about 215,000 registered voters in Howard County, of whom 50% were registered Democrats and 26% were registered Republicans. Democrats thus had a nearly 2-to-1 registration advantage over Republicans.

However this is potentially misleading, for at least two reasons. First, the parties may have different levels of success in terms of motivating their registered voters to actually vote. For example, in the 2018 general election 72% of registered Democrats actually voted, versus 69% of registered Republicans.
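
To show how turnout shifts the picture, the following sketch compares the Democratic advantage measured by registration with the advantage measured by registered voters who actually voted. It uses only the rounded figures quoted above, so the results are approximate.

```r
# Rounded figures quoted in the text for the 2018 general election.
registered <- 215000
share <- c(dem = 0.50, rep = 0.26)
turnout <- c(dem = 0.72, rep = 0.69)

# Advantage measured by registration alone.
registration_ratio <- share[["dem"]] / share[["rep"]]

# Advantage measured by registered partisans who actually voted.
voters <- registered * share * turnout
voter_ratio <- voters[["dem"]] / voters[["rep"]]

print(round(c(registration = registration_ratio, voters = voter_ratio), 2))
```

With these inputs the ratio moves from roughly 1.9-to-1 to roughly 2-to-1, a modest change compared with the effect of independents and crossover voting.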

ce_all_votes_val <- all_votes_by_candidate_tb %>%
  filter(`Office Name` == "County Executive") %>%
  summarize(`Total Votes` = sum(`Total Votes`)) %>%
  as.numeric()

dem_ce_all_votes_val <- all_votes_by_candidate_tb %>%
  filter(`Candidate Name` == "Calvin Ball") %>%
  summarize(`Total Votes` = sum(`Total Votes`)) %>%
  as.numeric()

rep_ce_all_votes_val <- all_votes_by_candidate_tb %>%
  filter(`Candidate Name` == "Allan H. Kittleman") %>%
  summarize(`Total Votes` = sum(`Total Votes`)) %>%
  as.numeric()

Second, and more important, nominally independent voters tend to lean toward one party or another, and even registered voters of one party may cross party lines to vote for another party’s candidate. In particular, Republican candidates can poll much better than Republican registration totals might suggest. For example, in the 2018 general election the Republican candidate for county executive (Allan Kittleman) won 47% of the vote, compared to 53% for the Democratic candidate (Calvin Ball). (And of course Kittleman actually won election as county executive in 2014.)

For this reason I’ve chosen to take into account only election results, not registration numbers. But which election(s) should I use? My choice is to use only the county council election results, since my goal is to create districts for the county council. However I also work with the results for county executive, since as discussed below I need those results in order to impute missing data for council district 3.

cc1245_all_votes_val <- all_votes_by_candidate_tb %>%
  filter(`Office Name` == "County Council", `Office District` != "003") %>%
  summarize(`Total Votes` = sum(`Total Votes`)) %>%
  as.numeric()

dem_cc1245_all_votes_val <- all_votes_by_candidate_tb %>%
  filter(`Office Name` == "County Council", `Office District` != "003", Party == "DEM") %>%
  summarize(`Total Votes` = sum(`Total Votes`)) %>%
  as.numeric()

rep_cc1245_all_votes_val <- all_votes_by_candidate_tb %>%
  filter(`Office Name` == "County Council", `Office District` != "003", Party == "REP") %>%
  summarize(`Total Votes` = sum(`Total Votes`)) %>%
  as.numeric()

In the 2018 county council election results, if I exclude District 3, in which there was no Republican candidate for county council, about 115,000 people voted for county council candidates, with Democratic candidates winning about 62% of those votes and Republican candidates winning about 38%. The Republican vote share is higher than one might expect based on voter registration statistics, but lower than what was achieved in the county executive race.

Estimating precinct-level Republican votes in District 3

As mentioned above, in the 2018 general election there was no Republican county council candidate in District 3, where the Democratic candidate (Christiana Rigby) ran unopposed. In order to provide a complete data set for the redistricting algorithm I estimate how many votes a hypothetical GOP candidate would have received in District 3.

One way to do this is to look at how Republican candidates fared in other districts, and extrapolate that performance to District 3. I first look at the individual performances of the Republican candidates in the other districts, comparing that to the performance of the GOP county executive candidate in those same districts.

Note that in order to compare apples to apples I must use only votes cast on election day as reported on election night, since the precinct-level results, and thus the district-level results for the county executive race computed from them, are for election day only.

rep_cc1245_vs_ce1245 <- en_votes_by_precinct_tb %>%
  filter(`Office Name` %in% c("County Council", "County Executive"),
         District != "003",
         Party == "REP") %>%
  group_by(`Office Name`, District) %>%
  summarize(`Election Night Votes` = sum(`Election Night Votes`), .groups = "drop") %>%
  pivot_wider(names_from = "Office Name", values_from = "Election Night Votes") %>%
  mutate(`% CC vs CE` = round(100 * `County Council` / `County Executive`, 1))

rep_cc1245_vs_ce1245 %>% kable(caption = "GOP County Council Candidate Relative Performance")

GOP County Council Candidate Relative Performance
District	County Council	County Executive	% CC vs CE
001	6664	9117	73.1
002	5680	7254	78.3
004	5815	7723	75.3
005	11208	13554	82.7

Some Republican candidates performed better than others relative to the performance of the GOP county executive candidate. To estimate the performance of a hypothetical generic Republican candidate in District 3, I compute an overall ratio of Republican county council candidate performance relative to Republican county executive candidate performance, excluding District 3.

rep_cc1245_en_votes_val <- en_votes_by_precinct_tb %>%
  filter(`Office Name` == "County Council",
         District != "003",
         Party == "REP") %>%
  summarize(`Election Night Votes` = sum(`Election Night Votes`)) %>%
  as.numeric()

rep_ce1245_en_votes_val <- en_votes_by_precinct_tb %>%
  filter(`Office Name` == "County Executive",
         District != "003",
         Party == "REP") %>%
  summarize(`Election Night Votes` = sum(`Election Night Votes`)) %>%
  as.numeric()

rep_cc_vs_ce_val <- rep_cc1245_en_votes_val / rep_ce1245_en_votes_val

Overall in 2018 Republican county council candidates outside District 3 received about 78% of the election day vote that the GOP candidate for county executive did in those districts.
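As a quick check on the 78% figure, the pooled ratio can be recomputed from the district totals in the table above. This is a minimal sketch in base R; the vector names are illustrative and are not part of the original analysis. It also compares the pooled ratio with a simple average of the four per-district ratios, since the two weight the districts differently.

```r
# District-level GOP election day votes, copied from the table above
# (Districts 1, 2, 4, and 5). Variable names are illustrative only.
rep_cc <- c(`001` = 6664, `002` = 5680, `004` = 5815, `005` = 11208)
rep_ce <- c(`001` = 9117, `002` = 7254, `004` = 7723, `005` = 13554)

# Pooled ratio: total council votes over total county executive votes.
pooled_ratio <- sum(rep_cc) / sum(rep_ce)

# Unweighted mean of the per-district ratios, for comparison.
mean_ratio <- mean(rep_cc / rep_ce)

round(pooled_ratio, 3)
round(mean_ratio, 3)
```

The pooled ratio comes out at about 0.78, slightly above the unweighted mean of about 0.77, because District 5 has both the highest ratio and the most votes.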

What about the Democratic candidate in District 3? Did Christiana Rigby do relatively better because she had no Republican opponent? I also look at the individual performances of the Democratic county council candidates in each district, comparing that to the performance of the Democratic county executive candidate in those same districts. (Again, I use only votes cast on election day as recorded on election night.)

dem_cc_vs_ce <- en_votes_by_precinct_tb %>%
  filter(`Office Name` %in% c("County Council", "County Executive"),
         Party == "DEM") %>%
  group_by(`Office Name`, District) %>%
  summarize(`Election Night Votes` = sum(`Election Night Votes`), .groups = "drop") %>%
  pivot_wider(names_from = "Office Name", values_from = "Election Night Votes") %>%
  mutate(`% CC vs CE` = round(100 * `County Council` / `County Executive`, 1))

dem_cc_vs_ce %>% kable(caption = "Democratic County Council Candidate Relative Performance")

Democratic County Council Candidate Relative Performance
District	County Council	County Executive	% CC vs CE
001	9499	7352	129.2
002	11437	10456	109.4
003	14049	10876	129.2
004	10779	9164	117.6
005	7969	6075	131.2

At first glance it looks as if the Democratic county council candidate in District 3 performed comparably to Democratic council candidates in other districts (especially Districts 1 and 5), and did not receive any special boost from not having a GOP opponent.
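One simple way to make "performed comparably" concrete is to check whether the District 3 percentage falls inside the range seen in the four contested districts. The sketch below uses only the percentages from the table above; the variable names are illustrative.

```r
# Democratic council-to-executive percentages from the table above.
dem_pct <- c(`001` = 129.2, `002` = 109.4, `003` = 129.2,
             `004` = 117.6, `005` = 131.2)

# Percentages for the contested districts only.
others <- dem_pct[names(dem_pct) != "003"]

# Does District 3 fall inside the range seen in contested districts?
d3_in_range <- dem_pct["003"] >= min(others) && dem_pct["003"] <= max(others)
d3_in_range
```

With only four contested districts this is a rough check rather than a statistical test.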

As I did for the Republican county council candidates, I compute an overall ratio of Democratic county council candidate performance relative to Democratic county executive candidate performance, excluding District 3.

dem_cc1245_en_votes_val <- en_votes_by_precinct_tb %>%
  filter(`Office Name` == "County Council",
         District != "003",
         Party == "DEM") %>%
  summarize(`Election Night Votes` = sum(`Election Night Votes`)) %>%
  as.numeric()

dem_ce1245_en_votes_val <- en_votes_by_precinct_tb %>%
  filter(`Office Name` == "County Executive",
         District != "003",
         Party == "DEM") %>%
  summarize(`Election Night Votes` = sum(`Election Night Votes`)) %>%
  as.numeric()

dem_cc_vs_ce_val <- dem_cc1245_en_votes_val / dem_ce1245_en_votes_val

Overall in 2018 Democratic county council candidates outside District 3 received about 120% of the election day vote that the Democratic candidate for county executive did in those districts.

At this point I take the original data table en_votes_by_precinct_tb and create a “wide” table en_votes_by_precinct_wide_tb by combining the fields for the office and for the party into a single field, getting rid of all the unneeded fields, and then using pivot_wider() to convert the different rows containing election night vote totals for the county executive and county council races into different columns.

This simplifies calculating estimated election night votes for a hypothetical Republican county council candidate in District 3, producing a second wide table est_en_votes_by_precinct_wide_tb. (I do not adjust the election night votes for the Democratic county council candidate in District 3, but leave them as is.)

en_votes_by_precinct_wide_tb <- en_votes_by_precinct_tb %>%
  mutate(`Office by Party` = paste(`Office Name`, "-", `Party`, sep = "")) %>%
  select(Precinct, `Office by Party`, `Election Night Votes`) %>%
  pivot_wider(names_from = `Office by Party`,
              values_from = `Election Night Votes`,
              values_fill = 0)

est_en_votes_by_precinct_wide_tb <- en_votes_by_precinct_wide_tb %>%
  mutate(`County Council-REP` = ifelse(`County Council-REP` == 0,
                                       round(rep_cc_vs_ce_val * `County Executive-REP`, 0),
                                       `County Council-REP`))

This wide format is also the format required as input to the Auto-Redistrict software, so I use it for the remaining steps.
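To illustrate the District 3 estimation step above, here is a minimal base R sketch on a made-up table. The precinct labels, vote counts, and short column names are hypothetical; only the logic (scale the county executive vote by the pooled ratio wherever the council vote is zero) follows the code above.

```r
# Hypothetical election day votes for three made-up precincts.
# A zero in cc_rep marks a precinct with no GOP council candidate.
toy <- data.frame(
  Precinct = c("A", "B", "C"),
  ce_rep = c(400, 250, 300),
  cc_rep = c(310, 0, 0)
)

# Assumed pooled council-to-executive ratio (about 0.78 in the text above).
ratio <- 0.78

# Fill in an estimate only where the council vote is zero.
toy$cc_rep_est <- ifelse(toy$cc_rep == 0,
                         round(ratio * toy$ce_rep, 0),
                         toy$cc_rep)
toy
```

Note that a test for zero cannot distinguish an uncontested precinct from a contested one where the candidate happened to receive no votes, so in principle such a precinct would also be overwritten.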

Accounting for early voting and absentee voting

What I want for redistricting purposes is party vote share per precinct for the county council races, i.e., how many people voted for the Democratic candidate for county council in a given precinct vs. how many people voted for the Republican candidate for county council in that same precinct.

I started with precinct-level results for the county council and county executive races, but only for votes cast on election day, and not including results for a Republican county council candidate in District 3, since no one ran. I then estimated what a hypothetical Republican candidate in District 3 would have polled in each precinct, using the per-precinct results for Allan Kittleman multiplied by a factor based on how Republican county council candidates in other districts fared on average relative to Kittleman.

I now have precinct-level election day vote shares for both county executive and county council for all precincts, including estimated shares for District 3. However, the true vote shares per precinct may differ from those based on election day results, since the latter don’t take into account the effects of early voting and voting by mail (i.e., using absentee ballots).

ce_en_votes_val <- all_votes_by_candidate_tb %>%
  filter(`Office Name` == "County Executive") %>%
  summarize(`Election Night Votes` = sum(`Election Night Votes`)) %>%
  as.numeric()

dem_ce_en_votes_val <- all_votes_by_candidate_tb %>%
  filter(`Candidate Name` == "Calvin Ball") %>%
  summarize(`Election Night Votes` = sum(`Election Night Votes`)) %>%
  as.numeric()

rep_ce_en_votes_val <- all_votes_by_candidate_tb %>%
  filter(`Candidate Name` == "Allan H. Kittleman") %>%
  summarize(`Election Night Votes` = sum(`Election Night Votes`)) %>%
  as.numeric()

dem_ce_all_vs_en_val <- dem_ce_all_votes_val / dem_ce_en_votes_val

rep_ce_all_vs_en_val <- rep_ce_all_votes_val / rep_ce_en_votes_val

Accounting for this is important because those voting for Democratic candidates have different voting habits than those voting for Republican candidates. To take but one example, if only election day votes mattered then Allan Kittleman would be county executive today, since on election day he received 44,065 votes compared to 43,923 votes for Calvin Ball. However, Ball did much better in early voting, and ended up winning 75,566 to 67,457. Put another way, Ball’s total vote was 72% greater than his election day vote total, while Kittleman’s total vote was only 53% greater than his election day result.

Thus I could estimate precinct-level results for Calvin Ball (including early and absentee votes) by multiplying his election day precinct-level results by 1.72, the factor by which his total vote exceeded his election day vote. Similarly I could estimate precinct-level results for Allan Kittleman (including early and absentee votes) by multiplying his election day precinct-level results by 1.53, the (smaller) factor by which his total vote exceeded his election day vote.
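The two factors can be reproduced from the vote totals quoted above. The following is a small worked example in base R; the variable names and the 500-vote precinct are hypothetical and serve only to show the effect of the different factors.

```r
# Election day and final vote totals quoted in the text above.
ball_en <- 43923
ball_all <- 75566
kittleman_en <- 44065
kittleman_all <- 67457

# Ratio of all votes to election day votes for each candidate.
ball_factor <- ball_all / ball_en
kittleman_factor <- kittleman_all / kittleman_en

round(c(Ball = ball_factor, Kittleman = kittleman_factor), 2)

# Hypothetical precinct with 500 election day votes for each candidate:
# the scaled estimates diverge even though election day was a tie.
round(500 * c(Ball = ball_factor, Kittleman = kittleman_factor))
```

In this hypothetical precinct, an election day tie becomes a lead of roughly 95 votes for Ball once the county-wide factors are applied.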

But, one might ask, since I have precinct-level numbers for turnout by party, including early voting and absentee ballots by party, why don’t I compute a multiplicative factor for each precinct individually? The short answer is that the turnout statistics are for registered voters, so such an approach would not account for registered Democrats crossing party lines to vote Republican (or vice versa), nor for independent voters. Using county-wide factors is thus my preferred approach.

What about estimating per-precinct vote shares for county council races (which are what I really want)? I can look at the difference between election day votes and all votes for the two parties’ county council candidates, again ignoring District 3. I’ve already calculated the vote totals for both cases, so I just need to compute the ratio of all votes to election day votes for each party.

dem_cc1245_all_vs_en_val <- dem_cc1245_all_votes_val / dem_cc1245_en_votes_val

rep_cc1245_all_vs_en_val <- rep_cc1245_all_votes_val / rep_cc1245_en_votes_val

The situation is similar to that for the county executive race: the combined total vote for all Democratic county council candidates outside District 3 was 79% greater than their combined vote on election day, while the combined total vote for all Republican county council candidates outside District 3 was only 49% greater than their combined election day result.

I can now calculate estimated precinct-level overall vote shares for the two parties, taking the wide table est_en_votes_by_precinct_wide_tb with election night vote shares (estimated for the county council race in District 3) and multiplying the precinct-level votes for each race and party by the multiplicative factors calculated above, to create a new wide table est_all_votes_by_precinct_wide_tb.

est_all_votes_by_precinct_wide_tb <- est_en_votes_by_precinct_wide_tb %>%
  mutate(`County Executive-REP` = round(rep_ce_all_vs_en_val * `County Executive-REP`, 0),
         `County Executive-DEM` = round(dem_ce_all_vs_en_val * `County Executive-DEM`, 0),
         `County Council-REP` = round(rep_cc1245_all_vs_en_val * `County Council-REP`, 0),
         `County Council-DEM` = round(dem_cc1245_all_vs_en_val * `County Council-DEM`, 0))

I can test the consistency of the estimated vote shares by totaling the county executive votes for all precincts. These should match the county-wide vote totals from the official results, apart from small differences introduced by rounding each precinct’s estimate.

est_all_votes_wide_tb <- est_all_votes_by_precinct_wide_tb %>%
  summarize(across(`County Executive-REP`:`County Council-DEM`, sum))

est_all_votes_wide_tb

## # A tibble: 1 × 4
##   `County Executive-REP` `County Executive-… `County Council-… `County Council-…
##                    <dbl>               <dbl>             <dbl>             <dbl>
## 1                  67458               75563             51235             95918

Compare the estimated vote totals for Calvin Ball and Allan Kittleman to their actual vote totals of 75,566 and 67,457 respectively.
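The comparison can be made explicit with a few lines of base R, using the estimated totals printed above and the official totals quoted in the text. The variable names are illustrative.

```r
# Estimated totals from the summary above and official totals from the text.
estimated <- c(Kittleman = 67458, Ball = 75563)
official <- c(Kittleman = 67457, Ball = 75566)

# Difference in votes (estimated minus official).
diff_votes <- estimated - official
diff_votes

# Relative error, in percent of the official total.
round(100 * diff_votes / official, 4)
```

The estimates are off by 1 vote for Kittleman and 3 votes for Ball, well under a hundredth of a percent in each case.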

Creating the input file to the redistricting software

I now have all the data I need to create an input file for the Auto-Redistrict redistricting software.

I join the geospatial data table precinct_sf together with the data tables pop_by_precinct_tb and est_all_votes_by_precinct_wide_tb using their common field Precinct to create a new geospatial data table containing precinct boundaries, (adjusted) populations (broken down by race and ethnicity), and estimated 2018 per-precinct votes for Democratic and Republican county council candidates (including a hypothetical Republican candidate in District 3).

redistricting_input_sf <- precinct_sf %>%
  inner_join(est_all_votes_by_precinct_wide_tb, by = "Precinct") %>%
  inner_join(pop_by_precinct_tb, by = "Precinct")

(Although I will only be using the results for county council races, I keep the estimated precinct-level results for county executive in case I want to do any analyses on that data.)
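Because inner_join() keeps only rows whose Precinct value appears in both tables, a precinct missing from either the vote table or the population table would silently drop out of the result. A minimal way to check for this, sketched here in base R with hypothetical precinct identifiers standing in for the real Precinct columns, is to compare the sets of identifiers before joining.

```r
# Hypothetical precinct identifiers standing in for the Precinct columns
# of the three tables; the real values come from the data sources.
boundary_ids <- c("01-001", "01-002", "01-003")
votes_ids <- c("01-001", "01-002", "01-003")
pop_ids <- c("01-001", "01-003")

# Precincts that an inner join would drop, table by table.
missing_votes <- setdiff(boundary_ids, votes_ids)
missing_pop <- setdiff(boundary_ids, pop_ids)

missing_votes
missing_pop
```

In this made-up example one precinct lacks population data and would be lost in the join; with the real tables, both results would ideally be empty.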

Finally, I write the resulting geospatial data out to disk, to be used as input to the Auto-Redistrict application.

st_write(redistricting_input_sf, "redistricting-input.shp", append = FALSE)

## Warning in abbreviate_shapefile_names(obj): Field names abbreviated for ESRI
## Shapefile driver

## Deleting layer `redistricting-input' using driver `ESRI Shapefile'
## Writing layer `redistricting-input' to data source 
##   `redistricting-input.shp' using driver `ESRI Shapefile'
## Writing 118 features with 12 fields and geometry type Polygon.
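The warning above appears because the shapefile format limits attribute field names to 10 characters, so longer column names are shortened automatically when the file is written. If predictable field names matter for later steps, one option (not used in this analysis) would be to choose short names by hand before writing. The sketch below only builds and validates such a mapping in base R; the short names are hypothetical.

```r
# Column names as they appear in the wide table above.
long_names <- c("County Executive-REP", "County Executive-DEM",
                "County Council-REP", "County Council-DEM")

# Hypothetical short replacements, chosen by hand for this sketch.
short_names <- c("CE_REP", "CE_DEM", "CC_REP", "CC_DEM")
rename_map <- setNames(short_names, long_names)

# Shapefile field names must be at most 10 characters long and unique.
stopifnot(all(nchar(short_names) <= 10), !anyDuplicated(short_names))

rename_map
```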

Appendix

Caveats

See the discussion above about noise in census block-level data, missing data for the 2018 general election county council race in District 3, and the lack of precinct-level results for early voting and voting by mail.

The election data used omits votes cast for third-party or write-in candidates. In practice the number of these votes is very small, and should not affect the redistricting results.

References

Data derived from the 2020 census and used for Maryland redistricting is available from the “2020 Redistricting Data for Maryland” page published by the Maryland Citizens Redistricting Commission. I used the CSV format data because I didn’t need the block-level geospatial data, just the population data.

This data is adjusted according to Maryland law (which for redistricting purposes treats prisoners as being counted at their place of residence prior to incarceration). It’s unclear how this adjustment interacts with the adjustment that the US Census Bureau applied to census block data in an attempt to protect people’s privacy.

Precinct-level results for the Howard County 2018 general election are available from the “Data Files for the 2018 Gubernatorial Election Results” page published by the Maryland Board of Elections. This analysis uses the precinct results for Howard County.

As noted above, these results are for election day only, and do not include results from early voting or absentee ballots.

Information on the size and structure of county and city councils and boards of commissioners is from the Maryland Manual On-Line pages for county councils and boards of county commissioners respectively.

Suggestions for future work

There may be better ways to generate suitable election data, or to incorporate additional data.

Environment

I used the following R environment in doing the analysis above:

sessionInfo()

## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] tools     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] knitr_1.33        scales_1.1.1      gghighlight_0.3.2 sf_1.0-2         
##  [5] forcats_0.5.1     stringr_1.4.0     dplyr_1.0.7       purrr_0.3.4      
##  [9] readr_2.0.1       tidyr_1.1.3       tibble_3.1.4      ggplot2_3.3.5    
## [13] tidyverse_1.3.1  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.7         lubridate_1.7.10   class_7.3-19       assertthat_0.2.1  
##  [5] digest_0.6.27      utf8_1.2.2         R6_2.5.1           cellranger_1.1.0  
##  [9] backports_1.2.1    reprex_2.0.1       evaluate_0.14      e1071_1.7-8       
## [13] highr_0.9          httr_1.4.2         pillar_1.6.2       rlang_0.4.11      
## [17] readxl_1.3.1       rstudioapi_0.13    jquerylib_0.1.4    rmarkdown_2.10    
## [21] bit_4.0.4          munsell_0.5.0      proxy_0.4-26       broom_0.7.9       
## [25] compiler_4.1.1     modelr_0.1.8       xfun_0.25          pkgconfig_2.0.3   
## [29] htmltools_0.5.2    tidyselect_1.1.1   fansi_0.5.0        crayon_1.4.1      
## [33] tzdb_0.1.2         dbplyr_2.1.1       withr_2.4.2        grid_4.1.1        
## [37] jsonlite_1.7.2     gtable_0.3.0       lifecycle_1.0.0    DBI_1.1.1         
## [41] magrittr_2.0.1     units_0.7-2        KernSmooth_2.23-20 vroom_1.5.4       
## [45] cli_3.0.1          stringi_1.7.4      farver_2.1.0       fs_1.5.0          
## [49] xml2_1.3.2         bslib_0.3.0        ellipsis_0.3.2     generics_0.1.0    
## [53] vctrs_0.3.8        bit64_4.0.5        glue_1.4.2         hms_1.1.0         
## [57] parallel_4.1.1     fastmap_1.1.0      yaml_2.2.1         colorspace_2.0-2  
## [61] classInt_0.4-3     rvest_1.0.1        haven_2.4.3        sass_0.4.0

Source code

You can find the source code for this analysis and others at my hocodata public code repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.