This project explores the similarities and differences across state election systems and offers descriptive and visual analysis of the data at hand. Using the confidential voter files of U.S. states, I analyze how a sample of states classifies race, gender, birth identifier, voter status, and party identification variables.


Calls for a nationwide, standardized election system have persisted for decades. For example, the disenfranchisement of thousands of registered Florida voters in the 2000 general election stemmed from, among other things, the inept actions of local election officials, inadequate databases, and the lack of a uniform statewide system. Many commissions, organizations, and policy proposals have focused on this last issue and have called for uniform standards across all 50 states and their election processes. Ultimately, the case for standards and election uniformity rests on the promise of a more democratic, efficient, and far-reaching system.

This project stems from a larger project overseen by researchers at the University of Florida (UF) in conjunction with VoteShield.


Confidential state voter files and vote history files are gathered through communication with state election officials. A .yaml file is then created for each state to label its headers. These headers are standardized across state files at the researchers' discretion. For example, Alaska's first header, “Accession_Number”, represents the same concept as “Voter_ID” (the unique identifier given to each registered voter). Thus, “Accession_Number” is recoded as “Voter_ID” for Alaska to match the other sampled, standardized states. Below are the first few lines of Washington State’s .yaml file.

.yaml Example
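
To make the renaming step concrete, here is a minimal sketch in R of how a header mapping stored as YAML could be applied to a raw file. This is not the project's actual pipeline or Washington's actual file: the layout of the mapping, the toy data frame, the function `standardize_headers()`, and every header other than the “Accession_Number”/“Voter_ID” pair are hypothetical and used only for illustration.

```r
library(yaml)

# Hypothetical mapping: standardized header -> header used in the raw state file.
# Only the Voter_ID/Accession_Number pair comes from the text above.
mapping_yaml <- "
Voter_ID: Accession_Number
Birth_Date: DOB
Party_Affiliation: Party
"
header_map <- yaml.load(mapping_yaml)

# Toy stand-in for a raw voter file (invented values)
raw <- data.frame(
  Accession_Number = c(101, 102),
  DOB = c("1980-01-01", "1975-06-30"),
  Party = c("R", "D"),
  stringsAsFactors = FALSE
)

# Rename every raw header that appears in the mapping; leave the rest untouched
standardize_headers <- function(df, header_map) {
  # Invert the mapping so it reads raw header -> standardized header
  lookup <- setNames(names(header_map), unlist(header_map))
  matched <- names(df) %in% names(lookup)
  names(df)[matched] <- unname(lookup[names(df)[matched]])
  df
}

standardized <- standardize_headers(raw, header_map)
names(standardized)
```

Keeping the mapping in a separate file, rather than hard-coding renames, means each state's quirks live in one place and the same function can be reused for every state.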


Given that this is an ongoing project, only 20 of the 50 states (plus the District of Columbia) have supplied the requisite materials as of November 11, 2019. The states analyzed are mapped and graphed in the discussion section using a shapefile created by the Urban Institute (@urban_institute) [https://medium.com/@urban_institute/how-to-create-state-and-county-maps-easily-in-r-577d29300bb2].
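
As a rough sketch of how the sampled states could be flagged for mapping, the snippet below builds a coverage table and, if the shapefile package is available, joins it to the state outlines. The `sampled` vector is a placeholder: it lists only Alaska and Washington, which are assumed to be in the sample because they are named above, and it would need to be replaced with the full list of 20 states. The column name `state_name` is assumed to match the one in `urbnmapr::states`.

```r
library(dplyr)
library(ggplot2)

# Placeholder sample; replace with the full list of states that supplied files
sampled <- c("Alaska", "Washington")

# One row per state plus DC, flagged by whether its voter file was supplied
coverage <- tibble(state_name = c(state.name, "District of Columbia")) %>%
  mutate(supplied = state_name %in% sampled)

# Draw the map only when the shapefile and projection packages are installed
if (requireNamespace("urbnmapr", quietly = TRUE) &&
    requireNamespace("mapproj", quietly = TRUE)) {
  coverage_map <- left_join(urbnmapr::states, coverage, by = "state_name")
  p <- ggplot(coverage_map, aes(long, lat, group = group, fill = supplied)) +
    geom_polygon(color = "white") +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
    labs(fill = "Voter file supplied")
  print(p)
}
```

The coverage table is built with base R's `state.name`, so it can be checked on its own before any plotting packages are installed.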

Lastly, the five classifications this project is most concerned with are “Race”, “Gender”, birth identifiers (“Birth_Date”, “Age”, etc.), “Voter_Status”, and party identifiers (“Party_Affiliation”, “Last_Party_Voted”, etc.), as codified by each state.

Getting Started

Before starting, install and load the following packages. The Urban Institute’s package (“urbnmapr”) supplies the U.S. shapefile used for the choropleth maps. Choropleth maps shade geographic areas according to the data values associated with them. To use the package, devtools must be installed first, and urbnmapr must then be installed from GitHub.

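# Point install.packages() at a CRAN mirror
r <- getOption("repos")
r["CRAN"] <- "http://cran.us.r-project.org"
options(repos = r)
# kable() is part of knitr, not a package; mapproj is needed by coord_map(); urbnmapr comes from GitHub
install.packages(c("readxl", "ggplot2", "tidyverse", "choroplethrMaps", "devtools", "dplyr", "yaml", "knitr", "mapproj"))
devtools::install_github("UrbanInstitute/urbnmapr")
library(tidyverse); library(urbnmapr)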



We can easily manipulate the shapefile using ggplot2. This example uses the Urban Institute’s household data to create a heat map of median household income for each county in the United States. Their base U.S. map is shown first, followed by the county-level map.

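# Base map: state outlines from urbnmapr
ggplot() +
  geom_polygon(data = urbnmapr::states, mapping = aes(x = long, y = lat, group = group),
               fill = "grey", color = "white") +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)

# For comparison: state outlines, assumed to be the state.map dataset from choroplethrMaps
data(state.map, package = "choroplethrMaps")
ggplot(state.map, aes(long, lat, group = group)) + geom_polygon()

# Attach county-level household data to the county shapefile by FIPS code
household_data <- left_join(urbnmapr::countydata, urbnmapr::counties, by = "county_fips")

# Shade each county by median household income
household_data %>%
  ggplot(aes(long, lat, group = group, fill = medhhincome)) +
  geom_polygon(color = NA) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  labs(fill = "Median Household Income")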