Natural disasters can uproot people’s entire lives in a matter of days, and they can create a multitude of challenges for election administrators in affected areas. Do these events affect voter turnout? Are certain areas less equipped to handle these disasters? What can be done to mitigate the adverse effects? In this paper, I explore these questions by examining Hurricane Matthew and how it may have affected voter turnout in Georgia in the 2016 general election.
For this analysis, I examine the state of Georgia by county. All of the source data I use is also organized by county, so this is the most compact way to study the location of voters. The goal is to determine which areas were most affected and then analyze turnout, demographics, and other information that could reveal interesting findings. To identify those areas, I used a county-level map of Georgia published by the Federal Emergency Management Agency (FEMA) [1: Georgia Hurricane Matthew (DR-4284), https://www.fema.gov/disaster/4284].
This map indicates which counties required different levels of assistance. The southeastern coastal counties were designated to receive both individual and public assistance. As such, I hypothesized that these areas were more likely to have seen turnout below the rather high state average of 76.53% [2: Georgia Statewide General Election 2016 Results, http://results.enr.clarityelections.com/GA/63991/184321/en/summary.html].
Furthermore, I hypothesized that these counties likely had higher rates of early voting than the state did overall. It is well-documented that states experiencing hurricanes often see spikes in early voting since voters might be displaced and early voting does not restrict voters by precinct.
In my analysis, I used a recent copy of the Georgia Daily Voter File, provided by Professor McDonald, along with a publicly available copy of the Georgia Voter History file [3: Georgia Voter History, http://elections.sos.ga.gov/Elections/voterhistory.do].
Before we begin, we must install and load in all the following packages.
install.packages(c("tidyverse", "dplyr", "ggplot2", "RColorBrewer", "devtools", "stringr", "ggmap", "maps", "mapdata"))
library(tidyverse)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(ggmap)
library(maps)
library(mapdata)
Now we must read in the data that we will be observing.
Georgia.County.Codes <- read_csv("georgia_county_codes.txt", col_names = c("state", "state FP", "COUNTY_CODE", "subregion", "Class FP"))
Georgia.Voter.History <- read.table("Georgia 2016.TXT", colClasses = "character", sep = " ", fill = TRUE)
Georgia.Voter.File <- read.table("Georgia_Daily_VoterBase.txt", colClasses = c("character", "character", rep("NULL", 12), "character", "character", "character", "character", rep("NULL", 29), "character", "character", rep("NULL", 3)), sep = "|", header = TRUE, fill = TRUE, quote = '', na.strings = 'NULL')
Georgia.Voter.File.Copy <- Georgia.Voter.File
Georgia.Voter.File.Copy2 <- Georgia.Voter.File
Note that copies of the voter file were made so that the unedited original remains available without having to read the file in again.
Since the voter file we are using is from August 2018, we must filter out individuals who have registered since the 2016 general election voter registration deadline.
Georgia.Voter.File.Copy$REGISTRATION_DATE <- as.numeric(Georgia.Voter.File.Copy$REGISTRATION_DATE)
Georgia.Voter.File.Copy <- filter(Georgia.Voter.File.Copy, REGISTRATION_DATE < 20161011)
Next, we create a tibble for the voter history file and organize the columns in a more tidy fashion.
History.tibble <- as_tibble(Georgia.Voter.History)
History.tibble$V2 = NULL
History.tibble <- transform(History.tibble, county = substr(V1,1,3),
REGISTRATION_NUMBER = substr(V1,4,11),
elect.date = substr(V1,12,19),
elect.type = substr(V1,20,22),
party = substr(V1,23,24),
absentee = substr(V3,1,1),
provisional = substr(V3,2,2),
supplemental = substr(V3,3,3))
History.tibble$V1 = NULL
History.tibble$V3 = NULL
Lastly, we filter the voter history tibble to only include voters from the 2016 general election, since the source file includes all elections from 2016.
History.tibble <- filter(History.tibble, elect.date == "20161108", elect.type == "003")
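To make the fixed-width layout assumed by the substr() calls above concrete, here is a minimal sketch that applies the same character offsets to a single record. The record itself is invented for illustration, and only the fields read from column V1 are shown.

```r
# Hypothetical fixed-width record laid out like column V1 in the code above:
# county (1-3), registration number (4-11), election date (12-19),
# election type (20-22), party (23-24). The values are made up.
record <- "0890123456720161108003R "

parsed <- data.frame(
  county              = substr(record, 1, 3),
  REGISTRATION_NUMBER = substr(record, 4, 11),
  elect.date          = substr(record, 12, 19),
  elect.type          = substr(record, 20, 22),
  party               = trimws(substr(record, 23, 24)),  # drop padding
  stringsAsFactors    = FALSE
)
print(parsed)
```

Keeping every field as a character string preserves leading zeros in the county code and registration number, which matters for the later merge.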
We can now create our denominator of registered voters for the 2016 general election.
Voters_By_County_Denominator <- count(Georgia.Voter.File.Copy, COUNTY_CODE)
Before we can proceed, we must merge the voter file with the voter history file (by registration number) to obtain a data frame of all 2016 general election voters in Georgia.
Merge <- merge(Georgia.Voter.File.Copy2, History.tibble, by = "REGISTRATION_NUMBER")
Now we can calculate our numerator of 2016 general election voters and analyze the data in a variety of ways.
Voters_By_County_Numerator <- count(Merge, COUNTY_CODE)
Voters_By_County <- data.frame(Georgia.County.Codes$COUNTY_CODE, Voters_By_County_Numerator[,2], Voters_By_County_Denominator[,2])
colnames(Voters_By_County) <- c("subregion", "Voters_By_County_Numerator", "Voters_By_County_Denominator")
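The data frame above lines up the numerator and denominator by row position, which works only if both count tables list the same counties in the same order. As a hedged alternative, the sketch below joins the two tables on the county code instead. It uses base R's merge() and invented toy counts; the column names voted and registered are assumptions for illustration.

```r
# Toy per-county counts; the county codes and numbers are invented.
numerator <- data.frame(
  COUNTY_CODE = c("001", "089", "025"),
  voted       = c(60, 40, 150),
  stringsAsFactors = FALSE
)
denominator <- data.frame(
  COUNTY_CODE = c("025", "001", "089"),
  registered  = c(200, 80, 80),
  stringsAsFactors = FALSE
)

# Join on the county code so each numerator meets its own denominator,
# regardless of row order.
turnout <- merge(numerator, denominator, by = "COUNTY_CODE")
turnout$Turnout <- turnout$voted / turnout$registered
print(turnout)
```

A keyed join also drops counties that appear in only one table rather than silently pairing them with the wrong row.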
To visualize Georgia’s voter turnout in the 2016 general election, we will create a map of the state split by county, outlining in black the group of counties that were designated to receive individual and public assistance.
states <- map_data("state")
GA_df <- subset(states, region == "georgia")
counties <- map_data("county")
GA_county <- subset(counties, region == "georgia")
FEMA <- c("bulloch", "bryan", "chatham", "effingham", "evans", "glynn", "liberty", "long", "mcintosh", "wayne")
FEMA_county <- subset(GA_county, subregion %in% FEMA)
GA_base <- ggplot(data = GA_df, mapping = aes(x = long, y = lat, group = group)) +
coord_fixed(1.3) +
geom_polygon(color = "black", fill = "gray")
Voters_By_County$subregion <- Georgia.County.Codes$subregion[match(Voters_By_County$subregion, Georgia.County.Codes$COUNTY_CODE)]
Voters_By_County_Numerator <- as.numeric(Voters_By_County$Voters_By_County_Numerator)
Voters_By_County_Denominator <- as.numeric(Voters_By_County$Voters_By_County_Denominator)
Voters_By_County$Turnout <- Voters_By_County_Numerator / Voters_By_County_Denominator
County.Data <- inner_join(GA_county, Voters_By_County, by = "subregion")
remove.axes <- theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.title = element_blank()
)
GA_base +
geom_polygon(data = County.Data, aes(fill = Turnout), color = "white") +
scale_fill_distiller(palette = "OrRd") +
ggtitle("2016 Georgia Voter Turnout by County") +
theme_bw() +
geom_polygon(color = "black", fill = NA) +
geom_polygon(data = FEMA_county, fill = NA, color = "black") +
remove.axes
Perhaps surprisingly, Liberty County is the only one of the counties affected by Hurricane Matthew (outlined in black) that seems to have experienced notably lower turnout. As such, we will conduct demographic analysis exclusively on Liberty County and compare our findings to the rest of the state.
First, let’s look at the demographic data for Liberty County made available by the U.S. Census Bureau [4: QuickFacts Liberty County, Georgia, https://www.census.gov/quickfacts/libertycountygeorgia]. We will compare this data to that of the entire state of Georgia [5: QuickFacts Georgia, https://www.census.gov/quickfacts/ga].
Table 1. U.S. Census Bureau population shares (percent) as of July 1, 2017.
| | Under 18 | 65+ | Female | White | Black | Hispanic |
|---|---|---|---|---|---|---|
| Georgia | 24.1 | 13.5 | 51.3 | 52.8 | 32.2 | 9.6 |
| Liberty | 28.1 | 8.8 | 49.3 | 38.7 | 44.1 | 12.8 |
Obviously, citizens under 18 years of age are not eligible to vote. However, these are the only categories available for us to get an idea of the age distribution for our observations.
With that in mind, it appears that Liberty County has a younger and more racially diverse population than the state as a whole. Liberty County also has a slightly smaller share of women than the state (49.3% versus 51.3%).
Now, let’s start comparing turnout between Liberty County and the state of Georgia based on different demographics.
Before we can create graphs for Liberty County voters, we must create a couple new data frames.
Liberty <- filter(Merge, Merge$COUNTY_CODE == "089")
Liberty_Total <- filter(Georgia.Voter.File.Copy, Georgia.Voter.File.Copy$COUNTY_CODE == "089")
Now we are ready to visualize voter turnout in Liberty County and in Georgia based on race.
Merge$RACE[which(Merge$RACE == "")] <- "U"
Liberty$RACE[which(Liberty$RACE == "")] <- "U"
Total_Race <- count(Georgia.Voter.File.Copy, Georgia.Voter.File.Copy$RACE)
Liberty_Total_Race <- count(Liberty_Total, Liberty_Total$RACE)
Merge.Copy <- Merge %>%
group_by(RACE) %>%
summarise(count=n()) %>%
mutate(perc=count/Total_Race$n)
Liberty.Copy <- Liberty %>%
group_by(RACE) %>%
summarise(count=n()) %>%
mutate(perc=count/Liberty_Total_Race$n)
ggplot(data = Merge.Copy, mapping = aes(x = (factor(RACE)), y = perc, fill = "red")) +
geom_bar(stat = "identity") +
ggtitle(" 2016 Georgia Turnout by Race") +
labs(x = "Race", y = "Turnout") +
ylim(0, 1) +
scale_x_discrete(breaks = c("WH", "BH", "HP", "AI", "AP", "OT", "U"),
labels = c("White", "Black", "Hispanic", "Native American", "Asian", "Other", "Unknown")) +
theme(legend.position = "none")
ggplot(data = Liberty.Copy, mapping = aes(x = (factor(RACE)), y = perc, fill = "red")) +
geom_bar(stat = "identity") +
ggtitle(" 2016 Liberty Turnout by Race") +
labs(x = "Race", y = "Turnout") +
ylim(0, 1) +
scale_x_discrete(breaks = c("WH", "BH", "HP", "AI", "AP", "OT", "U"),
labels = c("White", "Black", "Hispanic", "Native American", "Asian", "Other", "Unknown")) +
theme(legend.position = "none")
Based on these graphs, Liberty County’s drop in turnout relative to the state averages is spread fairly broadly across racial groups. Most notably, turnout among Hispanic voters shows the largest gap (about 20%) relative to the state average, while Black turnout shows the smallest (about 10%). Turnout among white and Asian voters in Liberty County is also lower by a fairly wide margin.
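The turnout rates plotted above divide voter counts by registrant counts row by row, which assumes both summaries contain the same groups in the same order. The following sketch shows one way to compute the same rates by matching on the group label instead. The helper name turnout_by_group is hypothetical, the toy data are invented, and base R is used so the example stands alone.

```r
# Hypothetical helper: turnout rate per level of `group_col`, joining voters
# to registrants by the group label rather than by row position.
turnout_by_group <- function(voters, registrants, group_col) {
  num <- as.data.frame(table(voters[[group_col]]), stringsAsFactors = FALSE)
  den <- as.data.frame(table(registrants[[group_col]]), stringsAsFactors = FALSE)
  names(num) <- c(group_col, "voted")
  names(den) <- c(group_col, "registered")
  out <- merge(den, num, by = group_col, all.x = TRUE)
  out$voted[is.na(out$voted)] <- 0  # groups with registrants but no voters
  out$perc <- out$voted / out$registered
  out
}

# Toy data (invented) using the race codes from the plots above.
registrants <- data.frame(RACE = c("WH", "WH", "BH", "BH", "HP", "U"),
                          stringsAsFactors = FALSE)
voters      <- data.frame(RACE = c("WH", "BH", "BH"),
                          stringsAsFactors = FALSE)
result <- turnout_by_group(voters, registrants, "RACE")
print(result)
```

The same helper could be called with "GENDER" or "age_range" as the grouping column, assuming those columns exist in both data frames.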
Here we turn our attention to turnout amongst different genders. Let’s visualize this data in Liberty County and the state of Georgia as a whole.
Merge$GENDER[which(Merge$GENDER == "")] <- "U"
Liberty$GENDER[which(Liberty$GENDER == "")] <- "U"
Merge_Subset <- subset(Merge, Merge$GENDER %in% c("F", "M", "O"))
Liberty_Subset <- subset(Liberty, Liberty$GENDER %in% c("F", "M", "O"))
Total_Gender <- count(Georgia.Voter.File.Copy, Georgia.Voter.File.Copy$GENDER)
Liberty_Total_Gender <- count(Liberty_Total, Liberty_Total$GENDER)
Merge.Copy <- Merge_Subset %>%
group_by(GENDER) %>%
summarise(count=n()) %>%
mutate(perc=count/Total_Gender$n)
Liberty.Copy <- Liberty_Subset %>%
group_by(GENDER) %>%
summarise(count=n()) %>%
mutate(perc=count/Liberty_Total_Gender$n)
ggplot(data = Merge.Copy, mapping = aes(x = (factor(GENDER)), y = perc, fill = "red")) +
geom_bar(stat = "identity") +
ggtitle(" 2016 Georgia Turnout by Gender") +
labs(x = "Gender", y = "Turnout") +
ylim(0, 1) +
scale_x_discrete(breaks = c("F", "M", "O", "U"),
labels = c("Female", "Male", "Other", "Unknown")) +
theme(legend.position = "none")
ggplot(data = Liberty.Copy, mapping = aes(x = (factor(GENDER)), y = perc, fill = "red")) +
geom_bar(stat = "identity") +
ggtitle(" 2016 Liberty Turnout by Gender") +
labs(x = "Gender", y = "Turnout") +
ylim(0, 1) +
scale_x_discrete(breaks = c("F", "M", "O", "U"),
labels = c("Female", "Male", "Other", "Unknown")) +
theme(legend.position = "none")
Turnout among women in Liberty County was nearly 20% lower than in the state as a whole. For men, the gap was slightly smaller (about 15%). Interestingly, though the group is likely too small for this to be meaningful, a higher proportion of voters identifying as a gender other than male or female turned out in Liberty County than in Georgia overall, bucking the overall trend.
Next, let’s analyze whether turnout varied significantly between different age groups.
Merge$Age <- (2016 - as.numeric(Merge$BIRTHDATE))
Liberty$Age <- (2016 - as.numeric(Liberty$BIRTHDATE))
Liberty_Total$Age <- (2016 - as.numeric(Liberty_Total$BIRTHDATE))
Georgia.Voter.File.Copy$Age <- (2016 - as.numeric(Georgia.Voter.File.Copy$BIRTHDATE))
Merge$age_range <- "<30"
Merge$age_range[which(Merge$Age>=30)] <- "30-45"
Merge$age_range[which(Merge$Age>45)] <- "45+"
Liberty$age_range <- "<30"
Liberty$age_range[which(Liberty$Age>=30)] <- "30-45"
Liberty$age_range[which(Liberty$Age>45)] <- "45+"
Georgia.Voter.File.Copy$age_range <- "<30"
Georgia.Voter.File.Copy$age_range[which(Georgia.Voter.File.Copy$Age>=30)] <- "30-45"
Georgia.Voter.File.Copy$age_range[which(Georgia.Voter.File.Copy$Age>45)] <- "45+"
Liberty_Total$age_range <- "<30"
Liberty_Total$age_range[which(Liberty_Total$Age>=30)] <- "30-45"
Liberty_Total$age_range[which(Liberty_Total$Age>45)] <- "45+"
Total_Age <- count(Georgia.Voter.File.Copy, Georgia.Voter.File.Copy$age_range)
Liberty_Total_Age <- count(Liberty_Total, Liberty_Total$age_range)
Merge.Copy <- Merge %>%
group_by(age_range) %>%
summarise(count=n()) %>%
mutate(perc=count/Total_Age$n)
Liberty.Copy <- Liberty %>%
group_by(age_range) %>%
summarise(count=n()) %>%
mutate(perc=count/Liberty_Total_Age$n)
ggplot(data = Merge.Copy, mapping = aes(x = factor(age_range), y = perc, fill = "red")) +
geom_bar(stat = "identity") +
ggtitle(" 2016 Georgia Turnout by Age") +
labs(x = "Age", y = "Turnout") +
ylim(0, 1) +
theme(legend.position = "none")
ggplot(data = Liberty.Copy, mapping = aes(x = factor(age_range), y = perc, fill = "red")) +
geom_bar(stat = "identity") +
ggtitle(" 2016 Liberty Turnout by Age") +
labs(x = "Age", y = "Turnout") +
ylim(0, 1) +
theme(legend.position = "none")
Voters under 30 turned out at a substantially lower rate (about 25% lower) in Liberty County than they did in the state overall. This is quite revealing, as we saw in the Census statistics that Liberty County skews younger than the rest of the state.
Turnout was similarly lower among 30-to-45-year-olds in Liberty County, as approximately 20% fewer people voted in that age group.
For individuals older than 45, turnout barely dropped off at all. Only 5-7% fewer people in that group turned out in Liberty County compared to the state as a whole.
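The three age groups discussed above are assigned in the code with repeated which() statements for each data frame. As a compact alternative, the sketch below defines the same bins once with base R's cut(). The helper name age_bin is hypothetical, and the sample ages are invented.

```r
# Hypothetical helper reproducing the bins used above:
# under 30, 30 through 45, and over 45.
age_bin <- function(age) {
  # Intervals are right-closed: (-Inf, 29], (29, 45], (45, Inf).
  cut(age, breaks = c(-Inf, 29, 45, Inf), labels = c("<30", "30-45", "45+"))
}

ages <- c(18, 29, 30, 45, 46, 70)
binned <- data.frame(Age = ages, age_range = age_bin(ages))
print(binned)
```

One difference worth noting: cut() returns NA for a missing age, whereas the which()-based code leaves such records in the default "<30" group.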
Finally, let’s visualize the method of voting used in Liberty County versus the method used in the state as a whole.
HurricaneCounties <- subset(Merge, Merge$COUNTY_CODE %in% c("015", "016", "025", "051", "054", "063", "089", "091", "098", "151"))
Merge_absentee_summary <- Merge %>%
group_by(absentee) %>%
summarise(Percent = n()/nrow(.) * 100)
Liberty_absentee_summary <- Liberty %>%
group_by(absentee) %>%
summarise(Percent = n()/nrow(.) * 100)
HurricaneCounties_absentee_summary <- HurricaneCounties %>%
group_by(absentee) %>%
summarise(Percent = n()/nrow(.) * 100)
ggplot(data = Merge_absentee_summary, mapping = aes(x = "", y = Percent, fill = absentee)) +
geom_bar(width = 1, stat = "identity") +
ggtitle(" 2016 Georgia Voters by Method of Voting") +
labs(x = "", y = "") +
scale_y_continuous(breaks = round(cumsum(rev(Merge_absentee_summary$Percent)), 1)) +
scale_fill_discrete(name = "Early/Absentee/Mail",
breaks = c("N", "Y"),
labels = c("No", "Yes")) +
coord_polar("y", start = 0)
ggplot(data = Liberty_absentee_summary, mapping = aes(x = "", y = Percent, fill = absentee)) +
geom_bar(width = 1, stat = "identity") +
ggtitle(" 2016 Liberty Voters by Method of Voting") +
labs(x = "", y = "") +
scale_y_continuous(breaks = round(cumsum(rev(Liberty_absentee_summary$Percent)), 1)) +
scale_fill_discrete(name = "Early/Absentee/Mail",
breaks = c("N", "Y"),
labels = c("No", "Yes")) +
coord_polar("y", start = 0)
ggplot(data = HurricaneCounties_absentee_summary, mapping = aes(x = "", y = Percent, fill = absentee)) +
geom_bar(width = 1, stat = "identity") +
ggtitle(" 2016 FEMA Designated Voters by Method of Voting") +
labs(x = "", y = "") +
scale_y_continuous(breaks = round(cumsum(rev(HurricaneCounties_absentee_summary$Percent)), 1)) +
scale_fill_discrete(name = "Early/Absentee/Mail",
breaks = c("N", "Y"),
labels = c("No", "Yes")) +
coord_polar("y", start = 0)
As expected, a slightly larger percentage of voters in Liberty County cast their ballots early, in some fashion, than did voters in the state overall. I had expected a larger gap, but it seems that voters in Liberty County did not vote early at substantially higher rates than voters statewide after experiencing the effects of Hurricane Matthew.
When only considering voters in the counties designated for public and individual assistance according to the FEMA map, we observe a much more interesting phenomenon: a significantly lower percentage of voters cast their ballots early than did the state as a whole. Further study into different areas affected by natural disasters would be helpful to determine whether this is a fluke or indeed an indication of a larger trend that contradicts my hypothesis.
I faced a variety of challenges while compiling this data into a presentable final product. As an R novice, I faced a rather steep learning curve in understanding the intricacies of coding in the proper formats, especially with regard to reading in data. I encountered a variety of errors and warning messages that interfered with my progress throughout this semester, and sometimes it took quite a while to figure out what exactly needed to be done to proceed.
An example of a rather tricky problem was figuring out the correct way to merge the Georgia voter file with the Georgia voter history to create my numerator of total voters in the state. After researching different methods, I tried the following code.
Georgia.Voter.File.Copy2$REGISTRATION_NUMBER <- History.tibble$reg.number[match(History.tibble$reg.number, Georgia.Voter.File.Copy2$REGISTRATION_NUMBER)]
Upon running this code, I received the following error message:
Error in `$<-.data.frame`(`*tmp*`, REGISTRATION_NUMBER, value = c(3955143L, : replacement has 4165157 rows, data has 6776829
The idea here was to check all the voter registration numbers in the voter file and keep only those that also appeared in the voter history file. However, match() returns one position for each element of its first argument, so the replacement vector had one entry per voter history row (4,165,157) rather than one per voter file row (6,776,829). A data frame column cannot be assigned a vector of a different length, hence the error.
I was unsure how to work around this error until I came across the merge function, which creates a new data frame containing only the rows whose keys match in both inputs. I ran the following line of code, also shown previously, which did the trick.
Merge <- merge(Georgia.Voter.File.Copy2, History.tibble, by = "REGISTRATION_NUMBER")
With this newly constructed merged data set, I was able to proceed, and I used it throughout the remainder of my work.
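The difference between the two approaches can be seen on a small example. The sketch below uses invented registration numbers; it shows that match() yields one index per history row (with NA where there is no match), while merge() returns only the rows present in both tables.

```r
# Toy tables with invented registration numbers.
voter_file <- data.frame(
  REGISTRATION_NUMBER = c("1", "2", "3", "4", "5"),
  COUNTY_CODE         = c("001", "001", "089", "089", "025"),
  stringsAsFactors    = FALSE
)
history <- data.frame(
  REGISTRATION_NUMBER = c("2", "5", "9"),
  absentee            = c("Y", "N", "N"),
  stringsAsFactors    = FALSE
)

# match() returns a position in voter_file for each history row, so its
# length follows the history table (3), not the voter file (5).
idx <- match(history$REGISTRATION_NUMBER, voter_file$REGISTRATION_NUMBER)
print(idx)

# merge() performs an inner join by default: only keys found in both tables.
merged <- merge(voter_file, history, by = "REGISTRATION_NUMBER")
print(merged)
```

Here registration number "9" appears only in the history table, so it produces an NA from match() and is dropped by merge().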
I would also like to note a choice I made that might have affected the results of my research. Before merging the two data sets, I did not filter out inactive voters from the Georgia voter file because I did not want to assume that such individuals were inactive two years ago and leave out potential matches.
I also think it is worth noting that my aggregate voter turnout figure for the state of Georgia, 74.94455% (computed below), is a bit lower than the official rate of 76.53% posted by the Secretary of State [6: Georgia Election Results, http://results.enr.clarityelections.com/GA/63991/184321/en/summary.html].
I suspect this is because a fair number of voters from the 2016 election may have since been purged from the rolls, causing them not to appear in my 2018 voter file. Former Secretary of State Brian Kemp was well known for aggressive purging practices, and it has been reported that over half a million people were removed from the voter rolls in July 2017 [7: 6 Takeaways From Georgia’s ‘Use It Or Lose It’ Voter Purge Investigation, https://www.npr.org/2018/10/22/659591998/6-takeaways-from-georgias-use-it-or-lose-it-voter-purge-investigation].
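The computation behind the aggregate figure is not shown in this write-up. One plausible way to obtain such a figure, sketched here under that assumption, is to pool the county counts: total matched voters divided by total registrants. The numbers below are invented and do not reproduce the 74.94455% result.

```r
# Toy stand-in for Voters_By_County; all counts are invented.
Voters_By_County_toy <- data.frame(
  Voters_By_County_Numerator   = c(60, 40, 150),
  Voters_By_County_Denominator = c(80, 80, 200)
)

# Pooled statewide turnout: sum of voters over sum of registrants, in percent.
statewide_turnout <- sum(Voters_By_County_toy$Voters_By_County_Numerator) /
  sum(Voters_By_County_toy$Voters_By_County_Denominator) * 100
print(statewide_turnout)
```

Pooling the counts weights each county by its number of registrants, which generally differs from taking the simple mean of the county turnout rates.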
After analyzing voter turnout in the 2016 general election across the state of Georgia, I found most of the results surprising.
Both of my hypotheses appear to be incorrect, with Liberty County the only exception. Since my work relied strictly on observational data, I cannot draw any conclusions regarding causation. There might be a multitude of confounding variables that contributed to lower turnout in Liberty County compared to the rest of the state.
Returning to the Census data used previously, I notice that Liberty County is poorer than the state as a whole. Median household income between 2012 and 2016 in Liberty County was $42,484, substantially lower than the state’s median household income of $51,037 over the same period. Liberty County’s poverty rate is also slightly higher, at 15.3% compared to Georgia’s 14.9%.
If the hurricane did in fact affect voter turnout in Liberty County, it is worth noting these numbers since it is reasonable to infer that the Supervisor of Elections in this county might not have had sufficient resources to tackle the various obstacles that arise with sudden natural disasters like Hurricane Matthew.
Also, if other areas affected by natural disasters have experienced surges in early voting, it might be wise for Supervisors in the counties most affected by the storm to invest more into early voting for future elections.
I am eager to see the findings of my classmates’ research into how Hurricane Matthew affected voter turnout in Florida and North Carolina, and I hope to continue studying the effects of natural disasters on voter turnout in the future. Since Hurricane Michael recently swept through the Florida Panhandle, it would be wise to examine the potential effects that storm had on turnout in that area for the 2018 general election. Because Hurricane Michael was a stronger storm that hit that area more directly, it might be a better case study to analyze.