Voter Registration Data

Author

Sami Engel

Introduction

Based on a UNC study in 2023 (Found HERE), the most Republican counties in North Carolina were Mitchell, Avery, and Yadkin counties. These counties were concentrated in western North Carolina. The UNC study (found HERE) also found that the most Democratic counties in North Carolina in 2023 were Hertford, Bertie, and Northampton counties, all of which were concentrated in northeast North Carolina. 

With this understanding, I wanted to analyze voter weekly voter registration data (found HERE) in each of these counties to see how voter registration rates, both overall and within particular demographics, changed alongside major events that occurred throughout the 2024 presidential election. The dates I focused on were:

June 27: Trump-Biden Debate

July 13: Trump gets shot

July 15-18: RNC Convention

July 21: Biden drops out

August 6: Kamala Chooses Tim Walz as running mate

August 19-22: DNC Convention

September 10: Harris-Trump Debate

Hypothesis

While the voter registration rates will not necessarily show a causation, I am hoping to analyze the correlation between each of these events and the change of voter registration rates within my selected demographics, which will compare gender, race, and political affiliation. 

First, I will need to upload the data and necessary packages.

Code
library(readr)
library(dplyr)
library(ggplot2)
library(lubridate)
library(tidyverse)
Code
dem_counties <- read_csv(("~/Desktop/VoterRegistrationData/DemCounties.csv"))
rep_counties <- read_csv(("~/Desktop/VoterRegistrationData/RepCounties.csv"))

Growth of All Counties

Group by date and calculate total registration for each dataset

Code
dem_total <- dem_counties %>%
  group_by(Date) %>%
  summarise(Total_Voters = sum(Total))

rep_total <- rep_counties %>%
  group_by(Date) %>%
  summarise(Total_Voters = sum(Total))

Create a combined plot

Code
ggplot() +
  geom_line(data = dem_total, aes(x = Date, y = Total_Voters, color = "Democratic Counties"), size = 1) +
  geom_point(data = dem_total, aes(x = Date, y = Total_Voters, color = "Democratic Counties"), size = 3) +
  geom_line(data = rep_total, aes(x = Date, y = Total_Voters, color = "Republican Counties"), size = 1) +
  geom_point(data = rep_total, aes(x = Date, y = Total_Voters, color = "Republican Counties"), size = 3) +
  scale_color_manual(values = c("Democratic Counties" = "blue", "Republican Counties" = "red")) +
  labs(title = "Total Voter Registration Over Summer Months",
       x = "Date",
       y = "Total Number of Registered Voters",
       color = "County Group") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom")

From an aerial view, it seems there wasn’t much change among both the conservative and liberal counties, while there was a slight increase from 8/31-9/14, which was following the DNC Convention and prior to the Harris-Trump Debate,

Let’s save the plot.

Code
ggsave("summer_voter_registration_trend.png", width = 10, height = 6)

Now, we can create a line graph for each county’s total population

Code
ggplot(rep_counties, aes(x = Date, y = Total, color = County, group = County)) +
  geom_line(size = 1) +
  geom_point(size = 3) +
  scale_color_manual(values = c("AVERY" = "blue", "MITCHELL" = "green", "YADKIN" = "red")) +
  labs(title = "Total Voter Registration for Republican Counties",
       x = "Date",
       y = "Total Number of Registered Voters",
       color = "County") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom")

This graph shows that Yadkin County’s total registration count increased the most among the conservative counties.

Save the plot

Code
ggsave("republican_counties_trend.png", width = 10, height = 6)

Print out summary statistics for each county

Code
rep_counties %>%
  group_by(County) %>%
  summarise(
    Min_Total = min(Total),
    Max_Total = max(Total),
    Total_Change = max(Total) - min(Total)
  )
# A tibble: 3 × 4
  County   Min_Total Max_Total Total_Change
  <chr>        <dbl>     <dbl>        <dbl>
1 AVERY        12988     13126          138
2 MITCHELL     11303     11359           56
3 YADKIN       25119     25334          215

Growth of Republican Counties

Now, let’s create a line graph with facets for just the conservative counties to see how their total registered voters changed.

Code
ggplot(rep_counties, aes(x = Date, y = Total, group = 1)) +
  geom_line(color = "blue", size = 1) +
  geom_point(color = "red", size = 3) +
  facet_wrap(~ County, scales = "free_y", ncol = 3) +
  labs(title = "Voter Registration Trends for Republican Counties",
       x = "Date",
       y = "Total Number of Registered Voters") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))

It seems that the sharpest decrease in voter registration in Avery County was following 7/27, in which Biden dropped out of the debate. The sharpest decreases for Mitchell and Yadkin County were following 9/14, which was following the Harris-Trump debate.

Save the plot

Code
ggsave("republican_counties_line_trend.png", width = 12, height = 4)

Now, let’s calculate total change for each conservative county

Code
county_changes <- rep_counties %>%
  group_by(County) %>%
  summarise(
    Start_Total = first(Total),
    End_Total = last(Total),
    Total_Change = last(Total) - first(Total),
    Percent_Change = ((last(Total) - first(Total)) / first(Total)) * 100
  )

Avery County had the largest percent change with 1.0625192.

Growth of Liberal Counties

Now, we can create a line graph for each liberal county’s total population

Code
ggplot(dem_counties, aes(x = Date, y = Total, color = County, group = County)) +
  geom_line(size = 1) +
  geom_point(size = 3) +
  scale_color_manual(values = c("BERTIE" = "blue", "HERTFORD" = "green", "NORTHAMPTON" = "red")) +
  labs(title = "Total Voter Registration for Democratic Counties",
       x = "Date",
       y = "Total Number of Registered Voters",
       color = "County") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom")

There is a clear sharp decrease for Northampton County following 7/27, which was after Biden dropped out of the race.

Let’s save this plot

Code
ggsave("democratic_counties_trend.png", width = 10, height = 6)

Let’s also print out summary statistics for each county

Code
dem_counties %>%
  group_by(County) %>%
  summarise(
    Min_Total = min(Total),
    Max_Total = max(Total),
    Total_Change = max(Total) - min(Total)
  )
# A tibble: 3 × 4
  County      Min_Total Max_Total Total_Change
  <chr>           <dbl>     <dbl>        <dbl>
1 BERTIE          12639     12787          148
2 HERTFORD        13829     13971          142
3 NORTHAMPTON     12756     12913          157

This shows that Northampton’s registration count increased the most.

Create a line graph with facets for each county

Code
ggplot(dem_counties, aes(x = Date, y = Total, group = 1)) +
  geom_line(color = "blue", size = 1) +
  geom_point(color = "red", size = 3) +
  facet_wrap(~ County, scales = "free_y", ncol = 3) +
  labs(title = "Voter Registration Trends for Democratic Counties",
       x = "Date",
       y = "Total Number of Registered Voters") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))

This graph highlights the sharp drop in voter registration for Northampton County following 7/27, but while Bertie and Hertford Counties sharply decreased following 9/14, there was a new peak for Northampton County.

Save the plot

Code
ggsave("democratic_counties_line_trend.png", width = 12, height = 4)

Now, let’s calculate the total change for each liberal county

Code
county_changes <- dem_counties %>%
  group_by(County) %>%
  summarise(
    Start_Total = first(Total),
    End_Total = last(Total),
    Total_Change = last(Total) - first(Total),
    Percent_Change = ((last(Total) - first(Total)) / first(Total)) * 100
  )

The largest change was with Northampton county, with a percent change of 1.121041.

The most Republican county is Mitchell, and the most Democratic County is Hertford. So let’s take a deeper dive how certain demographics changed over time in these two counties.

Demographic Voter Count in Hertford County

First, white voter count in Hertford County

Code
hertford_white <- read_csv(("~/Desktop/VoterRegistrationData/hertford_white.csv"))
Code
hertford_white$Date <- as.Date(paste0("2024-", hertford_white$Date), format = "%Y-%m/%d")

ggplot(hertford_white, aes(x = Date, y = White)) +
  geom_line(color = "blue", size = 1) +
  geom_point(color = "darkblue", size = 3) +
  labs(
    title = "White Voter Registration in Hertford County",
    x = "Date",
    y = "Number of White Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Most notably, we can see a steady uphill climb once Kamala began her campaign, with a small decrease following the Harris-Trump debate.

Now, let’s find the percent change in this category

Code
hertford_white_changes <- hertford_white |> 
  summarise(
    Start_Total = first(White),
    End_Total = last(White),
    Total_Change = last(White) - first(White),
    Percent_Change = ((last(White) - first(White)) / first(White)) * 100
  )

The percent change was -0.2408257, showing an overall decrease.

Then, black voter count in Hertford county

Code
hertford_black <- read_csv(("~/Desktop/VoterRegistrationData/hertford_black.csv"))

hertford_black$Date <- as.Date(paste0("2024-", hertford_black$Date), format = "%Y-%m/%d")

ggplot(hertford_black, aes(x = Date, y = Black)) +
  geom_line(color = "blue", size = 1) +
  geom_point(color = "darkblue", size = 3) +
  labs(
    title = "Black Voter Registration in Hertford County",
    x = "Date",
    y = "Number of Black Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

The sharpest climb was after the Democratic National Convention.

Now, let’s find the percent change in this category

Code
hertford_black_changes <- hertford_black |> 
  summarise(
    Start_Total = first(Black),
    End_Total = last(Black),
    Total_Change = last(Black) - first(Black),
    Percent_Change = ((last(Black) - first(Black)) / first(Black)) * 100
  )

The percent change was 0.3976383, showing a minor increase.

Now, female voter count in Hertford County

Code
hertford_female <- read_csv(("~/Desktop/VoterRegistrationData/hertford_female.csv"))

hertford_female$Date <- as.Date(paste0("2024-", hertford_female$Date), format = "%Y-%m/%d")

ggplot(hertford_female, aes(x = Date, y = Female)) +
  geom_line(color = "blue", size = 1) +
  geom_point(color = "darkblue", size = 3) +
  labs(
    title = "Female Voter Registration in Hertford County",
    x = "Date",
    y = "Number of Female Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

There was a sharp increase following the Democratic National Convention, with sharp decrease following the Republican National Convention and the Harris Trump debate.

Now, let’s find the percent change in this category

Code
hertford_female_changes <- hertford_female |> 
  summarise(
    Start_Total = first(Female),
    End_Total = last(Female),
    Total_Change = last(Female) - first(Female),
    Percent_Change = ((last(Female) - first(Female)) / first(Female)) * 100
  )

The percent change was 0.2953417, showing a very minor increase.

Then, male voter count in Hertford County

Code
hertford_male <- read_csv(("~/Desktop/VoterRegistrationData/hertford_male.csv"))

hertford_male$Date <- as.Date(paste0("2024-", hertford_male$Date), format = "%Y-%m/%d")

ggplot(hertford_male, aes(x = Date, y = Male)) +
  geom_line(color = "blue", size = 1) +
  geom_point(color = "darkblue", size = 3) +
  labs(
    title = "Male Voter Registration in Hertford County",
    x = "Date",
    y = "Number of Male Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

This graph shows a very steady uphill climb in male voter registration over these summer months, with one noticeable decrease following Tim Walz being chosen as Harris’s running mate.

Now, let’s find the percent change in this category

Code
hertford_male_changes <- hertford_male |> 
  summarise(
    Start_Total = first(Male),
    End_Total = last(Male),
    Total_Change = last(Male) - first(Male),
    Percent_Change = ((last(Male) - first(Male)) / first(Male)) * 100
  )

This shows quite a large percent change of 1.26183, which is a significant increase.

Now, democratic voters in Hertford County

Code
hertford_democratic <- read_csv(("~/Desktop/VoterRegistrationData/hertford_democratic.csv"))

hertford_democratic$Date <- as.Date(paste0("2024-", hertford_democratic$Date), format = "%Y-%m/%d")

ggplot(hertford_democratic, aes(x = Date, y = Democratic)) +
  geom_line(color = "blue", size = 1) +
  geom_point(color = "darkblue", size = 3) +
  labs(
    title = "Democratic Voter Registration in Hertford County",
    x = "Date",
    y = "Number of Democratic Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

This graph shows a very steady decline in voter registration in Hertford County among democrats. Even though there was a brief increase following Kamala being chosen to step into Biden’s place, there was a continuous decline in voter registration until the Democratic National Convention.

Now, let’s find the percent change in this category

Code
hertford_democratic_changes <- hertford_democratic |> 
  summarise(
    Start_Total = first(Democratic),
    End_Total = last(Democratic),
    Total_Change = last(Democratic) - first(Democratic),
    Percent_Change = ((last(Democratic) - first(Democratic)) / first(Democratic)) * 100
  )

This shows a minor decrease with -0.2408257.

Now, Republican voters in Hertford County

Code
hertford_republican <- read_csv(("~/Desktop/VoterRegistrationData/hertford_republican.csv"))

hertford_republican$Date <- as.Date(paste0("2024-", hertford_republican$Date), format = "%Y-%m/%d")

ggplot(hertford_republican, aes(x = Date, y = Republican)) +
  geom_line(color = "blue", size = 1) +
  geom_point(color = "darkblue", size = 3) +
  labs(
    title = "Republican Voter Registration in Hertford County",
    x = "Date",
    y = "Number of Republican Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

This shows a consistent uphill climb among Hertford County Republicans, with a slight decline following Kamala choosing Walz as her running mate.

Now, let’s find the percent change in this category

Code
hertford_republican_changes <- hertford_republican |> 
  summarise(
    Start_Total = first(Republican),
    End_Total = last(Republican),
    Total_Change = last(Republican) - first(Republican),
    Percent_Change = ((last(Republican) - first(Republican)) / first(Republican)) * 100
  )

This shows quite a significant increase with a percent change of 1.43213.

Now, let’s do the same for the most Republican county, Mitchell County

Demographic Voter Count in Mitchell County

First, white voter count in Mitchell County

Code
mitchell_white <- read_csv(("~/Desktop/VoterRegistrationData/mitchell_white.csv"))

mitchell_white$Date <- as.Date(paste0("2024-", mitchell_white$Date), format = "%Y-%m/%d")

ggplot(mitchell_white, aes(x = Date, y = White)) +
  geom_line(color = "red", size = 1) +
  geom_point(color = "darkred", size = 3) +
  labs(
    title = "White Voter Registration in Mitchell County",
    x = "Date",
    y = "Number of White Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

This shows the strongest increase following Biden dropping out of the race.

Now, let’s find the percent change in this category

Code
mitchell_white_changes <- mitchell_white|> 
  summarise(
    Start_Total = first(White),
    End_Total = last(White),
    Total_Change = last(White) - first(White),
    Percent_Change = ((last(White) - first(White)) / first(White)) * 100
  )

White residents in Mitchell County had a very minor change with 0.413997.

Then, black voter count in Mitchell County

Code
mitchell_black <- read_csv(("~/Desktop/VoterRegistrationData/mitchell_black.csv"))

mitchell_black$Date <- as.Date(paste0("2024-", mitchell_black$Date), format = "%Y-%m/%d")

ggplot(mitchell_black, aes(x = Date, y = Black)) +
  geom_line(color = "red", size = 1) +
  geom_point(color = "darkred", size = 3) +
  labs(
    title = "Black Voter Registration in Mitchell County",
    x = "Date",
    y = "Number of Black Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Code
#now, let's find the percent change in this category
mitchell_black_changes <- mitchell_black |> 
  summarise(
    Start_Total = first(Black),
    End_Total = last(Black),
    Total_Change = last(Black) - first(Black),
    Percent_Change = ((last(Black) - first(Black)) / first(Black)) * 100
  )

This graph is interesting because the total count only changed by two voters, with the strongest increase following the Democratic National Convention.

Now, let’s find the percent change in this category

Code
mitchell_black_changes <- mitchell_black |> 
  summarise(
    Start_Total = first(Black),
    End_Total = last(Black),
    Total_Change = last(Black) - first(Black),
    Percent_Change = ((last(Black) - first(Black)) / first(Black)) * 100
  )

While actually a small increase, the percent change was 2.325581.

Now, female voter count in Mitchell County

Code
mitchell_female <- read_csv(("~/Desktop/VoterRegistrationData/mitchell_female.csv"))

mitchell_female$Date <- as.Date(paste0("2024-", mitchell_female$Date), format = "%Y-%m/%d")

ggplot(mitchell_female, aes(x = Date, y = Female)) +
  geom_line(color = "red", size = 1) +
  geom_point(color = "darkred", size = 3) +
  labs(
    title = "Female Voter Registration in Mitchell County",
    x = "Date",
    y = "Number of Female Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

This shows a steady incline following Biden dropping out and Kamala joining the race.

Now, let’s find the percent change in this category

Code
mitchell_female_changes <- mitchell_female |> 
  summarise(
    Start_Total = first(Female),
    End_Total = last(Female),
    Total_Change = last(Female) - first(Female),
    Percent_Change = ((last(Female) - first(Female)) / first(Female)) * 100
  )

There was a very minor increase with a percent change of 0.2953417.

Then, male voter count in Mitchell County

Code
mitchell_male <- read_csv(("~/Desktop/VoterRegistrationData/mitchell_male.csv"))

mitchell_male$Date <- as.Date(paste0("2024-", mitchell_male$Date), format = "%Y-%m/%d")

ggplot(mitchell_male, aes(x = Date, y = Male)) +
  geom_line(color = "red", size = 1) +
  geom_point(color = "darkred", size = 3) +
  labs(
    title = "Male Voter Registration in Mitchell County",
    x = "Date",
    y = "Number of Male Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

The largest decrease was following Biden dropping out and Kamala stepping in his place, and the largest increase was following the Harris-Trump debate.

Now, let’s find the percent change in this category

Code
mitchell_male_changes <- mitchell_male |> 
  summarise(
    Start_Total = first(Male),
    End_Total = last(Male),
    Total_Change = last(Male) - first(Male),
    Percent_Change = ((last(Male) - first(Male)) / first(Male)) * 100
  )

There was a minor increase with 0.3798481.

Now, democratic voters in Mitchell County

Code
mitchell_democratic <- read_csv(("~/Desktop/VoterRegistrationData/mitchell_democratic.csv"))

mitchell_democratic$Date <- as.Date(paste0("2024-", mitchell_democratic$Date), format = "%Y-%m/%d")

ggplot(mitchell_democratic, aes(x = Date, y = Democratic)) +
  geom_line(color = "red", size = 1) +
  geom_point(color = "darkred", size = 3) +
  labs(
    title = "Democratic Voter Registration in Mitchell County",
    x = "Date",
    y = "Number of Democratic Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

The largest increase was following the Democratic National Convention and the largest decrease was following the Harris-Trump debate.

Now, let’s find the percent change in this category

Code
mitchell_democratic_changes <- mitchell_democratic|> 
  summarise(
    Start_Total = first(Democratic),
    End_Total = last(Democratic),
    Total_Change = last(Democratic) - first(Democratic),
    Percent_Change = ((last(Democratic) - first(Democratic)) / first(Democratic)) * 100
  )

There was a pretty large increase with 0.8737864.

Now, Republican voters in Mitchell County

Code
mitchell_republican <- read_csv(("~/Desktop/VoterRegistrationData/mitchell_republican.csv"))

mitchell_republican$Date <- as.Date(paste0("2024-", mitchell_republican$Date), format = "%Y-%m/%d")

ggplot(mitchell_republican, aes(x = Date, y = Republican)) +
  geom_line(color = "red", size = 1) +
  geom_point(color = "darkred", size = 3) +
  labs(
    title = "Republican Voter Registration in Mitchell County",
    x = "Date",
    y = "Number of Republican Voters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(color = "gray50"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

There was a quite significant increase following the Harris-Trump debate and a minor decrease following Trump getting shot.

Now, let’s find the percent change in this category

Code
mitchell_republican_changes <- mitchell_republican|> 
  summarise(
    Start_Total = first(Republican),
    End_Total = last(Republican),
    Total_Change = last(Republican) - first(Republican),
    Percent_Change = ((last(Republican) - first(Republican)) / first(Republican)) * 100
  )

There was a positive increase of 0.5376344.

Conclusion

This study was quite interesting because when comparing each demographic between the two counties, there are many indirect correlations, where one point in Hertford County increases while the same point in Mitchell County decreases.