Project Statement

Ballot access is important to ensure that the population is properly represented by the election results.

Different populations favor different voting methods. Understanding these preferences may help explain why certain laws regarding voting methods are passed and how to mobilize the voter groups they affect.

This project aims to analyze and visualize the changes in voting methods between the 2020 and 2024 elections, focusing on specific population groups such as youth voters. I began with Broward County because its youth population density is larger than that of most other counties and its data is accessible.

As time went on, my understanding of this topic grew and I took on a new approach to the voting method impact. I wanted to further explore SB-90, as it

Additionally, the project assesses the impact of SB-90 on turnout by political party, particularly among registered Democrats and Republicans.

By examining shifts in voting methods within these groups, the project seeks to better understand how the implementation of SB-90 affects political participation and representation, especially in the context of emerging voting methods like mail-in and early voting.

Additional Info on SB-90:

sources: https://www.votealachua.com/Voters/Vote-by-Mail

https://www.votealachua.com/Portals/Alachua/Florida%20Senate%20Bill%2090%20fact%20Sheet.pdf?ver=-bLLUVN1ZoFwtnlvfnl3Zg%3D%3D

  1. Anyone who does not have either their last 4 digits of their Social Security Number, or their Florida Driver’s License / Florida ID# on file with our office will not be able to make a vote-by-mail request over the phone, or online. Most of these voters would have registered 15 to 20 years ago, before this information was recorded in our databases. You can solve this problem by updating your voter registration. You can follow the link to do it online here. Or you can fill out a Voter Registration Application and mail it in (for Spanish, click here). A voter registration application is also accessible here under the “Miscellaneous” heading.

  2. You must now renew your request every two years! Pursuant to Florida Statutes §101.62(a), the maximum amount of time your request can be honored is “[f]or all elections through the end of the calendar year of the next regularly scheduled general election.” In other words, please be sure to renew your request for a vote-by-mail ballot at the beginning of each odd-numbered year.

To receive vote-by-mail ballots in 2024, you must make a new request.

Code Implementation

#load the packages used throughout, then read in the dataset

library(dplyr)
library(tidyr)

data <- read.delim("BRO_EVL_43888.txt")

#creating a vote mode variable, setting all of these to E since we loaded early voting data

data <- data|>
  mutate(vote_mode = "E")

#rename data voter ID column

data <- data %>%
  rename(VoterID = FvrsVoterIdNumber, DateVoted=DateofEarlyVote)

#row-bind (e.g. bind_rows) the Broward mail-in data once approved and set vote_mode for those to M

#remove all the columns we don't need from each dataset by selecting the columns we do need
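A minimal sketch of the stacking step described above, assuming the mail-in file carries the same renamed `VoterID` and `DateVoted` columns. The two small data frames are hypothetical stand-ins, not the Broward files. Rows are stacked rather than placed side by side, so each ballot keeps its own row and its own `vote_mode`.

```r
library(dplyr)

# Hypothetical stand-ins for the early-vote and mail-in files
early <- data.frame(VoterID = c("101", "102"),
                    DateVoted = c("10/24/2024", "10/28/2024"))
mail <- data.frame(VoterID = c("103"),
                   DateVoted = c("10/20/2024"))

# Tag each source with its vote mode, then stack the rows
ballots <- bind_rows(
  early |> mutate(vote_mode = "E"),
  mail |> mutate(vote_mode = "M")
)
```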

#left join extended data using the id numbers FvrsVoterIdNumber

## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
## : EOF within quoted string
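The `EOF within quoted string` warning usually means an unmatched quote character made the reader treat several lines as one field, so rows may be silently lost. One common remedy, sketched here on a hypothetical temporary file, is to disable quote handling with `quote = ""`. Whether that suits the Broward extract is an assumption that should be checked by comparing row counts against the source file.

```r
# Hypothetical tab-delimited file with a stray quote in a name field
path <- tempfile(fileext = ".txt")
writeLines(c("VoterID\tName",
             "1\t\"Brien",
             "2\tSmith",
             "3\tJones"), path)

# quote = "" makes read.delim treat quote characters as ordinary text
fixed <- read.delim(path, quote = "", stringsAsFactors = FALSE)
nrow(fixed)
```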

#add column names

#TODO: we need the description file to cut down the data.

We compute each voter's age as of November 5th, 2024, to see how old they would be on the day of the election.

#create age column

selected_data_bro$Birth_Date <- as.Date(selected_data_bro$Birth_Date, format = "%m/%d/%Y")
target_date <- as.Date("2024-11-05")
#difference in days divided by the average year length, truncated to whole years
selected_data_bro$age <- as.integer(as.numeric(difftime(target_date, selected_data_bro$Birth_Date, units = "days")) / 365.25)
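Dividing by 365.25 is an approximation that can, for some date pairs, land a day off around a birthday. As an alternative sketch (not the method used in this project), age can be computed by comparing calendar fields directly. The helper name `age_on` is hypothetical.

```r
# Hypothetical helper: whole years completed on the target date
age_on <- function(birth, target) {
  b <- as.POSIXlt(birth)
  t <- as.POSIXlt(target)
  # Has the birthday already occurred in the target year?
  had_birthday <- (t$mon > b$mon) | (t$mon == b$mon & t$mday >= b$mday)
  as.integer(t$year - b$year - ifelse(had_birthday, 0L, 1L))
}

births <- as.Date(c("2006-11-05", "2006-11-06", "1962-08-03"))
ages <- age_on(births, as.Date("2024-11-05"))
```

A voter born on 2006-11-05 is counted as 18 on Election Day, while one born a day later is still 17.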

#join mail and early with selected_data and fill all NAs for vote mode with O for other.

selected_data_bro <- selected_data_bro %>%
  mutate(
    VoterID = as.character(VoterID)
  )

#have to set all other vote modes to O, since we cannot yet tell whether the person actually voted (have to wait for January to get this data)
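The join-and-fill step can be sketched as follows, using hypothetical toy tables in place of the registration and ballot files. It assumes `VoterID` is stored as character on both sides, which is why the conversion above matters: a left join keeps every registered voter, and anyone with no early or mail record gets `"O"`.

```r
library(dplyr)

# Hypothetical stand-ins for the registration file and the combined ballot file
registry <- data.frame(VoterID = c("101", "102", "103"))
ballots <- data.frame(VoterID = c("101", "102"),
                      vote_mode = c("E", "M"))

# Keep every registered voter; fill missing vote modes with "O" (other/unknown)
joined <- registry |>
  left_join(ballots, by = "VoterID") |>
  mutate(vote_mode = coalesce(vote_mode, "O"))
```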

#Set age to range 18-27 for new data(since we want youth votes from 2020)

#gather info for ages 18-22 as youth2024 and 23-27 as youth2020 as well in new data to compare youth vote percentage old vs new, and check 2024 youth vote type preference
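One way to assign the two youth brackets in a single pass is base R's `cut()`. This is a sketch on hypothetical ages, with the label names taken from the comment above and an extra `"older"` bucket added as an assumption.

```r
# Hypothetical 2024 ages
age <- c(18, 22, 23, 27, 40)

# Intervals are (17,22], (22,27], (27,Inf]
cohort <- cut(age,
              breaks = c(17, 22, 27, Inf),
              labels = c("youth2024", "youth2020", "older"))
```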

#make sure they're active voters

#registered voters are either vote by mail, early, or other.

#get last General election data

#see IDs that are common from current data

#select only active voters

joined_2024_data_active <- joined_2024_data |>
  filter(Voter_Status == "ACT")

# Convert Election_Date to Date type
broward_history$Election_Date <- as.Date(broward_history$Election_Date, format = "%m/%d/%Y")

Now we set up alluvial plots to show the transition of vote modes between the two elections.

library(ggplot2)
library(ggalluvial)
alluvial_data<- merge(GEN_BRO_H, joined_2024_data_active, by = "VoterID", suffixes = c("_2020", "_2024"))
str(alluvial_data)
## 'data.frame':    119397 obs. of  15 variables:
##  $ VoterID          : int  100878403 100902376 100902768 100903734 100904859 100905886 100906259 100906309 100906415 100907729 ...
##  $ County           : chr  "BRO" "BRO" "BRO" "BRO" ...
##  $ Election_Date    : Date, format: "2020-11-03" "2020-11-03" ...
##  $ Election_type    : chr  "GEN" "GEN" "GEN" "GEN" ...
##  $ VoteCode         : chr  "A" "Y" "E" "A" ...
##  $ vote_mode_2020   : chr  "M" "P" "E" "M" ...
##  $ Party_Affiliation: chr  "DEM" "REP" "NPA" "NPA" ...
##  $ Voter_Status     : chr  "ACT" "ACT" "ACT" "ACT" ...
##  $ Precinct         : chr  "J009" "B001" "X022" "P001" ...
##  $ Birth_Date       : Date, format: "1962-08-03" "1961-11-25" ...
##  $ Registration_Date: chr  "06/15/2004" "12/10/2004" "12/22/2004" "03/29/1972" ...
##  $ age              : int  62 62 45 86 53 37 37 85 71 58 ...
##  $ vote_mode_2024   : chr  "M" "E" "E" "M" ...
##  $ DateVoted        : chr  "20008" "10/24/2024" "10/28/2024" "20025" ...
##  $ AbsParty         : chr  "DEM" "REP" "NPA" "NPA" ...
alluvial_prep <- alluvial_data |>
  select(VoterID, vote_mode_2020, vote_mode_2024) |>
  group_by(vote_mode_2020, vote_mode_2024) |>
  summarise(count = n()) |>
  ungroup() |>
  mutate(
    vote_mode_2020 = recode(vote_mode_2020,
                            "E" = "Early Vote",
                            "M" = "Mail Vote",
                            "P" = "In Person",
                            "O" = "Other"),
    vote_mode_2024 = recode(vote_mode_2024,
                            "E" = "Early Vote",
                            "M" = "Mail Vote",
                            "P" = "In Person",
                            "O" = "Other")
  )
## `summarise()` has grouped output by 'vote_mode_2020'. You can override using
## the `.groups` argument.

#this doesn't work below because the ages are as of 2024, so the 2020 youth cohort (18-22) must be shifted by 4 years to 22-26. must edit to fix.

###alluvial_prep_youth<-alluvial_data|>
 ### filter(age >= 18, age <= 22)|>
 ### select(VoterID, vote_mode_2020, vote_mode_2024) |>
 ### group_by(vote_mode_2020, vote_mode_2024) |>
  ###summarise(count = n()) |>
  ###ungroup()

Here I begin plotting the graphs. I chose alluvial graphs to show how each age group's voting methods shifted between elections. One issue I ran into was the age calculation: the data only records each voter's age as of 2024, not as of the 2020 election. To work around this, I add 4 years to the 2020 target ages and filter on the 2024 age, so the 18-22 cohort of 2020 becomes ages 22-26 in 2024. Note: “Other” covers any method not recorded as “Early Vote” or “Mail Vote” (it could also mean the person did not vote). This is because the voter history for the 2024 election has not yet been released to the public. Once that information is available, I will update this project.
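The age shift can be checked with a few lines of arithmetic. This is a sketch on hypothetical ages, assuming the two election days are close enough to four years apart that subtracting 4 recovers the 2020 age.

```r
# Hypothetical ages as of the 2024 election
age_2024 <- c(21, 22, 26, 27)

# Approximate age at the 2020 election
age_2020 <- age_2024 - 4

# Youth in 2020 (18-22) is the same group as ages 22-26 in 2024
in_youth_2020 <- age_2020 >= 18 & age_2020 <= 22
```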

alluvial_prep_youth <- alluvial_data |>
  filter(age >= 22, age <= 26) |>
  select(VoterID, vote_mode_2020, vote_mode_2024) |>
  group_by(vote_mode_2020, vote_mode_2024) |>
  summarise(count = n()) |>
  ungroup() |>
  mutate(
    vote_mode_2020 = recode(vote_mode_2020,
                            "E" = "Early Vote",
                            "M" = "Mail Vote",
                            "P" = "In Person",
                            "O" = "Other"),
    vote_mode_2024 = recode(vote_mode_2024,
                            "E" = "Early Vote",
                            "M" = "Mail Vote",
                            "P" = "In Person",
                            "O" = "Other")
  )
## `summarise()` has grouped output by 'vote_mode_2020'. You can override using
## the `.groups` argument.

#just repeated process for all other age groups

## `summarise()` has grouped output by 'vote_mode_2020'. You can override using
## the `.groups` argument.
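Since the same steps are repeated for every age group, they could be wrapped in one function. The sketch below is an assumption about how that might look, not the code actually run for this report. The name `prep_alluvial` and the toy data are hypothetical, and `count()` is swapped in for `group_by()` plus `summarise()` because it returns ungrouped output without the message.

```r
library(dplyr)

# Display labels for the vote mode codes
mode_labels <- c(E = "Early Vote", M = "Mail Vote", P = "In Person", O = "Other")

# Hypothetical helper: transition counts for voters whose 2024 age is in [min_age, max_age]
prep_alluvial <- function(df, min_age, max_age) {
  df |>
    filter(age >= min_age, age <= max_age) |>
    count(vote_mode_2020, vote_mode_2024, name = "count") |>
    mutate(across(c(vote_mode_2020, vote_mode_2024),
                  ~ unname(mode_labels[.x])))
}

# Toy data standing in for alluvial_data
toy <- data.frame(age = c(22, 23, 30, 25),
                  vote_mode_2020 = c("M", "M", "E", "E"),
                  vote_mode_2024 = c("M", "O", "E", "E"))
youth <- prep_alluvial(toy, 22, 26)
```

Each remaining age group then needs only one call with its own bounds.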

Graphical Output!

#alluvial chart of vote mode transitions

ggplot(alluvial_prep, aes(axis1 = vote_mode_2020, axis2 = vote_mode_2024, y = count)) +
  geom_alluvium(aes(fill = vote_mode_2020), width = 0.2) +
  geom_stratum(width = 0.3, fill = "gray") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 3) +
  scale_x_discrete(limits = c("2020", "2024")) +
  scale_fill_manual(
    values = c("Early Vote" = "blue", "Mail Vote" = "green", 
               "In Person" = "orange", "Other" = "purple")
  ) +
  theme_minimal() +
  labs(
    title = "Change in Voting Methods from 2020 Election to 2024 Election",
    x = "Year",
    y = "Number of Voters",
    fill = "Voting Method"
  )
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.

Here we can see that Broward voters overall favored the mail vote in 2020. The majority of 2024 mail voters were returning mail voters, and the majority of 2020 early voters also voted early again in 2024. A large portion of the 2020 mail vote flows into “Other”, which we cannot break down. This is open to interpretation, but it could mean that these voters dropped off and did not vote.
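Claims like these can be backed by numbers as well as by the plot. The sketch below uses hypothetical counts, not the Broward results, to compute two shares: where each 2020 group went in 2024, and where each 2024 group came from.

```r
library(dplyr)

# Hypothetical transition counts in the same shape as alluvial_prep
transitions <- data.frame(
  vote_mode_2020 = c("Mail Vote", "Mail Vote", "Mail Vote", "Early Vote", "Early Vote"),
  vote_mode_2024 = c("Mail Vote", "Early Vote", "Other", "Early Vote", "Other"),
  count = c(50, 10, 40, 60, 40)
)

# Share of each 2020 group that ended up in each 2024 mode
retention <- transitions |>
  group_by(vote_mode_2020) |>
  mutate(share_of_2020 = count / sum(count)) |>
  ungroup()

# Share of each 2024 group that came from each 2020 mode
composition <- transitions |>
  group_by(vote_mode_2024) |>
  mutate(share_of_2024 = count / sum(count)) |>
  ungroup()
```

In this toy table, half of the 2020 mail voters stay with mail, and all of the 2024 mail voters are returning mail voters.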

# Plot the filtered data
ggplot(alluvial_prep_youth, aes(axis1 = vote_mode_2020, axis2 = vote_mode_2024, y = count)) +
  geom_alluvium(aes(fill = vote_mode_2020), width = 0.2) +
  geom_stratum(width = 0.3, fill = "gray") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 3) +
  scale_x_discrete(limits = c("2020", "2024")) +
  scale_fill_manual(
    values = c("Early Vote" = "blue", "Mail Vote" = "green", 
               "In Person" = "orange", "Other" = "purple")
  ) +
  theme_minimal() +
  labs(
    title = "Change in Voting Methods from 2020 to 2024 (Ages 18-22 in 2020)",
    x = "Year",
    y = "Number of Voters",
    fill = "Vote Mode in 2020"
  )
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.

Here, it looks like the majority of 2020 votes in this cohort were cast early or by mail. In 2024, many of the 2020 mail voters ended up in the “Other” category, but the majority of the 2024 mail vote comes from returning mail voters. The same holds for early voters from 2020 to 2024. A large proportion of each 2020 vote method drops into the “Other” category. Until we know more about the “Other” category, we cannot draw any inferences.

A similar pattern continues in the other age groups, as shown below:

## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.

For ages 56 and up, however, we can see that a larger proportion of people voted by mail in 2020.

#We can ask these questions: are the returning voters from 2020 dominating in mail-in ballots, early votes, or other? (we cannot be sure that other is a completed vote)

Returning voters seem to be dominating the early vote method, or showing a transition into “Other”.

The massive dropoff in mail votes for each age group can likely be explained, at least in part, by SB-90. Many previous mail voters may not have known about the new expiration rule for vote-by-mail requests and did not receive a ballot in time to cast their vote, hence joining the “Other” category.

These two graphs below take a closer look at the youth population.

ggplot(youth_combined, aes(x = factor(year), y = count, fill = vote_mode)) +
  geom_bar(stat = "identity") +
  labs(title = "Youth Voting Methods Comparison: 2020 vs 2024",
       x = "Year",
       y = "Number of Voters",
       fill = "Voting Method") +
  theme_minimal()

youth_filtered <- youth_combined %>% 
  filter(vote_mode %in% c("M", "E"))

ggplot(youth_filtered, aes(x = factor(year), y = count, fill = vote_mode)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Youth Voting Methods: Mail vs Early Voting (2020 vs 2024)",
       x = "Year",
       y = "Number of Voters",
       fill = "Voting Method") +
  theme_minimal()

Here, youth is defined as ages 18-22. There is a massive dropoff in the number of youth mail and early votes, which may reflect lower youth turnout and could explain part of the transition into the “Other” category.

We can see mail ballots dominated 2020 while early votes dominated 2024 (could be explained by COVID-19 increasing the use of mail ballots, or SB-90 decreasing the use of mail ballots).

library(dplyr)
library(tidyr) # provides pivot_longer()

# Modify Party_Affiliation column
alluvial_data <- alluvial_data %>%
  mutate(Party_Affiliation = case_when(
    Party_Affiliation %in% c("DEM", "REP") ~ Party_Affiliation,
    TRUE ~ "OTHER"
  ))

#compare 2024 voting methods to registered party(Democrat vs republican vs other)

# Reshape alluvial_data
reshaped_data <- alluvial_data %>%
  pivot_longer(cols = c(vote_mode_2020, vote_mode_2024),
               names_to = "year",
               values_to = "vote_mode") %>%
  mutate(year = ifelse(year == "vote_mode_2020", "2020", "2024"))


# Calculate percentages
party_vote_percentage <- reshaped_data %>%
  group_by(Party_Affiliation, vote_mode, year) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(Party_Affiliation, year) %>%
  mutate(percentage = (count / sum(count)) * 100)


# Plot the percentages
ggplot(party_vote_percentage, aes(x = Party_Affiliation, y = percentage, fill = vote_mode)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_manual(
    values = c("E" = "blue", "M" = "green", "P" = "orange", "O" = "purple"),
    labels = c("E" = "Early Vote", "M" = "Mail Vote", "P" = "In Person", "O" = "Other")
  ) +
  facet_wrap(~year, ncol = 1) + # Separate plots for 2020 and 2024
  labs(
    title = "Voting Method Percentages by Party Affiliation",
    x = "Party Affiliation",
    y = "Percentage",
    fill = "Voting Method"
  ) +
  theme_minimal()

In the graph above, we take a high-level look at how voting methods were distributed among parties. In 2020, Democrats dominated the mail ballot: a majority of Democratic votes were cast by mail, followed by other parties and then the Republican Party. In 2024, the percentage of mail ballots cast by Democrats decreased significantly. The “Other” category has grown because we cannot separate in-person votes, non-voters, and other miscellaneous categories. What we can see is that the percentage of early votes stayed almost flat between 2020 and 2024.

The striking drop in mail votes may be explained by the post-pandemic drop-off and the implementation of SB-90, but we will get a better look at this with the more detailed data that will be released in 2025.

Closing Remarks

I wish to continue this project by using updated data with accurate voting methods (in-person and non-voter denotation) to get a more precise look at these patterns and come to a conclusion about age group voting method trends and the effect of SB-90 on the 2024 election.