This project examines the impact of same-day registration (SDR) on voter turnout in subsequent elections. Specifically, it focuses on voters in North Carolina who used SDR during the 2020 general election, which allowed them to register and vote after the traditional registration deadline. By comparing this group to individuals who registered during the same period but did not use SDR, and were therefore unable to vote in 2020, the analysis seeks to determine whether SDR participants were more likely to vote in the 2024 general election. The results show how SDR policies relate to voting habits over time and help clarify how making elections more accessible can encourage people to stay engaged in voting.
For this analysis, I utilized two data sets from the state of North Carolina: the statewide voter history file and the statewide voter file.
The voter history file contains records of voters’ participation in past elections, including details about the election, voting method, and party affiliation during voting. The voter file includes information about registered voters, such as demographic details, party affiliation, and registration date.
To focus the analysis on the most relevant information, I only selected columns necessary for examining voter turnout and demographic trends. This includes the unique voter identifier (ncid), the voter’s registration date, election history (whether the voter participated in certain elections), and demographic information (such as age, race, and party affiliation).
The columns were selected using the following code. Note that the object names are the reverse of what they hold: voter_registration stores the voter history file, and voter_history stores the voter file.
Reading and selecting relevant columns from the voter history file:
library(readr)
voter_registration <- read_tsv("/Users/anabella/Documents/project/ncvhis_statewide.txt", col_select = c("ncid", "election_lbl", "election_desc", "voting_method", "voted_party_cd", "vtd_label", "county_id", "county_desc"))
Reading and selecting relevant columns from the voter file:
voter_history <- read_tsv("/Users/anabella/Documents/project/ncvoter_statewide.txt", col_select = c("ncid", "county_id", "county_desc", "birth_year", "age_at_year_end", "race_code", "ethnic_code", "party_cd", "gender_code", "registr_dt"))
The datasets were merged using a left join on the ncid column, which serves as the unique identifier for each voter. This ensured that all records from the voter file were retained, even if there was no corresponding entry in the voter history file. The code for merging is shown below:
merged_data <- voter_history %>% left_join(voter_registration, by = "ncid")
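To make the behavior of the left join concrete, here is a minimal sketch on made-up data. It uses base R's merge() with all.x = TRUE as a stand-in for left_join(); the data frames voters and history and all of their values are hypothetical, not taken from the North Carolina files.

```r
# Toy stand-ins for the two files (all values are invented for illustration).
voters <- data.frame(
  ncid = c("A1", "B2", "C3"),
  registr_dt = c("10/20/2020", "01/05/2016", "10/25/2020"),
  stringsAsFactors = FALSE
)
history <- data.frame(
  ncid = c("A1", "A1", "B2"),
  election_lbl = c("11/03/2020", "11/05/2024", "11/03/2020"),
  stringsAsFactors = FALSE
)

# Base-R counterpart of voters %>% left_join(history, by = "ncid"):
# every voter is kept, and voters with no history get NA in the history columns.
joined <- merge(voters, history, by = "ncid", all.x = TRUE)
print(joined)
```

Voter C3 has no history record, so it survives the join with a missing election_lbl, while A1 appears once per election. This one-row-per-voter-per-election shape matters for any later step that counts rows.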
To identify same-day registrants (SDR), I first converted the registr_dt column from a character format to a Date format for easier comparison. The registr_dt column contains the date a voter registered, and by converting it, I could filter the data based on a specific registration deadline. I then defined the registration deadline for the 2020 election as October 9, 2020. Finally, I filtered the data to select voters who registered after this deadline and who voted in the election (as indicated by the non-NA values in the voted_party_cd column).
sdr_users <- merged_data %>% filter(registr_dt > registration_deadline & !is.na(voted_party_cd))
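The date conversion and the deadline definition described above are not shown in the code, so the following is only a sketch of what they might look like. It assumes registr_dt is stored as text in MM/DD/YYYY form; the toy data frame and its values are hypothetical.

```r
# Hypothetical toy rows; registr_dt is ASSUMED to be "MM/DD/YYYY" text.
toy <- data.frame(
  ncid = c("A1", "B2", "C3"),
  registr_dt = c("10/20/2020", "09/30/2020", "10/25/2020"),
  voted_party_cd = c("DEM", "REP", NA),
  stringsAsFactors = FALSE
)

# Convert the character column to Date so comparisons are chronological.
toy$registr_dt <- as.Date(toy$registr_dt, format = "%m/%d/%Y")

# Registration deadline for the 2020 general election, as stated in the text.
registration_deadline <- as.Date("2020-10-09")

# Keep rows registered after the deadline that also have a recorded vote.
toy_sdr <- toy[toy$registr_dt > registration_deadline & !is.na(toy$voted_party_cd), ]
print(toy_sdr)
```

Only A1 passes both conditions: B2 registered before the deadline, and C3 has no recorded vote.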
The histogram of the age distribution for same-day registrants (SDRs) was created using the ggplot2 package. First, I loaded the package and used the ggplot() function to plot the age data. The geom_histogram() function was applied to visualize the distribution, with the age at year-end as the variable of interest. To customize the plot, I set the bin width to 1 and defined color and fill properties. Additionally, the x-axis was limited to ages between 18 and 100 to focus on the relevant age range for SDRs.
ggplot(sdr_users, aes(x = age_at_year_end)) + geom_histogram(binwidth = 1, fill = "#FFC0CB", color = "black", alpha = 0.7) + scale_x_continuous(limits = c(18, 100))
The histogram shows a downward trend, indicating that younger individuals were more likely to use same-day registration. While not every age group follows this pattern, the overall trend shows a decrease in SDR usage as age increases.
The pie chart visualizes the political party breakdown of SDR users. Before plotting, the party codes were recoded into readable labels:
mutate(voted_party_cd = case_when(toupper(voted_party_cd) == "REP" ~ "Republican", toupper(voted_party_cd) == "DEM" ~ "Democrat", toupper(voted_party_cd) == "UNA" ~ "Independent", TRUE ~ "Other"))
This ensured consistency in labeling before grouping and summarizing the data.
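The grouping and summarizing step is not shown, so here is a minimal base-R sketch of how a party_breakdown table with a Count column could be produced. The party vector is made up, and the Percent column is an added assumption for the text labels.

```r
# Made-up labels standing in for the recoded voted_party_cd column.
party <- c("Republican", "Democrat", "Democrat", "Independent", "Republican", "Other")

# Count rows per label, then store the result as a data frame.
counts <- table(party)
party_breakdown <- data.frame(
  voted_party_cd = names(counts),
  Count = as.integer(counts),
  stringsAsFactors = FALSE
)

# Share of each label, rounded to one decimal place for plot labels.
party_breakdown$Percent <- round(100 * party_breakdown$Count / sum(party_breakdown$Count), 1)
print(party_breakdown)
```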
ggplot(party_breakdown, aes(x = "", y = Count, fill = voted_party_cd)) + geom_bar(stat = "identity", width = 1, color = "white") + coord_polar(theta = "y")
This visualization highlights the proportion of SDR users in each party, with text labels added to show the percentage breakdown for clarity.
The pie chart reveals that Republicans and Democrats made up nearly equal shares of SDR users. This suggests that same-day registration was used at comparable rates by voters of both major political parties.
This pie chart was created to display the percentage of SDR users who identify as Black, White, or neither. First, I categorized users by race using the mutate() function with case_when(), assigning “White” for race code “W”, “Black” for race code “B”, and “Does not Identify with Either” for all other values. This ensured clear labels for each group:
mutate(race_category = case_when(race_code == "W" ~ "White", race_code == "B" ~ "Black", TRUE ~ "Does not Identify with Either"))
Next, I grouped the data by race category and calculated the percentage for each group. Finally, I used ggplot() to create a polar bar chart, transforming it into a pie chart with coord_polar(theta = “y”), and added percentages as text labels for clarity. The colors were customized using scale_fill_manual().
The majority of same-day registrants identify as White. However, the proportions align closely with the racial demographics of the United States population.
This code was used to calculate the voter turnout in the 2024 General Election for same-day registrants (SDR) from the 2020 General Election. First, it filters the data for the 2024 election and identifies who voted based on their voting method. Then, it selects the 2020 SDR users and calculates the turnout by comparing the number of SDR users who voted in 2024 to the total number of SDR users.
Step 1: Filter the dataset for the November 2024 General Election
merged_data_2024 <- merged_data[merged_data$election_lbl == "11/05/2024", ]
merged_data_2024$voted_2024 <- ifelse(merged_data_2024$voting_method != "", 1, 0)
sdr_users_2020 <- sdr_users[sdr_users$ncid %in% merged_data_2024$ncid, ]
turnout_2024_sdr <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% sdr_users_2020$ncid]) / nrow(sdr_users_2020)
This code identifies late registrants of the 2020 General Election who were not able to vote and did not use same-day registration (SDR). It then calculates the voter turnout in 2024 for these non-voting late registrants by filtering for their participation in the 2024 General Election and computing their turnout rate.
merged_data_2020 <- merged_data %>% filter(election_lbl == "11/03/2020")
non_voters_2020hopefully <- late_registrants_2020[!late_registrants_2020$ncid %in% merged_data_2020$ncid, ]
merged_data_2024 <- merged_data %>% filter(election_lbl == "11/05/2024")
merged_data_2024$voted_2024 <- ifelse(merged_data_2024$voting_method != "", 1, 0)
non_voters_2020_2024 <- merged_data_2024[merged_data_2024$ncid %in% non_voters_2020hopefully$ncid, ]
turnout_2024_non_voters <- sum(non_voters_2020_2024$voted_2024) / nrow(non_voters_2020hopefully)
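Both turnout figures are the same kind of quantity: the share of a group's voters who cast a ballot in 2024. The sketch below expresses that as one small function. turnout_rate() is a hypothetical helper rather than the code used in this analysis, and unlike the code above it counts each ncid only once, which is an assumption about how duplicate rows should be handled.

```r
# Hypothetical helper: share of a group's voters who appear among 2024 voters.
turnout_rate <- function(group_ids, voter_ids_2024) {
  group_ids <- unique(group_ids)        # count each voter once
  mean(group_ids %in% voter_ids_2024)   # proportion of the group that voted
}

# Made-up IDs; "A1" appears twice to mimic one row per election in merged data.
sdr_ids <- c("A1", "A1", "B2", "C3", "D4")
voted_2024_ids <- c("A1", "C3", "Z9")

rate <- turnout_rate(sdr_ids, voted_2024_ids)
print(rate)
```

Here two of the four distinct voters (A1 and C3) voted in 2024, so the rate is 0.5. Without the unique() step, the duplicated A1 row would inflate the denominator.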
This bar graph compares the voter turnout in the 2024 general election between Same-Day Registrants and non-voters from the 2020 general election. It highlights the difference in turnout percentages for each group.
turnout_data <- data.frame(group = c("SDR Users", "Non-Voters"), turnout = c(turnout_2024_sdr, turnout_2024_non_voters))
The ggplot() function then generates a bar graph, where aes(x = group, y = turnout * 100, fill = group) maps the groups to the x-axis and their turnout percentages to the y-axis, while geom_bar(stat = “identity”) creates the bars. To ensure the y-axis covers the entire percentage range, expand_limits(y = c(0, 100)) is added, which extends the y-axis to span from 0% to 100%.
This table compares 2024 turnout between 2020 SDR voters and 2020 non-voters. It displays the turnout percentage for each group and the percentage by which SDR voters were more likely to vote in 2024, calculated as the difference between the two turnout rates divided by the non-voter turnout rate:
percentage_more_likely_all <- ((turnout_2024_sdr - turnout_2024_non_voters) / turnout_2024_non_voters) * 100
turnout_table_all <- data.frame(Group = c("SDR Voters (2020)", "Non-Voters (2020)"), Turnout_2024 = c(paste0(round(turnout_2024_sdr * 100, 1), "%"), paste0(round(turnout_2024_non_voters * 100, 1), "%")), `% More Likely to Vote in 2024` = c(paste0(round(percentage_more_likely_all, 1), "%"), "NA"), check.names = FALSE)
Group | Turnout_2024 | % More Likely to Vote in 2024 |
---|---|---|
SDR Voters (2020) | 39.9% | 262.2% |
Non-Voters (2020) | 11% | NA |
The bar graph and table demonstrate that individuals who used Same-Day Registration (SDR) in 2020 and were able to vote that year were significantly more likely to vote in the 2024 election than those who registered late in 2020 and were unable to vote. In 2024, 39.9% of SDR users voted, while only 11% of non-voters did. Specifically, the table shows that SDR users were 262.2% more likely to vote in 2024 than non-voters, which corresponds to about 3.6 times as likely.
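The two ways of stating the gap are easy to mix up, so here is a short worked check using the rounded turnout figures from the table. A relative increase of p percent corresponds to a ratio of 1 + p/100, not p/100. The variable names are illustrative only.

```r
# Rounded turnout rates taken from the table above.
sdr_rate <- 0.399
non_voter_rate <- 0.110

# Relative increase ("percent more likely").
pct_more <- (sdr_rate - non_voter_rate) / non_voter_rate * 100

# Ratio of the two rates ("times as likely").
times_as_likely <- sdr_rate / non_voter_rate

print(round(pct_more, 1))
print(round(times_as_likely, 2))
```

With the rounded inputs the relative increase comes out near 262.7% rather than the reported 262.2%, because the table value was computed from unrounded rates. Either way the ratio is about 3.6, which is the relative increase divided by 100 plus one.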
sdr_users_rep <- sdr_users_2020[sdr_users_2020$voted_party_cd == "REP", ]
sdr_users_dem <- sdr_users_2020[sdr_users_2020$voted_party_cd == "DEM", ]
sdr_users_una <- sdr_users_2020[sdr_users_2020$voted_party_cd == "UNA", ]
turnout_2024_sdr_rep <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% sdr_users_rep$ncid]) / nrow(sdr_users_rep)
turnout_2024_sdr_dem <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% sdr_users_dem$ncid]) / nrow(sdr_users_dem)
turnout_2024_sdr_una <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% sdr_users_una$ncid]) / nrow(sdr_users_una)
non_sdr_users_rep <- non_voters_2020hopefully[non_voters_2020hopefully$voted_party_cd == "REP", ]
non_sdr_users_dem <- non_voters_2020hopefully[non_voters_2020hopefully$voted_party_cd == "DEM", ]
non_sdr_users_una <- non_voters_2020hopefully[non_voters_2020hopefully$voted_party_cd == "UNA", ]
turnout_2024_non_sdr_rep <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% non_sdr_users_rep$ncid]) / nrow(non_sdr_users_rep)
turnout_2024_non_sdr_dem <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% non_sdr_users_dem$ncid]) / nrow(non_sdr_users_dem)
turnout_2024_non_sdr_una <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% non_sdr_users_una$ncid]) / nrow(non_sdr_users_una)
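The per-party calculations above repeat the same pattern six times. As an illustration only, the same idea can be written once and applied to each party code with sapply(). The data below are made up, and this sketch explicitly skips rows whose party code is missing, which is an assumption the original code does not make.

```r
# Hypothetical toy group; column names mirror those used above.
group <- data.frame(
  ncid = c("A1", "B2", "C3", "D4", "E5"),
  voted_party_cd = c("REP", "REP", "DEM", "DEM", "UNA"),
  stringsAsFactors = FALSE
)
voted_2024_ids <- c("A1", "C3", "D4")   # made-up IDs of 2024 voters

# One turnout rate per party code, named by party.
turnout_by_party <- sapply(c(REP = "REP", DEM = "DEM", UNA = "UNA"), function(p) {
  ids <- group$ncid[!is.na(group$voted_party_cd) & group$voted_party_cd == p]
  mean(ids %in% voted_2024_ids)
})
print(turnout_by_party)
```

The result is a named numeric vector, so the SDR and non-SDR groups can each be summarized with one call and then placed side by side in a table.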
To produce this graph, I created a data frame containing turnout percentages for each group (SDR and non-SDR voters) across three political affiliations: Democrats, Republicans, and Independents. I used the geom_bar() function to create grouped bar graphs with custom colors for each party and SDR status. Additionally, the geom_text() function was used to add percentage labels on top of the bars, and the ylim(0, 100) function ensured the y-axis spans from 0 to 100% for better comparison.
geom_bar(stat = "identity", position = position_dodge(width = 0.7), width = 0.6)
was used to create grouped bar graphs, where each bar represents a turnout percentage. The stat = “identity” argument ensures the heights of the bars correspond directly to the values in the data, while position_dodge() spaces the bars within each group for clear comparison.
ylim(0, 100)
was used to set the y-axis range from 0% to 100%, ensuring the different sizes of the bars are represented accurately relative to the full turnout percentage scale.
I will now explain how the following table was created, which displays voter turnout by party and SDR status, along with the calculated differences in voting likelihood between SDR and non-SDR groups.
percentage_more_likely_rep <- ((turnout_2024_sdr_rep - turnout_2024_non_sdr_rep) / turnout_2024_non_sdr_rep) * 100
This line calculates the percentage difference in voting likelihood between SDR and non-SDR Republicans. It computes the difference between the turnout of SDR Republicans and non-SDR Republicans, then divides it by the non-SDR Republicans’ turnout and multiplies by 100 to express it as a percentage. This step was repeated for Democrats and Independents.
The data.frame() function is then used to organize the pre-calculated values into a table. It combines the party names, their respective voter turnout values, and the calculated percentage differences into a single data structure for easy display.
kable(party_data, caption = "Voter Turnout by Party and SDR Status", col.names = c("Party", "Voter Turnout (2024)", "(%) More Likely to Vote in 2024"), align = c("l", "c", "c"))
Party | Voter Turnout (2024) | (%) More Likely to Vote in 2024 |
---|---|---|
Republicans (SDR) | 42.9% | 242.1% |
Republicans (Non-SDR) | 12.5% | - |
Democrats (SDR) | 40.9% | 620.4% |
Democrats (Non-SDR) | 5.7% | - |
Independents (SDR) | 58.4% | 293.7% |
Independents (Non-SDR) | 14.8% | - |
The bar graph and table both illustrate the significant impact of Same-Day Registration (SDR) on voter turnout in 2024, comparing the likelihood of voting between SDR and non-SDR groups in each political party. From the bar graph, we observe that individuals who used SDR in 2020 were notably more likely to vote in 2024 compared to those who were non-voters in 2020. Specifically, the voter turnout for Republicans who used SDR was 42.9%, while Republicans who were non-voters in 2020 had a turnout of 12.5%. For Democrats, 40.9% of SDR users voted, while only 5.7% of non-voters in 2020 cast their ballots. Among Independents, 58.4% of SDR users voted, compared to 14.8% of non-voters.
The table reinforces these findings, showing the percentage increase in likelihood to vote for SDR users compared to non-voters. For example, SDR Republicans were 242.1% more likely to vote than non-voter Republicans (about 3.4 times as likely), SDR Democrats were 620.4% more likely to vote than non-voter Democrats (about 7.2 times as likely), and SDR Independents were 293.7% more likely to vote than non-SDR Independents (about 3.9 times as likely).
When comparing the likelihood to vote across these three political groups, we see a clear trend:
Democrats benefit the most from SDR, with an increase in likelihood to vote of 620.4%. Independents come next, with a 293.7% increase in voter turnout. Republicans show the smallest increase, at 242.1%.
In conclusion, while Same-Day Registration positively impacts voter turnout across all three groups—Republicans, Democrats, and Independents—it most strongly influences voter turnout among Democrats. This suggests that SDR has a particularly powerful effect in mobilizing Democratic voters for the 2024 general election.
sdr_users_white <- sdr_users_2020[sdr_users_2020$race_code == "W", ]
sdr_users_black <- sdr_users_2020[sdr_users_2020$race_code == "B", ]
turnout_2024_sdr_white <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% sdr_users_white$ncid]) / nrow(sdr_users_white)
turnout_2024_sdr_black <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% sdr_users_black$ncid]) / nrow(sdr_users_black)
non_sdr_users_white <- non_voters_2020hopefully[non_voters_2020hopefully$race_code == "W", ]
non_sdr_users_black <- non_voters_2020hopefully[non_voters_2020hopefully$race_code == "B", ]
turnout_2024_non_sdr_white <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% non_sdr_users_white$ncid]) / nrow(non_sdr_users_white)
turnout_2024_non_sdr_black <- sum(merged_data_2024$voted_2024[merged_data_2024$ncid %in% non_sdr_users_black$ncid]) / nrow(non_sdr_users_black)
The bar graph displays the difference in voter turnout in 2024 based on race and Same-Day Registration (SDR) status. It compares the turnout rates of SDR users and non-SDR voters among different racial groups, showing the impact of SDR on voter participation amongst different races.
The data preparation and plotting involved a few key steps.
data <- data.frame(Group = c("SDR White", "Non-SDR White", "SDR Black", "Non-SDR Black"), Turnout = c(turnout_2024_sdr_white * 100, turnout_2024_non_sdr_white * 100, turnout_2024_sdr_black * 100, turnout_2024_non_sdr_black * 100), Race = c("White", "White", "Black", "Black"), SDR_Status = c("SDR", "Non-SDR", "SDR", "Non-SDR"))
The Group column defines the combination of race and SDR status, while the Turnout column represents the voter turnout percentage.
Next, ggplot2 was used to create the bar graph. The aes() function defined the variables for the x-axis (Group), y-axis (Turnout), and fill color (interaction between race and SDR status). The geom_bar() function created the bar graph, with stat = “identity” ensuring that the bars represent the actual values in the Turnout column.
ggplot(data, aes(x = Group, y = Turnout, fill = interaction(Race, SDR_Status))) + geom_bar(stat = "identity", position = position_dodge(width = 0.7), width = 0.6)
The scale_fill_manual() function was used to assign specific colors to each combination of race and SDR status. This allows us to differentiate the groups clearly. The geom_text() was used to display the turnout percentages directly on the bars, and labs() added titles and axis labels for clarity.
This code calculates and creates a table that shows the difference in voter turnout for different racial groups based on their use of Same-Day Registration (SDR) in 2020.
((turnout_SDR - turnout_non_SDR) / turnout_non_SDR) * 100
Next, the code creates a data frame called race_data to store the calculated values and the original voter turnout data for both SDR and non-SDR groups. The columns in this data frame include “Race,” “Voter Turnout (2024),” and “Difference (%),” with turnout values formatted as percentages to one decimal place.
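The construction of race_data is not shown, so the following is a sketch of one way it might be assembled, using the rounded turnout values reported for this section. The helper names pct_more_likely() and fmt_pct(), the object race_data_sketch, and its column names are hypothetical. Formatting with sprintf("%.1f%%", x) keeps a trailing zero, so a value of 7 prints as 7.0% instead of 7%.

```r
# Hypothetical helpers for the relative increase and for percent formatting.
pct_more_likely <- function(sdr, non_sdr) (sdr - non_sdr) / non_sdr * 100
fmt_pct <- function(x) sprintf("%.1f%%", x)

# Rounded turnout percentages as reported in this section's table.
turnout <- c(white_sdr = 39.2, white_non = 10.7, black_sdr = 38.4, black_non = 7.0)

race_data_sketch <- data.frame(
  Race = c("White (SDR)", "White (Non-SDR)", "Black (SDR)", "Black (Non-SDR)"),
  Turnout = fmt_pct(unname(turnout)),
  More_Likely = c(
    fmt_pct(pct_more_likely(turnout[["white_sdr"]], turnout[["white_non"]])), "-",
    fmt_pct(pct_more_likely(turnout[["black_sdr"]], turnout[["black_non"]])), "-"
  ),
  stringsAsFactors = FALSE
)
print(race_data_sketch)
```

Because the inputs here are already rounded, the computed increases differ slightly from the reported 264.7% and 447.4%, which were presumably derived from unrounded turnout rates.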
kable(race_data, caption = "Voter Turnout by Race and SDR Status", col.names = c("Race", "Voter Turnout (2024)", "(%) More Likely to Vote in 2024"), align = c("l", "c", "c"))
Race | Voter Turnout (2024) | (%) More Likely to Vote in 2024 |
---|---|---|
White (SDR) | 39.2% | 264.7% |
White (Non-SDR) | 10.7% | - |
Black (SDR) | 38.4% | 447.4% |
Black (Non-SDR) | 7% | - |
The bar graph and table both highlight the significant impact of Same-Day Registration (SDR) on voter turnout in 2024 based on race. For both White and Black voters, SDR users were much more likely to vote than 2020 non-voters. Specifically, turnout for White SDR voters was 39.2%, compared to 10.7% for White non-voters, and turnout for Black SDR voters was 38.4%, compared to 7% for Black non-voters.
The table shows that White SDR voters were 264.7% more likely to vote than non-SDR White voters (about 3.6 times as likely), while Black SDR voters were 447.4% more likely to vote than non-SDR Black voters (about 5.5 times as likely). These findings indicate that while SDR has a significant impact on voter turnout for both racial groups, the effect is much stronger for Black voters than for White voters.
The following table shows the voter turnout rates in 2024 for each age group, comparing non-SDR users and SDR users. It also displays the percentage by which SDR users were more likely to vote in 2024 compared to non-SDR users within each age group.
This code performs several tasks related to analyzing voter turnout by age group, specifically comparing the turnout between SDR (Same-Day Registration) users and non-SDR users in 2024. It categorizes users into age groups, calculates voter turnout rates for each group, and generates a table with the results. The final section of the code creates a new variable that shows how much more likely SDR users are to vote compared to non-SDR users, expressed as a factor (e.g., 1.5 times more likely).
The first part of the code categorizes SDR users and non-SDR users into different age groups using the categorize_age function. For each user, the age_at_year_end value is passed into the categorize_age function, which assigns them to a specific age group based on their age.
sdr_users_2020$age_group <- sapply(sdr_users_2020$age_at_year_end, categorize_age)
non_voters_2020hopefully$age_group <- sapply(non_voters_2020hopefully$age_at_year_end, categorize_age)
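The definition of categorize_age is not shown. A minimal sketch that would produce the five age groups appearing in the results table is given below; the cut points are inferred from those group labels, and the use of cut() is an assumption about the implementation, not the author's confirmed code.

```r
# Sketch of categorize_age: maps ages to the labels used in the results table.
# Ages below 18 and missing ages come back as NA.
categorize_age <- function(age) {
  as.character(cut(
    age,
    breaks = c(17, 24, 30, 40, 60, Inf),   # intervals are (17,24], (24,30], ...
    labels = c("18-24", "25-30", "31-40", "41-60", "61+")
  ))
}

# Boundary ages, to show where each group starts and ends.
print(categorize_age(c(18, 24, 25, 40, 41, 60, 61)))
```

Because cut() is vectorized, this version can also be applied to the whole column directly, for example categorize_age(sdr_users_2020$age_at_year_end), without the sapply() wrapper.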
I defined a function (calculate_turnout_by_age) that calculates the voter turnout rate for each age group by comparing the number of voters in each group to the total number of users in that group. It loops through each unique age group, calculates the turnout, and stores the results. This function was then used to calculate voter turnout for both SDR and non-SDR users in 2024, as shown below:
turnout_2024_sdr_age <- calculate_turnout_by_age(sdr_users_2020, merged_data_2024, "age_group", "voted_2024")
turnout_2024_non_sdr_age <- calculate_turnout_by_age(non_voters_2020hopefully, merged_data_2024, "age_group", "voted_2024")
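The body of calculate_turnout_by_age is not shown. The sketch below matches the four-argument call used above and the behavior described in the text, but its internals are assumptions: it counts each ncid once per age group and returns a named numeric vector. The toy data are made up.

```r
# Sketch: turnout rate per age group, returned as a named numeric vector.
calculate_turnout_by_age <- function(users, election_data, age_col, voted_col) {
  # IDs recorded as having voted in the election of interest.
  voters <- election_data$ncid[which(election_data[[voted_col]] == 1)]
  # sort() drops NA, so users without an age group are skipped.
  groups <- sort(unique(users[[age_col]]))
  sapply(groups, function(g) {
    ids <- unique(users$ncid[!is.na(users[[age_col]]) & users[[age_col]] == g])
    mean(ids %in% voters)
  })
}

# Made-up example data.
toy_users <- data.frame(
  ncid = c("A1", "B2", "C3"),
  age_group = c("18-24", "18-24", "25-30"),
  stringsAsFactors = FALSE
)
toy_election <- data.frame(
  ncid = c("A1", "B2", "C3"),
  voted_2024 = c(1, 0, 1),
  stringsAsFactors = FALSE
)

toy_turnout <- calculate_turnout_by_age(toy_users, toy_election, "age_group", "voted_2024")
print(toy_turnout)
```

In the toy data, one of the two voters aged 18-24 voted and the single voter aged 25-30 voted, giving rates of 0.5 and 1.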
The results from the calculate_turnout_by_age function are combined into a data frame. The age_groups vector contains all unique age groups, and the sapply function is used to apply the calculated turnout rates for SDR and non-SDR users to each age group.
Finally, the table is displayed using kable from the knitr package. This table shows the voter turnout for SDR users and non-SDR users for each age group, as well as the percentage difference in voter turnout between the two groups.
The last part of the code creates a new column, Times_More_Likely, which calculates how many times more likely SDR users were to vote in 2024 compared to non-SDR users. The value is the ratio of SDR turnout to non-SDR turnout for each age group, and it is calculated as a factor instead of a percentage. This part of the code will be used in a line graph.
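As an illustration of the Times_More_Likely column, the sketch below recomputes the ratio from the rounded turnout percentages reported in the age-group table. Storing Age_Group as a factor with explicit levels is an added assumption; it keeps the groups in age order for tables and for the x-axis of a line graph. The object name turnout_age is hypothetical.

```r
# Explicit ordering for the age groups.
age_levels <- c("18-24", "25-30", "31-40", "41-60", "61+")

# Rounded turnout percentages as reported in the age-group table.
turnout_age <- data.frame(
  Age_Group = factor(c("31-40", "25-30", "41-60", "61+", "18-24"), levels = age_levels),
  SDR = c(40.4, 44.1, 35.4, 29.4, 69.9),
  Non_SDR = c(8.2, 7.5, 9.9, 7.0, 13.4)
)

# Ratio of SDR turnout to non-SDR turnout for each age group.
turnout_age$Times_More_Likely <- turnout_age$SDR / turnout_age$Non_SDR

# Sort rows by age group using the factor levels.
turnout_age <- turnout_age[order(turnout_age$Age_Group), ]
print(turnout_age)
```

The ratio equals the percentage increase divided by 100 plus one, so an age group with a 400% increase has a ratio of 5.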
Age Group | SDR Voter Turnout (2024) | Non-SDR Voter Turnout (2024) | % More Likely for 2020 SDR Users to Vote in 2024 |
---|---|---|---|
18-24 | 69.9% | 13.4% | 421.3% |
25-30 | 44.1% | 7.5% | 486.1% |
31-40 | 40.4% | 8.2% | 394.4% |
41-60 | 35.4% | 9.9% | 256.2% |
61+ | 29.4% | 7% | 320.4% |
The line graph shows the relative likelihood of SDR users voting in 2024 compared to non-SDR users across different age groups. It displays how many times more likely SDR users are to vote than non-SDR users, with each point representing a specific age group and the line connecting these points to show the trend.
Using ggplot2, the ggplot() function is called to create the line graph. The aes() function maps the x-axis to Age_Group and the y-axis to Times_More_Likely. A line is drawn using geom_line(), connecting the data points with a light pink color, and individual points are added with geom_point() in dark red. The labs() function adds a title and axis labels, and the theme_minimal() and theme() functions refine the graph’s appearance by using a minimal theme and rotating the x-axis labels for better readability.
The table and line graph show that in every age group, SDR users from 2020 were significantly more likely to vote in 2024 than people who were late registrants in 2020 but were not able to vote. For example, in the 18-24 age group, SDR users had a voter turnout of 69.9%, compared to just 13.4% for late registrants, making them 421.3% more likely to vote. In the 25-30 age group, SDR users had a turnout of 44.1%, compared to 7.5% for late registrants, making them 486.1% more likely to vote, the highest among the groups. The 31-40 age group had 40.4% turnout for SDR users versus 8.2% for late registrants, with SDR users being 394.4% more likely to vote. In the 41-60 age group, SDR users had 35.4% turnout versus 9.9% for late registrants, resulting in a 256.2% higher likelihood. Finally, in the 61+ age group, SDR users had 29.4% turnout compared to 7% for late registrants, making them 320.4% more likely to vote.
While the 25-30 age group showed the greatest difference in voter turnout between SDR users and late registrants, the line graph overall shows a negative trend: younger age groups were more strongly affected by SDR in terms of voter turnout.
Same-day registration (SDR) has proven to significantly increase voter turnout in future elections across all demographic groups I examined. By making voting more accessible, SDR encourages participation, particularly among those who might otherwise miss the traditional registration deadline. Voters who used SDR in the 2020 general election were more likely to return to the polls in the 2024 election compared to those who registered after the deadline but were unable to vote in 2020. The experience of voting once increases the likelihood of future participation, and SDR plays a key role in facilitating this.
Although the effect of SDR on voter turnout was observed across all demographics, it had the most substantial impact on Democratic voters compared to Republicans and Independents. Additionally, SDR significantly boosted Black voter turnout more than White voter turnout, highlighting its potential to reduce barriers for historically underrepresented groups. Lastly, there was a clear trend indicating that SDR had a more pronounced effect on younger voters compared to older ones, emphasizing the importance of making voting easier for younger generations. Overall, the data suggests that SDR can foster a more engaged voting population, ultimately leading to higher participation in future elections.
For future research, it would be valuable to explore the long-term effects of same-day registration (SDR) on voter behavior, particularly in terms of how it influences voter turnout over multiple election cycles. Understanding whether the initial boost in turnout provided by SDR leads to sustained political engagement or if its effects diminish over time would help policymakers assess its lasting impact. Additionally, examining how SDR affects turnout in states with varying political environments and demographics would offer insights into whether its effectiveness is influenced by factors such as political polarization, voter access, or regional differences. Furthermore, exploring how SDR interacts with other voting reforms—such as early voting, mail-in ballots, or automatic voter registration—could provide a more comprehensive understanding of how these measures work together to shape overall electoral participation. This research could offer valuable guidance on the combination of reforms that most effectively encourage long-term civic involvement.