Voter Turnout of Naturalized (Foreign Born) Citizens and U.S.-Born Citizens
Research Question: How does voter turnout differ between Naturalized (Foreign Born) citizens and U.S.-Born Citizens?
Introduction.
Naturalized citizens must complete a lengthy naturalization process, including a civics test, on their path to United States citizenship. In contrast, U.S.-born citizens may have taken civics courses during their education. These different experiences could influence each group's voter turnout.
In this project, I will compare voter turnout between naturalized (foreign-born) citizens and U.S.-born citizens across various elections. The focus is: How does voter turnout differ between naturalized (foreign-born) citizens and U.S.-born citizens?
Using the North Carolina Voter Registration Data and Voter History Data, I will examine the 2024 election, the November 2023 election, and the 2022 midterm election.
This analysis will focus on Wake County, as it is one of the counties with the largest foreign-born population.
Null Hypothesis: There is no difference in voter turnout rates between naturalized and U.S.-born citizens.
Alternative Hypothesis: Naturalized citizens have lower voter turnout rates than U.S.-born citizens.
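The hypotheses above describe a one-sided comparison of two proportions. As a minimal sketch of how such a test could be run in base R, the example below uses `prop.test()` on made-up counts. The numbers are hypothetical and are not results from this study.

```r
# Hypothetical counts for illustration only; NOT results from this study.
voted      <- c(naturalized = 420, us_born = 480)    # ballots cast per group
registered <- c(naturalized = 1000, us_born = 1000)  # registered voters per group

# One-sided two-proportion test.
# H0: turnout is equal in both groups.
# H1: naturalized turnout is lower than U.S.-born turnout.
result <- prop.test(x = voted, n = registered, alternative = "less")

result$estimate  # observed turnout proportion in each group
result$p.value   # small values count as evidence against the null hypothesis
```

Because the alternative hypothesis is directional, `alternative = "less"` is used, with the naturalized group listed first.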
Getting Started
Load Necessary Libraries.
library(tidycensus)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.2
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 4063003 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (13): county_desc, voter_reg_num, election_lbl, election_desc, voting_me...
dbl (2): county_id, voted_county_id
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
All voters categorized or labeled as “Removed,” “Inactive,” “Denied,” or “Confidential” were excluded. This ensures the study focuses only on voters labeled as active.
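The filtering code is not shown in this report. A minimal sketch of the status filter described above, run on a toy table, might look like this. The column names `voter_status_desc` and `confidential_ind` and the status labels are assumptions about the registration file layout.

```r
library(dplyr)

# Toy registration records; column names and labels are assumptions.
voter_reg <- data.frame(
  voter_reg_num     = c("001", "002", "003", "004", "005"),
  voter_status_desc = c("ACTIVE", "INACTIVE", "REMOVED", "ACTIVE", "DENIED"),
  confidential_ind  = c("N", "N", "N", "Y", "N"),
  stringsAsFactors  = FALSE
)

# Keep only active, non-confidential registrations.
active_voters <- voter_reg %>%
  filter(voter_status_desc == "ACTIVE", confidential_ind == "N")

active_voters$voter_reg_num
```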
This final filter limits the dataset to voters whose Birth State field is blank or recorded as NA. For now, we will assume these individuals are foreign-born and naturalized.
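As a sketch of the split described above, the example below flags both NA values and empty strings as a missing birth state. The column name `birth_state` is an assumption, and the data are toy records.

```r
library(dplyr)

# Toy records; `birth_state` is an assumed column name.
voter_reg <- data.frame(
  voter_reg_num    = c("001", "002", "003", "004"),
  birth_state      = c("NC", NA, "", "NY"),
  stringsAsFactors = FALSE
)

# Treat both NA and blank strings as a missing birth state.
voter_reg <- voter_reg %>%
  mutate(birth_state_missing = is.na(birth_state) | trimws(birth_state) == "")

fb_voters     <- filter(voter_reg, birth_state_missing)   # assumed foreign-born
usborn_voters <- filter(voter_reg, !birth_state_missing)  # assumed U.S.-born
```

Checking for blanks as well as NA matters because a text column read from a delimited file can hold either, depending on how the file was read.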
Merging Datasets
FBVOTER_REGISTR_DATA <- FB_VOTER_Registr |>
  left_join(WAKE_VOTER_HIST, by = "voter_reg_num")
This step merges the Voter History File with the filtered Voter Registration Data. The two datasets are joined on voter_reg_num, which serves as a unique identifier for each voter in Wake County, North Carolina.
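One property of this kind of join is worth keeping in mind: a voter history file has one row per ballot cast, so a left join from registration to history returns one row per voter-election pair rather than one row per voter. The toy example below illustrates this. The `election_lbl` values and their format are assumptions.

```r
library(dplyr)

# One row per registered voter.
reg <- data.frame(
  voter_reg_num    = c("001", "002", "003"),
  gender_code      = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# One row per ballot cast; voter 001 voted twice, voter 003 never voted.
hist <- data.frame(
  voter_reg_num    = c("001", "001", "002"),
  election_lbl     = c("11/05/2024", "11/08/2022", "11/05/2024"),
  stringsAsFactors = FALSE
)

merged <- left_join(reg, hist, by = "voter_reg_num")

nrow(merged)                      # rows: voter-election pairs plus non-voters
n_distinct(merged$voter_reg_num)  # distinct voters
```

Any `count()` taken on the merged table therefore counts rows, not people. Counting people requires `n_distinct(voter_reg_num)` or a count on the registration table before the join.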
Registration dates were examined to determine whether naturalized individuals could be distinguished among voters with Birth State recorded as NA. However, this revealed that it is rather difficult to identify naturalized citizens based solely on their registration date. Although there are noticeable peaks in March, October, and November, these may be due to voter registration deadlines.
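The registration-date code is not included in this report. A minimal sketch of how registrations could be tallied by calendar month is shown below. The column name `registr_dt` and its MM/DD/YYYY format are assumptions, and the dates are toy values.

```r
library(dplyr)

# Toy registration dates; column name and date format are assumptions.
reg <- data.frame(
  voter_reg_num    = c("001", "002", "003", "004"),
  registr_dt       = c("10/11/2024", "10/02/2020", "03/01/2016", "06/15/2019"),
  stringsAsFactors = FALSE
)

# Extract the two-digit month and count registrations per month.
by_month <- reg %>%
  mutate(reg_month = format(as.Date(registr_dt, format = "%m/%d/%Y"), "%m")) %>%
  count(reg_month, name = "registrations")

by_month
```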
FBVOTER_REGISTR_DATA %>% count(gender_code) %>% mutate(percent = n / sum(n))
# A tibble: 3 × 3
gender_code n percent
<chr> <int> <dbl>
1 F 172800 0.341
2 M 148303 0.293
3 U 185676 0.366
USBORN_VOTER_Registr %>% count(gender_code) %>% mutate(percent = n / sum(n))
# A tibble: 3 × 3
gender_code n percent
<chr> <int> <dbl>
1 F 1724517 0.546
2 M 1407733 0.446
3 U 25167 0.00797
FBVOTER_REGISTR_DATA %>% count(party_cd) %>% mutate(percent = n / sum(n))
# A tibble: 5 × 3
party_cd n percent
<chr> <int> <dbl>
1 DEM 178157 0.352
2 GRE 470 0.000927
3 LIB 2897 0.00572
4 REP 84009 0.166
5 UNA 241246 0.476
USBORN_VOTER_Registr %>% count(party_cd) %>% mutate(percent = n / sum(n))
# A tibble: 5 × 3
party_cd n percent
<chr> <int> <dbl>
1 DEM 1201821 0.381
2 GRE 871 0.000276
3 LIB 10458 0.00331
4 REP 755584 0.239
5 UNA 1188683 0.376
FBVOTER_REGISTR_DATA %>% count(race_code) %>% mutate(percent = n / sum(n))
# A tibble: 8 × 3
race_code n percent
<chr> <int> <dbl>
1 A 28666 0.0566
2 B 69400 0.137
3 I 1555 0.00307
4 M 3642 0.00719
5 O 25084 0.0495
6 P 69 0.000136
7 U 150949 0.298
8 W 227414 0.449
USBORN_VOTER_Registr %>% count(race_code) %>% mutate(percent = n / sum(n))
# A tibble: 8 × 3
race_code n percent
<chr> <int> <dbl>
1 A 87615 0.0277
2 B 549489 0.174
3 I 6547 0.00207
4 M 14541 0.00461
5 O 108508 0.0344
6 P 152 0.0000481
7 U 89114 0.0282
8 W 2301451 0.729
After calculating the mean age and the frequency percentages for gender, party, and race, it is clear that the foreign-born/naturalized and U.S.-born datasets are not balanced. For example, 36.6% of records in the foreign-born dataset have gender code U, compared with 0.8% in the U.S.-born dataset. Any comparison between the groups, especially of voter turnout, may therefore be confounded by demographic differences unless matching is applied.
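The mean-age calculation is not shown above. A minimal sketch of a balance check on a combined table is given below, assuming a `group` indicator (0 = U.S.-born, 1 = foreign-born) and the column `age_at_year_end`. The data are toy values.

```r
library(dplyr)

# Toy data; group coding 0 = U.S.-born, 1 = foreign-born is an assumption.
voters <- data.frame(
  group            = c(0, 0, 0, 0, 1, 1, 1, 1),
  age_at_year_end  = c(30, 40, 50, 60, 25, 35, 45, 55),
  gender_code      = c("F", "M", "F", "M", "F", "U", "U", "M"),
  stringsAsFactors = FALSE
)

# Mean age per group.
age_balance <- voters %>%
  group_by(group) %>%
  summarise(mean_age = mean(age_at_year_end), .groups = "drop")

# Share of each gender code within each group.
gender_balance <- voters %>%
  count(group, gender_code) %>%
  group_by(group) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()
```

Large gaps between the two groups in these summaries indicate imbalance that matching would need to address.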
I added a group indicator to the datasets to mark each voter as foreign-born or U.S.-born, which was necessary for matching. Due to the large size of the dataset, I used a smaller sample and matched voters by age, gender, race, and party affiliation. After matching, the two groups were well balanced on these variables.
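The matching code is not included in this report, so the method actually used is unknown. As one simple illustration of the idea, the sketch below performs exact matching on categorical strata with dplyr: within each gender/party stratum it keeps the same number of voters from each group. All data are toy values, and a continuous variable such as age would first need to be binned.

```r
library(dplyr)

set.seed(1)  # make the random draws reproducible

# Toy data; group coding 1 = foreign-born, 0 = U.S.-born is an assumption.
voters <- data.frame(
  voter_reg_num    = sprintf("%03d", 1:10),
  group            = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
  gender_code      = c("F", "F", "M", "F", "F", "F", "M", "M", "F", "M"),
  party_cd         = c("DEM", "UNA", "REP", "DEM", "DEM",
                       "UNA", "REP", "REP", "REP", "DEM"),
  stringsAsFactors = FALSE
)

matched <- voters %>%
  # Number to keep per group: the smaller group size in the stratum.
  group_by(gender_code, party_cd) %>%
  mutate(n_keep = min(sum(group == 1), sum(group == 0))) %>%
  # Randomly order voters within each stratum and group, then keep n_keep.
  group_by(gender_code, party_cd, group) %>%
  mutate(draw = sample(n())) %>%
  filter(draw <= n_keep) %>%
  ungroup() %>%
  select(-n_keep, -draw)

count(matched, group)
```

Strata that contain voters from only one group are dropped entirely, which is the cost of exact matching: balance improves, but the matched sample shrinks.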
# 2023 Municipal
municipal_2023 <- turnout_summary %>%
  filter(election_lbl == as.Date("2023-11-07"))

ggplot(municipal_2023, aes(x = factor(group), y = turnout_pct, fill = factor(group))) +
  geom_col() +
  labs(
    title = "Voter Turnout: 2023 Municipal Election",
    x = "Group (0=US-born, 1=Foreign)",
    y = "Turnout (%)"
  ) +
  theme_minimal()
Based on these results, foreign-born (naturalized) citizens in this sample had higher voter turnout in the 2024 general election. However, U.S.-born citizens showed higher turnout than foreign-born voters in the 2022 midterm and 2023 municipal elections.
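The plotting code above reads from `turnout_summary`, whose construction is not shown. A minimal sketch of one way to build such a table is given below, assuming a registration table with one row per voter and a history table with one row per ballot cast. The table names `registered` and `history` are hypothetical, and the data are toy values.

```r
library(dplyr)

# One row per registered voter (hypothetical table).
registered <- data.frame(
  voter_reg_num    = c("001", "002", "003", "004"),
  group            = c(0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# One row per ballot cast (hypothetical table).
history <- data.frame(
  voter_reg_num    = c("001", "002", "003", "001"),
  election_lbl     = as.Date(c("2024-11-05", "2024-11-05",
                               "2024-11-05", "2023-11-07")),
  stringsAsFactors = FALSE
)

# Denominator: registered voters per group.
totals <- count(registered, group, name = "total_registered")

# Numerator: distinct voters per group and election.
turnout_summary <- history %>%
  inner_join(registered, by = "voter_reg_num") %>%
  group_by(group, election_lbl) %>%
  summarise(voters = n_distinct(voter_reg_num), .groups = "drop") %>%
  left_join(totals, by = "group") %>%
  mutate(turnout_pct = voters / total_registered * 100)

turnout_summary
```

Taking the denominator from the one-row-per-voter table keeps it from being inflated by voters who appear in several elections.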
Gender
turnout_gender <- matched_data %>%
  filter(
    election_lbl %in% as.Date(c("2024-11-05", "2023-11-07", "2022-11-08")),
    gender_code %in% c("F", "M") # Only include Female and Male
  ) %>%
  group_by(group, election_lbl, gender_code) %>%
  summarise(voters = n(), .groups = "drop") %>%
  left_join(
    matched_data %>%
      filter(gender_code %in% c("F", "M")) %>% # Match total registered
      group_by(group, gender_code) %>%
      summarise(total_registered = n(), .groups = "drop"),
    by = c("group", "gender_code")
  ) %>%
  mutate(turnout_pct = voters / total_registered * 100)

ggplot(turnout_gender, aes(x = gender_code, y = turnout_pct, fill = factor(group))) +
  geom_col(position = "dodge") +
  facet_wrap(~ election_lbl) +
  labs(
    title = "Voter Turnout by Gender (F & M only) and Group",
    x = "Gender",
    y = "Turnout (%)",
    fill = "Group (US-born, Foreign)"
  ) +
  theme_minimal()
Based on these results, we can conclude that in the 2024 general election, both female and male foreign-born citizens have higher voter turnout than their U.S.-born counterparts. However, in the midterm and municipal elections, the opposite is true, as both genders among U.S.-born citizens show higher turnout, which supports the previous results.
Age
turnout_age_2024 <- matched_data %>%
  filter(election_lbl == as.Date("2024-11-05")) %>%
  mutate(age_group = cut(age_at_year_end, breaks = seq(18, 90, by = 5), right = FALSE)) %>%
  group_by(group, age_group) %>%
  summarise(voters = n(), .groups = "drop") %>%
  left_join(
    matched_data %>%
      mutate(age_group = cut(age_at_year_end, breaks = seq(18, 90, by = 5), right = FALSE)) %>%
      group_by(group, age_group) %>%
      summarise(total_registered = n(), .groups = "drop"),
    by = c("group", "age_group")
  ) %>%
  mutate(turnout_pct = voters / total_registered * 100)

ggplot(turnout_age_2024, aes(x = age_group, y = turnout_pct, fill = factor(group))) +
  geom_col(position = "dodge") +
  labs(
    title = "Voter Turnout by Age Group (2024 General Election)",
    x = "Age Group",
    y = "Turnout (%)",
    fill = "Group (0 = US-born, 1 = Foreign)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Based on these results, voter turnout in the 2024 general election is higher in every age group for foreign-born/naturalized citizens than for U.S.-born citizens, which may suggest greater civic engagement. The gap is largest in the youngest age group and in the 58–63 age group.
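To make the age bins used above concrete, the short example below applies the same `cut()` call to a few sample ages. The ages are made up for illustration.

```r
# Same breaks as in the analysis: 18, 23, 28, ..., 88.
# seq() stops at the last value not exceeding 90, so 88 is the final break.
breaks <- seq(18, 90, by = 5)

ages      <- c(18, 22, 58, 62, 63, 87, 88, 95)
age_group <- cut(ages, breaks = breaks, right = FALSE)

data.frame(ages, age_group)
```

With `right = FALSE`, each bin includes its lower bound and excludes its upper bound, so the bin labeled `[58,63)` covers ages 58 through 62. Ages of 88 and above fall outside the last break and are assigned `NA`, so those voters would appear in an `NA` category rather than in a labeled age group.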
Review
After conducting this study, I learned that foreign-born/naturalized citizens in this sample had higher voter turnout in the 2024 general election. However, I would like to examine previous years to determine if this is a consistent trend. In contrast, in the midterm and municipal elections, U.S.-born citizens had higher turnout. The gender analysis shows the same pattern for both women and men. Regarding age, the turnout advantage of foreign-born citizens in the 2024 general election is largest among the youngest age group and the 58–63 age group.
Things I would do differently
I would examine more previous years to see if these trends persist. Additionally, I would like more accurate information to confirm whether voters with Birth State recorded as NA are truly foreign-born/naturalized citizens. Finally, I would conduct the study on a much larger sample, as I only focused on Wake County and had to reduce the dataset to a sample of 5,000 due to its already large size.