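library(dplyr)
library(stringr)
library(tidyr)
library(gt)
library(ggplot2)
# Combine all election years to analyze turnout across municipal and federal elections
# (turnout_2017 through turnout_2024 are the per-year summaries exported from the Python step)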
turnout_all <- bind_rows(
turnout_2017,
turnout_2018,
turnout_2019,
turnout_2020,
turnout_2021,
turnout_2022,
turnout_2023,
turnout_2024
)

When Politics Becomes Local: Racial Turnout Gaps and Ballot Roll-Off in North Carolina Elections
Abstract
Local elections in the United States consistently draw lower turnout than federal elections, raising important questions about who turns out and who stays home. This study examines racial and ethnic patterns of participation in North Carolina from 2017 to 2024, comparing turnout in presidential, midterm, and municipal elections. Using administrative voter files and precinct-level election results, I analyze racial turnout gaps and evaluate whether municipal elections held concurrently with federal contests attract a more representative electorate. I also estimate race-specific roll-off rates from the 2024 concurrent election using ecological inference. A central focus of this study is the representativeness of the electorate itself. Because racial categories in voter files contain some measurement error, observed turnout gaps may diverge from true behavioral disparities. The findings show that racial turnout gaps widen sharply in stand-alone municipal elections, driven by steep declines in Latino, Asian, and Native American turnout. In contrast, Black non-Latino voters maintain near parity with White voters even in low-salience local contests. These results support theoretical claims about unequal mobilization and information costs (Fraga 2018), while also highlighting the importance of empowerment dynamics, civic infrastructure, and community networks in sustaining political participation. Overall, this study provides a comprehensive account of racial differences in local electoral engagement and demonstrates how election timing and ballot structure shape democratic participation.
Introduction
Every presidential election year, North Carolinians pack polling places to cast ballots for federal and local offices. In off-year elections, however, the same polling places sit largely empty for local contests. Local governments make crucial decisions about education, housing, and policing, issues that directly affect residents. Despite their importance, turnout in local elections generally falls below 20% of registered voters.
These differences between federal and local turnout raise the central question that motivates this study: Would moving municipal elections to presidential years produce a more representative electorate? On the one hand, concurrent elections might bring more voters to the polls. On the other hand, if some groups are more likely to skip down-ballot races or remain unregistered, simply shifting election timing may not meaningfully reduce representational inequalities.
To address this question, I analyze presidential, midterm, and municipal elections in North Carolina from 2017 to 2024. I compare turnout patterns across racial and ethnic groups and examine how these patterns vary by election type. I then turn to ballot roll-off in the 2024 presidential election to ask a different question: once voters show up, who actually completes the ballot for local offices? Concurrent elections may shrink racial turnout gaps by boosting overall participation; however, they could also expose existing inequalities if minority voters are more likely than White voters to skip down-ballot races.
Together, the turnout and roll-off analyses provide complementary perspectives on racial inequality in local political representation in North Carolina, and the results suggest a nuanced answer to the central question. Moving municipal elections on-cycle appears to make the electorate more representative for non-Latino Black voters, who exhibit the lowest roll-off in concurrent elections. Latino, Asian, and Indigenous voters, by contrast, combine high roll-off with low municipal turnout. The remainder of this paper examines these patterns and their implications.
Why North Carolina and Why Race
North Carolina is well suited for studying how election salience and institutional context shape racial turnout. The state holds most municipal elections in odd-numbered years, creating a clear contrast with even-year presidential and midterm contests. It also maintains detailed administrative voter files that include race, ethnicity, age, gender, precinct, and municipality, along with comprehensive voter history. North Carolina is one of the few states where voter files are publicly accessible and where a self-reported race field is available, making it an ideal setting for analyzing racial participation patterns with administrative data.
Recent debates in states such as New York over whether to place local races on the same ballot as presidential contests show that election timing has become a prominent political question. Some argue that moving municipal elections on-cycle could increase participation and produce a more representative electorate. North Carolina’s mix of stand-alone municipal elections and concurrent local-federal contests offers a natural setting in which to evaluate these broader claims about timing and representation.
Beyond these practical advantages, North Carolina provides a theoretically rich context for examining turnout disparities. The state has a racially diverse electorate, long-standing patterns of racially polarized voting, and a recent history of voting rights litigation and redistricting disputes. At the same time, it combines competitive statewide politics with uneven local participation. Presidential and gubernatorial races attract intense mobilization efforts, while municipal elections are frequently characterized by limited campaign contact, weak media coverage, and low-information environments.
Focusing on race, rather than class, age, or education, is theoretically motivated by prior work showing that race is a powerful predictor of turnout and partisan alignment in the United States (Fraga 2018). Fraga’s research demonstrates that racial turnout gaps persist even after accounting for socioeconomic status and other individual-level characteristics. These gaps emerge because racial and ethnic minorities are less likely to be mobilized by campaigns and often face higher information and resource costs. Yet much of this literature centers on federal and statewide contests. What remains underexplored is whether racial gaps widen further in local elections that lack partisan cues, media visibility, and large-scale mobilization, and whether concurrent elections can mitigate or exacerbate these inequalities. This study addresses that gap.
Racial Turnout Gap
It is well documented that a racial turnout gap exists between White and non-White voters. In The Turnout Gap (2018), Bernard Fraga argues that this gap is persistent but context-dependent: election timing, election type, and power dynamics all shape who votes. Although Fraga’s analysis focuses primarily on federal and statewide elections, his framework applies directly to municipal contexts.
Local elections are typically low-salience, have fewer partisan cues, and receive less media coverage. Under these conditions, Fraga’s framework suggests that racial turnout gaps should widen. Conversely, high-salience contests like presidential elections tend to generate more attention and mobilization, which should shrink racial turnout gaps relative to local races.
Fraga emphasizes that these disparities are driven less by socioeconomic factors or voter suppression efforts and more by limited electoral influence and a lack of mobilization. Racial and ethnic minorities are less likely to be targeted by campaigns, face higher information costs, and often perceive their influence over outcomes to be weaker. As a result, racial turnout gaps reflect deeper systemic inequalities in who is invited into the electorate and whose voices are consistently heard. This study tests how those dynamics play out when elections move from the presidential level to the local municipal level in North Carolina.
Costs of Voting
Voting can be a costly activity, especially in local elections where partisan cues are weaker and information is scarcer. Verba, Schlozman, and Brady (1995) argue that political participation depends on resources, including time, money, and civic skills; they also suggest that different forms of participation draw on these resources in different ways. Local elections often require more time and effort for voters to learn which offices are on the ballot, what those offices do, and who the candidates are. In this sense, local elections function as high-resource activities, disproportionately burdening lower socioeconomic status (SES) and racially marginalized groups.
Education plays a central role in shaping who can navigate these informational demands. Converse (1964) noted that many citizens hold “non-attitudes” or unstable preferences; Campbell et al. (1960) found that education increases political knowledge and is positively associated with turnout. More educated citizens are better able to interpret limited political information and make sense of low-salience contests. By contrast, less-educated and lower-resourced voters may be more likely to abstain when partisan cues are weak and media attention is low.
If local elections require more information and civic skills, members of disadvantaged racial and ethnic groups may be less likely to vote in them. This helps explain why racial and ethnic turnout gaps might widen in municipal elections.
Hypotheses
Grounded in the turnout gap and costs-of-voting literature, this study tests three hypotheses:
H1: The racial turnout gap (difference in turnout rate between White non-Latino voters and each other racial/ethnic group) is larger in municipal elections than in federal elections.
H2: Non-Latino Black voters will exhibit smaller racial turnout gaps in municipal elections than other racial and ethnic minorities, reflecting stronger local civic infrastructures and mobilization networks.
H3: When local elections are held concurrently with federal elections, racial gaps will persist in the form of unequal ballot completion (roll-off).
These hypotheses link the turnout analysis and the roll-off analysis: even when concurrency narrows racial gaps in who casts a ballot, it may not eliminate inequalities in local representation if some groups remain more likely to skip municipal races.
Method
To analyze North Carolina elections from 2017 to 2024, I first constructed turnout measures using Python and then imported the resulting summaries into R for visualization and additional analysis. All data were obtained from the North Carolina State Board of Elections website.
Two main data sets are used in this study:
The statewide voter history file, which records whether each registrant voted in each election.
Annual voter snapshot files, which list all registered voters (active and inactive) and include race, ethnicity, age, gender, precinct, municipality, and county.
I merged these files using the unique state identifier (NCID) and then cleaned the data to remove duplicate records, deceased voters, and registrants who moved out of the state. This ensures that turnout rates reflect only eligible voters within each election year.
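The cleaning step above can be sketched in base R. This is a minimal illustration on toy data, not the author's pipeline (which was written in Python): the column names `ncid`, `removal_reason`, and `election_date`, and the reason codes `"deceased"` and `"moved_out_of_state"`, are hypothetical stand-ins for the fields in the State Board of Elections files.

```r
# Hypothetical toy snapshot: one row per registration record
snapshot <- data.frame(
  ncid           = c("AA1", "AA2", "AA2", "AA3", "AA4"),
  race           = c("W", "B", "B", "A", "W"),
  removal_reason = c(NA, NA, NA, "deceased", NA),
  stringsAsFactors = FALSE
)

# Hypothetical toy voter history: one row per registrant per election voted in
history <- data.frame(
  ncid          = c("AA1", "AA3", "AA1", "AA4"),
  election_date = c("2023-11-07", "2023-11-07", "2022-11-08", "2022-11-08"),
  stringsAsFactors = FALSE
)

# Drop duplicate NCIDs (keeping the first record) and registrants removed
# because they died or moved out of state (reason codes are assumptions)
clean_registrants <- function(snapshot) {
  snap <- snapshot[!duplicated(snapshot$ncid), ]
  drop <- snap$removal_reason %in% c("deceased", "moved_out_of_state")
  snap[!drop, ]
}

# Attach a logical `voted` flag for one election date, matched on NCID
flag_voted <- function(registrants, history, date) {
  voters <- unique(history$ncid[history$election_date == date])
  registrants$voted <- registrants$ncid %in% voters
  registrants
}

reg <- flag_voted(clean_registrants(snapshot), history, "2023-11-07")
print(reg)
```

Here the deceased registrant `AA3` is dropped even though the history file records a ballot, so only the three remaining registrants enter the turnout calculation.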
Next, I defined the target elections. For presidential and midterm years, I focus on the general elections, which are the highest-salience contests. For local election years, I focus on municipal elections, explicitly excluding elections held concurrently with federal or statewide contests.
Turnout is computed using carefully constructed numerators and denominators:
The denominator includes all registered voters eligible to participate in a given election; this includes all registrants living in a precinct-municipality that held an election on that date. Voters in municipalities without an election on that date are excluded from the denominator.
The numerator includes all registrants who actually voted in that election.
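A toy example of the eligibility-restricted denominator follows. As a minimal sketch, it assumes a hypothetical `unit` column identifying each registrant's precinct-municipality and a hypothetical vector of units that held an election on the target date; registrants elsewhere are excluded before the rate is computed.

```r
# Hypothetical registrants: unit = precinct-municipality,
# voted = cast a ballot on the target date
registrants <- data.frame(
  unit  = c("P1-A", "P1-A", "P2-A", "P3-B", "P3-B", "P4-C", "P4-C", "P4-C"),
  voted = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical list of units that held an election on the target date
units_with_election <- c("P1-A", "P2-A", "P4-C")

eligible_turnout <- function(registrants, units_with_election) {
  eligible <- registrants[registrants$unit %in% units_with_election, ]
  numerator   <- sum(eligible$voted)  # eligible registrants who voted
  denominator <- nrow(eligible)       # registrants who could vote on that date
  c(numerator = numerator, denominator = denominator,
    turnout_rate = numerator / denominator)
}

res <- eligible_turnout(registrants, units_with_election)
print(res)
```

Dividing by all eight registrants would instead give 3/8 = 0.375 and understate turnout, because the two registrants in `P3-B` had no contest to vote in.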
I then perform subgroup analyses by race and ethnicity. For each election date, I calculate turnout for:
the full electorate,
race-only categories,
ethnicity-only categories, and
joint race × ethnicity categories.
When calculating group-specific turnout, the numerator counts all registrants in that group who voted on that date; the denominator includes all eligible registrants in the same group. This approach yields consistent, comparable turnout measures across years and election types.
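The group-specific calculation can be sketched the same way. This minimal base-R version assumes hypothetical `race`, `ethnicity`, and `voted` columns for eligible registrants, builds joint labels such as `"Black NL"`, and also shows why pooled categories such as "All NL" are best computed from summed counts rather than by averaging group rates.

```r
# Hypothetical eligible registrants for one election date
eligible <- data.frame(
  race      = c("White", "White", "Black", "Black", "Asian", "Asian", "White", "Black"),
  ethnicity = c("NL", "NL", "NL", "NL", "NL", "HL", "HL", "NL"),
  voted     = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Numerator, denominator, and rate for each joint race x ethnicity label
group_turnout <- function(df) {
  label <- paste(df$race, df$ethnicity)
  num <- tapply(df$voted, label, sum)     # voters in each group
  den <- tapply(df$voted, label, length)  # eligible registrants in each group
  data.frame(label = names(num),
             numerator = as.vector(num),
             denominator = as.vector(den),
             turnout_rate = as.vector(num / den),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Count-weighted pooled rate for all labels ending in the given suffix
pooled_rate <- function(tab, suffix) {
  rows <- grepl(paste0(" ", suffix, "$"), tab$label)
  sum(tab$numerator[rows]) / sum(tab$denominator[rows])
}

tab <- group_turnout(eligible)
print(tab)
pooled_rate(tab, "NL")                             # count-weighted pooled rate
mean(tab$turnout_rate[grepl(" NL$", tab$label)])   # unweighted mean of group rates
```

With these toy counts the pooled NL rate is 3/6 = 0.5, whereas averaging the three NL group rates gives about 0.39, because the simple average gives the lone Asian NL registrant as much weight as the three Black NL registrants.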
After computing these turnout summaries using Python, I import them into R, combine data from 2017–2024, and create tables and plots to compare turnout across racial and ethnic groups and between municipal and general elections. These descriptive analyses set the stage for the more detailed examination of relative turnout and roll-off in later sections.
Turnout Analysis
Figure 1 presents turnout rates across racial and ethnic groups from 2017 to 2024, distinguishing between general and municipal elections. Turnout is substantially higher in general elections for all groups, as expected. White non-Latino voters have the highest turnout in federal contests, consistent with national patterns of elevated White participation in presidential and midterm elections. Hispanic/Latino voters, by contrast, exhibit the lowest turnout overall.
Turnout in municipal elections drops sharply for every racial and ethnic group, reflecting the effect of election salience: far fewer people vote when the offices at stake are local, media coverage is thin, and information costs are higher. White non-Latino and Black non-Latino voters remain the most active in local elections, while turnout among Hispanic and Native American voters is especially low.
These descriptive patterns provide initial support for H1 and H2. Racial turnout gaps widen when contests move from general to municipal elections, and the drop-off is steepest among Latino, Asian, and Native American voters. At the same time, Black non-Latino voters experience a smaller proportional decline and maintain near parity with White non-Latino voters.
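The "smaller proportional decline" can be checked directly against the group averages reported in Table 2 (mean general and municipal turnout across years). The short base-R calculation below divides municipal by general turnout for each group; the vector names are shorthand introduced here for illustration.

```r
# Mean turnout (%) by group, copied from Table 2
general   <- c(White_NL = 67.5, Black_NL = 55.7, Asian_NL = 54.3,
               Native_American_NL = 48.9, All_HL = 42.7)
municipal <- c(White_NL = 14.7, Black_NL = 14.8, Asian_NL = 7.2,
               Native_American_NL = 7.2, All_HL = 5.0)

# Share of general-election turnout that carries over to municipal elections
retained <- municipal / general
# Proportional decline from general to municipal elections
decline <- 1 - retained

round(retained, 3)
round(decline, 3)
```

On these averages, Black non-Latino voters retain roughly 27% of their general-election turnout in municipal contests, compared with about 22% for White non-Latino voters and roughly 12% to 15% for Latino, Asian, and Native American voters.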
# choose_col() returns the first candidate name present in df (NA if none match),
# so the code works whichever of the expected labels the Python export used
choose_col <- function(df, candidates) {
hit <- intersect(candidates, names(df))
if (length(hit)) hit[[1]] else NA_character_
}
# Identify the denominator and numerator column names
den_candidates <- c("denominator_count","denominator","denom","denom_count","denominator_sum","denominator_total")
num_candidates <- c("numerator_count","numerator","num","num_count","numerator_sum","numerator_total")
den_col <- choose_col(turnout_all, den_candidates)
num_col <- choose_col(turnout_all, num_candidates)
# Use make_overall() to create pooled 'All HL' and 'All NL' turnout categories.
# When count columns exist, keep them so later steps can weight by counts.
make_overall <- function(df, suffix, den_col, num_col) {
x <- df %>% filter(str_detect(label, paste0(" ", suffix, "$")))
if (!is.na(den_col) && !is.na(num_col) && all(c(den_col, num_col) %in% names(x))) {
x %>%
group_by(year) %>%
summarize(
!!den_col := sum(.data[[den_col]], na.rm = TRUE),
!!num_col := sum(.data[[num_col]], na.rm = TRUE),
.groups = "drop"
) %>%
mutate(label = paste("All", suffix),
turnout_rate = .data[[num_col]] / .data[[den_col]]) %>%
select(year, label, turnout_rate, all_of(c(den_col, num_col)))
} else {
# Fallback when counts are unavailable: unweighted mean of group rates
x %>%
group_by(year) %>%
summarize(turnout_rate = mean(turnout_rate, na.rm = TRUE), .groups = "drop") %>%
mutate(label = paste("All", suffix)) %>%
select(year, label, turnout_rate)
}
}
hl_overall <- make_overall(turnout_all, "HL", den_col, num_col)
nl_overall <- make_overall(turnout_all, "NL", den_col, num_col)
# Keep the count columns (when found); dropping them here would make have_weights FALSE below
count_cols <- setdiff(c(den_col, num_col), NA_character_)
turnout_all <- bind_rows(
turnout_all %>% select(year, label, turnout_rate, all_of(count_cols)),
hl_overall, nl_overall
)

# Define which groups to keep
keep_labels <- c(
"All voters","All HL","All NL",
"Asian HL","Asian NL",
"Black or African American HL","Black or African American NL",
"American Indian or Alaska Native HL","American Indian or Alaska Native NL",
"Two or More Races HL","Two or More Races NL",
"Other HL","Other NL",
"White HL","White NL"
)
have_weights <- !is.na(den_col) && !is.na(num_col) &&
all(c(den_col, num_col) %in% names(turnout_all))
# Compute turnout rates by group and year
turnout_yearly <- if (have_weights) {
turnout_all %>%
filter(label %in% keep_labels) %>%
group_by(label, year) %>%
summarize(
turnout_rate = sum(.data[[num_col]], na.rm = TRUE) /
sum(.data[[den_col]], na.rm = TRUE),
.groups = "drop"
)
} else {
turnout_all %>%
filter(label %in% keep_labels) %>%
group_by(label, year) %>%
summarize(turnout_rate = mean(turnout_rate, na.rm = TRUE), .groups = "drop")
}
# Create a table
turnout_table <- turnout_yearly %>%
mutate(year = as.integer(year),
turnout_rate = round(turnout_rate * 100, 1),
label = factor(label, levels = keep_labels, ordered = TRUE)) %>%
arrange(label, year)
turnout_table_raw <- turnout_yearly # unrounded proportions, reused for the plots
# One column per year
turnout_wide <- turnout_table %>%
tidyr::pivot_wider(
names_from = year,
values_from = turnout_rate,
values_fill = NA_real_
)
# Build a GT table
turnout_gt <- turnout_wide %>%
mutate(
label = as.character(label),
# rename for display only
label = dplyr::case_when(
label == "American Indian or Alaska Native HL" ~ "Native American HL",
label == "American Indian or Alaska Native NL" ~ "Native American NL",
TRUE ~ label
)
) %>%
gt(rowname_col = "label") %>%
# visible "Table 1" label
tab_caption("Table 1. Turnout by Race/Ethnicity, 2017–2024") %>%
tab_header(
title = md("**Turnout by Race/Ethnicity, 2017–2024**"),
subtitle = md("Percent of registered voters who turned out")
) %>%
fmt_number(columns = -c(label), decimals = 1) %>%
cols_label(
label = "Race/Ethnicity",
`2017` = "2017 Municipal",
`2018` = "2018 General",
`2019` = "2019 Municipal (No Sept.)",
`2020` = "2020 General",
`2021` = "2021 Municipal",
`2022` = "2022 General",
`2023` = "2023 Municipal",
`2024` = "2024 General"
) %>%
tab_style(
style = list(cell_text(weight = "bold")),
locations = cells_body(rows = label %in% c("All voters"))
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels(everything())
) %>%
tab_options(table.font.size = 14)
# Render the table
turnout_gt

Table 1. Turnout by Race/Ethnicity, 2017–2024 (percent of registered voters who turned out)

| Race/Ethnicity | 2017 Municipal | 2018 General | 2019 Municipal (No Sept.) | 2020 General | 2021 Municipal | 2022 General | 2023 Municipal | 2024 General |
|---|---|---|---|---|---|---|---|---|
| All voters | 12.9 | 51.6 | 14.6 | 73.2 | 12.5 | 49.8 | 10.7 | 70.8 |
| All HL | 4.9 | 33.7 | 5.3 | 56.9 | 5.0 | 25.0 | 4.7 | 54.9 |
| All NL | 17.0 | 53.7 | 17.0 | 74.7 | 16.2 | 53.9 | 16.1 | 74.2 |
| Asian HL | 4.4 | 33.1 | 6.2 | 58.3 | 5.7 | 28.6 | 4.4 | 60.9 |
| Asian NL | 6.2 | 43.0 | 8.1 | 69.8 | 6.5 | 38.4 | 8.1 | 65.9 |
| Black or African American HL | 5.6 | 34.1 | 6.5 | 54.0 | 6.7 | 25.6 | 5.7 | 53.7 |
| Black or African American NL | 14.3 | 48.6 | 16.2 | 67.0 | 14.3 | 42.3 | 14.4 | 65.1 |
| Native American HL | 4.5 | 32.7 | 6.1 | 54.2 | 3.1 | 27.3 | 4.3 | 54.8 |
| Native American NL | 7.1 | 36.7 | 7.8 | 60.2 | 6.8 | 36.6 | 7.1 | 62.0 |
| Two or More Races HL | 5.6 | 34.8 | 7.8 | 54.0 | 6.7 | 29.2 | 6.8 | 55.3 |
| Two or More Races NL | 7.2 | 39.1 | 9.1 | 60.0 | 7.7 | 34.5 | 8.2 | 57.6 |
| Other HL | 2.9 | 28.7 | 3.9 | 52.8 | 3.6 | 22.0 | 2.4 | 54.6 |
| Other NL | 7.9 | 38.5 | 9.5 | 62.1 | 8.4 | 35.7 | 7.7 | 60.5 |
| White HL | 5.8 | 41.4 | 7.9 | 64.4 | 7.4 | 36.4 | 5.7 | 64.7 |
| White NL | 15.0 | 56.0 | 17.2 | 77.7 | 14.7 | 58.5 | 12.0 | 77.7 |
# Tag each row as General vs Municipal
turnout_long <- turnout_table_raw %>%
mutate(election_type = ifelse(year %in% c(2018, 2020, 2022, 2024),
"General", "Municipal"))
# Pick what groups to include (use the TRUE underlying labels here)
SHOW <- c(
"All HL",
"Asian NL",
"White NL",
"Black or African American NL",
"American Indian or Alaska Native NL"
)
# Map underlying labels -> pretty labels for the plot
label_map <- c(
"All HL" = "All HL",
"White NL" = "White NL",
"Black or African American NL" = "Black NL",
"Asian NL" = "Asian NL",
"American Indian or Alaska Native NL" = "Native American NL"
)
# Order using the underlying labels
order_vec <- c(
"All HL",
"Asian NL",
"White NL",
"Black or African American NL",
"American Indian or Alaska Native NL"
)
# Aggregate turnout by election type
plot_df <- turnout_long %>%
dplyr::filter(label %in% SHOW) %>%
dplyr::group_by(label, election_type) %>%
dplyr::summarise(turnout = mean(turnout_rate) * 100, .groups = "drop") %>%
dplyr::mutate(
label = factor(label, levels = order_vec, ordered = TRUE),
nice = factor(label_map[as.character(label)],
levels = label_map[order_vec],
ordered = TRUE)
)
# Plot side-by-side bars (Municipal vs General)
ggplot(plot_df, aes(x = nice, y = turnout, fill = election_type)) +
geom_col(position = "dodge") +
labs(title = "Turnout Rates by Race/Ethnicity and Election Type (2017–2024)",
x = "Race/Ethnicity", y = "Turnout Rate (%)",
fill = "Election Type") +
theme_minimal(base_size = 14)

Relative Turnout
Descriptive turnout rates show who participates, but they do not fully capture how large the gaps are relative to White non-Latino voters. To address this, I analyze relative turnout, comparing each group’s turnout to that of White non-Latino voters in both general and municipal elections.
The figure and table in this section show that relative turnout gaps clearly widen as electoral salience declines. While all groups experience lower turnout in municipal elections, the relative decline is steepest among Latino, Asian, and Native American voters, whose municipal turnout falls to roughly half the White non-Latino rate or less (ratios of 0.34 to 0.50 in Table 2). In contrast, Black non-Latino turnout remains at or near parity with White non-Latino turnout in municipal elections (a ratio of 1.02).
These results align with Fraga’s (2018) argument that unequal mobilization and higher information costs widen racial turnout gaps in low-salience elections. At the same time, they highlight an important exception: Black non-Latino voters do not experience as severe a drop-off as other minority groups when contests move from federal to local. The near parity of Black non-Latino turnout suggests that strong local networks, community institutions, and targeted mobilization may partially offset the structural barriers that depress participation in municipal elections.
This pattern motivates the subsequent sections, which explore why Black voters maintain relatively high levels of local engagement and why other groups, particularly Latino and Asian voters, experience more dramatic declines.
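Because the ratios in Table 2 are averages of yearly ratios, they can be approximately reproduced from the municipal columns of Table 1. The sketch below does this for Black non-Latino and all Latino registrants and, for comparison, also computes the ratio of the four-year averages, an alternative summary that the tables do not use. Table 2 was computed from unrounded rates, so the rounded Table 1 inputs match it only to two decimals.

```r
# Municipal turnout (%) from Table 1: 2017, 2019, 2021, 2023
white_nl <- c(15.0, 17.2, 14.7, 12.0)
black_nl <- c(14.3, 16.2, 14.3, 14.4)
all_hl   <- c(4.9, 5.3, 5.0, 4.7)

# Mean of yearly ratios (the summary reported in Table 2)
mean_of_ratios <- function(group, reference) mean(group / reference)
# Ratio of multi-year means (an alternative summary)
ratio_of_means <- function(group, reference) mean(group) / mean(reference)

round(c(black_mean_of_ratios = mean_of_ratios(black_nl, white_nl),
        black_ratio_of_means = ratio_of_means(black_nl, white_nl),
        hl_mean_of_ratios    = mean_of_ratios(all_hl, white_nl),
        hl_ratio_of_means    = ratio_of_means(all_hl, white_nl)), 2)
```

With these rounded inputs, both summaries put Black non-Latino municipal turnout at parity (about 1.02 and 1.01), although the 2023 yearly ratio of 1.2 pulls the mean of ratios slightly upward.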
# Groups to show in the plot (drop White NL from the bars)
SHOW <- c(
"All HL",
"Asian NL",
"Black or African American NL",
"American Indian or Alaska Native NL"
)
SHORT <- c(
"All HL" = "All HL",
"Asian NL" = "Asian NL",
"Black or African American NL" = "Black NL",
"American Indian or Alaska Native NL" = "Native American NL"
)
white_by_year <- turnout_long %>%
dplyr::filter(label == "White NL") %>%
dplyr::select(year, white_rate = turnout_rate)
ratio_df <- turnout_long %>%
dplyr::left_join(white_by_year, by = "year") %>%
dplyr::mutate(ratio = turnout_rate / white_rate) %>%
dplyr::filter(label %in% c(SHOW, "White NL")) %>% # keep White for calculations only
dplyr::group_by(label, election_type) %>%
dplyr::summarise(ratio = mean(ratio, na.rm = TRUE), .groups = "drop") %>%
dplyr::mutate(
# Force White NL to exactly 1, but we won't plot it
ratio = ifelse(label == "White NL", 1, ratio),
label_short = SHORT[label]
) %>%
dplyr::filter(label != "White NL") # DROP White NL from plot
ggplot(ratio_df, aes(x = label_short, y = ratio, fill = election_type)) +
geom_col(position = "dodge") +
scale_y_continuous(
labels = scales::number_format(accuracy = 0.01),
limits = c(0, 1.1)
) +
labs(
title = "Relative Turnout vs White (Non-Latino) by Election Type",
x = "Race/Ethnicity",
y = "Ratio to White Turnout (1.0 = equal turnout)",
fill = "Election Type"
) +
theme_minimal(base_size = 14)

mun_gen <- turnout_long %>%
dplyr::group_by(label, election_type) %>%
dplyr::summarise(rate = mean(turnout_rate) * 100, .groups = "drop") %>%
dplyr::filter(label %in% c(
"White NL",
"Black or African American NL",
"All HL",
"Asian NL",
"American Indian or Alaska Native NL"
)) %>%
tidyr::pivot_wider(names_from = election_type, values_from = rate)
key_tbl <- mun_gen %>%
dplyr::left_join(
ratio_df %>%
dplyr::group_by(label) %>%
dplyr::summarise(
Ratio_Mun = mean(ratio[election_type == "Municipal"]),
Ratio_Gen = mean(ratio[election_type == "General"]),
.groups = "drop"
),
by = "label"
) %>%
dplyr::select(label, General, Municipal, Ratio_Gen, Ratio_Mun) %>%
dplyr::mutate(
across(c(General, Municipal), ~ round(.x, 1)),
Ratio_Gen = scales::number(Ratio_Gen, accuracy = 0.01),
Ratio_Mun = scales::number(Ratio_Mun, accuracy = 0.01)
)
key_tbl %>%
dplyr::mutate(
label = dplyr::case_when(
label == "American Indian or Alaska Native NL" ~ "Native American NL",
TRUE ~ label
)
) %>%
gt::gt() %>%
gt::tab_caption("Table 2. Summary: Turnout & Parity vs White") %>%
gt::tab_header(
title = md("**Summary: Turnout & Parity vs White**")
) %>%
gt::cols_label(
label = "Group",
General = "General (%)",
Municipal = "Municipal (%)",
Ratio_Gen = "Ratio to White (Gen)",
Ratio_Mun = "Ratio to White (Mun)"
) %>%
gt::fmt_number(columns = c(General, Municipal), decimals = 1)

Table 2. Summary: Turnout & Parity vs White

| Group | General (%) | Municipal (%) | Ratio to White (Gen) | Ratio to White (Mun) |
|---|---|---|---|---|
| All HL | 42.7 | 5.0 | 0.62 | 0.34 |
| Native American NL | 48.9 | 7.2 | 0.71 | 0.50 |
| Asian NL | 54.3 | 7.2 | 0.79 | 0.50 |
| Black or African American NL | 55.7 | 14.8 | 0.82 | 1.02 |
| White NL | 67.5 | 14.7 | NA | NA |
Near Parity for Black Voters in Municipal Elections
One of the most striking findings of this study is that Black non-Latino voters in North Carolina exhibit municipal turnout rates that are nearly equal to those of White non-Latino voters, even as overall participation declines in local elections. In contrast, Latino and Asian voters show much larger declines in turnout when moving from general to municipal contests. This near parity for Black voters challenges the expectation that a historically marginalized group facing high information and resource costs would experience the greatest drop-off in low-salience elections.
Several mechanisms from the political participation literature help explain why Black voters may remain comparatively engaged in municipal elections:
Bobo and Gilliam (1990) argue that political empowerment can increase participation by cultivating feelings of trust, influence, and efficacy. Keele et al. (2017) find that the presence of viable Black candidates can elevate Black turnout in mayoral elections, and Banducci et al. (2004) show that minority representation can increase minority engagement. In municipalities where Black residents see themselves reflected in local leadership or perceive city government as responsive, empowerment dynamics may sustain turnout even when overall salience is low.
Historically, Black churches, the NAACP, and other groups have played central roles in mobilizing Black voters. Verba, Schlozman, and Brady’s (1995) Civic Voluntarism Model emphasizes how organizations that build civic skills and provide mobilization cues can lower the costs of participation. While Gray and Caul (2000) note a general decline in group-based mobilization, the Black civic infrastructure that exists in many North Carolina communities may continue to function as a powerful mobilizing force, especially around local issues.
As Cox (2015) argues, campaigns tend to target areas where mobilization yields the greatest payoffs. Geographically clustered Black populations in many cities create favorable conditions for efficient mobilization, particularly by community organizations, churches, and local advocacy groups. Anoll (2018) further highlights the importance of neighborhood norms: in communities where voting is seen as an expected behavior, individuals face social incentives to participate. These norms may encourage Black voters not only to turn out but to remain engaged in local elections over time.
Local decisions about policing, housing, and education often have especially direct consequences in majority-Black neighborhoods. As Leighley and Nagler (2013) argue, turnout composition reflects how closely policy stakes align with voters’ lived experiences. When local issues directly affect racial equity and community well-being, Black residents may perceive higher returns to participating in municipal elections than members of other groups do.
These mechanisms suggest that Black municipal turnout is not simply an anomaly but a product of empowerment, mobilization, and issue proximity.
Why Other Groups Experience Greater Drop-Off
In contrast to Black North Carolinians, Latino, Native American, and Asian voters experience much larger declines in turnout when shifting from general to municipal elections. Several factors may contribute to this pattern.
First, campaign targeting and organizational infrastructure are weaker for these groups. Cox (2015) emphasizes that campaigns allocate resources where mobilization is most efficient. Smaller, more geographically dispersed populations are less likely to receive sustained outreach, especially in low-salience local contests. Valenzuela and Michelson (2016) show that group-identity appeals can effectively mobilize Latino voters, but such targeted messages may be rare in municipal campaigns with limited budgets and low visibility.
Second, information and language barriers can be especially costly in local elections. Limited bilingual outreach, sparse ethnic media coverage, and complex local ballot structures raise the costs of participation. Without strong local organizations embedded in city governance, Latino and Asian communities may lack the networks that help Black voters navigate these environments.
As a result, Latino, Native American, and Asian voters often face both weaker mobilization and higher information costs, which produces steep declines in turnout when elections move from presidential to municipal contests.
Roll-Off
Roll-off occurs when a voter participates in a high-salience contest, such as the presidential race, but skips lower-salience contests further down the ballot, such as municipal offices. In concurrent elections, roll-off is central to understanding representation: even if overall turnout is high, ballot completion may remain unequal across racial groups.
Method
To evaluate how concurrent elections shape ballot completion across racial and ethnic groups, I analyze roll-off in the November 2024 presidential election in North Carolina. Unlike turnout, which captures whether a person cast a ballot at all, roll-off reflects within-ballot inequality in political engagement. Understanding roll-off is therefore central to assessing the representativeness of the electorate in concurrent elections.
Not all municipalities held elections in November 2024, and those that did often featured different sets of local races. To construct a consistent measure of municipal voting across the state, I focus on a set of “safe” municipal contests that meet two criteria:
They appear in a substantial number of precincts, allowing for broad geographic coverage; and
They involve mayoral or citywide offices, which typically have higher visibility and more uniform reporting.
The selected races are:
City of Raleigh Mayoral Election
City of Winston-Salem Mayoral Election
For each precinct, I compute three quantities from the election returns and the 2024 statewide voter file:
Presidential votes: number of ballots cast for President
Municipal votes: total ballots cast across the selected municipal contests
Registered voters (N): total registered voters from the 2024 voter file
To isolate presidential-only voters, I compute:
\[ \text{presidential-only} = \max\bigl(\text{presidential votes} - \text{municipal votes}, 0\bigr). \]
This yields three groups within each precinct:
Municipal voters
Presidential-only voters
Nonvoters
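The bookkeeping behind these three groups can be sketched in a few lines of Python. This is a minimal illustration, not the R code used for the analysis. The Precinct record and its field names are hypothetical. Like the formula above, it assumes that every municipal voter also voted for President. Nonvoters are taken as registrants minus presidential voters, mirroring the R step nonvoters = pmax(N_total - president_votes, 0).

```python
from dataclasses import dataclass


@dataclass
class Precinct:
    """Hypothetical precinct record; field names are illustrative only."""
    registered: int        # N: registered voters in the precinct
    president_votes: int   # ballots recording a presidential vote
    municipal_votes: int   # ballots recording a vote in the selected municipal contests


def behavioral_counts(p: Precinct) -> dict:
    """Split N into municipal voters, presidential-only voters, and nonvoters."""
    pres_only = max(p.president_votes - p.municipal_votes, 0)
    nonvoters = max(p.registered - p.president_votes, 0)
    return {
        "municipal": p.municipal_votes,
        "pres_only": pres_only,
        "nonvoters": nonvoters,
    }


counts = behavioral_counts(Precinct(registered=1200, president_votes=900, municipal_votes=600))
print(counts)  # {'municipal': 600, 'pres_only': 300, 'nonvoters': 300}

# The three groups add back to N only when municipal <= president <= registered.
# This is why precincts violating those bounds are dropped before estimation.
assert sum(counts.values()) == 1200
```

Because the three counts add back to N, they can serve as the outcome columns of an R×C model. The row totals of that model are the precinct's racial group counts, which also sum to N.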
Race totals for each precinct come from the 2024 voter file. I collapse administrative categories into four analytically meaningful groups:
Non-Hispanic White
Non-Hispanic Black
Hispanic/Latino
Other (Asian, Native American, multiracial, and unclassified)
Smaller groups, including Asian and Indigenous voters, are combined into “Other” to reduce classification noise and ensure sufficient sample sizes for stable ecological inference estimates.
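The collapsing rule can be expressed as a small classification function. This Python sketch follows the same precedence as the case_when() call used to build precinct_race: Hispanic ethnicity first, then White, then Black, and everything else as Other. The function name is hypothetical. Unlike the R code, it also normalizes case and surrounding whitespace.

```python
from collections import Counter


def race_group(ethnic_code: str, race_code: str) -> str:
    """Collapse voter-file ethnicity/race codes into the four EI groups (illustrative)."""
    eth = (ethnic_code or "").strip().upper()
    race = (race_code or "").strip().upper()
    if eth == "HL":                  # Hispanic/Latino ethnicity takes precedence over race
        return "Hispanic"
    if race in {"W", "WHITE"}:
        return "NH_White"
    if race in {"B", "BLACK"}:
        return "NH_Black"
    return "Other"                   # Asian, Native American, multiracial, unclassified


# Tally a handful of hypothetical registrants into precinct group counts.
registrants = [("NL", "W"), ("HL", "W"), ("NL", "B"), ("UN", "A"), ("NL", "M")]
group_counts = Counter(race_group(e, r) for e, r in registrants)
print(group_counts)
```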
Race-specific roll-off cannot be directly observed because ballots do not record race. Ecological inference (EI) provides a methodological solution by estimating how racial groups are distributed across the three behavioral outcomes: municipal voting, presidential-only voting, and nonvoting.
I use the R×C multinomial-Dirichlet ecological inference model implemented in the eiPack R package (the ei.MD.bayes function). This model is designed for settings with more than two racial groups and more than two behavioral outcomes. For each racial group \(r\), roll-off is defined as:
\[ \text{Roll-Off}_r = \frac{p_r}{m_r + p_r} \]
where \(m_r\) is the proportion of group \(r\) that votes in both municipal and presidential contests, and \(p_r\) is the proportion that votes only in the presidential contest.
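As a quick arithmetic check on the formula, roll-off can be recovered from a group's municipal and presidential turnout. Here \(m_r\) is municipal turnout and \(p_r\) is presidential turnout minus municipal turnout, so roll-off equals one minus the ratio of municipal to presidential turnout. The sketch below applies this to the rounded White and Black estimates reported in Table 4. The function name is illustrative.

```python
def rolloff(m: float, p: float) -> float:
    """Share of a group's presidential voters who skip the municipal contests.

    m: proportion of the group voting in both contests
    p: proportion of the group voting only for President
    """
    if m + p == 0:
        raise ValueError("group has no presidential voters")
    return p / (m + p)


# Back out m and p from rounded turnout figures:
# m = municipal turnout, p = presidential turnout - municipal turnout.
white = rolloff(0.455, 0.833 - 0.455)
black = rolloff(0.390, 0.511 - 0.390)
print(f"White: {white:.1%}, Black: {black:.1%}")  # White: 45.4%, Black: 23.7%
```

Table 4's inputs are rounded to one decimal place. As a result, recomputing the Latino and Other rows from the displayed percentages can differ from the reported roll-off by a few tenths of a point.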
Considering roll-off in this way is essential because it reveals whether high participation in concurrent elections translates into equitable engagement with local offices. The EI-based estimates provide a theoretically grounded assessment of race-specific roll-off in North Carolina’s 2024 concurrent election, complementing the earlier turnout analysis.
Results
Table 3 reports the aggregate level of ballot roll-off in precincts that held both presidential and municipal elections in November 2024. Across the selected precincts, 216,276 presidential votes were cast, but only 141,973 municipal votes were recorded in the contests analyzed. This implies an overall roll-off rate of 34.4%: roughly one in three presidential voters skipped the municipal portion of the ballot. This level of incomplete ballot participation is consistent with well-documented gaps in information and salience between federal and local elections. Even when voters are already at the polls for a high-salience contest, a substantial share do not engage with lower-salience local offices, raising concerns about the representativeness of municipal governance.
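The overall rate in Table 3 is a pooled, vote-weighted ratio: one minus total municipal votes over total presidential votes. This is what the overall_rolloff step computes. A short Python sketch shows the arithmetic and why it can differ from a simple average of precinct-level roll-off shares. The function names and the two-precinct example are hypothetical.

```python
def pooled_rolloff(precincts):
    """Vote-weighted roll-off: 1 - (total municipal votes / total presidential votes).

    precincts: list of (presidential_votes, municipal_votes) pairs
    """
    total_pres = sum(pres for pres, _ in precincts)
    total_muni = sum(muni for _, muni in precincts)
    return 1 - total_muni / total_pres


def mean_precinct_rolloff(precincts):
    """Unweighted mean of precinct-level roll-off shares (every precinct counts equally)."""
    shares = [1 - muni / pres for pres, muni in precincts]
    return sum(shares) / len(shares)


# Table 3 totals reproduce the reported rate when treated as one pooled unit.
print(f"{pooled_rolloff([(216_276, 141_973)]):.1%}")  # 34.4%

# With unequal precinct sizes the two summaries diverge (hypothetical numbers).
toy = [(1000, 800), (100, 20)]
print(f"{pooled_rolloff(toy):.3f} vs {mean_precinct_rolloff(toy):.3f}")  # 0.255 vs 0.500
```

The pooled ratio describes the average presidential voter. The unweighted mean gives a small precinct the same influence as a large one.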
Table 4 reports race-specific estimates of turnout and roll-off from the ecological inference model. EI is necessary because ballots do not record the voter's race, so race-specific behavior must be inferred from precinct-level aggregates. In addition, administrative racial classifications contain some measurement error. The model therefore provides probabilistic estimates of within-ballot racial disparities in participation rather than exact individual-level behavior.
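Because the EI output is a set of posterior draws, roll-off can be summarized with an uncertainty interval rather than a single number. The sketch below assumes that each race's columns in ei_results$draws$Alpha are the Dirichlet parameters of the multinomial-Dirichlet model fit by ei.MD.bayes. On that assumption, a race's expected cell shares are its alpha values divided by their sum, and that denominator cancels in the roll-off ratio. The sketch computes roll-off within each draw before averaging, which is an alternative to averaging the alpha columns first. The draws are simulated placeholders, not model output, and the function names are hypothetical.

```python
import random
import statistics


def rolloff_from_alpha(alpha_muni: float, alpha_pres_only: float) -> float:
    """Roll-off implied by one draw of a race's Dirichlet parameters.

    Expected shares are alpha / sum(alpha); the common denominator cancels in
    p / (m + p), so the nonvoter parameter drops out of the ratio.
    """
    return alpha_pres_only / (alpha_muni + alpha_pres_only)


def summarize(draws, level=0.95):
    """Posterior mean and nearest-rank equal-tailed interval of roll-off across draws."""
    values = sorted(rolloff_from_alpha(m, p) for m, p in draws)
    lo_idx = int((1 - level) / 2 * (len(values) - 1))
    hi_idx = int((1 + level) / 2 * (len(values) - 1))
    return statistics.fmean(values), values[lo_idx], values[hi_idx]


# Simulated stand-ins for one race's (municipal, presidential-only) alpha columns.
rng = random.Random(2024)
fake_draws = [(rng.gammavariate(40, 1), rng.gammavariate(30, 1)) for _ in range(2000)]
mean, lower, upper = summarize(fake_draws)
print(f"roll-off {mean:.1%} (95% interval {lower:.1%} to {upper:.1%})")
```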
Several patterns stand out. White voters exhibit the highest estimated presidential turnout (83.3%) but also substantial roll-off (45.4%): many White voters do not complete the municipal portion of the ballot after voting for President. Black non-Hispanic voters show comparatively strong engagement and the lowest roll-off rate of any group (23.7%). They are the least likely of any racial group to skip municipal contests once they appear at the polls. This finding aligns with empowerment theories and accounts emphasizing community-based mobilization and strong civic infrastructures in Black communities.
Hispanic voters show extremely low municipal turnout (8.6%) and the highest roll-off rate (76.6%): more than three-quarters of Hispanic presidential voters skip the municipal contests. These patterns are consistent with lower campaign targeting (Fraga 2018), limited information about municipal offices (e.g., Lau & Redlawsk; Converse; Verba et al.), language-access barriers, and the newer and more transient nature of many Latino communities in North Carolina. The “Other” category exhibits the second-highest rate of incomplete ballots (63.0%), likely reflecting small group sizes, weak mobilization, and heterogeneous composition. Because this category aggregates multiple distinct populations, these estimates should be interpreted with particular caution.
Together, the roll-off results show that concurrent elections increase overall turnout but do not guarantee equitable engagement with local offices. They also mirror the earlier turnout findings: Black voters are comparatively engaged in municipal politics, while Latino and “Other” voters remain underrepresented even when they show up for the top-of-the-ticket race.
safe_races <- c(
"CITY OF RALEIGH MAYOR",
"CITY OF WINSTON-SALEM MAYOR"
)
# President votes by county–precinct
pres_votes <- results24_real %>%
filter(`Contest Name` == "US PRESIDENT") %>%
group_by(County, Precinct) %>%
summarise(
president_votes = sum(`Total Votes`, na.rm = TRUE),
.groups = "drop"
)
muni_votes <- results24_real %>%
filter(`Contest Name` %in% safe_races) %>%
group_by(County, Precinct) %>%
summarise(
municipal_votes = sum(`Total Votes`, na.rm = TRUE),
.groups = "drop"
)
pres_muni <- pres_votes %>%
inner_join(muni_votes, by = c("County", "Precinct")) %>%
mutate(
bad_muni = municipal_votes > president_votes,
pres_only = pmax(president_votes - municipal_votes, 0),
rolloff_share = 1 - municipal_votes / president_votes
) %>%
filter(!bad_muni, president_votes > 0, municipal_votes > 0)
# Overall roll-off, pooled across all matched precincts (vote-weighted)
overall_rolloff <- pres_muni %>%
  summarise(
    total_pres = sum(president_votes, na.rm = TRUE),
    total_muni = sum(municipal_votes, na.rm = TRUE),
    overall_rolloff_share = 1 - total_muni / total_pres
  )
overall_table <- overall_rolloff %>%
mutate(
overall_rolloff_share = scales::percent(overall_rolloff_share, accuracy = 0.1)
) %>%
gt() %>%
tab_caption("Table 3. Overall Roll-Off in 2024 Concurrent Election") %>%
tab_header(
title = md("**Overall Roll-Off in 2024 Concurrent Election**")
) %>%
cols_label(
total_pres = "Total Presidential Votes",
total_muni = "Total Municipal Votes",
overall_rolloff_share = "Roll-Off Rate"
) %>%
cols_align("center", everything())
overall_table

Table 3. Overall Roll-Off in 2024 Concurrent Election

| Total Presidential Votes | Total Municipal Votes | Roll-Off Rate |
|---|---|---|
| 216,276 | 141,973 | 34.4% |
ei_data <- pres_muni %>%
inner_join(
precinct_race,
by = c("County" = "county_desc", "Precinct" = "precinct_desc")
) %>%
mutate(
nonvoters = pmax(N_total - president_votes, 0)
) %>%
filter(
N_total > 50, # drop tiny precincts
president_votes <= N_total, # sanity check on totals
nonvoters >= 0
)
ei_data <- ei_data %>%
mutate(
pres_turnout = president_votes / N_total,
muni_turnout = municipal_votes / N_total,
turnout_gap = pres_turnout - muni_turnout
)
ei_df <- as.data.frame(ei_data)
formula_rc <- cbind(municipal_votes, pres_only, nonvoters) ~
cbind(NH_White, NH_Black, Hispanic, Other)
tune_out <- tuneMD(
formula = formula_rc,
data = ei_df
)
ei_results <- ei.MD.bayes(
formula = formula_rc,
data = ei_df,
tune.list = tune_out,
sample = 2000,
burnin = 1000,
thin = 5,
verbose = 500
  )
alpha_mat <- ei_results$draws$Alpha
alpha_df <- tibble(
race_choice = sub("^[Aa]lpha\\.", "", colnames(alpha_mat)), # drop 'alpha.' prefix
mean = apply(alpha_mat, 2, mean)
)
alpha_tidy <- alpha_df %>%
separate(race_choice, into = c("race", "choice"), sep = "\\.") %>%
pivot_wider(names_from = choice, values_from = mean)
rolloff_race <- alpha_tidy %>%
mutate(
pres_voters = municipal_votes + pres_only,
rolloff = pres_only / pres_voters
) %>%
select(race, rolloff)
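# Formatted roll-off table
rolloff_table <- rolloff_race %>%
  mutate(
    rolloff = scales::percent(rolloff, accuracy = 0.1)
  ) %>%
  arrange(race) %>%
  gt() %>%
  tab_header(
    title = md("**Estimated Roll-Off by Race (EI Model)**")
  ) %>%
  cols_label(
    race = "Race",
    rolloff = "Roll-Off Rate"
  )
# Turnout and roll-off by race from alpha_tidy (same steps as turnout_by_race_bw)
turnout_by_race <- alpha_tidy %>%
  mutate(
    total = municipal_votes + pres_only + nonvoters,
    muni_turnout = municipal_votes / total,
    pres_turnout = (municipal_votes + pres_only) / total,
    rolloff = pres_only / (municipal_votes + pres_only)
  ) %>%
  mutate(
    muni_turnout = scales::percent(muni_turnout, accuracy = 0.1),
    pres_turnout = scales::percent(pres_turnout, accuracy = 0.1),
    rolloff = scales::percent(rolloff, accuracy = 0.1)
  ) %>%
  select(race, muni_turnout, pres_turnout, rolloff)
turnout_table <- turnout_by_race %>%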
mutate(
race = factor(
race,
levels = c("NH_White", "NH_Black", "Hispanic", "Other"),
labels = c("Non-Latino White",
"Non-Latino Black",
"Latino/Hispanic",
"Other")
)
) %>%
gt() %>%
tab_caption("Table 4. Turnout and Roll-Off by Race (EI Model)") %>%
tab_header(
title = md("**Turnout and Roll-Off by Race (EI Model)**"),
subtitle = md("Estimated from Bayesian Ecological Inference")
) %>%
cols_label(
race = "Race",
muni_turnout = "Municipal Turnout",
pres_turnout = "Presidential Turnout",
    rolloff = "Roll-Off Rate"
) %>%
cols_align("center", everything())
turnout_table

Table 4. Turnout and Roll-Off by Race (EI Model)
Estimated from Bayesian Ecological Inference

| Race | Municipal Turnout | Presidential Turnout | Roll-Off Rate |
|---|---|---|---|
| Non-Latino White | 45.5% | 83.3% | 45.4% |
| Non-Latino Black | 39.0% | 51.1% | 23.7% |
| Latino/Hispanic | 8.6% | 36.6% | 76.6% |
| Other | 6.3% | 17.1% | 63.0% |
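I also re-estimated the model with only two groups: non-Hispanic White and non-White voters. Black, Hispanic, and other populations each make up a small share of registrants in many of the examined precincts, so the four-group model may produce unstable estimates for them. Pooling all minority voters into a single non-White category raises that group's share in each precinct and should yield more stable estimates. The cost is that it masks differences among minority groups.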
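The stability concern can be illustrated with a simpler, deterministic ecological inference technique: the Duncan-Davis method of bounds. This is not what eiPack estimates, but it shows how much a precinct's totals alone reveal. Suppose a group makes up share X of a precinct's registrants and overall turnout is T. The group's turnout must then lie between max(0, (T - (1 - X)) / X) and min(1, T / X). The function name below is hypothetical.

```python
def group_turnout_bounds(group_share: float, turnout: float) -> tuple:
    """Method-of-bounds interval for one group's turnout in a single precinct.

    group_share: X, fraction of registrants in the group (must be > 0)
    turnout:     T, fraction of all registrants who voted
    Accounting identity: T = X * b_group + (1 - X) * b_rest, with both rates in [0, 1].
    """
    if not 0 < group_share <= 1:
        raise ValueError("group_share must be in (0, 1]")
    X, T = group_share, turnout
    lower = max(0.0, (T - (1 - X)) / X)
    upper = min(1.0, T / X)
    return lower, upper


# Same precinct (turnout 80%), viewed from the majority and the minority group.
majority = group_turnout_bounds(0.9, 0.8)
minority = group_turnout_bounds(0.1, 0.8)
print(tuple(round(b, 3) for b in majority))  # (0.778, 0.889)
print(minority)                              # (0.0, 1.0)
```

A group that makes up 90% of registrants has tightly bounded turnout. The complementary 10% group's turnout is left completely unconstrained by the same totals, which is one reason pooling small groups can help.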
precinct_race <- voter_file_2024 %>%
mutate(
group = case_when(
ethnic_code == "HL" ~ "Hispanic",
ethnic_code != "HL" & race_code %in% c("W", "WHITE") ~ "NH_White",
ethnic_code != "HL" & race_code %in% c("B", "BLACK") ~ "NH_Black",
TRUE ~ "Other"
)
) %>%
group_by(county_desc, precinct_desc, group) %>%
summarise(n = n(), .groups = "drop") %>%
pivot_wider(names_from = group, values_from = n, values_fill = 0) %>%
  mutate(N_total = NH_White + NH_Black + Hispanic + Other)
precinct_race_bw <- precinct_race %>%
mutate(
White = NH_White,
NonWhite = NH_Black + Hispanic + Other,
N_total = White + NonWhite # just to be explicit
  )
ei_data_bw <- pres_muni %>%
inner_join(
precinct_race_bw,
by = c("County" = "county_desc", "Precinct" = "precinct_desc")
) %>%
mutate(
nonvoters = pmax(N_total - president_votes, 0)
) %>%
filter(
N_total > 50, # drop tiny precincts
president_votes <= N_total, # sanity check on totals
nonvoters >= 0
) %>%
mutate(
pres_turnout = president_votes / N_total,
muni_turnout = municipal_votes / N_total,
turnout_gap = pres_turnout - muni_turnout
)
ei_df_bw <- as.data.frame(ei_data_bw)
# 2 (races) x 3 (outcomes: muni, pres-only, nonvoters)
formula_bw <- cbind(municipal_votes, pres_only, nonvoters) ~
cbind(White, NonWhite)
tune_bw <- tuneMD(
formula = formula_bw,
data = ei_df_bw
)
ei_results_bw <- ei.MD.bayes(
formula = formula_bw,
data = ei_df_bw,
tune.list = tune_bw,
sample = 2000,
burnin = 1000,
thin = 5,
verbose = 500
  )
alpha_mat_bw <- ei_results_bw$draws$Alpha
alpha_df_bw <- tibble(
race_choice = sub("^[Aa]lpha\\.", "", colnames(alpha_mat_bw)),
mean = apply(alpha_mat_bw, 2, mean)
)
alpha_tidy_bw <- alpha_df_bw %>%
tidyr::separate(race_choice, into = c("race", "choice"), sep = "\\.") %>%
tidyr::pivot_wider(names_from = choice, values_from = mean)
# columns: race, municipal_votes, pres_only, nonvoters
# --- Turnout and roll-off by race (White vs NonWhite) ---
turnout_by_race_bw <- alpha_tidy_bw %>%
mutate(
total = municipal_votes + pres_only + nonvoters,
muni_turnout = municipal_votes / total,
pres_turnout = (municipal_votes + pres_only) / total,
rolloff = pres_only / (municipal_votes + pres_only)
) %>%
mutate(
muni_turnout = scales::percent(muni_turnout, accuracy = 0.1),
pres_turnout = scales::percent(pres_turnout, accuracy = 0.1),
rolloff = scales::percent(rolloff, accuracy = 0.1)
) %>%
  select(race, muni_turnout, pres_turnout, rolloff)
turnout_table_bw <- turnout_by_race_bw %>%
  mutate(
    # alpha_tidy_bw uses the race names from formula_bw: White and NonWhite.
    # factor() rejects duplicated levels, so list each level once.
    race = factor(
      race,
      levels = c("White", "NonWhite"),
      labels = c("Non-Latino White", "Non-White")
    )
  ) %>%
  arrange(race) %>%
  gt() %>%
  tab_caption("Table 5. Turnout and Roll-Off by Race (White vs Non-White, EI Model)") %>%
tab_header(
title = md("**Turnout and Roll-Off by Race (White vs Non-White, EI Model)**"),
subtitle = md("Estimated from Bayesian Ecological Inference")
) %>%
cols_label(
race = "Race",
muni_turnout = "Municipal Turnout",
pres_turnout = "Presidential Turnout",
rolloff = "Roll-Off Rate"
) %>%
cols_align("center", everything())
turnout_table_bw

Turnout and Roll-Off by Race (White vs Non-White, EI Model)
Estimated from Bayesian Ecological Inference

| Race | Municipal Turnout | Presidential Turnout | Roll-Off Rate |
|---|---|---|---|
| Non-Latino White | 43.9% | 84.8% | 48.3% |
| Non-White | 27.6% | 41.2% | 32.9% |
To check the EI turnout estimates against observed participation, I merged the 2024 North Carolina voter snapshot file with the statewide voter history file. I restricted this check to the precincts in my roll-off dataset, that is, precincts that reported both presidential votes and votes in the selected municipal races. The table below reports, for each racial and ethnic group, the share of registered voters recorded as voting in the November 2024 general election. Some of these rates are inconsistent with the EI estimates. White turnout is broadly similar across the two sources (88.4% in the voter history file versus 83.3% from the EI model). Black and Hispanic turnout, however, are substantially higher in the voter history file: 84.3% versus 51.1% for Black voters and 75.0% versus 36.6% for Hispanic voters.
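The validation step amounts to computing, for each group, the share of registrants found in the voter history. The Python sketch below is illustrative, with hypothetical names and records. It assumes the matching key identifies a voter uniquely statewide. In North Carolina's files the voter registration number is assigned by county, so the key should be either a composite (county, registration number) or the statewide NCID, which the appendix Python uses. Otherwise a voter in one county can be counted as voting because someone in another county with the same number did.

```python
from collections import defaultdict


def turnout_by_group(registrants, voted_ids):
    """Share of registrants in each group whose ID appears in the election's voter history.

    registrants: iterable of (voter_id, group) pairs for the target precincts
    voted_ids:   set of voter_ids recorded as voting in the election
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [voted, registered]
    for voter_id, group in registrants:
        counts[group][1] += 1
        if voter_id in voted_ids:
            counts[group][0] += 1
    return {g: voted / registered for g, (voted, registered) in counts.items()}


# Hypothetical records keyed by (county, registration number).
registrants = [
    (("FORSYTH", "1001"), "Non-Latino Black"),
    (("FORSYTH", "1002"), "Non-Latino Black"),
    (("DURHAM", "1001"), "Latino/Hispanic"),
]
with_county = turnout_by_group(registrants, {("FORSYTH", "1001")})
print(with_county)  # {'Non-Latino Black': 0.5, 'Latino/Hispanic': 0.0}

# Dropping the county from the key makes the Durham registrant look like a voter.
number_only = turnout_by_group([(key[1], g) for key, g in registrants], {"1001"})
print(number_only)  # {'Non-Latino Black': 0.5, 'Latino/Hispanic': 1.0}
```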
ei_precincts <- pres_votes %>%
dplyr::select(County, Precinct) %>%
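  dplyr::distinct()
# Voter-history check: this block redefines safe_races (adding the Raleigh at-large council race) and pres_votes
safe_races <- c(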
"CITY OF RALEIGH MAYOR",
"CITY OF WINSTON-SALEM MAYOR",
"CITY OF RALEIGH CITY COUNCIL AT-LARGE"
)
municipal_votes <- results24_real %>%
filter(`Contest Name` %in% safe_races) %>%
group_by(County, Precinct) %>%
summarise(muni_votes = sum(`Total Votes`), .groups = "drop")
pres_votes <- results24_real %>%
filter(`Contest Name` == "US PRESIDENT") %>%
group_by(County, Precinct) %>%
  summarise(pres_votes = sum(`Total Votes`), .groups = "drop")
ei_precincts <- inner_join(
pres_votes %>% select(County, Precinct) %>% distinct(),
municipal_votes %>% select(County, Precinct) %>% distinct(),
by = c("County", "Precinct")
)
eligible_voters <- voter_file_2024 %>%
inner_join(
ei_precincts,
by = c("county_desc" = "County",
"precinct_desc" = "Precinct")
  )
nrow(ei_precincts)
## [1] 197
nrow(eligible_voters)
## [1] 232236
eligible_voters %>%
  count(county_desc) %>%
  arrange(desc(n))
## # A tibble: 2 × 2
##   county_desc      n
##   <chr>        <int>
## 1 FORSYTH     218465
## 2 DURHAM       13771
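# Voter-history records for the 11/05/2024 general election (any ballot cast, not a contest-level vote).
# voter_reg_num is unique only within a county, so match on the statewide NCID instead.
votehist_2024 <- votehist %>%
  filter(election_desc == "11/05/2024 GENERAL") %>%
  distinct(ncid)
eligible_voters <- eligible_voters %>%
  mutate(
    pres_voted = if_else(
      ncid %in% votehist_2024$ncid,
      1L, 0L
    )
  )
eligible_voters <- eligible_voters %>%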
mutate(
race_group = dplyr::case_when(
ethnic_code == "HL" ~ "Latino/Hispanic",
ethnic_code != "HL" & race_code == "W" ~ "Non-Latino White",
ethnic_code != "HL" & race_code == "B" ~ "Non-Latino Black",
TRUE ~ "Other"
)
  )
target_precincts <- pres_muni %>%
  select(County, Precinct) %>%
  distinct()
eligible_subset <- eligible_voters %>%
  inner_join(
    target_precincts,
    by = c("county_desc" = "County",
           "precinct_desc" = "Precinct")
  )
actual_pres_turnout_subset <- eligible_subset %>%
  group_by(race_group) %>%
  summarise(
    N_eligible = n(),               # denominator: registrants in EI precincts
    pres_voters = sum(pres_voted),  # numerator: registrants recorded as voting in the 2024 general
    turnout_rate = pres_voters / N_eligible,
    .groups = "drop"
  ) %>%
  mutate(
    turnout_pct = scales::percent(turnout_rate, accuracy = 0.1)
  )
actual_pres_turnout_table <- actual_pres_turnout_subset %>%
mutate(
N_eligible = scales::comma(N_eligible),
pres_voters = scales::comma(pres_voters)
) %>%
gt() %>%
tab_header(
title = md("**Presidential Turnout by Race (2024)**")
) %>%
  cols_label(
    race_group = "Race/Ethnicity",
    N_eligible = "Eligible Registrants",
    pres_voters = "Presidential Voters",
    turnout_pct = "Turnout"
  ) %>%
  cols_hide(columns = turnout_rate)  # raw proportion duplicates the formatted Turnout column
actual_pres_turnout_table

Presidential Turnout by Race (2024)

| Race/Ethnicity | Eligible Registrants | Presidential Voters | Turnout |
|---|---|---|---|
| Latino/Hispanic | 13,600 | 10,203 | 75.0% |
| Non-Latino Black | 64,094 | 54,011 | 84.3% |
| Non-Latino White | 123,009 | 108,709 | 88.4% |
| Other | 31,533 | 24,202 | 76.8% |
To visualize the racial makeup of the precincts analyzed with ecological inference, I produced GIS maps. I merged North Carolina precinct boundary shapefiles with voter-file demographic data and mapped Wake and Forsyth counties, which contain Raleigh and Winston-Salem, the cities whose municipal contests I analyzed. The maps show the share of White, Black, and Hispanic registered voters in each precinct, as well as a precinct-level racial diversity index.
These maps reveal distinct geographic patterns. Black voters are clustered in urban areas, which often have strong civic infrastructure: where people live closer together, there tends to be more demand for civic organizations and greater capacity to sustain them. This pattern is consistent with my ecological inference results. Black voters in these precincts are concentrated in communities where civic infrastructure is plausibly stronger, which generally corresponds with higher turnout and lower roll-off.
In comparison, White voters are concentrated in suburban areas outside these urban cores, a spatial pattern that potentially reflects residential segregation. Hispanic voters are more dispersed, and most precincts fall below 10% Hispanic. Precincts with relatively higher Hispanic shares tend to lie on the edges of major cities, in areas that often consist of newer housing developments where Hispanic communities are growing. Because Hispanic populations are more dispersed than Black populations in these counties, campaigns likely find it harder to target this group effectively. Civic infrastructure in these areas is also likely to be weaker. This spatial pattern is consistent with the low turnout and high roll-off I estimate for Hispanic voters.
Finally, the diversity index highlights which precincts are racially mixed and which are segregated, providing context for the ecological inference estimates. Overall, the GIS maps suggest spatial patterns that may contribute to roll-off disparities. They point to community structure and residential geography as plausible influences on the racial disparities in turnout and roll-off observed in my analysis.
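The diversity index used for the maps is one minus the sum of squared group shares, often called the Gini-Simpson index (one minus the Herfindahl concentration). A Python sketch makes two properties explicit. First, with four groups its maximum is 0.75 rather than 1. Second, a precinct with no voter-file data should stay missing rather than receive a score. The function name is hypothetical, and it normalizes raw counts to shares.

```python
def diversity_index(counts):
    """Gini-Simpson diversity: 1 - sum of squared group shares.

    counts: group counts or shares for one precinct, or None if the precinct has no data.
    Returns None for missing or empty precincts so they render as "no data" on a map.
    """
    if counts is None:
        return None
    total = sum(counts)
    if total <= 0:
        return None
    shares = [c / total for c in counts]
    return 1 - sum(s * s for s in shares)


print(diversity_index([1, 0, 0, 0]))      # 0.0: a single group
print(diversity_index([25, 25, 25, 25]))  # 0.75: the four-group maximum
print(diversity_index(None))              # None: unmatched precinct
```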
Precincts_clean <- shp %>%
mutate(
county_desc = toupper(str_squish(county_nam)),
precinct_desc = toupper(str_squish(prec_id)) # "01-01", "01-02", ...
)
precinct_race2 <- precinct_race %>%
mutate(
county_desc = toupper(str_squish(county_desc)),
precinct_desc = toupper(str_squish(precinct_desc)),
precinct_desc = str_remove(precinct_desc, "^PRECINCT\\s+")
)
precinct_map <- Precincts_clean %>%
left_join(
precinct_race2,
by = c("county_desc", "precinct_desc")
) %>%
filter(county_desc %in% c("WAKE", "FORSYTH")) %>%
mutate(
pct_white = NH_White / N_total,
pct_black = NH_Black / N_total,
pct_hisp = Hispanic / N_total,
pct_other = Other / N_total
  )
precinct_map %>%
filter(county_desc == "WAKE") %>%
summarise(
n_precincts = n(),
with_data = sum(!is.na(N_total)),
missing = sum(is.na(N_total))
  )
ggplot(precinct_map) +
geom_sf(aes(fill = pct_black), color = NA) +
scale_fill_viridis_c(
labels = scales::percent_format(accuracy = 1),
name = "% Black"
) +
labs(
title = "Share of Black Registered Voters by Precinct",
subtitle = "Wake & Forsyth Counties"
) +
  theme_minimal()
ggplot(precinct_map) +
geom_sf(aes(fill = pct_white), color = NA) +
scale_fill_viridis_c(
labels = scales::percent_format(accuracy = 1),
name = "% White",
na.value = "grey90"
) +
labs(
title = "Share of White Registered Voters by Precinct",
subtitle = "Wake & Forsyth Counties"
) +
  theme_minimal()
ggplot(precinct_map) +
geom_sf(aes(fill = pct_hisp), color = NA) +
scale_fill_viridis_c(
labels = scales::percent_format(accuracy = 1),
name = "% Hispanic",
na.value = "grey90"
) +
labs(
title = "Share of Hispanic Registered Voters by Precinct",
subtitle = "Wake & Forsyth Counties"
) +
  theme_minimal()
precinct_div <- precinct_map %>%
  mutate(
    # Gini-Simpson diversity index: 1 minus the Herfindahl concentration of group shares
    # (0 = a single group; the maximum with four groups is 0.75).
    # Precincts without voter-file data keep NA, so they plot as na.value
    # instead of being scored as maximally diverse.
    diversity = 1 - (pct_white^2 + pct_black^2 + pct_hisp^2 + pct_other^2)
  )
library(ggplot2)
ggplot(precinct_div) +
geom_sf(aes(fill = diversity), color = NA) +
scale_fill_viridis_c(
name = "Diversity index",
labels = scales::number_format(accuracy = 0.01),
na.value = "grey90"
) +
labs(
title = "Racial/Ethnic Diversity by Precinct",
subtitle = "Wake & Forsyth Counties"
) +
  theme_minimal()
Limitations
Several limitations qualify the conclusions of this study. First, the analysis focuses on a single state. North Carolina is institutionally and demographically distinctive, and the patterns observed here may not generalize to states with different electoral rules, racial compositions, or political histories. The external validity of the results is therefore uncertain.
Second, the analysis relies on administrative voter files and official election returns, which may contain measurement error. Racial and ethnic categories in the voter file are imperfect proxies for social identities; misclassification can occur, especially for multiracial, Latino, Asian, and Indigenous voters. Collapsing Asian, Indigenous, multiracial, and unclassified voters into an “Other” category in the EI model helps mitigate small cell-size issues but obscures important heterogeneity within this group. Estimates for the “Other” category should thus be interpreted with caution, and future research should further explore the political behaviors of these individual groups.
Third, turnout measures are based on registered voters rather than the voting-eligible population. As a result, the study captures inequalities among those already incorporated into the electorate and cannot directly speak to disparities in registration itself.
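Fourth, ecological inference is not free from assumptions. EI models combine precinct-level aggregates with modeling assumptions about how group behavior varies across precincts in order to infer group-level behavior. These assumptions can fail, for example, when voters of the same race behave differently in homogeneous and mixed precincts. The estimates presented here should therefore be viewed as probabilistic summaries of race-specific roll-off, not as perfectly observed individual-level behavior. The gap between the EI estimates and the voter history file for Black and Hispanic presidential turnout underscores this caution.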
Fifth, the roll-off analysis is limited to a subset of municipal contests in 2024: two mayoral races and one city council at-large race. These offices were chosen for coverage and data quality, but they do not represent all local offices on the ballot. Roll-off patterns may differ in school board races, judicial contests, or smaller municipalities not captured in this design.
Finally, the study is primarily descriptive. Although I compare turnout and roll-off across election types and racial groups, the design cannot definitively isolate causal effects of election timing or concurrency. Unobserved differences in local political context, candidate quality, and issue environments may contribute to the observed patterns. The findings should therefore be interpreted as evidence of strong associations rather than causal estimates.
Discussion
Taken together, the results highlight the multidimensional nature of racial inequality in electoral participation. Consistent with Fraga’s (2018) argument, racial turnout gaps in North Carolina widen substantially in low-salience municipal elections. Latino, Asian, and Indigenous voters participate at far lower rates than White voters when contests move from the presidential to the local level. These patterns underscore how information costs, resource disparities, and unequal mobilization combine to depress local turnout among historically marginalized communities.
At the same time, Black non-Latino voters maintain turnout close to that of White voters in municipal elections, and the ecological inference estimates show that Black voters have the lowest roll-off rate of any group in the 2024 concurrent election. Once they appear at the polls, Black voters are more likely than White, Latino, or “Other” voters to complete the ballot down to the municipal contests. This pattern is consistent with empowerment theories and accounts emphasizing dense civic infrastructures in Black communities.
By contrast, Latino voters exhibit both low municipal turnout and very high roll-off. These findings reinforce concerns about the representational marginalization of newer immigrant communities and smaller racial groups in local governance. Limited campaign targeting, language and information barriers, and weaker organizational infrastructures likely combine to produce high rates of incomplete ballots and low local representation for these groups.
Overall, the results suggest that concurrent elections can bring more voters, especially White and Black registrants, to the polls, but concurrency alone does not guarantee meaningful engagement with local offices. For Latino and “Other” voters in particular, concurrent elections produce many presidential voters who remain effectively absent from municipal decision-making.
These findings have several implications for scholars and practitioners. For researchers, they underscore the importance of studying both turnout and ballot completion as dimensions of political inequality. They also illustrate the promise and limitations of ecological inference for recovering race-specific behavior from administrative data. Future work could extend this design by incorporating additional years, comparing multiple states, or combining EI with surname-based or geocoded measures to improve the measurement of racial identity.
For election administrators and campaigns, the results indicate that institutional reforms, such as moving municipal elections on-cycle, are likely necessary but not sufficient. Targeted outreach to Latino, Asian, and Indigenous communities is also needed, for example through multilingual voter guides, community-based organizations, and culturally specific mobilization strategies. Without it, higher presidential turnout will not automatically translate into equal participation in local offices. At the same time, the comparatively high engagement of Black voters in municipal elections highlights the power of community-based networks and suggests that investing in local civic infrastructures can pay dividends for democratic representation.
Ultimately, this study shows that who turns out and how completely they use the ballot varies sharply across racial groups and electoral contexts. Understanding these patterns is essential for evaluating the democratic quality of local governance and for designing reforms that move municipalities toward a more representative electorate.
Appendix Code
Below is the code I used in a Jupyter notebook to compute turnout by race and ethnicity for the original turnout-gap analysis, covering the 2016-2024 municipal, midterm, and presidential elections.
General and Midterm Elections
#### November general elections: set YEAR to a presidential (2016, 2020, 2024) or midterm (2018, 2022) year
# --- imports ---
import os
import pandas as pd
import numpy as np
from datetime import datetime
# --- parameters you can change ---
YEAR = 2018
BASE_DIR = r"C:\Users\roryc\OneDrive\Desktop"
NCHIS_PATH = os.path.join(BASE_DIR, "ncvhis_master.csv")
SNAP_PATH = os.path.join(BASE_DIR, f"{YEAR}snap.csv")
OUTPUT_DIR = BASE_DIR # change if you want outputs elsewhere
# --- filename options (prevents overwrite) ---
RUN_SUFFIX = "_allraces" # e.g., "", "_v2"
ADD_TIMESTAMP = True # set False to disable timestamp
TS = "__" + datetime.now().strftime("%Y%m%d_%H%M%S") if ADD_TIMESTAMP else ""
SFX = f"{RUN_SUFFIX}{TS}"
# --- settings ---
pd.options.mode.copy_on_write = True
CHUNK_ROWS = 500_000 # tune for your RAM (250k–1M typical)
DELIM = ','
USE_VRN_MATCH = False # safer off unless VRNs are normalized end-to-end
# --- helpers ---
def normalize_key_series(s: pd.Series) -> pd.Series:
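    s = s.astype(str).str.strip()
    # astype(str) turns missing values into the literal string "nan"; map those and blanks back to NaN
    return s.replace({"": np.nan, "nan": np.nan})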
def filter_target_statewide_general(df: pd.DataFrame, year: int) -> pd.DataFrame:
"""
Keep ONLY the statewide November General for the specified year.
election_lbl is a date (MM/DD/YYYY); we require month==11 and year match.
election_desc must be exactly '<mm>/<dd>/<yyyy> GENERAL'.
"""
dt = pd.to_datetime(df["election_lbl"], errors="coerce")
mask_year = dt.dt.year.eq(year)
mask_month = dt.dt.month.eq(11)
s = df["election_desc"].astype(str).str.upper().str.strip()
exact_general = s.str.match(r"^\d{1,2}/\d{1,2}/\d{4}\s+GENERAL$")
bad_variants = s.str.contains(r"MUNICIPAL|PRIMARY|RUNOFF|SPECIAL|SCHOOL|SECOND", na=False)
return df[mask_year & mask_month & exact_general & ~bad_variants]
def build_voted_sets(ncvhis_path: str, year: int):
"""
Scan ncvhis_master in chunks; collect unique NCIDs (and VRNs if enabled)
for people who voted in the statewide November GENERAL of 'year'.
"""
ncids_voted, vrns_voted = set(), set()
usecols = ["ncid", "voter_reg_num", "election_lbl", "election_desc"]
total_rows = 0
kept_rows = 0
for chunk in pd.read_csv(
ncvhis_path,
sep=DELIM,
dtype=str,
chunksize=CHUNK_ROWS,
low_memory=False,
usecols=lambda c: c in usecols,
):
total_rows += len(chunk)
chunk = filter_target_statewide_general(chunk, year)
kept_rows += len(chunk)
if chunk.empty:
continue
if "ncid" in chunk.columns:
vals = normalize_key_series(chunk["ncid"]).dropna().unique().tolist()
ncids_voted.update(vals)
if "voter_reg_num" in chunk.columns:
vals = normalize_key_series(chunk["voter_reg_num"]).dropna().unique().tolist()
vrns_voted.update(vals)
print("NCHIS rows scanned:", total_rows,
"| kept (statewide Nov GENERAL, year-matched):", kept_rows)
print("unique voters in sets — ncid:", len(ncids_voted), "| vrn:", len(vrns_voted))
return ncids_voted, vrns_voted
# labels and valid codes
valid_race = ["A","B","I","M","O","U","W"]
race_label = {
"A": "Asian",
"B": "Black or African American",
"I": "American Indian or Alaska Native",
"M": "Two or More Races",
"O": "Other",
"U": "Undesignated",
"W": "White"
}
valid_eth_alone = ["HL","NL"] # standalone table excludes UN
valid_eth_for_cross = ["HL","NL","UN"] # include UN for cross-tab
# --- streaming aggregator for summary ---
class TurnoutAggregator:
    def __init__(self, year: int):
        self.year = year
        self.flag_col = f"voted_{year}_general"
        # overall
        self.overall_denom = 0
        self.overall_numer = 0
        # race
        self.race_counts = {}  # code -> [numer, denom]
        # ethnicity (HL/NL)
        self.eth_counts = {}   # code -> [numer, denom]
        # race x ethnicity (HL/NL/UN)
        self.re_counts = {}    # (race, eth) -> [numer, denom]

    def _bump(self, dct, key, numer_add, denom_add):
        if key not in dct:
            dct[key] = [0, 0]
        dct[key][0] += numer_add
        dct[key][1] += denom_add

    def update(self, chunk: pd.DataFrame):
        fc = self.flag_col
        # Normalize codes if present (won't add columns if missing)
        if "race_code" in chunk.columns:
            chunk["race_code"] = chunk["race_code"].astype(str).str.upper().str.strip()
        if "ethnic_code" in chunk.columns:
            chunk["ethnic_code"] = chunk["ethnic_code"].astype(str).str.upper().str.strip()
        # overall
        denom = len(chunk)
        numer = int(chunk[fc].sum())
        self.overall_denom += denom
        self.overall_numer += numer
        # race
        if "race_code" in chunk.columns:
            rsub = chunk[chunk["race_code"].isin(valid_race)]
            if not rsub.empty:
                grp = rsub.groupby("race_code")[fc].agg(["sum", "count"])
                for code, row in grp.iterrows():
                    self._bump(self.race_counts, code, int(row["sum"]), int(row["count"]))
        # ethnicity HL/NL only
        if "ethnic_code" in chunk.columns:
            esub = chunk[chunk["ethnic_code"].isin(valid_eth_alone)]
            if not esub.empty:
                grp = esub.groupby("ethnic_code")[fc].agg(["sum", "count"])
                for code, row in grp.iterrows():
                    self._bump(self.eth_counts, code, int(row["sum"]), int(row["count"]))
        # race x ethnicity (HL/NL/UN)
        if {"race_code", "ethnic_code"}.issubset(chunk.columns):
            resub = chunk[
                chunk["race_code"].isin(valid_race)
                & chunk["ethnic_code"].isin(valid_eth_for_cross)
            ]
            if not resub.empty:
                grp = resub.groupby(["race_code", "ethnic_code"])[fc].agg(["sum", "count"])
                for (rc, ec), row in grp.iterrows():
                    self._bump(self.re_counts, (rc, ec), int(row["sum"]), int(row["count"]))

    def to_summary(self) -> pd.DataFrame:
        rows = []
        # overall
        rows.append({
            "year": self.year,
            "election_type": "General",
            "group_type": "overall",
            "code": "ALL",
            "label": "All voters",
            "denominator": self.overall_denom,
            "numerator": self.overall_numer,
            "turnout_rate": (self.overall_numer / self.overall_denom) if self.overall_denom else np.nan,
        })
        # race
        for code in valid_race:
            if code in self.race_counts:
                numer, denom = self.race_counts[code]
                rows.append({
                    "year": self.year,
                    "election_type": "General",
                    "group_type": "race",
                    "code": code,
                    "label": race_label.get(code, code),
                    "denominator": denom,
                    "numerator": numer,
                    "turnout_rate": (numer / denom) if denom else np.nan,
                })
        # ethnicity HL/NL
        for code in ["HL", "NL"]:
            if code in self.eth_counts:
                numer, denom = self.eth_counts[code]
                rows.append({
                    "year": self.year,
                    "election_type": "General",
                    "group_type": "ethnicity",
                    "code": code,
                    "label": "Hispanic/Latino" if code == "HL" else "Not Hispanic/Latino",
                    "denominator": denom,
                    "numerator": numer,
                    "turnout_rate": (numer / denom) if denom else np.nan,
                })
        # race x ethnicity (HL, NL, UN), ordered by race, then HL, NL, UN
        for rc in valid_race:
            for ec in ["HL", "NL", "UN"]:
                key = (rc, ec)
                if key in self.re_counts:
                    numer, denom = self.re_counts[key]
                    rows.append({
                        "year": self.year,
                        "election_type": "General",
                        "group_type": "race_ethnicity",
                        "code": f"{rc}_{ec}",
                        "label": f"{race_label.get(rc, rc)} {ec}",
                        "denominator": denom,
                        "numerator": numer,
                        "turnout_rate": (numer / denom) if denom else np.nan,
                    })
        return pd.DataFrame(rows)
# --- run ---
os.makedirs(OUTPUT_DIR, exist_ok=True)
print("building voted sets from ncvhis_master…")
ncids_voted, vrns_voted = build_voted_sets(NCHIS_PATH, YEAR)
flag_col = f"voted_{YEAR}_general"
summary_path = os.path.join(OUTPUT_DIR, f"turnout_summary_{YEAR}_general{SFX}.csv")
flagged_path = os.path.join(OUTPUT_DIR, f"{YEAR}snap_with_voted_flag{SFX}.csv")
agg = TurnoutAggregator(YEAR)
wrote_header = False
print("streaming snapshot, flagging, and aggregating…")
snap_iter = pd.read_csv(
    SNAP_PATH,
    sep=DELIM,
    dtype=str,
    chunksize=CHUNK_ROWS,
    low_memory=False,
)
total_rows = 0
kept_after_status = 0
for i, chunk in enumerate(snap_iter, start=1):
    total_rows += len(chunk)
    chunk = chunk.copy()  # keep ALL original columns
    # normalize keys
    if "ncid" in chunk.columns:
        chunk["ncid"] = normalize_key_series(chunk["ncid"])
    if "voter_reg_num" in chunk.columns:
        chunk["voter_reg_num"] = normalize_key_series(chunk["voter_reg_num"])
    # status filter (drop REMOVED only if column exists)
    if "voter_status_desc" in chunk.columns:
        chunk["voter_status_desc"] = chunk["voter_status_desc"].astype(str).str.upper().str.strip()
        chunk = chunk[chunk["voter_status_desc"] != "REMOVED"].copy()
    kept_after_status += len(chunk)
    # flag using NCID (and VRN if enabled)
    if "ncid" in chunk.columns:
        mask_ncid = chunk["ncid"].isin(ncids_voted)
    else:
        mask_ncid = pd.Series(False, index=chunk.index)
    if USE_VRN_MATCH and "voter_reg_num" in chunk.columns:
        mask_vrn = chunk["voter_reg_num"].isin(vrns_voted)
    else:
        mask_vrn = pd.Series(False, index=chunk.index)
    chunk[flag_col] = (mask_ncid | mask_vrn).astype("int8")
    # update summary counts (uses race_code/ethnic_code if present)
    agg.update(chunk)
    # append ENTIRE chunk (all original columns + flag)
    chunk.to_csv(flagged_path, index=False, mode="a", header=(not wrote_header))
    wrote_header = True
    if i % 10 == 0:
        print(f"  wrote {i} chunks…")
print("Snapshot rows total:", total_rows, "| kept after status filter:", kept_after_status)
print("building summary table…")
summary_tbl = agg.to_summary()
summary_tbl.to_csv(summary_path, index=False)
print("\nSaved to:")
print(summary_path)
print(flagged_path)
print("\nPreview of summary table:")
print(summary_tbl.head(25).to_string(index=False))
# quick sanity print
overall_row = summary_tbl[summary_tbl["group_type"] == "overall"]
if not overall_row.empty:
    denom = int(overall_row["denominator"].iloc[0])
    numer = int(overall_row["numerator"].iloc[0])
    rate = float(overall_row["turnout_rate"].iloc[0]) if denom else np.nan
    print("\nOverall — denom:", denom, "numer:", numer, "rate:", round(rate * 100, 2), "%")

Municipal Elections
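The municipal script below admits a registrant to a date's denominator through a chain of precinct, municipality, and status checks, while actual voters are always kept. As a minimal sketch, assuming those per-row checks have already been computed as columns, the gate can be restated in vectorized pandas. The function `denominator_mask`, its column names, and the toy rows are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd


def denominator_mask(df: pd.DataFrame) -> pd.Series:
    """Restate the municipal denominator gate on precomputed evidence columns.

    Hypothetical columns: is_voter, precinct_ok, has_valid_muni, pct_has_map,
    muni_ok_pct, muni_ok_any (bool); allowed_size, pct_votes_seen (int); status (str).
    """
    # Status rule: ACTIVE registrants qualify; INACTIVE ones only if they voted
    status_ok = df["status"].eq("ACTIVE") | (df["status"].eq("INACTIVE") & df["is_voter"])
    # Valid municipality: use the precinct->municipality map when one exists,
    # otherwise accept any municipality seen among that date's voters
    inner_present = np.where(df["pct_has_map"], df["muni_ok_pct"], df["muni_ok_any"])
    # Missing or invalid municipality: accept only if the precinct maps to
    # exactly one municipality and at least two vote records were seen there
    inner_missing = df["pct_has_map"] & df["allowed_size"].eq(1) & df["pct_votes_seen"].ge(2)
    muni_gate = pd.Series(np.where(df["has_valid_muni"], inner_present, inner_missing), index=df.index)
    # Actual voters are always kept, so the numerator stays inside the denominator
    return df["is_voter"] | (df["precinct_ok"] & muni_gate & status_ok)


# Toy rows (hypothetical evidence values)
toy = pd.DataFrame({
    "is_voter":       [True, False, False, False, False],
    "precinct_ok":    [False, True, True, True, True],
    "has_valid_muni": [True, True, True, False, False],
    "pct_has_map":    [True, True, False, True, True],
    "muni_ok_pct":    [False, True, False, False, False],
    "muni_ok_any":    [False, False, True, False, False],
    "allowed_size":   [0, 1, 0, 1, 2],
    "pct_votes_seen": [0, 5, 0, 3, 9],
    "status":         ["INACTIVE", "ACTIVE", "INACTIVE", "ACTIVE", "ACTIVE"],
})
print(denominator_mask(toy).tolist())
```

On these toy rows, only the inactive non-voter (third row) and the registrant with a missing municipality in a precinct mapped to two municipalities (fifth row) fall outside the denominator.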
# --- imports ---
import os, re
import pandas as pd
import numpy as np
from datetime import datetime
from collections import defaultdict, Counter
# --- parameters you can change ---
YEAR = 2017
BASE_DIR = r"C:\Users\roryc\OneDrive\Desktop"
NCHIS_PATH = os.path.join(BASE_DIR, "ncvhis_master.csv")
SNAP_PATH = os.path.join(BASE_DIR, f"{YEAR}snap.csv")
OUTPUT_DIR = BASE_DIR
WRITE_MONTH_FILES = False # True => write per-month rollups
# treat only these months in YEAR as municipal cycle
TARGET_MONTHS = [9, 10, 11]
STRICT_MONTHS = True
# --- filename options ---
RUN_SUFFIX = "_allraces"
ADD_TIMESTAMP = True
TS = "__" + datetime.now().strftime("%Y%m%d_%H%M%S") if ADD_TIMESTAMP else ""
SFX = f"{RUN_SUFFIX}{TS}"
# --- settings ---
pd.options.mode.copy_on_write = True
CHUNK_ROWS = 500_000
DELIM = ','
# --- codes/labels ---
valid_race = ["A","B","I","M","O","U","W"]
race_label = {
"A":"Asian","B":"Black or African American","I":"American Indian or Alaska Native",
"M":"Two or More Races","O":"Other","U":"Undesignated","W":"White"
}
valid_eth_alone = ["HL","NL"]
valid_eth_for_cross = ["HL","NL","UN"]
BAD_MUNI = {"", "UNINCORPORATED", "NOT MUNICIPAL LIMITS", "NOT ELIGIBLE"}
# --- helpers ---
def norm_text(s: pd.Series) -> pd.Series:
    # Strip whitespace; missing values become "" rather than the literal string "nan"
    return s.fillna("").astype(str).str.strip()

def normalize_key_series(s: pd.Series) -> pd.Series:
    # Join key: stripped string, or NaN when blank or missing
    s = norm_text(s)
    return s.replace({"": np.nan})

def normalize_precinct_str(x: str) -> str:
    if x is None:
        return ""
    return str(x).upper().strip()

def slug_precinct_str(x: str) -> str:
    # Keep only A-Z and 0-9 so that, e.g., "01-01" and "0101" compare equal
    u = normalize_precinct_str(x)
    return re.sub(r"[^A-Z0-9]", "", u)

def parse_dt(s: pd.Series) -> pd.Series:
    return pd.to_datetime(s, errors="coerce")

def date_token(dt: pd.Timestamp) -> str:
    return f"{dt.month:02d}{dt.day:02d}"

def stable_id_frame(df: pd.DataFrame) -> pd.Series:
    # Stable voter ID = normalized NCID; missing IDs stay NaN so dropna()/notna() remove them
    return normalize_key_series(df["ncid"])

def norm_muni_series(s: pd.Series) -> pd.Series:
    # Missing municipality -> "" (a BAD_MUNI value) rather than "NAN"
    return s.fillna("").astype(str).str.upper().str.strip()

def county_key_series(sid, sdesc) -> pd.Series:
    # prefer county_id if present; otherwise county_desc
    if sid is not None:
        return norm_text(sid)
    return norm_text(sdesc)
# ---------- PHASE 1: build per-DATE voter sets + precinct evidence keyed by county ----------
def build_by_date_sets(ncvhis_path: str, year: int):
    usecols = [
        "ncid", "election_lbl",
        "pct_label", "pct_description", "vtd_label", "vtd_description",
        "county_id", "county_desc", "election_desc",
    ]
    voted_ids_by_date = {}  # {date -> set(ncid)}
    pct_raw_by_date = {}    # {date -> set((county, pct_norm))}
    pct_slug_by_date = {}   # {date -> set((county, pct_slug))}
    pct_votes_count = {}    # {date -> Counter((county, pct_slug))} vote-record counts
    pct_to_muni = {}        # {date -> dict((county, pct_slug) -> set(muni))}; placeholder, left empty here
    total_rows = 0
    kept = 0
    reader = pd.read_csv(ncvhis_path, sep=DELIM, dtype=str, chunksize=CHUNK_ROWS,
                         low_memory=False, usecols=usecols)
    for chunk in reader:
        total_rows += len(chunk)
        dt = parse_dt(chunk["election_lbl"])
        year_mask = dt.dt.year.eq(year)
        if STRICT_MONTHS:
            month_mask = dt.dt.month.isin(TARGET_MONTHS)
            chunk = chunk[year_mask & month_mask]
        else:
            chunk = chunk[year_mask]
        kept += len(chunk)
        if chunk.empty:
            continue
        chunk = chunk.copy()
        chunk["ncid"] = normalize_key_series(chunk["ncid"])
        chunk["_sid"] = stable_id_frame(chunk)
        chunk["_date"] = dt  # aligned on the index, so only the kept rows receive dates
        county = county_key_series(chunk.get("county_id"), chunk.get("county_desc"))
        # Precinct identifier: first non-empty of pct_label, vtd_label,
        # pct_description, vtd_description (NaN is blanked first so the fallback works)
        precinct_cols = ["pct_label", "vtd_label", "pct_description", "vtd_description"]
        for c in precinct_cols:
            chunk[c] = chunk[c].fillna("").astype(str).str.strip()
        cand = chunk[precinct_cols[0]]
        for c in precinct_cols[1:]:
            cand = cand.where(cand != "", chunk[c])
        pct_norm = cand.map(normalize_precinct_str)
        pct_slug = cand.map(slug_precinct_str)
        for dtt, sub in chunk.groupby("_date"):
            if pd.isna(dtt):
                continue
            if dtt not in voted_ids_by_date:
                voted_ids_by_date[dtt] = set()
                pct_raw_by_date[dtt] = set()
                pct_slug_by_date[dtt] = set()
                pct_votes_count[dtt] = Counter()
                pct_to_muni[dtt] = defaultdict(set)
            voted_ids_by_date[dtt].update(sub["_sid"].dropna().unique())
            # build county+pct evidence
            idx = sub.index
            for ckey, pn, ps in zip(county.loc[idx], pct_norm.loc[idx], pct_slug.loc[idx]):
                if isinstance(pn, str) and pn != "":
                    pct_raw_by_date[dtt].add((ckey, pn))
                if isinstance(ps, str) and ps != "":
                    pct_slug_by_date[dtt].add((ckey, ps))
                    pct_votes_count[dtt][(ckey, ps)] += 1
        # municipality evidence is not taken from election_desc here;
        # the snapshot supplies the muni mapping in Phase 1.5
    dates_present = sorted(voted_ids_by_date.keys())
    print("NCHIS scanned:", total_rows, "| kept by date:", kept)
    for d in dates_present:
        raw_ct = len(pct_raw_by_date.get(d, set()))
        slg_ct = len(pct_slug_by_date.get(d, set()))
        print(f"  {d.date()}: voters={len(voted_ids_by_date[d])}; precincts(raw)={raw_ct}, slug={slg_ct}")
    return dates_present, voted_ids_by_date, pct_raw_by_date, pct_slug_by_date, pct_votes_count, pct_to_muni
# ---------- PHASE 1.5: muni sets per date + per-(county,pct_slug) muni maps from SNAP voters ----------
def build_muni_sets_from_snapshot(snapshot_path: str, dates_present, voted_ids_by_date):
    muni_by_date = {d: set() for d in dates_present}                 # {date -> {muni}}
    muni_by_date_pct = {d: defaultdict(set) for d in dates_present}  # {date -> {(county, pct_slug) -> {muni}}}
    reader = pd.read_csv(snapshot_path, sep=DELIM, dtype=str, chunksize=CHUNK_ROWS, low_memory=False)
    for chunk in reader:
        if chunk.empty:
            continue
        chunk = chunk.copy()
        chunk["ncid"] = normalize_key_series(chunk["ncid"])
        chunk["__sid"] = stable_id_frame(chunk)
        # normalized municipality; missing values become "" and are never recorded
        muni_raw = chunk.get("municipality_desc", pd.Series(index=chunk.index, dtype="object"))
        muni = norm_muni_series(muni_raw.fillna(""))
        pa = norm_text(chunk.get("precinct_abbrv", pd.Series(index=chunk.index, dtype="object"))).str.upper().str.strip()
        pa_slug = pa.str.replace(r"[^A-Z0-9]", "", regex=True)
        county = county_key_series(chunk.get("county_id"), chunk.get("county_desc"))
        for d in dates_present:
            mask = chunk["__sid"].isin(voted_ids_by_date[d])
            if not mask.any():
                continue
            # municipalities of that date's actual voters, normalized as in Phase 2
            muni_by_date[d].update(m for m in muni[mask].unique() if m)
            # map (county, pct_slug) -> munis seen among that date's actual voters
            tmp = pd.DataFrame({
                "municipality_desc": muni[mask].values,
                "__pct_slug": pa_slug[mask].values,
                "__county": county[mask].values,
            })
            for key, grp in tmp.groupby(["__county", "__pct_slug"]):
                muni_by_date_pct[d][key].update(m for m in grp["municipality_desc"].unique() if m)
    return muni_by_date, muni_by_date_pct
# ---------- PHASE 2 ----------
class PerDateAggregator:
    def __init__(self, year: int, dates_present):
        self.year = year
        self.dates = dates_present
        self.den_ids = {d: set() for d in self.dates}  # {date -> eligible voter IDs}
        self.attr_seen = {d: {} for d in self.dates}   # {date -> {sid -> (race, eth)}}

    def _record_attrs_once(self, d, sid, r, e):
        # keep the first race/ethnicity seen for each voter on each date
        if sid not in self.attr_seen[d]:
            self.attr_seen[d][sid] = (r, e)

    def update_chunk(self, d, sub_df: pd.DataFrame):
        sub_df = sub_df.dropna(subset=["__sid"])
        if sub_df.empty:
            return
        sub_df = sub_df.drop_duplicates(subset="__sid", keep="first")
        self.den_ids[d].update(sub_df["__sid"])
        rc = norm_text(sub_df["race_code"]).str.upper()
        ec = norm_text(sub_df["ethnic_code"]).str.upper()
        for sid, r, e in zip(sub_df["__sid"], rc, ec):
            self._record_attrs_once(d, sid, r, e)

    def _emit_rows(self, d, denom_ids, numer_ids):
        date_str = d.date().isoformat()
        attrs = self.attr_seen[d]

        def make_row(group_type, code, label, den, num):
            return {
                "year": self.year, "election_date": date_str,
                "election_type": "Municipal", "group_type": group_type,
                "code": code, "label": label,
                "denominator": den, "numerator": num,
                "turnout_rate": (num / den) if den else np.nan,
            }

        rows = [make_row("overall", "ALL", "All voters", len(denom_ids), len(numer_ids))]
        # one pass over each ID set, tallying race, ethnicity, and race x ethnicity
        den = {"race": Counter(), "eth": Counter(), "re": Counter()}
        num = {"race": Counter(), "eth": Counter(), "re": Counter()}
        for ids, tally in ((denom_ids, den), (numer_ids, num)):
            for sid in ids:
                r, e = attrs.get(sid, ("", ""))
                if r in valid_race:
                    tally["race"][r] += 1
                if e in valid_eth_alone:
                    tally["eth"][e] += 1
                if r in valid_race and e in valid_eth_for_cross:
                    tally["re"][(r, e)] += 1
        for r in valid_race:
            if r in den["race"] or r in num["race"]:
                rows.append(make_row("race", r, race_label.get(r, r), den["race"][r], num["race"][r]))
        for e in ["HL", "NL"]:
            if e in den["eth"] or e in num["eth"]:
                label = "Hispanic/Latino" if e == "HL" else "Not Hispanic/Latino"
                rows.append(make_row("ethnicity", e, label, den["eth"][e], num["eth"][e]))
        for r in valid_race:
            for e in ["HL", "NL", "UN"]:
                key = (r, e)
                if key in den["re"] or key in num["re"]:
                    rows.append(make_row("race_ethnicity", f"{r}_{e}", f"{race_label.get(r, r)} {e}",
                                         den["re"][key], num["re"][key]))
        return rows

    def to_summary(self, voted_ids_by_date):
        rows = []
        for d in self.dates:
            denom = self.den_ids[d]
            numer = voted_ids_by_date.get(d, set()) & denom  # NCHIS voters ∩ eligible
            rows.extend(self._emit_rows(d, denom, numer))
        return pd.DataFrame(rows)
# --------------------------- RUN ---------------------------
os.makedirs(OUTPUT_DIR, exist_ok=True)
print("Phase 1: building per-DATE voter + precinct evidence…")
(
    dates_present,
    voted_ids_by_date,
    pct_raw_by_date,
    pct_slug_by_date,
    pct_votes_count,
    _pct_to_muni_placeholder,
) = build_by_date_sets(NCHIS_PATH, YEAR)
if not dates_present:
    raise RuntimeError(f"No election dates found in {YEAR} with current month filter.")
print("Phase 1.5: building muni sets and (county,pct_slug)->muni maps from snapshot voters…")
muni_by_date, muni_by_date_pct = build_muni_sets_from_snapshot(SNAP_PATH, dates_present, voted_ids_by_date)
summary_path = os.path.join(OUTPUT_DIR, f"turnout_summary_{YEAR}_municipal_byDATE_strictDenom{SFX}.csv")
flagged_path = os.path.join(OUTPUT_DIR, f"{YEAR}snap_with_votedflag_muni_byDATE_strictDenom{SFX}.csv")
flag_cols = {d: f"voted_{YEAR}_municipal_d{date_token(d)}" for d in dates_present}
agg = PerDateAggregator(YEAR, dates_present)
print("Phase 2: snapshot pass — tighten denominator; keep numerator unchanged…")
wrote_header = False
total_rows = kept_rows_total = 0
snap_iter = pd.read_csv(SNAP_PATH, sep=DELIM, dtype=str, chunksize=CHUNK_ROWS, low_memory=False)
for i, chunk in enumerate(snap_iter, start=1):
    total_rows += len(chunk)
    if chunk.empty:
        continue
    chunk = chunk.copy()
    blank = pd.Series("", index=chunk.index, dtype="object")  # default for absent columns
    chunk["ncid"] = normalize_key_series(chunk.get("ncid", blank))
    chunk["race_code"] = norm_text(chunk.get("race_code", blank)).str.upper()
    chunk["ethnic_code"] = norm_text(chunk.get("ethnic_code", blank)).str.upper()
    status = norm_text(chunk.get("voter_status_desc", blank)).str.upper()
    chunk["__status"] = status
    chunk["__sid"] = stable_id_frame(chunk)
    pa = norm_text(chunk.get("precinct_abbrv", blank)).map(normalize_precinct_str)
    pa_slug = pa.map(slug_precinct_str)
    chunk["__pct_norm"] = pa
    chunk["__pct_slug"] = pa_slug
    # missing municipality -> "" so it counts as a BAD_MUNI value below
    muni = norm_muni_series(chunk.get("municipality_desc", blank).fillna(""))
    chunk["municipality_desc"] = muni
    county = county_key_series(chunk.get("county_id"), chunk.get("county_desc"))
    kept_rows_total += len(chunk)
    for d in dates_present:
        chunk[flag_cols[d]] = np.int8(0)
    for d in dates_present:
        raw_set = pct_raw_by_date.get(d, set())
        slg_set = pct_slug_by_date.get(d, set())
        if (not raw_set) and (not slg_set):
            continue
        # precinct eligibility from NCHIS by (county, pct):
        # precinct_ok if either the normalized or the slugged pair appears
        precinct_ok = pd.Series(
            [(c, pn) in raw_set or (c, ps) in slg_set for c, pn, ps in zip(county, pa, pa_slug)],
            index=chunk.index, dtype=bool,
        )
        has_valid_muni = ~muni.fillna("").str.upper().str.strip().isin(BAD_MUNI)
        date_munis = muni_by_date.get(d, set())
        per_pct_map = muni_by_date_pct.get(d, {})
        # evidence of vote records in each (county, pct_slug) on that date
        votes_counter = pct_votes_count.get(d, Counter())
        keys = list(zip(county, pa_slug))
        allowed_sets = [per_pct_map.get(k, set()) for k in keys]
        # muni_ok_pct_bool: muni is in the per-(county, pct_slug) map for that date
        muni_ok_pct_bool = pd.Series([m in s for m, s in zip(muni, allowed_sets)], index=chunk.index, dtype=bool)
        pct_has_map = pd.Series([len(s) > 0 for s in allowed_sets], index=chunk.index, dtype=bool)
        allowed_sizes = pd.Series([len(s) for s in allowed_sets], index=chunk.index)
        pct_votes_seen = pd.Series([votes_counter.get(k, 0) for k in keys], index=chunk.index)
        muni_ok_any = muni.isin(list(date_munis))
        # status: ACTIVE allowed; INACTIVE only if voted
        ids_voted = voted_ids_by_date.get(d, set())
        is_voter = chunk["__sid"].isin(ids_voted)
        status_ok = (status == "ACTIVE") | ((status == "INACTIVE") & is_voter)
        # muni_gate for denominator:
        #   if muni present: require pct->muni match if a map exists, else muni in the date set
        #   if muni missing/invalid: allow only when this (county, pct_slug) maps to EXACTLY ONE
        #   muni AND that precinct saw >= 2 vote records
        inner_present = np.where(pct_has_map.values, muni_ok_pct_bool.values, muni_ok_any.values)
        inner_missing = pct_has_map.values & (allowed_sizes.values == 1) & (pct_votes_seen.values >= 2)
        muni_gate = pd.Series(np.where(has_valid_muni.values, inner_present, inner_missing), index=chunk.index)
        # actual voters always enter the denominator, so the numerator is never trimmed
        denom_mask = chunk["__sid"].notna() & (is_voter | (precinct_ok & muni_gate & status_ok))
        if not denom_mask.any():
            continue
        sub = chunk.loc[denom_mask, ["__sid", "race_code", "ethnic_code"]]
        agg.update_chunk(d, sub)
        vote_mask = denom_mask & is_voter
        chunk.loc[vote_mask, flag_cols[d]] = np.int8(1)  # all other rows keep the initial 0
    out_chunk = chunk.drop(columns=["__sid", "__pct_norm", "__pct_slug", "__status"], errors="ignore")
    out_chunk.to_csv(flagged_path, index=False, mode="a", header=(not wrote_header))
    wrote_header = True
    if i % 10 == 0:
        print(f"  wrote {i} chunks…")
print("Snapshot rows total:", total_rows, "| processed rows:", kept_rows_total)
print("Building per-date summary (numerator = NCHIS voters ∩ eligible)…")
summary_tbl = agg.to_summary(voted_ids_by_date)
summary_tbl.to_csv(summary_path, index=False)
print("\nSaved to:")
print(summary_path)
print(flagged_path)
print("\nPreview (first 30 rows):")
print(summary_tbl.head(30).to_string(index=False))
# -------- Optional per-MONTH rollups --------
if WRITE_MONTH_FILES:
    print("\nBuilding per-month rollups…")
    numer_ids_by_date = {d: voted_ids_by_date.get(d, set()) & agg.den_ids[d] for d in dates_present}
    month_den = defaultdict(list)  # {"YYYY-MM" -> [eligible-ID set per date]}
    month_num = defaultdict(list)  # {"YYYY-MM" -> [voter-ID set per date]}
    for d in dates_present:
        ym = f"{d.year}-{d.month:02d}"
        month_den[ym].append(agg.den_ids[d])
        month_num[ym].append(numer_ids_by_date[d])
    # stand-alone: each month counts the union of its own election dates
    month_rows = []
    for ym in sorted(month_den):
        den_union = set().union(*month_den[ym])
        num_union = set().union(*month_num[ym])
        month_rows.append({
            "year_month": ym, "election_type": "Municipal", "group_type": "overall",
            "code": "ALL", "label": "All voters",
            "denominator": len(den_union), "numerator": len(num_union),
            "turnout_rate": (len(num_union) / len(den_union)) if den_union else np.nan,
        })
    month_summary = pd.DataFrame(month_rows).sort_values("year_month")
    month_summary_path = os.path.join(OUTPUT_DIR, f"turnout_summary_{YEAR}_municipal_byMONTH_STANDALONE{SFX}.csv")
    month_summary.to_csv(month_summary_path, index=False)
    print("Per-month (stand-alone) saved to:", month_summary_path)
    # incremental: an ID counts only in the first month it appears; num_inc is not
    # intersected with den_inc, so a voter eligible in an earlier month can still
    # enter a later month's numerator
    seen_den, seen_num = set(), set()
    month_inc_rows = []
    for ym in sorted(month_den):
        den_union = set().union(*month_den[ym])
        num_union = set().union(*month_num[ym])
        den_inc = den_union - seen_den
        num_inc = num_union - seen_num
        month_inc_rows.append({
            "year_month": ym, "election_type": "Municipal", "group_type": "overall",
            "code": "ALL", "label": "All voters (incremental)",
            "denominator": len(den_inc), "numerator": len(num_inc),
            "turnout_rate": (len(num_inc) / len(den_inc)) if den_inc else np.nan,
        })
        seen_den |= den_union
        seen_num |= num_union
    month_incremental = pd.DataFrame(month_inc_rows).sort_values("year_month")
    month_incremental_path = os.path.join(OUTPUT_DIR, f"turnout_summary_{YEAR}_municipal_byMONTH_INCREMENTAL{SFX}.csv")
    month_incremental.to_csv(month_incremental_path, index=False)
    print("Per-month (incremental) saved to:", month_incremental_path)
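In Phase 2, each registrant's (county, precinct) pair is checked against the NCHIS precinct sets with row-by-row Python iteration, which can be slow on 500,000-row chunks. One possible vectorized alternative for the script's `precinct_ok` check, sketched here under the assumption that county and precinct keys are already normalized strings, builds pandas `MultiIndex` objects and tests membership with `MultiIndex.isin`. The helper names `precinct_ok_vectorized` and `_in_pairs` and the toy county and precinct codes are hypothetical.

```python
import numpy as np
import pandas as pd


def _in_pairs(keys: pd.MultiIndex, pairs: set) -> np.ndarray:
    """Boolean array: is each (county, precinct) tuple in `pairs`?"""
    if not pairs:
        # guard: some older pandas releases fail on an empty list of tuples
        return np.zeros(len(keys), dtype=bool)
    return keys.isin(list(pairs))


def precinct_ok_vectorized(county: pd.Series, pct_norm: pd.Series, pct_slug: pd.Series,
                           raw_set: set, slg_set: set) -> pd.Series:
    """True where (county, pct_norm) is in raw_set or (county, pct_slug) is in slg_set."""
    # Pair each row's county with its normalized and slugged precinct keys
    raw_keys = pd.MultiIndex.from_arrays([county.to_numpy(), pct_norm.to_numpy()])
    slg_keys = pd.MultiIndex.from_arrays([county.to_numpy(), pct_slug.to_numpy()])
    ok = _in_pairs(raw_keys, raw_set) | _in_pairs(slg_keys, slg_set)
    return pd.Series(ok, index=county.index, dtype=bool)


# Toy data (illustrative codes, not real precinct assignments)
county = pd.Series(["92", "92", "32"])
pct_norm = pd.Series(["01-01", "02-05", "DU10"])
pct_slug = pct_norm.str.replace(r"[^A-Z0-9]", "", regex=True)
raw_set = {("92", "01-01")}
slg_set = {("32", "DU10")}
print(precinct_ok_vectorized(county, pct_norm, pct_slug, raw_set, slg_set).tolist())
```

The first toy row matches on its normalized precinct, the third only on its slug, and the second on neither, which mirrors the either-or rule the loop applies. The per-precinct municipality lookups could be vectorized the same way, for example by merging on the (county, pct_slug) pair, though that variant is not sketched here.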