knitr::opts_chunk$set(
fig.width = 8,
fig.height = 5,
dpi = 300
)
This report examines whether observable statistical anomalies are present in the certified results of the 2024 Presidential Election in North Carolina. The objective is not to advance a predetermined conclusion, but to evaluate whether the reported vote totals exhibit structural irregularities that would be inconsistent with ordinary electoral processes.
The dataset used in this analysis was obtained from Election Truth Alliance and cross‑verified against the official results published by the North Carolina State Board of Elections. Vote totals, reporting units, and voting method breakdowns were confirmed to match the state’s certified results.
Elections generate large, structured datasets. If systematic manipulation were to occur, it would likely leave detectable statistical signatures, such as discontinuities, nonlinear threshold effects, or unexplained shifts in vote share across independent dimensions of the data. Accordingly, this analysis evaluates candidate vote shares across several distinct vectors:
Voting method and voting mode
Precinct size
Cumulative vote totals
Turnout
Each of these dimensions provides an independent lens through which to assess the internal consistency of the reported results.
The guiding principle of this report is straightforward:
If irregular intervention occurred, it must manifest in observable structure.
The analysis proceeds from descriptive structure to targeted tests. First, we document how votes were cast across administrative voting methods and analytic voting modes. We then evaluate whether candidate vote shares vary abnormally with precinct size, cumulative vote totals, or turnout levels. Throughout, visual inspection precedes formal modeling in order to avoid imposing assumptions (such as arbitrary hinge points or thresholds) not supported by the data.
By examining multiple independent dimensions of the dataset, this report seeks to determine whether the reported results behave as expected under ordinary electoral dynamics, or whether they display patterns requiring further explanation.
This section defines the core units and terms used throughout the analysis. The goal is to establish a consistent vocabulary before moving into descriptive and statistical evaluation.
Election results in North Carolina are reported at the reporting unit level. Reporting units include:
In many counties, Early Voting and Absentee by Mail ballots are not attributed to individual physical precincts. Instead, they are aggregated into county‑level reporting units (e.g., “Early Voting” or “Absentee”). As a result, the dataset contains both geographically bounded precincts and administrative units that function solely as reporting categories.
Each reporting unit therefore represents the smallest publicly available aggregation of votes.
Throughout this report, the term precinct size refers to the total number of votes reported within a reporting unit, regardless of whether it is a physical precinct or an administrative unit.
The raw dataset separates ballots into four administrative voting methods:
Election Day
Early Voting
Absentee by Mail
Provisional
These categories reflect how ballots were cast and processed within the state’s reporting framework.
These are administrative classifications and are reported separately in the official results.
For analytic clarity, the four administrative voting methods are consolidated into two broader voting modes:
In-Person: Election Day, Early Voting, and Provisional
Mail: Absentee by Mail
This distinction reflects differences in ballot handling and chain‑of‑custody procedures. In‑person ballots are processed through voting equipment at physical locations, while mail ballots follow a separate request, verification, and tabulation process.
Unless otherwise noted, comparisons in this report are conducted at the voting mode level rather than the individual method level.
The analysis is restricted to the two major party candidates in the 2024 presidential contest:
Kamala D. Harris (Democratic)
Donald J. Trump (Republican)
For clarity and consistency in tables and figures, party labels are abbreviated as:
DEM: Democratic
REP: Republican
Vote share calculations are computed within reporting unit and voting mode combinations.
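As a rough illustration of how a two-party vote share can be computed within each reporting unit and voting mode, the sketch below uses a small made-up table. It assumes the in-person mode is Election Day + Early Voting + Provisional and the mail mode is Absentee by Mail, matching the grouping used later in the report. The data frame `toy` and the columns `inperson_vote_share` and `mail_vote_share` are hypothetical names, and base R is used so the example runs without extra packages.

```r
# Toy reporting-unit rows; values are invented for illustration only.
toy <- data.frame(
  unit             = c("P1", "P1", "P2", "P2"),
  party            = c("DEM", "REP", "DEM", "REP"),
  election_day     = c(100, 150, 80, 60),
  early_voting     = c(300, 250, 200, 240),
  absentee_by_mail = c(40, 20, 30, 10),
  provisional      = c(2, 3, 1, 1)
)

# Consolidate the four administrative methods into two voting modes.
toy$in_person <- toy$election_day + toy$early_voting + toy$provisional
toy$mail      <- toy$absentee_by_mail

# Two-party share within each unit x mode combination.
# ave() returns, for every row, the total over that row's reporting unit.
toy$inperson_vote_share <- toy$in_person / ave(toy$in_person, toy$unit, FUN = sum)
toy$mail_vote_share     <- toy$mail / ave(toy$mail, toy$unit, FUN = sum)

toy[, c("unit", "party", "inperson_vote_share", "mail_vote_share")]
```

Within each unit the two parties' shares sum to one for each mode, which is a convenient sanity check before comparing modes.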
County‑level and precinct‑level counts of registered voters are incorporated to evaluate voter turnout, defined as:
\[\text{Turnout} = \frac{\text{Total Ballots Cast}} {\text{Registered Voters}}\]
Turnout is used later in the report to assess whether vote share patterns change as participation increases. If a systematic threshold or structural shift were present, it could potentially appear as a nonlinear relationship between turnout and candidate vote share.
In summary, the dataset consists of:
With these definitions established, the next section begins by examining how votes are distributed across voting methods statewide, providing context for the relative weight of each voting channel.
This section loads the precinct-level election data, verifies that the Election Date is 2024-11-05, shows a glimpse of the data, and breaks down votes by voting method.
library(tidyverse)
library(sandwich)
library(lmtest)
library(moments) # for kurtosis()
#library(ggplot2)
# We will use randomization, setting random seed for reproducibility
set.seed(2024)
# ------------------------------------------------------------
# Read North Carolina 2024 precinct results using readr
# (tab-delimited file). Explicit column types are specified
# to ensure consistent parsing and avoid type guessing.
# ------------------------------------------------------------
library(readr)
file_path <- "../Data/elections-main/data/raw/US_NC/2024/results_pct_20241105/results_pct_20241105.txt"
nc_results_raw <- read_tsv(
file = file_path,
col_types = cols(
County = col_character(),
`Election Date` = col_date(format = "%m/%d/%Y"),
Precinct = col_character(),
`Contest Group ID` = col_double(),
`Contest Type` = col_character(),
`Contest Name` = col_character(),
Choice = col_character(),
`Choice Party` = col_character(),
`Vote For` = col_double(),
`Election Day` = col_double(),
`Early Voting` = col_double(),
`Absentee by Mail` = col_double(),
Provisional = col_double(),
`Total Votes` = col_double(),
`Real Precinct` = col_character(),
`...16` = col_skip() # skip the junk column
),
progress = FALSE
)
glimpse(nc_results_raw)
Rows: 233,510
Columns: 15
$ County <chr> "BUNCOMBE", "BUNCOMBE", "BUNCOMBE", "BUNCOMBE", "BU…
$ `Election Date` <date> 2024-11-05, 2024-11-05, 2024-11-05, 2024-11-05, 20…
$ Precinct <chr> "01.1", "01.1", "01.1", "01.1", "01.1", "01.1", "02…
$ `Contest Group ID` <dbl> 7, 22, 1301, 1351, 1393, 1393, 22, 1005, 1011, 1393…
$ `Contest Type` <chr> "C", "C", "S", "S", "S", "S", "C", "S", "S", "S", "…
$ `Contest Name` <chr> "CITY OF ASHEVILLE CITY COUNCIL", "CITY OF ASHEVILL…
$ Choice <chr> "Write-In (Miscellaneous)", "No", "Shannon W. Bray"…
$ `Choice Party` <chr> NA, NA, "LIB", "REP", NA, "GRE", NA, "DEM", "DEM", …
$ `Vote For` <dbl> 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, …
$ `Election Day` <dbl> 6, 62, 14, 50, 0, 11, 90, 338, 326, 0, 1, 33, 47, 4…
$ `Early Voting` <dbl> 14, 215, 21, 123, 0, 9, 247, 1553, 1512, 0, 8, 102,…
$ `Absentee by Mail` <dbl> 1, 20, 4, 19, 0, 1, 32, 222, 217, 0, 0, 20, 19, 19,…
$ Provisional <dbl> 0, 1, 0, 2, 0, 0, 5, 10, 10, 0, 0, 2, 2, 2, 0, 1, 0…
$ `Total Votes` <dbl> 21, 298, 39, 194, 0, 21, 374, 2123, 2065, 0, 9, 157…
$ `Real Precinct` <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "…
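The file includes a column named Real Precinct whose meaning is not obvious from its name. Because the analysis should count only actual ballots cast by voters, we first check which values this column takes.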
# Check Real Precinct values
nc_results_raw %>%
count(`Real Precinct`)
# A tibble: 2 × 2
`Real Precinct` n
<chr> <int>
1 N 25640
2 Y 207870
In North Carolina State Board of Elections files, the field Real Precinct indicates whether a row corresponds to a physical precinct or another type of reporting unit.
"Y": An actual geographic voting precinct
"N": A reporting unit that is not a physical precinct
Rows coded "N" commonly include:
These entries are not hierarchical rollups of precinct totals. Rather, they represent parallel reporting categories used for administrative aggregation of specific ballot types or voting modes.
# Unique election dates
nc_results_raw %>%
distinct(`Election Date`)
# A tibble: 1 × 1
`Election Date`
<date>
1 2024-11-05
# Statewide Voting Method Distribution (Presidential Contest)
method_usage_nc <- nc_results_raw %>%
# Restrict to Presidential contest
dplyr::filter(`Contest Name` == "US PRESIDENT") %>%
# Collapse across candidates so each reporting unit counted once
dplyr::group_by(County, Precinct, `Real Precinct`) %>%
dplyr::summarise(
`Election Day` = sum(`Election Day`, na.rm = TRUE),
`Early Voting` = sum(`Early Voting`, na.rm = TRUE),
`Absentee by Mail` = sum(`Absentee by Mail`, na.rm = TRUE),
Provisional = sum(Provisional, na.rm = TRUE),
.groups = "drop"
) %>%
# Sum statewide totals
dplyr::summarise(
`Election Day` = sum(`Election Day`),
`Early Voting` = sum(`Early Voting`),
`Absentee by Mail` = sum(`Absentee by Mail`),
Provisional = sum(Provisional)
) %>%
# Convert to long format
tidyr::pivot_longer(
cols = everything(),
names_to = "Voting Method",
values_to = "Total Votes"
) %>%
# Compute percentages
dplyr::mutate(
Percent = `Total Votes` / sum(`Total Votes`)
) %>%
# Arrange descending
dplyr::arrange(dplyr::desc(Percent))
method_usage_nc
# A tibble: 4 × 3
`Voting Method` `Total Votes` Percent
<chr> <dbl> <dbl>
1 Early Voting 4208839 0.739
2 Election Day 1169088 0.205
3 Absentee by Mail 295602 0.0519
4 Provisional 25612 0.00449
Before proceeding to anomaly testing, it is important to understand how ballots were cast across administrative voting methods.
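After validating that the dataset corresponds to the certified November 5, 2024 general election, we aggregated votes statewide across all reporting units for the presidential contest. The distribution of ballots by voting method is as follows:
Early Voting: 4,208,839 (73.9%)
Election Day: 1,169,088 (20.5%)
Absentee by Mail: 295,602 (5.2%)
Provisional: 25,612 (0.4%)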
Several structural observations follow from this distribution:
Early Voting dominates the electorate. Nearly three‑quarters of all ballots were cast during the early voting period. This is substantially larger than Election Day voting and represents the primary voting channel in North Carolina.
Election Day voting represents roughly one‑fifth of ballots cast. While traditionally considered the focal point of elections, it accounts for a minority of total votes in 2024.
Mail voting constitutes a small share of total ballots. Absentee by Mail represents approximately 5% of statewide votes. This limits its overall impact on aggregate outcomes, though it remains analytically distinct due to its separate administrative handling.
Provisional ballots are negligible in volume. At less than one‑half of one percent of total votes, provisional ballots do not materially influence statewide totals.
This distribution has important implications for the remainder of the analysis. Because nearly 95% of ballots were cast in person (Early Voting + Election Day + Provisional), the primary structural evaluation of vote share patterns will focus on in‑person voting channels. Mail voting will be examined separately but represents a comparatively small component of the overall electorate.
North Carolina permits no-excuse absentee voting, but the process includes a witness requirement, ID documentation requirements, and ballot receipt deadlines that differ from universal vote-by-mail states.
The statewide total_votes value of 157,960,234 does not represent the number of ballots cast in North Carolina.
The election results file is structured in long format, meaning that each row represents a specific precinct × contest × candidate combination rather than a single ballot.
As a result:
When Total Votes is summed across all rows, ballots are effectively counted multiple times, once per contest.
Therefore:
\[ \text{total_votes} = \sum \text{contest-level votes} \]
not
\[ \text{number of unique ballots cast} \]
This inflation is expected when working with long-format election returns.
For statewide ballot counts, totals must instead be derived from a single contest that appears on every ballot (e.g., President or Governor), or from county-level ballot summaries.
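The inflation can be seen in a toy long-format table. The sketch below is illustrative only: `toy_long` and its values are invented, and it assumes every ballot records a vote in both contests. In real returns, undervotes mean a single contest slightly undercounts ballots.

```r
# Toy long-format returns: one row per precinct x contest x candidate.
# Each precinct has 100 ballots, and every ballot votes in both contests.
toy_long <- data.frame(
  precinct = rep(c("A", "B"), each = 4),
  contest  = rep(c("US PRESIDENT", "US PRESIDENT", "GOVERNOR", "GOVERNOR"), times = 2),
  choice   = rep(c("Cand 1", "Cand 2"), times = 4),
  votes    = c(60, 40, 55, 45, 30, 70, 35, 65)
)

# Summing every row counts each ballot once per contest.
inflated_total <- sum(toy_long$votes)

# Restricting to one contest that appears on every ballot recovers the ballot count.
ballots_cast <- sum(toy_long$votes[toy_long$contest == "US PRESIDENT"])

c(inflated_total = inflated_total, ballots_cast = ballots_cast)
```

With two contests per ballot, the all-rows sum is exactly double the number of ballots; with dozens of contests per ballot, the inflation is correspondingly larger.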
County size was proxied using total ballots cast.
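We tested the association between county size and mail share using:
Pearson correlation on raw total ballots
Pearson correlation on log(total ballots)
Spearman rank correlation
Results:
Pearson (raw): r ≈ 0.54
Pearson (log): r ≈ 0.59
Spearman: ρ ≈ 0.55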
These results indicate a moderate positive, monotonic relationship between county size and mail share.
When plotted on a log scale, the upward trend persists across the full distribution, indicating the relationship is not driven solely by a small number of very large counties.
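The chunk that builds `county_mail_summary` is not shown in this report. A minimal sketch of one way to construct such a table follows, assuming mail share is defined as Absentee by Mail ballots divided by total presidential ballots in each county. The input `toy_pres` and its values are hypothetical; only the column names `total_ballots` and `mail_share` are taken from the code in this section.

```r
# Toy presidential rows (invented values): several reporting rows per county.
toy_pres <- data.frame(
  County        = c("A", "A", "B", "B", "C", "C", "D", "D"),
  mail_ballots  = c(5, 3, 40, 35, 400, 500, 9000, 7000),
  total_ballots = c(200, 150, 900, 800, 6000, 7000, 90000, 80000)
)

# Collapse to one row per county.
county_mail_summary <- aggregate(
  toy_pres[, c("mail_ballots", "total_ballots")],
  by  = list(County = toy_pres$County),
  FUN = sum
)

# Mail share = mail ballots / all ballots cast in the county.
county_mail_summary$mail_share <-
  county_mail_summary$mail_ballots / county_mail_summary$total_ballots

# The three association measures used in this section.
pearson_raw <- cor(county_mail_summary$total_ballots, county_mail_summary$mail_share)
pearson_log <- cor(log(county_mail_summary$total_ballots), county_mail_summary$mail_share)
spearman_r  <- cor(county_mail_summary$total_ballots, county_mail_summary$mail_share,
                   method = "spearman")
```

Spearman's coefficient depends only on ranks, so it is unaffected by the log transform; comparing it with the two Pearson values helps show whether a few very large counties drive the linear correlation.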
cor(county_mail_summary$total_ballots,
county_mail_summary$mail_share,
use = "complete.obs")
[1] 0.5375289
cor(log(county_mail_summary$total_ballots),
county_mail_summary$mail_share,
use = "complete.obs")
[1] 0.5941734
cor(county_mail_summary$total_ballots,
county_mail_summary$mail_share,
method = "spearman",
use = "complete.obs")
[1] 0.5515872
To assess whether any counties deviate substantially from the expected mail share given their size, we examined residuals from the log-linear regression:
\[\text{Mail Share}_i = \beta_0 + \beta_1 \log(\text{Total Ballots}_i) + \epsilon_i\]
The residual plot shows a tight horizontal band centered at zero across the full range of county sizes. There is:
The largest positive residual is approximately 0.033 (3.3 percentage points), and the largest negative residual is approximately -0.021. Both fall within expected statistical bounds for a sample of this size.
Importantly, high-residual counties are isolated observations rather than members of a structured pattern.
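The plotting code in this section refers to an object named `model` whose definition is not shown. A plausible form, inferred from the regression equation above, is sketched here on simulated data. The data frame `toy_counties`, the simulation parameters, and the cutoff of three standardized residuals are all assumptions made for illustration, not the report's actual settings.

```r
set.seed(1)

# Simulated counties: sizes spread over several orders of magnitude,
# mail share rising with log size plus noise (sd chosen to resemble ~1.1 points).
toy_counties <- data.frame(
  total_ballots = round(exp(runif(100, log(2000), log(600000))))
)
toy_counties$mail_share <-
  -0.05 + 0.01 * log(toy_counties$total_ballots) + rnorm(100, sd = 0.011)

# Log-linear specification: mail share regressed on log(total ballots).
model <- lm(mail_share ~ log(total_ballots), data = toy_counties)

# Raw and standardized residuals for diagnostics.
toy_counties$residuals <- resid(model)
toy_counties$std_resid <- rstandard(model)

# Flag counties whose standardized residual exceeds a hypothetical cutoff of 3.
flagged <- toy_counties[abs(toy_counties$std_resid) > 3, ]
nrow(flagged)
```

Standardized residuals put every county on a common scale, which makes it easier to judge whether the largest deviations are isolated or part of a cluster.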
#library(ggplot2)
# ------------------------------------------------------------
# Add model residuals and fitted values to county_mail_summary
# so they are available for plotting during document render.
# ------------------------------------------------------------
county_mail_summary <- county_mail_summary %>%
mutate(
fitted = fitted(model),
residuals = resid(model),
std_resid = rstandard(model)
)
ggplot(county_mail_summary,
aes(x = log(total_ballots), y = residuals)) +
geom_point(size = 2, alpha = 0.7) +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_hline(yintercept = c(-0.03, 0.03),
linetype = "dotted",
color = "red") +
labs(
x = "Log(Total Ballots)",
y = "Residual (Observed − Predicted Mail Share)",
title = "Residuals from County Mail Share Regression"
) +
theme_minimal()
The residual plot shows the difference between each county’s observed mail vote share and the share predicted by the regression on county size. If county size were systematically associated with unusual mail behavior, we would expect to see a pattern in the residuals, such as a visible upward or downward trend, widening dispersion at larger sizes, clustering on one side of zero, or structural breaks in the upper tail. Instead, the points appear randomly scattered around zero across the full range of county sizes, with no discernible slope or curvature. The spread remains relatively constant, and nearly all counties fall within a narrow band around the fitted line. This “static”-like pattern is consistent with ordinary random variation rather than systematic size-related distortion, providing no visual evidence of anomalous mail share behavior linked to county size.
The histogram of residuals is:
The residual standard deviation is approximately 0.011 (1.1 percentage points), indicating that most counties deviate from predicted mail share by only small margins.
While a small number of counties exhibit moderately positive residuals, the overall distribution is tightly clustered and consistent with ordinary cross-county variation.
ggplot(county_mail_summary,
aes(x = residuals)) +
geom_histogram(bins = 25,
fill = "steelblue",
color = "white",
alpha = 0.8) +
geom_vline(xintercept = 0, linetype = "dashed") +
labs(
x = "Residual",
y = "Count",
title = "Distribution of County-Level Residuals"
) +
theme_minimal()
The Q–Q plot indicates that residuals closely follow the theoretical normal line across the central portion of the distribution.
Modest deviations appear in the tails, particularly in the upper tail, where a small number of counties exhibit slightly larger-than-expected positive residuals. These departures are gradual rather than abrupt and are consistent with mild skew in a finite sample.
Overall, no extreme departures from normality or structural abnormalities are observed.
ggplot(county_mail_summary,
aes(sample = residuals)) +
stat_qq() +
stat_qq_line(color = "red") +
labs(
title = "Q–Q Plot of County Residuals"
) +
theme_minimal()
Across all diagnostic visualizations:
These diagnostics indicate that the log-linear specification provides an appropriate fit to the data and that county-level variation in mail share is well explained by population scaling.
There is no visual or statistical evidence of anomalous county-level behavior inconsistent with ordinary demographic and structural variation.
Having established that county-level mail voting patterns exhibit smooth and statistically normal scaling behavior, we now turn to precinct-level in-person voting.
This section evaluates whether the composition and distribution of in-person voting methods vary systematically with precinct size in ways that could indicate structural irregularities. The focus is specifically on early in-person voting and Election Day in-person voting. Provisional ballots are excluded from this analysis due to their very small volume, which renders them statistically negligible and unsuitable as a meaningful vector for large-scale outcome manipulation.
The objective is not to test for correlation alone. Differences across precinct size are expected due to demographic clustering, geographic variation, and administrative structure. Instead, the purpose of this analysis is to identify potential anomalies such as:
If voting method shares vary smoothly and predictably with precinct size, and if deviations remain within ordinary statistical bounds, then precinct-level in-person voting patterns are consistent with normal structural and demographic variation.
The guiding question is therefore:
Do larger precincts exhibit anomalous in-person voting patterns inconsistent with predictable scaling behavior?
The analyses that follow evaluate method composition as a function of precinct size, examine residual structure, and assess whether observed variation falls within expected statistical limits.
# Create reporting-unit-level total votes for US President
# (Real Precinct is kept as a grouping column; rows are not filtered on it)
precinct_sizes <- nc_results_raw %>%
filter(`Contest Name` == "US PRESIDENT") %>%
group_by(County, Precinct, `Real Precinct`) %>%
summarise(
precinct_total = sum(`Total Votes`, na.rm = TRUE),
.groups = "drop"
) %>%
filter(precinct_total > 0)
# set.seed(2024)
precinct_sizes %>%
slice_sample(n = 10) %>%
arrange(County, Precinct)
# A tibble: 10 × 4
County Precinct `Real Precinct` precinct_total
<chr> <chr> <chr> <dbl>
1 ALAMANCE 12E Y 960
2 BUNCOMBE 04.1 Y 1209
3 CARTERET MHD4 Y 2563
4 DURHAM 28 Y 651
5 GASTON 41 Y 2996
6 GRAHAM EAST Y 1931
7 GRANVILLE WOEL Y 1128
8 ONSLOW SW19 Y 4554
9 VANCE NH Y 2549
10 WAYNE 22 Y 1209
precinct_inperson <- nc_results_raw %>%
filter(`Contest Name` == "US PRESIDENT") %>%
filter(`Real Precinct` == "Y") %>%
group_by(County, Precinct) %>%
summarise(
election_day = sum(`Election Day`, na.rm = TRUE),
early_voting = sum(`Early Voting`, na.rm = TRUE),
total_votes = sum(`Total Votes`, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
inperson_total = election_day + early_voting
) %>%
filter(inperson_total > 0)
glimpse(precinct_inperson)
Rows: 2,658
Columns: 6
$ County <chr> "ALAMANCE", "ALAMANCE", "ALAMANCE", "ALAMANCE", "ALAMAN…
$ Precinct <chr> "01", "02", "035", "03C", "03N", "03N2", "03SE", "03SM"…
$ election_day <dbl> 1054, 870, 574, 396, 400, 256, 505, 501, 466, 691, 691,…
$ early_voting <dbl> 1814, 2201, 2920, 1491, 2057, 358, 2255, 2366, 1909, 25…
$ total_votes <dbl> 3037, 3228, 3665, 2049, 2624, 655, 2938, 3091, 2514, 34…
$ inperson_total <dbl> 2868, 3071, 3494, 1887, 2457, 614, 2760, 2867, 2375, 32…
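The regression code later in this section uses three columns, `early_share`, `log_precinct_size`, and `precinct_type`, whose construction is not shown. The sketch below gives one plausible set of definitions on toy data. The formulas are assumptions inferred from the surrounding text; the label "Mixed" appears in the report's code, while "Election Day only" and "Early only" are hypothetical labels.

```r
# Toy in-person counts for four precincts (invented values).
toy_inperson <- data.frame(
  election_day = c(120, 400, 250, 90),
  early_voting = c(0, 1600, 750, 10)
)

toy_inperson$inperson_total <- toy_inperson$election_day + toy_inperson$early_voting

# Share of in-person ballots cast early, and log of in-person precinct size.
toy_inperson$early_share       <- toy_inperson$early_voting / toy_inperson$inperson_total
toy_inperson$log_precinct_size <- log(toy_inperson$inperson_total)

# Classify precincts by which in-person methods they report.
toy_inperson$precinct_type <- ifelse(
  toy_inperson$early_voting == 0, "Election Day only",
  ifelse(toy_inperson$election_day == 0, "Early only", "Mixed")
)

toy_inperson
```

Separating out precincts with zero early votes matters because their early share is fixed at 0 by construction, so they would pull a fitted line toward the boundary without reflecting voter behavior.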
Restricting the analysis to mixed precincts removes the structural zero observations and allows us to examine how early voting scales where both methods are actively used. The resulting relationship remains strongly positive: larger precincts allocate a greater share of in‑person voting to early voting, even when we exclude Election Day–only precincts from the sample.
Importantly, the pattern is smooth and continuous across the full range of precinct sizes. There is no visible breakpoint, clustering, or sudden shift in slope as precincts grow larger. The fitted line lies comfortably within the feasible 0–1 range and tracks the center of the data cloud closely. This indicates that the earlier size gradient was not merely an artifact of zero‑early precincts; rather, it reflects a consistent scaling relationship in how in‑person voting is distributed across methods.
mixed_precincts <- precinct_inperson %>%
filter(precinct_type == "Mixed")
model_mixed <- lm(early_share ~ log_precinct_size, data = mixed_precincts)
mixed_precincts$residuals <- resid(model_mixed)
ggplot(mixed_precincts, aes(x = log_precinct_size, y = early_share)) +
geom_point(alpha = 0.3) +
geom_smooth(method = "lm", se = TRUE, color = "red") +
labs(
title = "Early Share vs Log Precinct Size (Mixed Precincts Only)",
x = "Log In-Person Precinct Size",
y = "Early Voting Share"
) +
theme_minimal()
With Election Day–only precincts removed, the scatter now reflects variation exclusively among precincts that actively use both voting methods. The horizontal band at zero disappears, and the regression line aligns much more closely with the central mass of the data. This produces a cleaner and more interpretable linear fit, as the model is no longer being mechanically influenced by boundary observations. The positive size gradient remains strong and visually coherent, indicating that larger mixed precincts consistently exhibit higher early voting shares. Importantly, the relationship is smooth and continuous across the full size distribution, with no discontinuities, clustering, or upper‑tail deviations that would suggest anomalous behavior or size‑based vote manipulation.
Among precincts that contain both early and Election Day voting, the proportion of early voting increases smoothly and predictably with in-person precinct size.
To further evaluate the stability of the size relationship, we examine the residuals from the mixed‑precinct regression. Residuals represent the difference between each precinct’s observed early voting share and the share predicted by the fitted model. If precinct size were associated with irregular behavior or structural distortions, we would expect to see systematic patterns in these residuals — such as curvature, widening dispersion at larger sizes, clustering in the upper tail, or visible discontinuities. Plotting residuals against log precinct size provides a direct diagnostic check for such anomalies and allows us to assess whether deviations from the fitted trend are random or structured.
ggplot(mixed_precincts, aes(x = log_precinct_size, y = residuals)) +
geom_point(alpha = 0.3) +
geom_hline(yintercept = 0, color = "red") +
labs(
title = "Residuals vs Log Precinct Size (Mixed Precincts)",
x = "Log In-Person Precinct Size",
y = "Residuals"
) +
theme_minimal()
The residual plot exhibits a dense, roughly elliptical cloud centered on zero, with no discernible upward or downward drift as precinct size increases. The majority of precincts cluster tightly around the fitted line, and the vertical spread remains broadly consistent across the size distribution. While a small number of outliers are present (as expected in any dataset of this size), they do not form patterns, bands, or structural breaks tied to precinct size.
Importantly, there is no visible “fanning out” at larger precinct sizes and no upper‑tail clustering that would indicate systematic deviations in high‑volume precincts. The absence of trend, curvature, or discontinuity in the residuals supports the conclusion that the linear size relationship is stable and that deviations from the model are random rather than size‑dependent. In short, the residual diagnostics provide no evidence of anomalous behavior associated with precinct size.
The objective of this analysis is to evaluate whether county-level vote share in the 2024 North Carolina presidential election exhibits abnormal variation as a function of voter turnout.
Specifically, we test the hypothesis that vote share in the NC Presidential contest may change systematically with turnout in a manner inconsistent with ordinary demographic and geographic patterns. If vote allocation were being influenced by an algorithmic or threshold-based mechanism, one might expect to observe:
To assess this possibility, we examine the cross-sectional relationship between county-level turnout and party vote share, test for threshold effects, and evaluate whether the observed patterns remain after controlling for county size.
The central research question is:
Does county-level vote share shift abnormally as turnout increases, beyond what would be expected from ordinary demographic and geographic variation?
By explicitly modeling and testing these relationships, the analysis aims to distinguish between structural electoral patterns and artifacts that could suggest systematic irregularities.
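One common way to test for a threshold effect while controlling for county size is to compare a linear model with a model that adds a hinge term at a candidate turnout level. The sketch below illustrates the idea on simulated data with no true threshold. The hinge location `tau`, the data frame `toy_turnout`, and all simulation parameters are hypothetical and are not the values used in this report; `party_gap` follows the report's convention of Republican share minus Democratic share.

```r
set.seed(42)
n <- 100

# Simulated counties: turnout in a realistic range, sizes on a log-uniform scale.
toy_turnout <- data.frame(
  turnout     = runif(n, 0.59, 0.82),
  total_votes = round(exp(runif(n, log(2000), log(600000))))
)

# Party gap varies smoothly with turnout and county size (no threshold built in).
toy_turnout$party_gap <-
  1.0 + 0.3 * toy_turnout$turnout - 0.1 * log(toy_turnout$total_votes) +
  rnorm(n, sd = 0.1)

# Hypothetical threshold: slope allowed to change above 75% turnout.
tau <- 0.75
toy_turnout$hinge <- pmax(toy_turnout$turnout - tau, 0)

base_model  <- lm(party_gap ~ turnout + log(total_votes), data = toy_turnout)
hinge_model <- lm(party_gap ~ turnout + hinge + log(total_votes), data = toy_turnout)

# F-test: does adding the hinge term improve fit beyond the smooth model?
comparison <- anova(base_model, hinge_model)
p_value    <- comparison$`Pr(>F)`[2]
p_value
```

Because the hinge location is chosen by the analyst, scanning many candidate values of `tau` inflates false positives; the choice should be guided by visual inspection first. Heteroskedasticity-robust standard errors, for example via the sandwich and lmtest packages loaded earlier, are a reasonable alternative to the plain F-test.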
County-level analysis captures broad geographic patterns rather than fine-grained behavior.
We initially explored constructing precinct-level turnout by merging precinct-level election results with precinct-level voter registration data. Although the two datasets exhibited substantial overlap in precinct identifiers, approximately 20% of real precincts—representing roughly 23% of total votes cast—did not match uniquely across files. Importantly, the unmatched precincts were not disproportionately small and were distributed across counties, indicating that their exclusion would remove a nontrivial share of statewide votes and potentially introduce systematic bias.
Because county-level registration and vote totals align cleanly and completely, turnout analysis is conducted at the county level. This approach preserves full vote coverage while avoiding distortions arising from incomplete precinct-level joins.
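The coverage check described above can be sketched as a left join followed by a count of unmatched rows. The example uses invented data; `toy_results`, `toy_registration`, and their values are hypothetical, and the sketch only detects precincts with no match, not precincts that match more than one registration row.

```r
# Toy precinct results and registration tables with partly mismatched identifiers.
toy_results <- data.frame(
  County      = c("A", "A", "A", "B", "B"),
  Precinct    = c("01", "02", "03", "N1", "N2"),
  total_votes = c(500, 700, 300, 900, 600)
)
toy_registration <- data.frame(
  County            = c("A", "A", "B", "B"),
  Precinct          = c("01", "02", "N1", "N-2"),
  registered_voters = c(650, 900, 1200, 800)
)

# Left join: keep every results row, with NA where no registration row matches.
joined <- merge(toy_results, toy_registration,
                by = c("County", "Precinct"), all.x = TRUE)

unmatched <- is.na(joined$registered_voters)

# Share of precincts, and share of votes, lost if unmatched rows were dropped.
share_precincts_unmatched <- mean(unmatched)
share_votes_unmatched     <- sum(joined$total_votes[unmatched]) / sum(joined$total_votes)

c(precincts = share_precincts_unmatched, votes = share_votes_unmatched)
```

Reporting both shares matters: if the vote share lost is comparable to the precinct share lost, the unmatched precincts are not just small ones, and dropping them could bias the analysis.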
file_path <- "../Data/elections-main/data/raw/US_NC/2024/voter_stats_20241105/voter_stats_20241105.txt"
voter_stats_20241105 <- read_tsv(file_path,
col_types = cols()) # lets readr guess types
glimpse(voter_stats_20241105)
Rows: 685,049
Columns: 12
$ county_desc <chr> "WAKE", "NEW HANOVER", "WATAUGA", "DURHAM", "JOHNSTON",…
$ election_date <chr> "11/05/2024", "11/05/2024", "11/05/2024", "11/05/2024",…
$ stats_type <chr> "voter", "voter", "voter", "voter", "voter", "voter", "…
$ precinct_abbrv <chr> "07-04", "W34", "15", "09", "PR38", "08", "FL", "AE", "…
$ vtd_abbrv <chr> "07-04", "W24", "15", "09", "PR17", "08", "FL", "05", "…
$ party_cd <chr> "UNA", "UNA", "DEM", "UNA", "REP", "DEM", "UNA", "DEM",…
$ race_code <chr> "O", "O", "U", "W", "M", "W", "W", "A", "W", "U", "M", …
$ ethnic_code <chr> "NL", "UN", "UN", "NL", "UN", "NL", "NL", "NL", "NL", "…
$ sex_code <chr> "M", "M", "U", "U", "M", "F", "U", "M", "U", "U", "M", …
$ age <chr> "Age 41 - 65", "Age 26 - 40", "Age 26 - 40", "Age Over …
$ total_voters <dbl> 10, 3, 10, 1, 1, 9, 23, 1, 14, 6, 1, 1, 8, 1, 1, 1, 1, …
$ update_date <chr> "11/05/2024", "11/05/2024", "11/05/2024", "11/05/2024",…
voter_stats_20241105 %>%
summarise(total_registered = sum(total_voters, na.rm = TRUE))
# A tibble: 1 × 1
total_registered
<dbl>
1 7854464
North Carolina’s population surpassed 11 million in 2024, following continued post‑2020 growth (osbm.nc.gov). Using recent Census estimates, roughly 22% of residents are under age 18, implying an adult (18+) population of approximately 8.6 million.
With 7,854,464 registered voters, this suggests that roughly 91% of adults were registered to vote in 2024. Because the 18+ population includes non‑citizens and other ineligible residents, the share of eligible citizens who are registered is likely even higher.
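The arithmetic behind this estimate can be reproduced directly. The inputs below are the rounded figures quoted in the text (a population of about 11 million and an under-18 share of about 22%), so the result is only an approximation; the variable names are illustrative.

```r
population        <- 11e6      # approximate 2024 population, as quoted in the text
share_under_18    <- 0.22      # approximate share of residents under age 18
registered_voters <- 7854464   # total from the voter statistics file

# Adults = population minus the under-18 share.
adult_population <- population * (1 - share_under_18)

# Registered voters as a share of all adults (including ineligible adults).
registration_rate <- registered_voters / adult_population

c(adult_population = adult_population, registration_rate = registration_rate)
```

Because the population input is a rounded lower bound and the under-18 share is approximate, the resulting rate should be read as "a little over nine in ten adults" rather than as a precise figure.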
North Carolina does not have automatic voter registration (AVR) (findlaw.com). However, the state provides online voter registration through the DMV for eligible citizens with a driver’s license or state ID (brennancenter.org), along with same‑day registration during early voting. These mechanisms likely contribute to high registration coverage.
Statewide participation in the 2024 general election was also high, with nearly 6 million ballots cast (democracync.org). Taken together, these figures indicate a highly engaged electorate. Despite ongoing debates over voter ID requirements and voter roll maintenance (apnews.com), North Carolina’s registration and turnout levels suggest robust civic participation across the state.
Although precinct-level turnout rates cannot be constructed reliably, county-level turnout can. Every precinct belongs to exactly one county, so the county-level analysis is a coarser aggregation of the same ballots.
# Aggregate registered voters by county
registered_county <- voter_stats_20241105 %>%
group_by(county_desc) %>%
summarise(
registered_voters = sum(total_voters, na.rm = TRUE),
.groups = "drop"
)
# Aggregate presidential vote totals and party shares by county
pres_county_total <- nc_results_raw %>%
filter(`Contest Name` == "US PRESIDENT") %>%
group_by(County) %>%
summarise(
total_votes = sum(`Total Votes`, na.rm = TRUE),
dem_votes = sum(`Total Votes` * (`Choice Party` == "DEM"), na.rm = TRUE),
rep_votes = sum(`Total Votes` * (`Choice Party` == "REP"), na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
dem_share = dem_votes / total_votes,
rep_share = rep_votes / total_votes
)
county_data <- pres_county_total %>%
left_join(registered_county,
by = c("County" = "county_desc")) %>%
mutate(
turnout = total_votes / registered_voters,
two_party_total = dem_votes + rep_votes,
dem_share = dem_votes / two_party_total,
rep_share = rep_votes / two_party_total,
# Standard definition: Republican minus Democratic
party_gap = rep_share - dem_share,
# Winning party indicator
winning_party = case_when(
party_gap > 0 ~ "Republican",
party_gap < 0 ~ "Democratic",
TRUE ~ "Tie"
)
)
# Summarize county-level turnout distribution
summary(county_data$turnout)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.5907  0.7064  0.7389  0.7302  0.7601  0.8198
County-level turnout in the 2024 presidential election ranged from approximately 59% to 82% of registered voters, with a median near 74% and a mean around 73%. This distribution indicates consistently high participation across counties, with no county falling below majority turnout among registered voters.
county_data %>%
mutate(turnout_band = cut(turnout,
breaks = c(0, .2, .3, .4, .5, .6, .7, .8, 1),
include.lowest = TRUE)) %>%
count(turnout_band) %>%
arrange(turnout_band)
# A tibble: 4 × 2
turnout_band n
<fct> <int>
1 (0.5,0.6] 2
2 (0.6,0.7] 19
3 (0.7,0.8] 77
4 (0.8,1] 2
All but 4 counties had turnout between 60% and 80%. Turnout is the total votes cast in the presidential contest divided by the number of registered voters in the county.
# Compare total presidential votes with and without Real Precinct filter
nc_results_raw %>%
filter(`Contest Name` == "US PRESIDENT") %>%
summarise(total_votes = sum(`Total Votes`, na.rm = TRUE))
# A tibble: 1 × 1
total_votes
<dbl>
1 5699141
nc_results_raw %>%
filter(`Contest Name` == "US PRESIDENT",
`Real Precinct` == "Y") %>%
summarise(total_votes = sum(`Total Votes`, na.rm = TRUE))
# A tibble: 1 × 1
total_votes
<dbl>
1 3923739
# Histogram of county-level turnout (percent of registered voters casting ballots)
library(ggplot2)
ggplot(county_data, aes(x = turnout)) +
geom_histogram(bins = 15, fill = "mediumpurple4", color = "white") +
scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(
title = "Distribution of County-Level Turnout (2024 Presidential Election)",
x = "Turnout (Percent of Registered Voters)",
y = "Number of Counties"
) +
theme_minimal()
County-level turnout in the 2024 presidential election is tightly clustered, with most counties falling between roughly 68% and 78% participation. The distribution is slightly left-skewed, with a small number of lower-turnout counties near 60% pulling the mean just below the median, but no county falls below majority turnout. Overall, the histogram indicates consistently high engagement across the state rather than sharp regional disparities.
This analysis set out to examine whether patterns in North Carolina’s 2024 presidential precinct- and county-level results exhibit statistical characteristics consistent with vote manipulation, or whether they are better explained by ordinary electoral structure and voter behavior. Across multiple levels of aggregation and several complementary visual and quantitative diagnostics, the evidence consistently supports the latter interpretation.
We began by examining turnout levels across counties. Turnout was neither unnaturally uniform nor clustered at suspicious thresholds. Instead, it displayed:
Formally, turnout is defined as:
\[ \text{Turnout} = \frac{\text{Total Votes}}{\text{Registered Voters}} \]
If manipulation were occurring at scale, we might expect:
None were observed. The turnout distribution appears organic and demographically structured.
The turnout–gap relationship appears consistent with:
Importantly, the pattern is gradual and statistically smooth — hallmarks of social processes rather than discrete algorithmic adjustments.
Manipulation models typically generate:
The observed data do not exhibit those features.
Across all examined diagnostics:
There is no statistical evidence in this analysis that suggests systemic vote manipulation.
This does not prove that manipulation is impossible. Rather, it indicates that:
The observed electoral structure is consistent with demographic, geographic, and political sorting — not with algorithmic or procedural distortion.
We did not observe:
In short, none of the commonly cited quantitative red flags appear in these data.
The patterns observed in North Carolina’s 2024 presidential results are statistically coherent, continuous, and structurally plausible. The distributions and relationships examined align with known features of political geography and voter behavior.
Within the scope of this analysis, there is no indication of vote manipulation.
The evidence supports the interpretation that the turnout and partisan outcomes reflect ordinary electoral dynamics rather than systemic distortion.