Anomaly Detection in the 2024 North Carolina Presidential Election

Author

Kier O’Neil

Published

June 1, 2026

1 Purpose and Analytical Framework

1.1 Objective

This report examines whether observable statistical anomalies are present in the certified results of the 2024 Presidential Election in North Carolina. The objective is not to advance a predetermined conclusion, but to evaluate whether the reported vote totals exhibit structural irregularities that would be inconsistent with ordinary electoral processes.

The dataset used in this analysis was obtained from Election Truth Alliance and cross‑verified against the official results published by the North Carolina State Board of Elections. Vote totals, reporting units, and voting method breakdowns were confirmed to match the state’s certified results.

Elections generate large, structured datasets. If systematic manipulation were to occur, it would likely leave detectable statistical signatures — such as discontinuities, nonlinear threshold effects, or unexplained shifts in vote share across independent dimensions of the data. Accordingly, this analysis evaluates candidate vote shares across several distinct vectors:

Voting method and voting mode
Precinct size
Cumulative vote count
Voter turnout

Each of these dimensions provides an independent lens through which to assess the internal consistency of the reported results.

The guiding principle of this report is straightforward:

If irregular intervention occurred, it must manifest in observable structure.

The analysis proceeds from descriptive structure to targeted tests. First, we document how votes were cast across administrative voting methods and analytic voting modes. We then evaluate whether candidate vote shares vary abnormally with precinct size, cumulative vote totals, or turnout levels. Throughout, visual inspection precedes formal modeling in order to avoid imposing assumptions (such as arbitrary hinge points or thresholds) not supported by the data.

By examining multiple independent dimensions of the dataset, this report seeks to determine whether the reported results behave as expected under ordinary electoral dynamics, or whether they display patterns requiring further explanation.

Show Code

knitr::opts_chunk$set(
  fig.width = 8,
  fig.height = 5,
  dpi = 300
)

2 Structure of the Data

This section defines the core units and terms used throughout the analysis. The goal is to establish a consistent vocabulary before moving into descriptive and statistical evaluation.

2.1 Reporting Units

Election results in North Carolina are reported at the reporting unit level. Reporting units include:

Physical precincts (geographically defined polling locations)
Administrative reporting groups (such as Early Voting or Absentee reporting categories)

In many counties, Early Voting and Absentee by Mail ballots are not attributed to individual physical precincts. Instead, they are aggregated into county‑level reporting units (e.g., “Early Voting” or “Absentee”). As a result, the dataset contains both geographically bounded precincts and administrative units that function solely as reporting categories.

Each reporting unit therefore represents the smallest publicly available aggregation of votes.

Throughout this report, the term precinct size refers to the total number of votes reported within a reporting unit, regardless of whether it is a physical precinct or an administrative unit.

2.2 Voting Methods (Administrative Categories)

The raw dataset separates ballots into four administrative voting methods:

Election Day
Early Voting
Provisional
Absentee by Mail

These categories reflect how ballots were cast and processed within the state’s reporting framework.

Election Day ballots are cast in person on Election Day.
Early Voting ballots are cast in person during the early voting period.
Provisional ballots are issued when voter eligibility requires verification.
Absentee by Mail ballots are requested and returned by mail.

These are administrative classifications and are reported separately in the official results.

2.3 Voting Modes (Analytic Categories)

For analytic clarity, the four administrative voting methods are consolidated into two broader voting modes:

In Person = Election Day + Early Voting + Provisional
Mail = Absentee by Mail

This distinction reflects differences in ballot handling and chain‑of‑custody procedures. In‑person ballots are processed through voting equipment at physical locations, while mail ballots follow a separate request, verification, and tabulation process.

Unless otherwise noted, comparisons in this report are conducted at the voting mode level rather than the individual method level.
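
The consolidation can be sketched on a small made-up table. This is a minimal illustration, not the report's own code: the data frame `toy_units`, its column names, and all of its values are hypothetical.

```r
# Hypothetical reporting units; values are illustrative only.
toy_units <- data.frame(
  unit             = c("P01", "P02", "ABSENTEE"),
  election_day     = c(400, 250, 0),
  early_voting     = c(1200, 900, 0),
  provisional      = c(5, 2, 0),
  absentee_by_mail = c(60, 40, 300)
)

# In Person = Election Day + Early Voting + Provisional
toy_units$in_person <- with(toy_units, election_day + early_voting + provisional)

# Mail = Absentee by Mail
toy_units$mail <- toy_units$absentee_by_mail

toy_units[, c("unit", "in_person", "mail")]
```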

2.4 Candidates and Party Labels

The analysis is restricted to the two major party candidates in the 2024 presidential contest:

Republican candidate
Democratic candidate

For clarity and consistency in tables and figures, party labels are abbreviated as:

REP
DEM

Vote share calculations are computed within reporting unit and voting mode combinations.

2.5 Registered Voters and Turnout

County‑level and precinct‑level counts of registered voters are incorporated to evaluate voter turnout, defined as:

\[\text{Turnout} = \frac{\text{Total Ballots Cast}} {\text{Registered Voters}}\]

Turnout is used later in the report to assess whether vote share patterns change as participation increases. If a systematic threshold or structural shift were present, it could potentially appear as a nonlinear relationship between turnout and candidate vote share.
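
As a minimal sketch of the turnout definition, the snippet below applies it to hypothetical counts. The data frame `toy_turnout` and its values are invented for illustration, and the guard for units with zero registered voters is an assumption about how administrative reporting units might appear, not something the source specifies.

```r
# Hypothetical counts; not taken from the North Carolina data.
toy_turnout <- data.frame(
  unit              = c("A", "B", "C"),
  ballots_cast      = c(820, 1500, 0),
  registered_voters = c(1000, 2000, 0)
)

# Turnout = Total Ballots Cast / Registered Voters.
# Units with zero registered voters get NA instead of a 0/0 result.
toy_turnout$turnout <- with(
  toy_turnout,
  ifelse(registered_voters > 0, ballots_cast / registered_voters, NA_real_)
)

toy_turnout
```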

2.6 Summary

In summary, the dataset consists of:

Reporting units (physical precincts and administrative units)
Vote totals by voting method
Consolidated voting modes (In Person vs Mail)
Candidate vote totals and shares
Registered voter counts for turnout analysis

With these definitions established, the next section begins by examining how votes are distributed across voting methods statewide, providing context for the relative weight of each voting channel.

3 Distribution of Votes by Voting Method

This section loads the precinct-level election data, verifies that the election date is 2024-11-05, shows a glimpse of the data, and breaks down the presidential vote by voting method.

Show Code

library(tidyverse)
library(sandwich)
library(lmtest)
library(moments)   # for kurtosis()
#library(ggplot2)

# We will use randomization, setting random seed for reproducibility
set.seed(2024)

# ------------------------------------------------------------
# Read North Carolina 2024 precinct results using readr
# (tab-delimited file). Explicit column types are specified
# to ensure consistent parsing and avoid type guessing.
# ------------------------------------------------------------

library(readr)

file_path <- "../Data/elections-main/data/raw/US_NC/2024/results_pct_20241105/results_pct_20241105.txt"

nc_results_raw <- read_tsv(
  file = file_path,
  col_types = cols(
    County = col_character(),
    `Election Date` = col_date(format = "%m/%d/%Y"),
    Precinct = col_character(),
    `Contest Group ID` = col_double(),
    `Contest Type` = col_character(),
    `Contest Name` = col_character(),
    Choice = col_character(),
    `Choice Party` = col_character(),
    `Vote For` = col_double(),
    `Election Day` = col_double(),
    `Early Voting` = col_double(),
    `Absentee by Mail` = col_double(),
    Provisional = col_double(),
    `Total Votes` = col_double(),
    `Real Precinct` = col_character(),
    `...16` = col_skip()   # skip the junk column
  ),
  progress = FALSE
)

glimpse(nc_results_raw)

Rows: 233,510
Columns: 15
$ County             <chr> "BUNCOMBE", "BUNCOMBE", "BUNCOMBE", "BUNCOMBE", "BU…
$ `Election Date`    <date> 2024-11-05, 2024-11-05, 2024-11-05, 2024-11-05, 20…
$ Precinct           <chr> "01.1", "01.1", "01.1", "01.1", "01.1", "01.1", "02…
$ `Contest Group ID` <dbl> 7, 22, 1301, 1351, 1393, 1393, 22, 1005, 1011, 1393…
$ `Contest Type`     <chr> "C", "C", "S", "S", "S", "S", "C", "S", "S", "S", "…
$ `Contest Name`     <chr> "CITY OF ASHEVILLE CITY COUNCIL", "CITY OF ASHEVILL…
$ Choice             <chr> "Write-In (Miscellaneous)", "No", "Shannon W. Bray"…
$ `Choice Party`     <chr> NA, NA, "LIB", "REP", NA, "GRE", NA, "DEM", "DEM", …
$ `Vote For`         <dbl> 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, …
$ `Election Day`     <dbl> 6, 62, 14, 50, 0, 11, 90, 338, 326, 0, 1, 33, 47, 4…
$ `Early Voting`     <dbl> 14, 215, 21, 123, 0, 9, 247, 1553, 1512, 0, 8, 102,…
$ `Absentee by Mail` <dbl> 1, 20, 4, 19, 0, 1, 32, 222, 217, 0, 0, 20, 19, 19,…
$ Provisional        <dbl> 0, 1, 0, 2, 0, 0, 5, 10, 10, 0, 0, 2, 2, 2, 0, 1, 0…
$ `Total Votes`      <dbl> 21, 298, 39, 194, 0, 21, 374, 2123, 2065, 0, 9, 157…
$ `Real Precinct`    <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "…

3.0.1 What is a Real Precinct (NC Context)?

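The raw file includes a column named Real Precinct whose meaning is not obvious from its name. Because the analysis should count only ballots actually cast by voters, we first check which values the column takes.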

Show Code

# Check Real Precinct values
nc_results_raw %>% 
  count(`Real Precinct`)

# A tibble: 2 × 2
  `Real Precinct`      n
  <chr>            <int>
1 N                25640
2 Y               207870

In North Carolina State Board of Elections files, the field Real Precinct indicates whether a row corresponds to a physical precinct or another type of reporting unit.

"Y" — An actual geographic voting precinct
"N" — A reporting unit that is not a physical precinct

Rows coded "N" commonly include:

Provisional ballot aggregations
Absentee ballot group totals
Early voting site summaries
County‑wide reporting units
Split‑precinct aggregations
Test or zero‑report rows

These entries are not hierarchical rollups of precinct totals. Rather, they represent parallel reporting categories used for administrative aggregation of specific ballot types or voting modes.

3.1 Validate November 2024 Election Date in data

Show Code

# Unique election dates
nc_results_raw %>% 
  distinct(`Election Date`)

# A tibble: 1 × 1
  `Election Date`
  <date>         
1 2024-11-05

3.2 Summary of Voting Methods Used

Show Code

# Statewide Voting Method Distribution (Presidential Contest)

method_usage_nc <- nc_results_raw %>%
  
  # Restrict to Presidential contest
  dplyr::filter(`Contest Name` == "US PRESIDENT") %>%
  
  # Collapse across candidates so each reporting unit counted once
  dplyr::group_by(County, Precinct, `Real Precinct`) %>%
  dplyr::summarise(
    `Election Day` = sum(`Election Day`, na.rm = TRUE),
    `Early Voting` = sum(`Early Voting`, na.rm = TRUE),
    `Absentee by Mail` = sum(`Absentee by Mail`, na.rm = TRUE),
    Provisional = sum(Provisional, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  
  # Sum statewide totals
  dplyr::summarise(
    `Election Day` = sum(`Election Day`),
    `Early Voting` = sum(`Early Voting`),
    `Absentee by Mail` = sum(`Absentee by Mail`),
    Provisional = sum(Provisional)
  ) %>%
  
  # Convert to long format
  tidyr::pivot_longer(
    cols = everything(),
    names_to = "Voting Method",
    values_to = "Total Votes"
  ) %>%
  
  # Compute percentages
  dplyr::mutate(
    Percent = `Total Votes` / sum(`Total Votes`)
  ) %>%
  
  # Arrange descending
  dplyr::arrange(dplyr::desc(Percent))

method_usage_nc

# A tibble: 4 × 3
  `Voting Method`  `Total Votes` Percent
  <chr>                    <dbl>   <dbl>
1 Early Voting           4208839 0.739  
2 Election Day           1169088 0.205  
3 Absentee by Mail        295602 0.0519 
4 Provisional              25612 0.00449

3.3 Summary

Before proceeding to anomaly testing, it is important to understand how ballots were cast across administrative voting methods.

After validating that the dataset corresponds to the certified November 5, 2024 general election, we aggregated votes statewide across all reporting units for the presidential contest. The distribution of ballots by voting method is as follows:

Early Voting: 4,208,839 votes (73.9%)
Election Day: 1,169,088 votes (20.5%)
Absentee by Mail: 295,602 votes (5.2%)
Provisional: 25,612 votes (0.4%)

Several structural observations follow from this distribution:

Early Voting dominates the electorate. Nearly three‑quarters of all ballots were cast during the early voting period. This is substantially larger than Election Day voting and represents the primary voting channel in North Carolina.
Election Day voting represents roughly one‑fifth of ballots cast. While traditionally considered the focal point of elections, it accounts for a minority of total votes in 2024.
Mail voting constitutes a small share of total ballots. Absentee by Mail represents approximately 5% of statewide votes. This limits its overall impact on aggregate outcomes, though it remains analytically distinct due to its separate administrative handling.
Provisional ballots are negligible in volume. At less than one‑half of one percent of total votes, provisional ballots do not materially influence statewide totals.

This distribution has important implications for the remainder of the analysis. Because nearly 95% of ballots were cast in person (Early Voting + Election Day + Provisional), the primary structural evaluation of vote share patterns will focus on in‑person voting channels. Mail voting will be examined separately but represents a comparatively small component of the overall electorate.
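
The in-person and mail shares quoted above can be re-derived directly from the four statewide totals. The snippet is a small arithmetic check using the figures reported in this section; the object names are illustrative.

```r
# Statewide presidential totals by voting method, as reported in this section
method_totals <- c(
  "Early Voting"     = 4208839,
  "Election Day"     = 1169088,
  "Absentee by Mail" = 295602,
  "Provisional"      = 25612
)

total <- sum(method_totals)

# In Person = Early Voting + Election Day + Provisional
in_person_share <- sum(
  method_totals[c("Early Voting", "Election Day", "Provisional")]
) / total

# Mail = Absentee by Mail
mail_share <- method_totals[["Absentee by Mail"]] / total

# Shares in percent, rounded to one decimal place
round(100 * c(in_person = in_person_share, mail = mail_share), 1)
```

The in-person share comes out just under 95%, which is the basis for focusing the main structural evaluation on in-person channels.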

4 Mail vs In‑Person Analysis

North Carolina permits no-excuse absentee voting, but the process includes a witness requirement, ID documentation requirements, and ballot receipt deadlines that differ from universal vote-by-mail states.

4.1 Distribution of Mail Voting by County

Which counties use mail the most?

Is mail geographically clustered?

Is it assigned to special reporting units?

4.2 Calculate Mail Share of Total Votes

4.2.1 Overall Mail Vote Share

Only ~5% of statewide ballots were cast by mail.

Show Code

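# Collapse the long-format file to one row per County x Precinct.
# max() keeps the largest single row value within each reporting unit, so
# these figures are a proxy for ballots cast rather than an exact ballot
# count (compare the presidential-contest totals in Section 3.2).
mail_share_statewide <- nc_results_raw %>%
  group_by(County, Precinct) %>%
  summarise(
    mail_votes = max(`Absentee by Mail`, na.rm = TRUE),
    total_votes = max(`Total Votes`, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Sum the per-unit values statewide and compute the mail share
  summarise(
    total_mail = sum(mail_votes),
    total_ballots = sum(total_votes),
    mail_share = total_mail / total_ballots
  )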

mail_share_statewide

# A tibble: 1 × 3
  total_mail total_ballots mail_share
       <dbl>         <dbl>      <dbl>
1     226710       4539211     0.0499

4.2.2 Top Mail Precincts

Show Code

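# Count rows with at least one mail vote per reporting unit.
# n is the number of contest x choice rows, not a number of ballots.
nc_results_raw %>%
  filter(`Absentee by Mail` > 0) %>%
  count(County, Precinct, sort = TRUE)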

# A tibble: 1,493 × 3
   County      Precinct             n
   <chr>       <chr>            <int>
 1 WAKE        ABSEN              201
 2 STANLY      ABS BY-MAIL        118
 3 GUILFORD    ABSENTEE           117
 4 MECKLENBURG ABSENTEE BY MAIL   116
 5 DURHAM      ABS                 95
 6 SURRY       ABSENTEE            90
 7 PITT        ABSENTEE            89
 8 CABARRUS    ABSENTEE            87
 9 JOHNSTON    ABSENTEE            86
10 JOHNSTON    ABS-SUPPLEMENTAL    84
# ℹ 1,483 more rows

4.2.3 Sample of Mail Data

Note that Election Day and Early Voting counts are zero in this sample, since the administrative reporting units shown hold only absentee-by-mail and provisional ballots.

Show Code

nc_results_raw %>%
  filter(str_detect(Precinct, "ABSENT|MAIL|PROV|ONE STOP")) %>%
  slice_sample(n=10) %>%
  glimpse()

Rows: 10
Columns: 15
$ County             <chr> "JOHNSTON", "MOORE", "HALIFAX", "WILKES", "YADKIN",…
$ `Election Date`    <date> 2024-11-05, 2024-11-05, 2024-11-05, 2024-11-05, 20…
$ Precinct           <chr> "ABSENTEE", "ABSENTEE", "ABSENTEE", "ABSENTEE", "PR…
$ `Contest Group ID` <dbl> 9, 1405, 1041, 7, 6, 11, 1010, 1302, 1109, 2
$ `Contest Type`     <chr> "C", "S", "S", "C", "C", "C", "S", "S", "S", "C"
$ `Contest Name`     <chr> "JOHNSTON SOIL AND WATER CONSERVATION DISTRICT SUPE…
$ Choice             <chr> "Joseph Albanese (Write-In)", "For", "Teresa Raquel…
$ `Choice Party`     <chr> NA, NA, "DEM", NA, NA, NA, "REP", "REP", "DEM", "RE…
$ `Vote For`         <dbl> 1, 1, 1, 2, 3, 2, 1, 1, 1, 1
$ `Election Day`     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
$ `Early Voting`     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
$ `Absentee by Mail` <dbl> 0, 2915, 361, 0, 0, 0, 11377, 0, 9120, 292
$ Provisional        <dbl> 0, 0, 0, 0, 1, 0, 0, 8, 0, 0
$ `Total Votes`      <dbl> 0, 2915, 361, 0, 1, 0, 11377, 8, 9120, 292
$ `Real Precinct`    <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"

4.3 Interpreting Statewide Vote Totals in the Long-Format Dataset

The statewide total_votes value of 157,960,234, obtained by summing Total Votes across every row of the file, does not represent the number of ballots cast in North Carolina.

The election results file is structured in long format, meaning that each row represents a specific precinct × contest × candidate combination rather than a single ballot.

As a result:

Each voter contributes votes in multiple contests (e.g., President, Governor, U.S. House, judicial races, local offices).
Each of those contest-level votes is recorded separately.
When summing Total Votes across all rows, ballots are effectively counted multiple times — once per contest.

Therefore:

\[ \text{total_votes} = \sum \text{contest-level votes} \]

not

\[ \text{number of unique ballots cast} \]

This inflation is expected when working with long-format election returns.

For statewide ballot counts, totals must instead be derived from a single contest that appears on every ballot (e.g., President or Governor), or from county-level ballot summaries.
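
A toy example makes the inflation concrete. The data frame `toy_long` below is hypothetical: one precinct, 100 ballots, and two contests, with the simplifying assumption that every ballot records a choice in both contests.

```r
# Hypothetical long-format returns: one precinct, 100 ballots, two contests.
toy_long <- data.frame(
  precinct    = "P01",
  contest     = c("US PRESIDENT", "US PRESIDENT", "GOVERNOR", "GOVERNOR"),
  choice      = c("A", "B", "C", "D"),
  total_votes = c(55, 45, 60, 40)
)

# Summing every row counts each ballot once per contest.
inflated_total <- sum(toy_long$total_votes)

# Restricting to one contest that appears on every ballot recovers the
# ballot count (under the no-blank-contest assumption stated above).
ballots <- sum(toy_long$total_votes[toy_long$contest == "US PRESIDENT"])

# Taking the maximum row instead returns only the largest single tally.
largest_row <- max(toy_long$total_votes)

c(inflated_total = inflated_total, ballots = ballots, largest_row = largest_row)
```

Here the all-row sum is twice the ballot count because there are two contests, while the largest single row falls short of it. That difference is worth keeping in mind whenever a per-precinct maximum is used as a stand-in for ballots cast.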

4.4 County-Level Mail Share Analysis

4.4.1 Objective

To determine whether mail voting in North Carolina exhibits structural irregularities across counties and to assess whether variation in mail share is associated with county size.

Importantly, this test evaluates whether observed variation in mail voting is consistent with normal structural factors (e.g., population size) or whether it suggests anomalous patterns.

4.4.2 Data Preparation

Because the raw election file is in long format (one row per precinct × contest × candidate), we first collapsed the dataset to one row per County × Precinct, keeping the largest row-level value of each column, to avoid contest-level duplication.

For each County × Precinct, we extracted:

Total mail ballots (Absentee by Mail)
Total ballots cast (Total Votes)

We then aggregated to the county level.

This ensures:

\[ \text{Mail Share}_{county} = \frac{\text{Total Mail Ballots}}{\text{Total Ballots}} \]

represents actual ballots rather than contest-level vote inflation.

4.4.3 Mail Share by County statistics

Show Code

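# Step 1: one row per County x Precinct, keeping the largest row-level value
# of each column (a proxy for ballots, not an exact ballot count).
county_mail_summary <- nc_results_raw %>%
  group_by(County, Precinct) %>%
  summarise(
    mail_votes = max(`Absentee by Mail`, na.rm = TRUE),
    total_votes = max(`Total Votes`, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Step 2: aggregate to the county level and compute mail share
  group_by(County) %>%
  summarise(
    total_mail = sum(mail_votes),
    total_ballots = sum(total_votes),
    mail_share = total_mail / total_ballots,
    .groups = "drop"
  ) %>%
  # Sort so head() gives the highest and tail() the lowest mail shares
  arrange(desc(mail_share))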

#head(county_mail_summary, 10)

summary(county_mail_summary)

    County            total_mail      total_ballots      mail_share     
 Length:100         Min.   :   15.0   Min.   :  1494   Min.   :0.01004  
 Class :character   1st Qu.:  361.5   1st Qu.: 10480   1st Qu.:0.02896  
 Mode  :character   Median :  803.5   Median : 22873   Median :0.03494  
                    Mean   : 2267.1   Mean   : 45392   Mean   :0.03799  
                    3rd Qu.: 2240.0   3rd Qu.: 47641   3rd Qu.:0.04553  
                    Max.   :37319.0   Max.   :488942   Max.   :0.08195

County mail shares range from about 1.0% to 8.2%, with a median of about 3.5%.

4.4.5 County-Level Variation

Mail share varies meaningfully across counties.

Top counties (e.g., Wake, Mecklenburg, Buncombe, Orange) exhibit mail shares of roughly 6% to 8%.

Bottom counties (e.g., Tyrrell, Bertie, Halifax, Northampton) exhibit mail shares of roughly 1% to 2%.

This variation suggests non-uniform adoption of mail voting across the state.

4.4.5.1 Highest Mail Share Counties

Show Code

head(county_mail_summary, 10)

# A tibble: 10 × 4
   County      total_mail total_ballots mail_share
   <chr>            <dbl>         <dbl>      <dbl>
 1 BUNCOMBE          9855        120256     0.0820
 2 WAKE             37319        488942     0.0763
 3 ORANGE            5445         71700     0.0759
 4 HAYWOOD           2264         31134     0.0727
 5 CHATHAM           2638         37507     0.0703
 6 MOORE             3270         49965     0.0654
 7 NEW HANOVER       6286        101649     0.0618
 8 FORSYTH           9519        154897     0.0615
 9 MECKLENBURG      28739        469603     0.0612
10 HENDERSON         3377         55850     0.0605

4.4.5.2 Lowest Mail Share Counties

Show Code

tail(county_mail_summary, 10)

# A tibble: 10 × 4
   County      total_mail total_ballots mail_share
   <chr>            <dbl>         <dbl>      <dbl>
 1 ANSON              200          8553     0.0234
 2 RICHMOND           360         16213     0.0222
 3 GATES              105          4911     0.0214
 4 NORTHAMPTON        153          7236     0.0211
 5 ROBESON            784         37676     0.0208
 6 ALEXANDER          373         18179     0.0205
 7 EDGECOMBE          400         19913     0.0201
 8 HALIFAX            371         18683     0.0199
 9 BERTIE             133          7333     0.0181
10 TYRRELL             15          1494     0.0100

4.5 County Mail Share by County Size

Show Code

library(scales)

ggplot(county_mail_summary, 
       aes(x = total_ballots, y = mail_share)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = percent_format(accuracy = 0.1)) +
  labs(
    x = "Total Ballots (County Size Proxy)",
    y = "Mail Share",
    title = "County Mail Share vs County Size"
  ) +
  theme_minimal()

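A linear x-axis does not work well here: county sizes span from about 1,500 to nearly 490,000 ballots, so a few very large counties stretch the axis and compress most points toward the left edge. A log scale spreads the counties out and makes the trend easier to see.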

4.6 County Mail Share by County Size (Log Scale)

Show Code

#library(scales)

ggplot(county_mail_summary, 
       aes(x = total_ballots, y = mail_share)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_log10(labels = comma) +
  scale_y_continuous(labels = percent_format(accuracy = 0.1)) +
  labs(
    x = "Total Ballots (Log Scale)",
    y = "Mail Share",
    title = "County Mail Share vs County Size (Log Scale)"
  ) +
  theme_minimal()

We find no indications of vote manipulation from this test.

The distribution of mail voting appears structurally consistent with demographic and geographic variation across counties.

4.6.1 Log-Linear Regression Model

To quantify the relationship between county size and mail share, we estimated the following model:

\[ \text{Mail Share}_i = \beta_0 + \beta_1 \log(\text{Total Ballots}_i) + \epsilon_i \]

Show Code

model <- lm(mail_share ~ log(total_ballots),
            data = county_mail_summary)

summary(model)


Call:
lm(formula = mail_share ~ log(total_ballots), data = county_mail_summary)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.020571 -0.007729 -0.002474  0.005994  0.032790 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        -0.038824   0.010565  -3.675 0.000388 ***
log(total_ballots)  0.007612   0.001041   7.313 7.19e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01135 on 98 degrees of freedom
Multiple R-squared:  0.353, Adjusted R-squared:  0.3464 
F-statistic: 53.48 on 1 and 98 DF,  p-value: 7.188e-11

A one-unit increase in log(total ballots) is associated with a 0.76 percentage point increase in mail share.

Results:

\[ \beta_1 = 0.00761 \quad (p < 0.001), \qquad R^2 = 0.353 \]

Interpretation:

A doubling of county size is associated with an increase in expected mail share of:

\[ \beta_1 \cdot \log(2) \approx 0.00761 \times 0.693 \approx 0.0053 \]

or approximately 0.53 percentage points.

County size alone explains roughly 35% of cross-county variation in mail voting.

The relationship is smooth, monotonic, and statistically robust.
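
Because mail share is modeled as linear in log(total ballots), multiplying county size by any factor k shifts the expected mail share by the slope times log(k), regardless of the starting size. The sketch below applies this to the reported slope; the helper `effect_of_scaling` is a hypothetical name introduced only for illustration.

```r
# Slope reported in the regression output above
beta_1 <- 0.007612

# Expected change in mail share when county size is multiplied by k.
# log() is the natural logarithm, matching the model specification.
effect_of_scaling <- function(k, beta = beta_1) beta * log(k)

effect_doubling <- effect_of_scaling(2)
effect_tenfold  <- effect_of_scaling(10)

# Effects in percentage points
round(100 * c(doubling = effect_doubling, tenfold = effect_tenfold), 2)
```

A tenfold difference in county size therefore corresponds to an expected difference of under two percentage points of mail share.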

4.7 Correlation Tests

County size was proxied using total ballots cast.

We tested the association between county size and mail share using:

Pearson correlation
Pearson correlation with log-transformed county size
Spearman rank correlation

Results:

Pearson (linear scale): r = 0.538
Pearson (log scale): r = 0.594
Spearman: ρ = 0.552

These results indicate a moderate positive, monotonic relationship between county size and mail share.

When plotted on a log scale, the upward trend persists across the full distribution, indicating the relationship is not driven solely by a small number of very large counties.
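
One detail explains why only a single Spearman coefficient is reported. Spearman's ρ depends only on ranks, so a log transform of county size leaves it unchanged, whereas Pearson's r can differ between the linear and log scales. The sketch below illustrates this on simulated data; the variables `size` and `share` are synthetic and are not the North Carolina counties.

```r
set.seed(1)

# Simulated county sizes (log-uniform) and mail shares; illustrative only.
size  <- round(exp(runif(100, log(1500), log(500000))))
share <- 0.01 + 0.005 * log(size) + rnorm(100, sd = 0.01)

# Pearson correlation depends on the scale of the x variable.
pearson_linear <- cor(size, share)
pearson_log    <- cor(log(size), share)

# Spearman correlation uses ranks, which log() does not change.
spearman_linear <- cor(size, share, method = "spearman")
spearman_log    <- cor(log(size), share, method = "spearman")

c(pearson_linear = pearson_linear, pearson_log = pearson_log,
  spearman_linear = spearman_linear, spearman_log = spearman_log)
```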

4.7.1 Pearson Correlation on Linear Scale

Show Code

cor(county_mail_summary$total_ballots,
    county_mail_summary$mail_share,
    use = "complete.obs")

[1] 0.5375289

4.7.2 Pearson Correlation on Log Scale

Show Code

cor(log(county_mail_summary$total_ballots),
    county_mail_summary$mail_share,
    use = "complete.obs")

[1] 0.5941734

4.7.3 Spearman Correlation on Linear Scale

Show Code

cor(county_mail_summary$total_ballots,
    county_mail_summary$mail_share,
    method = "spearman",
    use = "complete.obs")

[1] 0.5515872

4.9 Residual Analysis

To assess whether any counties deviate substantially from the expected mail share given their size, we examined residuals from the log-linear regression:

\[\text{Mail Share}_i = \beta_0 + \beta_1 \log(\text{Total Ballots}_i) + \epsilon_i\]

4.10 Visual Diagnostics of County-Level Residuals

4.10.1 Residuals vs Log County Size (Primary Diagnostic)

The residual plot shows a tight horizontal band centered at zero across the full range of county sizes. There is:

No curvature suggesting model misspecification
No funnel shape indicating heteroskedasticity
No clustering of high-residual counties at particular size levels

The largest positive residual is approximately 0.033 (3.3 percentage points), and the largest negative residual is approximately -0.021. Both fall within expected statistical bounds for a sample of this size.

Importantly, high-residual counties are isolated observations rather than members of a structured pattern.
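
To put the extreme residuals in context, they can be expressed in units of the residual standard error from the regression summary. This is a rough check using the reported figures. It ignores leverage, so it only approximates the standardized residuals that `rstandard()` would return.

```r
# Values copied from the regression summary above
resid_min <- -0.020571
resid_max <-  0.032790
resid_se  <-  0.01135

# Extreme residuals in units of the residual standard error
z_extremes <- c(min = resid_min, max = resid_max) / resid_se

round(z_extremes, 2)
```

The largest positive residual sits a little under three residual standard errors from zero, and the largest negative residual a little under two.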

Show Code

#library(ggplot2)
# ------------------------------------------------------------
# Add model residuals and fitted values to county_mail_summary
# so they are available for plotting during document render.
# ------------------------------------------------------------

county_mail_summary <- county_mail_summary %>%
  mutate(
    fitted = fitted(model),
    residuals = resid(model),
    std_resid = rstandard(model)
  )

ggplot(county_mail_summary,
       aes(x = log(total_ballots), y = residuals)) +
  geom_point(size = 2, alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_hline(yintercept = c(-0.03, 0.03),
             linetype = "dotted",
             color = "red") +
  labs(
    x = "Log(Total Ballots)",
    y = "Residual (Observed − Predicted Mail Share)",
    title = "Residuals from County Mail Share Regression"
  ) +
  theme_minimal()

The residual plot shows the difference between each county’s observed mail vote share and the share predicted by the regression on county size. If county size were systematically associated with unusual mail behavior, we would expect to see a pattern in the residuals—such as a visible upward or downward trend, widening dispersion at larger sizes, clustering on one side of zero, or structural breaks in the upper tail. Instead, the points appear randomly scattered around zero across the full range of county sizes, with no discernible slope or curvature. The spread remains relatively constant, and nearly all counties fall within a narrow band around the fitted line. This “static”-like pattern is consistent with ordinary random variation rather than systematic size-related distortion, providing no visual evidence of anomalous mail share behavior linked to county size.

4.10.2 Distribution of Residuals

The histogram of residuals is:

Centered near zero
Broadly symmetric, with mild right-tail extension
Largely concentrated within a narrow range
Free of extreme outliers

The residual standard deviation is approximately 0.011 (1.1 percentage points), indicating that most counties deviate from predicted mail share by only small margins.

While a small number of counties exhibit moderately positive residuals, the overall distribution is tightly clustered and consistent with ordinary cross-county variation.

Show Code

ggplot(county_mail_summary,
       aes(x = residuals)) +
  geom_histogram(bins = 25,
                 fill = "steelblue",
                 color = "white",
                 alpha = 0.8) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(
    x = "Residual",
    y = "Count",
    title = "Distribution of County-Level Residuals"
  ) +
  theme_minimal()

4.10.3 Q–Q Plot of County Residuals

The Q–Q plot indicates that residuals closely follow the theoretical normal line across the central portion of the distribution.

Modest deviations appear in the tails, particularly in the upper tail, where a small number of counties exhibit slightly larger-than-expected positive residuals. These departures are gradual rather than abrupt and are consistent with mild skew in a finite sample.

Overall, no extreme departures from normality or structural abnormalities are observed.

Show Code

ggplot(county_mail_summary,
       aes(sample = residuals)) +
  stat_qq() +
  stat_qq_line(color = "red") +
  labs(
    title = "Q–Q Plot of County Residuals"
  ) +
  theme_minimal()

4.11 Overall Diagnostic Summary

Across all diagnostic visualizations:

Residuals are centered near zero
Deviations are modest in magnitude
No county's residual exceeds three residual standard errors (the largest, about 0.033, is roughly 2.9)
No structural patterns emerge by county size

These diagnostics indicate that the log-linear specification provides an appropriate fit to the data: county size accounts for roughly 35% of county-level variation in mail share, and the remaining variation shows no size-related structure.

There is no visual or statistical evidence of anomalous county-level behavior inconsistent with ordinary demographic and structural variation.

5 Precinct-Level Analysis of In-Person Voting Methods

Having established that county-level mail voting patterns exhibit smooth and statistically normal scaling behavior, we now turn to precinct-level in-person voting.

This section evaluates whether the composition and distribution of in-person voting methods vary systematically with precinct size in ways that could indicate structural irregularities. The focus is specifically on early in-person voting and Election Day in-person voting. Provisional ballots are excluded from this analysis due to their very small volume, which renders them statistically negligible and unsuitable as a meaningful vector for large-scale outcome manipulation.

The objective is not to test for correlation alone. Differences across precinct size are expected due to demographic clustering, geographic variation, and administrative structure. Instead, the purpose of this analysis is to identify potential anomalies such as:

Discontinuities or sharp structural breaks
Nonlinear scaling inconsistent with ordinary demographic patterns
Extreme outliers concentrated in particular size ranges
Asymmetric residual patterns suggestive of artificial concentration

If voting method shares vary smoothly and predictably with precinct size, and if deviations remain within ordinary statistical bounds, then precinct-level in-person voting patterns are consistent with normal structural and demographic variation.

The guiding question is therefore:

Do larger precincts exhibit anomalous in-person voting patterns inconsistent with predictable scaling behavior?

The analyses that follow evaluate method composition as a function of precinct size, examine residual structure, and assess whether observed variation falls within expected statistical limits.

5.1 Calculate Precinct Sizes

Show Code

# Create reporting-unit-level total votes for US President.
# Real Precinct is kept as a grouping column; units are not filtered on it here.
precinct_sizes <- nc_results_raw %>%
  filter(`Contest Name` == "US PRESIDENT") %>%
  group_by(County, Precinct, `Real Precinct`) %>%
  summarise(
    precinct_total = sum(`Total Votes`, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(precinct_total > 0)

# set.seed(2024)

precinct_sizes %>%
  slice_sample(n = 10) %>%
  arrange(County, Precinct)

# A tibble: 10 × 4
   County    Precinct `Real Precinct` precinct_total
   <chr>     <chr>    <chr>                    <dbl>
 1 ALAMANCE  12E      Y                          960
 2 BUNCOMBE  04.1     Y                         1209
 3 CARTERET  MHD4     Y                         2563
 4 DURHAM    28       Y                          651
 5 GASTON    41       Y                         2996
 6 GRAHAM    EAST     Y                         1931
 7 GRANVILLE WOEL     Y                         1128
 8 ONSLOW    SW19     Y                         4554
 9 VANCE     NH       Y                         2549
10 WAYNE     22       Y                         1209

5.2 Organize In-Person Vote data

precinct_inperson <- nc_results_raw %>%
  filter(`Contest Name` == "US PRESIDENT") %>%
  filter(`Real Precinct` == "Y") %>%
  group_by(County, Precinct) %>%
  summarise(
    election_day = sum(`Election Day`, na.rm = TRUE),
    early_voting = sum(`Early Voting`, na.rm = TRUE),
    total_votes = sum(`Total Votes`, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    inperson_total = election_day + early_voting
  ) %>%
  filter(inperson_total > 0)

glimpse(precinct_inperson)

Rows: 2,658
Columns: 6
$ County         <chr> "ALAMANCE", "ALAMANCE", "ALAMANCE", "ALAMANCE", "ALAMAN…
$ Precinct       <chr> "01", "02", "035", "03C", "03N", "03N2", "03SE", "03SM"…
$ election_day   <dbl> 1054, 870, 574, 396, 400, 256, 505, 501, 466, 691, 691,…
$ early_voting   <dbl> 1814, 2201, 2920, 1491, 2057, 358, 2255, 2366, 1909, 25…
$ total_votes    <dbl> 3037, 3228, 3665, 2049, 2624, 655, 2938, 3091, 2514, 34…
$ inperson_total <dbl> 2868, 3071, 3494, 1887, 2457, 614, 2760, 2867, 2375, 32…

5.3 Create Composition Shares

5.3.1 Definition of Precinct Size

For this analysis, precinct size is defined exclusively as the total number of in-person ballots cast within the precinct:

Election Day ballots
Early in-person ballots

Provisional ballots are excluded due to their negligible volume.
Mail ballots are excluded because this section focuses specifically on the internal composition of in-person voting.

Formally:

\[ \text{Precinct Size}_i = \text{Election Day}_i + \text{Early In-Person}_i \]

For modeling and visualization purposes, precinct size is log-transformed (natural log, as computed by R's log()) to account for right-skew in the distribution of in-person ballot counts:

\[ \log(\text{Precinct Size}_i) = \log(\text{Election Day}_i + \text{Early In-Person}_i) \]

This definition ensures that precinct size reflects the population relevant to the composition test being conducted. The objective is to evaluate whether the proportion of early versus Election Day voting changes systematically as in-person precinct size increases.

By conditioning size on in-person participation only, the analysis avoids contamination from mail-vote scaling effects observed at the county level.
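As a quick worked instance of this definition, the sketch below recomputes size, shares, and log size for a single precinct. The two input values are copied from the first ALAMANCE row of the glimpse() output in Section 5.2; the variable names are illustrative only.

```r
# Values copied from the first ALAMANCE row in the Section 5.2 glimpse() output
election_day <- 1054
early_voting <- 1814

# Precinct size as defined above: in-person ballots only
inperson_total <- election_day + early_voting

# Composition shares (they sum to 1 by construction)
early_share        <- early_voting / inperson_total
election_day_share <- election_day / inperson_total

# log() in R is the natural logarithm
log_precinct_size <- log(inperson_total)

print(c(inperson_total     = inperson_total,
        early_share        = round(early_share, 4),
        election_day_share = round(election_day_share, 4),
        log_precinct_size  = round(log_precinct_size, 4)))
```

The results can be compared against the corresponding row of the share table produced in Section 5.3.2.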

5.3.2 Calculate Voting Method Shares

precinct_inperson <- precinct_inperson %>%
  mutate(
    early_share = early_voting / inperson_total,
    election_day_share = election_day / inperson_total,
    log_precinct_size = log(inperson_total)
  )

glimpse(precinct_inperson)

Rows: 2,658
Columns: 9
$ County             <chr> "ALAMANCE", "ALAMANCE", "ALAMANCE", "ALAMANCE", "AL…
$ Precinct           <chr> "01", "02", "035", "03C", "03N", "03N2", "03SE", "0…
$ election_day       <dbl> 1054, 870, 574, 396, 400, 256, 505, 501, 466, 691, …
$ early_voting       <dbl> 1814, 2201, 2920, 1491, 2057, 358, 2255, 2366, 1909…
$ total_votes        <dbl> 3037, 3228, 3665, 2049, 2624, 655, 2938, 3091, 2514…
$ inperson_total     <dbl> 2868, 3071, 3494, 1887, 2457, 614, 2760, 2867, 2375…
$ early_share        <dbl> 0.6324965, 0.7167047, 0.8357184, 0.7901431, 0.83719…
$ election_day_share <dbl> 0.3675035, 0.2832953, 0.1642816, 0.2098569, 0.16280…
$ log_precinct_size  <dbl> 7.961370, 8.029759, 8.158802, 7.542744, 7.806696, 6…

5.3.3 Early Voting Share Statistics

summary(precinct_inperson$early_share)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.7019  0.4735  0.7922  0.9413

The summary statistics reveal an important structural feature of the data: because the minimum and first quartile of early_share are both 0, at least 25% of precincts report zero early in-person votes. Because precinct size is defined as the sum of Election Day and Early In-Person ballots, a zero early share necessarily implies that 100% of in-person voting in those precincts occurred on Election Day. These are not small rounding artifacts; they are true boundary observations.

This matters analytically because the outcome variable (early_share) is bounded between 0 and 1 and contains a mass of observations at exactly 0. When we later estimate a linear model and overlay a fitted line, these zero‑share precincts will appear as a horizontal band at the bottom of the plot. A standard linear regression does not account for this boundary structure and may extrapolate outside the feasible range, particularly for smaller precinct sizes. Therefore, the presence of zero‑early precincts is a structural characteristic of the data that must be kept in mind when interpreting slope estimates and fitted values in subsequent modeling.

5.3.4 Election Day Share Statistics

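Election Day share varies substantially across precincts. The median precinct reports approximately 30% of in-person ballots cast on Election Day, while the mean (52.6%) is considerably higher, reflecting a strongly right-skewed distribution. Because election_day_share equals 1 minus early_share, these statistics are the mirror image of those reported for early share in the previous subsection.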

Notably, the third quartile and maximum are both equal to 1.0, indicating that at least 25% of precincts are Election Day–only locations. This is consistent with the structure of election administration, where temporary polling places operate exclusively on Election Day.

The resulting distribution is therefore bimodal in character, with a mass of mixed-mode precincts and a distinct cluster of Election Day–only sites.

summary(precinct_inperson$election_day_share)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.05874 0.20781 0.29810 0.52648 1.00000 1.00000

5.3.5 Early Voting Share Linear Model

model_early <- lm(early_share ~ log_precinct_size, data = precinct_inperson)

summary(model_early)


Call:
lm(formula = early_share ~ log_precinct_size, data = precinct_inperson)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.65783 -0.15757  0.01361  0.16443  1.19121 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -1.613205   0.032717  -49.31   <2e-16 ***
log_precinct_size  0.304408   0.004727   64.40   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2332 on 2656 degrees of freedom
Multiple R-squared:  0.6096,    Adjusted R-squared:  0.6095 
F-statistic:  4147 on 1 and 2656 DF,  p-value: < 2.2e-16

The regression indicates a strong positive relationship between precinct size and early in‑person voting share. The coefficient on log precinct size is highly statistically significant and the model explains approximately 61% of the variation in early share across precincts. Substantively, this suggests that larger precincts tend to rely more heavily on early in‑person voting relative to Election Day voting.

However, this linear specification is influenced by the structural mass of precincts with zero early voting. Because early_share is bounded between 0 and 1 and includes many observations at exactly 0, the model is effectively fitting a continuous line through a distribution with a boundary spike. This contributes to the large intercept magnitude and produces fitted values that can fall outside the feasible [0,1] range for small precincts. Thus, while the slope clearly captures a strong size gradient, the model should be interpreted as descriptive of a broad trend rather than a fully specified probabilistic model of early voting behavior.
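To make the out-of-range problem concrete, the sketch below solves the fitted line for the precinct sizes at which the predicted early share crosses 0 and 1. The coefficients are copied from the summary(model_early) output above; the helper name size_at is hypothetical. Under these coefficients the lower crossing falls at roughly 200 in-person ballots and the upper crossing at a little over 5,000, so predictions outside [0,1] are possible at both ends of the size range.

```r
# Coefficients copied from the summary(model_early) output
intercept <- -1.613205
slope     <-  0.304408

# Solve intercept + slope * log(size) = target for size
# (size_at is a hypothetical helper used only for this illustration)
size_at <- function(target) exp((target - intercept) / slope)

lower_bound_size <- size_at(0)  # fitted early share reaches 0
upper_bound_size <- size_at(1)  # fitted early share reaches 1

print(round(c(lower = lower_bound_size, upper = upper_bound_size)))
```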

5.3.6 Plot Early Voting Share by Precinct Size

#library(ggplot2)

ggplot(precinct_inperson, aes(x = log_precinct_size, y = early_share)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Early In-Person Share vs Log Precinct Size",
    x = "Log In-Person Precinct Size",
    y = "Early Voting Share"
  ) +
  theme_minimal()

The scatterplot confirms the strong positive size gradient identified in the regression results. Larger precincts exhibit substantially higher early in‑person voting shares, while smaller precincts show much greater dispersion, including a visible horizontal band at zero corresponding to precincts with no early voting at all.

The fitted line captures the overall upward trend but also illustrates the limitations of a simple linear specification, particularly at the lower end of the size distribution where predictions extend beyond the feasible range. Importantly, the pattern is smooth and continuous across precinct sizes, with no abrupt structural breaks or discontinuities, indicating a gradual scaling relationship rather than any irregular or anomalous clustering.

5.3.7 Mix of Precinct Types by Vote Method

precinct_inperson <- precinct_inperson %>%
  mutate(
    precinct_type = case_when(
      early_share == 0 ~ "Election Day Only",
      early_share == 1 ~ "Early Only",
      TRUE ~ "Mixed"
    )
  )

table(precinct_inperson$precinct_type)


Election Day Only             Mixed 
              995              1663

Classifying precincts by voting method reveals a striking structural feature: there are no precincts with exclusively early in‑person voting. While 995 precincts report zero early votes (i.e., entirely Election Day voting), the remaining 1,663 precincts are mixed-method, containing both early and Election Day ballots. No precincts fall into the “Early Only” category.

This asymmetry is important for interpretation. It indicates that early voting is layered on top of an Election Day infrastructure rather than replacing it. In practical terms, early voting appears to expand participation in larger or higher-capacity precincts, while smaller precincts are more likely to rely exclusively on Election Day voting. The absence of “Early Only” precincts further reinforces that early in‑person voting functions as a supplement to, rather than a substitute for, traditional Election Day voting.

5.4 Analyze Mixed Precincts

Restricting the analysis to mixed precincts removes the structural zero observations and allows us to examine how early voting scales where both methods are actively used. The resulting relationship remains strongly positive: larger precincts allocate a greater share of in‑person voting to early voting, even when we exclude Election Day–only precincts from the sample.

Importantly, the pattern is smooth and continuous across the full range of precinct sizes. There is no visible breakpoint, clustering, or sudden shift in slope as precincts grow larger. The fitted line lies comfortably within the feasible 0–1 range and tracks the center of the data cloud closely. This indicates that the earlier size gradient was not merely an artifact of zero‑early precincts; rather, it reflects a consistent scaling relationship in how in‑person voting is distributed across methods.

mixed_precincts <- precinct_inperson %>%
  filter(precinct_type == "Mixed")

model_mixed <- lm(early_share ~ log_precinct_size, data = mixed_precincts)
mixed_precincts$residuals <- resid(model_mixed)

ggplot(mixed_precincts, aes(x = log_precinct_size, y = early_share)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Early Share vs Log Precinct Size (Mixed Precincts Only)",
    x = "Log In-Person Precinct Size",
    y = "Early Voting Share"
  ) +
  theme_minimal()

With Election Day–only precincts removed, the scatter now reflects variation exclusively among precincts that actively use both voting methods. The horizontal band at zero disappears, and the regression line aligns much more closely with the central mass of the data. This produces a cleaner and more interpretable linear fit, as the model is no longer being mechanically influenced by boundary observations. The positive size gradient remains strong and visually coherent, indicating that larger mixed precincts consistently exhibit higher early voting shares. Importantly, the relationship is smooth and continuous across the full size distribution, with no discontinuities, clustering, or upper‑tail deviations that would suggest anomalous behavior or size‑based vote manipulation.

Among precincts that contain both early and Election Day voting, the proportion of early voting increases smoothly and predictably with in-person precinct size.

5.4.1 Early Voting Share Model for Mixed Precincts

summary(model_mixed)


Call:
lm(formula = early_share ~ log_precinct_size, data = mixed_precincts)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.47803 -0.03809  0.00556  0.04592  0.23145 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       0.154891   0.019145    8.09 1.14e-15 ***
log_precinct_size 0.081402   0.002578   31.57  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.07104 on 1661 degrees of freedom
Multiple R-squared:  0.375, Adjusted R-squared:  0.3747 
F-statistic: 996.8 on 1 and 1661 DF,  p-value: < 2.2e-16

The regression results closely match what was visually apparent in the scatterplot. After restricting the sample to mixed precincts, the estimated relationship between precinct size and early voting share remains strongly positive and highly statistically significant. The coefficient on log precinct size indicates that early voting share increases steadily as precinct size grows, consistent with the upward slope observed in the chart. The intercept is now positive and the fitted values remain well within the feasible 0–1 range, reflecting the removal of boundary (zero‑early) precincts that previously distorted the linear fit.

The residual distribution largely confirms the visual impression of a well-behaved model. Residuals are tightly centered around zero (median 0.006), with a narrow and nearly symmetric interquartile range (-0.038 to 0.046). The extremes are less symmetric: the largest negative residual (-0.478) is roughly twice the magnitude of the largest positive one (0.231), so a small number of precincts fall well below the fitted line. The residual standard error drops from 0.233 in the full sample to 0.071 here, indicating a substantially cleaner fit. Overall, the statistical output aligns with the graphical evidence: among precincts that use both voting modes, early voting share scales smoothly and predictably with precinct size, with no irregular clustering or unexplained deviations suggestive of size-based anomalies.

5.4.2 Exploring Residuals

To further evaluate the stability of the size relationship, we examine the residuals from the mixed‑precinct regression. Residuals represent the difference between each precinct’s observed early voting share and the share predicted by the fitted model. If precinct size were associated with irregular behavior or structural distortions, we would expect to see systematic patterns in these residuals — such as curvature, widening dispersion at larger sizes, clustering in the upper tail, or visible discontinuities. Plotting residuals against log precinct size provides a direct diagnostic check for such anomalies and allows us to assess whether deviations from the fitted trend are random or structured.

ggplot(mixed_precincts, aes(x = log_precinct_size, y = residuals)) +
  geom_point(alpha = 0.3) +
  geom_hline(yintercept = 0, color = "red") +
  labs(
    title = "Residuals vs Log Precinct Size (Mixed Precincts)",
    x = "Log In-Person Precinct Size",
    y = "Residuals"
  ) +
  theme_minimal()

The residual plot exhibits a dense, roughly elliptical cloud centered on zero, with no discernible upward or downward drift as precinct size increases. The majority of precincts cluster tightly around the fitted line, and the vertical spread remains broadly consistent across the size distribution. While a small number of outliers are present — as expected in any dataset of this size — they do not form patterns, bands, or structural breaks tied to precinct size.

Importantly, there is no visible “fanning out” at larger precinct sizes and no upper‑tail clustering that would indicate systematic deviations in high‑volume precincts. The absence of trend, curvature, or discontinuity in the residuals supports the conclusion that the linear size relationship is stable and that deviations from the model are random rather than size‑dependent. In short, the residual diagnostics provide no evidence of anomalous behavior associated with precinct size.
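The visual reading of the residual plot can be complemented with a simple numeric check: split precincts into size deciles and compare the mean and standard deviation of residuals across bins. The sketch below is a minimal illustration on simulated data, not the report's data; the coefficients and noise level are arbitrary values loosely patterned on the mixed-precinct fit, and all object names are hypothetical. A drifting mean_resid column would suggest curvature, and a rising sd_resid column would suggest fanning out.

```r
set.seed(1)  # arbitrary seed for the simulated data

# Simulated stand-in for mixed precincts (NOT the real data)
n <- 1500
log_size    <- runif(n, min = 5, max = 9)
early_share <- 0.15 + 0.08 * log_size + rnorm(n, sd = 0.07)
sim <- data.frame(log_size, early_share)

fit <- lm(early_share ~ log_size, data = sim)
sim$resid <- resid(fit)

# Assign each precinct to one of ten equal-count size bins (deciles)
breaks <- quantile(sim$log_size, probs = seq(0, 1, by = 0.1))
sim$size_bin <- cut(sim$log_size, breaks = breaks,
                    include.lowest = TRUE, labels = FALSE)

# Mean residual (trend check) and residual SD (dispersion check) per bin
bin_summary <- data.frame(
  size_bin   = 1:10,
  mean_resid = as.numeric(tapply(sim$resid, sim$size_bin, mean)),
  sd_resid   = as.numeric(tapply(sim$resid, sim$size_bin, sd))
)
print(round(bin_summary, 4))
```

Applied to the real data, the same summary could be computed from mixed_precincts using its residuals and log_precinct_size columns.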

5.5 Party Vote Share by Precinct Size

Before turning to party vote share, it is important to clarify what we mean by in‑person voting in this section. Throughout the precinct analysis, in‑person refers to the combined total of Early In‑Person voting and Election Day voting. In other words:

\[ \text{In‑Person Total}_i = \text{Early In‑Person}_i + \text{Election Day}_i \]

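Thus, when we compute in-person party share, we are measuring each party's share of the two-party (Democratic plus Republican) ballots cast physically at polling locations, regardless of whether those ballots were cast during the early voting period or on Election Day itself. Because the code filters to DEM and REP choices before computing precinct totals, votes for other candidates do not enter the denominator, and the two shares sum to one within each precinct. Mail ballots and provisional ballots are excluded from this definition.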

This distinction is important because it isolates variation within the in‑person voting channel, allowing us to examine whether party performance scales with precinct size independently of mail voting dynamics.
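The share definition can be illustrated on a toy table. The sketch below uses made-up numbers in a long layout similar to the results file (one row per precinct and party) and base R rather than dplyr; column names are simplified and hypothetical. Only DEM and REP rows are present, so the denominator is the two-party in-person total.

```r
# Toy rows, one per precinct x party (made-up numbers, not real data)
toy <- data.frame(
  precinct     = c("A", "A", "B", "B"),
  party        = c("DEM", "REP", "DEM", "REP"),
  election_day = c(120, 200, 300, 150),
  early_voting = c(380, 300, 700, 350)
)

# In-person votes per party = Election Day + Early Voting
toy$inperson_votes <- toy$election_day + toy$early_voting

# ave() returns the per-precinct sum aligned with each row
toy$precinct_inperson_total <- ave(toy$inperson_votes, toy$precinct, FUN = sum)

# Party share of the precinct's in-person total
toy$vote_share <- toy$inperson_votes / toy$precinct_inperson_total
print(toy)
```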

5.5.1 In‑Person Party Vote Share

# Load required libraries
#library(dplyr)
#library(ggplot2)

# ------------------------------------------------------------------
# STEP 1: Build in-person vote totals and vote share by precinct
# ------------------------------------------------------------------

inperson_share <- nc_results_raw %>%
  
  # Keep only presidential contest and real precincts
  filter(`Contest Name` == "US PRESIDENT",
         `Real Precinct` == "Y",
         `Choice Party` %in% c("DEM", "REP")) %>%
  
  # Create total in-person votes per candidate
  # (Election Day + Early Voting)
  mutate(
    inperson_votes = `Election Day` + `Early Voting`
  ) %>%
  
  # Aggregate to precinct × party level
  group_by(County, Precinct, `Choice Party`) %>%
  summarise(
    party_inperson_votes = sum(inperson_votes, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  
  # Within each precinct, compute total in-person turnout
  group_by(County, Precinct) %>%
  mutate(
    precinct_inperson_total = sum(party_inperson_votes),
    
    # Compute party vote share within in-person voting
    vote_share = party_inperson_votes / precinct_inperson_total
  ) %>%
  
  ungroup()

inperson_share %>%
  slice_sample(n=10) %>%
  arrange(desc(vote_share)) %>%
  glimpse()

Rows: 10
Columns: 6
$ County                  <chr> "DURHAM", "GUILFORD", "STANLY", "ORANGE", "NEW…
$ Precinct                <chr> "17", "SWASH", "0028", "CA", "H13", "ST6", "20…
$ `Choice Party`          <chr> "DEM", "REP", "REP", "REP", "REP", "DEM", "REP…
$ party_inperson_votes    <dbl> 324, 175, 183, 177, 577, 1445, 921, 107, 670, …
$ precinct_inperson_total <dbl> 367, 214, 267, 270, 993, 3067, 3282, 420, 3082…
$ vote_share              <dbl> 0.8828338, 0.8177570, 0.6853933, 0.6555556, 0.…

5.5.2 Plot Vote Share vs Precinct Size

ggplot(inperson_share,
       aes(x = precinct_inperson_total,
           y = vote_share,
           color = `Choice Party`)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", se = FALSE, linewidth = 1) +
  scale_x_log10(
    breaks = c(10, 100, 1000, 10000),
    labels = scales::comma_format()
  ) +
  scale_color_manual(values = c(
    "DEM" = "#2C7BB6",
    "REP" = "#D7191C"
  )) +
  labs(
    title = "In-Person Vote Share by Precinct Size (Log Scale)",
    x = "In-Person Total Votes (Log Scale)",
    y = "Vote Share",
    color = "Party"
  ) +
  theme_minimal()

Minor local curvature appears in the loess smoother around mid-sized precincts (~1,000–3,000 votes), but inspection of the underlying data shows no corresponding structural shift or density discontinuity. The variation appears consistent with smoothing sensitivity rather than systematic scaling distortion.

The scatterplot shows in-person party vote share as a function of precinct size (on a log scale), with separate smooth trends for Democratic and Republican candidates. Because the shares are computed over the two-party in-person total, the two curves are mirror images of one another around 50%. Across the core of the size distribution, where the overwhelming majority of precincts lie, both parties' vote shares remain relatively stable as precinct size increases. The smoothed lines are largely flat through the dense central region, indicating that party performance in in-person voting does not systematically increase or decrease with precinct size.

At the extreme lower end of the size distribution, the curves bend sharply; however, these are driven by a small number of very small precincts and reflect instability typical of low‑denominator environments. In larger precincts, where vote totals are substantial, the relationship appears smooth and continuous with no abrupt breaks, discontinuities, or divergence between parties. Overall, the chart suggests that in‑person party vote share scales consistently across precinct sizes, providing no visual evidence of size‑dependent irregularities or structural anomalies.
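The low-denominator instability can be quantified with the binomial standard error of a share, sqrt(p(1 - p) / n). The sketch below evaluates it at a few precinct sizes under an idealized assumption of independent ballots with an underlying share of 0.5; both the assumption and the function name share_se are illustrative, not part of the report's method.

```r
# Binomial standard error of an observed share p based on n ballots
# (idealized: treats ballots as independent draws)
share_se <- function(p, n) sqrt(p * (1 - p) / n)

sizes <- c(10, 100, 1000, 10000)
se_table <- data.frame(
  ballots       = sizes,
  se            = share_se(0.5, sizes),
  # approximate 95% half-width under a normal approximation
  half_width_95 = 1.96 * share_se(0.5, sizes)
)
print(se_table)
```

The standard error shrinks with the square root of the ballot count, which is why the smallest precincts scatter widely while large precincts cluster tightly.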

5.6 Early Vote Share by Precinct Size

5.6.1 Party‑Specific Early Vote Share (Mixed Precincts Only)

# ------------------------------------------------------------------
# STEP 1: Construct party-level early vote share by precinct
# Mixed precincts only
# ------------------------------------------------------------------

#library(dplyr)
#library(ggplot2)

early_party_share <- nc_results_raw %>%
  
  # Keep presidential contest and real precincts
  filter(`Contest Name` == "US PRESIDENT",
         `Real Precinct` == "Y",
         `Choice Party` %in% c("DEM", "REP")) %>%
  
  # Create in-person total and early vote variables
  mutate(
    inperson_total = `Election Day` + `Early Voting`,
    early_votes = `Early Voting`
  ) %>%
  
  # Aggregate to precinct × party level
  group_by(County, Precinct, `Choice Party`) %>%
  summarise(
    party_early_votes = sum(early_votes, na.rm = TRUE),
    party_inperson_votes = sum(inperson_total, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  
  # Keep only mixed precincts
  left_join(precinct_inperson %>%
              dplyr::select(County, Precinct, precinct_type, log_precinct_size),
            by = c("County", "Precinct")) %>%
  
  filter(precinct_type == "Mixed") %>%
  
  # Compute early vote share within party
  mutate(
    early_share_party = party_early_votes / party_inperson_votes
  )

# ------------------------------------------------------------------
# STEP 2: Plot early share by party vs log precinct size
# ------------------------------------------------------------------

ggplot(early_party_share,
       aes(x = log_precinct_size,
           y = early_share_party,
           color = `Choice Party`)) +
  
  # Scatter points
  geom_point(alpha = 0.3) +
  
  # Linear fit by party
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  
  # Party colors
  scale_color_manual(values = c(
    "DEM" = "#2C7BB6",
    "REP" = "#D7191C"
  )) +
  
  # Labels
  labs(
    title = "Party Early Vote Share vs Log Precinct Size (Mixed Precincts)",
    x = "Log In-Person Precinct Size",
    y = "Early Voting Share (Within Party)",
    color = "Party"
  ) +
  
  theme_minimal()

The plot of Party Early Vote Share vs. Log Precinct Size (Mixed Precincts) shows a strong, smooth positive relationship between precinct size and early voting participation for both parties. As precinct size increases, early vote share increases in a nearly linear fashion. Democratic early vote share is consistently higher than Republican early vote share across the size spectrum, and the two trend lines are roughly parallel.

There are no visible discontinuities, sharp inflection points, or upper-tail accelerations in large precincts. The dispersion decreases gradually as precinct size increases, which is mechanically expected. Overall, the pattern appears smooth and behaviorally consistent with urban scaling and infrastructure effects rather than any structural irregularity.

5.7 Election Day Party Vote Shares by Precinct Size

# ------------------------------------------------------------
# Election Day Share by Party vs Log Precinct Size
# Mixed precincts only
# ------------------------------------------------------------

election_day_party_share <- early_party_share %>%
  mutate(
    election_day_share_party = 1 - early_share_party
  )

ggplot(election_day_party_share,
       aes(x = log_precinct_size,
           y = election_day_share_party,
           color = `Choice Party`)) +
  
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  
  scale_color_manual(values = c(
    "DEM" = "#2C7BB6",
    "REP" = "#D7191C"
  )) +
  
  labs(
    title = "Party Election Day Vote Share vs Log Precinct Size (Mixed Precincts)",
    x = "Log In-Person Precinct Size",
    y = "Election Day Share (Within Party)",
    color = "Party"
  ) +
  
  theme_minimal()

The plot of Party Election Day Vote Share vs. Log Precinct Size (Mixed Precincts) shows a strong, smooth negative linear relationship between precinct size and Election Day voting share for both parties. As precinct size increases, Election Day participation declines in a nearly linear fashion. Republican Election Day share is consistently higher than Democratic Election Day share across the size spectrum, and the two trend lines are roughly parallel.

The pattern appears to be a mirror image of the Early Vote Share plot, which is mechanically expected since Election Day share is the complement of early share within each party. There are no visible discontinuities, nonlinear inflection points, or upper-tail accelerations in larger precincts. The dispersion narrows gradually as precinct size increases, consistent with expected statistical variance. Overall, the relationship appears smooth and behaviorally consistent rather than structurally anomalous.

5.7.1 Section Conclusion: Precinct Size and Party Vote Method

Across all specifications in this section, the central result is consistent: voting mode scales smoothly with precinct size, and this scaling occurs in a mechanically symmetric fashion across parties. Larger precincts rely more heavily on early in‑person voting and less on Election Day voting, while smaller precincts exhibit the opposite pattern. This relationship holds both in aggregate early share models and when disaggregated by party.

Importantly, the party‑specific patterns move in parallel. Democrats maintain a higher early voting share across the size distribution, while Republicans maintain a higher Election Day share, but the slope of the size relationship is similar for both. There are no discontinuities, nonlinear breakpoints, upper‑tail accelerations, or divergence between parties in larger precincts. Residual diagnostics likewise show no systematic size‑based deviations.

Taken together, the precinct‑level evidence indicates a stable, continuous scaling relationship consistent with institutional structure and voter behavior rather than structural irregularity. Having established that precinct size does not generate anomalous party‑specific distortions within in‑person voting, we now move to a different lens of analysis: examining vote share dynamics as cumulative votes are reported (ETA‑style sequencing).

6 Vote Share by Cumulative Vote Count (ETA-style)

This section replicates the cumulative vote share approach used in the Election Truth Alliance (ETA) North Carolina analysis. Precincts are sorted from smallest to largest based on total presidential votes (here, the combined Democratic and Republican total, since the code filters to those two parties), and cumulative Democratic and Republican vote shares are calculated as votes are progressively added. This produces an expanding weighted average that shows how statewide vote share evolves as increasingly larger precincts are incorporated into the total. The purpose of this visualization is to evaluate whether the cumulative share changes smoothly and continuously, or whether it exhibits abrupt inflection points or discontinuities as larger vote units are included.

# ------------------------------------------------------------
# STEP 1: Create precinct-level totals for the US PRESIDENT race.
# Filters to real precincts and major parties (DEM, REP),
# then aggregates total votes by precinct.
# ------------------------------------------------------------

library(dplyr)

precinct_pres <- nc_results_raw %>%
  
  filter(`Contest Name` == "US PRESIDENT",
         `Real Precinct` == "Y",
         `Choice Party` %in% c("DEM", "REP")) %>%
  
  group_by(County, Precinct) %>%
  summarise(
    total_votes = sum(`Total Votes`, na.rm = TRUE),
    dem_votes = sum(`Total Votes`[`Choice Party` == "DEM"], na.rm = TRUE),
    rep_votes = sum(`Total Votes`[`Choice Party` == "REP"], na.rm = TRUE),
    .groups = "drop"
  )

# ------------------------------------------------------------
# STEP 2: Sort precincts from smallest to largest
# based on total presidential votes cast.
# This ordering determines how cumulative votes are added.
# ------------------------------------------------------------

precinct_sorted <- precinct_pres %>%
  arrange(total_votes)

# ------------------------------------------------------------
# STEP 3: Compute cumulative vote totals and cumulative
# vote share for each party as precincts are added
# from smallest to largest.
# ------------------------------------------------------------

precinct_cum <- precinct_sorted %>%
  mutate(
    cum_total_votes = cumsum(total_votes),
    cum_dem_votes = cumsum(dem_votes),
    cum_rep_votes = cumsum(rep_votes),
    cum_dem_share = cum_dem_votes / cum_total_votes,
    cum_rep_share = cum_rep_votes / cum_total_votes
  )

# ------------------------------------------------------------
# STEP 4: Plot cumulative presidential vote share
# versus cumulative votes counted.
# This replicates the ETA-style cumulative share plot,
# using total precinct votes for sorting (not candidate votes).
# ------------------------------------------------------------

library(ggplot2)

ggplot(precinct_cum, aes(x = cum_total_votes)) +
  
  geom_line(aes(y = cum_dem_share, color = "DEM"), linewidth = 1) +
  geom_line(aes(y = cum_rep_share, color = "REP"), linewidth = 1) +
  
  scale_color_manual(values = c(
    "DEM" = "#2C7BB6",
    "REP" = "#D7191C"
  )) +
  
  labs(
    title = "Cumulative Presidential Vote Share vs Cumulative Votes Counted",
    subtitle = "Precincts Sorted by Total Votes (Small to Large)",
    x = "Cumulative Votes Counted",
    y = "Cumulative Vote Share",
    color = "Party"
  ) +
  
  theme_minimal()

6.1 Summary

The cumulative presidential vote share plot (precincts sorted from smallest to largest) exhibits the expected behavior of an expanding weighted average. The sharp volatility at the far left of the graph reflects small‑denominator effects from the very smallest precincts and is mechanically inevitable. As cumulative votes increase, both party curves stabilize quickly and converge toward their overall statewide levels.
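The expanding-weighted-average behavior can be seen on a tiny example. The sketch below uses five made-up precincts (not real data) and shows that the cumulative share starts at the smallest precinct's own share and necessarily ends at the overall share; all names are hypothetical.

```r
# Made-up two-party totals for five precincts (not real data)
toy <- data.frame(
  dem_votes = c(6, 40, 210, 900, 1500),
  rep_votes = c(4, 60, 290, 1100, 1700)
)
toy$total_votes <- toy$dem_votes + toy$rep_votes

# Sort smallest to largest, then accumulate
toy <- toy[order(toy$total_votes), ]
toy$cum_total     <- cumsum(toy$total_votes)
toy$cum_dem_share <- cumsum(toy$dem_votes) / toy$cum_total

# The final cumulative share equals the overall two-party share
overall_dem_share <- sum(toy$dem_votes) / sum(toy$total_votes)
print(toy)
print(overall_dem_share)
```

Each new precinct moves the cumulative share by an amount proportional to its weight in the running total, so early points swing widely and later points move only gradually.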

Beyond the initial stabilization, the curves display a smooth and gradual drift with no visible discontinuities, inflection points, or upper-tail accelerations as larger precincts are incorporated. The Republican share increases modestly while the Democratic share declines by the same amount, which is mechanical because the two-party shares sum to one. The drift is consistent with gradual differences between smaller and larger precincts rather than any structural break. Overall, the pattern is continuous, stable, and consistent with normal cumulative aggregation behavior.

7 Turnout-Based Analysis

The objective of this analysis is to evaluate whether county-level vote share in the 2024 North Carolina presidential election exhibits abnormal variation as a function of voter turnout.

Specifically, we test the hypothesis that vote share in the NC Presidential contest may change systematically with turnout in a manner inconsistent with ordinary demographic and geographic patterns. If vote allocation were being influenced by an algorithmic or threshold-based mechanism, one might expect to observe:

Nonlinearities or structural breaks in the relationship between turnout and vote share,
Discontinuities at specific turnout thresholds,
Compressed dispersion at higher turnout levels,
Or unusually strong explanatory power of turnout relative to typical electoral variation.

To assess this possibility, we examine the cross-sectional relationship between county-level turnout and party vote share, test for threshold effects, and evaluate whether the observed patterns remain after controlling for county size.

The central research question is:

Does county-level vote share shift abnormally as turnout increases, beyond what would be expected from ordinary demographic and geographic variation?

By explicitly modeling and testing these relationships, the analysis aims to distinguish between structural electoral patterns and artifacts that could suggest systematic irregularities.

County-level analysis captures broad geographic patterns rather than fine-grained behavior.
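
One way to test for a threshold effect of the kind listed above is to compare a linear model against one that allows the slope to change at a candidate turnout level. The sketch below is illustrative only: it runs on simulated counties (`sim` is synthetic, not the North Carolina data), and the break point `threshold` is a hypothetical value that in practice would have to be motivated by visual inspection rather than chosen in advance.

```r
# Sketch on synthetic counties (NOT the NC data): compare a linear
# turnout model against one that allows a slope change at a threshold.
set.seed(1)

n <- 100
sim <- data.frame(
  turnout     = runif(n, 0.59, 0.82),
  total_votes = round(exp(rnorm(n, mean = 10, sd = 1)))
)

# Smooth, break-free relationship plus noise
sim$party_gap <- -0.9 + 1.5 * sim$turnout -
  0.05 * (log(sim$total_votes) - 10) + rnorm(n, sd = 0.15)

threshold <- 0.70  # hypothetical candidate break point, for illustration only

# Baseline: linear in turnout, controlling for county size
fit_linear <- lm(party_gap ~ turnout + log(total_votes), data = sim)

# Alternative: a hinge term lets the slope change above the threshold
fit_hinge <- lm(
  party_gap ~ turnout + pmax(turnout - threshold, 0) + log(total_votes),
  data = sim
)

# F-test of the nested models: a small p-value would point to a slope break
break_test <- anova(fit_linear, fit_hinge)
print(break_test)
```

Because the hinge location is supplied by the analyst, a test like this is only as credible as the reasoning behind the chosen threshold.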

7.1 Turnout Data Alignment and Unit of Analysis

We initially explored constructing precinct-level turnout by merging precinct-level election results with precinct-level voter registration data. Although the two datasets exhibited substantial overlap in precinct identifiers, approximately 20% of real precincts—representing roughly 23% of total votes cast—did not match uniquely across files. Importantly, the unmatched precincts were not disproportionately small and were distributed across counties, indicating that their exclusion would remove a nontrivial share of statewide votes and potentially introduce systematic bias.

Because county-level registration and vote totals align cleanly and completely, turnout analysis is conducted at the county level. This approach preserves full vote coverage while avoiding distortions arising from incomplete precinct-level joins.
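
The kind of match-rate diagnostic described above can be sketched as follows. This is a toy example with hypothetical precinct keys (the data frames `results` and `registration` are invented for illustration and are not the state files). It measures both the share of precincts and the share of votes that fail to match.

```r
# Toy example (hypothetical precinct keys, NOT the NC files): measure how many
# precincts, and what share of votes, fail to match the registration file.
results <- data.frame(
  county   = c("A", "A", "A", "B", "B"),
  precinct = c("01", "02", "03", "01", "02"),
  votes    = c(500, 300, 200, 600, 400)
)
registration <- data.frame(
  county     = c("A", "A", "B", "B"),
  precinct   = c("01", "02", "01", "2"),  # "2" vs "02": a key mismatch
  registered = c(700, 450, 800, 550)
)

# Composite key so precinct codes are only compared within the same county
results$key      <- paste(results$county, results$precinct, sep = "|")
registration$key <- paste(registration$county, registration$precinct, sep = "|")

matched <- results$key %in% registration$key

unmatched_precinct_share <- mean(!matched)
unmatched_vote_share     <- sum(results$votes[!matched]) / sum(results$votes)

print(c(precincts = unmatched_precinct_share, votes = unmatched_vote_share))
```

A fuller check would also flag keys that appear more than once in either file, since a non-unique match is as problematic for a join as a missing one.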

7.2 Load & Prepare Data

file_path <- "../Data/elections-main/data/raw/US_NC/2024/voter_stats_20241105/voter_stats_20241105.txt"

voter_stats_20241105 <- read_tsv(file_path,
                                 col_types = cols())  # lets readr guess types

glimpse(voter_stats_20241105)

Rows: 685,049
Columns: 12
$ county_desc    <chr> "WAKE", "NEW HANOVER", "WATAUGA", "DURHAM", "JOHNSTON",…
$ election_date  <chr> "11/05/2024", "11/05/2024", "11/05/2024", "11/05/2024",…
$ stats_type     <chr> "voter", "voter", "voter", "voter", "voter", "voter", "…
$ precinct_abbrv <chr> "07-04", "W34", "15", "09", "PR38", "08", "FL", "AE", "…
$ vtd_abbrv      <chr> "07-04", "W24", "15", "09", "PR17", "08", "FL", "05", "…
$ party_cd       <chr> "UNA", "UNA", "DEM", "UNA", "REP", "DEM", "UNA", "DEM",…
$ race_code      <chr> "O", "O", "U", "W", "M", "W", "W", "A", "W", "U", "M", …
$ ethnic_code    <chr> "NL", "UN", "UN", "NL", "UN", "NL", "NL", "NL", "NL", "…
$ sex_code       <chr> "M", "M", "U", "U", "M", "F", "U", "M", "U", "U", "M", …
$ age            <chr> "Age 41 - 65", "Age 26 - 40", "Age 26 - 40", "Age Over …
$ total_voters   <dbl> 10, 3, 10, 1, 1, 9, 23, 1, 14, 6, 1, 1, 8, 1, 1, 1, 1, …
$ update_date    <chr> "11/05/2024", "11/05/2024", "11/05/2024", "11/05/2024",…

7.3 Exploration & Validation

7.3.1 Total Registered Voters in North Carolina

voter_stats_20241105 %>%
  summarise(total_registered = sum(total_voters, na.rm = TRUE))

# A tibble: 1 × 1
  total_registered
             <dbl>
1          7854464

7.3.2 Context: Population, Registration, and Civic Engagement in North Carolina (2024)

North Carolina’s population surpassed 11 million in 2024, following continued post‑2020 growth (osbm.nc.gov). Using recent Census estimates, roughly 22% of residents are under age 18, implying an adult (18+) population of approximately 8.6 million.

With 7,854,464 registered voters, this suggests that roughly 91% of adults were registered to vote in 2024. Because the 18+ population includes non‑citizens and other ineligible residents, the share of eligible citizens who are registered is likely even higher.
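
The arithmetic behind this estimate can be laid out explicitly. The sketch below uses the rounded figures quoted in the text (11 million residents, 22% under 18), so the result is a back-of-envelope value rather than a precise rate.

```r
# Back-of-envelope check using the rounded figures quoted in the text
population     <- 11e6     # "surpassed 11 million", used here as a round number
share_under_18 <- 0.22     # approximate share of residents under age 18
registered     <- 7854464  # total registered voters computed in 7.3.1

adult_population  <- population * (1 - share_under_18)
registration_rate <- registered / adult_population

print(round(c(
  adult_population  = adult_population,
  registration_rate = registration_rate
), 3))
```

With these inputs the adult population comes to about 8.58 million and the implied registration rate to a little over 91%.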

North Carolina does not have automatic voter registration (AVR) (findlaw.com). However, the state provides online voter registration through the DMV for eligible citizens with a driver’s license or state ID (brennancenter.org), along with same‑day registration during early voting. These mechanisms likely contribute to high registration coverage.

Statewide participation in the 2024 general election was also high, with roughly 5.7 million ballots cast (democracync.org). Taken together, these figures indicate a highly engaged electorate. Despite ongoing debates over voter ID requirements and voter roll maintenance (apnews.com), North Carolina’s registration and turnout levels suggest robust civic participation across the state.

7.3.3 Calculate Turnout by County

Although precinct-level turnout rates cannot be constructed reliably, county-level turnout can. Every precinct belongs to exactly one county, so precinct data aggregate cleanly to the county level, at the cost of a coarser analysis.

# Aggregate registered voters by county
registered_county <- voter_stats_20241105 %>%
  group_by(county_desc) %>%
  summarise(
    registered_voters = sum(total_voters, na.rm = TRUE),
    .groups = "drop"
  )

# Aggregate presidential vote totals and party shares by county
pres_county_total <- nc_results_raw %>%
  filter(`Contest Name` == "US PRESIDENT") %>%
  group_by(County) %>%
  summarise(
    total_votes = sum(`Total Votes`, na.rm = TRUE),
    dem_votes = sum(`Total Votes` * (`Choice Party` == "DEM"), na.rm = TRUE),
    rep_votes = sum(`Total Votes` * (`Choice Party` == "REP"), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    dem_share = dem_votes / total_votes,
    rep_share = rep_votes / total_votes
  )

county_data <- pres_county_total %>%
  left_join(registered_county,
            by = c("County" = "county_desc")) %>%
  mutate(
    turnout = total_votes / registered_voters,
    
    # Two-party shares: these replace the all-candidate shares computed above
    two_party_total = dem_votes + rep_votes,
    dem_share = dem_votes / two_party_total,
    rep_share = rep_votes / two_party_total,
    
    # Standard definition: Republican minus Democratic
    party_gap = rep_share - dem_share,
    
    # Winning party indicator
    winning_party = case_when(
      party_gap > 0  ~ "Republican",
      party_gap < 0  ~ "Democratic",
      TRUE           ~ "Tie"
    )
  )

# Summarize county-level turnout distribution
summary(county_data$turnout)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.5907  0.7064  0.7389  0.7302  0.7601  0.8198

County-level turnout in the 2024 presidential election ranged from approximately 59% to 82% of registered voters, with a median near 74% and a mean around 73%. This distribution indicates consistently high participation across counties, with no county falling below majority turnout among registered voters.

7.3.4 Distribution of Turnout

county_data %>%
  mutate(turnout_band = cut(turnout,
                            breaks = c(0, .2, .3, .4, .5, .6, .7, .8, 1),
                            include.lowest = TRUE)) %>%
  count(turnout_band) %>%
  arrange(turnout_band)

# A tibble: 4 × 2
  turnout_band     n
  <fct>        <int>
1 (0.5,0.6]        2
2 (0.6,0.7]       19
3 (0.7,0.8]       77
4 (0.8,1]          2

All but 4 counties had turnout between 60% and 80%. Turnout is the total votes cast in the contest divided by the number of registered voters in the county.

7.3.5 Overall Total Votes Cast

# Total presidential votes across all reporting units (no Real Precinct filter)
nc_results_raw %>%
  filter(`Contest Name` == "US PRESIDENT") %>%
  summarise(total_votes = sum(`Total Votes`, na.rm = TRUE))

# A tibble: 1 × 1
  total_votes
        <dbl>
1     5699141

7.3.6 Real Precinct Votes Cast

nc_results_raw %>%
  filter(`Contest Name` == "US PRESIDENT",
         `Real Precinct` == "Y") %>%
  summarise(total_votes = sum(`Total Votes`, na.rm = TRUE))

# A tibble: 1 × 1
  total_votes
        <dbl>
1     3923739
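
The totals printed in 7.3.1, 7.3.5, and 7.3.6 imply two ratios that are useful as sanity checks: statewide presidential turnout among registered voters, and the share of presidential votes reported in real precincts. A minimal sketch of that arithmetic, using only the numbers shown above:

```r
# Ratios implied by the totals printed above
registered_total <- 7854464  # 7.3.1, total registered voters
pres_votes_all   <- 5699141  # 7.3.5, all reporting units
pres_votes_real  <- 3923739  # 7.3.6, Real Precinct == "Y"

statewide_turnout   <- pres_votes_all / registered_total
real_precinct_share <- pres_votes_real / pres_votes_all

print(round(c(
  statewide_turnout   = statewide_turnout,
  real_precinct_share = real_precinct_share
), 3))
```

Statewide turnout works out to about 72.6%, close to the county-level mean of 73.0%, and about 68.8% of presidential votes fall in real precincts, with the remainder reported in units not flagged as real precincts.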

7.4 Turnout Distribution Histograms

7.4.1 Distribution of County-Level Turnout

# Histogram of county-level turnout (percent of registered voters casting ballots)
library(ggplot2)

ggplot(county_data, aes(x = turnout)) +
  geom_histogram(bins = 15, fill = "mediumpurple4", color = "white") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Distribution of County-Level Turnout (2024 Presidential Election)",
    x = "Turnout (Percent of Registered Voters)",
    y = "Number of Counties"
  ) +
  theme_minimal()

County-level turnout in the 2024 presidential election is tightly clustered, with most counties falling between roughly 68% and 78% participation. The distribution is slightly left-skewed, with a small number of lower-turnout counties near 60% (consistent with the mean sitting just below the median), but no county falls below majority turnout. Overall, the histogram indicates consistently high engagement across the state rather than sharp regional disparities.
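
The direction of skew in a distribution like this one can be checked numerically rather than by eye. The sketch below uses a small made-up vector with a short lower tail (illustrative values, not the county data) and a hypothetical helper, `sample_skewness()`. A mean below the median together with a negative skewness coefficient indicates a longer lower tail. The county summary in 7.3.3 (mean 0.7302, median 0.7389) can be read the same way.

```r
# Moment-based skewness: negative values mean a longer lower (left) tail.
# Hypothetical helper written for this sketch, not part of any package.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Illustrative turnout values with two low outliers (NOT the NC data)
turnout_toy <- c(0.59, 0.62, 0.70, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78)

print(c(
  mean     = mean(turnout_toy),
  median   = median(turnout_toy),
  skewness = sample_skewness(turnout_toy)
))
```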

7.4.2 Distribution of Democratic Vote Share

# Histogram of Democratic vote share by county
ggplot(county_data, aes(x = dem_share)) +
  geom_histogram(bins = 15, fill = "royalblue", color = "white") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Distribution of Democratic Vote Share by County",
    x = "Democratic Vote Share",
    y = "Number of Counties"
  ) +
  theme_minimal()

The distribution of Democratic vote share across counties is centered in the mid‑30% to low‑40% range, indicating that most counties lean Republican at the presidential level. However, the distribution exhibits a noticeable right tail, with several counties exceeding 55% Democratic support and a small number approaching or surpassing 70%. This pattern suggests geographic concentration of Democratic strength in a limited number of urban or metropolitan counties rather than broad statewide parity.

7.4.3 Distribution of Republican Vote Share

# Histogram of Republican vote share by county
ggplot(county_data, aes(x = rep_share)) +
  geom_histogram(bins = 15, fill = "firebrick", color = "white") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Distribution of Republican Vote Share by County",
    x = "Republican Vote Share",
    y = "Number of Counties"
  ) +
  theme_minimal()

The Republican vote share distribution is centered between roughly 55% and 70%, indicating that most counties delivered clear Republican majorities of the two-party vote in the 2024 presidential election. The distribution is left-skewed: a smaller number of counties fall below 45% Republican support and form the lower tail, while the upper end extends above 75%. Compared to the Democratic distribution, Republican support appears more geographically widespread, with high vote shares observed across a larger number of counties.

7.4.4 Distribution of County-Level Party Share Gap

# Histogram of Party Share Gap with blue-to-red gradient centered at zero
ggplot(county_data, aes(x = party_gap)) +
  geom_histogram(
    aes(fill = after_stat(x)),
    bins = 15,
    color = "white"
  ) +
  scale_fill_gradient2(
    low = "royalblue",
    mid = "gray90",
    high = "firebrick",
    midpoint = 0,
    labels = scales::percent_format(accuracy = 1),
    name = "Party Gap\n(REP - DEM)"
  ) +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Distribution of County-Level Party Share Gap",
    x = "Party Share Gap (REP - DEM)",
    y = "Number of Counties"
  ) +
  theme_minimal()

This histogram displays the county-level Party Share Gap (Republican share minus Democratic share), with color indicating direction and magnitude of advantage. Blue bars represent counties where Democrats received a higher vote share, red bars represent Republican advantages, and lighter shades near zero indicate competitive counties. The distribution is concentrated on the positive side of zero, indicating that most counties favored the Republican candidate, though the range of values shows substantial geographic variation in partisan margins.

7.4.5 Distribution of County-Level Vote Share by Party

# Reshape data to long format to overlay DEM and REP vote shares
library(tidyr)  # provides pivot_longer()

vote_share_long <- county_data %>%
  select(County, dem_share, rep_share) %>%
  pivot_longer(
    cols = c(dem_share, rep_share),
    names_to = "party",
    values_to = "vote_share"
  )

# Overlay histogram of DEM and REP vote shares
ggplot(vote_share_long, aes(x = vote_share, fill = party)) +
  geom_histogram(bins = 15, alpha = 0.5, position = "identity", color = "white") +
  scale_fill_manual(
    values = c("dem_share" = "royalblue", 
               "rep_share" = "firebrick"),
    labels = c("Democratic", "Republican")
  ) +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Distribution of County-Level Vote Share by Party",
    x = "Vote Share",
    y = "Number of Counties",
    fill = "Party"
  ) +
  theme_minimal()

7.5 Scatterplots

7.5.1 Turnout vs Democratic Vote Share by County

# Scatterplot: County turnout vs Democratic vote share
ggplot(county_data, aes(x = turnout, y = dem_share)) +
  geom_point(color = "royalblue", alpha = 0.7) +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Turnout vs Democratic Vote Share by County",
    x = "Turnout (Percent of Registered Voters)",
    y = "Democratic Vote Share"
  ) +
  theme_minimal()

7.5.2 Turnout vs Republican Vote Share by County

# Scatterplot: County turnout vs Republican vote share
ggplot(county_data, aes(x = turnout, y = rep_share)) +
  geom_point(color = "firebrick", alpha = 0.7) +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Turnout vs Republican Vote Share by County",
    x = "Turnout (Percent of Registered Voters)",
    y = "Republican Vote Share"
  ) +
  theme_minimal()

7.5.3 Turnout vs Party Share Gap Scatterplot

# Scatterplot: Turnout vs Party Share Gap colored by winning party
ggplot(county_data, aes(x = turnout, y = party_gap, color = winning_party)) +
  geom_point(alpha = 0.8) +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  scale_color_manual(
    values = c("Democratic" = "royalblue",
               "Republican" = "firebrick",
               "Tie" = "gray50")
  ) +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Turnout vs Party Share Gap (Colored by County Winner)",
    x = "Turnout (Percent of Registered Voters)",
    y = "Party Share Gap (REP - DEM)",
    color = "Winning Party"
  ) +
  theme_minimal()

# Load required package
library(tidyr)

# Reshape data for plotting both party vote shares
vote_share_long <- county_data %>%
  select(County, turnout, dem_share, rep_share, winning_party) %>%
  pivot_longer(
    cols = c(dem_share, rep_share),
    names_to = "party",
    values_to = "vote_share"
  )

# Scatterplot with party-specific smoothing lines
ggplot() +
  
  # Points show the winning party's vote share in each county, colored by winner
  geom_point(
    data = county_data,
    aes(x = turnout, y = pmax(dem_share, rep_share), color = winning_party),
    alpha = 0.7
  ) +
  
  # Democratic smoothing line
  geom_smooth(
    data = vote_share_long %>% filter(party == "dem_share"),
    aes(x = turnout, y = vote_share),
    method = "loess",
    se = FALSE,
    color = "royalblue",
    linewidth = 1
  ) +
  
  # Republican smoothing line
  geom_smooth(
    data = vote_share_long %>% filter(party == "rep_share"),
    aes(x = turnout, y = vote_share),
    method = "loess",
    se = FALSE,
    color = "firebrick",
    linewidth = 1
  ) +
  
  scale_color_manual(
    values = c(
      "Democratic" = "royalblue",
      "Republican" = "firebrick",
      "Tie" = "gray50"
    )
  ) +
  
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  
  labs(
    title = "Turnout vs Vote Share by Party",
    x = "Turnout (Percent of Registered Voters)",
    y = "Vote Share",
    color = "Winning Party"
  ) +
  
  theme_minimal()

The scatterplot reveals a clear asymmetry in how vote share varies with turnout. As county turnout increases, the Republican vote share trend rises steadily, while the Democratic vote share trend declines. High-turnout counties are disproportionately Republican-leaning, as indicated by both the upward-sloping red smoothing curve and the clustering of red points at higher turnout levels. In contrast, Democratic vote share peaks at moderate turnout levels and declines in the highest-turnout counties. Overall, the figure suggests a positive association between turnout and Republican advantage at the county level.
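
The visual association can be quantified with a rank correlation and a regression that controls for county size. The sketch below is a minimal illustration on simulated data that reuses the column names of `county_data` (`county_sim` and all of its values are synthetic, not the North Carolina results). Applying the same calls to the real data frame is left as an assumption about how the analysis could proceed.

```r
# Sketch on synthetic data with the same column names as county_data
# (values are simulated, NOT the NC results).
set.seed(7)

n <- 100
county_sim <- data.frame(
  turnout     = runif(n, 0.59, 0.82),
  total_votes = round(exp(rnorm(n, mean = 10, sd = 1)))
)
county_sim$party_gap <- -1.2 + 2 * county_sim$turnout + rnorm(n, sd = 0.2)

# Strength and direction of the raw association (rank-based)
assoc <- cor.test(county_sim$turnout, county_sim$party_gap, method = "spearman")

# Does the turnout slope survive a control for county size?
fit_raw  <- lm(party_gap ~ turnout, data = county_sim)
fit_size <- lm(party_gap ~ turnout + log(total_votes), data = county_sim)

print(assoc$estimate)
print(rbind(
  raw       = coef(fit_raw)["turnout"],
  with_size = coef(fit_size)["turnout"]
))
```

If the turnout coefficient shrinks sharply once log county size is included, the raw association is more likely a reflection of the rural and urban composition of counties than of turnout itself.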

8 Conclusion

This analysis set out to examine whether patterns in North Carolina’s 2024 presidential precinct- and county-level results exhibit statistical characteristics consistent with vote manipulation, or whether they are better explained by ordinary electoral structure and voter behavior. Across multiple levels of aggregation and several complementary visual and quantitative diagnostics, the evidence consistently supports the latter interpretation.

8.1 Distribution of Turnout

We began by examining turnout levels across counties. Turnout was neither unnaturally uniform nor clustered at suspicious thresholds. Instead, it displayed:

A smooth distribution
Natural variation across counties
No unusual spikes at round-number thresholds (e.g., 60%, 70%, 80%)
No compression at extreme upper bounds

Formally, turnout is defined as:

\[ \text{Turnout} = \frac{\text{Total Votes}}{\text{Registered Voters}} \]

If manipulation were occurring at scale, we might expect:

Artificial clustering at target participation levels
Discontinuities
Digit anomalies
Over-concentration near psychologically salient thresholds

None were observed. The turnout distribution appears organic and demographically structured.
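
A simple check for clustering at round-number thresholds compares the number of observations in a narrow window around each threshold with the counts in the adjacent windows. The following is a minimal sketch on simulated turnout values (`turnout_sim` is synthetic, not the county data). The window width `half_width` is an arbitrary choice, and the helper `count_in()` is hypothetical.

```r
# Sketch on simulated turnout values (NOT the NC data): compare the number of
# observations near round thresholds with the number in adjacent windows.
set.seed(3)
turnout_sim <- rbeta(100, shape1 = 60, shape2 = 22)  # smooth, centered near 73%

thresholds <- c(0.60, 0.70, 0.80)
half_width <- 0.005  # window of +/- 0.5 percentage points (arbitrary choice)

# Number of values within h of a given center
count_in <- function(x, center, h) sum(abs(x - center) <= h)

spike_check <- data.frame(
  threshold  = thresholds,
  at         = sapply(thresholds, function(thr)
    count_in(turnout_sim, thr, half_width)),
  just_below = sapply(thresholds, function(thr)
    count_in(turnout_sim, thr - 2 * half_width, half_width)),
  just_above = sapply(thresholds, function(thr)
    count_in(turnout_sim, thr + 2 * half_width, half_width))
)
print(spike_check)
```

With only 100 counties, a comparison like this has limited power, so it is better suited to ruling out gross spikes than to detecting subtle ones.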

8.5 Structural Interpretation

The turnout–gap relationship appears consistent with:

Geographic partisan sorting
Rural–urban participation differences
Demographic turnout variation
Voter mobilization patterns

Importantly, the pattern is gradual and statistically smooth — hallmarks of social processes rather than discrete algorithmic adjustments.

Manipulation models typically generate:

Mechanical ceilings or floors
Abrupt slope changes
Artificially uniform margins
Excess observations just beyond victory thresholds

The observed data do not exhibit those features.

9 Overall Assessment

Across all examined diagnostics:

✅ Turnout distribution appears organic
✅ Party share distributions appear continuous
✅ Margin distribution shows no threshold manipulation
✅ Turnout–gap relationship is smooth and behaviorally plausible
✅ No clustering indicative of mechanical adjustment

There is no statistical evidence in this analysis that suggests systemic vote manipulation.

This does not prove that manipulation is impossible. Rather, it indicates that:

The observed electoral structure is consistent with demographic, geographic, and political sorting — not with algorithmic or procedural distortion.

10 What We Did Not Find

We did not observe:

Digit anomalies
Margin discontinuities
Suspicious turnout clustering
Artificial dominance compression
Structural breaks in turnout–margin slopes

In short, none of the commonly cited quantitative red flags appear in these data.

11 Final Conclusion

The patterns observed in North Carolina’s 2024 presidential results are statistically coherent, continuous, and structurally plausible. The distributions and relationships examined align with known features of political geography and voter behavior.

Within the scope of this analysis, there is no indication of vote manipulation.

The evidence supports the interpretation that the turnout and partisan outcomes reflect ordinary electoral dynamics rather than systemic distortion.
