Mapping the Structure of North Carolina’s 2024 Electorate: A County-Level Demographic Typology

Author

Kier O’Neil

Published

June 20, 2026

0.1 Abstract

This paper examines the structure of North Carolina’s 2024 electorate at the county level through a unified demographic and electoral framework. Using publicly available data, I construct a consolidated dataset for all 100 counties that integrates racial composition, age structure, party registration, and voter turnout. The objective is not to test a single causal theory, but to uncover the underlying demographic structure that organizes electoral behavior across the state.

To identify latent patterns, I apply k-means clustering to standardized demographic and political variables and evaluate the results using within-cluster variance and principal component analysis (PCA). The analysis reveals four structurally distinct county types, each defined by a coherent demographic profile and corresponding electoral pattern. These clusters represent:

Urban Metro
Black Belt Democratic
Suburban / Exurban GOP
Rural Aging GOP

Dimensionality reduction shows that the first three principal components explain 86.7% of the total variance, indicating that the clustering variables are organized along a few dominant structural axes, so the clusters partition a well-ordered space rather than diffuse noise. When mapped geographically, the clusters form contiguous regional blocs rather than scattered or checkerboard patterns, reinforcing the conclusion that they capture real political-demographic ecologies.

The findings suggest that county-level voting behavior in North Carolina is best understood as an expression of deeper demographic structure. Rather than a simple urban–rural divide, the state exhibits multiple, clearly defined electoral regions shaped by race, age composition, and partisan alignment. Together, these results demonstrate how population structure organizes political geography in a measurable and geographically coherent way.

A data-driven typology of county-level political-demographic structure in North Carolina.

1 Load & Format 2024 North Carolina Election Results


# Load 2024 North Carolina general election results (precinct-level file)
# This dataset will later be aggregated to the county level and merged
# with demographic and registration data.

library(tidyverse)
library(GGally)

file_path <- "../Data/elections-main/data/raw/US_NC/2024/results_pct_20241105/results_pct_20241105.txt"

nc_results_raw <- read_tsv(
  file = file_path,
  col_types = cols(
    County = col_character(),
    `Election Date` = col_date(format = "%m/%d/%Y"),
    Precinct = col_character(),
    `Contest Group ID` = col_double(),
    `Contest Type` = col_character(),
    `Contest Name` = col_character(),
    Choice = col_character(),
    `Choice Party` = col_character(),
    `Vote For` = col_double(),
    `Election Day` = col_double(),
    `Early Voting` = col_double(),
    `Absentee by Mail` = col_double(),
    Provisional = col_double(),
    `Total Votes` = col_double(),
    `Real Precinct` = col_character(),
    `...16` = col_skip()   # Skip unused column
  ),
  progress = FALSE
)

# --------------------------------------------------
# Build streamlined county-level election dataset
# (Election results only — no registration join)
# --------------------------------------------------

county_data <- nc_results_raw %>%
  filter(`Contest Name` == "US PRESIDENT") %>%
  group_by(County) %>%
  summarise(
    total_votes = sum(`Total Votes`, na.rm = TRUE),
    dem_votes   = sum(`Total Votes` * (`Choice Party` == "DEM"), na.rm = TRUE),
    rep_votes   = sum(`Total Votes` * (`Choice Party` == "REP"), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    # Two-party totals
    two_party_total = dem_votes + rep_votes,
    
    dem_share = dem_votes / two_party_total,
    rep_share = rep_votes / two_party_total,
    
    party_gap = rep_share - dem_share,
    
    winning_party = case_when(
      party_gap > 0  ~ "Republican",
      party_gap < 0  ~ "Democratic",
      TRUE           ~ "Tie"
    )
  ) %>%
  select(
    County,
    total_votes,
    dem_share,
    rep_share,
    party_gap,
    winning_party
  )

head(county_data)

# A tibble: 6 × 6
  County    total_votes dem_share rep_share party_gap winning_party
  <chr>           <dbl>     <dbl>     <dbl>     <dbl> <chr>        
1 ALAMANCE        89831     0.459     0.541    0.0826 Republican   
2 ALEXANDER       20677     0.198     0.802    0.603  Republican   
3 ALLEGHANY        6496     0.238     0.762    0.523  Republican   
4 ANSON           10875     0.487     0.513    0.0252 Republican   
5 ASHE            16253     0.276     0.724    0.448  Republican   
6 AVERY            9489     0.236     0.764    0.528  Republican

2 Load & Format County Demographics from Voter Registrations


# ---- Construct County-Level Demographic Dataset ----

#library(dplyr)
#library(tidyr)
#library(stringr)

# Generic helper to aggregate voter file counts
make_count_wide <- function(data, var, prefix) {
  data %>%
    group_by(county_desc, !!sym(var)) %>%
    summarise(count = sum(total_voters, na.rm = TRUE), .groups = "drop") %>%
    mutate(
      category = paste0(
        prefix,
        str_replace_all(tolower(!!sym(var)), "[^a-z0-9]", "_")
      ),
      category = str_replace(category, "ethnic_hl", "ethnic_hispanic")
    ) %>%
    select(county_desc, category, count) %>%
    pivot_wider(
      names_from  = category,
      values_from = count,
      values_fill = 0
    )
}

# ---- Build County-Level Counts ----
# Load data
file_path <- "../Data/elections-main/data/raw/US_NC/2024/voter_stats_20241105/voter_stats_20241105.txt"

voter_stats_20241105 <- read_tsv(file_path,
                                 col_types = cols())  # lets readr guess types

county_demographics <- voter_stats_20241105 %>%
  group_by(county_desc) %>%
  summarise(
    registered_voters = sum(total_voters, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  rename(County = county_desc) %>%
  left_join(make_count_wide(voter_stats_20241105, "party_cd",  "party_"),
            by = c("County" = "county_desc")) %>%
  left_join(make_count_wide(voter_stats_20241105, "sex_code",  "sex_"),
            by = c("County" = "county_desc")) %>%
  left_join(make_count_wide(voter_stats_20241105, "age",       "age_"),
            by = c("County" = "county_desc")) %>%
  left_join(make_count_wide(voter_stats_20241105, "race_code", "race_"),
            by = c("County" = "county_desc")) %>%
  left_join(make_count_wide(voter_stats_20241105, "ethnic_code", "ethnic_"),
            by = c("County" = "county_desc")) %>%
  mutate(across(matches("^(party_|sex_|age_|race_|ethnic_)"),
                ~ replace_na(.x, 0)))

# ---- Compute Shares ----

county_demographics <- county_demographics %>%
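  mutate(
    # Residual count of all other registrants; overwritten in STEP 4 by an
    # explicit sum of minor-party columns (which also recomputes its share)
    party_other = registered_voters - (party_rep + party_dem + party_una)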
  ) %>%
  mutate(
    across(matches("^(party_|sex_|age_|race_|ethnic_)"),
           ~ .x / registered_voters,
           .names = "{.col}_share")
  )

# ---- Consolidate Race Categories ----

county_demographics <- county_demographics %>%
  mutate(
    race_white   = race_w,
    race_black   = race_b,
    race_native  = race_i,
    race_unknown = race_u,
    race_other   = race_a + race_m + race_p + race_o
  ) %>%
  select(-matches("^race_(w|b|i|a|m|o|p|u)(_share)?$"))

# ---- Final Merge with Election Data ----

county_full <- county_demographics %>%
  left_join(county_data, by = "County") %>%
  mutate(
    turnout = total_votes / registered_voters
  )


# --------------------------------------------------
# STEP 2: Clean and consolidate race variables
# --------------------------------------------------

county_full <- county_full %>%
  mutate(
    # Compute race shares
    race_white_share   = race_white   / registered_voters,
    race_black_share   = race_black   / registered_voters,
    race_native_share  = race_native  / registered_voters,
    race_unknown_share = race_unknown / registered_voters,
    race_other_share   = race_other   / registered_voters
  ) %>%
  # Remove original detailed race codes
  select(
    -matches("^race_(w|b|i|a|m|o|p|u|NA)(_share)?$")
  )

# --------------------------------------------------
# STEP 3: Rename ethnicity variables
# --------------------------------------------------

county_full <- county_full %>%
  rename(
    ethnic_non_hispanic       = ethnic_nl,
    ethnic_non_hispanic_share = ethnic_nl_share,
    ethnic_unknown            = ethnic_un,
    ethnic_unknown_share      = ethnic_un_share
  )

# --------------------------------------------------
# STEP 4: Collapse minor political parties into "party_other"
# --------------------------------------------------

county_full <- county_full %>%
  mutate(
    party_other = party_cst + party_gre + party_jfa +
                  party_lib + party_nlb + party_wtp,
    party_other_share = party_other / registered_voters
  ) %>%
  # Remove the individual minor party columns
  select(
    -matches("^party_(cst|gre|jfa|lib|nlb|wtp)(_share)?$")
  )


# --------------------------------------------------
# STEP 5: Clean age and sex variable names
# --------------------------------------------------

county_full <- county_full %>%
  rename(
    # ---- Age categories ----
    age_18_25         = age_age_18___25,
    age_18_25_share   = age_age_18___25_share,
    age_26_40         = age_age_26___40,
    age_26_40_share   = age_age_26___40_share,
    age_41_65         = age_age_41___65,
    age_41_65_share   = age_age_41___65_share,
    age_66_plus       = age_age_over_66,
    age_66_plus_share = age_age_over_66_share,
    
    # ---- Sex categories ----
    sex_male          = sex_m,
    sex_male_share    = sex_m_share,
    sex_female        = sex_f,
    sex_female_share  = sex_f_share,
    sex_unknown       = sex_u,
    sex_unknown_share = sex_u_share
  )

# --------------------------------------------------
# Interleave each count column with its matching _share column
# --------------------------------------------------

county_full <- county_full %>%
  {
    # Identify demographic variables that have both count and share
    demo_base <- names(.)[
      !endsWith(names(.), "_share") &
      paste0(names(.), "_share") %in% names(.)
    ]
    
    # Create ordered pairs
    demo_ordered <- unlist(
      lapply(demo_base, function(x) c(x, paste0(x, "_share")))
    )
    
    # Keep everything else untouched
    other_cols <- setdiff(names(.), demo_ordered)
    
    # Final ordering: keep non-demo columns first, then interleaved demo
    select(., all_of(c(other_cols, demo_ordered)))
  }

# --------------------------------------------------
# Final logical column ordering
# --------------------------------------------------

county_full <- county_full %>%
  
  # Election outcomes immediately after County
  relocate(
    registered_voters,
    total_votes,
    turnout,
    dem_share,
    rep_share,
    party_gap,
    winning_party,
    .after = County
  ) %>%
  
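  # Party registration columns. Note: '^party_' also matches party_gap,
  # so party_gap is moved after winning_party (as the glimpse output shows)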
  relocate(
    matches("^party_"),
    .after = winning_party
  ) %>%
  
  # Sex demographics
  relocate(
    matches("^sex_"),
    .after = last_col()
  ) %>%
  
  # Age demographics
  relocate(
    matches("^age_"),
    .after = last_col()
  ) %>%
  
  # Race demographics
  relocate(
    matches("^race_"),
    .after = last_col()
  ) %>%
  
  # Ethnicity demographics
  relocate(
    matches("^ethnic_"),
    .after = last_col()
  )

glimpse(county_full)

Rows: 100
Columns: 46
$ County                    <chr> "ALAMANCE", "ALEXANDER", "ALLEGHANY", "ANSON…
$ registered_voters         <dbl> 119766, 25966, 8335, 16745, 20829, 12871, 35…
$ total_votes               <dbl> 89831, 20677, 6496, 10875, 16253, 9489, 2657…
$ turnout                   <dbl> 0.7500543, 0.7963106, 0.7793641, 0.6494476, …
$ dem_share                 <dbl> 0.4587121, 0.1983972, 0.2383025, 0.4873817, …
$ rep_share                 <dbl> 0.5412879, 0.8016028, 0.7616975, 0.5126183, …
$ winning_party             <chr> "Republican", "Republican", "Republican", "R…
$ party_gap                 <dbl> 0.08257585, 0.60320563, 0.52339499, 0.025236…
$ party_dem                 <dbl> 38297, 3991, 1696, 8532, 3688, 1277, 10125, …
$ party_dem_share           <dbl> 0.31976521, 0.15370099, 0.20347930, 0.509525…
$ party_rep                 <dbl> 38341, 12712, 3558, 3387, 9557, 7322, 13342,…
$ party_rep_share           <dbl> 0.3201326, 0.4895633, 0.4268746, 0.2022693, …
$ party_una                 <dbl> 42131, 9064, 3019, 4725, 7444, 4173, 11447, …
$ party_una_share           <dbl> 0.3517776, 0.3490719, 0.3622076, 0.2821738, …
$ party_other               <dbl> 997, 199, 62, 101, 140, 99, 211, 60, 157, 10…
$ party_other_share         <dbl> 0.008324566, 0.007663868, 0.007438512, 0.006…
$ sex_female                <dbl> 59827, 12430, 3941, 7496, 10199, 6231, 17363…
$ sex_female_share          <dbl> 0.4995324, 0.4787029, 0.4728254, 0.4476560, …
$ sex_male                  <dbl> 49500, 11625, 3688, 6247, 9259, 5714, 14827,…
$ sex_male_share            <dbl> 0.4133059, 0.4477008, 0.4424715, 0.3730666, …
$ sex_unknown               <dbl> 10439, 1911, 706, 3002, 1371, 926, 2935, 981…
$ sex_unknown_share         <dbl> 0.08716163, 0.07359624, 0.08470306, 0.179277…
$ age_18_25                 <dbl> 17134, 3054, 778, 1917, 1963, 1245, 3874, 13…
$ age_18_25_share           <dbl> 0.14306230, 0.11761534, 0.09334133, 0.114481…
$ age_26_40                 <dbl> 28661, 5322, 1348, 3799, 3596, 2366, 6281, 2…
$ age_26_40_share           <dbl> 0.2393083, 0.2049603, 0.1617277, 0.2268737, …
$ age_41_65                 <dbl> 45906, 10711, 3181, 6566, 8238, 4817, 13237,…
$ age_41_65_share           <dbl> 0.3832974, 0.4125010, 0.3816437, 0.3921170, …
$ age_66_plus               <dbl> 28065, 6879, 3028, 4463, 7032, 4443, 11733, …
$ age_66_plus_share         <dbl> 0.2343319, 0.2649234, 0.3632873, 0.2665273, …
$ race_white                <dbl> 76734, 22500, 7292, 6948, 18912, 11576, 2428…
$ race_white_share          <dbl> 0.6406994, 0.8665178, 0.8748650, 0.4149298, …
$ race_black                <dbl> 24671, 972, 78, 6490, 123, 67, 7098, 7078, 7…
$ race_black_share          <dbl> 0.205993354, 0.037433567, 0.009358128, 0.387…
$ race_native               <dbl> 273, 39, 12, 35, 25, 18, 42, 19, 418, 411, 3…
$ race_native_share         <dbl> 0.002279445, 0.001501964, 0.001439712, 0.002…
$ race_unknown              <dbl> 10597, 1828, 771, 2920, 1445, 1033, 2781, 98…
$ race_unknown_share        <dbl> 0.08848087, 0.07039975, 0.09250150, 0.174380…
$ race_other                <dbl> 7491, 627, 182, 352, 324, 177, 921, 126, 681…
$ race_other_share          <dbl> 0.062546967, 0.024146961, 0.021835633, 0.021…
$ ethnic_hispanic           <dbl> 7018, 545, 255, 169, 357, 133, 747, 56, 658,…
$ ethnic_hispanic_share     <dbl> 0.058597599, 0.020988986, 0.030593881, 0.010…
$ ethnic_non_hispanic       <dbl> 78953, 20911, 5713, 10879, 16581, 8800, 2558…
$ ethnic_non_hispanic_share <dbl> 0.6592272, 0.8053223, 0.6854229, 0.6496865, …
$ ethnic_unknown            <dbl> 33795, 4510, 2367, 5697, 3891, 3938, 8791, 3…
$ ethnic_unknown_share      <dbl> 0.2821752, 0.1736887, 0.2839832, 0.3402210, …

3 Modeling Dataset Construction


# ==================================================
# MODELING DATASET CONSTRUCTION
# ==================================================

county_model <- county_full %>%
  mutate(
    # Log-transform size
    log_registered_voters = log(registered_voters),
    
    # Standardize size
    log_registered_voters_z = as.numeric(scale(log_registered_voters))
  ) %>%
  
  select(
    County,
    party_gap,
    log_registered_voters_z,
    ends_with("_share")
  )

glimpse(county_model)

Rows: 100
Columns: 24
$ County                    <chr> "ALAMANCE", "ALEXANDER", "ALLEGHANY", "ANSON…
$ party_gap                 <dbl> 0.08257585, 0.60320563, 0.52339499, 0.025236…
$ log_registered_voters_z   <dbl> 0.97196717, -0.40222296, -1.42366137, -0.796…
$ dem_share                 <dbl> 0.4587121, 0.1983972, 0.2383025, 0.4873817, …
$ rep_share                 <dbl> 0.5412879, 0.8016028, 0.7616975, 0.5126183, …
$ party_dem_share           <dbl> 0.31976521, 0.15370099, 0.20347930, 0.509525…
$ party_rep_share           <dbl> 0.3201326, 0.4895633, 0.4268746, 0.2022693, …
$ party_una_share           <dbl> 0.3517776, 0.3490719, 0.3622076, 0.2821738, …
$ party_other_share         <dbl> 0.008324566, 0.007663868, 0.007438512, 0.006…
$ sex_female_share          <dbl> 0.4995324, 0.4787029, 0.4728254, 0.4476560, …
$ sex_male_share            <dbl> 0.4133059, 0.4477008, 0.4424715, 0.3730666, …
$ sex_unknown_share         <dbl> 0.08716163, 0.07359624, 0.08470306, 0.179277…
$ age_18_25_share           <dbl> 0.14306230, 0.11761534, 0.09334133, 0.114481…
$ age_26_40_share           <dbl> 0.2393083, 0.2049603, 0.1617277, 0.2268737, …
$ age_41_65_share           <dbl> 0.3832974, 0.4125010, 0.3816437, 0.3921170, …
$ age_66_plus_share         <dbl> 0.2343319, 0.2649234, 0.3632873, 0.2665273, …
$ race_white_share          <dbl> 0.6406994, 0.8665178, 0.8748650, 0.4149298, …
$ race_black_share          <dbl> 0.205993354, 0.037433567, 0.009358128, 0.387…
$ race_native_share         <dbl> 0.002279445, 0.001501964, 0.001439712, 0.002…
$ race_unknown_share        <dbl> 0.08848087, 0.07039975, 0.09250150, 0.174380…
$ race_other_share          <dbl> 0.062546967, 0.024146961, 0.021835633, 0.021…
$ ethnic_hispanic_share     <dbl> 0.058597599, 0.020988986, 0.030593881, 0.010…
$ ethnic_non_hispanic_share <dbl> 0.6592272, 0.8053223, 0.6854229, 0.6496865, …
$ ethnic_unknown_share      <dbl> 0.2821752, 0.1736887, 0.2839832, 0.3402210, …

3.1 Do larger counties systematically differ demographically?

Which variables correlate most strongly with county size (log registered voters)?


size_correlations <- county_model %>%
  select(log_registered_voters_z, ends_with("_share")) %>%
  cor(use = "pairwise.complete.obs") %>%
  as.data.frame() %>%
  tibble::rownames_to_column("variable") %>%
  select(variable, log_registered_voters_z) %>%
  arrange(desc(abs(log_registered_voters_z)))

head(size_correlations, 30)

                    variable log_registered_voters_z
1    log_registered_voters_z              1.00000000
2           race_other_share              0.72514873
3            age_26_40_share              0.68187121
4          age_66_plus_share             -0.63468151
5          party_other_share              0.57285408
6      ethnic_hispanic_share              0.56304092
7         race_unknown_share              0.42398245
8            age_18_25_share              0.40395740
9             sex_male_share             -0.37863267
10                 rep_share             -0.32764323
11                 dem_share              0.32764323
12         sex_unknown_share              0.31741393
13           party_una_share              0.29539297
14          race_white_share             -0.10247016
15          sex_female_share             -0.08097208
16      ethnic_unknown_share             -0.07649435
17           party_dem_share             -0.07169094
18 ethnic_non_hispanic_share             -0.05996186
19           age_41_65_share             -0.05889241
20           party_rep_share             -0.05478141
21          race_black_share             -0.04709500
22         race_native_share             -0.03586515

3.2 Correlation Plot


county_model %>%
  select(
    log_registered_voters_z,
    party_gap,
    race_white_share,
    race_black_share,
    ethnic_hispanic_share,
    age_26_40_share,
    age_66_plus_share,
    party_dem_share,
    party_rep_share
  ) %>%
  rename(
    size = log_registered_voters_z,
    gap = party_gap,
    white = race_white_share,
    black = race_black_share,
    hispanic = ethnic_hispanic_share,
    age_26_40 = age_26_40_share,
    age_66p = age_66_plus_share,
    dem_reg = party_dem_share,
    rep_reg = party_rep_share
  ) %>%
  ggpairs(
    upper = list(continuous = wrap("cor", size = 4)),
    diag  = list(continuous = wrap("densityDiag")),
    lower = list(continuous = wrap("points", alpha = 0.5))
  )

3.3 Demographic and Political Covariation Across Counties

The pairwise correlation structure reveals a highly organized demographic landscape across counties. County size (the standardized log of registered voters) is strongly associated with age composition and with Hispanic and other-race shares (r = 0.56 and 0.73), but only weakly with White (r ≈ −0.10) or Black (r ≈ −0.05) shares. Larger counties have higher shares of registered voters aged 26–40 (r = 0.68) and substantially lower shares aged 66 and older (r = −0.63). This pattern is consistent with an urban–rural demographic gradient.

Political alignment closely tracks racial composition. Counties with higher Black shares of registered voters have markedly higher Democratic registration and lower Republican registration, while counties with higher White shares show the opposite pattern. The magnitude of these correlations indicates that racial composition and party registration are tightly coupled across counties.

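Party gap (Republican minus Democratic two-party vote share, so positive values indicate a Republican margin) mirrors these structural relationships. Democratic margins go with younger age structures and Republican margins with older ones, so party gap is negatively associated with the 26–40 share and positively associated with the 66+ share, reinforcing the generational component of political alignment. The high correlation between party gap and Republican registration share is unsurprising, since both measure Republican partisan strength.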
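As a minimal sketch of the sign convention used in county_data, the base-R example below computes two-party shares and the gap for three hypothetical counties (made-up vote totals, not North Carolina results). Positive values mark Republican margins, and because the two shares sum to one, the gap also equals 2 × rep_share − 1.

```r
# Hypothetical county totals (illustrative numbers, not NC results)
votes <- data.frame(
  County    = c("A", "B", "C"),
  dem_votes = c(6000, 3000, 5000),
  rep_votes = c(4000, 7000, 5000)
)

# Two-party shares, then the gap coded as Republican minus Democratic
votes$two_party_total <- votes$dem_votes + votes$rep_votes
votes$dem_share <- votes$dem_votes / votes$two_party_total
votes$rep_share <- votes$rep_votes / votes$two_party_total
votes$party_gap <- votes$rep_share - votes$dem_share

# Positive gap = Republican margin, negative = Democratic margin
votes$winning_party <- ifelse(votes$party_gap > 0, "Republican",
                       ifelse(votes$party_gap < 0, "Democratic", "Tie"))

print(votes[, c("County", "dem_share", "rep_share", "party_gap", "winning_party")])
```

Since party_gap is a linear function of rep_share, the two vote-share measures carry identical information; only registration shares add independent variation.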

Overall, the correlation matrix suggests that a small number of latent dimensions — particularly an urban/diverse/younger axis and a rural/older/White axis — organize much of the variation across counties. These structured relationships motivate the clustering analysis that follows.

4 Clustering

4.1 Variable Selection for Clustering

The variables included in the clustering model were selected deliberately to capture the core structural dimensions of county-level political ecology while avoiding redundancy. The goal was to represent size, race, age structure, and party alignment without overweighting any single dimension.

A correlation analysis revealed several very strong relationships among candidate variables. For example:

White racial share and Republican registration share were highly correlated, indicating that including both would effectively double-count the same underlying racial–partisan structure.
Democratic and Republican registration shares were also strongly (and mechanically) related.
Age cohorts were negatively correlated across life stages, reflecting demographic tradeoffs within counties.

To avoid overloading the clustering algorithm with redundant information, the final variable set was designed to capture distinct structural axes:

Population scale: standardized log of registered voters
Racial composition: white share and Hispanic share
Age structure: ages 26–40 (working-age concentration) and 66+ (elderly concentration)
Party structure: Democratic registration share and Unaffiliated registration share

Republican registration share was excluded because it is largely collinear with white population share and negatively related to Democratic share. Including it would have amplified the racial–partisan axis without adding new structural information.

In short, the selected variables reflect a balance: they capture the major demographic and political cleavages in North Carolina counties while minimizing redundancy and preserving interpretability. The resulting clusters therefore reflect multidimensional structural differentiation rather than the dominance of any single correlated feature.
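The redundancy screen described above can be sketched as a small helper that lists variable pairs whose absolute correlation exceeds a cutoff. The function name flag_redundant_pairs and the 0.8 cutoff are illustrative assumptions, not part of the original analysis, and the data are synthetic.

```r
# Hypothetical helper: list variable pairs with |r| above a cutoff
# (the 0.8 default is an illustrative assumption)
flag_redundant_pairs <- function(df, threshold = 0.8) {
  r <- cor(df, use = "pairwise.complete.obs")
  # Upper triangle only, so each pair is reported once
  idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
  data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[idx],
    row.names = NULL
  )
}

# Synthetic example: x2 is a noisy copy of x1, x3 is independent noise
set.seed(1)
x1 <- rnorm(100)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(100, sd = 0.1),
  x3 = rnorm(100)
)

flag_redundant_pairs(toy)
```

Which member of a flagged pair to drop remains a substantive judgment, as with the decision to keep Democratic rather than Republican registration share.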

4.2 Prepare data for Clustering


# Select structural variables only
cluster_data <- county_model %>%
  select(
    log_registered_voters_z,  # standardized log registered voters (size)
    race_white_share,         # racial composition
    ethnic_hispanic_share,    # ethnic composition
    age_26_40_share,          # working-age population
    age_66_plus_share,        # elderly population
    party_dem_share,          # Democratic registration share
    party_una_share           # Unaffiliated registration share
  )

# Scale variables
cluster_scaled <- scale(cluster_data)

4.3 How many clusters to create

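Look for the elbow: the value of k beyond which additional clusters yield only small reductions in total within-cluster sum of squares (WSS).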


## Optimal Cluster Count  

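# Compute total within-cluster sum of squares (WSS) for k = 1..10
# (seed set so the random k-means starts are reproducible)
set.seed(123)
wss <- sapply(1:10, function(k) {
  kmeans(cluster_scaled, centers = k, nstart = 10)$tot.withinss
})

plot(1:10, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")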

4.4 Get number of counties in each cluster

Check for very small clusters, which would suggest over-partitioning (overfitting).

The output below lists the number of counties assigned to each cluster for k = 3, 4, and 5.


set.seed(123)

k3 <- kmeans(cluster_scaled, centers = 3, nstart = 25)
k4 <- kmeans(cluster_scaled, centers = 4, nstart = 25)
k5 <- kmeans(cluster_scaled, centers = 5, nstart = 25)

k3$size

[1] 47 26 27


k4$size

[1] 22 23 36 19


k5$size

[1] 23 30  9 17 21

4.5 Determining the Number of Clusters

To determine the appropriate number of clusters, I evaluated solutions ranging from three to five groups using the within-cluster sum of squares (WSS) criterion and compared the resulting partitions for balance and interpretability.

The WSS plot shows a clear inflection point around four clusters. Moving from three to four clusters produces a substantial reduction in within-cluster variance, indicating improved structural separation. However, the marginal gain from four to five clusters is noticeably smaller, suggesting diminishing returns beyond four groups.
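The elbow comparison can be made explicit by computing the proportional drop in WSS at each step. The sketch below does this on synthetic data with four well-separated blobs (not the county data); iter.max = 50 is an arbitrary choice to avoid convergence warnings. The last two lines could be applied unchanged to a wss vector computed for k = 1, 2, … to quantify the three-to-four versus four-to-five comparison.

```r
set.seed(42)

# Synthetic data: four well-separated 2-D blobs of 50 points each
blob_centers <- matrix(c(0, 0, 5, 0, 0, 5, 5, 5), ncol = 2, byrow = TRUE)
pts <- do.call(rbind, lapply(seq_len(nrow(blob_centers)), function(i) {
  cbind(rnorm(50, blob_centers[i, 1], 0.5),
        rnorm(50, blob_centers[i, 2], 0.5))
}))

# Total within-cluster sum of squares for k = 1..8
ks  <- 1:8
wss <- sapply(ks, function(k) {
  kmeans(pts, centers = k, nstart = 25, iter.max = 50)$tot.withinss
})

# Proportional reduction in WSS when moving from k - 1 to k clusters
rel_drop <- c(NA, -diff(wss) / head(wss, -1))
names(rel_drop) <- ks
round(rel_drop, 3)
```

With four true groups, the drop at k = 4 is large and the drop at k = 5 is small, which is the numerical signature of an elbow.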

Substantively, the cluster sizes also support this choice:

3 clusters: one very large group (47 counties) and two mid-sized groups — overly coarse and compressing meaningful distinctions.
4 clusters: relatively balanced partitions (22, 23, 36, 19 counties) — differentiated yet stable.
5 clusters: one small cluster of only 9 counties — indicating fragmentation rather than meaningful new structure.

The five-cluster solution primarily subdivided an existing group without introducing a clearly interpretable new regional or demographic type. In contrast, the four-cluster solution produced groups that were:

Structurally distinct in demographic composition
Substantively interpretable
Geographically coherent when mapped

For these reasons, four clusters represent the most parsimonious and theoretically meaningful solution, balancing explanatory power with interpretability while avoiding overfitting.

4.6 Attributes of counties in each cluster


aggregate(cluster_data,
          by = list(cluster = k4$cluster),
          mean)

  cluster log_registered_voters_z race_white_share ethnic_hispanic_share
1       1               1.0715476        0.5633098            0.05652699
2       2              -0.6736215        0.4848214            0.01880951
3       3               0.1802221        0.7912512            0.02732424
4       4              -0.7667763        0.8474880            0.01460665
  age_26_40_share age_66_plus_share party_dem_share party_una_share
1       0.2630117         0.2067806       0.3355211       0.3690962
2       0.2083722         0.2858653       0.4595765       0.3086569
3       0.2160504         0.2681026       0.2157371       0.3711827
4       0.1721312         0.3547327       0.2089799       0.3633829

4.7 🔵 Cluster 1 (n = 22)

Large counties (z = +1.07)
Moderately white (0.56)
Highest Hispanic share (0.057)
Highest working-age share (ages 26–40 = 0.26)
Lowest elderly share (age 66+ = 0.21)
Moderate Dem registration (0.34)
High UNA (0.37)

Interpretation: Urban / Metro Counties

Large population centers, younger, more diverse, substantial unaffiliated bloc.

Likely: Wake, Mecklenburg-type profile.

4.8 🔵 Cluster 2 (n = 23)

Smaller counties (z = −0.67)
Least white (0.48)
Very low Hispanic share
Older population
Highest Dem registration (0.46)
Lowest UNA (0.31)

Interpretation: Black Belt / Historically Democratic Rural Counties

Lower white share + high Dem registration + older structure.

This is a distinct political-culture cluster, clearly separated from the whiter, more Republican rural counties of Cluster 4.

4.9 🔴 Cluster 3 (n = 36)

Slightly above average size (z = 0.18)
Quite white (0.79)
Moderate age
Low Dem registration (0.22)
High UNA (0.37)

Interpretation: Suburban / Exurban Republican-leaning Counties

Whiter, moderately sized, high UNA, low Dem registration.

This looks like outer-ring suburban counties.

4.10 🔴 Cluster 4 (n = 19)

Smallest counties (z = −0.77)
Whitest (0.85)
Oldest (age 66+ = 0.35)
Low Dem registration (0.21)
Moderate UNA

Interpretation: Rural Aging Republican Counties

Small, old, white, structurally Republican.

This is a very distinct archetype.
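k-means numbers its clusters arbitrarily, so a fixed mapping from cluster IDs 1–4 to labels holds only for a particular run and seed. A hedged alternative is to derive labels from the centroid profiles themselves. The helper label_clusters below is hypothetical; its rules (largest, then most Democratic, then oldest) are an assumption distilled from the four profiles above, and the centroid values are the rounded means from the cluster-mean table.

```r
# Hypothetical helper: assign the four descriptive labels from centroid
# profiles rather than from arbitrary numeric k-means IDs.
# The rules are an assumption based on the cluster profiles above.
label_clusters <- function(centroids) {
  labels <- rep(NA_character_, nrow(centroids))

  # Largest counties -> Urban Metro
  urban <- which.max(centroids$log_registered_voters_z)
  labels[urban] <- "Urban Metro"

  # Highest Democratic registration among the rest -> Black Belt Democratic
  rest <- setdiff(seq_len(nrow(centroids)), urban)
  black_belt <- rest[which.max(centroids$party_dem_share[rest])]
  labels[black_belt] <- "Black Belt Democratic"

  # Oldest of the remaining two -> Rural Aging GOP; the other -> Suburban
  rest <- setdiff(rest, black_belt)
  rural <- rest[which.max(centroids$age_66_plus_share[rest])]
  labels[rural] <- "Rural Aging GOP"
  labels[setdiff(rest, rural)] <- "Suburban / Exurban GOP"

  labels
}

# Rounded centroid means from the cluster-mean table (one row per cluster ID)
centroids <- data.frame(
  log_registered_voters_z = c(1.07, -0.67, 0.18, -0.77),
  party_dem_share         = c(0.34, 0.46, 0.22, 0.21),
  age_66_plus_share       = c(0.21, 0.29, 0.27, 0.35)
)

label_clusters(centroids)
```

With the objects in this document, label_clusters(aggregate(cluster_data, by = list(cluster = k4$cluster), mean))[k4$cluster] would give one label per county regardless of how k-means happened to number the clusters.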

4.11 Attach descriptions to County dataset


# --------------------------------------------------
# Attach 4-cluster solution to modeling dataset
# --------------------------------------------------

county_model$cluster4 <- factor(k4$cluster)

# --------------------------------------------------
# Attach cluster labels to full county dataset
# --------------------------------------------------

county_full$cluster4 <- county_model$cluster4

# Replace numeric cluster codes with descriptive labels
county_full$cluster4 <- factor(
  county_full$cluster4,
  levels = c(1, 2, 3, 4),
  labels = c(
    "Urban Metro",
    "Black Belt Democratic",
    "Suburban / Exurban GOP",
    "Rural Aging GOP"
  )
)

# --------------------------------------------------
# Verify cluster sizes
# --------------------------------------------------

table(county_full$cluster4)


           Urban Metro  Black Belt Democratic Suburban / Exurban GOP 
                    22                     23                     36 
       Rural Aging GOP 
                    19


# --------------------------------------------------
# Structural variable means by cluster
# --------------------------------------------------

aggregate(
  cluster_data,
  by = list(cluster = county_full$cluster4),
  mean
) %>%
  glimpse()

Rows: 4
Columns: 8
$ cluster                 <fct> Urban Metro, Black Belt Democratic, Suburban /…
$ log_registered_voters_z <dbl> 1.0715476, -0.6736215, 0.1802221, -0.7667763
$ race_white_share        <dbl> 0.5633098, 0.4848214, 0.7912512, 0.8474880
$ ethnic_hispanic_share   <dbl> 0.05652699, 0.01880951, 0.02732424, 0.01460665
$ age_26_40_share         <dbl> 0.2630117, 0.2083722, 0.2160504, 0.1721312
$ age_66_plus_share       <dbl> 0.2067806, 0.2858653, 0.2681026, 0.3547327
$ party_dem_share         <dbl> 0.3355211, 0.4595765, 0.2157371, 0.2089799
$ party_una_share         <dbl> 0.3690962, 0.3086569, 0.3711827, 0.3633829

4.12 Run PCA on standardized structural variables

We use principal component analysis (PCA) to reduce the seven standardized structural variables to a smaller set of orthogonal components, allowing us to visualize and interpret the primary axes of demographic variation that differentiate counties and clusters.


# Run PCA on standardized structural variables
pca_results <- prcomp(cluster_scaled, center = FALSE, scale. = FALSE)

# Attach first two principal components to county dataset
county_full$PC1 <- pca_results$x[, 1]
county_full$PC2 <- pca_results$x[, 2]

# Compute cluster centroids in PC space
cluster_centroids <- county_full %>%
  group_by(cluster4) %>%
  summarise(
    PC1 = mean(PC1),
    PC2 = mean(PC2)
  )

# Display proportion of variance explained
summary(pca_results)

Importance of components:
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7
Standard deviation     1.8019 1.4733 0.80590 0.64691 0.59962 0.30342 0.25108
Proportion of Variance 0.4638 0.3101 0.09278 0.05978 0.05136 0.01315 0.00901
Cumulative Proportion  0.4638 0.7739 0.86669 0.92648 0.97784 0.99099 1.00000

4.13 Principal Component Analysis (PCA) Summary

The PCA results indicate that the demographic and party structure of North Carolina counties is highly concentrated along a small number of underlying dimensions.

PC1 explains 46.4% of total variance, capturing nearly half of all structural variation across counties.
PC2 explains 31.0%, adding a substantial second axis of differentiation.
PC3 explains 9.3% of the variance.

Together, the first three components account for 86.7% of total variance, indicating that county-level demographic and partisan structure is largely organized along three dominant dimensions.
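Because PCA was run on seven standardized variables, the component variances (eigenvalues, i.e. squared standard deviations) sum to 7, and each proportion of variance is an eigenvalue divided by that total. The short check below reproduces the cumulative proportions from the standard deviations printed by summary(pca_results); small discrepancies come only from rounding in the printed values.

```r
# Component standard deviations, copied from summary(pca_results)
sdev <- c(1.8019, 1.4733, 0.80590, 0.64691, 0.59962, 0.30342, 0.25108)

# Eigenvalues are squared standard deviations; with seven standardized
# inputs their total is (up to rounding) 7, the total variance
eig <- sdev^2
sum(eig)

# Proportion and cumulative proportion of variance explained
prop_var <- eig / sum(eig)
round(cumsum(prop_var), 4)
```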

After the third component, additional principal components contribute relatively little explanatory power (each under 6%), suggesting diminishing returns beyond this point.
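The proportions in the summary follow directly from the component standard deviations: each component's variance is its squared standard deviation, divided by the total across components. The short check below uses only the standard deviations printed in the summary above, so no new data are involved. The total of about 7 is consistent with seven standardized input variables.

```r
# Component standard deviations copied from the summary(pca_results) output
pc_sdev <- c(PC1 = 1.8019, PC2 = 1.4733, PC3 = 0.80590, PC4 = 0.64691,
             PC5 = 0.59962, PC6 = 0.30342, PC7 = 0.25108)

# Each component's variance is its squared standard deviation
pc_var <- pc_sdev^2

# With standardized inputs, total variance equals the number of variables
sum(pc_var)

# Proportion and cumulative proportion of variance explained
prop_var <- pc_var / sum(pc_var)
round(prop_var, 4)
round(cumsum(prop_var), 4)
```

Rounded to four decimals, the cumulative proportions reproduce the 0.7739 and 0.8667 reported for two and three components.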

Substantively, this means that the complex set of racial, age, and party variables used in the clustering model can be effectively summarized in a low-dimensional space. The clustering solution therefore operates within a structure that is not diffuse or noisy, but highly ordered along a few dominant demographic axes.

4.14 Visualize PCA 2D

# Visualize counties in PC space colored by cluster
#library(ggplot2)

ggplot(county_full, aes(x = PC1, y = PC2, color = cluster4)) +
  geom_point(size = 3, alpha = 0.8) +
  geom_point(data = cluster_centroids,
             aes(x = PC1, y = PC2),
             size = 6,
             shape = 4,
             stroke = 2,
             color = "black") +
  labs(
    title = "County Structural Typology (PCA Projection)",
    x = "Principal Component 1",
    y = "Principal Component 2",
    color = "Cluster"
  ) +
  theme_minimal()

4.15 Interpretation of PCA Projection

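The two-dimensional PCA projection reveals clear structural separation among the four county types. Because the PCA is computed on the same standardized variables used for clustering, this separation is best read as a consistency check: it shows that the clustering solution lines up with the main axes of demographic variation rather than cutting across them, which argues against arbitrary partitioning.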

4.15.1 Principal Component 1 (Horizontal Axis)

PC1 appears to capture the dominant structural gradient in the state. Counties on the left side of the plot (Urban Metro) are separated from those on the right (Rural Aging GOP), with Suburban / Exurban GOP and Black Belt Democratic counties positioned between or below these poles.

This axis likely reflects a combined racial–partisan–metropolitan gradient:
- Lower white share and higher Democratic registration toward the left
- Higher white share and stronger Republican structure toward the right

4.15.2 Principal Component 2 (Vertical Axis)

PC2 provides a second dimension of separation, distinguishing Black Belt Democratic counties (lower on PC2) from Urban Metro and Suburban counties (higher on PC2).

This vertical separation likely reflects differences in age structure and racial composition that are not captured solely by the primary partisan gradient.
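The readings of PC1 and PC2 above are inferred from where the clusters fall in the plot. The loadings stored in pca_results$rotation would show directly which variables drive each axis. Because the county loadings are not printed here, the minimal sketch below demonstrates the inspection on R's built-in USArrests data as a stand-in. PCA signs are arbitrary, so "left" and "right" on an axis carry no fixed meaning on their own; the loadings supply that meaning.

```r
# Stand-in PCA on R's built-in USArrests data; for the county analysis the
# same steps would apply to pca_results$rotation
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Loadings: the weight of each original variable on PC1 and PC2
loadings <- pca_demo$rotation[, c("PC1", "PC2")]
round(loadings, 2)

# Variables ordered by the size of their contribution to PC1
names(sort(abs(loadings[, "PC1"]), decreasing = TRUE))

# Each loading vector has unit length, so weights are comparable across PCs
colSums(loadings^2)
```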

4.15.3 Cluster Separation

The four clusters occupy largely distinct regions of the PCA space.
The centroids (black “X” markers) are well separated, indicating distinct cluster centers.
There is limited overlap, suggesting that the groups are structurally cohesive.

Importantly, the projection shows compact groupings rather than dispersion: counties of the same type sit close to their centroid, reinforcing that the k-means solution aligns with the dominant demographic dimensions identified by PCA.
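The compactness described above can also be summarized with the sums of squares that kmeans() returns. The k-means fit object from earlier in the paper is not shown in this section, so the sketch below refits k-means on standardized USArrests data purely for illustration; k = 4, nstart = 25, and the seed are illustrative choices, not the paper's settings.

```r
# Illustrative refit on standardized USArrests (not the county data)
set.seed(2024)
demo_scaled <- scale(USArrests)
km <- kmeans(demo_scaled, centers = 4, nstart = 25)

# Share of total variance lying between clusters; values near 1 indicate
# groups that are tight relative to the overall spread
between_share <- km$betweenss / km$totss
round(between_share, 3)

# Total sum of squares splits into within- and between-cluster parts
all.equal(km$tot.withinss + km$betweenss, km$totss)
```

Applied to the county fit, the same ratio would quantify how much of the structural variance the four-cluster solution captures, complementing the visual impression from the PCA plot.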

4.15.4 Substantive Implication

Because the first two components alone explain over 77% of total variance, this two-dimensional projection captures most of the structural information in the data. The visible separation of clusters in this reduced space provides strong support for the view that North Carolina counties sort into distinct demographic-political ecologies along a small number of underlying axes.

5 Add 3rd Component

# Attach third principal component to county dataset
county_full$PC3 <- pca_results$x[, 3]

#library(dplyr)
library(plotly)

# Compute 3D centroids
cluster_centroids_3d <- county_full %>%
  group_by(cluster4) %>%
  summarise(
    PC1 = mean(PC1),
    PC2 = mean(PC2),
    PC3 = mean(PC3),
    .groups = "drop"
  )

# 3D scatter of counties in PC space, one legend entry per cluster
plot_ly(
  data = county_full,
  x = ~PC1,
  y = ~PC2,
  z = ~PC3,
  color = ~cluster4,        # maps clusters to colors and separate legend entries
  colors = "Set1",
  type = "scatter3d",
  mode = "markers",
  text = ~paste(
    "County:", County,
    "<br>Cluster:", cluster4,
    "<br>Turnout:", round(turnout, 3),
    "<br>Dem Share:", round(dem_share, 3)
  ),
  # Hover shows the county details above plus the PC coordinates
  hovertemplate = paste(
    "%{text}<br>",
    "PC1: %{x:.2f}<br>",
    "PC2: %{y:.2f}<br>",
    "PC3: %{z:.2f}<br>",
    "<extra></extra>"
  ),
  marker = list(size = 5)
) %>%
  add_trace(
    data = cluster_centroids_3d,
    x = ~PC1,
    y = ~PC2,
    z = ~PC3,
    type = "scatter3d",
    mode = "markers",
    name = "Centroid",
    marker = list(
      size = 10,
      color = "black",
      symbol = "x"
    ),
    inherit = FALSE
  ) %>%
  layout(
    title = "3D PCA Projection with Cluster Centroids",
    scene = list(
      xaxis = list(title = "PC1"),
      yaxis = list(title = "PC2"),
      zaxis = list(title = "PC3")
    )
  )

Adding the third principal component reveals an additional axis of structural differentiation that is not fully visible in the two-dimensional projection. While the four clusters remain clearly separated, PC3 introduces a vertical ordering that forms a cone-like gradient from Black Belt Democratic counties through suburban counties to Rural Aging GOP counties. This pattern suggests that the third component captures variation in the intensity or consolidation of demographic structure—particularly aging and racial homogeneity—rather than an entirely new political dimension. The clusters therefore occupy a coherent three-dimensional manifold rather than isolated partitions.

6 Map the clusters

library(sf)
#library(dplyr)
#library(ggplot2)

# Read KML file
nc_kml <- st_read("../Data/north-carolina-counties.kml")

Reading layer `Layer #0' from data source 
  `E:\RStudio\Projects\Election Truth Alliance\Data\north-carolina-counties.kml' 
  using driver `KML'
Simple feature collection with 100 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -84.32187 ymin: 33.84232 xmax: -75.46062 ymax: 36.58812
Geodetic CRS:  WGS 84

# Clean county names to match county_full
nc_kml <- nc_kml %>%
  mutate(
    County = toupper(gsub(" County", "", Name))
  )

# Make sure county_full County names are uppercase too
county_full <- county_full %>%
  mutate(County = toupper(County))

# Join structural + political data to spatial map
nc_map <- nc_kml %>%
  left_join(
    county_full %>%
      select(
        County,
        cluster4,
        registered_voters,
        turnout,
        dem_share,
        rep_share,
        party_gap
      ),
    by = "County"
  )

glimpse(nc_map)

Rows: 100
Columns: 10
$ Name              <chr> "Haywood County", "Lenoir County", "Catawba County",…
$ Description       <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", …
$ County            <chr> "HAYWOOD", "LENOIR", "CATAWBA", "EDGECOMBE", "MONTGO…
$ cluster4          <fct> Suburban / Exurban GOP, Black Belt Democratic, Subur…
$ registered_voters <dbl> 48250, 39568, 117342, 36542, 17769, 10356, 93455, 11…
$ turnout           <dbl> 0.7844767, 0.6950819, 0.7423514, 0.6690384, 0.743204…
$ dem_share         <dbl> 0.3729427, 0.4658157, 0.3084183, 0.6143063, 0.309565…
$ rep_share         <dbl> 0.6270573, 0.5341843, 0.6915817, 0.3856937, 0.690434…
$ party_gap         <dbl> 0.25411462, 0.06836854, 0.38316347, -0.22861266, 0.3…
$ geometry          <MULTIPOLYGON [°]> MULTIPOLYGON (((-83.25611 3..., MULTIPO…
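The glimpse() output shows cluster values for the first rows, but left_join() keeps any unmatched polygon silently, with NA in the joined columns. A minimal base-R check compares the cleaned name sets before joining. The county names below are hypothetical inputs chosen for illustration, and the anchored pattern " County$" is a small variation on the gsub() call above that strips only a trailing suffix.

```r
# Hypothetical inputs: a few KML-style names and data-side names
kml_names  <- c("Haywood County", "Lenoir County", "Catawba County", "Edgecombe County")
data_names <- c("HAYWOOD", "LENOIR", "CATAWBA")

# Strip only a trailing " County" and upper-case, mirroring the cleaning step
clean_names <- toupper(sub(" County$", "", kml_names))

# Map names with no data match (these would get NA clusters after the join)
unmatched_map  <- setdiff(clean_names, data_names)
# Data names with no polygon (these would be absent from the map)
unmatched_data <- setdiff(data_names, clean_names)

unmatched_map
unmatched_data
```

With the real data, both results should be empty; counting missing values with sum(is.na(nc_map$cluster4)) after the join is an equivalent check.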

library(plotly)

map_plot <- ggplot(nc_map) +
  geom_sf(
    aes(
      fill = cluster4,
      text = paste(
        "County:", County,
        "<br>Cluster:", cluster4,
        "<br>Registered Voters:", format(registered_voters, big.mark = ","),
        "<br>Turnout:", round(turnout, 3),
        "<br>Dem Share:", round(dem_share, 3)
      )
    ),
    color = "white",
    linewidth = 0.25   # border width; `size` is deprecated for lines in ggplot2 >= 3.4.0
  ) +
  scale_fill_brewer(palette = "Set2", name = "Cluster") +
  theme_void() +
  labs(title = "North Carolina County Structural Typology")

ggplotly(map_plot, tooltip = "text")

6.1 Geographic Interpretation of the County Structural Typology

North Carolina counties sort into four structurally distinct demographic ecologies that align closely with partisan outcomes. The clusters also show strong regional patterning, indicating that the typology reflects coherent spatial political ecologies rather than scattered statistical groupings.

Urban Metro counties are concentrated in and around the state’s major metropolitan corridors — including the Charlotte region, the Research Triangle, and parts of the Piedmont. These counties form contiguous clusters anchored by major population centers.

Black Belt Democratic counties appear in a distinct eastern belt stretching across northeastern and east-central North Carolina. This spatial concentration aligns with the historic Black Belt region, reinforcing the demographic foundations of the cluster.

Suburban / Exurban GOP counties form rings around major metros and extend across much of the western and central Piedmont. Their placement suggests transitional counties that are structurally distinct from both urban cores and deeply rural regions.

Rural Aging GOP counties dominate much of the far western mountains and portions of the southeastern coastal plain. These counties are geographically contiguous and concentrated in areas characterized by lower population density and older age structures.

Importantly, the clusters are not randomly distributed. They form large, contiguous regions that mirror historical settlement patterns, racial geography, and metropolitan development. This spatial coherence strengthens the interpretation that the clustering solution captures durable structural differences embedded in North Carolina’s political and demographic landscape.

7 Conclusion

This analysis demonstrates that North Carolina’s 2024 county-level electorate is not randomly distributed across demographic space, nor reducible to a simple urban–rural divide. Instead, counties cluster into four distinct structural types shaped by durable configurations of race, age, and partisan registration.

The clustering solution is not arbitrary. Principal component analysis shows that more than three-quarters of the total variance in county structure is concentrated along two dominant axes, with a third component adding meaningful but secondary differentiation. When projected into two- and three-dimensional PCA space, the four clusters separate clearly and occupy coherent regions along these underlying demographic gradients. The addition of the third component reveals ordered variation rather than noise, suggesting that counties lie along a structured manifold of demographic consolidation and aging rather than in isolated partitions.

Geographically, the typology aligns with recognizable regional formations. Urban Metro counties cluster around major population centers. Black Belt Democratic counties form a contiguous eastern belt rooted in the state’s historical racial geography. Suburban and exurban Republican counties surround metropolitan cores and stretch across the Piedmont. Rural Aging GOP counties dominate much of the western mountains and portions of the southeastern coastal plain. The spatial coherence of these clusters reinforces that the statistical solution reflects embedded political ecologies rather than abstract numerical groupings.

Taken together, the results suggest that North Carolina’s electorate is organized along a small number of durable demographic axes that structure both geography and partisan alignment. County political behavior emerges from these underlying configurations of race, age, and registration composition. The typology therefore provides a framework for understanding electoral outcomes not as isolated events, but as expressions of deeper structural organization within the state.

In short, North Carolina’s political landscape is not merely polarized: it is patterned. The county-level electorate sorts into distinct structural environments that shape and constrain political competition. Recognizing these environments offers a clearer lens through which to interpret both contemporary electoral dynamics and future political change.

8 Appendix

8.1 Cluster Structural Profiling

To interpret the clusters substantively, I compute two descriptive summaries:

The statewide mean of each demographic and political share variable.
The mean value of those same variables within each cluster.

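These summaries allow direct comparison between the “average North Carolina county” and the average county within each structural type. Because every county counts equally, the “statewide mean” here is an unweighted average of county shares, not a population- or vote-weighted statewide figure; for example, the mean county Democratic vote share (0.396) is well below Harris’s statewide share of the vote. The code in this subsection does not compute deviations from the statewide mean (that step follows in Sections 8.4 through 8.6), but the side-by-side summaries already show which demographic and political characteristics distinguish each cluster.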

In effect, this step translates the clustering solution from abstract numerical partitions into interpretable demographic profiles.
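Because summarise(mean) gives each county equal weight, the statewide means in this appendix describe the average county rather than the statewide electorate. The gap can be large when county sizes vary, as this hypothetical two-county example shows. Weighting by registered voters is an assumption made for illustration; weighting by ballots cast would track statewide vote totals more closely.

```r
# Hypothetical two-county example (illustrative numbers, not real data)
dem_share         <- c(0.60, 0.30)
registered_voters <- c(100000, 10000)

# Unweighted mean: each county counts equally, as in summarise(mean)
unweighted <- mean(dem_share)

# Weighted mean: each county counts in proportion to its registered voters
weighted <- weighted.mean(dem_share, w = registered_voters)

c(unweighted = unweighted, weighted = weighted)
```

Here the unweighted mean is 0.45 while the weighted mean is about 0.573, because the larger county dominates the weighted figure.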

8.1.1 Compute statewide means

# Compute statewide means of structural share variables
statewide_means <- county_full %>%
  summarise(across(ends_with("_share"), mean))

glimpse(statewide_means)

Rows: 1
Columns: 21
$ dem_share                 <dbl> 0.3956487
$ rep_share                 <dbl> 0.6043513
$ party_dem_share           <dbl> 0.2968888
$ party_rep_share           <dbl> 0.3394193
$ party_una_share           <dbl> 0.3548608
$ party_other_share         <dbl> 0.00883111
$ sex_female_share          <dbl> 0.4938327
$ sex_male_share            <dbl> 0.4278247
$ sex_unknown_share         <dbl> 0.07834265
$ age_18_25_share           <dbl> 0.1202665
$ age_26_40_share           <dbl> 0.2162712
$ age_41_65_share           <dbl> 0.3883054
$ age_66_plus_share         <dbl> 0.2751569
$ race_white_share          <dbl> 0.6813102
$ race_black_share          <dbl> 0.1846962
$ race_native_share         <dbl> 0.01056293
$ race_unknown_share        <dbl> 0.08482801
$ race_other_share          <dbl> 0.03860167
$ ethnic_hispanic_share     <dbl> 0.02937411
$ ethnic_non_hispanic_share <dbl> 0.6705873
$ ethnic_unknown_share      <dbl> 0.3000386

8.2 Compute Cluster-level Means

# ------------------------------------------------------------
# Recompute cluster structural profiles using labeled clusters
# ------------------------------------------------------------

cluster_profiles <- county_full %>%
  group_by(cluster4) %>%
  summarise(across(ends_with("_share"), mean), .groups = "drop")

glimpse(cluster_profiles)

Rows: 4
Columns: 22
$ cluster4                  <fct> Urban Metro, Black Belt Democratic, Suburban…
$ dem_share                 <dbl> 0.4886561, 0.4807382, 0.3301735, 0.3090114
$ rep_share                 <dbl> 0.5113439, 0.5192618, 0.6698265, 0.6909886
$ party_dem_share           <dbl> 0.3355211, 0.4595765, 0.2157371, 0.2089799
$ party_rep_share           <dbl> 0.2842578, 0.2251722, 0.4035613, 0.4200575
$ party_una_share           <dbl> 0.3690962, 0.3086569, 0.3711827, 0.3633829
$ party_other_share         <dbl> 0.011124926, 0.006594414, 0.009518767, 0.007…
$ sex_female_share          <dbl> 0.4920056, 0.4989578, 0.4902643, 0.4965053
$ sex_male_share            <dbl> 0.4140023, 0.4154278, 0.4356809, 0.4439508
$ sex_unknown_share         <dbl> 0.09399209, 0.08561436, 0.07405483, 0.059543…
$ age_18_25_share           <dbl> 0.14488812, 0.11503484, 0.12333604, 0.092274…
$ age_26_40_share           <dbl> 0.2630117, 0.2083722, 0.2160504, 0.1721312
$ age_41_65_share           <dbl> 0.3853196, 0.3907276, 0.3925110, 0.3808617
$ age_66_plus_share         <dbl> 0.2067806, 0.2858653, 0.2681026, 0.3547327
$ race_white_share          <dbl> 0.5633098, 0.4848214, 0.7912512, 0.8474880
$ race_black_share          <dbl> 0.2363473, 0.3911575, 0.0856112, 0.0627029
$ race_native_share         <dbl> 0.021010834, 0.010596498, 0.007614956, 0.004…
$ race_unknown_share        <dbl> 0.10571344, 0.08713824, 0.07988105, 0.067221…
$ race_other_share          <dbl> 0.07361857, 0.02628634, 0.03564156, 0.018572…
$ ethnic_hispanic_share     <dbl> 0.05652699, 0.01880951, 0.02732424, 0.014606…
$ ethnic_non_hispanic_share <dbl> 0.6582714, 0.6855325, 0.6718930, 0.6642823
$ ethnic_unknown_share      <dbl> 0.2852016, 0.2956580, 0.3007828, 0.3211110

8.3 Attach Political Outcomes by Cluster

# Summarise electoral outcomes by structural cluster
cluster_politics <- county_full %>%
  group_by(cluster4) %>%
  summarise(
    avg_turnout   = mean(turnout, na.rm = TRUE),
    avg_dem_share = mean(dem_share, na.rm = TRUE),
    avg_rep_share = mean(rep_share, na.rm = TRUE),
    avg_party_gap = mean(party_gap, na.rm = TRUE),
    n = n()
  )

cluster_politics

# A tibble: 4 × 6
  cluster4           avg_turnout avg_dem_share avg_rep_share avg_party_gap     n
  <fct>                    <dbl>         <dbl>         <dbl>         <dbl> <int>
1 Urban Metro              0.703         0.489         0.511        0.0227    22
2 Black Belt Democr…       0.705         0.481         0.519        0.0385    23
3 Suburban / Exurba…       0.751         0.330         0.670        0.340     36
4 Rural Aging GOP          0.752         0.309         0.691        0.382     19
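The table does not define party_gap, but the county rows printed by glimpse(nc_map) are consistent with party_gap = rep_share − dem_share, a two-party Republican margin. The check below uses only those printed values; the definition itself is an inference, not something stated in the code shown. Because a mean of differences equals the difference of means, the cluster averages above satisfy the same relation up to rounding, so the Suburban / Exurban value of 0.340 corresponds to a margin of about 34 points.

```r
# County values copied from the glimpse(nc_map) output above
counties  <- c("HAYWOOD", "LENOIR", "CATAWBA", "EDGECOMBE")
dem_share <- c(0.3729427, 0.4658157, 0.3084183, 0.6143063)
rep_share <- c(0.6270573, 0.5341843, 0.6915817, 0.3856937)
party_gap <- c(0.25411462, 0.06836854, 0.38316347, -0.22861266)

# The two shares sum to 1, i.e. they are two-party shares
dem_share + rep_share

# Assumed definition: Republican minus Democratic two-party share
implied_gap <- rep_share - dem_share
names(implied_gap) <- counties
round(implied_gap, 4)

# Largest discrepancy from the printed party_gap (should be ~0 up to rounding)
max(abs(implied_gap - party_gap))
```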

8.3.1 Pre‑Election Polling Context and Post‑Election Outcomes

In the final weeks before the November 5, 2024 election, North Carolina was widely characterized as a true toss‑up. The RealClearPolitics polling average from mid‑October through November 4 showed Trump leading Harris by roughly one percentage point (48.1% to 47.1%), well within the margin of error. The final multi‑candidate average similarly showed an effectively tied race. An October 29 Elon University poll found Harris and Trump statistically even among North Carolina voters, reinforcing the perception of a highly competitive environment. Quinnipiac University’s mid‑October swing‑state release likewise described North Carolina as a tight contest, with margins fluctuating between a narrow Trump edge and a slight Harris advantage depending on wave and likely‑voter screen.

The certified results, however, showed a somewhat clearer Republican margin than many late polls suggested. Trump carried North Carolina with 50.86% of the vote to Harris’s 47.65%, a margin of approximately 3.2 percentage points. While not a landslide, this margin was roughly three times the one-point polling average cited above and somewhat stronger than some late “dead heat” surveys implied.

This gap between polling expectations and realized margins is particularly relevant for the Suburban / Exurban GOP cluster in the county-level analysis. Pre-election narratives often emphasized suburban volatility and potential Democratic gains in fast-growing metro counties. Yet in the cluster results, the Suburban / Exurban group delivered an average Republican margin of roughly 34 points (an unweighted mean across its 36 counties), far from the competitive picture often painted of such areas. Turnout followed the same split: approximately 75% in both Republican-leaning clusters versus about 70% in the two more Democratic clusters. This further reinforces the impression that the election environment, while close statewide, manifested as geographically efficient Republican consolidation rather than evenly distributed persuasion.

Taken together, the pre‑election polling environment suggested a narrowly divided electorate. The actual county‑level results instead reflect a pattern of intense geographic sorting: competitive statewide margins produced by highly asymmetric and internally homogeneous regional blocs. This tension between competitive polling and structurally lopsided cluster outcomes warrants further scrutiny, particularly in understanding suburban realignment and turnout composition.

8.4 Step 1: Compute Cluster–State Differences

#library(dplyr)
#library(tidyr)
set.seed(2024)
# ------------------------------------------------------------
# Convert to long and compute differences
# ------------------------------------------------------------

statewide_long <- statewide_means %>%
  pivot_longer(everything(),
               names_to = "variable",
               values_to = "state_mean")

cluster_long <- cluster_profiles %>%
  pivot_longer(-cluster4,
               names_to = "variable",
               values_to = "cluster_mean")

cluster_comparison <- cluster_long %>%
  left_join(statewide_long, by = "variable") %>%
  mutate(diff_from_state = cluster_mean - state_mean)

# ------------------------------------------------------------
# Pivot back to wide format
# ------------------------------------------------------------

cluster_diff_table <- cluster_comparison %>%
  select(cluster4, variable, diff_from_state) %>%
  pivot_wider(names_from = variable,
              values_from = diff_from_state)

# ------------------------------------------------------------
# Join with electoral outcomes
# ------------------------------------------------------------

cluster_final <- cluster_diff_table %>%
  left_join(cluster_politics, by = "cluster4")

# Preview a random sample of 10 cluster-by-variable rows (seeded above)
cluster_comparison %>%
  slice_sample(n = 10)

# A tibble: 10 × 5
   cluster4               variable       cluster_mean state_mean diff_from_state
   <fct>                  <chr>                 <dbl>      <dbl>           <dbl>
 1 Rural Aging GOP        party_dem_sha…       0.209      0.297       -0.0879   
 2 Black Belt Democratic  race_native_s…       0.0106     0.0106       0.0000336
 3 Suburban / Exurban GOP party_dem_sha…       0.216      0.297       -0.0812   
 4 Suburban / Exurban GOP race_other_sh…       0.0356     0.0386      -0.00296  
 5 Urban Metro            race_unknown_…       0.106      0.0848       0.0209   
 6 Black Belt Democratic  age_26_40_sha…       0.208      0.216       -0.00790  
 7 Black Belt Democratic  sex_male_share       0.415      0.428       -0.0124   
 8 Urban Metro            age_26_40_sha…       0.263      0.216        0.0467   
 9 Urban Metro            race_native_s…       0.0210     0.0106       0.0104   
10 Rural Aging GOP        race_black_sh…       0.0627     0.185       -0.122
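The long-format pipeline above joins each cluster mean to its statewide counterpart and subtracts. The same diff_from_state can be obtained in matrix form with base R's sweep(). The sketch below uses two variables whose cluster and statewide means are copied from the glimpse() outputs above; restricting it to two variables is only for brevity.

```r
# Statewide (unweighted county) means copied from glimpse(statewide_means)
state_means <- c(race_white_share = 0.6813102, race_black_share = 0.1846962)

# Cluster means copied from glimpse(cluster_profiles)
cluster_means <- rbind(
  "Urban Metro"            = c(0.5633098, 0.2363473),
  "Black Belt Democratic"  = c(0.4848214, 0.3911575),
  "Suburban / Exurban GOP" = c(0.7912512, 0.0856112),
  "Rural Aging GOP"        = c(0.8474880, 0.0627029)
)
colnames(cluster_means) <- names(state_means)

# Subtract the statewide mean from each column, row by row; this is the
# same quantity as diff_from_state in the long-format pipeline
diff_from_state <- sweep(cluster_means, 2, state_means, FUN = "-")
round(diff_from_state, 4)
```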

8.5 Rank Deviations Across All Clusters

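# Rank all cluster-by-variable deviations by absolute size (across clusters)
cluster_top_features <- cluster_comparison %>%
  mutate(abs_diff = abs(diff_from_state)) %>%
  arrange(desc(abs_diff))

# Show the 20 largest deviations
print(cluster_top_features, n = 20)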

# A tibble: 84 × 6
   cluster4            variable cluster_mean state_mean diff_from_state abs_diff
   <fct>               <chr>           <dbl>      <dbl>           <dbl>    <dbl>
 1 Black Belt Democra… race_bl…       0.391       0.185          0.206    0.206 
 2 Black Belt Democra… race_wh…       0.485       0.681         -0.196    0.196 
 3 Rural Aging GOP     race_wh…       0.847       0.681          0.166    0.166 
 4 Black Belt Democra… party_d…       0.460       0.297          0.163    0.163 
 5 Rural Aging GOP     race_bl…       0.0627      0.185         -0.122    0.122 
 6 Urban Metro         race_wh…       0.563       0.681         -0.118    0.118 
 7 Black Belt Democra… party_r…       0.225       0.339         -0.114    0.114 
 8 Suburban / Exurban… race_wh…       0.791       0.681          0.110    0.110 
 9 Suburban / Exurban… race_bl…       0.0856      0.185         -0.0991   0.0991
10 Urban Metro         dem_sha…       0.489       0.396          0.0930   0.0930
11 Urban Metro         rep_sha…       0.511       0.604         -0.0930   0.0930
12 Rural Aging GOP     party_d…       0.209       0.297         -0.0879   0.0879
13 Rural Aging GOP     dem_sha…       0.309       0.396         -0.0866   0.0866
14 Rural Aging GOP     rep_sha…       0.691       0.604          0.0866   0.0866
15 Black Belt Democra… rep_sha…       0.519       0.604         -0.0851   0.0851
16 Black Belt Democra… dem_sha…       0.481       0.396          0.0851   0.0851
17 Suburban / Exurban… party_d…       0.216       0.297         -0.0812   0.0812
18 Rural Aging GOP     party_r…       0.420       0.339          0.0806   0.0806
19 Rural Aging GOP     age_66_…       0.355       0.275          0.0796   0.0796
20 Urban Metro         age_66_…       0.207       0.275         -0.0684   0.0684
# ℹ 64 more rows

8.6 Standardize Differences

# ------------------------------------------------------------
# Compute county-level SDs for each structural variable
# ------------------------------------------------------------

county_sds <- county_full %>%
  summarise(across(ends_with("_share"), sd)) %>%
  pivot_longer(everything(),
               names_to = "variable",
               values_to = "county_sd")

# ------------------------------------------------------------
# Merge SDs into cluster comparison and compute standardized difference
# ------------------------------------------------------------

cluster_comparison <- cluster_comparison %>%
  left_join(county_sds, by = "variable") %>%
  mutate(std_diff = diff_from_state / county_sd)
# ------------------------------------------------------------
# Identify top 3 distinguishing features per cluster, ranked by the absolute
# raw difference from the statewide mean (std_diff is shown for scale only)
# ------------------------------------------------------------

cluster_comparison %>%
  group_by(cluster4) %>%
  slice_max(abs(diff_from_state), n = 3) %>%
  arrange(cluster4, desc(abs(diff_from_state))) %>%
  print(n=12)

# A tibble: 12 × 7
# Groups:   cluster4 [4]
   cluster4  variable cluster_mean state_mean diff_from_state county_sd std_diff
   <fct>     <chr>           <dbl>      <dbl>           <dbl>     <dbl>    <dbl>
 1 Urban Me… race_wh…       0.563       0.681         -0.118      0.173   -0.682
 2 Urban Me… dem_sha…       0.489       0.396          0.0930     0.134    0.696
 3 Urban Me… rep_sha…       0.511       0.604         -0.0930     0.134   -0.696
 4 Black Be… race_bl…       0.391       0.185          0.206      0.154    1.34 
 5 Black Be… race_wh…       0.485       0.681         -0.196      0.173   -1.14 
 6 Black Be… party_d…       0.460       0.297          0.163      0.127    1.28 
 7 Suburban… race_wh…       0.791       0.681          0.110      0.173    0.636
 8 Suburban… race_bl…       0.0856      0.185         -0.0991     0.154   -0.645
 9 Suburban… party_d…       0.216       0.297         -0.0812     0.127   -0.641
10 Rural Ag… race_wh…       0.847       0.681          0.166      0.173    0.961
11 Rural Ag… race_bl…       0.0627      0.185         -0.122      0.154   -0.794
12 Rural Ag… party_d…       0.209       0.297         -0.0879     0.127   -0.694
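The slice_max() call above ranks features by abs(diff_from_state), the raw difference, so std_diff is displayed but does not drive the ranking. The two orderings can disagree. The sketch below uses the Black Belt Democratic rows printed above (rounded as shown, so the standardized values differ slightly in the second decimal) and shows that a standardized ranking would move Democratic registration ahead of white share.

```r
# Black Belt Democratic rows from the table above (values as printed, rounded)
diff_from_state <- c(race_black_share = 0.206,
                     race_white_share = -0.196,
                     party_dem_share  = 0.163)
county_sd <- c(race_black_share = 0.154,
               race_white_share = 0.173,
               party_dem_share  = 0.127)

# Standardized difference: deviation expressed in county-level SD units
std_diff <- diff_from_state / county_sd
round(std_diff, 2)

# Ranking by raw magnitude vs. by standardized magnitude
names(sort(abs(diff_from_state), decreasing = TRUE))
names(sort(abs(std_diff), decreasing = TRUE))
```

If standardized magnitude is the intended criterion, replacing abs(diff_from_state) with abs(std_diff) in the slice_max() and arrange() calls would implement it; the top-3 lists could then differ from the table above.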

8.7 Cluster Structural Profiles (Relative to Statewide Mean)

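The three features listed for each cluster are those with the largest absolute raw deviations from the statewide (unweighted county) mean; the standardized values express each deviation in county-level standard deviations to convey its magnitude. On this basis, the clusters exhibit clear and substantively meaningful departures from the statewide average.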

8.7.1 Urban Metro

White share: −0.68 SD (below the state mean)
Democratic vote share: +0.70 SD (above the state mean)
Republican vote share: −0.70 SD (below the state mean)

Urban Metro counties are electorally distinct but not racially extreme. They are moderately more diverse than the average county and markedly more Democratic in vote behavior, although their mean Democratic vote share (0.489) still falls just short of one-half.

8.7.2 Black Belt Democratic

Black share: +1.34 SD (above the state mean)
White share: −1.14 SD (below the state mean)
Democratic registration share: +1.28 SD (above the state mean)

This is the most structurally distinct cluster in the state. Counties in this group are more than a full standard deviation above average in Black share and Democratic registration share, indicating deep demographic and partisan consolidation.

8.7.3 Suburban / Exurban GOP

White share: +0.64 SD (above the state mean)
Black share: −0.65 SD (below the state mean)
Democratic registration share: −0.64 SD (below the state mean)

These counties are racially more homogeneous than the state average and structurally Republican, but not extreme outliers. They appear demographically differentiated yet transitional relative to the more polarized clusters.

8.7.4 Rural Aging GOP

White share: +0.96 SD (above the state mean)
Black share: −0.79 SD (below the state mean)
Democratic registration share: −0.69 SD (below the state mean)

Rural Aging GOP counties form the most racially homogeneous cluster in the state, approaching a full standard deviation above average in white share. Their partisan structure is consistently Republican, with distinct demographic separation from the Black Belt cluster.