1 Introduction

Voter participation is a key indicator of civic engagement and a robust democracy, yet participation rates in the United States fluctuate substantially by region, socioeconomic level, and demographic factors. California, one of the most populous and diverse states in the country, presents a compelling case study for examining low voter participation. According to the U.S. Census Bureau (2016), California had the third-lowest voter participation in the country in the 2016 Presidential election at 48.2%, ranking just above Hawaii (43.3%) and Texas (47.7%).

California’s large, diverse population and vast geographic landscape produce distinctive contrasts in voting patterns. The state has large urban areas such as Los Angeles, San Diego, and San Francisco, where high population density and diverse populations influence voting behavior. Conversely, expansive rural farming communities throughout the Central Valley and other agricultural areas present contrasting political, economic, and social influences on voter engagement. This project examines how voter turnout varies across California counties and explores the connections between engagement and key demographic and socioeconomic contributors, such as income, educational attainment, age distribution, and urbanization. By analyzing urban and rural counties, this study seeks to identify patterns of voter turnout and uncover insights that may help advocacy groups that support communities underrepresented in civic engagement.

1.0.1 Why the 2016 Election

The 2016 Presidential election data was chosen to provide a more comprehensive dataset for examining voter engagement trends compared to 2020. With the expansion of mail-in voting during the 2020 election due to the COVID-19 pandemic, the 2020 election had atypically high participation rates (U.S. Census Bureau, 2021). Voter engagement in the US has historically followed predictable trends that are swayed by demographics, socioeconomic factors, and polling accessibility. The significantly elevated participation rates in the 2020 election are therefore treated as an outlier attributable to the expanded mail-in and early voting options. To ensure a standardized comparison of voter behavior that aligns with historical trends, and therefore a more accurate analysis of socioeconomic and demographic impacts on participation, this report focuses solely on 2016 data.

1.0.2 Research Questions:

  1. In the 2016 Presidential election, how does voter turnout vary across counties in California?
  2. What are the spatial patterns of voter participation, and how do they relate to socioeconomic and demographic factors such as income, education, age, and population density?
  3. Is there a statistically significant difference in voter engagement between urban and rural counties?
  4. What geographic areas show consistently low or high voter participation, and what characteristics do these areas have in common?

1.0.2.1 Study Scope

This study will focus on voter turnout at the county level in California for the 2016 Presidential Election. County level was chosen as the unit of study because:

  • Voter turnout data at the county level is more standardized than precinct level data.
  • County level statistics allow for meaningful analysis across different urban and rural locations.
  • Policy decisions and election administration efforts are implemented at the county level.

2 Methodology

The purpose of this study is to analyze voter turnout rates in California at the county level for the 2016 Presidential Election. The research combines voter participation data, socioeconomic indicators (such as household income, education level attainment, and median age) along with population density to identify spatial patterns and relationships between these factors. I utilized a variety of statistical tests, spatial analysis, and regression modeling to determine meaningful predictors of voter engagement.

2.1 Key Influencing Factors

The correlation between voter participation and socioeconomic variables has been repeatedly documented in political science and social research (Wolfinger & Rosenstone, 1980; Leighley & Nagler, 2013). This analysis will focus on three key factors: income levels, educational attainment, and urbanization. The variables were chosen for their strong theoretical and empirical association with voter engagement.

Income

Earning a higher income is typically associated with higher voter participation. Citizens with more disposable income tend to have:

  • Higher political engagement and awareness due to their increased access to political news, campaign outreach, and policy debates
  • Fewer barriers to voting, thanks to flexible work schedules, paid time off, and transportation access
  • More political engagement since their economic stability allows them to focus on long-term political issues rather than immediate financial concerns

For this research, median household income from the American Community Survey (ACS) was used as a key variable to assess its relationship to voter turnout.

Education

Educational attainment strongly correlates with political participation. Research shows that individuals with higher educational levels tend to have higher levels of engagement due to:

  • Increased awareness of the electoral process and policy issues
  • A better understanding of policy issues and candidate platforms
  • Greater trust in government institutions, reducing voter disinterest

To examine this correlation, data on percentage of the population with a bachelor’s degree or higher from the ACS was used in the study.

Urbanization (Rural vs. Urban Differences)

Geographic location has an impact on voter participation. Urban areas tend to experience higher voter participation due to:

  • Greater access to polling locations and early voting centers
  • Stronger political mobilization efforts
  • Higher concentrations of young, educated populations

Conversely, rural areas tend to have:

  • Few polling locations that require longer traveling distances
  • Lower population density making political outreach challenging
  • Higher concentrations of people with lower education and income levels, which can lower participation rates

For the purpose of this study, population density was used as a proxy for urbanization, with counties classified as “Urban” (density ≥ 500 people/km²) or “Rural” (density < 500 people/km²).

Through integrating these variables into the spatial analysis and regression models, the aim of the study is to identify patterns in voter participation and uncover potential disparities that can impact civic engagement.

3 Data Collection

3.0.0.1 Description of Data:

This study will utilize three primary datasets:

3.0.0.2 1. Voter Turnout Data

  • Source: California Secretary of State
  • Data Year: 2016 Presidential Election
  • Geographic Level: County
  • Format: csv
  • Description: This dataset contains county-level information on the number of registered voters, total ballots cast, and overall voter participation rates.
  • Limitations: Some county data will have incomplete reporting or missing precinct level data.

3.0.0.3 2. Demographic and Socioeconomic Data

  • Source: U.S. Census Bureau (ACS)
  • Data Years: 2012-2016 ACS 5-Year Estimates
  • Variables: Median income, education level, age distribution, and population density
  • Geographic Level: County
  • Format: Retrieved using ‘tidycensus’ package in R
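  • ACS Variable Codes:
    • Median Household Income (B19013_001)
    • Educational Attainment (Bachelor’s Degree) (B15003_022; note that this cell is a count of residents aged 25 and over whose highest degree is a bachelor’s, not a percentage with a bachelor’s degree or higher)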
    • Median Age (B01002_001)
    • Total Population for Density Calculation (B01003_001)
  • Limitations: ACS estimates are subject to sampling variability, and some rural counties have fewer survey responses, leading to higher margins of error.

3.0.0.4 3. Urban-Rural Classification Data

  • Derived from Population Density Calculations
  • Threshold:
    • Urban Counties: Population Density ≥ 500 people/km²
    • Rural Counties: Population Density < 500 people/km²
  • Purpose: This classification helps identify differences in voter engagement patterns between densely populated urban areas and more sparsely populated rural counties
  • Geographic Level: County
  • Limitations: The classification is county-based and does not capture intra-county nuances (e.g. more populous areas in rural counties). In addition, demographic factors within urban and rural areas may influence voter turnout in ways not directly linked to the classification.

3.0.0.5 Compile Needed Data

# Load needed libraries
library(tidyverse)
library(dplyr)
library(ggplot2)
library(tidycensus)
library(sf)
library(readr)
library(spdep)
library(leaflet)
library(htmlwidgets)
library(spatialreg)
library(ppcor)
library(RColorBrewer)

# Retrieve ACS Data for Demographics
income_data <- get_acs(
  geography = "county",
  state = "CA",
  variables = "B19013_001",
  year = 2016,
  survey = "acs5",
  key = NULL
)
education_data <- get_acs(
  geography = "county",
  state = "CA",
  variables = "B15003_022",
  year = 2016,
  survey = "acs5",
  key = NULL
)
age_data <- get_acs(
  geography = "county",
  state = "CA",
  variables = "B01002_001",
  year = 2016,
  survey = "acs5",
  key = NULL
)
options(tigris_use_cache = TRUE)
population_data <- get_acs(
  geography = "county",
  state = "CA",
  variables = "B01003_001",
  year = 2016,
  survey = "acs5",
  geometry = TRUE,
  key = NULL
)

# Clean county names, e.g. "Kern County, California" -> "Kern"
clean_county <- function(df) {
  df %>%
    rename(County = NAME) %>%
    mutate(County = gsub(" County, California", "", County))
}
population_data <- clean_county(population_data)
age_data <- clean_county(age_data)
education_data <- clean_county(education_data)
income_data <- clean_county(income_data)

# st_area() returns square metres; divide by 1e6 for km², then compute density
population_data <- population_data %>%
  mutate(
    land_area_km2 = as.numeric(st_area(geometry)) / 1e6,
    population_density = estimate / land_area_km2
  )

# Load voter turnout data
file_path <- "C:\\Users\\ranse\\OneDrive\\Desktop\\GEOG_588\\Term_Project\\Term_Project_Data\\CA_VoterTurnout_Cleaned.csv"
voter_data <- read_csv(file_path)
voter_data <- voter_data %>%
  mutate(County = gsub(" County", "", County))

3.0.0.6 Merge Datasets and Clean Data for Processing

# Save County and geometry from the original sf object
geometry_data <- population_data %>%
  dplyr::select(County, geometry)

# Remove geometry before Joining Data Frames
population_data <- st_drop_geometry(population_data)

# Perform the merge of Data Frames 
population_data <- population_data %>%
  left_join(voter_data, by = "County") %>%
  left_join(income_data, by = "County") %>%
  left_join(education_data, by = "County") %>%
  left_join(age_data, by = "County")

# Rejoin saved geometry
population_data <- left_join(population_data, geometry_data, by = "County")

# Reconvert to sf object
population_data <- st_as_sf(population_data, sf_column_name = "geometry")

# Ensure correct Projection Coordinates
population_data <- st_transform(population_data, crs = 4326)

#Clean population_data after merge
population_data <- population_data %>%
  rename(Participation_Perct = `Participation %`)
population_data <- population_data %>%
  rename(Median_Income = estimate.y)
population_data <- population_data %>%
  dplyr::select(
    -variable.x, -moe.x, -GEOID.y,
    -variable.y, -moe.y, -GEOID.x.x,
    -variable.x.x, -moe.x.x, -GEOID.y.y,
    -variable.y.y, -moe.y.y
  )
population_data <- population_data %>%
  rename(Bachelor_Degree = estimate.x.x)
population_data <- population_data %>%
  rename(Median_Age = estimate.y.y)
population_data <- population_data %>%
  rename(Population_total = estimate.x)

3.0.0.7 Urban and Rural Analysis

# Convert Participation_Perct to numeric values
is.numeric(population_data$Participation_Perct)
population_data$Participation_Perct <- as.numeric(gsub("%", "", population_data$Participation_Perct))
str(population_data$Participation_Perct)

#Urban to Rural Analysis ----
# Define urban vs. rural counties based on population density
population_data <- population_data %>%
  mutate(urban_rural = case_when(
    population_density >= 500 ~ "Urban",
    population_density < 500 ~ "Rural",
    TRUE ~ NA_character_
  ))
population_data %>%
  group_by(urban_rural) %>%
  summarise(avg_turnout = mean(Participation_Perct, na.rm = TRUE))

4 Data Preprocessing & Integration

Data was retrieved in R using the ‘tidycensus’ package for ACS data and ‘readr’ for voter participation data. Once all the datasets were collected, several pre-processing steps were performed, and the datasets were then joined on county name to create a single dataset for analysis.

4.0.1 Data Limitations

Each available dataset has some limitations that can impact reliability and accuracy of the analysis. This research relies on voter turnout data from the California Secretary of State and demographic and socioeconomic data from the U.S. Census Bureau’s American Community Survey (ACS). While these data sources provide comprehensive information, certain limitations should be recognized.

4.0.2 Missing or Incomplete Data

One of the challenges for this analysis is missing or incomplete data in both voter turnout data and data obtained from the ACS. Common issues include:

  • Counties may have missing turnout records, especially for off-cycle election years or precinct-level data
  • ACS data is based on survey sampling, so counties with smaller populations may have a higher margin of error.
  • Some of the demographic data fields (e.g. educational attainment, income levels) might have suppressed data due to low response rates.

4.0.3 How Missing or Incomplete Data was Handled

  • Missing numeric values in the ACS datasets were replaced with median imputation or left as ‘NA’ if they could bias the numbers
  • For voter participation data, counties with missing participation rates were excluded from correlation and regression analyses to prevent distortions.
  • Data gaps were visually checked using summary statistics (summary()) and missing value checks (is.na()).
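
The handling rules above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used in the study: the `counties` table, its values, and the helper `impute_median()` are hypothetical.

```r
# Hypothetical county table; values are illustrative, not from the ACS.
counties <- data.frame(
  County = c("A", "B", "C", "D", "E"),
  Median_Income = c(52000, NA, 61000, 48000, 75000),
  Participation_Perct = c(41.2, 38.5, NA, 35.0, 52.3)
)

# Count missing values per column before changing anything.
missing_counts <- colSums(is.na(counties))

# Hypothetical helper: replace NA with the median of the observed values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
counties$Median_Income <- impute_median(counties$Median_Income)

# Keep missing turnout as NA and drop those rows only for modelling.
model_data <- counties[!is.na(counties$Participation_Perct), ]
```

Imputing only the predictors while excluding rows with missing turnout keeps the outcome variable free of artificial values.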

4.0.4 Sampling Errors in ACS Data

The American Community Survey (ACS) is a sample based survey and its estimates are subject to sampling variability. Some key issues that have been found include:

  • Smaller counties have higher sampling errors, therefore their datasets for income, educational levels, and population characteristics could be less reliable.
  • ACS margins of error (MOEs) vary from county to county, so county-to-county comparisons must account for uncertainty in the estimates.
  • Some underrepresented groups, such as transient or low-income populations, may not be fully accounted for in ACS surveys, leading to potential biases in some datasets.

4.0.5 How Sampling Errors were Handled

  • The research utilizes 5-year ACS Estimates (2012-2016) rather than single year estimates to increase reliability of data
  • In order to prevent over-reliance on uncertain data, large discrepancies were verified using confidence intervals and cross-referencing historical trends.
  • Future work could incorporate alternative sources like state specific datasets to validate findings.
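
As an illustration of the confidence-interval check mentioned above, the sketch below converts an ACS margin of error into a standard error, a 90% interval, and a coefficient of variation. ACS margins of error are published at the 90% confidence level, which is where the 1.645 divisor comes from. The `acs` table, its values, and the 12% reliability cutoff are assumptions made for this example, not values from the study.

```r
# Hypothetical ACS-style estimates with 90% margins of error.
acs <- data.frame(
  County = c("Large county", "Small county"),
  estimate = c(60000, 45000),
  moe = c(1200, 10000)
)

z90 <- 1.645                               # z-score for the 90% level used by the ACS
acs$se <- acs$moe / z90                    # standard error of each estimate
acs$ci_lower <- acs$estimate - acs$moe     # 90% confidence interval bounds
acs$ci_upper <- acs$estimate + acs$moe
acs$cv_pct <- 100 * acs$se / acs$estimate  # coefficient of variation (%)

# Flag estimates above a hypothetical reliability cutoff of 12%.
acs$unreliable <- acs$cv_pct > 12
```

Counties flagged this way could be interpreted with more caution or checked against other sources.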

4.0.6 Data Cleaning and Integration

Before starting analysis, data cleaning was required to standardize and merge datasets for consistency and accuracy.

  • Standardizing County Names: Since county names from different sources varied slightly (e.g. Los Angeles County, California vs. Los Angeles), all county names were standardized by removing “County, California” from the datasets.
# Standardize county names
#population_data <- population_data %>%
  #mutate(County = gsub(" County, California", "", County))

  • Handling Non-Numeric Values: Voter turnout percentages, total votes, and registered voter counts were originally stored as text with commas or percentage symbols. These were converted to numeric format to allow for statistical analysis.

  • Addressing Missing Values:
    • Counties with missing socioeconomic data (e.g. median income, educational levels) had their missing values replaced with the median of that variable.
    • Any missing voter participation rates were kept as NA to avoid artificial inflation of turnout rates.

  • Computing Population Density: The ACS provides total population rather than a pre-calculated density, so population density was calculated from total population and land area estimates.

#population_data <- population_data %>%
  #mutate(land_area_km2 = as.numeric(st_area(geometry)) / 1e6) %>%
  #mutate(population_density = Population_total / land_area_km2)
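
The conversion of text columns to numeric values described in this section can be sketched as follows. The helper `to_numeric()`, the `raw` table, and the `Registered Voters` column name are hypothetical; `Participation %` is the column name that appears in the merge code.

```r
# Hypothetical helper: strip thousands separators and percent signs,
# then convert the remaining text to numeric.
to_numeric <- function(x) {
  as.numeric(gsub("[,%]", "", trimws(x)))
}

# Illustrative rows shaped like a turnout export; values are made up.
raw <- data.frame(
  County = c("A", "B"),
  `Registered Voters` = c("1,234,567", "45,678"),
  `Participation %` = c("48.25%", "61.03%"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

raw$`Registered Voters` <- to_numeric(raw$`Registered Voters`)
raw$`Participation %` <- to_numeric(raw$`Participation %`)
```

Any entry that still fails to parse becomes NA with a warning, which makes unexpected formats easy to spot.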

4.0.7 Merging Data

The cleaned datasets were merged using county names as the common factor. This resulted in a single dataset containing:

  1. Voter turnout data: registered Voters, total ballots cast, and turnout percentage for each county
  2. Demographic and Socioeconomic Indicators: median income, educational attainment, median age
  3. Population density calculations: total population and land area
  4. Urban vs. Rural Classifications: defined by population density thresholds

The cleaned and merged dataset serves as the foundation for the spatial and statistical analysis in the following sections.
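
One way to keep a county-name merge readable is to give each ACS estimate a descriptive name before joining, so that no `.x`/`.y` suffixes are generated, and then to list any counties that failed to match. The sketch below shows this under stated assumptions: the toy tables stand in for the real inputs, all values are made up, and `income_tidy`, `age_tidy`, and `unmatched` are hypothetical names.

```r
library(dplyr)

# Toy stand-ins for the ACS pulls and the turnout file; values are illustrative.
income_data <- tibble(County = c("A", "B"), estimate = c(62000, 49000), moe = c(9000, 800))
age_data    <- tibble(County = c("A", "B"), estimate = c(46.5, 31.2), moe = c(3.1, 0.2))
voter_data  <- tibble(County = c("A", "B", "C"), Participation_Perct = c(60.1, 35.4, 44.0))

# Name each estimate before joining so no .x/.y suffixes are created.
income_tidy <- income_data %>% select(County, Median_Income = estimate)
age_tidy    <- age_data %>% select(County, Median_Age = estimate)

merged <- voter_data %>%
  left_join(income_tidy, by = "County") %>%
  left_join(age_tidy, by = "County")

# Counties in the turnout file with no ACS match (for example, a name mismatch).
unmatched <- anti_join(voter_data, income_tidy, by = "County")
```

An empty `unmatched` table is a quick confirmation that the county names were standardized consistently across sources.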

5 Statistical and Spatial Analysis

5.0.1 Initial Findings of Overall Voter Turnout Patterns

5.0.1.1 Distribution of Voter Turnout

The histogram above shows the distribution of voter turnout rates across all counties in California.

The data shows:

  • The majority of counties fall within 30% to 45% voter turnout
  • Some counties exhibit high or extremely low participation rates, possibly due to unique local factors.

Interpretation: The distribution of voter participation suggests that while most counties have moderate turnout rates, local factors impact community engagement levels. Particular counties may have high political engagement trends, while other counties may face impediments limiting voter participation.

5.0.1.2 Urban vs. Rural Voter Turnout

The boxplot above compares urban and rural voter turnout and reveals some noteworthy differences:

  • Participation rates in rural counties were, on average, slightly higher than in urban counties in the 2016 Presidential election. However, rural counties showed more variability in engagement.
  • Some rural counties had extremely low turnout, but other rural counties had exceptionally high engagement. This raises questions about the factors that drive turnout in high-performing rural locations.
  • Urban counties showed more uniform participation rates, indicating less variability in participation trends. However, their turnout rates were generally lower than those of some rural counties.

Interpretation: Voter turnout patterns across urban and rural communities highlight differences in engagement trends. Rural communities in California have a slightly higher median turnout but also show greater variability, with some counties showing exceptionally high engagement while others reported significantly lower participation levels. In comparison, urban counties display a more consistent engagement rate, but their rates generally fall below those of the highest-performing rural counties. These results suggest that local political culture, economic conditions, and accessibility affect voter participation differently across geographic settings. Although urban areas have historically offered greater access to polling locations and outreach programs, long wait times or lower engagement among transient populations can create barriers to engagement. In rural counties, tight-knit communities and localized political influence may increase engagement, while logistical challenges and low political interest may contribute to lower participation rates.
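
The urban and rural comparison can also be tested formally. The report does not state which test was applied, so the sketch below assumes Welch's two-sample t-test with a Wilcoxon rank-sum test as a rank-based alternative. The `turnout` values are made up for illustration; only the column names follow the analysis code.

```r
# Hypothetical turnout percentages; not the study's data.
turnout <- data.frame(
  urban_rural = c(rep("Urban", 5), rep("Rural", 7)),
  Participation_Perct = c(36.1, 38.4, 37.2, 39.0, 35.8,
                          30.5, 52.3, 44.1, 28.9, 47.6, 41.0, 55.2)
)

# Welch's t-test does not assume equal variances across the two groups.
welch <- t.test(Participation_Perct ~ urban_rural, data = turnout)

# A rank-based alternative that is less sensitive to outlying counties.
wilcox <- wilcox.test(Participation_Perct ~ urban_rural, data = turnout)

welch$p.value
wilcox$p.value
```

Welch's version is a reasonable default here because the rural group is described as far more variable than the urban group.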

5.0.1.3 Voter Turnout by County

The choropleth map above illustrates voter turnout rates across California counties in the 2016 Presidential Election, uncovering distinct regional patterns in electoral participation.

  • Counties in dark purple represent areas with low voter turnout. These areas are primarily concentrated in the Central Valley, Southern California, and some inland rural regions.
  • Counties in green and yellow exhibit higher participation rates, with clusters in northern counties and some coastal regions.
  • The variability in turnout rates across counties suggests that local social, economic, and political conditions influence voter engagement.

Interpretation: The distribution of voter turnout shows disparities in electoral participation throughout California. Low turnout in the Central Valley and Southern California may indicate systemic barriers, such as reduced polling accessibility, lower socioeconomic conditions, or lower levels of political mobilization. In contrast, counties with higher turnout rates, especially in the north and coastal areas, may have greater engagement due to better access to polling locations and historically stronger civic participation. These findings support the influence of geographic and demographic nuances in voter turnout analysis, since local factors such as educational attainment, economic stability, and election policies can affect participation levels. Deeper analysis of the specific socioeconomic traits of low-turnout counties could give greater insight into targeted voter engagement strategies for underrepresented communities.

5.0.1.4 Voter Turnout vs. Median Age

The scatterplot above explores the relationship between median age and voter turnout rates across California counties in the 2016 Presidential Election.

  • The data points indicate that participation rates increase with median age, as shown by the upward-sloping trend line.
  • Counties with younger populations (median age around 30-35 years old) tend to have lower voter turnout rates, often falling below 40%.
  • In comparison, counties with older populations (median age 45+ years old) show higher turnout rates, with some exceeding 50% participation.

Interpretation: The correlation between age and voter engagement reinforces the pattern that older individuals are more likely to participate in the election process. Younger populations may be affected by low political engagement and by time constraints due to school or work. Older citizens, by contrast, tend to show consistent voting habits, higher levels of political knowledge, and greater investment in policy outcomes that affect retirement, healthcare, and taxes. This analysis indicates that a targeted outreach approach focused on younger demographics could increase engagement, especially in counties with lower median ages.
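
The relationship shown in a scatterplot with a trend line can be quantified with a correlation test and a simple linear fit. The sketch below uses made-up values, and `age_cor`, `trend`, and `slope` are hypothetical names; the same pattern would apply to the income and education scatterplots.

```r
# Hypothetical county values; not the study's data.
counties <- data.frame(
  Median_Age = c(30.5, 33.0, 36.2, 40.1, 44.8, 49.3),
  Participation_Perct = c(33.0, 36.5, 38.0, 43.5, 47.0, 53.5)
)

# Pearson correlation with a significance test.
age_cor <- cor.test(counties$Median_Age, counties$Participation_Perct)

# The trend line in a scatterplot is a simple linear regression.
trend <- lm(Participation_Perct ~ Median_Age, data = counties)
slope <- coef(trend)[["Median_Age"]]  # change in turnout (points) per year of median age

# Base-graphics scatterplot with the fitted trend line.
plot(counties$Median_Age, counties$Participation_Perct,
     xlab = "Median age (years)", ylab = "Voter turnout (%)")
abline(trend)
```

Because each point is a county, the slope describes how county-level turnout varies with county median age rather than how any individual voter behaves.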

5.0.2 Voter Turnout vs. Median Income

The scatterplot above examines the relationship between median household income and voter turnout rates across California counties in the 2016 Presidential Election.

  • The data points indicate a positive correlation between income and voter participation, as shown by the upward-sloping trendline.
  • Counties with lower median incomes (below $50,000) show a wider variability in turnout, with some counties falling below 30% participation.
  • Counties with higher median incomes ($75,000 and above) generally show higher and more stable turnout rates, with several exceeding 50%.

Interpretation: The scatterplot shows a positive relationship between income and voter turnout, which suggests that residents of higher-income counties are more likely to participate in the election process. Citizens in wealthier areas may face fewer impediments to voting, such as inflexible work schedules, lack of transportation, or limited polling locations. Lower-income communities, by contrast, may face more systemic issues that can reduce participation, such as economic instability, time constraints due to work schedules or multiple jobs, and generally lower levels of political engagement. These findings stress the significance of addressing socioeconomic disparities in voter engagement, since lower-income populations can benefit from increased polling access, early voting options, and focused outreach efforts to decrease barriers to participation.

5.0.3 Voter Turnout vs. Education Attainment

The scatterplot above examines the relationship between educational attainment (percentage of the population with a bachelor’s degree or higher) and voter turnout rates across California counties in the 2016 Presidential Election.

  • The data points indicate a positive correlation between education levels and voter participation, as evidenced by the upward-sloping trendline.
  • Counties with lower levels of educational attainment (below 10%) tend to have lower voter turnout rates, often clustering below 40% participation.
  • In contrast, counties where a higher percentage of residents hold a bachelor’s degree (above 20%) generally show higher voter turnout rates, with some exceeding 50% participation.

Interpretation: The correlation between educational attainment and voter turnout implies that turnout rates rise with levels of education. Citizens who hold a college degree may be more informed about political issues, have better access to election resources, and feel a stronger sense of civic duty. In contrast, lower levels of educational attainment are associated with reduced political interest, limited access to reliable voting information, and potential impediments to engagement. This analysis reinforces the importance of voter education programs, particularly in locations with lower education levels. It may be possible to bridge gaps in engagement by expanding access to civic education programs, voter mobilization efforts, and voting resources.

5.0.4 Spatial Analysis of Voter Turnout

The map above visualizes the results of a Local Moran’s I analysis, identifying spatial clusters of voter turnout across California counties in the 2016 Presidential Election.

  • Counties in dark purple indicate extremely low clusters, where voter turnout is lower than anticipated compared to surrounding counties.
  • The presence of spatial clustering suggests regional patterns in voter engagement, with some counties forming distinct high or low turnout regions.

Interpretation: To account for spatial dependence in voting patterns, Local Moran’s I (Anselin, 1995) was used to identify spatial clusters, followed by spatial regression models to address autocorrelation in predictor variables and residuals (Anselin, 1988; Bivand et al., 2013). The spatial clustering of voter turnout rates reveals that electoral participation in California is not randomly distributed but is instead shaped by regional social, economic, and political influences. High-turnout clusters in northern and coastal California could be influenced by stronger socioeconomic conditions, increased voter outreach efforts, and generally higher political engagement. The low-turnout clusters centered in the Central Valley and inland communities, by contrast, may be affected by lower education levels, economic struggles, and reduced access to voting resources. These findings showcase the need for geographically focused voter outreach programs, particularly in low-turnout clusters where turnout levels fall behind neighboring locations. Analysis of these spatial patterns could help policymakers and advocates reduce voting disparities across California.
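
To make the Local Moran's I statistic concrete, the sketch below computes it by hand in base R for five hypothetical counties arranged in a row. It is an illustration of the formula, not the code behind the cluster map: the turnout values and adjacency structure are made up, and significance testing (which `spdep::localmoran()` provides) is omitted.

```r
# Five hypothetical counties in a row; each borders only its immediate neighbours.
turnout <- c(30, 32, 45, 55, 58)
n <- length(turnout)

# Binary adjacency matrix for the chain 1-2-3-4-5.
adj <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}

# Row-standardize so each county's neighbour weights sum to 1.
W <- adj / rowSums(adj)

z <- turnout - mean(turnout)   # deviation of each county from the mean
m2 <- sum(z^2) / n             # variance with divisor n
lag_z <- as.vector(W %*% z)    # average deviation among each county's neighbours

local_I <- z * lag_z / m2      # Local Moran's I for each county

# With row-standardized weights, global Moran's I is the mean of the local values.
global_I <- mean(local_I)

# Quadrant labels: High-High and Low-Low mark clusters of similar values.
quadrant <- ifelse(z > 0 & lag_z > 0, "High-High",
            ifelse(z < 0 & lag_z < 0, "Low-Low",
            ifelse(z > 0, "High-Low", "Low-High")))
```

In this toy example the two low-turnout counties at one end are labelled Low-Low and the two high-turnout counties at the other end are labelled High-High, which is the kind of pattern a cluster map displays.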

5.0.5 Interactive Map: Voter Turnout in California

library(sf)
library(leaflet)
library(dplyr)
# Create palette
pal <- colorNumeric(
  palette = "viridis",
  domain = (population_data$Participation_Perct)
)
# Create leaflet map
leaflet(population_data) %>%
  addTiles() %>%
  addPolygons(
    fillColor = ~pal(Participation_Perct),
    color = "white",
    weight = 1,
    fillOpacity = 0.7,
    highlightOptions = highlightOptions(  # full argument name rather than relying on partial matching
      weight = 3,
      color = "#666",
      bringToFront = TRUE
    ),
    popup = ~paste0(
      "<div style='width:200px;'>",
      "<strong>",County, "</strong><br>",
      "Turnout: ", Participation_Perct, "%<br>",
      "Population Density: ", round(population_density, 1), "<br>",
      "Median Income: $", round(Median_Income, 0), "<br>",
      "Bachelor's Degree Holders: ", round(Bachelor_Degree, 1), "<br>",
      "Median Age: ", round(Median_Age, 1), "<br>",
      "</div>"
  ),
    labelOptions = labelOptions(
      style = list("font-weight" = "bold", "color" = "black"),
      direction = "auto"
    )
  ) %>%
  addLegend(pal = pal, values = population_data$Participation_Perct, title = "Voter Turnout (%)", opacity = 1)

The interactive leaflet map above visualizes voter turnout across California counties in the 2016 Presidential Election using a choropleth color scale.

  • Darker purple counties represent lower levels of voter participation, while lighter green to yellow counties indicate higher turnout percentages.

  • The color gradient corresponds to the percentage of eligible voters who participated in the election, with the legend displaying a range from under 25% to over 55%.

  • Interactive popups provide county-specific details, including turnout rate, population density, median income, educational attainment, and median age.

  • For example, Santa Clara County is shown with a turnout of 36.29%, a population density of 560 people/km², and a median income of $101,173, illustrating the kind of demographic and socioeconomic context that accompanies voter participation levels.

This visualization highlights geographic disparities in civic engagement and allows users to explore how structural and regional factors may influence turnout across the state.

Interpretation: The leaflet map shows regional disparities in voter participation throughout California, reinforcing and consolidating observations documented in previous analyses. Counties with low engagement often have lower median incomes, lower educational attainment, and younger populations, which suggests impediments to participation. Counties with higher turnout show more college graduates, higher median incomes, and older populations, characteristics that are traditionally linked to high political engagement. These findings reinforce the importance of structured and targeted voter outreach programs, especially in low-turnout communities. Through an understanding of these patterns, policymakers can implement local interventions to increase voter accessibility and participation.

5.1 Regression Analysis: Predicting Voter Turnout

To explore the relationship between voter turnout and key socioeconomic factors, an Ordinary Least Squares (OLS) regression model was first constructed using Participation_Perct as the dependent variable, and population_density and Median_Income as predictors. This baseline model provided insight into the general strength and direction of the association between voter engagement and these variables.

However, voter behavior tends to exhibit spatial dependence, where turnout in one county may be influenced by neighboring counties. To test for spatial effects and improve model accuracy, two spatial regression models were constructed: a Spatial Lag Model (SAR) and a Spatial Error Model (SEM).

To support these models, a spatial weights matrix was created using poly2nb() to define county neighbors and nb2listw() to generate row-standardized weights. This structure captures the spatial relationships between California counties.

The Spatial Lag Model (SAR) incorporates the influence of neighboring counties’ turnout into each county’s prediction. This model is suitable when spillover effects are expected — for example, when political mobilization in one region affects turnout in adjacent areas.

The Spatial Error Model (SEM), by contrast, accounts for spatial autocorrelation in the error terms, assuming that unobserved factors influencing turnout are spatially clustered but not explicitly modeled.
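Using the same kind of standard notation (again an assumption, not the report's own symbols), the spatial error specification keeps the usual regression equation and moves the spatial dependence into the disturbance term. Here W is the row-standardized weights matrix and lambda is the coefficient reported as "Lambda" in the model summary:

```latex
\begin{aligned}
y &= X\beta + u \\
u &= \lambda W u + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)
\end{aligned}
```

Because the lagged term is the error u rather than the outcome y, the coefficients in beta keep their usual interpretation as direct associations with turnout.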

Each model was evaluated using the Akaike Information Criterion (AIC), which rewards goodness of fit while penalizing additional parameters; a lower AIC indicates a better trade-off between the two. Comparing AIC values across the OLS, SAR, and SEM models showed whether incorporating spatial dependence improved model fit.
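For reference, the usual definition of AIC is given below, with k the number of estimated parameters and L-hat the maximized likelihood. The worked lines plug in the log likelihoods and parameter counts reported in the SAR and SEM summaries in this section, as a quick check on the reported values:

```latex
\begin{aligned}
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\mathrm{AIC}_{\mathrm{SAR}} &= 2(5) - 2(-195.5523) \approx 401.10 \\
\mathrm{AIC}_{\mathrm{SEM}} &= 2(5) - 2(-194.7851) \approx 399.57
\end{aligned}
```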

5.2 Ordinary Least Squares (OLS) Model

ols_model <- lm(Participation_Perct ~ population_density + `Median_Income`, data = population_data)
summary(ols_model)
## 
## Call:
## lm(formula = Participation_Perct ~ population_density + Median_Income, 
##     data = population_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.1073  -5.6995  -0.9064   5.1660  21.7053 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         2.895e+01  4.166e+00   6.950 4.55e-09 ***
## population_density -1.129e-03  1.279e-03  -0.882    0.381    
## Median_Income       1.683e-04  7.142e-05   2.357    0.022 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.014 on 55 degrees of freedom
## Multiple R-squared:  0.09178,    Adjusted R-squared:  0.05875 
## F-statistic: 2.779 on 2 and 55 DF,  p-value: 0.07084

# Create spatial neighbors
nb <- poly2nb(population_data)  
lw <- nb2listw(nb, style="W")

# Spatial Lag Model
sar_model <- lagsarlm(Participation_Perct ~ population_density + `Median_Income`, 
                      data = population_data, 
                      listw = lw)
summary(sar_model)
## 
## Call:lagsarlm(formula = Participation_Perct ~ population_density + 
##     Median_Income, data = population_data, listw = lw)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -10.93584  -4.67022  -0.93859   4.00406  17.94192 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                       Estimate  Std. Error z value Pr(>|z|)
## (Intercept)         8.1042e+00  5.4298e+00  1.4925 0.135555
## population_density -1.3779e-03  1.0825e-03 -1.2729 0.203052
## Median_Income       1.6405e-04  6.1784e-05  2.6552 0.007926
## 
## Rho: 0.55894, LR test value: 11.835, p-value: 0.00058131
## Asymptotic standard error: 0.12733
##     z-value: 4.3898, p-value: 1.1347e-05
## Wald statistic: 19.27, p-value: 1.1347e-05
## 
## Log likelihood: -195.5523 for lag model
## ML residual variance (sigma squared): 45.753, (sigma: 6.7641)
## Number of observations: 58 
## Number of parameters estimated: 5 
## AIC: 401.1, (AIC for lm: 410.94)
## LM test for residual autocorrelation
## test value: 1.5816, p-value: 0.20853

# Spatial Error Model
sem_model <- errorsarlm(Participation_Perct ~ population_density + `Median_Income`, 
                        data = population_data, 
                        listw = lw)
summary(sem_model)
## 
## Call:errorsarlm(formula = Participation_Perct ~ population_density + 
##     Median_Income, data = population_data, listw = lw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7081  -4.5238  -0.8234   4.1101  18.2139 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                       Estimate  Std. Error z value  Pr(>|z|)
## (Intercept)         2.6303e+01  4.8753e+00  5.3951 6.847e-08
## population_density -7.6895e-04  9.7590e-04 -0.7879  0.430735
## Median_Income       2.2148e-04  7.4584e-05  2.9695  0.002983
## 
## Lambda: 0.57779, LR test value: 13.369, p-value: 0.00025577
## Asymptotic standard error: 0.12772
##     z-value: 4.5239, p-value: 6.0708e-06
## Wald statistic: 20.466, p-value: 6.0708e-06
## 
## Log likelihood: -194.7851 for error model
## ML residual variance (sigma squared): 44.262, (sigma: 6.653)
## Number of observations: 58 
## Number of parameters estimated: 5 
## AIC: 399.57, (AIC for lm: 410.94)

5.2.1 Model Performance Comparison

The comparison of AIC values shows that the Spatial Error Model (SEM) had the lowest AIC (399.57), followed by the Spatial Lag Model (401.10) and the OLS model (410.94). Both spatial models improve on OLS by roughly 10 AIC points, which indicates that accounting for spatial dependence gives a better fit than the non-spatial baseline. The SEM's advantage over the SAR is small (about 1.5 points), so the evidence favoring it over the lag specification is modest. The SEM result suggests that unobserved, spatially correlated factors, such as localized political culture or regional election infrastructure, may contribute to differences in voter participation across California counties.

# Compare Model Performance
AIC(ols_model, sar_model, sem_model)
##           df      AIC
## ols_model  4 410.9396
## sar_model  5 401.1047
## sem_model  5 399.5702

5.3 References

Anselin, L. (1988). Spatial econometrics: Methods and models. Kluwer Academic Publishers. https://doi.org/10.1007/978-94-015-7799-1

Anselin, L. (1995). Local indicators of spatial association—LISA. Geographical Analysis, 27(2), 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x

Bivand, R. S., Pebesma, E., & Gómez-Rubio, V. (2013). Applied spatial data analysis with R (2nd ed.). Springer. https://doi.org/10.1007/978-1-4614-7618-4

California Secretary of State. (2016). Statewide Voter Participation Statistics by County. Retrieved from https://www.sos.ca.gov/elections

Leighley, J. E., & Nagler, J. (2013). Who Votes Now? Demographics, Issues, Inequality, and Turnout in the United States. Princeton University Press.

U.S. Census Bureau. (2016). Voting and Registration in the Election of November 2016. Retrieved from https://www.census.gov/data/tables/time-series/demo/voting-and-registration/p20-580.html

U.S. Census Bureau. (2021). Voting and Registration in the Election of November 2020. Retrieved from https://www.census.gov/data/tables/time-series/demo/voting-and-registration/p20-585.html

U.S. Census Bureau. (2012–2016). American Community Survey 5-Year Estimates. Retrieved via R tidycensus package.

Wolfinger, R. E., & Rosenstone, S. J. (1980). Who Votes? Yale University Press.