Exploratory Spatial Data Analysis of Presidential Election Results in Ohio

Abstract

This project presents an exploratory spatial analysis of presidential election results in Ohio to assess the relationship between Democratic vote share and rurality. By combining the Index of Relative Rurality (IRR), county-level presidential election returns, and U.S. Census TIGER county boundaries, the study seeks to overcome the limitations of discrete urban/rural classifications. The analysis covers the 2012, 2016, and 2020 elections, using filtering, vote share calculations, joins on county FIPS codes, and measures of spatial autocorrelation in R. Results suggest that Democratic vote share declines as rurality increases. The research offers insight into how rurality relates to political party performance.

Introduction

The way the U.S. Census Bureau classifies rural areas is limiting and sometimes misleading. A discrete urban/rural classification based on a simple population threshold leaves little room for deeper analysis of whether, and how much, rurality shapes a social phenomenon across space. For election results, it is widely known that Republicans now tend to perform better in rural areas. What discrete definitions of rural cannot measure is the extent to which U.S. political party performance is shaped by how rural (or not) a location is. This leaves a large blind spot in analysis, especially for areas that fall between the classic definitions of urban and rural.

In 2015, Ayoung Kim and Brigitte Waldorf examined different urban/rural classification systems and created a continuous measure of rurality across space, the Index of Relative Rurality (IRR) (Waldorf & Kim, 2015). Pairing the IRR with presidential election results makes it possible to analyze party performance across space in ways that discrete classifications do not allow. To home in on party performance across rurality, three elections are paired with the IRR, and only Ohio counties are examined. Ohio was chosen for its perceived mix of urban and rural spaces and its history as a bellwether state that has switched parties across multiple elections. Using exploratory spatial data analysis, I seek to answer the following question: how does the IRR help explain the relationship between rurality and Democratic Party vote share in Ohio in the 2012, 2016, and 2020 presidential elections?

#loading all the libraries
library(tidyverse)    # attaches ggplot2, dplyr, tidyr, readr, and purrr
library(here)
library(skimr)
library(janitor)
library(scales)
library(plotly)
library(mapview)
library(sf)
library(RColorBrewer)
library(units)
library(cowplot)
library(spdep)
library(tmap)
library(readxl)
library(rgeoda)
library(tigris)
library(leaflet)
library(biscale)
library(leafpop)

Data

The data include Ohio county boundaries as multipolygons, retrieved with the tigris R package; the Index of Relative Rurality; and county-level presidential election returns for 2000-2020.

The IRR was created by researchers Ayoung Kim and Brigitte Waldorf (Waldorf & Kim, 2015) to provide a continuous measure of how rural a spatial unit is. Instead of assigning discrete urban or rural labels as the census does, the index combines several variables tied to common conceptions of rurality and computes a county-level value that expresses how rural (or, inversely, urban) a county is relative to other counties. These variables are population size, population density, urban area as a percentage of total land area, and network distance. The variables are rescaled so they can be combined, and their unweighted average gives each county a score between 0 and 1. Values closer to 1 are more rural and values closer to 0 are more urban. Because most of the variables come from the census, the index is calculated for the census years 2000, 2010, and 2020. It is joined here with county-level presidential election returns, which report vote totals for each candidate on the ballot in each county for the 2000, 2004, 2008, 2012, 2016, and 2020 presidential elections.
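
The rescale-and-average idea can be sketched in base R. This is only an illustration, not the published IRR procedure: the min-max rescaling, the reversal of the three urban-leaning variables, the helper names `rescale01` and `toy_irr`, and the three made-up counties are all my assumptions, and the published index may transform the inputs differently.

```r
# Min-max rescale a numeric vector to the [0, 1] interval.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical helper: unweighted mean of four rescaled components.
# Size, density, and urban share are reversed (1 - value) so that a
# larger score always means "more rural"; distance is used as-is.
toy_irr <- function(pop, density, pct_urban, distance) {
  components <- cbind(
    1 - rescale01(pop),
    1 - rescale01(density),
    1 - rescale01(pct_urban),
    rescale01(distance)
  )
  rowMeans(components)
}

# Made-up values for three counties, for illustration only.
toy <- data.frame(
  county    = c("A", "B", "C"),
  pop       = c(1200000, 150000, 15000),
  density   = c(2500, 300, 40),
  pct_urban = c(60, 10, 1),
  distance  = c(0, 35, 90)
)
toy$irr <- with(toy, toy_irr(pop, density, pct_urban, distance))
toy[, c("county", "irr")]
```

By construction the most urban toy county scores 0 and the most rural scores 1, which shows why the index is relative: a county's score depends on the other counties it is rescaled against.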

For this study, measures of spatial autocorrelation are not applied to IRR scores. The network distance variable already accounts for the rurality of neighboring areas, so a degree of spatial similarity is built into the score, and testing the score for spatial autocorrelation would largely measure that construction. Instead, IRR scores are used as exploratory factors alongside exploratory spatial data analysis of vote share to assess whether greater rurality is associated with lower Democratic vote share.

Methodology

The methodology involves loading the data, cleaning it so it is easier to read and manage, filtering it by election, location, and party, joining the datasets on the county GEOID or FIPS code, and then computing measures of spatial autocorrelation.
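
A minimal sketch of the key-matching step, assuming the returns store FIPS codes as numbers while the boundaries store GEOID as five-character strings. The two small tables and their vote shares are made up for illustration. Ohio codes begin with 39 and need no padding, but the same join on a state such as Alabama would silently drop counties without it.

```r
# Hypothetical returns table: FIPS codes stored as numbers.
returns <- data.frame(
  county_fips = c(39001, 39003, 1001),
  vote_share  = c(0.35, 0.32, 0.27)      # made-up values
)
# Hypothetical boundary attributes: GEOID stored as text.
boundaries <- data.frame(
  GEOID = c("39001", "39003", "01001"),
  NAME  = c("Adams", "Allen", "Autauga")
)

# Zero-pad to width 5 so that 1001 becomes "01001".
returns$GEOID <- sprintf("%05d", as.integer(returns$county_fips))

# Left join on the shared key, then confirm no county went unmatched.
joined <- merge(boundaries, returns, by = "GEOID", all.x = TRUE)
stopifnot(!anyNA(joined$vote_share))
joined
```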

Each election is paired with the IRR scores closest in time to it. Given the data required to calculate the IRR, it has only been calculated for each census from 2000 onward. For elections that do not fall on a census year, the temporally closest IRR scores are used. In this study, the 2012 election is therefore paired with the 2010 IRR scores, and the 2016 and 2020 elections are paired with the 2020 IRR scores.
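
The pairing rule can be written as a small helper. The function name `nearest_irr_year` is hypothetical, and this is a sketch of the rule as described above rather than code used in the analysis.

```r
# Hypothetical helper: return the IRR vintage closest to each election year.
# which.min() keeps the first minimum, so an exact tie (e.g. 2015) would
# resolve to the earlier census year.
nearest_irr_year <- function(election_year, irr_years = c(2000, 2010, 2020)) {
  sapply(election_year, function(y) irr_years[which.min(abs(irr_years - y))])
}

pairing <- data.frame(election = c(2012, 2016, 2020))
pairing$irr <- nearest_irr_year(pairing$election)
pairing
```

For the three elections studied this reproduces the pairings stated above: 2012 with 2010, and both 2016 and 2020 with 2020.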

#clean the electoral data
#countypres holds the county presidential returns table, loaded beforehand
clean_county_pres <- clean_names(countypres) %>%
  select(year, state, county_name, county_fips, candidate, party, 
         candidatevotes, totalvotes)
write_csv(clean_county_pres, "clean_county_pres.csv")

#filter down to take out 3rd party candidates, select only results in Ohio, and only the desired elections
county_pres_d_r_ohio <- clean_county_pres %>%
  filter(state == "OHIO",
         party %in% c("DEMOCRAT", "REPUBLICAN"),
         year %in% c(2012, 2016, 2020))

#create a vote share variable: each candidate's share of the total votes cast in a county,
#so that counties are comparable regardless of population size
ohio_county_pres_vote_share <- county_pres_d_r_ohio %>%
  mutate(vote_share = candidatevotes / totalvotes)

#joining the 2012 data to other data sets
#ohio_counties holds the tigris county boundaries and irr_2010 the 2010 IRR table; both are loaded beforehand
ohio_pres_D_vote_share_2012 <- ohio_county_pres_vote_share %>%
  filter(year == 2012,
         party == "DEMOCRAT")
ohio_2012 <- geo_join(ohio_counties, ohio_pres_D_vote_share_2012, "GEOID", "county_fips")
#st_transform() reprojects the NAD83 tigris geometry to WGS84; st_set_crs() would only relabel it
ohio_2012_IRR2010 <- geo_join(ohio_2012, irr_2010, "GEOID", "FIPS2010") %>%
  sf::st_transform(4326) %>%
  select(year, state, GEOID, NAME, candidate, party, candidatevotes,
         totalvotes, vote_share, IRR2010, ALAND, AWATER, geometry)

#joining the 2016 data to other data sets
ohio_pres_D_vote_share_2016 <- ohio_county_pres_vote_share %>%
  filter(year == 2016,
         party == "DEMOCRAT")
ohio_2016 <- geo_join(ohio_counties, ohio_pres_D_vote_share_2016, "GEOID", "county_fips")
#reproject (rather than relabel) the geometry to WGS84
ohio_2016_IRR2020 <- geo_join(ohio_2016, irr_2020, "GEOID", "FIPS2020") %>%
  sf::st_transform(4326) %>%
  select(year, state, GEOID, NAME, candidate, party, candidatevotes,
         totalvotes, vote_share, IRR2020, ALAND, AWATER, geometry)

#joining the 2020 data to other data sets
ohio_pres_D_vote_share_2020 <- ohio_county_pres_vote_share %>%
  filter(year == 2020,
         party == "DEMOCRAT")
ohio_2020 <- geo_join(ohio_counties, ohio_pres_D_vote_share_2020, "GEOID", "county_fips")
#reproject (rather than relabel) the geometry to WGS84
ohio_2020_IRR2020 <- geo_join(ohio_2020, irr_2020, "GEOID", "FIPS2020") %>%
  sf::st_transform(4326) %>%
  select(year, state, GEOID, NAME, candidate, party, candidatevotes,
         totalvotes, vote_share, IRR2020, ALAND, AWATER, geometry)

Exploratory Vote Share & IRR Score Mapping

First, the variables are mapped to see whether they appear to cluster across space.

The vote share choropleth maps show how sharply Democratic vote share fell after 2012. In 2012 the Democratic candidate, Barack Obama, won Ohio; no Democratic candidate has done so since, so a negative change is no surprise. What stands out is how widespread the drop was outside the party's strongest, predominantly urban, areas. It is also striking that Joe Biden, who won the national election in 2020, performed only marginally better in Ohio than Hillary Clinton did in 2016, when she lost it. Both lost the state.

ggplot(data = ohio_2012_IRR2010) +
  geom_sf(aes(fill = vote_share)) +
  scale_fill_distiller(name = "2012 Democratic Vote Share",
                       palette = "Blues",
                       breaks = pretty_breaks(n = 4),
                       direction = 1) +
  theme_void() +
  theme(legend.position = "bottom")

ggplot(data = ohio_2016_IRR2020) +
  geom_sf(aes(fill = vote_share)) +
  scale_fill_distiller(name = "2016 Democratic Vote Share",
                       palette = "Blues",
                       breaks = pretty_breaks(n = 4),
                       direction = 1) +
  theme_void() +
  theme(legend.position = "bottom")

ggplot(data = ohio_2020_IRR2020) +
  geom_sf(aes(fill = vote_share)) +
  scale_fill_distiller(name = "2020 Democratic Vote Share",
                       palette = "Blues",
                       breaks = pretty_breaks(n = 4),
                       direction = 1) +
  theme_void() +
  theme(legend.position = "bottom")

The exploratory IRR maps below show little change in scores between the 2010 and 2020 measurements. This is expected: any change for a county over a decade is small and not easily seen on a statewide choropleth map. In the future, mapping the change in IRR scores directly would make even the smallest differences visible.

ggplot(data = ohio_2012_IRR2010) +
  geom_sf(aes(fill = IRR2010)) +
  scale_fill_distiller(name = "2010 IRR Values",
                       palette = "Blues",
                       breaks = pretty_breaks(n = 4),
                       direction = -1) +
  theme_void() +
  theme(legend.position = "bottom")

ggplot(data = ohio_2020_IRR2020) +
  geom_sf(aes(fill = IRR2020)) +
  scale_fill_distiller(name = "2020 IRR Values",
                       palette = "Blues",
                       breaks = pretty_breaks(n = 4),
                       direction = -1) +
  theme_void() +
  theme(legend.position = "bottom")

Spatial Autocorrelation

“Spatial autocorrelation is the term used to describe the presence of systematic spatial variation in a variable and positive spatial autocorrelation, which is most often encountered in practical situations, is the tendency for areas or sites that are close together to have similar values” (Haining, 2001, p. 14826).

To test for spatial autocorrelation, global Moran’s I, Monte Carlo simulations of global Moran’s I, and local Moran’s I are used.
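
To make these statistics concrete, here is a hand computation of global Moran's I and a permutation p-value on a toy example. The four-area layout and the vote shares are made up, and `morans_i` is a hypothetical helper. It follows the textbook formula and the general logic of a Monte Carlo test, and is not a substitute for the spdep functions used below.

```r
# Toy layout: four areas in a row (1-2-3-4); adjacent areas are neighbors.
adjacency <- matrix(c(0, 1, 0, 0,
                      1, 0, 1, 0,
                      0, 1, 0, 1,
                      0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Row-standardize so each row sums to 1 (the "W" style in spdep).
w <- adjacency / rowSums(adjacency)

# Global Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2),
# where z holds deviations from the mean and S0 is the sum of all weights.
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

x <- c(0.60, 0.55, 0.35, 0.30)   # made-up vote shares
observed <- morans_i(x, w)

# Permutation test: shuffle the values across areas, recompute I each
# time, and count how often the shuffled statistic reaches the observed one.
set.seed(1918)
simulated <- replicate(999, morans_i(sample(x), w))
p_value <- (sum(simulated >= observed) + 1) / (999 + 1)

c(observed = observed, p_value = p_value)
```

The observed statistic here is about 0.54 because similar values sit next to each other. With only four areas the permutation p-value is coarse, which is one reason the test is more informative on all 88 counties.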

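#queen contiguity: counties that share a border or a corner are neighbors
ohio_2012_neighbors <- poly2nb(ohio_2012_IRR2010, queen = TRUE)
#row-standardized weights: each county's neighbor weights sum to 1
ohio_2012_weights <- nb2listw(ohio_2012_neighbors, style = "W")

#plot the neighbor links over the county outlines
centroids <- st_centroid(st_geometry(ohio_2012_IRR2010))
plot(st_geometry(ohio_2012_IRR2010), border = "grey60", reset = FALSE)
plot(ohio_2012_neighbors, coords = centroids, add = TRUE, col = "red")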

summary(ohio_2012_neighbors)
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 462 
## Percentage nonzero weights: 5.965909 
## Average number of links: 5.25 
## Link number distribution:
## 
##  3  4  5  6  7  8 
## 10 16 19 29 13  1 
## 10 least connected regions:
## 3 10 12 17 18 38 39 58 70 75 with 3 links
## 1 most connected region:
## 41 with 8 links

The Moran scatterplot and the measures of global spatial autocorrelation for 2012 are shown below.

moran.plot(as.numeric(scale(ohio_2012_IRR2010$vote_share)), listw=ohio_2012_weights, 
           xlab="Standardized Vote Share", 
           ylab="Standardized Lagged Vote Share",
           main=c("Moran Scatterplot for 2012 Democratic Vote Share", "in Ohio Counties") )

moran.test(ohio_2012_IRR2010$vote_share, ohio_2012_weights)    
## 
##  Moran I test under randomisation
## 
## data:  ohio_2012_IRR2010$vote_share  
## weights: ohio_2012_weights    
## 
## Moran I statistic standard deviate = 5.8794, p-value = 2.059e-09
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.37083385       -0.01149425        0.00422868
set.seed(1918)
moran.mc(ohio_2012_IRR2010$vote_share, ohio_2012_weights, nsim = 999)
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  ohio_2012_IRR2010$vote_share 
## weights: ohio_2012_weights  
## number of simulations + 1: 1000 
## 
## statistic = 0.37083, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater
set.seed(1918)
locali_2012 <- localmoran_perm(ohio_2012_IRR2010$vote_share, ohio_2012_weights,
                               nsim = 999) %>%
  as_tibble() %>%
  set_names(c("local_i", "exp_i", "var_i", "z_i", "p_i",
              "p_i_sim", "pi_sim_folded", "skewness", "kurtosis"))

ohio_2012_IRR2010 <- ohio_2012_IRR2010 %>%
  bind_cols(locali_2012)

ohio_2012_IRR2010 <- ohio_2012_IRR2010 %>%
  mutate(vote_share_z =  as.numeric(scale(vote_share)),
         votesharezlag = lag.listw(ohio_2012_weights, vote_share_z),
         lisa_cluster = case_when(
           p_i >= 0.05 ~ "Not significant",
           vote_share_z > 0 & local_i > 0 ~ "High-high",
           vote_share_z > 0 & local_i < 0 ~ "High-low",
           vote_share_z < 0 & local_i > 0 ~ "Low-low",
           vote_share_z < 0 & local_i < 0 ~ "Low-high"
         ))

The Moran scatterplot and the measures of global spatial autocorrelation for 2016 are shown below.

ohio_2016_neighbors <- poly2nb(ohio_2016_IRR2020, queen = TRUE)
ohio_2016_weights <- nb2listw(ohio_2016_neighbors, style = "W")

moran.plot(as.numeric(scale(ohio_2016_IRR2020$vote_share)), listw=ohio_2016_weights, 
           xlab="Standardized Vote Share", 
           ylab="Standardized Lagged Vote Share",
           main=c("Moran Scatterplot for 2016 Democratic Vote Share", "in Ohio Counties") )

moran.test(ohio_2016_IRR2020$vote_share, ohio_2016_weights)    
## 
##  Moran I test under randomisation
## 
## data:  ohio_2016_IRR2020$vote_share  
## weights: ohio_2016_weights    
## 
## Moran I statistic standard deviate = 5.5565, p-value = 1.376e-08
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.347610787      -0.011494253       0.004176711
set.seed(1918)
moran.mc(ohio_2016_IRR2020$vote_share, ohio_2016_weights, nsim = 999)
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  ohio_2016_IRR2020$vote_share 
## weights: ohio_2016_weights  
## number of simulations + 1: 1000 
## 
## statistic = 0.34761, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater
set.seed(1918)
locali_2016 <- localmoran_perm(ohio_2016_IRR2020$vote_share, ohio_2016_weights,
                               nsim = 999) %>%
  as_tibble() %>%
  set_names(c("local_i", "exp_i", "var_i", "z_i", "p_i",
              "p_i_sim", "pi_sim_folded", "skewness", "kurtosis"))

ohio_2016_IRR2020 <- ohio_2016_IRR2020 %>%
  bind_cols(locali_2016)

ohio_2016_IRR2020 <- ohio_2016_IRR2020 %>%
  mutate(vote_share_z =  as.numeric(scale(vote_share)),
         votesharezlag = lag.listw(ohio_2016_weights, vote_share_z),
         lisa_cluster = case_when(
           p_i >= 0.05 ~ "Not significant",
           vote_share_z > 0 & local_i > 0 ~ "High-high",
           vote_share_z > 0 & local_i < 0 ~ "High-low",
           vote_share_z < 0 & local_i > 0 ~ "Low-low",
           vote_share_z < 0 & local_i < 0 ~ "Low-high"
         ))

The Moran scatterplot and the measures of global spatial autocorrelation for 2020 are shown below.

ohio_2020_neighbors <- poly2nb(ohio_2020_IRR2020, queen = TRUE)
ohio_2020_weights <- nb2listw(ohio_2020_neighbors, style = "W")

moran.plot(as.numeric(scale(ohio_2020_IRR2020$vote_share)), listw=ohio_2020_weights, 
           xlab="Standardized Vote Share", 
           ylab="Standardized Lagged Vote Share",
           main=c("Moran Scatterplot for 2020 Democratic Vote Share", "in Ohio Counties") )

moran.test(ohio_2020_IRR2020$vote_share, ohio_2020_weights)    
## 
##  Moran I test under randomisation
## 
## data:  ohio_2020_IRR2020$vote_share  
## weights: ohio_2020_weights    
## 
## Moran I statistic standard deviate = 5.8283, p-value = 2.8e-09
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.365938843      -0.011494253       0.004193713
set.seed(1918)
moran.mc(ohio_2020_IRR2020$vote_share, ohio_2020_weights, nsim = 999)
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  ohio_2020_IRR2020$vote_share 
## weights: ohio_2020_weights  
## number of simulations + 1: 1000 
## 
## statistic = 0.36594, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater
set.seed(1918)
locali_2020 <- localmoran_perm(ohio_2020_IRR2020$vote_share, ohio_2020_weights,
                               nsim = 999) %>%
  as_tibble() %>%
  set_names(c("local_i", "exp_i", "var_i", "z_i", "p_i",
              "p_i_sim", "pi_sim_folded", "skewness", "kurtosis"))

ohio_2020_IRR2020 <- ohio_2020_IRR2020 %>%
  bind_cols(locali_2020)

ohio_2020_IRR2020 <- ohio_2020_IRR2020 %>%
  mutate(vote_share_z =  as.numeric(scale(vote_share)),
         votesharezlag = lag.listw(ohio_2020_weights, vote_share_z),
         lisa_cluster = case_when(
           p_i >= 0.05 ~ "Not significant",
           vote_share_z > 0 & local_i > 0 ~ "High-high",
           vote_share_z > 0 & local_i < 0 ~ "High-low",
           vote_share_z < 0 & local_i > 0 ~ "Low-low",
           vote_share_z < 0 & local_i < 0 ~ "Low-high"
         ))

Local Spatial Autocorrelation Visualizations

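Below are three visualizations for the 2012 election: a Moran scatterplot colored by LISA cluster type, a cluster map of local Moran’s I, and a bivariate map of IRR scores and Democratic vote share. The local Moran’s I plots use vote share as the measured variable.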
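
The cluster labels used in these plots follow from two signs: a county's own z-score and the average z-score of its neighbors. A minimal base R sketch on a made-up four-area layout is given here, assuming row-standardized weights and z-scores computed with the population standard deviation. It omits the significance filter, which in the analysis above comes from `localmoran_perm()`.

```r
# Toy layout: four areas in a row; weights are row-standardized.
adjacency <- matrix(c(0, 1, 0, 0,
                      1, 0, 1, 0,
                      0, 1, 0, 1,
                      0, 0, 1, 0), nrow = 4, byrow = TRUE)
w <- adjacency / rowSums(adjacency)

x <- c(0.60, 0.55, 0.35, 0.30)            # made-up vote shares
dev <- x - mean(x)
z <- dev / sqrt(mean(dev^2))              # z-scores (population SD)
lag_z <- as.vector(w %*% z)               # mean z-score of each area's neighbors

# Local Moran's I per area; with row-standardized weights its mean
# equals the global Moran's I.
local_i <- z * lag_z

# Quadrant labels before any significance filter; exact zeros stay NA.
quadrant <- ifelse(z > 0 & lag_z > 0, "High-high",
            ifelse(z > 0 & lag_z < 0, "High-low",
            ifelse(z < 0 & lag_z > 0, "Low-high",
            ifelse(z < 0 & lag_z < 0, "Low-low", NA_character_))))

data.frame(x, z = round(z, 2), lag_z = round(lag_z, 2),
           local_i = round(local_i, 2), quadrant)
```

In this toy case the two high-share areas are labeled high-high and the two low-share areas low-low, which is the pattern the cluster maps look for among neighboring counties.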

color_values <- c(`High-high` = "red", 
                  `High-low` = "pink", 
                  `Low-low` = "blue", 
                  `Low-high` = "lightblue", 
                  `Not significant` = "white")

ggplot(ohio_2012_IRR2010, aes(x = vote_share_z, 
                       y = votesharezlag,
                       fill = lisa_cluster)) + 
  geom_point(color = "black", shape = 21, size = 2) + 
  theme_minimal() + 
  geom_hline(yintercept = 0, linetype = "dashed") + 
  geom_vline(xintercept = 0, linetype = "dashed") + 
  scale_fill_manual(values = color_values) + 
  labs(x = "2012 Vote Share (z-score)",
       y = "Spatial lag of Vote Share (z-score)",
       fill = "Cluster type")

tmap_mode("plot")

tm_shape(ohio_2012_IRR2010, unit = "mi") +
  tm_polygons(col = "lisa_cluster", title = "Local Moran's I", palette = color_values) +
  tm_compass(type = "4star", position = c("left", "bottom")) + 
  tm_scale_bar(breaks = c(0, 10, 20), text.size = 1) +
  tm_layout(frame = F, main.title = "2012 Ohio Vote Share clusters",
            legend.outside = T)

bivariate_2012 <- bi_class(ohio_2012_IRR2010, x = IRR2010, y = vote_share, 
                           style = "quantile", dim = 3)

bivariate_2012_map <- ggplot() +
  geom_sf(data = bivariate_2012, mapping = aes(fill = bi_class), color = "white", size = 0.1, show.legend = FALSE) +
  bi_scale_fill(pal = "DkViolet", dim = 3) +
  labs(
    title = "2012 Democratic Vote Share & 2010 IRR Scores",
    subtitle = "Quantile with Dark Violet Palette"
  ) +
  bi_theme(base_size = 12)

legend <- bi_legend(pal = "DkViolet",
                     dim = 3,
                     xlab = "IRR Score",
                     ylab = "Democratic Vote Share",
                     size = 10)

bivariate_2012_Plot <- ggdraw() +
  draw_plot(bivariate_2012_map, 0, 0, 1, 1) +
  draw_plot(legend, 0.73, .015, 0.25, 0.2)
bivariate_2012_Plot

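Below are three visualizations for the 2016 election: a Moran scatterplot colored by LISA cluster type, a cluster map of local Moran’s I, and a bivariate map of IRR scores and Democratic vote share. The local Moran’s I plots use vote share as the measured variable.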

ggplot(ohio_2016_IRR2020, aes(x = vote_share_z, 
                       y = votesharezlag,
                       fill = lisa_cluster)) + 
  geom_point(color = "black", shape = 21, size = 2) + 
  theme_minimal() + 
  geom_hline(yintercept = 0, linetype = "dashed") + 
  geom_vline(xintercept = 0, linetype = "dashed") + 
  scale_fill_manual(values = color_values) + 
  labs(x = "2016 Vote Share (z-score)",
       y = "Spatial lag of Vote Share (z-score)",
       fill = "Cluster type")

tmap_mode("plot")

tm_shape(ohio_2016_IRR2020, unit = "mi") +
  tm_polygons(col = "lisa_cluster", title = "Local Moran's I", palette = color_values) +
  tm_compass(type = "4star", position = c("left", "bottom")) + 
  tm_scale_bar(breaks = c(0, 10, 20), text.size = 1) +
  tm_layout(frame = F, main.title = "2016 Ohio Vote Share clusters",
            legend.outside = T) 

bivariate_2016 <- bi_class(ohio_2016_IRR2020, x = IRR2020, y = vote_share, 
                           style = "quantile", dim = 3)

bivariate_2016_map <- ggplot() +
  geom_sf(data = bivariate_2016, mapping = aes(fill = bi_class), color = "white", size = 0.1, show.legend = FALSE) +
  bi_scale_fill(pal = "DkViolet", dim = 3) +
  labs(
    title = "2016 Democratic Vote Share & 2020 IRR Scores",
    subtitle = "Quantile with Dark Violet Palette"
  ) +
  bi_theme(base_size = 12)


bivariate_2016_Plot <- ggdraw() +
  draw_plot(bivariate_2016_map, 0, 0, 1, 1) +
  draw_plot(legend, 0.73, .015, 0.25, 0.2)
bivariate_2016_Plot

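Below are three visualizations for the 2020 election: a Moran scatterplot colored by LISA cluster type, a cluster map of local Moran’s I, and a bivariate map of IRR scores and Democratic vote share. The local Moran’s I plots use vote share as the measured variable.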

ggplot(ohio_2020_IRR2020, aes(x = vote_share_z, 
                       y = votesharezlag,
                       fill = lisa_cluster)) + 
  geom_point(color = "black", shape = 21, size = 2) + 
  theme_minimal() + 
  geom_hline(yintercept = 0, linetype = "dashed") + 
  geom_vline(xintercept = 0, linetype = "dashed") + 
  scale_fill_manual(values = color_values) + 
  labs(x = "2020 Vote Share (z-score)",
       y = "Spatial lag of Vote Share (z-score)",
       fill = "Cluster type")

tmap_mode("plot")

tm_shape(ohio_2020_IRR2020, unit = "mi") +
  tm_polygons(col = "lisa_cluster", title = "Local Moran's I", palette = color_values) +
  tm_compass(type = "4star", position = c("left", "bottom")) + 
  tm_scale_bar(breaks = c(0, 10, 20), text.size = 1) +
  tm_layout(frame = F, main.title = "2020 Ohio Vote Share clusters",
            legend.outside = T)

bivariate_2020 <- bi_class(ohio_2020_IRR2020, x = IRR2020, y = vote_share, 
                           style = "quantile", dim = 3)

bivariate_2020_map <- ggplot() +
  geom_sf(data = bivariate_2020, mapping = aes(fill = bi_class), color = "white", size = 0.1, show.legend = FALSE) +
  bi_scale_fill(pal = "DkViolet", dim = 3) +
  labs(
    title = "2020 Democratic Vote Share & 2020 IRR Scores",
    subtitle = "Quantile with Dark Violet Palette"
  ) +
  bi_theme(base_size = 12)


bivariate_2020_Plot <- ggdraw() +
  draw_plot(bivariate_2020_map, 0, 0, 1, 1) +
  draw_plot(legend, 0.73, .015, 0.25, 0.2)
bivariate_2020_Plot

Interactive Summary Map

Below is an interactive Leaflet choropleth map with each election year as a toggleable layer. The shade of blue reflects Democratic vote share, with darker blues indicating higher shares. Hovering over a county reveals its name, and clicking opens a popup table with all of the data for that county in that layer, from IRR score to LISA cluster information. The map is meant as an exploration tool for the reader to look at summary findings.

leaflet() %>%
  addProviderTiles(providers$CartoDB) %>%
  addPolygons(data = ohio_2012_IRR2010, 
              fillColor  = ~colorQuantile("Blues", vote_share)(vote_share),
              opacity = 1,
              color = "NA",
              dashArray = "1",
              fillOpacity = 0.7,
              highlightOptions = highlightOptions(
                weight = 5,
                color = "#666",
                dashArray = "",
                fillOpacity = 0.7,
                bringToFront = TRUE
              ),
              label = ~NAME,
              popup = leafpop::popupTable(ohio_2012_IRR2010),
              group = "2012") %>%
  addPolygons(data = ohio_2016_IRR2020, 
              fillColor  = ~colorQuantile("Blues", vote_share)(vote_share),
              opacity = 1,
              color = "NA",
              dashArray = "1",
              fillOpacity = 0.7,
              highlightOptions = highlightOptions(
                weight = 5,
                color = "#666",
                dashArray = "",
                fillOpacity = 0.7,
                bringToFront = TRUE
              ),
              label = ~NAME,
              popup = leafpop::popupTable(ohio_2016_IRR2020),
              group = "2016") %>%
  addPolygons(data = ohio_2020_IRR2020, 
              fillColor  = ~colorQuantile("Blues", vote_share)(vote_share),
              opacity = 1,
              color = "NA",
              dashArray = "1",
              fillOpacity = 0.7,
              highlightOptions = highlightOptions(
                weight = 5,
                color = "#666",
                dashArray = "",
                fillOpacity = 0.7,
                bringToFront = TRUE
              ),
              label = ~NAME,
              popup = leafpop::popupTable(ohio_2020_IRR2020),
              group = "2020") %>%
  addLayersControl(overlayGroups = c("2012", "2016", "2020"))

Results

Global Moran’s I was 0.371 for 2012, 0.348 for 2016, and 0.366 for 2020, and each value was statistically significant (p = 0.001 in the Monte Carlo simulations), indicating positive spatial autocorrelation and clustering of Democratic vote share. That the statistic stays in the same range from election to election, despite fluctuations in vote share and changes in election outcomes, shows consistency. The maps of local indicators of spatial autocorrelation are broadly similar across years, with slight variation in cluster membership for some counties. In the interactive and bivariate maps, counties that were consistently low-low tend to be considerably more rural, and counties that were consistently high-high tend to be more urban. This is consistent with the expectation that Democratic vote share decreases as rurality increases.
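
The map-based reading of rurality by cluster type could be backed by a simple tabulation. The sketch below shows the shape of such a check. The table `toy` is entirely made up and is not a result of this study; only its column names mirror those used above.

```r
# Hypothetical county table: every value here is made up for illustration.
toy <- data.frame(
  lisa_cluster = c("High-high", "High-high", "Low-low", "Low-low",
                   "Not significant", "Not significant"),
  IRR2020      = c(0.20, 0.30, 0.55, 0.60, 0.40, 0.50)
)

# Mean IRR score within each LISA cluster type.
irr_by_cluster <- aggregate(IRR2020 ~ lisa_cluster, data = toy, FUN = mean)
irr_by_cluster
```

With the real data, the same call could be run on `sf::st_drop_geometry(ohio_2020_IRR2020)` in place of `toy`. A higher mean IRR among low-low counties than among high-high counties would support the pattern read off the maps.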

Further Research

The first and most obvious limitation is potential selection bias in the state and elections studied. While the three most recent elections were a natural fit, that choice, along with the selection of a ‘bellwether’ state, lends itself to results that confirm what was already believed. To test the research question further, other states and other recent elections could be studied. Looking at a single variable such as Democratic vote share can also miss confounding factors, in particular the popularity or strength of Republican candidates in a given election year. Finally, one of the bigger pitfalls of this analysis is its reliance on relative rather than absolute measures: z-scores and quantile classes are computed within each election, so direct election-to-election comparisons are not possible.

References

Kim, A. & Waldorf, B. (2023). The Index of Relative Rurality (IRR): US County Data for 2020 (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7675745

Haining, R. P. (2001). Spatial sampling. International Encyclopedia of the Social & Behavioral Sciences, 14822–14827. https://doi.org/10.1016/b0-08-043076-7/02510-9

MIT Election Data and Science Lab. (2022, November 1). County presidential election returns 2000-2020. MIT Election Data + Science Lab. https://doi.org/10.7910/DVN/VOQCHQ

U.S. Census Bureau. (n.d.). Cartographic Boundary Files. U.S. Census Bureau. Retrieved June 6, 2023, from https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html

Waldorf, B. & Kim, A. (2015, April). Defining and measuring rurality in the US: From typologies to continuous indices. Commissioned paper prepared for the National Academies of Sciences Workshop on Rationalizing Rural Classifications, Washington, DC. http://sites.nationalacademies.org/cs/groups/dbassesite/documents/webpage/dbasse_168031.pdf