Analysis of Socioeconomic Factors and Voting Patterns in US Counties

Jeffery Delva

2025-09-24

1. Introduction

Objective & Background

The objective of this analysis is to explore the relationships between 2020 presidential election outcomes, economic health (unemployment and poverty), and education across U.S. counties.

By visualizing these variables and the relationships among them, we aim to identify patterns that can motivate more specific research questions.

The dataset used for this analysis combines county-level data from several public sources, including the MIT Election Lab, the Bureau of Labor Statistics, and the U.S. Census Bureau.
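Because the sources are combined county by county, the join key matters. The sketch below is a minimal illustration, not the pipeline actually used to build this dataset. It assumes each source has already been read into a list of Python dictionaries with a county_fips field, and the helper names normalize_fips and merge_by_fips are hypothetical. It zero-pads FIPS codes to five characters before joining, because codes stored as numbers lose their leading zero (1001 instead of 01001) and would otherwise fail to match.

```python
def normalize_fips(code) -> str:
    """Return a 5-digit, zero-padded FIPS string, e.g. 1001 -> '01001'."""
    return str(int(code)).zfill(5)


def merge_by_fips(*sources):
    """Inner-join several lists of county records on normalized FIPS code.

    Each source is a list of dicts with a "county_fips" key (assumed input
    shape). Only counties present in every source are kept; if two sources
    share a field name, the later source's value wins.
    """
    indexed = [
        {normalize_fips(rec["county_fips"]): rec for rec in records}
        for records in sources
    ]
    # Keep only the FIPS codes that appear in every source.
    common = set(indexed[0]).intersection(*indexed[1:])
    merged = {}
    for fips in sorted(common):
        row = {}
        for table in indexed:
            row.update(table[fips])
        row["county_fips"] = fips  # keep the padded string, not the raw value
        merged[fips] = row
    return merged
```

An inner join silently drops counties missing from any one source, so comparing the merged row count with each source's count is a quick sanity check.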

2. Data Set Description

This section summarizes the dataset’s structure and variables.

The dataset contains 3153 observations (counties) and 11 variables.

Variable Definitions

The following table lists each variable and its description.

Variable Definitions for the Consolidated County Dataset
Variable Name | Type | Description
county_fips | Numerical | A unique 5-digit Federal Information Processing Standards (FIPS) code for the county.
state | Categorical | The name of the U.S. state.
county_name | Categorical | The name of the county.
winning_party_votes | Numerical | Total votes for the winning party in the 2020 presidential election.
winning_party | Categorical | The political party that won the county in 2020.
unemployment_rate | Numerical | The county’s unemployment rate for 2020 (%).
PCTPOVALL_2019 | Numerical | The estimated percentage of people in poverty in the county in 2019.
percent_less_than_hs | Numerical | The percentage of adults with less than a high school diploma (2015-19).
percent_hs_only | Numerical | The percentage of adults with only a high school diploma (2015-19).
percent_some_college | Numerical | The percentage of adults with some college or an associate’s degree (2015-19).
percent_bachelors_or_higher | Numerical | The percentage of adults with a bachelor’s degree or higher (2015-19).
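The table records a single winning_party and winning_party_votes value per county, whereas election returns are commonly published with one row per candidate and, in some states, one row per voting mode. Assuming the returns take that long form, reduced here to hypothetical (county_fips, party, votes) tuples, the county winner could be derived as sketched below. The function name and the alphabetical tie-break are illustrative choices, not taken from the original analysis.

```python
from collections import defaultdict


def winning_party_by_county(rows):
    """Return {county_fips: (winning_party, winning_party_votes)}.

    `rows` is assumed to be an iterable of (county_fips, party, votes)
    tuples. A party may appear several times per county (e.g. once per
    voting mode), so votes are summed per county and party first.
    Ties are broken alphabetically by party name, an arbitrary choice.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for fips, party, votes in rows:
        totals[fips][party] += votes

    result = {}
    for fips, by_party in totals.items():
        # Highest vote total first; alphabetical party name on ties.
        party, votes = min(by_party.items(), key=lambda kv: (-kv[1], kv[0]))
        result[fips] = (party, votes)
    return result
```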

3. Exploratory Data Analysis

This section visualizes variable distributions and their relationships to answer key analytical questions.

Analysis 1: Distribution of Single Variables

Analytic Question: What are the typical distributions of key economic indicators like poverty and unemployment across U.S. counties?

Interpretation: The distribution of poverty rates is right-skewed: most counties have a poverty rate below 20%, while a long tail of counties reports substantially higher rates, indicating considerable economic disparity across counties.
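Right skew can be checked numerically as well as visually: in a right-skewed distribution the mean lies above the median and the moment-based skewness coefficient is positive. The sketch below uses only the Python standard library; the function name skew_summary is hypothetical, the 20% default threshold mirrors the interpretation above, and the input would be the county PCTPOVALL_2019 values.

```python
import statistics


def skew_summary(values, threshold=20.0):
    """Summarize the shape of a distribution of county rates (in %).

    Returns the mean, the median, the moment-based skewness coefficient
    g1 (population formula), and the share of values below `threshold`.
    A positive g1 together with mean > median indicates a right tail.
    """
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    g1 = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    share_below = sum(v < threshold for v in values) / n
    return {"mean": mean, "median": median,
            "skewness": g1, "share_below": share_below}
```

Applied to the poverty column, share_below would put a number on the "most counties below 20%" statement; the same function can be run on unemployment_rate.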

Analysis 2: Relationships Between Two Variables

Analytic Question: Is there a relationship between a county’s dominant political party and its level of educational attainment?

Interpretation: The boxplot shows that counties won by Democrats tend to have a higher median percentage of adults with a bachelor’s degree or higher than counties won by Republicans. The distribution for Democratic-won counties is also more spread out, with a wider interquartile range and more outliers.
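The boxplot comparison rests on three quantities per party: the median, the interquartile range (IQR), and the number of points beyond the whiskers. A minimal sketch follows, assuming the county rows are dictionaries keyed by the variable names in the table in Section 2. The function name summarize_by_group is hypothetical, and the outlier count uses the conventional Tukey rule (1.5 times the IQR beyond the quartiles), which may not match every plotting package’s whisker rule exactly.

```python
import statistics
from collections import defaultdict


def summarize_by_group(records, group_key, value_key):
    """Per-group median, IQR, and Tukey outlier count for one variable.

    `records` is an iterable of dicts; each group needs at least two values.
    Quartiles come from statistics.quantiles(method="inclusive"), i.e.
    linear interpolation between order statistics.
    """
    groups = defaultdict(list)
    for row in records:
        groups[row[group_key]].append(row[value_key])

    summary = {}
    for group, values in groups.items():
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        # Tukey's rule: points more than 1.5 * IQR outside the quartiles.
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers = sum(v < low or v > high for v in values)
        summary[group] = {"n": len(values), "median": median,
                          "iqr": iqr, "outliers": outliers}
    return summary

# Example call (hypothetical `rows` holding the merged county records):
# summarize_by_group(rows, "winning_party", "percent_bachelors_or_higher")
```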

Analysis 3: Pairwise Comparisons

Analytic Question: How do all the key numerical variables (education, poverty, unemployment) relate to one another?

Interpretation: The most distinct relationship is a strong negative correlation between percent_bachelors_or_higher and PCTPOVALL_2019 (the poverty rate): counties with a larger share of adults holding a bachelor’s degree clearly tend to have lower poverty rates. There is also a strong positive correlation between percent_less_than_hs and the poverty rate, indicating that a larger share of adults without a high school diploma is associated with higher poverty.
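The report does not state which correlation coefficient underlies the pairwise comparisons. Assuming Pearson’s r, the full set of pairwise correlations can be computed as below; pearson and correlation_pairs are hypothetical helper names, and counties with missing values are assumed to have been removed beforehand. Pearson’s r captures linear association only, so a strong value does not by itself show that education affects poverty.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_pairs(columns):
    """Pearson r for every pair of named numeric columns.

    `columns` maps a variable name (e.g. "PCTPOVALL_2019") to its list of
    county values, with rows aligned across columns.
    """
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(sorted(columns), 2)
    }
```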

4. Results & Discussion

The exploratory data analysis reveals strong, consistent relationships between educational attainment and economic status at the county level: counties with more highly educated populations tend to have lower poverty and unemployment rates.

Furthermore, these socioeconomic indicators appear to correlate with political outcomes, as seen in the differing educational attainment of counties grouped by their 2020 election results.

Research questions that could be posed following this EDA include: