Jeffery Delva
2025-09-24
The objective of this analysis is to explore the relationships between 2020 presidential election outcomes, economic health (unemployment and poverty), and education across U.S. counties.
By visualizing these variables and their interactions, we aim to uncover patterns that can lead to more specific research questions.
The dataset used for this analysis combines county-level data from public sources: the MIT Election Lab, the Bureau of Labor Statistics, and the U.S. Census Bureau.
This section summarizes the dataset's structure and variables.
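Because each source reports counties separately, the sources have to be joined on a shared county key. The Python sketch below shows one way to do that. It assumes each source has already been loaded into a dictionary keyed by FIPS code. The helper names `normalize_fips` and `merge_on_fips`, the record layout, and all values are hypothetical illustrations, not the author's pipeline. The choice of an inner join (keeping only counties present in every source) is also an assumption. Zero-padding matters because FIPS codes read in as numbers lose their leading zero (`1001` instead of `01001`) and would otherwise fail to match.

```python
# Sketch: inner-join per-source county records on a zero-padded 5-digit FIPS key.
# All records and values below are hypothetical toy data, not the real source files.


def normalize_fips(code):
    """Return a FIPS code as a 5-character string, restoring lost leading zeros."""
    return str(int(code)).zfill(5)


def merge_on_fips(*sources):
    """Join {fips: {column: value}} mappings, keeping only counties present in every source."""
    normalized = [{normalize_fips(k): v for k, v in src.items()} for src in sources]
    shared = set(normalized[0])
    for src in normalized[1:]:
        shared &= set(src)  # inner join: drop counties missing from any source
    merged = {}
    for fips in sorted(shared):
        row = {"county_fips": fips}
        for src in normalized:
            row.update(src[fips])
        merged[fips] = row
    return merged


# Hypothetical inputs: keys arrive as ints or strings, with or without leading zeros.
elections = {1001: {"winning_party": "REPUBLICAN"}, "06037": {"winning_party": "DEMOCRAT"}}
labor = {"01001": {"unemployment_rate": 4.0}, 6037: {"unemployment_rate": 9.0}, 99999: {"unemployment_rate": 3.0}}
census = {"1001": {"PCTPOVALL_2019": 15.0}, "6037": {"PCTPOVALL_2019": 12.0}}

merged = merge_on_fips(elections, labor, census)
for fips, row in merged.items():
    print(fips, row)
```

Counties that appear in only some sources (here the toy key `99999`) are silently dropped. In practice it is worth counting them before and after the join.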
The dataset contains 3153 observations (counties) and 11 variables.
The following table lists each variable and its description.
Variable Name | Type | Description |
---|---|---|
county_fips | Categorical (identifier) | A unique 5-digit Federal Information Processing Standards (FIPS) code for the county. |
state | Categorical | The name of the U.S. state. |
county_name | Categorical | The name of the county. |
winning_party_votes | Numerical | Total votes for the winning party in the 2020 presidential election. |
winning_party | Categorical | The political party that won the county in 2020. |
unemployment_rate | Numerical | The county’s unemployment rate for 2020 (%). |
PCTPOVALL_2019 | Numerical | The estimated percentage of people of all ages in poverty in the county in 2019. |
percent_less_than_hs | Numerical | The percentage of adults with less than a high school diploma (2015-19). |
percent_hs_only | Numerical | The percentage of adults with only a high school diploma (2015-19). |
percent_some_college | Numerical | The percentage of adults with some college or an associate’s degree (2015-19). |
percent_bachelors_or_higher | Numerical | The percentage of adults with a bachelor’s degree or higher (2015-19). |
This section visualizes variable distributions and their relationships to answer key analytical questions.
Analytic Question: What are the typical distributions of key economic indicators like poverty and unemployment across U.S. counties?
Interpretation: The distribution of poverty rates is right-skewed. Most counties have a poverty rate below 20%, while a long right tail of counties reports substantially higher rates, indicating considerable economic disparity across counties.
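Right skew can be checked numerically as well as visually. The plain-Python sketch below computes the Fisher-Pearson skewness coefficient, g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. It also computes the share of counties below the 20% mark. The `poverty` list is hypothetical toy data, not values from the dataset. A positive coefficient indicates a longer right tail.

```python
# Sketch: quantify right skew and the share of counties under a poverty threshold.
# The poverty values are hypothetical, chosen only to illustrate a right-skewed sample.


def skewness(values):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 using population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def share_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    return sum(1 for v in values if v < threshold) / len(values)


poverty = [8.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 18.0, 30.0, 40.0]  # hypothetical %
skew = skewness(poverty)
below_20 = share_below(poverty, 20.0)
print(f"skewness = {skew:.2f}, share below 20% = {below_20:.0%}")
```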
Analytic Question: Is there a relationship between a county’s dominant political party and its level of educational attainment?
Interpretation: The boxplot shows that counties won by Democrats tend to have a higher median percentage of adults with a bachelor's degree or higher than counties won by Republicans. The distribution for Democratic-won counties is also more spread out, with a wider interquartile range and more outliers.
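The boxplot comparison rests on two per-group summaries: the median and the interquartile range (IQR). A minimal sketch using Python's standard `statistics` module follows. The row layout borrows column names from the variable table, but the party labels and percentages are hypothetical. The quartile rule (`method="inclusive"`) is one choice among several, and other tools may compute quartiles slightly differently, so IQRs can differ at the margin.

```python
from collections import defaultdict
from statistics import median, quantiles

# Sketch: median and IQR of an education measure, grouped by winning party.
# Rows are hypothetical toy records, not counties from the dataset.


def summarize_by_group(rows, group_key, value_key):
    """Return {group: {"n", "median", "iqr"}} for value_key within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    summary = {}
    for group, values in groups.items():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")  # quartile cut points
        summary[group] = {"n": len(values), "median": median(values), "iqr": q3 - q1}
    return summary


rows = [{"winning_party": "DEMOCRAT", "percent_bachelors_or_higher": v} for v in (25, 30, 38, 45, 60)]
rows += [{"winning_party": "REPUBLICAN", "percent_bachelors_or_higher": v} for v in (12, 15, 18, 20, 24)]

summary = summarize_by_group(rows, "winning_party", "percent_bachelors_or_higher")
for party, stats in summary.items():
    print(party, stats)
```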
Analytic Question: How do all the key numerical variables (education, poverty, unemployment) relate to one another?
Interpretation: The most pronounced relationship is the strong negative correlation between percent_bachelors_or_higher and PCTPOVALL_2019 (the poverty rate): counties with a higher share of college graduates tend to have lower poverty rates. There is also a strong positive correlation between percent_less_than_hs and the poverty rate, indicating that a larger share of adults without a high school diploma is associated with higher poverty.
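A correlation matrix like the one interpreted above can be computed pairwise with Pearson's r. The sketch below uses toy columns named after the dataset's variables, with hypothetical values. It also assumes complete data with no missing values, which real county files may not satisfy.

```python
from math import sqrt

# Sketch: pairwise Pearson correlations between numeric county variables.
# Column values are hypothetical toy data, not the real dataset.


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of column names."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


columns = {
    "percent_bachelors_or_higher": [35.0, 28.0, 22.0, 15.0, 12.0],
    "percent_less_than_hs": [6.0, 9.0, 12.0, 16.0, 20.0],
    "PCTPOVALL_2019": [8.0, 11.0, 14.0, 19.0, 24.0],
    "unemployment_rate": [5.0, 6.5, 6.0, 8.0, 9.5],
}
corr = correlation_matrix(columns)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:+.2f}")
```

Pearson's r captures only linear association. Given the skewed poverty distribution noted earlier, a rank-based measure such as Spearman's rho may be a useful cross-check.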
The Exploratory Data Analysis reveals strong, consistent relationships between education and economic status at the county level. Counties with more highly educated populations tend to show lower rates of poverty and unemployment.
Furthermore, these socioeconomic indicators appear to correlate with political outcomes, as seen in the differing educational levels of counties based on their 2020 election results.
Research questions suggested by this EDA include:
Is the number of votes for the winning party correlated with a county's unemployment rate?
How does the share of adults at each educational level relate to county poverty rates?