My dataset explores how Maryland counties voted in the 2024 election. I first evaluate the political affiliation of every county, meaning whether its majority voted Democrat, Republican, or Other, and show it in a diverging bar graph to familiarize my audience with the counties in Maryland. Afterwards, I explore the main factor, education level, measured as bachelor’s degree attainment (%) per county, and whether it correlates with how each county voted. All data is from the Maryland State Board of Elections and the National Institute on Minority Health and Health Disparities.
Load Tidyverse and the Dataset
library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
New names:
• `` -> `...1`
Rows: 25 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): County
dbl (11): ...1, FIPS, bachelors_percent, bachelors_count, democrat_percent, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
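The call that produced the messages above is not shown in the rendered output. As a minimal sketch of what the "New names" message means, the example below writes a tiny made-up CSV with an empty first header (a stand-in for md.csv, not the real data), reads it with `read_csv()`, and drops the auto-named `...1` column. The object names `md_demo` and `md_clean` are hypothetical.

```r
library(readr)
library(dplyr)

# Made-up two-row file standing in for md.csv; note the empty first header
tmp <- tempfile(fileext = ".csv")
writeLines(c(",County,FIPS", "1,County A,1", "2,County B,3"), tmp)

# readr renames the unnamed column to `...1` and reports it under "New names:"
md_demo <- read_csv(tmp, show_col_types = FALSE)

# Drop the leftover row-index column before analysis
md_clean <- md_demo %>% select(-`...1`)
names(md_clean)
```

An unnamed first column like this usually comes from a row index written out when the CSV was saved, so dropping it does not lose any county data.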
Plot 1
Political Affiliation of Counties
Calculate Affiliation
I mutate a new column that sorts every county into a categorical variable for its political affiliation in the 2024 election using an if/else statement, and I remove the statewide Maryland row.
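The code for this step is not echoed in the report, so the following is only a sketch of how such a column could be built. It assumes a county counts as "Democrat" when `democrat_percent` exceeds `republican_percent`, and it uses a made-up toy data frame (`md_toy`) rather than the real values.

```r
library(dplyr)

# Made-up illustrative rows; these are not the real md.csv values
md_toy <- data.frame(
  County = c("Maryland", "County A", "County B"),
  democrat_percent = c(60, 70, 30),
  republican_percent = c(35, 25, 65)
)

md_counties <- md_toy %>%
  # Keep only county rows by dropping the statewide total
  filter(County != "Maryland") %>%
  # Assumed rule: label each county by whichever party has the larger share
  mutate(affiliation = if_else(democrat_percent > republican_percent,
                               "Democrat", "Republican"))

md_counties
```

With two labels that sort alphabetically as "Democrat" then "Republican", a two-color manual scale listing blue first and red second lines up with the plots in this report.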
This diverging bar graph first visualizes how each Maryland county voted in the 2024 election. The bars run from left to right to mirror the political spectrum and show which counties had a larger share (%) of voters for each party.
ggplot(md, aes(x = republican_percent - democrat_percent,
               y = reorder(County, democrat_percent),
               fill = affiliation)) +
  geom_col(aes(x = -democrat_percent), fill = "#71acc1", width = 0.7) +
  geom_col(aes(x = -other_percent), fill = "#999999", width = 0.7) +
  geom_col(aes(x = republican_percent), fill = "#e05252", width = 0.7) +
  labs(title = "Party Demographics of Maryland Counties in the 2024 Election",
       x = "Democrat > Political Affiliation < Republican",
       y = "Counties",
       caption = "Source: Maryland State Board of Elections") +
  theme_minimal()
Plot 2
Bachelor’s Degree Attainment vs County Political Affiliation
Plotting a Scatter Plot
In this scatter plot, each county is represented by a dot colored by its political affiliation. The x-axis is the bachelor’s degree % per county, and the y-axis is the political spectrum measured by the % of Democratic affiliation per county. A regression line with a confidence interval is included.
p2 <- ggplot(md, aes(x = bachelors_percent, y = democrat_percent, color = affiliation)) +
  geom_point(aes(size = bachelors_count)) +
  labs(title = "Bachelor's Degrees Versus Political Affiliation Correlation in 2024",
       x = "Bachelor Degree Attainment (%)",
       y = "Democratic Affiliation (%)",
       size = "Bachelor's Degree Population",
       caption = "Source: Maryland State Board of Elections &\nNational Institute on Minority Health and Health Disparities") +
  xlim(0, 75) +
  ylim(0, 100) +
  scale_color_manual(name = "Political Affiliation", values = c("#71acc1", "#e05252")) +
  geom_smooth(method = 'lm', formula = y ~ x, color = "#999999", linetype = "dotdash", se = TRUE) +
  theme_minimal()
p2
Linear Regression Summary
R value: 0.57 (positive correlation)
Linear regression line equation: democrat_percent = 0.83(bachelors_percent) + 19.28
The line equation can be interpreted as: for every 1 percentage point increase in bachelor’s degree attainment, there is a predicted increase of 0.83 percentage points in Democratic affiliation.
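To make the line equation concrete, here is a small sketch that plugs a few attainment levels into it. The coefficients are the ones reported in this document's regression summary; the function name `predict_democrat` and the input levels 20, 30, and 40 are hypothetical choices for illustration, not counties from the data.

```r
# Coefficients as reported by summary(fit1) in this report
intercept <- 19.2816
slope <- 0.8313

# Hypothetical helper: predicted Democratic share (%) at a given bachelor's degree share (%)
predict_democrat <- function(bachelors_percent) {
  intercept + slope * bachelors_percent
}

# Predictions at three illustrative attainment levels
predict_democrat(c(20, 30, 40))

# Each 10-point step in attainment adds 10 * slope points to the prediction
predict_democrat(40) - predict_democrat(30)
```

These are predictions from the fitted line only; with a residual standard error of about 14.57, individual counties can sit well above or below them.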
cor(md$bachelors_percent, md$democrat_percent)
[1] 0.5721145
fit1 <- lm(democrat_percent ~ bachelors_percent, data = md)
summary(fit1)
Call:
lm(formula = democrat_percent ~ bachelors_percent, data = md)

Residuals:
    Min      1Q  Median      3Q     Max
-17.912  -8.343  -3.985   5.403  36.608

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        19.2816     9.3448   2.063  0.05107 .
bachelors_percent   0.8313     0.2541   3.272  0.00349 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.57 on 22 degrees of freedom
Multiple R-squared: 0.3273, Adjusted R-squared: 0.2967
F-statistic: 10.7 on 1 and 22 DF, p-value: 0.003487
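As a quick consistency check on the summary, the sketch below recomputes both R-squared values from the correlation reported by `cor()`. It assumes n = 24 counties, inferred from the 22 residual degrees of freedom plus the two estimated coefficients; the variable names are illustrative.

```r
r <- 0.5721145   # correlation reported by cor() in this report
n <- 24          # assumed: 22 residual df + 2 estimated coefficients
p <- 1           # one predictor (bachelors_percent)

# In a one-predictor regression, R-squared is the squared correlation
r_squared <- r^2

# Adjusted R-squared penalizes for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

round(c(r_squared = r_squared, adj_r_squared = adj_r_squared), 4)
```

Both values should agree with the Multiple and Adjusted R-squared lines in the output to the four decimals printed there.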
I cleaned up the dataset, which was originally very messy: the categorical/numerical column labels were part of the data themselves. When combining both sources, I also had to alter the names of specific counties (Saint Mary’s & St Marys) so the two would join properly into the md.csv dataset. I added my own category, “affiliation”, with an if/else statement to put a categorical color on each county’s 2024 voting record for my visualizations, and I removed “Maryland” so that only counties are compared.
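The join code is not shown, so this is only a sketch of one way the county-name mismatch could be handled. The helper `normalize_county`, the data frames `votes` and `education`, and all numeric values are hypothetical; only the two spellings of the county name come from the writeup.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: reduce spelling variants to one join key
normalize_county <- function(x) {
  x %>%
    str_replace("^Saint ", "St ") %>%   # "Saint Mary's" -> "St Mary's"
    str_remove_all("[.'\u2019]") %>%    # drop periods and straight/curly apostrophes
    str_squish()
}

# Made-up rows standing in for the two sources
votes <- data.frame(County = c("Saint Mary's", "County B"),
                    democrat_percent = c(40, 70))
education <- data.frame(County = c("St Marys", "County B"),
                        bachelors_percent = c(30, 60))

joined <- votes %>%
  mutate(County = normalize_county(County)) %>%
  inner_join(education %>% mutate(County = normalize_county(County)),
             by = "County")

joined
```

Running `anti_join()` in both directions before the final join is a cheap way to list any county names that still fail to match.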
The visualization shows how the share of adults with bachelor’s degrees (education level) in a county correlates with its political affiliation in the 2024 election, with the y-axis being the percentage of Democratic affiliation. In my analysis, the regression gives an r value of 0.57, a positive, moderately strong correlation between a county’s Democratic affiliation and its percentage of adults with at least a bachelor’s degree. Dot size represents the population of bachelor’s degree holders, since Maryland’s population varies widely across the state. What I found interesting is that if I had instead used y = republican_percent for the scatter plot, r would have been -0.58 and the same trend would appear with a negative slope, making the statement that counties with higher Republican voting have lower percentages of adults with bachelor’s degrees. Therefore, because my r = 0.57 is positive and my p-value is 0.003, there is a statistically significant positive correlation between bachelor’s degree attainment % (education level) and Democratic Party affiliation at the county level.
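The link between the r value and the p-value can be sketched directly. For a simple linear regression, the t statistic for the slope can be computed from r and the sample size alone; the code below assumes n = 24 counties, inferred from the 22 residual degrees of freedom in the regression output.

```r
r <- 0.5721145   # correlation reported by cor() in this report
n <- 24          # assumed: 22 residual df + 2 estimated coefficients

# t statistic for testing zero correlation (equivalently, zero slope)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

c(t = t_stat, p = p_value)
```

The results should land close to the t value of 3.272 and p-value of 0.003487 in the summary, which is why the correlation and the regression slope tell the same story here.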
I wish I had had more time to include other factors that I also had data for, such as income, % of households with a second language, unemployment, etc. Unfortunately, after troubleshooting problems with joining the datasets, I wasn’t able to add those, so I decided to stick with bachelor’s degree attainment to focus on education level and its correlation with political affiliation in the most recent election. As a result, the adjusted R-squared is only about 0.30, meaning education level alone explains roughly 30% of the variation in Democratic affiliation across counties. Had there been data on other factors, or perhaps individual-level data, my R-squared would likely have been higher and the model would explain more. Still, this is a first step in confirming the relationship between socioeconomic factors like education level and political alignment in Maryland alone.