Problem Statement and Background

The aim of this analysis is to examine the relationship between poverty and education and to identify key factors in an individual’s educational achievement. A closer look at this topic may provide useful insight into the United States’ struggles with its education system, particularly as some political figures look to disband and defund the U.S. Department of Education. This analysis is relevant to state- and country-wide educational boards as they seek solutions for struggling schools and education systems.

The datasets come from several sources, including the US Census Bureau and the Bureau of Labor Statistics. The four datasets used are the Presidential Election Data, Unemployment Data, Poverty Data, and Education Data. Links to these datasets can be found below:

R and R Markdown were used in this project because they are free and open-source, and they let users extend their workflow with libraries and features that other statistical software, such as SAS, does not offer. R provides an easy-to-use and comprehensive toolset of statistical analyses and tests. While these advanced analyses are not used in this project, further work could apply them to provide additional insight into education and the role poverty plays in adults’ educational experience.

This analysis seeks to answer the question: what role does poverty play in U.S. adults’ educational experience?

Data Integration

Data Filtering & Merging

As per project directions, I was tasked with only keeping relevant information within each dataset.

Presidential Election Data Filtering

Before filtering the Presidential Election Data, the dataset had 72617 observations of 12 variables. Below were the instructions for the Presidential Election Data.

The following information should be kept in the data:

  • only use the 2020 election data
  • only keep the data for the two major parties: Democrats and Republicans
  • aggregate the total votes and keep the winning party in the data
  • county FIPS code, State name, county name, total votes received in the winning party, and the name of the party

After filtering the Presidential Election Data, the dataset had 3153 observations of 5 variables.
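
The filtering steps above can be sketched in base R as follows. This is a minimal illustration on a toy table, not the code used in the project: the column names (year, state, county_name, county_fips, party, candidatevotes) and the vote counts are assumptions made up for the example.

```r
# Toy stand-in for the raw election data; column names and values are assumptions.
election <- data.frame(
  year = c(2016, 2020, 2020, 2020, 2020, 2020, 2020),
  state = rep("ALABAMA", 7),
  county_name = c("AUTAUGA", "AUTAUGA", "AUTAUGA", "AUTAUGA",
                  "BALDWIN", "BALDWIN", "BALDWIN"),
  county_fips = c(1001, 1001, 1001, 1001, 1003, 1003, 1003),
  party = c("REPUBLICAN", "DEMOCRAT", "REPUBLICAN", "LIBERTARIAN",
            "DEMOCRAT", "DEMOCRAT", "REPUBLICAN"),
  candidatevotes = c(100, 40, 60, 5, 30, 45, 50),
  stringsAsFactors = FALSE
)

# Keep only 2020 rows for the two major parties.
major <- subset(election,
                year == 2020 & party %in% c("DEMOCRAT", "REPUBLICAN"))

# Sum votes per county and party (a party may appear on several rows).
totals <- aggregate(candidatevotes ~ county_fips + state + county_name + party,
                    data = major, FUN = sum)

# Sort so the largest total comes first within each county, then keep that row.
totals <- totals[order(totals$county_fips, -totals$candidatevotes), ]
winners <- totals[!duplicated(totals$county_fips), ]

# Rename to the five variables described in this report.
names(winners) <- c("county.fips", "state", "county.name",
                    "winning.party", "total.votes")
print(winners)
```

Sorting and then dropping duplicated FIPS codes keeps exactly one row per county, which is the winning party together with its aggregated vote total.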

Unemployment Data Filtering

Before filtering the Unemployment Data, the dataset had 290441 observations of 5 variables. Below were the instructions for the Unemployment Data.

The following information should be kept in the data:

  • Only keep the unemployment data for the year 2020, or for the most recent year if the 2020 unemployment rate is unavailable
  • County FIPS code
  • Unemployment rate

After filtering the Unemployment Data, the dataset had 3193 observations of 3 variables.
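
The "2020 or most recent year" rule can be sketched as below. The example assumes a simplified long table with one row per county and year; the column names (fips, year, unemployment_rate) and the values are hypothetical, and the real file may be laid out differently.

```r
# Toy long-format table; column names and values are assumptions.
unemp <- data.frame(
  fips = c(1001, 1001, 1001, 1003, 1003),
  year = c(2018, 2019, 2020, 2018, 2019),
  unemployment_rate = c(3.6, 2.7, 4.9, 3.6, 2.8)
)

# Consider only years up to 2020 that actually have a rate.
eligible <- unemp[unemp$year <= 2020 & !is.na(unemp$unemployment_rate), ]

# Sort newest year first within each county and keep the first row,
# so 2020 is used when present and the latest earlier year otherwise.
eligible <- eligible[order(eligible$fips, -eligible$year), ]
latest <- eligible[!duplicated(eligible$fips), ]
print(latest)
```

In this toy table, county 1001 keeps its 2020 row, while county 1003 has no 2020 row and falls back to 2019.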

Poverty Data Filtering

Before filtering the Poverty Data, the dataset had 79919 observations of 5 variables. Below were the instructions for the Poverty Data.

The following information should be kept in the data:

  • Keep only 2019 poverty rate [the variable name in the data set: PCTPOVALL_2019]
  • County FIPS code

After filtering the Poverty Data, the dataset had 3193 observations of 3 variables.
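
A possible sketch of this step is shown below. It assumes the poverty file is stored in long format, with one row per county and attribute. That layout, the column names (FIPStxt, Area_name, Attribute, Value), and the values are assumptions for illustration; only the attribute name PCTPOVALL_2019 comes from the instructions above.

```r
# Toy long-format poverty table; layout, names, and values are assumptions.
poverty_long <- data.frame(
  FIPStxt = c(1001, 1001, 1003, 1003),
  Area_name = c("Autauga County", "Autauga County",
                "Baldwin County", "Baldwin County"),
  Attribute = c("POVALL_2019", "PCTPOVALL_2019",
                "POVALL_2019", "PCTPOVALL_2019"),
  Value = c(6000, 12.0, 22000, 10.0),
  stringsAsFactors = FALSE
)

# Keep only the 2019 poverty-rate rows and the columns of interest.
poverty <- poverty_long[poverty_long$Attribute == "PCTPOVALL_2019",
                        c("FIPStxt", "Area_name", "Value")]
names(poverty) <- c("county.fips", "area.name", "Poverty.Rate")
print(poverty)
```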

Education Data Filtering

Before filtering the Education Data, the dataset had 3283 observations of 47 variables. Below were the instructions for the Education Data.

The following information should be kept in the data:

  • only keep the percentage of education levels between 2015 and 2019.
  • Education levels
    • less than a high school diploma
    • high school diploma
    • completed some college (1-3 years)
    • completed four years of college
  • County FIPS code

After filtering the Education Data, the dataset had 3273 observations of 5 variables.
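
Selecting the 2015-2019 percentage columns from a wide table can be sketched as follows. The column names here (fips, pct_less_hs_2015_19, and so on) are hypothetical stand-ins for the much longer names in the source file, and the values are made up.

```r
# Toy wide-format education table; column names and values are assumptions.
education <- data.frame(
  fips = c(1001, 1003),
  pct_less_hs_2000 = c(21.0, 18.0),
  pct_less_hs_2015_19 = c(11.5, 9.0),
  pct_hs_only_2015_19 = c(33.5, 27.5),
  pct_some_college_2015_19 = c(28.5, 31.5),
  pct_bachelors_2015_19 = c(26.5, 32.0)
)

# Keep the FIPS code plus every percentage column for the 2015-19 period.
keep <- grepl("^pct_.*_2015_19$", names(education))
education_small <- education[, c("fips", names(education)[keep])]

# Rename to the variable names used in the final dataset.
names(education_small) <- c("county.fips", "Less.Than.HS", "HS.Only",
                            "Associates", "Bachelors")
print(education_small)
```

Matching on a name pattern rather than on column positions keeps the selection stable if the source file adds or reorders columns.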

Data Merging

When merging the datasets into one large dataset, I first converted each individual dataset’s FIPS value to a character variable.

Then, I joined all 4 individual datasets into one large dataset named Final Data.
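
The two merging steps can be sketched in base R as below. This is an illustration on toy tables, not the project code: it assumes an inner join on the FIPS code, which is one plausible reading of how the row count falls below that of every input, and all values are made up.

```r
# Toy versions of the four filtered tables; values are assumptions.
election <- data.frame(county.fips = c(1001, 1003, 1005),
                       winning.party = c("REPUBLICAN", "DEMOCRAT", "REPUBLICAN"),
                       stringsAsFactors = FALSE)
unemployment <- data.frame(county.fips = c(1001, 1003),
                           area.name = c("Autauga County, AL", "Baldwin County, AL"),
                           unemployment.rate = c(4.9, 5.6),
                           stringsAsFactors = FALSE)
poverty <- data.frame(county.fips = c(1001, 1003, 1007),
                      area.name = c("Autauga County", "Baldwin County", "Bibb County"),
                      Poverty.Rate = c(12.0, 10.0, 20.0),
                      stringsAsFactors = FALSE)
education <- data.frame(county.fips = c(1001, 1003),
                        Bachelors = c(26.5, 32.0))

# Step 1: make the join key a character variable in every table.
tables <- list(election, unemployment, poverty, education)
tables <- lapply(tables, function(df) {
  df$county.fips <- as.character(df$county.fips)
  df
})

# Step 2: inner-join the tables one after another on the FIPS code.
final_data <- Reduce(function(x, y) merge(x, y, by = "county.fips"), tables)
print(names(final_data))
print(nrow(final_data))
```

Because two of the toy tables share a non-key column called area.name, merge() appends the suffixes .x and .y, which matches the area.name.x and area.name.y variables in the final dataset. Note that as.character() on a numeric FIPS code does not restore a leading zero, so the keys only match if every source stores the code the same way.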

Final Data Set

The Final Data dataset now has 3114 observations of 13 variables. Below are the descriptions of each remaining variable:

  • county.fips: A character variable of each county’s FIPS code, which acts as a unique identifier for each county
  • county.name: A character variable of the name for each county
  • state: A character variable of the name for each state
  • total.votes: A numeric variable of the total number of votes received by the winning political party in the 2020 election
  • winning.party: A character variable for the winning political party
  • area.name.x: A character variable for the full name of the county, state abbreviation included
  • unemployment.rate: A numeric variable for the county’s unemployment rate
  • area.name.y: A character variable for the full name of the county
  • Poverty.Rate: A numeric variable for the county’s poverty rate
  • Less.Than.HS: A numeric variable for the percent of the county’s adults whose highest educational attainment is less than a high school diploma, 2015-2019
  • HS.Only: A numeric variable for the percent of the county’s adults whose highest educational attainment is a high school diploma, 2015-2019
  • Associates: A numeric variable for the percent of the county’s adults whose highest educational attainment is some college or an associate’s degree, 2015-2019
  • Bachelors: A numeric variable for the percent of the county’s adults who hold a Bachelor’s degree or higher, 2015-2019

Exploratory Data Analysis

Distribution of Bachelors

The histogram above of Bachelors shows that in the majority of U.S. counties, less than 40% of the adult population has at least a Bachelor’s degree. The highest concentration is around 15-20%. The distribution of Bachelors is right-skewed, indicating that very few counties have a high proportion of adult residents who finished a 4-year college or university. There are a few potential outliers beyond a 70% Bachelor’s degree rate, indicating a higher-than-usual concentration of college graduates.
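
A histogram like the one described can be produced with base R along the following lines. The values below are a small made-up vector standing in for the Bachelors column, so the picture will not match the real one; comparing the mean with the median is used here only as a quick, informal check of right skew.

```r
# Made-up stand-in for the Bachelors column (percent of adults per county).
bachelors <- c(8, 12, 14, 15, 16, 17, 18, 19, 21, 24, 28, 35, 48, 72)

# Bin the values in 5-point steps without drawing, then plot the result.
h <- hist(bachelors, breaks = seq(0, 100, by = 5), plot = FALSE)
plot(h, main = "Distribution of Bachelors",
     xlab = "Adults with at least a Bachelor's degree (%)")

# In a right-skewed distribution the long upper tail pulls the mean
# above the median.
skew_check <- c(mean = mean(bachelors), median = median(bachelors))
print(skew_check)
```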

Relationship between Bachelors and Poverty Rate

The scatterplot of Bachelors and Poverty.Rate offers a potential explanation for the right-skewed distribution of the Bachelors variable. As shown above, there is a negative association between Bachelors and Poverty.Rate: as a county’s poverty rate increases, the percentage of residents with at least a Bachelor’s degree decreases. This negative correlation suggests that adults or soon-to-be adults in higher-poverty areas have a lower chance of finishing a traditional 4-year college or university. There appear to be two counties with a Bachelor’s rate close to 0% and a relatively low poverty rate, suggesting a potential confounding variable that the dataset does not account for, such as access to schooling.
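
The direction and strength of this association can be quantified with a correlation coefficient and a simple fitted line. The sketch below uses a small made-up table, so the numbers it prints say nothing about the real data; it only shows the calls involved.

```r
# Made-up stand-in for the two columns of interest.
toy <- data.frame(
  Poverty.Rate = c(6, 9, 11, 14, 17, 21, 26, 32),
  Bachelors = c(45, 33, 30, 22, 20, 15, 13, 9)
)

# Pearson correlation: a negative value indicates a negative association.
r <- cor(toy$Poverty.Rate, toy$Bachelors)

# Simple linear fit, used only to draw a trend line on the scatterplot.
fit <- lm(Bachelors ~ Poverty.Rate, data = toy)

plot(toy$Poverty.Rate, toy$Bachelors,
     xlab = "Poverty rate (%)",
     ylab = "Adults with at least a Bachelor's degree (%)")
abline(fit, lty = 2)

print(r)
print(coef(fit))
```

A correlation computed this way describes a county-level pattern only; it does not show that poverty causes lower attainment for any individual.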

Pairwise Comparison

To add to this picture, the above scatterplot shows the relationship between Less.Than.HS and Poverty.Rate. Less.Than.HS is the lowest level of educational attainment recorded in this study. There is a slight positive association between Less.Than.HS and Poverty.Rate: as a county’s poverty rate increases, the percentage of residents without a high school diploma also increases. This positive correlation suggests that adults or soon-to-be adults in higher-poverty areas have a higher chance of not graduating high school. This is consistent with the previous scatterplot, which indicated that residents of higher-poverty counties are less likely to attain higher education. There appear to be a few outliers, one of which is near a 75% Less.Than.HS rate. Further research on this county could uncover confounding variables that this dataset does not account for.
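
Several pairwise relationships can be inspected at once with a correlation matrix and a scatterplot matrix. The sketch below again uses made-up values for three of the variables; it is meant to show the approach, not to reproduce the results discussed here.

```r
# Made-up stand-in for three columns of the final dataset.
toy <- data.frame(
  Poverty.Rate = c(6, 9, 11, 14, 17, 21, 26, 32),
  Less.Than.HS = c(5, 7, 8, 12, 11, 16, 19, 24),
  Bachelors = c(45, 33, 30, 22, 20, 15, 13, 9)
)

# Pairwise Pearson correlations between all numeric columns.
cor_mat <- cor(toy)
print(round(cor_mat, 2))

# One scatterplot per pair of variables.
pairs(toy)
```

Reading across the Poverty.Rate row of the matrix gives the sign of each association in one place: positive for Less.Than.HS and negative for Bachelors in this toy example.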

Results and Discussions

This exploratory data analysis provides insight into how adults’ educational experience relates to poverty levels. Counties with a higher level of poverty have a higher proportion of adults who did not graduate high school and a lower proportion of adults with at least a Bachelor’s degree. Together, these findings show a pattern in which high county-level poverty coincides with lower educational attainment. These findings matter to policy makers, who can use this information to target higher-poverty counties and provide more funding for their educational systems, with the aim of improving residents’ educational outcomes.