Poverty in Ohio is not evenly distributed across the state. In this report, we explore data from the American Community Survey (ACS) conducted by the U.S. Census Bureau to show how Southeastern Ohio experiences greater poverty and lower incomes than the rest of the state. First, we’ll prepare the data for analysis. Then, we’ll analyze the data.
In this section, we load packages, read in our data, tidy the columns, and save the cleaned data to a new CSV file.
These packages contain functions that we will use during our analysis.
library(here)
library(tidyverse)
library(skimr)
library(janitor)
library(scales)
We will read the data in from a CSV file and store it as an object named ohio. Then we display the first few rows of the table.
ohio <- read_csv(here("data", "Ohio_data.csv"))
head(ohio)
## # A tibble: 6 × 9
## ...1 GEOid GEOId2 Geography PctFamsPov PctNoHealth MedHHIncom PCTUnemp Region
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 0 39001 39001 Adams 35.1 10.1 36320 7.5 South…
## 2 1 39003 39003 Allen 24.3 7.9 47905 7.2 North…
## 3 2 39005 39005 Ashland 26 9.5 50893 4.2 North…
## 4 3 39007 39007 Ashtabula 38.9 10 43017 7.4 North…
## 5 4 39009 39009 Athens 35.1 6.9 37191 8.7 South…
## 6 5 39011 39011 Auglaize 17.7 4.3 59516 3.4 South…
Next, we will clean the column names to make them more human-readable. We will rename the ge_oid column that clean_names() produces to geo_id, and rename the geography column to county. We will also select a subset of the columns to work with. The first few rows of the cleaned data are displayed.
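cleanohio <- ohio %>%
  clean_names() %>%                 # standardize column names to snake_case
  rename(geo_id = ge_oid) %>%       # clean_names() turns GEOid into ge_oid
  rename(county = geography) %>%
  select(geo_id, county, region, pct_fams_pov, pct_no_health, med_hh_incom,
         pct_unemp)                 # keep only the columns used in the analysis
head(cleanohio)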
## # A tibble: 6 × 7
## geo_id county region pct_fams_pov pct_no_health med_hh_incom pct_unemp
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 39001 Adams Southeast 35.1 10.1 36320 7.5
## 2 39003 Allen Northwest 24.3 7.9 47905 7.2
## 3 39005 Ashland Northeast 26 9.5 50893 4.2
## 4 39007 Ashtabula Northeast 38.9 10 43017 7.4
## 5 39009 Athens Southeast 35.1 6.9 37191 8.7
## 6 39011 Auglaize Southwest 17.7 4.3 59516 3.4
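To see how the renaming works in isolation, here is a minimal sketch that applies janitor's make_clean_names() (the function clean_names() uses on a data frame's column names) to the raw column names shown earlier. The vector raw_names is typed in by hand for illustration rather than read from the file.

```r
library(janitor)

# Raw column names as they appear in the original table (typed by hand here)
raw_names <- c("GEOid", "Geography", "PctFamsPov", "PctNoHealth",
               "MedHHIncom", "PCTUnemp", "Region")

# Convert each name to snake_case
cleaned <- make_clean_names(raw_names)

# Print the mapping from raw name to cleaned name
print(setNames(cleaned, raw_names))
```

Note that GEOid becomes ge_oid rather than geo_id, which is why the extra rename() step is needed.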
Let’s write the cleaned data to a CSV file.
write.csv(cleanohio, "cleanohio_wk3.csv")
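One detail worth knowing about this step: base R's write.csv() writes row names as an unnamed first column by default, so a file saved this way gains an extra column when it is read back in. The sketch below shows the difference using a two-row toy data frame (hypothetical, with values copied from the first two rows above) and temporary files, so nothing in the project folder is touched.

```r
library(readr)

# Toy data frame for illustration only
demo <- data.frame(county = c("Adams", "Allen"),
                   pct_fams_pov = c(35.1, 24.3))

path_base  <- tempfile(fileext = ".csv")
path_readr <- tempfile(fileext = ".csv")

write.csv(demo, path_base)    # base R: includes row names by default
write_csv(demo, path_readr)   # readr: never writes row names

# Count the columns that come back from each file
cols_base  <- ncol(read.csv(path_base))
cols_readr <- ncol(read.csv(path_readr))
print(c(base = cols_base, readr = cols_readr))
```

If the extra column is not wanted, either readr's write_csv() or write.csv(..., row.names = FALSE) avoids it.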
With the data cleaned and prepared, we are ready for analysis.
In this section, we create three plots that tell the story of how poverty in Ohio is concentrated in the Southeast region of the state.
We will start by making a simple bar chart that groups individual counties by region and then summarizes the mean poverty rate for each region.
cleanohio %>%
  group_by(region) %>%
  summarize(mean_pov = mean(pct_fams_pov)) %>%   # one mean poverty rate per region
  ggplot(aes(x = region, y = mean_pov, color = region, fill = region)) +
  geom_col() +
  labs(title = "Mean Family Poverty Rate by Ohio Region",
       x = "Region",
       y = "Mean Percentage of Families in Poverty",
       caption = "Figure 1. The Southeast region of Ohio had the highest mean percentage \n of families in poverty compared to other regions in Ohio.")
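The same grouping can also be inspected as a table. The sketch below is illustrative only: sample_counties holds just the six counties printed by head(cleanohio) above, not the full data set, so the means it produces are not the regional means plotted in Figure 1.

```r
library(dplyr)

# Only the six counties shown in the head() output, not the full data set
sample_counties <- data.frame(
  county = c("Adams", "Allen", "Ashland", "Ashtabula", "Athens", "Auglaize"),
  region = c("Southeast", "Northwest", "Northeast", "Northeast",
             "Southeast", "Southwest"),
  pct_fams_pov = c(35.1, 24.3, 26, 38.9, 35.1, 17.7)
)

# Count counties and average the poverty rate within each region,
# then sort from highest to lowest mean
region_means <- sample_counties %>%
  group_by(region) %>%
  summarize(n_counties = n(), mean_pov = mean(pct_fams_pov)) %>%
  arrange(desc(mean_pov))

print(region_means)
```

Replacing sample_counties with cleanohio would give the values behind the bars in the chart.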
It’s clear that Southeastern Ohio has a higher mean poverty rate compared to other regions in Ohio. But what about individual counties? Does the distribution of poverty rates in individual counties provide more insight into this geographic pattern?
For this plot, we will look at the distribution of the poverty rates in counties across Ohio grouped by their respective regions. We’ll use a violin plot to compare the distributions of poverty rates among the different regions.
cleanohio %>%
  ggplot(aes(x = region, y = pct_fams_pov, color = region)) +
  geom_violin() +   # density of county poverty rates within each region
  geom_point() +    # one point per county
  labs(title = "Distribution of Poverty Rates Across Ohio Regions",
       x = "Region",
       y = "Percentage of Families in Poverty",
       caption = "Figure 2. Distribution of the percentage of families in \n poverty across Ohio. Each point represents an individual county. The violin \n shape of each region represents the density of values within that region. \n The Southeast region has a higher concentration of high-poverty counties.")
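A violin plot shows spread visually; a few summary statistics per region can put numbers on it. This is a minimal sketch under the same caveat as any small example: sample_counties contains only the six counties printed by head(cleanohio), so the output illustrates the technique rather than the real regional spreads.

```r
library(dplyr)

# Only the six counties shown in the head() output, not the full data set
sample_counties <- data.frame(
  county = c("Adams", "Allen", "Ashland", "Ashtabula", "Athens", "Auglaize"),
  region = c("Southeast", "Northwest", "Northeast", "Northeast",
             "Southeast", "Southwest"),
  pct_fams_pov = c(35.1, 24.3, 26, 38.9, 35.1, 17.7)
)

# Minimum, median, maximum, and range of the poverty rate within each region
region_spread <- sample_counties %>%
  group_by(region) %>%
  summarize(min_pov    = min(pct_fams_pov),
            median_pov = median(pct_fams_pov),
            max_pov    = max(pct_fams_pov),
            range_pov  = max_pov - min_pov)

print(region_spread)
```

Run against cleanohio, the range_pov column would indicate which region has the widest spread of county poverty rates.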
The Southeast region shows higher poverty rates and a higher concentration of counties at the upper end of the distribution. The Northeast and Southwest regions have lower poverty rates. The Central region has the largest spread of poverty rates, and the Northeast region has multiple clusters of counties with similar poverty rates. Overall, these patterns highlight the unequal distribution of poverty across Ohio’s geographic regions. What happens when we bring another variable into the equation? Let’s take a look at household income and how it relates to poverty rates.
To explore the relationship between income and poverty, we will use median household income and percentage of families in poverty. We will use the scales package to format the axis numbers in a human-readable way. We will plot the relationship separately for each region and then compare the relationships and fitted trend lines among the different regions.
cleanohio %>%
  filter(med_hh_incom < 80000) %>%   # keep counties with median income below 80,000
  ggplot(aes(x = med_hh_incom, y = pct_fams_pov, color = region)) +
  geom_point() +
  scale_x_continuous(labels = scales::label_comma()) +   # e.g. 40,000 instead of 40000
  geom_smooth() +                    # no method given, so ggplot2 picks the smoother
  facet_wrap(~region) +              # one panel per region
  labs(title = "Income vs Poverty Rates in Ohio Regions",
       x = "Median Household Income",
       y = "Percentage of Families in Poverty",
       caption = "Figure 3. Counties with lower median incomes tend to have \n higher poverty rates. The Southeast region has the highest concentration \n of individual counties in the lowest income range.")
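When geom_smooth() is called without a method, ggplot2 chooses one automatically, and for a data set this small that is a loess curve rather than a straight line. If a straight regression line is preferred, geom_smooth(method = "lm") draws one. The sketch below shows the underlying calculation with a correlation and a simple linear fit. It uses only the six counties printed by head(cleanohio), so the numbers are illustrative and are not results for the full data set.

```r
# Only the six counties shown in the head() output, not the full data set
sample_counties <- data.frame(
  county = c("Adams", "Allen", "Ashland", "Ashtabula", "Athens", "Auglaize"),
  med_hh_incom = c(36320, 47905, 50893, 43017, 37191, 59516),
  pct_fams_pov = c(35.1, 24.3, 26, 38.9, 35.1, 17.7)
)

# Pearson correlation between median household income and family poverty rate
income_pov_cor <- cor(sample_counties$med_hh_incom,
                      sample_counties$pct_fams_pov)

# Ordinary least squares fit: poverty rate as a linear function of income
fit <- lm(pct_fams_pov ~ med_hh_incom, data = sample_counties)

# Express the slope as the change in poverty rate per 1,000 dollars of income
slope_per_1000 <- coef(fit)[["med_hh_incom"]] * 1000

print(income_pov_cor)
print(slope_per_1000)
```

A negative correlation and slope match the pattern described in the Figure 3 caption: lower median incomes go with higher poverty rates.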
Counties in the Southeastern region have the lowest median household incomes and the highest poverty rates.
Poverty in Ohio is concentrated in the Southeast region, which has higher poverty rates and lower incomes than other regions in the state. Economic challenges in Ohio are therefore not evenly distributed, and further investigation is needed to determine appropriate action.