I. Introduction

Poverty in Ohio is not evenly distributed across the state. In this report, we explore data from the American Community Survey (ACS) conducted by the U.S. Census Bureau to show how Southeastern Ohio experiences greater poverty and lower incomes than the rest of the state. First, we’ll prepare the data for analysis. Then, we’ll analyze the data.

II. Data Preparation

In this section, we load packages, read in our data, tidy the columns, and save the cleaned data to a new CSV file.

1. Load Packages

These packages contain functions that we will use during our analysis.

library(here)
library(tidyverse)
library(skimr)
library(janitor)
library(scales)

2. Read Data

We will read the data in from a CSV file and store it in an object named ohio. Then we display the first few rows of the table.

ohio <- read_csv(here("data", "Ohio_data.csv"))

head(ohio)
## # A tibble: 6 × 9
##    ...1 GEOid GEOId2 Geography PctFamsPov PctNoHealth MedHHIncom PCTUnemp Region
##   <dbl> <dbl>  <dbl> <chr>          <dbl>       <dbl>      <dbl>    <dbl> <chr> 
## 1     0 39001  39001 Adams           35.1        10.1      36320      7.5 South…
## 2     1 39003  39003 Allen           24.3         7.9      47905      7.2 North…
## 3     2 39005  39005 Ashland         26           9.5      50893      4.2 North…
## 4     3 39007  39007 Ashtabula       38.9        10        43017      7.4 North…
## 5     4 39009  39009 Athens          35.1         6.9      37191      8.7 South…
## 6     5 39011  39011 Auglaize        17.7         4.3      59516      3.4 South…

3. Tidy Columns in the Data

Next, we will clean the column names to make them more human readable. After cleaning, the GEOid column is named ge_oid, so we will rename it to geo_id, and we will rename the geography column to county. We will also select the subset of columns we need for the analysis. The first few rows of the cleaned data are displayed.

cleanohio <- ohio %>%
  clean_names() %>%
  rename(geo_id = ge_oid, county = geography) %>%
  select(geo_id, county, region, pct_fams_pov, pct_no_health, med_hh_incom,
         pct_unemp)

head(cleanohio)
## # A tibble: 6 × 7
##   geo_id county    region    pct_fams_pov pct_no_health med_hh_incom pct_unemp
##    <dbl> <chr>     <chr>            <dbl>         <dbl>        <dbl>     <dbl>
## 1  39001 Adams     Southeast         35.1          10.1        36320       7.5
## 2  39003 Allen     Northwest         24.3           7.9        47905       7.2
## 3  39005 Ashland   Northeast         26             9.5        50893       4.2
## 4  39007 Ashtabula Northeast         38.9          10          43017       7.4
## 5  39009 Athens    Southeast         35.1           6.9        37191       8.7
## 6  39011 Auglaize  Southwest         17.7           4.3        59516       3.4
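The ge_oid name used in the rename() call comes from the way clean_names() converts the original headers to snake case. The sketch below makes that mapping visible. It assumes a one-row stand-in table (raw, a hypothetical name) built from the column names and the Adams County values shown in the raw output above, not the full file.

```r
library(tibble)
library(janitor)

# Stand-in for the raw data: column names and one row (Adams County)
# copied from the head(ohio) output. This is not the full data set.
raw <- tibble(
  GEOid = 39001, GEOId2 = 39001, Geography = "Adams",
  PctFamsPov = 35.1, PctNoHealth = 10.1, MedHHIncom = 36320,
  PCTUnemp = 7.5, Region = "Southeast"
)

# clean_names() converts every column name to snake_case.
cleaned <- clean_names(raw)

# Show the original and cleaned names side by side.
data.frame(original = names(raw), cleaned = names(cleaned))
```

Printing the side-by-side table before writing rename() and select() is a quick way to confirm which cleaned names to reference.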

4. Write the Data to a csv

Let’s write the cleaned data to a csv.

write.csv(cleanohio, "cleanohio_wk3.csv")
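One thing to be aware of: write.csv() writes row names as an extra unnamed first column by default, and read_csv() labels such a column ...1 when the file is read back. That is consistent with the ...1 column in the raw table above, although how Ohio_data.csv was originally created is an assumption here. The sketch below compares write.csv() with readr's write_csv() on two rows copied from the cleaned table, using temporary files (sample_rows, base_path, and readr_path are illustrative names).

```r
library(readr)

# Two rows copied from the cleaned table above, for illustration only.
sample_rows <- data.frame(
  geo_id = c(39001, 39003),
  county = c("Adams", "Allen"),
  pct_fams_pov = c(35.1, 24.3)
)

# Temporary files so the sketch does not touch the project directory.
base_path  <- tempfile(fileext = ".csv")
readr_path <- tempfile(fileext = ".csv")

write.csv(sample_rows, base_path)   # base R: also writes the row names
write_csv(sample_rows, readr_path)  # readr: writes only the data columns

# Read both files back and compare the column names.
cols_base  <- names(read_csv(base_path, show_col_types = FALSE))
cols_readr <- names(read_csv(readr_path, show_col_types = FALSE))

cols_base   # includes an extra leading column for the row names
cols_readr  # matches names(sample_rows)
```

If the extra column is not wanted, either write_csv() or write.csv(..., row.names = FALSE) avoids it.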

With the data cleaned and prepared, we are ready for analysis.

III. Analysis

In this section, we create three plots that tell the story of how poverty in Ohio is concentrated in the Southeast region of the state.

Plot 1. Poverty is higher in Southeastern Ohio.

We will start by making a simple bar chart that groups individual counties by region and then summarizes the mean poverty rate for each region.

cleanohio %>%
  group_by(region) %>%
  summarize(mean_pov = mean(pct_fams_pov)) %>%
  ggplot(aes(x = region, y = mean_pov, color = region, fill = region)) +
  geom_col() +
  labs(title = "Mean Family Poverty Rate by Ohio Region",
       x = "Region",
       y = "Mean Percentage of Families in Poverty",
       caption = "Figure 1. The Southeast region of Ohio had the highest mean percentage \n of families in poverty compared to other regions in Ohio.")
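The group-and-summarize step that feeds the bar chart can also be inspected as a table. The sketch below runs it on the six counties shown in the head(cleanohio) output only, so the numbers illustrate the mechanics and are not the statewide regional means. The names sample_counties, region_means, and n_counties are illustrative.

```r
library(dplyr)
library(tibble)

# Six counties copied from the head(cleanohio) output above.
# This is NOT the full data set.
sample_counties <- tribble(
  ~county,     ~region,     ~pct_fams_pov,
  "Adams",     "Southeast", 35.1,
  "Allen",     "Northwest", 24.3,
  "Ashland",   "Northeast", 26.0,
  "Ashtabula", "Northeast", 38.9,
  "Athens",    "Southeast", 35.1,
  "Auglaize",  "Southwest", 17.7
)

# Count the counties and average the poverty rate within each region,
# then sort from highest to lowest mean.
region_means <- sample_counties %>%
  group_by(region) %>%
  summarize(n_counties = n(), mean_pov = mean(pct_fams_pov)) %>%
  arrange(desc(mean_pov))

region_means
```

Including the county count next to each mean shows how many observations stand behind each bar.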

It’s clear that Southeastern Ohio has a higher mean poverty rate compared to other regions in Ohio. But what about individual counties? Does the distribution of poverty rates in individual counties provide more insight into this geographic pattern?

Plot 2. Southeastern Ohio has a higher concentration of high-poverty counties.

For this plot, we will look at the distribution of the poverty rates in counties across Ohio grouped by their respective regions. We’ll use a violin plot to compare the distributions of poverty rates among the different regions.

cleanohio %>%
  ggplot(aes(x = region, y = pct_fams_pov, color = region)) +
  geom_violin() +
  geom_point() +
  labs(title = "Distribution of Poverty Rates Across Ohio Regions",
       x = "Region",
       y = "Percentage of Families in Poverty",
       caption = "Figure 2. Distribution of the percentage of families in \n poverty across Ohio. Each point represents an individual county. The violin \n shape of each region represents the density of values within that region. \n The Southeast region has a higher concentration of high-poverty counties.")

The Southeast region shows higher poverty rates and a higher concentration of counties at the upper end of the distribution. The Northeast and Southwest regions have lower poverty rates. The Central region has the largest spread of poverty rates, and the Northeast region has multiple clusters of counties with similar poverty rates. Overall, these patterns highlight the unequal distribution of poverty across Ohio’s geographic regions. What happens when we bring another variable into the equation? Let’s take a look at household income and how it relates to poverty rates.

Plot 3. Relationship between income and poverty.

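To explore the relationship between income and poverty, we will plot median household income against the percentage of families in poverty. The filter() call restricts the plot to counties with a median household income below 80,000, so any county at or above that level is not shown. We will use the scales package to format the income axis labels with comma separators. We will display the relationship separately for each region and add a smoothed trend line to each panel with geom_smooth(), which uses a loess fit by default for data sets of this size. Then we will compare the trends among the regions.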

cleanohio %>%
  filter(med_hh_incom < 80000) %>%
  ggplot(aes(x = med_hh_incom, y = pct_fams_pov, color = region)) + 
  geom_point() +
  scale_x_continuous(labels = scales::label_comma()) +
  geom_smooth() +
  facet_wrap(~region) +
  labs(title = "Income vs Poverty Rates in Ohio Regions",
       x = "Median Household Income",
       y = "Percentage of Families in Poverty",
       caption = "Figure 3. Counties with lower median incomes tend to have \n higher poverty rates. The Southeast region has the highest concentration \n of individual counties in the lowest income range.")

Counties in the Southeastern region have the lowest median household incomes and the highest poverty rates.
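The direction and strength of the income and poverty relationship can also be summarized numerically. The sketch below computes a Pearson correlation and a simple linear fit on the six counties shown in the head(cleanohio) output, so the values are illustrative only. Running the same calls on the full cleanohio table would be the natural next step. The names sample_counties, income_pov_cor, and slope_per_10k are illustrative.

```r
# Six counties copied from the head(cleanohio) output above.
# This is NOT the full data set.
sample_counties <- data.frame(
  county       = c("Adams", "Allen", "Ashland", "Ashtabula", "Athens", "Auglaize"),
  med_hh_incom = c(36320, 47905, 50893, 43017, 37191, 59516),
  pct_fams_pov = c(35.1, 24.3, 26.0, 38.9, 35.1, 17.7)
)

# Pearson correlation: a negative value means poverty falls as income rises.
income_pov_cor <- cor(sample_counties$med_hh_incom,
                      sample_counties$pct_fams_pov)

# Simple linear fit; the slope is the change in poverty rate per dollar
# of median household income, rescaled here to a 10,000 dollar step.
fit <- lm(pct_fams_pov ~ med_hh_incom, data = sample_counties)
slope_per_10k <- unname(coef(fit)["med_hh_incom"]) * 10000

income_pov_cor
slope_per_10k
```

A negative correlation and a negative slope both agree with the downward pattern described for Figure 3.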

IV. Conclusion

Poverty in Ohio is concentrated in the Southeast region, which has higher poverty rates and lower incomes than the other regions of the state. Economic challenges in Ohio are therefore not evenly distributed, and further investigation is needed to determine appropriate action.