Statistical Analysis: Ohio Senior Population

Introduction

Research Question: Is the proportion of the population in Ohio that is over the age of 65 significantly greater than 16% (a common benchmark for an “aging population”)?

In this analysis, I investigate whether the share of Ohio's population aged 65 and older exceeds 16%, a threshold widely used to mark an aging population. A state's demographic makeup matters for policy planning, the distribution of health facilities, and economic projections. If Ohio's senior share is well above this level, demand for healthcare services, senior housing, and age-related social programs is likely to be higher as well.

The data for this study come from the county_complete dataset from OpenIntro. It contains demographic, economic, and social statistics for 3142 counties in the United States, with 188 variables describing different county-level characteristics. In this analysis, I focus on the counties of Ohio and the following main variables:

state: The state name (filtered to "Ohio")
name: County name
pop2017: Population in 2017
age_over_65_2017: Percent of population over 65 years old in 2017

The filtered dataset contains 88 Ohio counties (rows) with the variables listed above. The data were collected by the U.S. Census Bureau and compiled by OpenIntro for educational purposes.

Dataset Source: OpenIntro Data Sets - County Complete

Data Analysis

In this section, I use dplyr functions to filter and summarize the data and ggplot2 to display the distribution of the percentage of the population over 65 across Ohio counties. Data cleaning consisted of checking the key variable age_over_65_2017 for missing values; there were none, so all 88 Ohio counties have complete data on this variable. After filtering to Ohio and selecting 7 variables (name, state, pop2017, age_over_65_2017, median_age_2017, poverty_2017, and median_household_income_2017), the cleaned dataset has 88 rows and 7 columns. Because no values were missing, no rows had to be removed, and the analysis uses the complete set of Ohio counties.
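The cleaning step checks only age_over_65_2017 for missing values. A minimal base-R sketch extends the same check to every selected column at once with colSums(is.na()). It runs on a small hypothetical data frame (toy_counties and its values are invented for illustration, not taken from the dataset); on the real data the same two lines would be applied to ohio_data.

```r
# Hypothetical toy data standing in for ohio_data (values are illustrative only)
toy_counties <- data.frame(
  name = c("County A", "County B", "County C"),
  pop2017 = c(27000, 103000, NA),
  age_over_65_2017 = c(16.6, 16.3, 17.7),
  poverty_2017 = c(23.8, NA, 14.2)
)

# Count missing values in every column at once
missing_by_column <- colSums(is.na(toy_counties))
print(missing_by_column)

# Keep only rows that are complete on all selected columns
toy_complete <- toy_counties[complete.cases(toy_counties), ]
cat("Complete rows:", nrow(toy_complete), "\n")
```

Checking all columns matters once the poverty and income variables are used, since a row complete on age_over_65_2017 may still be missing other values.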

# Load required libraries
library(dplyr)
library(ggplot2)

# Load the county_complete dataset
# (CSV downloaded from OpenIntro; set the working directory or adjust the path to the file's location)
county_complete <- read.csv("county_complete.csv")

# Filter for Ohio counties only
ohio_data <- county_complete %>%
  filter(state == "Ohio") %>%
  select(name, state, pop2017, age_over_65_2017, median_age_2017, 
         poverty_2017, median_household_income_2017)

# Display the first few rows
head(ohio_data)

##               name state pop2017 age_over_65_2017 median_age_2017 poverty_2017
## 1     Adams County  Ohio   27726             16.6            42.2         23.8
## 2     Allen County  Ohio  103198             16.3            38.6         15.0
## 3   Ashland County  Ohio   53628             17.7            40.4         14.2
## 4 Ashtabula County  Ohio   97807             17.6            42.5         19.8
## 5    Athens County  Ohio   66597             11.7            28.6         30.2
## 6  Auglaize County  Ohio   45778             17.0            41.1          9.0
##   median_household_income_2017
## 1                        36320
## 2                        47905
## 3                        50893
## 4                        43017
## 5                        37191
## 6                        59516

# Check for missing values in the key variable
sum(is.na(ohio_data$age_over_65_2017))

## [1] 0

# Remove any rows with missing values in age_over_65_2017
ohio_data_clean <- ohio_data %>%
  filter(!is.na(age_over_65_2017))

# Display dimensions of cleaned data
cat("Number of Ohio counties after cleaning:", nrow(ohio_data_clean), "\n")

## Number of Ohio counties after cleaning: 88

cat("Number of variables:", ncol(ohio_data_clean), "\n")

## Number of variables: 7

# Calculate summary statistics for age_over_65_2017
ohio_data_clean %>%
  summarise(
    Mean = mean(age_over_65_2017),
    Median = median(age_over_65_2017),
    SD = sd(age_over_65_2017),
    Min = min(age_over_65_2017),
    Max = max(age_over_65_2017),
    Q1 = quantile(age_over_65_2017, 0.25),
    Q3 = quantile(age_over_65_2017, 0.75)
  )

##      Mean Median       SD  Min  Max   Q1     Q3
## 1 16.9375     17 2.298191 11.2 25.1 15.8 18.025

# Identify counties with highest percentage of seniors
ohio_data_clean %>%
  arrange(desc(age_over_65_2017)) %>%
  select(name, age_over_65_2017, pop2017) %>%
  head(10)

##                name age_over_65_2017 pop2017
## 1      Noble County             25.1   14406
## 2     Ottawa County             22.8   40657
## 3     Monroe County             21.7   13946
## 4       Erie County             20.2   74817
## 5   Harrison County             20.1   15216
## 6  Jefferson County             20.1   66359
## 7   Trumbull County             19.9  200380
## 8   Crawford County             19.8   41746
## 9   Mahoning County             19.6  229796
## 10    Morgan County             19.6   14709

The 10 Ohio counties with the highest percentage of seniors are all at 19.6% or above, led by Noble County at 25.1%, well beyond the 16% aging benchmark. Most of them are small, largely rural counties, which may reflect out-migration of younger residents, aging in place, or both; Trumbull and Mahoning (each with more than 200,000 residents) are notable exceptions. This clustering suggests that aging-related resource needs are not distributed uniformly across the state.
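How widespread the excess is can be summarized as the share of counties above a cutoff. A minimal base-R sketch, using only the ten values printed in the table above (share_above is a hypothetical helper name); applied to ohio_data_clean$age_over_65_2017 it would give the proportion of all 88 counties above 16%.

```r
# Senior percentages of the 10 oldest Ohio counties, copied from the table above
top10_pct <- c(25.1, 22.8, 21.7, 20.2, 20.1, 20.1, 19.9, 19.8, 19.6, 19.6)

# Proportion of values strictly above a given benchmark
share_above <- function(x, threshold) mean(x > threshold)

share_above(top10_pct, 16)  # 1: all ten exceed the aging benchmark
share_above(top10_pct, 20)  # 0.6: six of the ten exceed 20%
```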

# Create a histogram of age_over_65_2017 distribution
ggplot(ohio_data_clean, aes(x = age_over_65_2017)) +
  geom_histogram(binwidth = 1.5, fill = "steelblue", color = "black", alpha = 0.7) +
  geom_vline(xintercept = 16, color = "red", linetype = "dashed", linewidth = 1.2) +
  geom_vline(xintercept = mean(ohio_data_clean$age_over_65_2017), 
             color = "darkgreen", linetype = "dashed", linewidth = 1.2) +
  labs(
    title = "Distribution of Senior Population (65+) Across Ohio Counties",
    subtitle = "Red line: 16% benchmark | Green line: mean of county percentages",
    x = "Percentage of Population Over 65 (%)",
    y = "Number of Counties"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5))

The distribution of the senior share across Ohio counties is roughly bell-shaped and centered near 17% (mean 16.94%, median 17%), slightly above the 16% benchmark (red line), with most of the 88 counties falling between 14% and 20%. About 30 counties fall in the tallest part of the histogram, around 16% to 17.5%, and because the median county is at 17%, at least half of Ohio counties are already above the aging threshold. The middle of the distribution is nearly symmetric, but a long right tail of counties above 20%, reaching 25.1% in Noble County, points to large geographic differences in aging across the state.
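The shape description can be checked against the summary statistics already printed. A minimal sketch, assuming Bowley's quartile skewness as the measure (one of several possible choices; bowley_skew is a hypothetical helper): it uses Q1, the median, and Q3 from the summary table, then compares how far the minimum and maximum lie from the median.

```r
# Quartile (Bowley) skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1)
# Ranges from -1 to 1; values near 0 mean the middle 50% is symmetric.
bowley_skew <- function(q1, med, q3) (q3 + q1 - 2 * med) / (q3 - q1)

# Quartiles from the summary table above
bowley_skew(q1 = 15.8, med = 17, q3 = 18.025)   # about -0.079

# Distance of the extremes from the median (min 11.2, max 25.1)
c(lower_tail = 17 - 11.2, upper_tail = 25.1 - 17)  # 5.8 vs 8.1
```

A quartile skewness near zero together with a longer upper tail is consistent with a nearly symmetric middle plus a few unusually old counties on the right. On the full data, quantile(ohio_data_clean$age_over_65_2017, c(0.25, 0.5, 0.75)) supplies the inputs.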

# Create a bar chart showing top 15 counties with highest senior population
ohio_data_clean %>%
  arrange(desc(age_over_65_2017)) %>%
  head(15) %>%
  ggplot(aes(x = reorder(name, age_over_65_2017), y = age_over_65_2017)) +
  geom_col(fill = "coral", alpha = 0.8) +
  geom_hline(yintercept = 16, color = "red", linetype = "dashed", linewidth = 1) +
  coord_flip() +
  labs(
    title = "Top 15 Ohio Counties by Senior Population Percentage",
    subtitle = "Red line indicates 16% benchmark",
    x = "County",
    y = "Percentage of Population Over 65 (%)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5))

The aging population in Ohio shows a strong geographic disparity: all 15 leading counties exceed 19%, well above the 16% mark. Noble County ranks first at 25.1%, followed by Ottawa County at 22.8%. These counties are mostly rural and located in eastern and northern Ohio, suggesting a regional concentration of older residents and indicating that some regions face more serious aging pressures than others. The higher shares in rural counties could indicate that younger residents move to urban areas for jobs, leaving behind an older population that needs targeted healthcare and social service planning.

Statistical Analysis

A Single Proportion Hypothesis Test was conducted to determine if the proportion of Ohio’s population over 65 is significantly greater than 16%.

Hypotheses

Define the hypotheses as follows:

Null Hypothesis (H₀): p = 0.16 (The proportion of seniors in Ohio is 16%)
Alternative Hypothesis (H₁): p > 0.16 (The proportion of seniors in Ohio is greater than 16%)

Where p represents the true proportion of Ohio’s population that is over 65 years old.

This is a one-tailed (right-tailed) test because we are specifically testing if the proportion is greater than 16%, not simply different from 16%.

Test Parameters

Significance Level (α): 0.05
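Test Type: One-sample proportion test (right-tailed); R's prop.test reports the result as a chi-squared statistic, which for a single proportion equals the square of the continuity-corrected z statistic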
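To make the link between the z-test and the chi-squared output concrete, here is a minimal sketch of a right-tailed one-sample proportion z-test written out by hand. one_prop_z_test is a hypothetical helper, the counts are a toy example rather than Ohio data, and the helper omits the continuity correction that prop.test applies by default, so it is compared against prop.test(correct = FALSE).

```r
# Right-tailed one-sample proportion z-test, written out by hand.
# Hypothetical helper; no continuity correction.
one_prop_z_test <- function(x, n, p0) {
  p_hat <- x / n
  se0 <- sqrt(p0 * (1 - p0) / n)            # standard error under H0: p = p0
  z <- (p_hat - p0) / se0                   # standardized distance from p0
  p_value <- pnorm(z, lower.tail = FALSE)   # P(Z >= z), matching H1: p > p0
  list(p_hat = p_hat, z = z, p_value = p_value)
}

# Toy example (not Ohio data): 60 "seniors" out of 300 people, tested against 16%
res <- one_prop_z_test(x = 60, n = 300, p0 = 0.16)
res$z        # about 1.89
res$p_value  # about 0.029, below alpha = 0.05

# prop.test without continuity correction reports X-squared = z^2
# and the same right-tailed p-value
pt <- prop.test(x = 60, n = 300, p = 0.16, alternative = "greater", correct = FALSE)
c(x_squared = unname(pt$statistic), z_squared = res$z^2)
```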

# Calculate the overall proportion for Ohio
# We need to calculate weighted mean based on county populations

# Calculate total population and total seniors
total_population <- sum(ohio_data_clean$pop2017)
total_seniors <- sum(ohio_data_clean$pop2017 * ohio_data_clean$age_over_65_2017 / 100)

# Calculate the proportion
p_hat <- total_seniors / total_population

cat("Sample proportion (p-hat):", round(p_hat, 4), "\n")

## Sample proportion (p-hat): 0.1586

cat("Sample proportion as percentage:", round(p_hat * 100, 2), "%\n")

## Sample proportion as percentage: 15.86 %
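The population weighting is why the statewide figure can differ from the average of the county percentages. A toy illustration with three hypothetical counties (the numbers are invented, not Ohio data) shows how one populous, younger county can pull the weighted share far below the unweighted mean. On the real data, weighted.mean(ohio_data_clean$age_over_65_2017, ohio_data_clean$pop2017) / 100 should reproduce p_hat up to floating-point rounding.

```r
# Three hypothetical counties: two small, old ones and one large, young one
pct_over_65 <- c(25, 20, 12)               # percent of residents aged 65+
pop <- c(10000, 50000, 1000000)            # county populations

# Unweighted mean treats every county equally
mean(pct_over_65)                          # 19

# Statewide share, computed the same way as total_seniors / total_population above
sum(pop * pct_over_65 / 100) / sum(pop)    # 0.125

# Equivalent shortcut with base R's weighted.mean (result in percent)
weighted.mean(pct_over_65, w = pop)        # 12.5
```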

# Perform one-sample proportion test
# H0: p = 0.16
# H1: p > 0.16

# Using prop.test for one-tailed test
prop_test_result <- prop.test(
  x = total_seniors,
  n = total_population,
  p = 0.16,
  alternative = "greater",
  conf.level = 0.95
)

# Display the results
prop_test_result

## 
##  1-sample proportions test with continuity correction
## 
## data:  total_seniors out of total_population, null probability 0.16
## X-squared = 163.18, df = 1, p-value = 1
## alternative hypothesis: true p is greater than 0.16
## 95 percent confidence interval:
##  0.1584524 1.0000000
## sample estimates:
##         p 
## 0.1586284

# Extract key statistics
test_statistic <- prop_test_result$statistic
p_value <- prop_test_result$p.value
ci_lower <- prop_test_result$conf.int[1]

# Display formatted results
cat("\n=== Hypothesis Test Results ===\n")

## 
## === Hypothesis Test Results ===

cat("Test Statistic (Chi-squared):", round(test_statistic, 4), "\n")

## Test Statistic (Chi-squared): 163.1828

cat("P-value:", format.pval(p_value, digits = 4), "\n")

## P-value: 1

cat("95% Confidence Interval Lower Bound:", round(ci_lower, 4), "\n")

## 95% Confidence Interval Lower Bound: 0.1585

cat("Sample Proportion:", round(p_hat, 4), "\n")

## Sample Proportion: 0.1586

cat("Significance Level (α):", 0.05, "\n\n")

## Significance Level (α): 0.05

# Decision
if (p_value < 0.05) {
  cat("Decision: REJECT the null hypothesis\n")
  cat("Conclusion: There is sufficient evidence to conclude that the proportion\n")
  cat("of Ohio's population over 65 is significantly greater than 16%.\n")
} else {
  cat("Decision: FAIL TO REJECT the null hypothesis\n")
  cat("Conclusion: There is insufficient evidence to conclude that the proportion\n")
  cat("of Ohio's population over 65 is significantly greater than 16%.\n")
}

## Decision: FAIL TO REJECT the null hypothesis
## Conclusion: There is insufficient evidence to conclude that the proportion
## of Ohio's population over 65 is significantly greater than 16%.

Interpretation

Sample Proportion: The statewide proportion of Ohio's population aged 65 and above is 15.86%, slightly below the 16% benchmark. Although the unweighted mean of the county percentages is 16.94%, the population-weighted figure is lower, which can only happen if the more populous (largely urban) counties tend to have younger populations; their weight pulls the statewide proportion under the threshold.

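P-value: A p-value of 1.0 means that, if the true proportion were exactly 16%, a sample proportion at least as large as the observed 15.86% would almost always occur. Because the observed proportion falls below 16%, on the opposite side from the alternative hypothesis, the data provide no support for p > 0.16. The p-value is not the probability that a hypothesis is true; it measures how compatible the data are with H₀ in the direction of H₁. If anything, the large test statistic indicates that the senior share in Ohio is slightly below the benchmark.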
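Why the p-value is 1 while the test statistic is large can be seen by recovering the signed z statistic from the printed output. A minimal sketch, using the X-squared and the estimate reported by prop.test above; it applies the same sign-and-square-root step that prop.test uses for one-sided alternatives. The left-tailed p-value is shown for contrast only and was not part of the planned test.

```r
# Values reported by prop.test above
x_squared <- 163.18      # X-squared (continuity-corrected)
p_hat <- 0.1586284       # sample estimate
p0 <- 0.16               # null value

# For a one-sample test, X-squared is the square of z; the sign of z
# follows the direction of p_hat relative to p0
z <- sign(p_hat - p0) * sqrt(x_squared)
z                                   # about -12.77

# Right-tailed p-value (H1: p > 0.16), as in the test above
pnorm(z, lower.tail = FALSE)        # essentially 1

# Left-tailed p-value (H1: p < 0.16), for contrast only
pnorm(z)                            # essentially 0
```

Choosing the direction of the alternative after seeing the data would invalidate the test, so the left-tailed value is descriptive only.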

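Statistical Significance: Because the p-value (1.0) is greater than the significance level (α = 0.05), we fail to reject the null hypothesis; the data provide no statistical evidence that more than 16% of Ohio's population is over 65. Note that the county totals cover essentially all of Ohio's residents rather than a random sample, so n runs into the millions and even a tiny gap from 16% yields an extreme test statistic; the observed proportion of 15.86% is itself the most direct answer to the research question.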

Although the share of older residents is high in many individual counties (above 20% in several rural counties), Ohio as a whole does not reach the aging-population benchmark once counties are weighted by population size. The most populous counties, such as Franklin (Columbus), Cuyahoga (Cleveland), and Hamilton (Cincinnati), hold a large part of the state's population, so their age profiles largely determine the statewide figure and offset the older rural counties. The result is a more nuanced picture: some areas face acute aging pressures, while the state overall sits just below the 16% mark at 15.86%.
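Which counties pull the statewide figure below the benchmark can be checked by splitting the gap between the weighted share and 16% into county-level pieces. A minimal sketch on invented data (gap_contribution and the county names are hypothetical); applied to ohio_data_clean with pct = age_over_65_2017 and pop = pop2017, it would show how much Franklin, Cuyahoga, Hamilton, and every other county each move the statewide total.

```r
# Each county's contribution to (statewide share - benchmark), in percentage points.
# Hypothetical helper; contributions sum to weighted.mean(pct, pop) - benchmark.
gap_contribution <- function(pct, pop, benchmark = 16) {
  pop * (pct - benchmark) / sum(pop)
}

# Invented example: two small, older counties and one large, younger county
toy <- data.frame(
  name = c("Rural A", "Rural B", "Metro C"),
  pct  = c(25, 20, 13),
  pop  = c(15000, 40000, 1300000)
)
toy$contribution <- gap_contribution(toy$pct, toy$pop)
toy[order(toy$contribution), ]          # most negative (pulling the total down) first

sum(toy$contribution)                   # statewide share minus 16, in points
weighted.mean(toy$pct, toy$pop) - 16    # same value
```

Negative contributions mark counties that pull the state below 16%; their size depends on both how young the county is and how many people live there.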

Conclusion and Future Directions

Summary of Key Findings

This exploratory data analysis showed substantial variation across Ohio counties, with a mean county-level percentage of 16.94% (SD = 2.30%), ranging from 11.2% in the youngest county to 25.1% in Noble County. However, the population-weighted statewide calculation shows that 15.86% of Ohio's population is over 65, just below the 16% mark.

The one-sample proportion test gave a test statistic of χ² = 163.18 with a p-value of 1.0, so we fail to reject the null hypothesis: there is not enough statistical evidence to conclude that more than 16% of Ohio's population is over 65. The one-sided 95% confidence interval (15.85%, 100%) contains 16%, which is consistent with this decision; its upper limit is 100% simply because a right-tailed test yields a one-sided interval.
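The 100% upper limit follows from how a one-sided interval is built: with a right-tailed alternative, the whole 5% error allowance goes to the lower side. A minimal sketch using the simple Wald approximation, a swapped-in method that is simpler than the Wilson score interval prop.test reports, so its bounds will differ slightly; one_sided_ci is a hypothetical helper and the counts are a toy example, not Ohio data.

```r
# One-sided confidence interval for a proportion (Wald approximation).
# prop.test uses a Wilson score interval with continuity correction instead.
one_sided_ci <- function(x, n, conf_level = 0.95) {
  p_hat <- x / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  lower <- max(0, p_hat - qnorm(conf_level) * se)   # all tail mass on the low side
  c(lower = lower, upper = 1)                        # right-tailed: no upper limit
}

# Toy example (not Ohio data): 60 out of 300
one_sided_ci(x = 60, n = 300)   # lower about 0.162, upper 1
```

Here the lower bound of about 0.162 lies above 0.16, so 16% is excluded; in the Ohio interval, 16% lies above the lower bound and is therefore inside it.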

Implications

Regional Healthcare Planning: Rural counties with senior populations of 20% or more may need priority attention for geriatric care, while urban areas may need different resource allocation strategies.
Economic Policy: If seniors are concentrated in economically struggling rural counties, those counties face compounded challenges: aging populations with limited tax bases to fund needed services.

Future Research Directions

Urban vs. Rural Comparative Study: Explicitly compare resource needs, healthcare utilization, and economic impacts between Ohio's aging rural counties and younger urban centers.
Migration Pattern Analysis: Investigate whether high rural senior percentages result from aging-in-place, in-migration of retirees, or out-migration of working-age residents.
Economic Impact Assessment: Analyze the relationship between senior population percentage, median household income, and poverty rates (variables available in our dataset) to understand economic sustainability in aging counties.
Multi-State Comparison: Compare Ohio to neighboring Midwest states (Pennsylvania, Michigan, Indiana) to contextualize findings within regional demographic trends.
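A first step for the economic impact assessment could be a rank correlation between the senior share and the poverty and income variables already selected. A minimal sketch, assuming Spearman correlation as the measure (an illustrative choice, not a method used in this analysis). It runs on the six rows printed by head(ohio_data) only so that it is self-contained; no conclusion should be drawn from six counties.

```r
# The six counties printed by head(ohio_data) above (Adams through Auglaize)
six <- data.frame(
  age_over_65_2017 = c(16.6, 16.3, 17.7, 17.6, 11.7, 17.0),
  poverty_2017 = c(23.8, 15.0, 14.2, 19.8, 30.2, 9.0),
  median_household_income_2017 = c(36320, 47905, 50893, 43017, 37191, 59516)
)

# Spearman rank correlations between the senior share and the economic variables.
# On the full data, replace `six` with
# ohio_data_clean[, c("age_over_65_2017", "poverty_2017", "median_household_income_2017")]
round(cor(six, method = "spearman"), 2)
```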

References

OpenIntro Statistics. (2017). County Complete Dataset. Retrieved from https://www.openintro.org/data/?data=county_complete