Task: Analyze the provided obesity data and source additional data on healthcare spending to address this question.
Obesity Data: The state-level adult obesity prevalence dataset is provided in the attached file, Obesity 2024.
Per Capita Healthcare Spending Data:
Find and download state-level per capita healthcare spending data from a reliable source, such as the Kaiser Family Foundation (KFF) State Health Expenditures data or the US Census Bureau health spending data. You will then assign states to quintiles based on healthcare spending levels (lowest 20%, second lowest 20%, middle 20%, second highest 20%, highest 20%).
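As a rough illustration of the quintile step, the sketch below ranks a handful of made-up spending figures and cuts the ranks into five equal groups using base R only. The state labels, the dollar values, and the helper name `assign_quintile` are hypothetical, and when the number of states is not divisible by five this helper may place the leftover states in different groups than `dplyr::ntile()` does.

```r
# Toy per capita spending figures (made up for illustration only)
spending <- c(A = 7000, B = 9500, C = 8200, D = 12000, E = 10100,
              F = 7600, G = 11000, H = 8800, I = 9900, J = 13500)

# Hypothetical helper: rank the values, then scale the ranks onto 1..5
assign_quintile <- function(x) {
  ceiling(rank(x, ties.method = "first") * 5 / length(x))
}

quintile <- assign_quintile(spending)
print(quintile)        # 1 = lowest 20% of spenders, 5 = highest 20%
print(table(quintile)) # number of states in each quintile
```

With ten toy states, each quintile holds exactly two of them.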
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
library(ggplot2)
library(dplyr) # For data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readxl) # For reading Excel files
library(readr) # For reading CSV files
library(viridis)
## Loading required package: viridisLite
library(rmarkdown)
# Load the datasets
obesity_data <- read_csv("https://raw.githubusercontent.com/Heleinef/Data-Science-Master_Heleine/refs/heads/main/%20adult%20obesity%20(1).csv")
spending_data <- read_csv("https://raw.githubusercontent.com/Heleinef/Data-Science-Master_Heleine/refs/heads/main/state%20healthcare%20spending.csv")
# Check column names
colnames(obesity_data)
## [1] "Rank" "State" "Obesity %"
# Rename the column using colnames() directly
colnames(obesity_data)[colnames(obesity_data) == "Obesity %"] <- "Obesity_Rate"
# Convert Obesity_Rate column to numeric
obesity_data <- obesity_data %>%
  mutate(Obesity_Rate = as.numeric(gsub("%", "", Obesity_Rate))) # Remove percentage sign
# Clean and prepare the healthcare spending dataset
colnames(spending_data)[colnames(spending_data) == "Total Health Spending"] <- "Total_Health_Spending"
spending_data <- spending_data %>%
  mutate(Total_Health_Spending = as.numeric(gsub("[\\$,]", "", Total_Health_Spending))) # Remove currency symbols and thousands separators
# Merge datasets using the State column
merged_data <- inner_join(obesity_data, spending_data, by = "State")
# Arrange by Total_Health_Spending in descending order
merged_data <- merged_data %>%
  arrange(desc(Total_Health_Spending))
# Display the top rows
head(merged_data)
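An inner join silently drops any state whose name is spelled differently in the two files, so it can be worth checking for unmatched names before relying on the merged table. The sketch below uses two small made-up tables (the values and the name mismatch are invented for illustration) and base R's `setdiff()` to list the states that would be lost.

```r
# Toy tables with made-up values; the third state is named differently in each
obesity_toy <- data.frame(
  State = c("Alabama", "Alaska", "District of Columbia"),
  Obesity_Rate = c(39, 32, 24),
  stringsAsFactors = FALSE
)
spending_toy <- data.frame(
  State = c("Alabama", "Alaska", "Washington DC"),
  Total_Health_Spending = c(9000, 13000, 14000),
  stringsAsFactors = FALSE
)

# States present in one table but missing from the other
only_in_obesity  <- setdiff(obesity_toy$State, spending_toy$State)
only_in_spending <- setdiff(spending_toy$State, obesity_toy$State)

print(only_in_obesity)
print(only_in_spending)
```

If either vector is non-empty on the real data, the names would need to be harmonized before the join; comparing `nrow(merged_data)` with `nrow(obesity_data)` is another quick check.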
# Assign quintiles based on per capita healthcare spending
merged_data <- merged_data %>%
  mutate(Spending_Quintile = ntile(Total_Health_Spending, 5))
head(merged_data)
# Create the box plot
ggplot(merged_data, aes(x = factor(Spending_Quintile), y = Obesity_Rate, fill = factor(Spending_Quintile))) +
  geom_boxplot() +
  scale_fill_viridis(discrete = TRUE) +
  labs(
    title = "Obesity Prevalence Across Spending Quintiles",
    x = "Healthcare Spending Quintile",
    y = "Adult Obesity Rate (%)",
    fill = "Spending Quintile"
  ) +
  theme_minimal()
The box plot suggests a negative association between healthcare spending and obesity rates: states in higher spending quintiles tend to have lower obesity rates. Higher investment in healthcare may contribute to better health outcomes, although this cross-sectional comparison cannot establish that on its own.
The scatter plot below is consistent with the pattern shown in the box plot above.
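The pattern read off a box plot can be checked numerically by computing the median obesity rate within each quintile. The following is a minimal sketch on made-up numbers (the column names mirror `merged_data`, but none of the values come from the real datasets), using base R's `aggregate()`.

```r
# Toy data: three made-up states per spending quintile
toy <- data.frame(
  Spending_Quintile = rep(1:5, each = 3),
  Obesity_Rate = c(38, 40, 36,  37, 35, 39,  34, 36, 33,
                   32, 35, 31,  30, 33, 29)
)

# Median obesity rate within each quintile
medians <- aggregate(Obesity_Rate ~ Spending_Quintile, data = toy, FUN = median)
print(medians)

# TRUE when every quintile has a lower median than the one before it
strictly_decreasing <- all(diff(medians$Obesity_Rate) < 0)
print(strictly_decreasing)
```

Running the same `aggregate()` call on `merged_data` would show whether the real medians decline as steadily as the plot appears to indicate.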
# Create the scatter plot
ggplot(merged_data, aes(x = Total_Health_Spending, y = Obesity_Rate, color = factor(Spending_Quintile))) +
  geom_point(size = 3, alpha = 0.8) + # Scatter points with transparency
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") + # Add a trend line
  scale_color_viridis(discrete = TRUE) + # Use an accessible color palette
  labs(
    title = "Scatter Plot of Healthcare Spending vs. Adult Obesity Prevalence",
    x = "Per Capita Healthcare Spending (USD)",
    y = "Adult Obesity Rate (%)",
    color = "Spending Quintile"
  ) +
  theme_minimal()
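The dashed trend line in the scatter plot can be backed by a number. The sketch below, on six made-up states, computes Pearson and Spearman correlations and the slope of the same kind of linear fit that `geom_smooth(method = "lm")` draws. All values are invented for illustration, and the variable name `slope_per_1000` is hypothetical.

```r
# Toy data: made-up spending and obesity figures for six states
toy <- data.frame(
  Total_Health_Spending = c(7000, 8000, 9000, 10000, 11000, 12000),
  Obesity_Rate = c(39, 37, 36, 33, 32, 30)
)

# Strength and direction of the linear and rank-based association
r   <- cor(toy$Total_Health_Spending, toy$Obesity_Rate)
rho <- cor(toy$Total_Health_Spending, toy$Obesity_Rate, method = "spearman")

# Linear fit: change in obesity rate (percentage points) per extra $1,000 spent
fit <- lm(Obesity_Rate ~ Total_Health_Spending, data = toy)
slope_per_1000 <- coef(fit)[["Total_Health_Spending"]] * 1000

print(c(pearson = r, spearman = rho, slope_per_1000 = slope_per_1000))
```

A negative correlation and slope on the real `merged_data` would support the association described here, though neither says anything about causation.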
The box plot reveals several key insights regarding the relationship between healthcare spending quintiles and obesity rates across various states:
Trend Across Quintiles:
Higher spending quintiles (4th and 5th) tend to show lower median obesity rates, which suggests that states with higher healthcare expenditures may be more effective at addressing obesity.
Lower spending quintiles (1st and 2nd) show higher median obesity rates, which suggests that limited healthcare funding may contribute to higher obesity prevalence.
Variability: The spread (interquartile range) differs from quintile to quintile. The upper quintiles (4th and 5th) are more tightly clustered, suggesting that higher-spending states have more consistent obesity outcomes.
The lower quintiles show larger variation in obesity rates, which may indicate inconsistent or less effective healthcare policies in these states.
Outliers: Notable outliers are present in the lower spending quintiles, where some states show much higher obesity rates than the rest of their quintile. These outliers highlight states that may require targeted interventions to address healthcare gaps.
Actionable Insights:
- Increase healthcare funding in lower-spending states to address obesity rates more effectively.
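The variability and outlier observations above can be made concrete by computing the interquartile range per quintile and flagging states beyond 1.5 times the IQR from the quartiles, the rule `geom_boxplot()` uses by default to draw outlier points. The sketch below runs on made-up data; the state labels, the values, and the helper name `flag_outliers` are hypothetical.

```r
# Toy data: five made-up states in each of two spending quintiles
toy <- data.frame(
  State = paste0("S", 1:10),
  Spending_Quintile = rep(c(1, 5), each = 5),
  Obesity_Rate = c(36, 37, 38, 39, 47,  28, 29, 30, 31, 32)
)

# Interquartile range of obesity rates within each quintile
iqr_by_quintile <- tapply(toy$Obesity_Rate, toy$Spending_Quintile, IQR)
print(iqr_by_quintile)

# Hypothetical helper: TRUE for values beyond 1.5 * IQR from the quartiles
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  fence <- 1.5 * (q[[2]] - q[[1]])
  x < q[[1]] - fence | x > q[[2]] + fence
}

# Apply the rule separately within each quintile
toy$Outlier <- as.logical(ave(toy$Obesity_Rate, toy$Spending_Quintile,
                              FUN = flag_outliers))
print(toy[toy$Outlier, ])
```

Applied to `merged_data`, the flagged rows would name the specific states behind the outlier points, which is where targeted interventions could be examined first.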