Introduction

This report analyzes whether higher per capita healthcare spending is associated with lower adult obesity prevalence at the state level. The goal is to determine whether healthcare investment correlates with better obesity-related public health outcomes.

Load and Inspect the Healthcare Spending and Adult Obesity Prevalence Datasets

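library(readxl)   # read_excel()
library(dplyr)    # %>%, rename(), select(), inner_join(), mutate(), ntile()
library(ggplot2)  # ggplot(), geom_boxplot()
library(viridis)  # scale_fill_viridis()

Hc_Spend <- read_excel("HcSpend.xlsx")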

adult_obesity <- read_excel("adult obesity.xlsx")

head(Hc_Spend)

## # A tibble: 6 × 2
##   Location      `2020__Health Spending per Capita`
##   <chr>                                      <dbl>
## 1 United States                              10191
## 2 Alabama                                     9280
## 3 Alaska                                     13642
## 4 Arizona                                     8756
## 5 Arkansas                                    9338
## 6 California                                 10299

head(adult_obesity)

## # A tibble: 6 × 3
##    Rank State         `Obesity %`
##   <dbl> <chr>               <dbl>
## 1     1 West Virginia       0.412
## 2     2 Louisiana           0.399
## 3     2 Mississippi         0.401
## 4     3 Arkansas            0.4  
## 5     5 Alabama             0.392
## 6     6 Oklahoma            0.387

Data Preparation and Cleaning

str(Hc_Spend)

## tibble [52 × 2] (S3: tbl_df/tbl/data.frame)
##  $ Location                        : chr [1:52] "United States" "Alabama" "Alaska" "Arizona" ...
##  $ 2020__Health Spending per Capita: num [1:52] 10191 9280 13642 8756 9338 ...

Hc_Spend_clean <- Hc_Spend %>%
  rename(Per_Capita_Spend = `2020__Health Spending per Capita`) %>%
  select(Location, Per_Capita_Spend)


str(adult_obesity$`Obesity %`)

##  num [1:51] 0.412 0.399 0.401 0.4 0.392 0.387 0.378 0.378 0.376 0.366 ...

adult_obesity <- adult_obesity %>%
  rename(
    Location = State,
    Obesity_Percent = `Obesity %`
  )

Merge Datasets

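# Inner join keeps only locations present in both tables, so the national
# "United States" row in Hc_Spend_clean is dropped
merged_data <- inner_join(adult_obesity, Hc_Spend_clean, by = "Location")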


# Obesity rates are stored as proportions (0-1); convert them to percentages
if (max(merged_data$Obesity_Percent, na.rm = TRUE) <= 1) {
  merged_data <- merged_data %>%
    mutate(Obesity_Percent = Obesity_Percent * 100)
}
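
Because an inner join silently discards rows that have no match, it can help to list the unmatched locations in each direction before trusting the merged table. The following is a minimal sketch using `anti_join()` from dplyr; the two small data frames (`spend_demo`, `obesity_demo`) are hypothetical stand-ins built from a few of the rows printed above, not the full datasets.

```r
library(dplyr)

# Hypothetical stand-ins for Hc_Spend_clean and adult_obesity,
# using values copied from the printed output above
spend_demo <- data.frame(
  Location = c("United States", "Alabama", "Arkansas"),
  Per_Capita_Spend = c(10191, 9280, 9338)
)
obesity_demo <- data.frame(
  Location = c("Arkansas", "Alabama"),
  Obesity_Percent = c(0.400, 0.392)
)

# Rows of the spending table with no partner in the obesity table
unmatched_spend <- anti_join(spend_demo, obesity_demo, by = "Location")

# Rows of the obesity table with no partner in the spending table
unmatched_obesity <- anti_join(obesity_demo, spend_demo, by = "Location")

print(unmatched_spend)
print(nrow(unmatched_obesity))
```

On the real tables, the same two calls with `Hc_Spend_clean` and `adult_obesity` would show whether anything other than the national aggregate row is lost, for example because of a spelling difference in a state name.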

# Assign each state to a spending quintile (1 = lowest, 5 = highest)
merged_data <- merged_data %>%
  mutate(
    Spending_Quintile = ntile(Per_Capita_Spend, 5),
    Spending_Quintile = factor(
      Spending_Quintile,
      levels = 1:5,
      labels = c("Lowest 20%", "Low", "Middle", "High", "Highest 20%")
    )
  )
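
To make the quintile assignment concrete, the sketch below applies `ntile()` to the ten spending values visible in the printed output. It is only an illustration: `spend_demo` is a hypothetical ten-value subset, so the bins it produces differ from those computed on all 51 locations. `ntile()` ranks the values and splits them into groups whose sizes differ by at most one.

```r
library(dplyr)

# Ten per capita spending values copied from the printed rows (a subset only)
spend_demo <- c(12769, 10515, 9394, 9338, 9280, 9444, 10517, 9789, 9336, 10514)

# Rank-based split into 5 roughly equal-sized groups
quintile_demo <- ntile(spend_demo, 5)

# With 10 values and 5 groups, each group receives exactly 2 values
print(data.frame(spend = spend_demo, quintile = quintile_demo))
print(table(quintile_demo))
```

Because the split is based on rank rather than fixed dollar cutoffs, two states with nearly identical spending can land in adjacent quintiles.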

# Inspect final merged data
head(merged_data)

## # A tibble: 6 × 5
##    Rank Location      Obesity_Percent Per_Capita_Spend Spending_Quintile
##   <dbl> <chr>                   <dbl>            <dbl> <fct>            
## 1     1 West Virginia            41.2            12769 Highest 20%      
## 2     2 Louisiana                39.9            10515 High             
## 3     2 Mississippi              40.1             9394 Low              
## 4     3 Arkansas                 40               9338 Low              
## 5     5 Alabama                  39.2             9280 Low              
## 6     6 Oklahoma                 38.7             9444 Low

print(merged_data)

## # A tibble: 51 × 5
##     Rank Location      Obesity_Percent Per_Capita_Spend Spending_Quintile
##    <dbl> <chr>                   <dbl>            <dbl> <fct>            
##  1     1 West Virginia            41.2            12769 Highest 20%      
##  2     2 Louisiana                39.9            10515 High             
##  3     2 Mississippi              40.1             9394 Low              
##  4     3 Arkansas                 40               9338 Low              
##  5     5 Alabama                  39.2             9280 Low              
##  6     6 Oklahoma                 38.7             9444 Low              
##  7     7 Indiana                  37.8            10517 High             
##  8     7 Iowa                     37.8             9789 Low              
##  9     8 Tennessee                37.6             9336 Low              
## 10     9 Nebraska                 36.6            10514 Middle           
## # ℹ 41 more rows

Boxplot of Obesity Rate by Spending Quintile

ggplot(merged_data, aes(x = Spending_Quintile, y = Obesity_Percent, fill = Spending_Quintile)) +
  geom_boxplot() +
  scale_fill_viridis(discrete = TRUE, option = "D") +
  labs(
    title = "Obesity Prevalence by Healthcare Spending Quintile",
    x = "Healthcare Spending Quintile",
    y = "Adult Obesity Rate (%)",
    fill = "Spending Quintile"
  ) +
  theme_minimal(base_size = 14)

Interpretation

The boxplot suggests a nonlinear relationship between healthcare spending and obesity rates. States in the top spending quintile have a lower median obesity rate and a broader range of outcomes, the middle quintiles have similar obesity levels, and the Low quintile has an unexpectedly higher median obesity rate than the Lowest 20% group. These patterns suggest that, while healthcare spending may play a role, other social and economic factors also substantially influence obesity prevalence.
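
The medians read off the boxplot can also be tabulated directly, and a rank correlation gives a single number for the overall direction of the association. The sketch below is an assumption-laden illustration, not part of the original analysis: `demo` is a hypothetical data frame holding only the ten rows printed earlier, which are the ten highest-obesity locations, so its numbers do not represent the full 51-row result.

```r
library(dplyr)

# Only the ten rows shown in the printed output (NOT the full dataset)
demo <- data.frame(
  Location = c("West Virginia", "Louisiana", "Mississippi", "Arkansas",
               "Alabama", "Oklahoma", "Indiana", "Iowa", "Tennessee", "Nebraska"),
  Obesity_Percent = c(41.2, 39.9, 40.1, 40.0, 39.2, 38.7, 37.8, 37.8, 37.6, 36.6),
  Per_Capita_Spend = c(12769, 10515, 9394, 9338, 9280, 9444, 10517, 9789, 9336, 10514),
  Spending_Quintile = c("Highest 20%", "High", "Low", "Low", "Low", "Low",
                        "High", "Low", "Low", "Middle")
)

# Number of states and median obesity rate within each spending quintile
quintile_summary <- demo %>%
  group_by(Spending_Quintile) %>%
  summarise(
    n_states = n(),
    median_obesity = median(Obesity_Percent),
    .groups = "drop"
  )
print(quintile_summary)

# Spearman rank correlation between spending and obesity (robust to outliers)
rho <- cor(demo$Per_Capita_Spend, demo$Obesity_Percent, method = "spearman")
print(rho)
```

Running the same two steps on `merged_data` instead of `demo` would give the figures that correspond to the boxplot above.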

Healthcare Spending and Adult Obesity: A State-Level Analysis

Woodelyne Durosier

2025-10-05