Health Insurance Coverage in Chicago: A Story of Disparities

Introduction

This visualization portfolio explores health insurance coverage patterns across Chicago’s census tracts, examining the relationship between uninsured rates and socioeconomic factors. The analysis combines spatial mapping with statistical visualization to reveal patterns of healthcare access inequality across the city.

Research Question

Primary Question: How do socioeconomic factors influence health insurance coverage across Chicago’s communities?

This question is explored through three analytical perspectives:

Spatial Patterns: Where are uninsured populations concentrated?
Economic Relationships: How do hardship index and median income correlate with uninsured rates?
Community Factors: How do uninsured rates vary across different income brackets?

Through interactive visualization and statistical analysis, we examine these dimensions to understand healthcare access disparities in Chicago’s census tracts.

Visualization Design and Analysis

The portfolio consists of three complementary visualizations:

Interactive Choropleth Map: The primary visualization uses a color gradient from light yellow (lower uninsured rates) to red (higher uninsured rates) to display the geographic distribution of uninsured populations across Chicago’s census tracts. Interactive features allow users to explore specific areas and view detailed statistics through popups.
Correlation Analysis: Two scatter plots examine the relationships between uninsured rates and socioeconomic indicators (hardship index and median income), with trend lines and color-coding to highlight patterns.
Multi-variable Comparison: A box plot visualization shows the distribution of uninsured rates across different income categories and hardship levels, incorporating individual data points for transparency.

Feedback desired on:

- Color scheme effectiveness for colorblind accessibility

Balance between detail and clarity in the popups

- Additional relevant variables to include

- Alternative visualization methods for showing multi-variable relationships

Design Choices

Graphical Form Selection

Choropleth map: it is my primary spatial visualization, offering an intuitive geographic representation of insurance coverage patterns across Chicago on census tracts level. Its interactive features enable detailed exploration of specific neighborhoods and census tracts, making it particularly effective for identifying spatial clusters and patterns. However, this approach faces some inherent challenges: varying tract sizes can distort visual perception, especially in areas with different population densities. Dense urban areas might appear underrepresented compared to larger, less populated tracts.

Correlation plots: The two scatter plots reveal complementary relationships between uninsured rates and socioeconomic factors in Chicago’s census tracts. The first plot shows a positive correlation between hardship index and uninsured rates, with higher hardship areas generally showing higher uninsured rates. The second plot demonstrates a negative correlation between median income and uninsured rates, with wealthier areas typically having lower uninsured rates. Color gradients in both plots add a third dimension: median income (yellow to purple) in the hardship plot and hardship levels (yellow to purple) in the income plot. Notable outliers in both graphs, particularly areas with uninsured rates above 30%, suggest that while these relationships are strong, other factors also influence insurance coverage.

These visualizations face common limitations: dense clustering of points can obscure patterns in some regions, and the two-dimensional format necessarily simplifies complex community dynamics. However, together they effectively illustrate how economic factors correlate with healthcare coverage across Chicago’s neighborhoods.

Box plots: with overlaid data points offer a sophisticated way to examine distributions and identify outliers while preserving individual tract information. This approach effectively combines summary statistics with raw data, making it particularly useful for comparing groups and identifying unusual patterns. However, the visualization can become visually cluttered when displaying many data points, potentially overwhelming viewers. Its complexity may challenge general audiences, and it doesn’t capture temporal changes in the data. Despite these limitations, the combination of box plots with individual points provides valuable insights into the distribution of uninsured rates across different demographic and economic groups.

Anticipated Challenges

Key challenges in this visualization project include:

- Balancing information density with clarity in the interactive map

- Effectively communicating complex relationships between multiple variables

Data Visualization

library(tidyverse)
library(sf)
library(leaflet)
library(viridis)
library(htmltools)
library(janitor)

health_data <- read_csv("/Users/kohanchen/Documents/2025 Winter/Data Visualization/final-project-Kohaningithub/data/Chicago_Health.csv")

chicago_tracts <- st_read("/Users/kohanchen/Documents/2025 Winter/Data Visualization/final-project-Kohaningithub/data/CensusTractsTIGER2010_20250220/geo_export_842b1c34-da81-4781-b582-60884ca3136c.shp")

## Reading layer `geo_export_842b1c34-da81-4781-b582-60884ca3136c' from data source `/Users/kohanchen/Documents/2025 Winter/Data Visualization/final-project-Kohaningithub/data/CensusTractsTIGER2010_20250220/geo_export_842b1c34-da81-4781-b582-60884ca3136c.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 801 features and 9 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -87.94025 ymin: 41.64429 xmax: -87.52366 ymax: 42.02392
## Geodetic CRS:  WGS84(DD)

# Clean data 
health_data_clean <- health_data %>%
  janitor::clean_names() %>%
  # Skip the metadata rows
  slice(5:n()) %>%
  # Select and rename relevant columns
  select(
    tract_id = geoid,
    uninsured = uns_2019_2023,
    hardship = hdx_2019_2023,
    median_income = inc_2019_2023
  ) %>%
  # Convert columns to numeric
  mutate(
    across(c(uninsured, hardship, median_income), 
           ~as.numeric(str_replace_all(.x, "[%,]", "")))
  ) %>%
  # Remove NA
  filter(!is.na(uninsured))

# Join data with shapefile
# The census tract ID in the shapefile is in 'geoid10' column
chicago_health <- chicago_tracts %>%
  left_join(health_data_clean, by = c("geoid10" = "tract_id"))

Interactive Choropleth Map

# Create a custom color palette from orange (good) to red (concerning)
pal <- colorNumeric(
  palette = "OrRd",  # Orange to Red palette
  domain = chicago_health$uninsured,
  reverse = FALSE
)

map <- leaflet(chicago_health) %>%
  # Use CartoDB positron basemap for better visibility
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(
    fillColor = ~pal(uninsured),
    weight = 1,
    opacity = 1,
    color = "white",
    dashArray = "3",
    fillOpacity = 0.7,
    highlightOptions = highlightOptions(
      weight = 2,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.7,
      bringToFront = TRUE),
    popup = ~paste0(
      "<strong>Census Tract: </strong>", geoid10, "<br>",
      "<strong>Uninsured Rate: </strong>", 
      ifelse(!is.na(uninsured), paste0(round(uninsured, 1), "%"), "No data"), "<br>",
      "<strong>Hardship Index: </strong>", 
      ifelse(!is.na(hardship), round(hardship, 1), "No data"), "<br>",
      "<strong>Median Income: </strong>", 
      ifelse(!is.na(median_income), 
             paste0("$", format(round(median_income), big.mark=",")), 
             "No data")
    )
  ) %>%
  addLegend(
    position = "bottomright",
    pal = pal,
    values = ~uninsured,
    title = "Uninsured Rate (%)",
    opacity = 0.7,
    bins = 7  # More distinct categories
  ) %>%
  # Add zoom control and scale bar
  addScaleBar(position = "bottomleft") %>%
  addMiniMap(
    toggleDisplay = TRUE,
    position = "bottomleft"
  ) %>%
  setView(
    lng = -87.6298,  # Chicago's approximate center
    lat = 41.8781,
    zoom = 10
  )

# Display the map
map

Key Findings

The interactive choropleth map reveals striking patterns of health insurance coverage across Chicago’s census tracts. Using a color gradient from light orange (lower uninsured rates) to dark red (higher uninsured rates), the visualization shows clear geographic disparities in healthcare coverage. The western and northwestern portions of the city show notably higher uninsured rates, with some tracts experiencing rates above 25%. In contrast, the northern and eastern lakefront areas generally display lower uninsured rates, typically below 10%.

A distinct north-south divide emerges, with concentrated pockets of high uninsured rates (darker red) in several west and south side communities. The spatial clustering suggests that insurance coverage challenges are not randomly distributed but follow neighborhood boundaries and likely correlate with other socioeconomic patterns. Gray areas on the map indicate missing data or non-residential zones, providing important context for data interpretation.

The map’s scale (0-35%) effectively captures the full range of uninsured rates across the city, while the interactive features allow users to explore specific neighborhoods and their exact uninsured percentages. This geographic representation helps identify priority areas for healthcare access interventions and community outreach programs.

Correlation Analysis

# Create base theme
plot_theme <- theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, margin = margin(b = 10)),
    axis.title = element_text(margin = margin(t = 10)),
    legend.position = "right"
  )

# First plot: Uninsured Rate vs Hardship Index
hardship_plot <- ggplot(health_data_clean, aes(x = hardship, y = uninsured)) +
  geom_point(aes(color = median_income), alpha = 0.7, size = 2) +
  geom_smooth(method = "lm", color = "#7f0000", se = TRUE) +
  scale_color_viridis_c(
    labels = scales::dollar_format(),
    name = "Median Income",
    option = "plasma"
  ) +
  labs(
    x = "Hardship Index",
    y = "Uninsured Rate (%)",
    title = "Uninsured Rate vs Hardship Index"
  ) +
  plot_theme

# Display the plot
hardship_plot

# Second plot: Uninsured Rate vs Median Income
income_plot <- ggplot(health_data_clean, aes(x = median_income, y = uninsured)) +
  geom_point(aes(color = hardship), alpha = 0.7, size = 2) +
  geom_smooth(method = "lm", color = "#7f0000", se = TRUE) +
  scale_x_continuous(labels = scales::dollar_format()) +
  scale_color_viridis_c(
    name = "Hardship Index",
    option = "viridis"
  ) +
  labs(
    x = "Median Income",
    y = "Uninsured Rate (%)",
    title = "Uninsured Rate vs Median Income"
  ) +
  plot_theme

# Display the plot
income_plot

Key Findings:

Hardship Relationship: As hardship index increases from 0 to 100, uninsured rates tend to rise, with most high-hardship areas (>75) showing uninsured rates above 10%.
Income Impact: Areas with median incomes above $150,000 consistently show lower uninsured rates (<10%), while areas below $50,000 show greater variability and generally higher rates.
Outlier Patterns: Several census tracts deviate significantly from these trends, particularly in the middle-income range ($50,000-$100,000), suggesting other influential factors beyond economic measures.

Multi-variable Comparison

# Create categories for income and hardship for better visualization
health_data_viz <- health_data_clean %>%
  mutate(
    income_category = case_when(
      median_income < 40000 ~ "Under $40k",
      median_income < 80000 ~ "$40k-$80k",
      median_income < 120000 ~ "$80k-$120k",
      TRUE ~ "Over $120k"
    ),
    hardship_level = case_when(
      hardship < 25 ~ "Low",
      hardship < 50 ~ "Moderate",
      hardship < 75 ~ "High",
      TRUE ~ "Very High"
    )
  ) %>%
  mutate(
    income_category = factor(income_category, 
                           levels = c("Under $40k", "$40k-$80k", "$80k-$120k", "Over $120k")),
    hardship_level = factor(hardship_level, 
                          levels = c("Low", "Moderate", "High", "Very High"))
  )

# Create the multi-variable plot
boxplot_plot <- ggplot(health_data_viz, aes(x = income_category, y = uninsured)) +
  geom_hline(aes(linetype = "Chicago Average", 
                 yintercept = mean(health_data_viz$uninsured, na.rm = TRUE)),
             color = "gray40") +
  geom_boxplot(aes(fill = hardship_level), 
               width = 0.7,
               alpha = 0.7,
               position = position_dodge(width = 0.8)) +
  geom_point(aes(color = hardship_level),
             position = position_jitterdodge(
               dodge.width = 0.8,
               jitter.width = 0.2,
               seed = 123
             ),
             alpha = 0.4,
             size = 1) +
  scale_fill_brewer(palette = "YlOrRd") +
  scale_color_brewer(palette = "YlOrRd") +
  scale_linetype_manual(
    values = "dashed",
    labels = sprintf("Chicago Average: %.1f%%", 
                    mean(health_data_viz$uninsured, na.rm = TRUE))
  ) +
  theme_minimal() +
  labs(
    title = "Health Insurance Coverage Disparities Across Chicago",
    subtitle = "Analysis of uninsured rates across census tracts, grouped by income and hardship levels",
    x = "Median Household Income",
    y = "Uninsured Rate (%)",
    fill = "Hardship Level",
    color = "Hardship Level",
    caption = paste0(
      "Data source: Chicago Health Atlas (2019-2023)\n",
      "Each point represents a census tract (N = ", nrow(health_data_viz), " tracts).\n",
      "Box plots show the distribution: median, 25th-75th percentiles, and range.\n",
      "Hardship Index is a composite score reflecting socioeconomic conditions including:\n",
      "poverty rates, unemployment, education levels, and household crowding."
    )
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 10, color = "gray40"),
    axis.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "right",
    legend.title = element_text(face = "bold"),
    legend.box = "vertical",
    legend.spacing.y = unit(0.2, "cm"),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    plot.margin = margin(t = 20, r = 20, b = 20, l = 20)
  ) +
  guides(
    fill = guide_legend(order = 1, title = "Hardship Level"),
    color = "none",
    linetype = guide_legend(order = 2, title = NULL)
  )

# Add summary statistics
cat("\nKey Findings:\n")

## 
## Key Findings:

cat("\n1. Income Level Analysis:\n")

## 
## 1. Income Level Analysis:

health_data_viz %>%
  group_by(income_category) %>%
  summarise(
    mean_uninsured = mean(uninsured, na.rm = TRUE),
    n_tracts = n()
  ) %>%
  arrange(desc(mean_uninsured)) %>%
  mutate(summary = sprintf("   - %s: %.1f%% average uninsured rate (%d tracts)",
                         income_category, mean_uninsured, n_tracts)) %>%
  pull(summary) %>%
  cat(sep = "\n")

##    - $40k-$80k: 12.1% average uninsured rate (362 tracts)
##    - Under $40k: 11.1% average uninsured rate (123 tracts)
##    - $80k-$120k: 8.0% average uninsured rate (186 tracts)
##    - Over $120k: 4.0% average uninsured rate (131 tracts)

cat("\n\n2. Hardship Level Analysis:\n")

## 
## 
## 2. Hardship Level Analysis:

health_data_viz %>%
  group_by(hardship_level) %>%
  summarise(
    mean_uninsured = mean(uninsured, na.rm = TRUE),
    n_tracts = n()
  ) %>%
  arrange(desc(mean_uninsured)) %>%
  mutate(summary = sprintf("   - %s hardship areas: %.1f%% uninsured (%d tracts)",
                         hardship_level, mean_uninsured, n_tracts)) %>%
  pull(summary) %>%
  cat(sep = "\n")

##    - Very High hardship areas: 12.7% uninsured (350 tracts)
##    - High hardship areas: 10.7% uninsured (149 tracts)
##    - Moderate hardship areas: 8.8% uninsured (94 tracts)
##    - Low hardship areas: 4.3% uninsured (209 tracts)

boxplot_plot

Key Findings

Income Effect: Uninsured rates decrease as income rises, with areas under $40k showing rates up to 40%, while areas over $120k typically fall below the city average (9.7%).
Hardship Impact: Within each income group, higher hardship levels (darker colors) consistently show higher uninsured rates, though this effect diminishes in higher income brackets.
Notable Variations: The $40k-$80k income bracket shows the widest range of uninsured rates, suggesting this middle-income group is most sensitive to other community factors beyond income and hardship.

# Load here package for path management
library(htmlwidgets)
library(here)

# Save the interactive map
saveWidget(map, file = here("img", "chicago_health_map.html"))

# Save the correlation plots
ggsave(
  filename = here("img", "uninsured_vs_hardship.png"),
  plot = hardship_plot,
  width = 10,
  height = 7,
  dpi = 300
)

ggsave(
  filename = here("img", "uninsured_vs_income.png"),
  plot = income_plot,
  width = 10,
  height = 7,
  dpi = 300
)

# Save the multi-variable plot
ggsave(
  filename = here("img", "insurance_disparities_boxplot.png"),
  plot = boxplot_plot,
  width = 12,
  height = 8,
  dpi = 300
)

Potential Expansion Plans (Just a plan overview not exact and complete code or graphic outputs for this proposal)

Community Context Analysis

Add demographic variables to understand who is most affected
Analyze relationship between insurance types (employment-based vs. public)
Identify successful communities with unexpectedly good coverage

health_data_clean_2 <- health_data %>%
  # Skip the metadata rows
  slice(5:n()) %>%
  # Convert columns to numeric
  mutate(across(c(`EMP_2019-2023`, `MCD_2019-2023`, 
                  `UNS_2019-2023`, `HDX_2019-2023`), 
                ~as.numeric(str_replace_all(.x, "[%,]", "")))) %>%
  # Remove NA values
  filter(!is.na(`UNS_2019-2023`))

# Example: Insurance type composition
ggplot(health_data_clean_2) +
     geom_point(aes(x = `EMP_2019-2023`, 
                    y = `MCD_2019-2023`,
                    color = `UNS_2019-2023`,
                    size = `HDX_2019-2023`)) +
     scale_color_viridis_c() +
     labs(title = "Employment vs. Medicaid Coverage")

Policy Implications
- Identify “bright spots” - areas with better-than-expected coverage
- Analyze characteristics of successful communities
- Highlight potential intervention points

# Example: Identifying outlier communities
health_data_clean %>%
  # First remove any NA values for the model
  filter(!is.na(uninsured), !is.na(hardship), !is.na(median_income)) %>%
  # Then create the model and predictions
  mutate(
    expected_uninsured = predict(
      lm(uninsured ~ hardship + median_income, data = .)),
    coverage_success = uninsured - expected_uninsured
  ) %>%
  arrange(coverage_success) %>%
  select(tract_id, uninsured, expected_uninsured, coverage_success, 
         hardship, median_income) %>%
  # Show top 10 over-performing and under-performing tracts
  slice(c(1:10, (n()-9):n())) %>%
  # Round numeric columns for readability
  mutate(across(where(is.numeric), round, 2))

## # A tibble: 20 × 6
##    tract_id uninsured expected_uninsured coverage_success hardship median_income
##    <chr>        <dbl>              <dbl>            <dbl>    <dbl>         <dbl>
##  1 1703128…      0.78              13.0             -12.2     90.0         31500
##  2 1703171…      1.5               13.6             -12.1     95.9         28287
##  3 1703183…      2.41              13.8             -11.4     98.6         14074
##  4 1703127…      2.69              13.9             -11.2     99.1         24375
##  5 1703173…      1.35              12.4             -11.1     84.0         57200
##  6 1703183…      2.36              13.2             -10.9     91.9         45234
##  7 1703140…      3.07              13.7             -10.6     97.4         22648
##  8 1703142…      1.39              12.0             -10.6     80.2         38605
##  9 1703171…      0.85              11.4             -10.5     74.2         52885
## 10 1703138…      2.11              12.5             -10.4     86.7         18690
## 11 1703114…     29.0               11.6              17.3     75.8         67321
## 12 1703158…     30.8               13.4              17.4     92.6         67679
## 13 1703177…     31.6               13.6              18.0     95.0         59911
## 14 1703152…     31.9               13.9              18.1     97.3         58713
## 15 1703130…     32.2               13.7              18.4     97.2         31161
## 16 1703184…     32.0               13.2              18.8     91.3         60741
## 17 1703123…     33.4               13.6              19.9     94.4         57220
## 18 1703125…     31.4               11.4              19.9     74.2         64309
## 19 1703114…     30.5                8.77             21.7     47.0        107790
## 20 1703167…     38.8               13.6              25.3     95.2         39000

Interactive Integration
- Create linked visualizations using plotly or ggplotly
- Enable hover details for more information about specific tracts
- Add dynamic filtering options using R Markdown parameters

# Example: Interactive scatter plot
library(plotly)
p <- ggplot(health_data_clean %>% 
            filter(!is.na(uninsured), 
                   !is.na(hardship), 
                   !is.na(median_income))) +
    aes(x = hardship, y = uninsured) +  # Move base aesthetics here
    geom_point(aes(
                   color = median_income,
                   text = paste0(
                     "Census Tract: ", tract_id, "<br>",
                     "Uninsured Rate: ", round(uninsured, 1), "%<br>",
                     "Hardship Index: ", round(hardship, 1), "<br>",
                     "Median Income: $", format(round(median_income), big.mark=",")
                   ))) +
    geom_smooth(method = "lm", 
                color = "#7f0000", 
                se = TRUE) +
    scale_color_viridis_c(
      labels = scales::dollar_format(),
      name = "Median Income",
      option = "plasma"
    ) +
    theme_minimal() +
    labs(x = "Hardship Index",
         y = "Uninsured Rate (%)",
         title = "Relationship between Hardship and Uninsured Rate")

# Convert to plotly with proper tooltip formatting
ggplotly(p, tooltip = "text") %>%
  layout(hoverlabel = list(bgcolor = "white"))

Analysis

Kohan Chen

2025-02-20

Health Insurance Coverage in Chicago: A Story of Disparities

Introduction

Research Question

Visualization Design and Analysis

Design Choices

Graphical Form Selection

Anticipated Challenges

Data Visualization

Interactive Choropleth Map

Key Findings

Correlation Analysis

Key Findings:

Multi-variable Comparison

Key Findings

Potential Expansion Plans (Just a plan overview not exact and complete code or graphic outputs for this proposal)