This visualization portfolio explores health insurance coverage patterns across Chicago’s census tracts, examining the relationship between uninsured rates and socioeconomic factors. The analysis combines spatial mapping with statistical visualization to reveal patterns of healthcare access inequality across the city.
Primary Question: How do socioeconomic factors influence health insurance coverage across Chicago’s communities?
This question is explored through three analytical perspectives:
Through interactive visualization and statistical analysis, we examine these dimensions to understand healthcare access disparities in Chicago’s census tracts.
The portfolio consists of three complementary visualizations:
Interactive Choropleth Map: The primary visualization uses a color gradient from light yellow (lower uninsured rates) to red (higher uninsured rates) to display the geographic distribution of uninsured populations across Chicago’s census tracts. Interactive features allow users to explore specific areas and view detailed statistics through popups.
Correlation Analysis: Two scatter plots examine the relationships between uninsured rates and socioeconomic indicators (hardship index and median income), with trend lines and color-coding to highlight patterns.
Multi-variable Comparison: A box plot visualization shows the distribution of uninsured rates across different income categories and hardship levels, incorporating individual data points for transparency.
Feedback desired on:
- Color scheme effectiveness for colorblind accessibility
- Additional relevant variables to include
- Alternative visualization methods for showing multi-variable relationships
Choropleth map: it is my primary spatial visualization, offering an intuitive geographic representation of insurance coverage patterns across Chicago on census tracts level. Its interactive features enable detailed exploration of specific neighborhoods and census tracts, making it particularly effective for identifying spatial clusters and patterns. However, this approach faces some inherent challenges: varying tract sizes can distort visual perception, especially in areas with different population densities. Dense urban areas might appear underrepresented compared to larger, less populated tracts.
Correlation plots: The two scatter plots reveal complementary relationships between uninsured rates and socioeconomic factors in Chicago’s census tracts. The first plot shows a positive correlation between hardship index and uninsured rates, with higher hardship areas generally showing higher uninsured rates. The second plot demonstrates a negative correlation between median income and uninsured rates, with wealthier areas typically having lower uninsured rates. Color gradients in both plots add a third dimension: median income (yellow to purple) in the hardship plot and hardship levels (yellow to purple) in the income plot. Notable outliers in both graphs, particularly areas with uninsured rates above 30%, suggest that while these relationships are strong, other factors also influence insurance coverage.
These visualizations face common limitations: dense clustering of points can obscure patterns in some regions, and the two-dimensional format necessarily simplifies complex community dynamics. However, together they effectively illustrate how economic factors correlate with healthcare coverage across Chicago’s neighborhoods.
Box plots: with overlaid data points offer a sophisticated way to examine distributions and identify outliers while preserving individual tract information. This approach effectively combines summary statistics with raw data, making it particularly useful for comparing groups and identifying unusual patterns. However, the visualization can become visually cluttered when displaying many data points, potentially overwhelming viewers. Its complexity may challenge general audiences, and it doesn’t capture temporal changes in the data. Despite these limitations, the combination of box plots with individual points provides valuable insights into the distribution of uninsured rates across different demographic and economic groups.
Key challenges in this visualization project include:
- Balancing information density with clarity in the interactive map
- Effectively communicating complex relationships between multiple variables
library(tidyverse)
library(sf)
library(leaflet)
library(viridis)
library(htmltools)
library(janitor)
health_data <- read_csv("/Users/kohanchen/Documents/2025 Winter/Data Visualization/final-project-Kohaningithub/data/Chicago_Health.csv")
chicago_tracts <- st_read("/Users/kohanchen/Documents/2025 Winter/Data Visualization/final-project-Kohaningithub/data/CensusTractsTIGER2010_20250220/geo_export_842b1c34-da81-4781-b582-60884ca3136c.shp")
## Reading layer `geo_export_842b1c34-da81-4781-b582-60884ca3136c' from data source `/Users/kohanchen/Documents/2025 Winter/Data Visualization/final-project-Kohaningithub/data/CensusTractsTIGER2010_20250220/geo_export_842b1c34-da81-4781-b582-60884ca3136c.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 801 features and 9 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -87.94025 ymin: 41.64429 xmax: -87.52366 ymax: 42.02392
## Geodetic CRS: WGS84(DD)
# Clean data
health_data_clean <- health_data %>%
janitor::clean_names() %>%
# Skip the metadata rows
slice(5:n()) %>%
# Select and rename relevant columns
select(
tract_id = geoid,
uninsured = uns_2019_2023,
hardship = hdx_2019_2023,
median_income = inc_2019_2023
) %>%
# Convert columns to numeric
mutate(
across(c(uninsured, hardship, median_income),
~as.numeric(str_replace_all(.x, "[%,]", "")))
) %>%
# Remove NA
filter(!is.na(uninsured))
# Join data with shapefile
# The census tract ID in the shapefile is in 'geoid10' column
chicago_health <- chicago_tracts %>%
left_join(health_data_clean, by = c("geoid10" = "tract_id"))
# Create a custom color palette from orange (good) to red (concerning)
pal <- colorNumeric(
palette = "OrRd", # Orange to Red palette
domain = chicago_health$uninsured,
reverse = FALSE
)
map <- leaflet(chicago_health) %>%
# Use CartoDB positron basemap for better visibility
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(
fillColor = ~pal(uninsured),
weight = 1,
opacity = 1,
color = "white",
dashArray = "3",
fillOpacity = 0.7,
highlightOptions = highlightOptions(
weight = 2,
color = "#666",
dashArray = "",
fillOpacity = 0.7,
bringToFront = TRUE),
popup = ~paste0(
"<strong>Census Tract: </strong>", geoid10, "<br>",
"<strong>Uninsured Rate: </strong>",
ifelse(!is.na(uninsured), paste0(round(uninsured, 1), "%"), "No data"), "<br>",
"<strong>Hardship Index: </strong>",
ifelse(!is.na(hardship), round(hardship, 1), "No data"), "<br>",
"<strong>Median Income: </strong>",
ifelse(!is.na(median_income),
paste0("$", format(round(median_income), big.mark=",")),
"No data")
)
) %>%
addLegend(
position = "bottomright",
pal = pal,
values = ~uninsured,
title = "Uninsured Rate (%)",
opacity = 0.7,
bins = 7 # More distinct categories
) %>%
# Add zoom control and scale bar
addScaleBar(position = "bottomleft") %>%
addMiniMap(
toggleDisplay = TRUE,
position = "bottomleft"
) %>%
setView(
lng = -87.6298, # Chicago's approximate center
lat = 41.8781,
zoom = 10
)
# Display the map
map
The interactive choropleth map reveals striking patterns of health insurance coverage across Chicago’s census tracts. Using a color gradient from light orange (lower uninsured rates) to dark red (higher uninsured rates), the visualization shows clear geographic disparities in healthcare coverage. The western and northwestern portions of the city show notably higher uninsured rates, with some tracts experiencing rates above 25%. In contrast, the northern and eastern lakefront areas generally display lower uninsured rates, typically below 10%.
A distinct north-south divide emerges, with concentrated pockets of high uninsured rates (darker red) in several west and south side communities. The spatial clustering suggests that insurance coverage challenges are not randomly distributed but follow neighborhood boundaries and likely correlate with other socioeconomic patterns. Gray areas on the map indicate missing data or non-residential zones, providing important context for data interpretation.
The map’s scale (0-35%) effectively captures the full range of uninsured rates across the city, while the interactive features allow users to explore specific neighborhoods and their exact uninsured percentages. This geographic representation helps identify priority areas for healthcare access interventions and community outreach programs.
# Create base theme
plot_theme <- theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, margin = margin(b = 10)),
axis.title = element_text(margin = margin(t = 10)),
legend.position = "right"
)
# First plot: Uninsured Rate vs Hardship Index
hardship_plot <- ggplot(health_data_clean, aes(x = hardship, y = uninsured)) +
geom_point(aes(color = median_income), alpha = 0.7, size = 2) +
geom_smooth(method = "lm", color = "#7f0000", se = TRUE) +
scale_color_viridis_c(
labels = scales::dollar_format(),
name = "Median Income",
option = "plasma"
) +
labs(
x = "Hardship Index",
y = "Uninsured Rate (%)",
title = "Uninsured Rate vs Hardship Index"
) +
plot_theme
# Display the plot
hardship_plot
# Second plot: Uninsured Rate vs Median Income
income_plot <- ggplot(health_data_clean, aes(x = median_income, y = uninsured)) +
geom_point(aes(color = hardship), alpha = 0.7, size = 2) +
geom_smooth(method = "lm", color = "#7f0000", se = TRUE) +
scale_x_continuous(labels = scales::dollar_format()) +
scale_color_viridis_c(
name = "Hardship Index",
option = "viridis"
) +
labs(
x = "Median Income",
y = "Uninsured Rate (%)",
title = "Uninsured Rate vs Median Income"
) +
plot_theme
# Display the plot
income_plot
Hardship Relationship: As hardship index increases from 0 to 100, uninsured rates tend to rise, with most high-hardship areas (>75) showing uninsured rates above 10%.
Income Impact: Areas with median incomes above $150,000 consistently show lower uninsured rates (<10%), while areas below $50,000 show greater variability and generally higher rates.
Outlier Patterns: Several census tracts deviate significantly from these trends, particularly in the middle-income range ($50,000-$100,000), suggesting other influential factors beyond economic measures.
# Create categories for income and hardship for better visualization
health_data_viz <- health_data_clean %>%
mutate(
income_category = case_when(
median_income < 40000 ~ "Under $40k",
median_income < 80000 ~ "$40k-$80k",
median_income < 120000 ~ "$80k-$120k",
TRUE ~ "Over $120k"
),
hardship_level = case_when(
hardship < 25 ~ "Low",
hardship < 50 ~ "Moderate",
hardship < 75 ~ "High",
TRUE ~ "Very High"
)
) %>%
mutate(
income_category = factor(income_category,
levels = c("Under $40k", "$40k-$80k", "$80k-$120k", "Over $120k")),
hardship_level = factor(hardship_level,
levels = c("Low", "Moderate", "High", "Very High"))
)
# Create the multi-variable plot
boxplot_plot <- ggplot(health_data_viz, aes(x = income_category, y = uninsured)) +
geom_hline(aes(linetype = "Chicago Average",
yintercept = mean(health_data_viz$uninsured, na.rm = TRUE)),
color = "gray40") +
geom_boxplot(aes(fill = hardship_level),
width = 0.7,
alpha = 0.7,
position = position_dodge(width = 0.8)) +
geom_point(aes(color = hardship_level),
position = position_jitterdodge(
dodge.width = 0.8,
jitter.width = 0.2,
seed = 123
),
alpha = 0.4,
size = 1) +
scale_fill_brewer(palette = "YlOrRd") +
scale_color_brewer(palette = "YlOrRd") +
scale_linetype_manual(
values = "dashed",
labels = sprintf("Chicago Average: %.1f%%",
mean(health_data_viz$uninsured, na.rm = TRUE))
) +
theme_minimal() +
labs(
title = "Health Insurance Coverage Disparities Across Chicago",
subtitle = "Analysis of uninsured rates across census tracts, grouped by income and hardship levels",
x = "Median Household Income",
y = "Uninsured Rate (%)",
fill = "Hardship Level",
color = "Hardship Level",
caption = paste0(
"Data source: Chicago Health Atlas (2019-2023)\n",
"Each point represents a census tract (N = ", nrow(health_data_viz), " tracts).\n",
"Box plots show the distribution: median, 25th-75th percentiles, and range.\n",
"Hardship Index is a composite score reflecting socioeconomic conditions including:\n",
"poverty rates, unemployment, education levels, and household crowding."
)
) +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 10, color = "gray40"),
axis.title = element_text(face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "right",
legend.title = element_text(face = "bold"),
legend.box = "vertical",
legend.spacing.y = unit(0.2, "cm"),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
plot.margin = margin(t = 20, r = 20, b = 20, l = 20)
) +
guides(
fill = guide_legend(order = 1, title = "Hardship Level"),
color = "none",
linetype = guide_legend(order = 2, title = NULL)
)
# Add summary statistics
cat("\nKey Findings:\n")
##
## Key Findings:
cat("\n1. Income Level Analysis:\n")
##
## 1. Income Level Analysis:
health_data_viz %>%
group_by(income_category) %>%
summarise(
mean_uninsured = mean(uninsured, na.rm = TRUE),
n_tracts = n()
) %>%
arrange(desc(mean_uninsured)) %>%
mutate(summary = sprintf(" - %s: %.1f%% average uninsured rate (%d tracts)",
income_category, mean_uninsured, n_tracts)) %>%
pull(summary) %>%
cat(sep = "\n")
## - $40k-$80k: 12.1% average uninsured rate (362 tracts)
## - Under $40k: 11.1% average uninsured rate (123 tracts)
## - $80k-$120k: 8.0% average uninsured rate (186 tracts)
## - Over $120k: 4.0% average uninsured rate (131 tracts)
cat("\n\n2. Hardship Level Analysis:\n")
##
##
## 2. Hardship Level Analysis:
health_data_viz %>%
group_by(hardship_level) %>%
summarise(
mean_uninsured = mean(uninsured, na.rm = TRUE),
n_tracts = n()
) %>%
arrange(desc(mean_uninsured)) %>%
mutate(summary = sprintf(" - %s hardship areas: %.1f%% uninsured (%d tracts)",
hardship_level, mean_uninsured, n_tracts)) %>%
pull(summary) %>%
cat(sep = "\n")
## - Very High hardship areas: 12.7% uninsured (350 tracts)
## - High hardship areas: 10.7% uninsured (149 tracts)
## - Moderate hardship areas: 8.8% uninsured (94 tracts)
## - Low hardship areas: 4.3% uninsured (209 tracts)
boxplot_plot
Income Effect: Uninsured rates decrease as income rises, with areas under $40k showing rates up to 40%, while areas over $120k typically fall below the city average (9.7%).
Hardship Impact: Within each income group, higher hardship levels (darker colors) consistently show higher uninsured rates, though this effect diminishes in higher income brackets.
Notable Variations: The $40k-$80k income bracket shows the widest range of uninsured rates, suggesting this middle-income group is most sensitive to other community factors beyond income and hardship.
# Load here package for path management
library(htmlwidgets)
library(here)
# Save the interactive map
saveWidget(map, file = here("img", "chicago_health_map.html"))
# Save the correlation plots
ggsave(
filename = here("img", "uninsured_vs_hardship.png"),
plot = hardship_plot,
width = 10,
height = 7,
dpi = 300
)
ggsave(
filename = here("img", "uninsured_vs_income.png"),
plot = income_plot,
width = 10,
height = 7,
dpi = 300
)
# Save the multi-variable plot
ggsave(
filename = here("img", "insurance_disparities_boxplot.png"),
plot = boxplot_plot,
width = 12,
height = 8,
dpi = 300
)
Community Context Analysis
health_data_clean_2 <- health_data %>%
# Skip the metadata rows
slice(5:n()) %>%
# Convert columns to numeric
mutate(across(c(`EMP_2019-2023`, `MCD_2019-2023`,
`UNS_2019-2023`, `HDX_2019-2023`),
~as.numeric(str_replace_all(.x, "[%,]", "")))) %>%
# Remove NA values
filter(!is.na(`UNS_2019-2023`))
# Example: Insurance type composition
ggplot(health_data_clean_2) +
geom_point(aes(x = `EMP_2019-2023`,
y = `MCD_2019-2023`,
color = `UNS_2019-2023`,
size = `HDX_2019-2023`)) +
scale_color_viridis_c() +
labs(title = "Employment vs. Medicaid Coverage")
Policy Implications
# Example: Identifying outlier communities
health_data_clean %>%
# First remove any NA values for the model
filter(!is.na(uninsured), !is.na(hardship), !is.na(median_income)) %>%
# Then create the model and predictions
mutate(
expected_uninsured = predict(
lm(uninsured ~ hardship + median_income, data = .)),
coverage_success = uninsured - expected_uninsured
) %>%
arrange(coverage_success) %>%
select(tract_id, uninsured, expected_uninsured, coverage_success,
hardship, median_income) %>%
# Show top 10 over-performing and under-performing tracts
slice(c(1:10, (n()-9):n())) %>%
# Round numeric columns for readability
mutate(across(where(is.numeric), round, 2))
## # A tibble: 20 × 6
## tract_id uninsured expected_uninsured coverage_success hardship median_income
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1703128… 0.78 13.0 -12.2 90.0 31500
## 2 1703171… 1.5 13.6 -12.1 95.9 28287
## 3 1703183… 2.41 13.8 -11.4 98.6 14074
## 4 1703127… 2.69 13.9 -11.2 99.1 24375
## 5 1703173… 1.35 12.4 -11.1 84.0 57200
## 6 1703183… 2.36 13.2 -10.9 91.9 45234
## 7 1703140… 3.07 13.7 -10.6 97.4 22648
## 8 1703142… 1.39 12.0 -10.6 80.2 38605
## 9 1703171… 0.85 11.4 -10.5 74.2 52885
## 10 1703138… 2.11 12.5 -10.4 86.7 18690
## 11 1703114… 29.0 11.6 17.3 75.8 67321
## 12 1703158… 30.8 13.4 17.4 92.6 67679
## 13 1703177… 31.6 13.6 18.0 95.0 59911
## 14 1703152… 31.9 13.9 18.1 97.3 58713
## 15 1703130… 32.2 13.7 18.4 97.2 31161
## 16 1703184… 32.0 13.2 18.8 91.3 60741
## 17 1703123… 33.4 13.6 19.9 94.4 57220
## 18 1703125… 31.4 11.4 19.9 74.2 64309
## 19 1703114… 30.5 8.77 21.7 47.0 107790
## 20 1703167… 38.8 13.6 25.3 95.2 39000
Interactive Integration
# Example: Interactive scatter plot
library(plotly)
p <- ggplot(health_data_clean %>%
filter(!is.na(uninsured),
!is.na(hardship),
!is.na(median_income))) +
aes(x = hardship, y = uninsured) + # Move base aesthetics here
geom_point(aes(
color = median_income,
text = paste0(
"Census Tract: ", tract_id, "<br>",
"Uninsured Rate: ", round(uninsured, 1), "%<br>",
"Hardship Index: ", round(hardship, 1), "<br>",
"Median Income: $", format(round(median_income), big.mark=",")
))) +
geom_smooth(method = "lm",
color = "#7f0000",
se = TRUE) +
scale_color_viridis_c(
labels = scales::dollar_format(),
name = "Median Income",
option = "plasma"
) +
theme_minimal() +
labs(x = "Hardship Index",
y = "Uninsured Rate (%)",
title = "Relationship between Hardship and Uninsured Rate")
# Convert to plotly with proper tooltip formatting
ggplotly(p, tooltip = "text") %>%
layout(hoverlabel = list(bgcolor = "white"))