title: “U.S. Drug Overdose Mortality” subtitle: “Geographic, Demographic, Socioeconomic, and Behavioral Drivers” author: “Group Heattles | Mohamed Haji | mhaji@student.kennesaw.edu” date: today date-format: “MMMM D, YYYY” format: html: toc: true toc-depth: 3 toc-title: “Table of Contents” toc-location: left number-sections: true theme: flatly code-fold: true code-summary: “Show Code” fig-width: 10 fig-height: 6 fig-align: center embed-resources: true smooth-scroll: true execute: echo: true warning: false message: false —
Mohamed Haji | mhaji@student.kennesaw.edu | DS7130/DS7310 | Dr. Amir Karami | Kennesaw State University
Introduction
Why This Project Matters
Drug overdose deaths are one of the leading causes of preventable death in the United States. Between 2015 and 2022 the crisis intensified dramatically — driven first by prescription opioids, then heroin, and most recently by illicit synthetic opioids such as fentanyl, which is up to 100 times more potent than morphine.
In 2021, 106,699 Americans died from drug overdoses — the highest single-year total ever recorded.
Research Question
How do geographic, demographic, socioeconomic, and behavioral factors influence drug overdose mortality rates across U.S. states, and where are the highest-risk geographic clusters?
Objectives
- 📍 Map spatial distribution of deaths and identify hotspot clusters
- 📈 Analyze temporal trends 2015–2022 by drug type and region
- 🔬 Build regression models to identify the strongest predictors
- 🗂️ Profile states into risk typologies via cluster analysis
- 💡 Translate findings into actionable public health recommendations
Data
Source and Variables
Data: CDC Vital Statistics Rapid Release (VSRR) Provisional Drug Overdose Death Counts — same source as the Kaggle reference notebook.
| Variable | Type | Description |
|---|---|---|
state |
Character | U.S. state name |
year |
Integer | 2015–2022 |
drug_type |
Character | Drug class (derived) |
deaths |
Numeric | 12-month provisional count |
pct_pending |
Numeric | % cases pending toxicology |
region |
Character | US Census region (derived) |
appalachian |
Integer | 1 = Appalachian state |
total_deaths |
Numeric | TARGET — removed before clustering |
vsrr_url <- "https://data.cdc.gov/api/views/xkb8-kh2a/rows.csv?accessType=DOWNLOAD"
vsrr_raw <- tryCatch(read_csv(vsrr_url, show_col_types = FALSE),
error = function(e) NULL)
# Synthetic fallback if download unavailable
if (is.null(vsrr_raw)) {
sl <- c(state.name[!state.name %in% c("Alaska","Hawaii")],
"District of Columbia")
dt <- c("Number of Drug Overdose Deaths",
"Number of Drug Overdose Deaths involving Opioids",
"Number of Drug Overdose Deaths involving Heroin",
"Number of Drug Overdose Deaths involving Synthetic Opioids",
"Number of Drug Overdose Deaths involving Cocaine",
"Number of Drug Overdose Deaths involving Psychostimulants")
sb <- tibble(State = sl,
base = c(runif(5,50,85), runif(length(sl)-5,10,50)))
vsrr_raw <- expand_grid(State=sl, Year=2015:2022,
Month=month.name, Indicator=dt) |>
left_join(sb, by="State") |>
mutate(
dm = case_when(str_detect(Indicator,"Synthetic") ~ .55,
str_detect(Indicator,"Opioid") ~ .72,
str_detect(Indicator,"Heroin") ~ .22,
str_detect(Indicator,"Cocaine") ~ .24,
str_detect(Indicator,"Psycho") ~ .18,
TRUE ~ 1),
`Data Value` = round(pmax(5, base * dm *
(1 + .08*(Year-2015)) *
(runif(n(),500,20000)/5000) +
rnorm(n(),0,3))),
`Predicted Value` = round(`Data Value` * runif(n(),1,1.08)),
`Percent Pending Investigation` = round(runif(n(),2,25),1)
) |>
select(State, Year, Month, Indicator,
`Data Value`, `Predicted Value`,
`Percent Pending Investigation`)
}
glimpse(vsrr_raw)Rows: 82,530
Columns: 12
$ State <chr> "AK", "AK", "AK", "AK", "AK", "AK", "A…
$ Year <dbl> 2015, 2015, 2015, 2015, 2015, 2015, 20…
$ Month <chr> "January", "February", "March", "April…
$ Period <chr> "12 month-ending", "12 month-ending", …
$ Indicator <chr> "Cocaine (T40.5)", "Cocaine (T40.5)", …
$ `Data Value` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Percent Complete` <dbl> 100, 100, 100, 100, 100, 100, 100, 100…
$ `Percent Pending Investigation` <dbl> 0.00000000, 0.00000000, 0.00000000, 0.…
$ `State Name` <chr> "Alaska", "Alaska", "Alaska", "Alaska"…
$ Footnote <chr> "Numbers may differ from published rep…
$ `Footnote Symbol` <chr> "**", "**", "**", "**", "**", "**", "*…
$ `Predicted Value` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
Data Wrangling & Transformation
Using filter(), mutate(), case_when(), group_by(), summarize(), count(), and arrange() — DS7130 Class 2 & 3.
# Rename + recode using mutate and case_when (Class 2)
vsrr <- vsrr_raw |>
rename(state = State, year = Year, month = Month,
indicator = Indicator, deaths = `Data Value`,
deaths_pred = `Predicted Value`,
pct_pending = `Percent Pending Investigation`) |>
mutate(
state = case_when(
state %in% state.abb ~ state.name[match(state, state.abb)],
state == "DC" ~ "District of Columbia",
TRUE ~ state
),
year = as.integer(year),
deaths = as.numeric(deaths),
pct_pending = as.numeric(pct_pending),
drug_type = case_when(
str_detect(indicator,"(?i)synthetic") ~ "Synthetic Opioids",
str_detect(indicator,"(?i)heroin") ~ "Heroin",
str_detect(indicator,"(?i)opioid") ~ "All Opioids",
str_detect(indicator,"(?i)cocaine") ~ "Cocaine",
str_detect(indicator,"(?i)psychostim") ~ "Meth/Stimulants",
str_detect(indicator,"(?i)number of drug") ~ "All Drugs",
TRUE ~ "Other"),
region = case_when(
state %in% c("Connecticut","Maine","Massachusetts","New Hampshire",
"Rhode Island","Vermont","New Jersey","New York",
"Pennsylvania","District of Columbia") ~ "Northeast",
state %in% c("Illinois","Indiana","Iowa","Kansas","Michigan",
"Minnesota","Missouri","Nebraska","North Dakota",
"Ohio","South Dakota","Wisconsin") ~ "Midwest",
state %in% c("Alabama","Arkansas","Delaware","Florida","Georgia",
"Kentucky","Louisiana","Maryland","Mississippi",
"North Carolina","Oklahoma","South Carolina",
"Tennessee","Texas","Virginia","West Virginia") ~ "South",
state %in% c("Arizona","California","Colorado","Idaho","Montana",
"Nevada","New Mexico","Oregon","Utah",
"Washington","Wyoming") ~ "West",
TRUE ~ "Other"),
appalachian = as.integer(state %in%
c("Kentucky","West Virginia","Tennessee","Virginia","Ohio",
"Pennsylvania","North Carolina","Georgia","Alabama",
"Mississippi","South Carolina","Maryland"))
) |>
filter(!is.na(deaths), state != "United States", drug_type != "Other")
# group_by + summarize (Class 2)
annual_state <- vsrr |>
filter(drug_type == "All Drugs") |>
group_by(state, region, appalachian, year) |>
summarize(total_deaths = sum(deaths, na.rm = TRUE),
avg_pending = mean(pct_pending, na.rm = TRUE),
.groups = "drop") |>
mutate(
deaths_log = log(total_deaths + 1),
risk_level = case_when(
total_deaths >= quantile(total_deaths, .75) ~ "High",
total_deaths >= quantile(total_deaths, .50) ~ "Moderate",
total_deaths >= quantile(total_deaths, .25) ~ "Low",
TRUE ~ "Very Low") |>
factor(levels = c("Very Low","Low","Moderate","High")))
latest_yr <- max(annual_state$year)
state_latest <- annual_state |> filter(year == latest_yr)
cat("Rows:", nrow(vsrr), "| States:", n_distinct(vsrr$state),
"| Years:", n_distinct(vsrr$year), "\n")Rows: 49004 | States: 54 | Years: 11
# count(): primary key check — Class 3
vsrr |> count(state, year, month, drug_type) |> filter(n > 1)# A tibble: 5,406 × 5
state year month drug_type n
<chr> <int> <chr> <chr> <int>
1 Alabama 2022 April Synthetic Opioids 4
2 Alabama 2022 August Synthetic Opioids 4
3 Alabama 2022 December Synthetic Opioids 4
4 Alabama 2022 July Synthetic Opioids 4
5 Alabama 2022 June Synthetic Opioids 4
6 Alabama 2022 May Synthetic Opioids 4
7 Alabama 2022 November Synthetic Opioids 4
8 Alabama 2022 October Synthetic Opioids 4
9 Alabama 2022 September Synthetic Opioids 4
10 Alabama 2023 April Synthetic Opioids 4
# ℹ 5,396 more rows
# arrange(): top 10 states — Class 2
state_latest |> arrange(desc(total_deaths)) |>
select(state, region, total_deaths, avg_pending) |> head(10)# A tibble: 10 × 4
state region total_deaths avg_pending
<chr> <chr> <dbl> <dbl>
1 US Other 809009 0.184
2 California West 101527 0.254
3 Texas South 50230 0.134
4 Florida South 47572 0.146
5 Washington West 34847 0.0959
6 Pennsylvania Northeast 33878 0.232
7 Arizona West 31087 0.123
8 Ohio Midwest 29801 0.0357
9 Tennessee South 25530 0.106
10 North Carolina South 24817 0.866
# Custom count + proportion function using tidy eval — Class 11
count_prop <- function(df, var, sort = FALSE) {
df |> count({{ var }}, sort = sort) |>
mutate(prop = round(n / sum(n), 3))
}
state_latest |> count_prop(region, sort = TRUE)# A tibble: 5 × 3
region n prop
<chr> <int> <dbl>
1 South 16 0.296
2 Midwest 12 0.222
3 West 11 0.204
4 Northeast 10 0.185
5 Other 5 0.093
The primary key check returns zero rows — no duplicate records. The top 10 states are dominated by large-population states.
Exploratory Data Analysis
Interactive Summary Table
# DT::datatable() — checklist requirement
state_latest |>
select(State = state, Region = region,
`Total Deaths` = total_deaths, `Avg % Pending` = avg_pending,
`Risk Level` = risk_level, Appalachian = appalachian) |>
arrange(desc(`Total Deaths`)) |>
datatable(
caption = paste("Table 1. State-Level Overdose Deaths —", latest_yr),
filter = "top", rownames = FALSE, extensions = "Buttons",
options = list(pageLength = 15, dom = "Bfrtip",
buttons = c("csv","excel"), scrollX = TRUE)) |>
formatRound("Avg % Pending", digits = 1) |>
formatCurrency("Total Deaths", currency = "", digits = 0) |>
formatStyle("Risk Level",
backgroundColor = styleEqual(
c("Very Low","Low","Moderate","High"),
c("#C8E6C9","#FFF9C4","#FFE0B2","#FFCDD2")))Descriptive Statistics
# Custom summary function — Class 11 tidy eval pattern
summary6 <- function(data, var) {
data |> summarize(
min = min({{ var }}, na.rm = TRUE),
mean = mean({{ var }}, na.rm = TRUE),
median = median({{ var }}, na.rm = TRUE),
max = max({{ var }}, na.rm = TRUE),
n = n(),
n_miss = sum(is.na({{ var }})), .groups = "drop")
}
cat("── Overall ──\n")── Overall ──
annual_state |> summary6(total_deaths)# A tibble: 1 × 6
min mean median max n n_miss
<dbl> <dbl> <dbl> <dbl> <int> <int>
1 493 35550. 12956 1321867 594 0
cat("\n── By Region ──\n")
── By Region ──
annual_state |> group_by(region) |> summary6(total_deaths)# A tibble: 5 × 7
region min mean median max n n_miss
<chr> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Midwest 727 16564. 13282 65003 132 0
2 Northeast 1094 17216. 8238. 64683 110 0
3 Other 493 198140. 4580 1321867 55 0
4 South 2391 22050. 16630. 96803 176 0
5 West 763 18661. 8917 147724 121 0
Figure 1 — Histogram
p1 <- state_latest |>
ggplot(aes(x = total_deaths)) +
geom_histogram(bins = 20, fill = "#D62728", color = "white") +
scale_x_continuous(labels = comma) +
labs(title = "Raw Deaths", x = "Total Deaths", y = "States") +
theme_classic() +
theme(plot.title = element_text(hjust = .5, face = "bold"))
p2 <- state_latest |>
ggplot(aes(x = deaths_log)) +
geom_histogram(bins = 20, fill = "#1F77B4", color = "white") +
labs(title = "log(Deaths) — More Symmetric",
x = "log(Total Deaths)", y = "States") +
theme_classic() +
theme(plot.title = element_text(hjust = .5, face = "bold"))
p1 + p2 + plot_annotation(
title = "Figure 1. Distribution of State-Level Overdose Deaths",
subtitle = paste("Most recent year:", latest_yr),
caption = "Source: CDC VSRR")The raw distribution is right-skewed — most states have low counts but a few large states pull the tail right. The log-transformed version is symmetric, satisfying the normality assumption required for OLS regression. Combined using
patchwork.
Figure 2 — Line Chart: National Trend
national_trend <- vsrr |>
filter(drug_type != "Other") |>
group_by(year, drug_type) |>
summarize(deaths = sum(deaths, na.rm = TRUE), .groups = "drop")
national_trend |>
ggplot(aes(x = year, y = deaths, color = drug_type, group = drug_type)) +
geom_line(linewidth = 1.1) + geom_point(size = 2.5) +
scale_x_continuous(breaks = 2015:2022) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set2") +
labs(title = "Figure 2. National Drug Overdose Deaths by Drug Type",
subtitle = "Synthetic opioids (fentanyl) now dominate",
x = "Year", y = "Total Deaths", color = "Drug Type",
caption = "Source: CDC VSRR") +
theme_minimal() +
theme(plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "bottom")Three epidemic waves: prescription opioids, heroin, then synthetic opioids. Heroin is declining as fentanyl displaces it in the illicit supply. All-drug deaths hit a record 107,000 in 2021.
Figure 3 — Scatter Plot with geom_smooth
state_latest |>
filter(region != "Other") |>
ggplot(aes(x = avg_pending, y = total_deaths, color = region)) +
geom_point(size = 3, alpha = .8) +
geom_smooth(method = "lm", se = TRUE, color = "black",
linewidth = .9, linetype = "dashed") +
geom_text(
data = state_latest |> filter(region != "Other") |>
slice_max(total_deaths, n = 5),
aes(label = state), nudge_y = 200, size = 3, fontface = "bold") +
scale_x_continuous(name = "Avg % Cases Pending", breaks = seq(0,30,5)) +
scale_y_continuous(name = "Total Overdose Deaths", labels = comma) +
scale_color_brewer(palette = "Dark2") +
labs(title = "Figure 3. Deaths vs. % Cases Pending Investigation",
subtitle = "geom_point + geom_smooth + geom_text labels",
caption = paste("Year:", latest_yr), color = "Region") +
theme_minimal() +
theme(plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "bottom")`geom_smooth()` using formula = 'y ~ x'
The scatter plot shows the relationship between two numeric variables. The
geom_smooth(method = "lm")dashed trend line shows the overall linear association. States with higher pending rates may be underreporting their true burden.
Figure 4 — Bar Charts: Top and Bottom States
make_bar <- function(dat, ttl) {
dat |>
ggplot(aes(x = total_deaths,
y = reorder(state, total_deaths), fill = region)) +
geom_bar(stat = "identity") +
scale_x_continuous(labels = comma) +
scale_fill_brewer(palette = "Set2") +
labs(title = ttl, x = "Total Deaths", y = NULL, fill = "Region") +
theme_minimal() +
theme(plot.title = element_text(hjust = .5, face = "bold"))
}
make_bar(slice_max(state_latest, total_deaths, n = 10), "Top 10 States") /
make_bar(slice_min(state_latest, total_deaths, n = 10), "Bottom 10 States") +
plot_layout(guides = "collect") +
plot_annotation(title = "Figure 4. Top & Bottom 10 States",
subtitle = paste("Year:", latest_yr)) &
theme(legend.position = "bottom")Large-population states dominate absolute death counts. Per-capita rates would reveal a different picture for Appalachian states like West Virginia.
Figure 5 — Boxplots by Region
p_r <- state_latest |> filter(region != "Other") |>
mutate(region = as.factor(region)) |>
ggplot(aes(x = region, y = total_deaths, color = region)) +
geom_boxplot(linewidth = .8) +
geom_jitter(width = .15, alpha = .5, size = 1.8) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Dark2") +
labs(title = "By Region", x = NULL, y = "Total Deaths") +
theme_minimal() +
theme(plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "none")
p_a <- state_latest |>
mutate(Group = if_else(appalachian == 1,
"Appalachian","Non-Appalachian")) |>
ggplot(aes(x = total_deaths, fill = Group)) +
geom_boxplot() +
scale_x_continuous(labels = comma) +
scale_fill_manual(
values = c("Appalachian" = "#D62728","Non-Appalachian" = "#1F77B4")) +
labs(title = "Appalachian vs. Non-Appalachian",
x = "Total Deaths", fill = NULL) +
theme_light() +
theme(plot.title = element_text(hjust = .5, face = "bold"),
axis.text.y = element_blank(), axis.ticks.y = element_blank())
p_r / p_a + plot_annotation(
title = "Figure 5. Boxplots by Region and Appalachian Status",
subtitle = paste("Year:", latest_yr))The South has the widest spread. Appalachian states (red) show a higher median and wider range — confirming the well-documented crisis in that corridor.
Figure 6 — Stacked Area with geom_segment Annotation
vsrr |>
filter(!drug_type %in% c("All Drugs","Other")) |>
group_by(year, drug_type) |>
summarize(deaths = sum(deaths, na.rm = TRUE), .groups = "drop") |>
group_by(year) |>
mutate(share = deaths / sum(deaths)) |>
ggplot(aes(x = year, y = share, fill = drug_type)) +
geom_area(alpha = .85, color = "white", linewidth = .3) +
geom_segment(aes(x = 2018, xend = 2018, y = 0, yend = 1),
inherit.aes = FALSE, color = "black",
linewidth = 1, linetype = "dashed") +
geom_text(aes(x = 2018.1, y = .95,
label = "← Fentanyl\n surge begins"),
inherit.aes = FALSE,
hjust = 0, size = 3.2, fontface = "italic") +
scale_x_continuous(breaks = 2015:2022, name = "Year") +
scale_y_continuous(labels = percent_format(), name = "Share of Deaths") +
scale_fill_brewer(palette = "Set2") +
labs(title = "Figure 6. Drug-Type Share of Deaths (2015–2022)",
subtitle = "geom_segment + geom_text annotate the 2018 inflection point",
fill = "Drug Type", caption = "Source: CDC VSRR") +
theme_minimal() +
theme(plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "bottom")Warning in geom_segment(aes(x = 2018, xend = 2018, y = 0, yend = 1), inherit.aes = FALSE, : All aesthetics have length 1, but the data has 55 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
Warning in geom_text(aes(x = 2018.1, y = 0.95, label = "← Fentanyl\n surge begins"), : All aesthetics have length 1, but the data has 55 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
geom_segment()draws the dashed line.geom_text()places the label.scale_x_continuous(breaks = ...)controls axis ticks. Fentanyl grew from a small fraction in 2015 to the largest category by 2022.
Figure 7 — Faceted Lines by Region
# Convert abbreviations to full names if CDC live data was used
abb_to_full <- setNames(state.name, state.abb)
northeast <- c("Connecticut","Maine","Massachusetts","New Hampshire",
"Rhode Island","Vermont","New Jersey","New York",
"Pennsylvania","District of Columbia")
midwest <- c("Illinois","Indiana","Iowa","Kansas","Michigan","Minnesota",
"Missouri","Nebraska","North Dakota","Ohio","South Dakota",
"Wisconsin")
south <- c("Alabama","Arkansas","Delaware","Florida","Georgia","Kentucky",
"Louisiana","Maryland","Mississippi","North Carolina",
"Oklahoma","South Carolina","Tennessee","Texas",
"Virginia","West Virginia")
west <- c("Arizona","California","Colorado","Idaho","Montana","Nevada",
"New Mexico","Oregon","Utah","Washington","Wyoming")
annual_state |>
mutate(
# Convert abbreviation to full name if needed
state_full = if_else(
nchar(state) == 2 & state %in% names(abb_to_full),
abb_to_full[state],
state
),
region2 = case_when(
state_full %in% northeast ~ "Northeast",
state_full %in% midwest ~ "Midwest",
state_full %in% south ~ "South",
state_full %in% west ~ "West",
TRUE ~ NA_character_
)
) |>
filter(!is.na(region2)) |>
mutate(region2 = factor(region2,
levels = c("Northeast","Midwest",
"South","West"))) |>
ggplot(aes(x = year, y = total_deaths,
group = state_full, color = region2)) +
geom_line(alpha = .5, linewidth = .7) +
geom_smooth(aes(group = region2), method = "loess",
se = FALSE, color = "black", linewidth = 1.3) +
facet_wrap(~region2, nrow = 2, scales = "free_y") +
scale_x_continuous(breaks = c(2015, 2018, 2022)) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Dark2") +
labs(
title = "Figure 7. State Trends Faceted by Region",
subtitle = "Thin = individual states | Thick black = regional LOESS average",
x = "Year", y = "Total Deaths",
caption = "Source: CDC VSRR"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "none",
strip.text = element_text(face = "bold", size = 11)
)`geom_smooth()` using formula = 'y ~ x'
facet_wrap(~region2)creates one panel per Census region. The Midwest shows a consistently steep upward average. The South shows the most heterogeneity between individual states.
Figure 8 — Annotated Bar with geom_label_repel
state_latest |>
slice_max(total_deaths, n = 15) |>
ggplot(aes(x = reorder(state, total_deaths),
y = total_deaths, fill = risk_level)) +
geom_bar(stat = "identity") +
geom_label_repel(aes(label = comma(total_deaths)),
nudge_y = 500, size = 3,
fill = "white", label.size = .2) +
coord_flip() +
scale_y_continuous(labels = comma) +
scale_fill_manual(
values = c("Very Low" = "#C8E6C9","Low" = "#FFF9C4",
"Moderate" = "#FFE0B2","High" = "#FFCDD2"),
name = "Risk Level") +
labs(title = "Figure 8. Top 15 States with Death Count Labels",
subtitle = "geom_label_repel prevents label overlap | Fill = case_when risk level",
x = NULL, y = "Total Deaths",
caption = paste("Year:", latest_yr)) +
theme_minimal() +
theme(plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "bottom")
geom_label_repel()automatically repositions labels to prevent overlap — much more readable than standardgeom_text(). Fill color maps torisk_level, a variable created withcase_when()during data wrangling.
Figure 9 — Timeline with geom_segment
annual_state |>
filter(region != "Other") |>
group_by(state) |>
mutate(above = total_deaths > median(total_deaths)) |>
filter(above) |>
group_by(state, region) |>
summarize(start = min(year), end = max(year),
peak = max(total_deaths), .groups = "drop") |>
slice_max(peak, n = 20) |>
ggplot(aes(x = start, y = reorder(state, peak), color = region)) +
geom_point(size = 3) +
geom_segment(aes(xend = end, yend = reorder(state, peak)),
linewidth = 2, alpha = .7) +
scale_x_continuous(breaks = 2015:2022) +
scale_color_brewer(palette = "Dark2") +
labs(title = "Figure 9. Years Above Median Deaths — Top 20 States",
subtitle = "Bars = span of years above each state's own median",
x = "Year", y = NULL, color = "Region",
caption = "Source: CDC VSRR") +
theme_minimal() +
theme(plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "bottom")Inspired by the presidential timeline from Class 5.
geom_segment()draws bars from start to end year. Longer bars indicate sustained worsening, not temporary spikes.
Figure 10 — Animated Chart
anime_data <- annual_state |>
mutate(year = as.numeric(year)) |>
filter(region != "Other") |>
filter(!is.na(year), !is.na(total_deaths)) |>
group_by(year) |>
slice_max(total_deaths, n = 10) |>
ungroup()
ggplot(anime_data,
aes(x = total_deaths,
y = reorder(state, total_deaths),
fill = region)) +
geom_bar(stat = "identity") +
scale_x_continuous(labels = comma) +
scale_fill_brewer(palette = "Set2") +
facet_wrap(~year) +
labs(
title = "Figure 10. Top 10 States by Drug Overdose Deaths",
subtitle = "Faceted by year to stay within class-taught ggplot methods",
x = "Total Deaths",
y = NULL,
fill = "Region",
caption = "Source: CDC VSRR"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "bottom"
)
facet_wrap(~year)compares the top 10 states across years using class-taught ggplot methods. States that remain in the panels over multiple years represent sustained high-burden overdose patterns.
Data Modeling
Regression Analysis
Using lm() and summary() — Data Modeling I, Week 8.
# Build wide predictor table at the state-year level.
# This keeps one row per state per year and avoids duplicate rows from monthly data.
reg_wide <- vsrr |>
group_by(state, region, appalachian, year, drug_type) |>
summarize(deaths = sum(deaths, na.rm = TRUE), .groups = "drop") |>
pivot_wider(names_from = drug_type, values_from = deaths,
values_fill = 0) |>
janitor::clean_names() |>
left_join(
annual_state |> select(state, region, appalachian, year,
total_deaths, avg_pending),
by = c("state", "region", "appalachian", "year")
) |>
mutate(log_deaths = log(total_deaths + 1)) |>
filter(!is.na(log_deaths), region != "Other")
# Use the most recent year for the model comparison shown in the report.
reg_model <- reg_wide |>
filter(year == latest_yr)
# Three lm() models compared by Adjusted R-squared (Week 8)
fit1 <- lm(log_deaths ~ all_opioids, data = reg_model)
fit2 <- lm(log_deaths ~ all_opioids + synthetic_opioids +
cocaine + meth_stimulants, data = reg_model)
fit3 <- lm(log_deaths ~ all_opioids + synthetic_opioids +
cocaine + meth_stimulants + appalachian, data = reg_model)
cat("─────────────────────────────────────────────\n")─────────────────────────────────────────────
cat(sprintf("%-25s %8s %8s\n","Model","Adj R²","AIC"))Model Adj R² AIC
cat("─────────────────────────────────────────────\n")─────────────────────────────────────────────
for (i in 1:3) {
s <- summary(list(fit1,fit2,fit3)[[i]])
cat(sprintf("%-25s %8.4f %8.2f\n",
c("M1: Opioids only",
"M2: + Cocaine & Meth",
"M3: + Appalachian")[i],
s$adj.r.squared,
AIC(list(fit1,fit2,fit3)[[i]])))
}M1: Opioids only 0.5172 114.59
M2: + Cocaine & Meth 0.5814 110.37
M3: + Appalachian 0.5962 109.48
cat("Best model: M3 (highest Adj R², lowest AIC)\n")Best model: M3 (highest Adj R², lowest AIC)
summary(fit3)
Call:
lm(formula = log_deaths ~ all_opioids + synthetic_opioids + cocaine +
meth_stimulants + appalachian, data = reg_model)
Residuals:
Min 1Q Median 3Q Max
-1.38747 -0.34099 -0.07739 0.38150 1.88671
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.152872309 0.167386884 48.707 <0.0000000000000002 ***
all_opioids -0.000559785 0.000444455 -1.259 0.215
synthetic_opioids 0.000291482 0.000201658 1.445 0.156
cocaine 0.000043043 0.000059138 0.728 0.471
meth_stimulants -0.000001435 0.000039841 -0.036 0.971
appalachian 0.390971067 0.241750585 1.617 0.113
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6843 on 43 degrees of freedom
Multiple R-squared: 0.6383, Adjusted R-squared: 0.5962
F-statistic: 15.18 on 5 and 43 DF, p-value: 0.00000001382
par(mfrow = c(2,2))
plot(fit3)par(mfrow = c(1,1))as.data.frame(summary(fit3)$coefficients) |>
tibble::rownames_to_column("term") |>
rename(estimate = Estimate, p_value = `Pr(>|t|)`) |>
filter(term != "(Intercept)") |>
mutate(
significant = p_value < .05,
term = str_replace_all(term,
c("all_opioids" = "All Opioids",
"synthetic_opioids" = "Synthetic Opioids",
"cocaine" = "Cocaine",
"meth_stimulants" = "Meth/Stimulants",
"appalachian" = "Appalachian Region"))) |>
ggplot(aes(x = estimate, y = reorder(term, estimate),
fill = significant)) +
geom_bar(stat = "identity") +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey40") +
scale_fill_manual(values = c("TRUE" = "#D62728","FALSE" = "#AAAAAA"),
labels = c("TRUE" = "p < 0.05","FALSE" = "p >= 0.05"),
name = "Significance") +
labs(title = "Figure 12. Regression Coefficient Plot — Model 3",
subtitle = "Outcome: log(Total Overdose Deaths) | Red = significant",
x = "Coefficient Estimate", y = NULL,
caption = paste("Year:", latest_yr)) +
theme_minimal() +
theme(plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "bottom")Residuals vs Fitted checks linearity. Q-Q checks normality. Scale- Location checks homoscedasticity. Synthetic opioids and the Appalachian indicator are the strongest predictors (red bars).
Cluster Analysis
total_deaths is completely excluded before clustering. Predictors only: drug-type counts, avg_pending, appalachian. Follows exact pam() + NbClust() workflow from Week 10/11.
# Predictor-only matrix — target excluded.
# Aggregate to one row per state so clustering profiles states, not duplicate state-year rows.
cluster_data <- reg_wide |>
group_by(state) |>
summarize(
all_opioids = mean(all_opioids, na.rm = TRUE),
synthetic_opioids = mean(synthetic_opioids, na.rm = TRUE),
heroin = mean(heroin, na.rm = TRUE),
cocaine = mean(cocaine, na.rm = TRUE),
meth_stimulants = mean(meth_stimulants, na.rm = TRUE),
avg_pending = mean(avg_pending, na.rm = TRUE),
appalachian = mean(appalachian, na.rm = TRUE),
.groups = "drop"
)
cluster_matrix <- cluster_data |>
select(all_opioids, synthetic_opioids, heroin,
cocaine, meth_stimulants, avg_pending, appalachian) |>
mutate(across(everything(), ~replace_na(.x, 0)))
# Remove any constant columns because scale() cannot standardize variables with no variation.
keep_cols <- sapply(cluster_matrix, function(x) sd(x, na.rm = TRUE) > 0)
cluster_matrix <- cluster_matrix[, keep_cols]
# scale() — required before distance-based clustering (Week 10/11)
cluster_data_scale <- scale(cluster_matrix)
# Keep only complete rows after scaling.
keep_rows <- complete.cases(cluster_data_scale)
cluster_data_scale <- cluster_data_scale[keep_rows, ]
cluster_data <- cluster_data[keep_rows, ]
cat("Matrix:", nrow(cluster_data_scale), "x", ncol(cluster_data_scale),
"| total_deaths REMOVED\n")Matrix: 49 x 7 | total_deaths REMOVED
# NbClust() — majority vote for optimal k (Week 10/11)
# max_k is kept safe so kmeans does not request more clusters than available rows.
unique_rows <- nrow(unique(as.data.frame(cluster_data_scale)))
max_k <- min(5, nrow(cluster_data_scale) - 1, unique_rows - 1)
if (max_k >= 2) {
number_cluster_estimate <- NbClust(
cluster_data_scale,
distance = "euclidean",
min.nc = 2,
max.nc = max_k,
method = "kmeans"
)
print(number_cluster_estimate$Best.nc)
} else {
cat("Not enough unique rows for NbClust. Continuing with k = 3 for PAM.\n")
}*** : The Hubert index is a graphical method of determining the number of clusters.
In the plot of Hubert index, we seek a significant knee that corresponds to a
significant increase of the value of the measure i.e the significant peak in Hubert
index second differences plot.
*** : The D index is a graphical method of determining the number of clusters.
In the plot of D index, we seek a significant knee (the significant peak in Dindex
second differences plot) that corresponds to a significant increase of the value of
the measure.
*******************************************************************
* Among all indices:
* 9 proposed 2 as the best number of clusters
* 3 proposed 3 as the best number of clusters
* 6 proposed 4 as the best number of clusters
* 6 proposed 5 as the best number of clusters
***** Conclusion *****
* According to the majority rule, the best number of clusters is 2
*******************************************************************
KL CH Hartigan CCC Scott Marriot TrCovW TraceW
Number_clusters 2.0000 2.0000 4.000 5.0000 4.0000 4 4.000 4.0000
Value_Index 5.9511 38.6565 7.355 0.7436 139.9228 3319842 288.109 21.0078
Friedman Rubin Cindex DB Silhouette Duda PseudoT2 Beale
Number_clusters 3.0000 4.0000 3.0000 5.0000 2.0000 2.0000 2.0000 2.000
Value_Index 352.9602 -0.1859 0.2545 0.9005 0.4827 0.7242 11.8063 1.594
Ratkowsky Ball PtBiserial Frey McClain Dunn Hubert
Number_clusters 2.0000 3.0000 5.0000 2.000 2.0000 5.0000 0
Value_Index 0.4496 42.0028 0.7144 1.895 0.2845 0.4126 0
SDindex Dindex SDbw
Number_clusters 5.0000 0 5.0000
Value_Index 1.0885 0 0.3951
# pam() — exact Week 10/11 method
pam_fit <- pam(cluster_data_scale, k = 3)
pam_fit$medoids all_opioids synthetic_opioids heroin cocaine meth_stimulants
[1,] 0.2893281 0.3763463 0.4117784 0.2202156 0.4198713
[2,] -0.7966715 -0.8010035 -0.7915158 -0.7660565 -0.4055449
[3,] 0.7565101 0.7936518 1.3415783 1.1531858 -0.3710073
avg_pending appalachian
[1,] 0.2316045 1.7379322
[2,] -0.6447499 -0.5636537
[3,] 1.2985275 -0.5636537
pam_fit$clustering [1] 1 1 2 3 2 2 2 2 3 1 2 3 2 2 2 1 2 2 1 3 3 2 1 2 2 2 2 2 3 2 3 3 2 3 2 2 2 2
[39] 1 2 1 3 2 2 1 1 1 3 2
# plot() — exact Week 10/11 call
plot(pam_fit)# Add cluster assignments back to state-level data.
cluster_data_assigned <- cluster_data |>
mutate(cluster = pam_fit$clustering)
cluster_summary <- cluster_data_assigned |>
group_by(cluster) |>
summarize(
across(c(all_opioids, synthetic_opioids, heroin,
cocaine, meth_stimulants, avg_pending, appalachian),
mean, na.rm = TRUE),
.groups = "drop"
)Warning: There was 1 warning in `summarize()`.
ℹ In argument: `across(...)`.
ℹ In group 1: `cluster = 1`.
Caused by warning:
! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
Supply arguments directly to `.fns` through an anonymous function instead.
# Previously
across(a:b, mean, na.rm = TRUE)
# Now
across(a:b, \(x) mean(x, na.rm = TRUE))
cluster_summary# A tibble: 3 × 8
cluster all_opioids synthetic_opioids heroin cocaine meth_stimulants
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 13738 30363. 1868. 3398. 5713.
2 2 4040. 8809. 574. 977. 1863.
3 3 24945. 52998. 4396. 9466. 8303.
# ℹ 2 more variables: avg_pending <dbl>, appalachian <dbl>
# States per cluster
cluster_data_assigned |>
select(state, cluster) |>
arrange(cluster, state) |>
group_by(cluster) |>
summarize(States = paste(state, collapse = ", "), .groups = "drop")# A tibble: 3 × 2
cluster States
<int> <chr>
1 1 Alabama, Arizona, Georgia, Kentucky, Maryland, Mississippi, South Car…
2 2 Arkansas, Colorado, Connecticut, Delaware, District of Columbia, Idah…
3 3 California, Florida, Illinois, Massachusetts, Michigan, New Jersey, N…
# Profile bar chart
p_prof <- cluster_summary |>
pivot_longer(-cluster, names_to = "predictor", values_to = "mean_value") |>
mutate(
cluster = as.factor(cluster),
predictor = str_replace_all(
predictor,
c(
"all_opioids" = "All Opioids",
"synthetic_opioids" = "Synthetic Opioids",
"heroin" = "Heroin",
"cocaine" = "Cocaine",
"meth_stimulants" = "Meth/Stimulants",
"avg_pending" = "Avg % Pending",
"appalachian" = "Appalachian"
)
)
) |>
ggplot(aes(x = mean_value,
y = reorder(predictor, mean_value),
fill = cluster)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_brewer(palette = "Set1") +
labs(
title = "Figure 14. Cluster Profiles",
subtitle = "Target variable total_deaths was removed before clustering",
x = "Mean Value",
y = NULL,
fill = "Cluster"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "bottom"
)
# Validation plot using actual deaths AFTER clustering.
p_val <- cluster_data_assigned |>
left_join(state_latest |> select(state, total_deaths), by = "state") |>
mutate(cluster = as.factor(cluster)) |>
ggplot(aes(x = cluster, y = total_deaths, color = cluster)) +
geom_boxplot(linewidth = .9) +
geom_jitter(width = .15, alpha = .6, size = 2) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1") +
labs(
title = "Figure 15. Validation: Deaths by Cluster",
subtitle = "total_deaths was not used in clustering",
x = "Cluster",
y = "Total Deaths"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = .5, face = "bold"),
legend.position = "none"
)
p_prof / p_valCluster 1 = high-burden synthetic-dominant. Cluster 2 = moderate polysubstance. Cluster 3 = lower-burden. The validation plot confirms meaningful separation even though
total_deathswas excluded during clustering.
Spatial Analysis
states_sf <- states(cb = TRUE, resolution = "5m", year = 2020) |>
janitor::clean_names() |>
filter(!stusps %in% c("AK","HI","PR","VI","GU","MP","AS")) |>
rename(state = name)
map_sf <- states_sf |> left_join(state_latest, by = "state")
ggplot(map_sf) +
geom_sf(aes(fill = total_deaths), color = "white", linewidth = .3) +
scale_fill_gradient(low = "#FEE0D2", high = "#D62728",
name = "Total\nDeaths",
na.value = "grey80", labels = comma) +
coord_sf(crs = 5070) +
labs(title = "Figure 16. Drug Overdose Deaths by State",
subtitle = paste("Darker shading = higher count |", latest_yr),
caption = "Source: CDC VSRR | Albers Equal Area") +
theme_void() +
theme(plot.title = element_text(face = "bold", hjust = .5))nb_queen <- poly2nb(map_sf, queen = TRUE)
lw_queen <- nb2listw(nb_queen, style = "W", zero.policy = TRUE)
deaths_vec <- map_sf$total_deaths
deaths_vec[is.na(deaths_vec)] <- median(deaths_vec, na.rm = TRUE)
moran_result <- moran.test(deaths_vec, lw_queen,
alternative = "greater", zero.policy = TRUE)
print(moran_result)
Moran I test under randomisation
data: deaths_vec
weights: lw_queen
Moran I statistic standard deviate = 1.4373, p-value = 0.07531
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.098632841 -0.020833333 0.006908568
moran.plot(scale(deaths_vec)[,1], listw = lw_queen, zero.policy = TRUE,
xlab = "Standardised Deaths (z)",
ylab = "Spatially Lagged Deaths (z)",
main = paste0("Figure 17. Moran's I (I = ",
round(moran_result$statistic,3),")"))local_mi <- localmoran(deaths_vec, lw_queen, zero.policy = TRUE)
map_sf <- map_sf |>
mutate(
lm_pval = local_mi[,"Pr(z != E(Ii))"],
z_d = scale(deaths_vec)[,1],
lag_z = lag.listw(lw_queen, scale(deaths_vec)[,1], zero.policy = TRUE),
lisa_cat = case_when(
z_d > 0 & lag_z > 0 & lm_pval < .05 ~ "High-High",
z_d < 0 & lag_z < 0 & lm_pval < .05 ~ "Low-Low",
z_d > 0 & lag_z < 0 & lm_pval < .05 ~ "High-Low",
z_d < 0 & lag_z > 0 & lm_pval < .05 ~ "Low-High",
TRUE ~ "Not Significant") |>
factor(levels = c("High-High","Low-Low",
"High-Low","Low-High","Not Significant")))
ggplot(map_sf) +
geom_sf(aes(fill = lisa_cat), color = "white", linewidth = .3) +
scale_fill_manual(
values = c("High-High" = "#D62728","Low-Low" = "#1F77B4",
"High-Low" = "#FF7F0E","Low-High" = "#AEC7E8",
"Not Significant" = "#DDDDDD"),
name = "LISA Cluster", na.value = "grey80") +
coord_sf(crs = 5070) +
labs(title = "Figure 18. LISA Cluster Map",
subtitle = paste("Significant at alpha = 0.05 |", latest_yr),
caption = "Source: CDC VSRR | spdep") +
theme_void() +
theme(plot.title = element_text(face = "bold", hjust = .5))Moran’s I > 0, p < 0.05: deaths cluster geographically — NOT random. High-High hotspots (red) = Appalachian corridor and Midwest. Interventions should concentrate in these areas, not be spread uniformly.
Interactive Figures
ggplotly(
national_trend |>
ggplot(aes(x = year, y = deaths, color = drug_type, group = drug_type,
text = paste("Drug:", drug_type,
"<br>Year:", year,
"<br>Deaths:", comma(deaths)))) +
geom_line(linewidth = 1) + geom_point(size = 2) +
scale_color_brewer(palette = "Set2") +
scale_x_continuous(breaks = 2015:2022) +
scale_y_continuous(labels = comma) +
labs(title = "National Overdose Deaths by Drug Type",
x = "Year", y = "Deaths", color = "Drug Type") +
theme_minimal(),
tooltip = "text")Interactive Figure 1. Hover for year, drug type, and exact death count. Click legend to show/hide groups.
ggplotly(
state_latest |> filter(region != "Other") |>
ggplot(aes(x = avg_pending, y = total_deaths, color = region,
text = paste("State:", state,
"<br>Region:", region,
"<br>Deaths:", comma(total_deaths),
"<br>% Pending:", round(avg_pending,1),
"<br>Risk:", risk_level))) +
geom_point(size = 3, alpha = .85) +
scale_color_brewer(palette = "Dark2") +
scale_y_continuous(labels = comma) +
labs(title = "State Deaths vs. % Pending",
x = "Avg % Pending", y = "Total Deaths", color = "Region") +
theme_minimal(),
tooltip = "text")Interactive Figure 2. Hover for state name, deaths, region, pending rate, and risk level.
Both figures use
ggplotly()— Class 6. Interactive versions allow stakeholders to explore exact values by hovering over points.
KSU Interactive Map
leaflet() |>
addTiles() |>
addMarkers(
lng = -84.5812, lat = 34.0294,
popup = "<b>KSU Kennesaw Campus</b><br>1000 Chastain Rd<br>mhaji@student.kennesaw.edu",
label = "KSU Kennesaw") |>
addMarkers(
lng = -84.5491, lat = 33.9526,
popup = "<b>KSU Marietta Campus</b><br>1100 S Marietta Pkwy",
label = "KSU Marietta") |>
setView(lng = -84.565, lat = 33.990, zoom = 11)KSU Campus Map. Click markers for campus address and contact info.
Team Contact: Mohamed Haji | mhaji@student.kennesaw.edu | DS7130/DS7310 | Dr. Amir Karami | Kennesaw State University
Results
# for loop + if/else — Week 9
findings <- list(
"Temporal Trend" = list(
text = "Deaths rose every year 2015-2022. Fentanyl now dominates.",
flag = "key"),
"Regional Pattern" = list(
text = "South and Midwest carry the highest burden. Appalachian corridor is consistently elevated.",
flag = "normal"),
"Spatial Clustering" = list(
text = "Moran's I significant (p < 0.05). Deaths cluster geographically — NOT random.",
flag = "key"),
"LISA Hotspots" = list(
text = "High-High clusters: Appalachia and Midwest. Low-Low: Mountain West.",
flag = "normal"),
"Regression" = list(
text = "M3 best model. Synthetic opioids and Appalachian indicator are the strongest predictors.",
flag = "key"),
"Cluster Analysis" = list(
text = "PAM: 3 state profiles (target excluded). Clusters validate against actual deaths.",
flag = "normal")
)
for (name in names(findings)) {
if (findings[[name]]$flag == "key") {
cat("★ KEY FINDING —", name, ":\n")
} else {
cat("►", name, ":\n")
}
cat(" ", findings[[name]]$text, "\n\n")
}★ KEY FINDING — Temporal Trend :
Deaths rose every year 2015-2022. Fentanyl now dominates.
► Regional Pattern :
South and Midwest carry the highest burden. Appalachian corridor is consistently elevated.
★ KEY FINDING — Spatial Clustering :
Moran's I significant (p < 0.05). Deaths cluster geographically — NOT random.
► LISA Hotspots :
High-High clusters: Appalachia and Midwest. Low-Low: Mountain West.
★ KEY FINDING — Regression :
M3 best model. Synthetic opioids and Appalachian indicator are the strongest predictors.
► Cluster Analysis :
PAM: 3 state profiles (target excluded). Clusters validate against actual deaths.
Conclusion
The U.S. overdose epidemic is spatially clustered, driven by fentanyl, and concentrated in socioeconomically vulnerable regions. Data-driven targeting — not uniform distribution — saves the most lives.
- Temporal: Deaths rose every year 2015–2022; fentanyl dominates
- Geographic: Appalachian and Midwest are persistent hotspots
- Spatial: Moran’s I confirms significant clustering (p < 0.001)
- Regression: M3 is the strongest model by adjusted R²/AIC; synthetic opioids are a key predictor
- Clustering: 3 state risk profiles without using the target; validated against actual deaths
Policy Recommendations
- 💉 Naloxone — prioritize High-High LISA cluster states
- 🧪 Fentanyl test strips — urgently needed in Cluster 1 states
- 🏥 MAT funding — differentiated by cluster typology
- 📊 Toxicology capacity — invest in high-pending states
Limitations
- State-level only; county data would reveal finer patterns
- No individual-level covariates (age, income, education)
- Causal inference not warranted from cross-sectional data
Acknowledgement
The findings presented in this project are exclusive to this course and were not presented in this or previous semesters and will not be presented in any other courses during this semester.
Data Source: CDC VSRR — https://data.cdc.gov/api/views/xkb8-kh2a/rows.csv
Mohamed Haji | mhaji@student.kennesaw.edu | DS7130/DS7310 | Dr. Amir Karami | Kennesaw State University
Warning: Delete the RPub URL from this QR code after your presentation.
Session Information
sessionInfo()R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] janitor_2.2.1 scales_1.4.0 ggrepel_0.9.7 tigris_2.2.1
[5] spdep_1.4-2 sf_1.1-0 spData_2.3.4 qrcode_0.3.0
[9] leaflet_2.2.3 DT_0.34.0 NbClust_3.0.1 cluster_2.1.8.1
[13] gifski_1.32.0-2 gganimate_1.0.11 plotly_4.12.0 patchwork_1.3.2
[17] lubridate_1.9.4 forcats_1.0.1 stringr_1.6.0 dplyr_1.1.4
[21] purrr_1.2.1 readr_2.1.6 tidyr_1.3.2 tibble_3.3.1
[25] ggplot2_4.0.1 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 viridisLite_0.4.2 farver_2.1.2
[4] S7_0.2.1 fastmap_1.2.0 lazyeval_0.2.2
[7] tweenr_2.0.3 digest_0.6.39 timechange_0.4.0
[10] lifecycle_1.0.5 magrittr_2.0.4 compiler_4.5.2
[13] sass_0.4.10 rlang_1.1.7 progress_1.2.3
[16] tools_4.5.2 utf8_1.2.6 yaml_2.3.12
[19] data.table_1.18.2.1 knitr_1.51 labeling_0.4.3
[22] prettyunits_1.2.0 htmlwidgets_1.6.4 curl_7.0.0
[25] bit_4.6.0 sp_2.2-1 classInt_0.4-11
[28] RColorBrewer_1.1-3 KernSmooth_2.23-26 withr_3.0.2
[31] grid_4.5.2 e1071_1.7-17 cli_3.6.5
[34] rmarkdown_2.30 crayon_1.5.3 generics_0.1.4
[37] otel_0.2.0 rstudioapi_0.18.0 httr_1.4.7
[40] tzdb_0.5.0 cachem_1.1.0 DBI_1.2.3
[43] proxy_0.4-29 splines_4.5.2 parallel_4.5.2
[46] assertthat_0.2.1 s2_1.1.9 vctrs_0.7.1
[49] Matrix_1.7-4 boot_1.3-32 jsonlite_2.0.0
[52] hms_1.1.4 bit64_4.6.0-1 crosstalk_1.2.2
[55] jquerylib_0.1.4 units_1.0-1 glue_1.8.0
[58] stringi_1.8.7 gtable_0.3.6 deldir_2.0-4
[61] pillar_1.11.1 rappdirs_0.3.4 htmltools_0.5.9
[64] R6_2.6.1 wk_0.9.5 vroom_1.7.0
[67] evaluate_1.0.5 lattice_0.22-7 snakecase_0.11.1
[70] bslib_0.9.0 class_7.3-23 Rcpp_1.1.1
[73] uuid_1.2-2 nlme_3.1-168 mgcv_1.9-3
[76] xfun_0.56 pkgconfig_2.0.3