Background

Drug overdose deaths are a major public health issue in the United States.
In this CDC/NCHS dataset, overdose death rates rise sharply in the years leading up to 2020 and continue to rise afterward.

Goal: Compare overdose death rates in two time windows:

  • Pre-pandemic: 2012–2019
  • Post-pandemic: 2020–2022
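
The two windows can be sketched as a simple rule on the year. This is a minimal illustration, assuming 2019 belongs to the pre-pandemic window (as in the cleaning step later in the deck); the helper name `assign_period` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: label each year with its comparison window.
# Assumes 2019 counts as pre-pandemic, matching the cleaning step.
assign_period <- function(year) {
  ifelse(year <= 2019, "Pre-Pandemic", "Post-Pandemic")
}

years_example <- c(2012, 2019, 2020, 2022)
periods_example <- assign_period(years_example)

# One row per example year with its assigned window
print(data.frame(year = years_example, period = periods_example))
```

Putting the boundary year in exactly one window keeps the two period means from sharing an observation.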

The Data

The dataset provides drug overdose death rates (per 100,000 resident population) for the United States by:

  • Drug type (all overdoses, synthetic opioids, heroin)
  • Population grouping (Total, Sex, Age group, Race and Hispanic origin)

Each row is an estimated death rate for a specific year, drug category, and population group.

Key Attributes

  • SUBTOPIC: the drug category (e.g., “All drug overdose deaths”)
  • GROUP: how the population is split (Total, Sex, Age group)
  • SUBGROUP: the specific category inside the group (Male, Female, 35–44 years, etc.)
  • ESTIMATE_TYPE: whether the rate is age-adjusted or crude
  • TIME_PERIOD: year
  • ESTIMATE: the rate value (deaths per 100,000)
  • STANDARD_ERROR, ESTIMATE_LCI, ESTIMATE_UCI: uncertainty (standard error and 95% CI)
  • FLAG: a warning label indicating that the estimate may be unreliable (unstable estimate, suppressed value, etc.)
  • SUBGROUP_ORDER: recommended display order for subgroups (keeps age groups in the right order)

Research Questions

  1. Did the overall overdose death rate increase in 2020–2022 compared to 2012–2019?
  2. Which drug categories show the largest changes in death rates after 2019?
  3. Do trends differ by sex?
  4. Which age groups have the highest overdose rates post-pandemic?

Overview of the Dataset

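# Assumed definitions (not shown on the original slide): dimensions and year range of the raw data
num_rows <- nrow(raw)
num_cols <- ncol(raw)
min_year <- min(as.integer(raw$TIME_PERIOD), na.rm = TRUE)
max_year <- max(as.integer(raw$TIME_PERIOD), na.rm = TRUE)

# Rows with no estimate, and rows carrying a reliability flag
missing_est <- sum(is.na(raw$ESTIMATE))
flagged <- sum(!is.na(raw$FLAG))

# data.frame() converts the quoted names to Beginning.Year, Latest.Year, etc.
df <- data.frame(
  Observations = num_rows,
  Attributes = num_cols,
  'Beginning Year' = min_year,
  'Latest Year' = max_year,
  'Missing Values' = missing_est,
  'Flagged Observations' = flagged)

verify_kable <- knitr::kable(df, caption = "Data Set Overview")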

Overview of the Dataset, Continued

Data Set Overview

  Observations  Attributes  Beginning.Year  Latest.Year  Missing.Values  Flagged.Observations
         12228          25            1999         2022            2020                  2035
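
To put the missing and flagged counts in proportion, the shares can be worked out directly from the table. This is a small sketch using only the three numbers shown above; the table does not say how much the two sets of rows overlap, so the shares should not be added together.

```r
# Values copied from the overview table
n_obs     <- 12228
n_missing <- 2020
n_flagged <- 2035

# Share of all rows, in percent, rounded to one decimal
share_missing <- round(100 * n_missing / n_obs, 1)
share_flagged <- round(100 * n_flagged / n_obs, 1)

cat("Missing ESTIMATE:", share_missing, "%\n")
cat("Flagged rows:    ", share_flagged, "%\n")
```

Roughly one row in six falls in each category, which is worth keeping in mind when reading the cleaned row count.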

Clean and Define Periods

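library(dplyr)

# Study window; assumed to be 2012–2022, matching the period in the results table
years <- 2012:2022

clean <- raw %>%
  mutate(TIME_PERIOD = as.integer(TIME_PERIOD),
         ESTIMATE = as.numeric(ESTIMATE),
         # 2019 and earlier count as pre-pandemic
         period = if_else(TIME_PERIOD <= 2019,
                          "Pre-Pandemic",
                          "Post-Pandemic")) %>%
  filter(TIME_PERIOD %in% years) %>%   # keep the study window
  filter(!is.na(ESTIMATE)) %>%         # drop missing estimates
  filter(is.na(FLAG))                  # drop flagged (unreliable) estimates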

dup_rows <- clean %>%
  count(SUBTOPIC, ESTIMATE_TYPE, TIME_PERIOD, GROUP, SUBGROUP) %>%
  filter(n > 1) %>%
  nrow()

Clean and Define Periods, Continued

Results

  Period     Observations  Duplicates
  2012–2022          5321           0

Overall Trend

overall <- clean %>%
  filter(ESTIMATE_TYPE == "Deaths per 100,000 resident population, age adjusted",
         GROUP == "Total",
         SUBTOPIC == "All drug overdose deaths") %>%
  arrange(TIME_PERIOD)

Overall Trend, Continued

Pre/Post Pandemic Comparison

library(ggplot2)
library(plotly)

# Mean age-adjusted rate in each period
overall_period <- overall %>%
  group_by(period) %>%
  summarize(mean_rate = mean(ESTIMATE), .groups = "drop")

p_period <- ggplot(overall_period, aes(x = period, y = mean_rate, fill = period)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c(
    "Pre-Pandemic" = "blue",
    "Post-Pandemic" = "red")) +
  labs(title = "Mean age-adjusted Overdose Rate by Period",
       x = "",
       y = "Mean deaths per 100,000",
       fill = "")

p_period_widget <- ggplotly(p_period, height = 320)

Pre/Post Pandemic Comparison, Continued

Simple Linear Regression

model1 <- lm(ESTIMATE ~ TIME_PERIOD, data = overall)

coef_tbl <- summary(model1)$coefficients
coef_kable <- knitr::kable(coef_tbl, digits = 3, caption = "Regression Coefficients (Overall Trend)")

overall$fitted <- fitted(model1)
overall$resid <- resid(model1)

p_resid <- ggplot(overall, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  labs(title = "",
       x = "Fitted Values",
       y = "Residuals")

p_resid_widget <- ggplotly(p_resid, height = 320)

Simple Linear Regression, Continued

Regression Coefficients (Overall Trend)

               Estimate  Std. Error  t value  Pr(>|t|)
  (Intercept)  -4089.649     394.152  -10.376         0
  TIME_PERIOD      2.038       0.195   10.430         0
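
One way to read these coefficients: the fitted line is rate = intercept + slope × year, so the slope is the estimated yearly change in the age-adjusted rate. The sketch below plugs in the rounded values from the table; the helper `fitted_rate` is hypothetical. Because the slope is rounded to three decimals and multiplied by a year near 2000, the fitted levels are only approximate (they can be off by about 1 per 100,000), while the difference between two years is much less sensitive to that rounding.

```r
# Coefficients copied from the table above (rounded to 3 decimals)
intercept <- -4089.649
slope     <- 2.038

# Hypothetical helper: fitted rate (deaths per 100,000) for a given year
fitted_rate <- function(year) intercept + slope * year

rate_2012 <- fitted_rate(2012)
rate_2022 <- fitted_rate(2022)

cat("Fitted 2012:", round(rate_2012, 1), "\n")
cat("Fitted 2022:", round(rate_2022, 1), "\n")
cat("Change over 10 years:", round(rate_2022 - rate_2012, 1), "\n")
```

Under this linear fit, the rate rises by about 2 deaths per 100,000 each year, or about 20 over the ten-year span.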

Drug-type Change Pre vs Post, Part 1

For the Total group (age-adjusted rates), this dataset includes opioid-related categories plus the overall overdose category.
A negative percent change means the average rate decreased from pre-pandemic to post-pandemic.

library(tidyr)  # provides pivot_wider()

drug_change <- clean %>%
  filter(ESTIMATE_TYPE == "Deaths per 100,000 resident population, age adjusted",
         GROUP == "Total") %>%
  group_by(SUBTOPIC, period) %>%
  summarize(mean_rate = mean(ESTIMATE), .groups = "drop") %>%
  pivot_wider(names_from = period, values_from = mean_rate) %>%
  mutate(abs_change = `Post-Pandemic` - `Pre-Pandemic`,
         pct_change = abs_change / `Pre-Pandemic` * 100) %>%
  arrange(desc(pct_change))

Drug-type Change Pre vs Post, Part 2

Drug-type Change Pre vs Post, Part 3

Drugs Showing Decreased Death Rates

  SUBTOPIC                                                          Pre-Pandemic  Post-Pandemic  pct_change
  Drug overdose deaths involving natural and semisynthetic opioids          3.86           3.83       -0.76
  Drug overdose deaths involving heroin                                     3.88           2.90      -25.16
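
The percent-change column can be checked by hand from the two period means. The sketch below applies the same formula, (post − pre) / pre × 100, to the rounded means in the table; the helper name `pct_change_fn` is hypothetical. The results differ slightly from the table (about −25.3 vs −25.16 for heroin), presumably because the table was computed from unrounded means.

```r
# Hypothetical helper mirroring the pct_change column
pct_change_fn <- function(pre, post) (post - pre) / pre * 100

# Rounded period means copied from the table above
heroin_pct  <- pct_change_fn(pre = 3.88, post = 2.90)
natural_pct <- pct_change_fn(pre = 3.86, post = 3.83)

cat("Heroin:", round(heroin_pct, 2), "%\n")
cat("Natural and semisynthetic opioids:", round(natural_pct, 2), "%\n")
```

Both values are negative, which matches the reading that a negative percent change means the average rate fell from the pre-pandemic to the post-pandemic window.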

Drug Trends

Selected Drug Trends

Trends by Sex

Trends by Sex, Continued

Age Groups

For age groups, I used crude rates since “age-adjusted” doesn’t make sense within a single age band.
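
To illustrate why, here is a minimal sketch of direct age standardization, where the age-adjusted rate is a weighted average of age-specific crude rates. All rates and weights below are hypothetical and are not taken from the dataset or from any official standard population.

```r
# Hypothetical age-specific crude rates (deaths per 100,000) and
# hypothetical standard-population weights; not taken from the dataset.
crude_rates <- c("15-24" = 10, "25-34" = 30, "35-44" = 40)
std_weights <- c("15-24" = 0.3, "25-34" = 0.3, "35-44" = 0.4)

# Direct standardization: weighted average of age-specific rates
age_adjusted <- sum(crude_rates * std_weights) / sum(std_weights)

# With a single age band the weight cancels, leaving the crude rate
single_band <- sum(crude_rates["35-44"] * std_weights["35-44"]) /
  sum(std_weights["35-44"])

cat("Age-adjusted across bands:", age_adjusted, "\n")
cat("Single band (35-44):", single_band, "\n")
```

With only one band there is nothing to reweight, so the "adjusted" value is simply that band's crude rate.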

age_trend <- clean %>%
  filter(ESTIMATE_TYPE == "Deaths per 100,000 resident population, crude",
         GROUP == "Age group",
         SUBTOPIC == "All drug overdose deaths")

age_levels <- age_trend %>%
  distinct(SUBGROUP, SUBGROUP_ORDER) %>%
  arrange(SUBGROUP_ORDER) %>%
  pull(SUBGROUP)

age_summary <- age_trend %>%
  group_by(SUBGROUP, period) %>%
  summarize(mean_rate = mean(ESTIMATE), .groups = "drop") %>%
  mutate(SUBGROUP = factor(SUBGROUP, levels = age_levels))

Age Groups, Continued

Comprehensive Visual

Comparing:

  • x = Year
  • y = All overdose rate (age-adjusted, Total)
  • z = Synthetic opioid rate (age-adjusted, Total)

all_rate <- clean %>%
  filter(ESTIMATE_TYPE == "Deaths per 100,000 resident population, age adjusted",
         GROUP == "Total",
         SUBTOPIC == "All drug overdose deaths") %>%
  select(TIME_PERIOD, all_overdose = ESTIMATE)

syn_rate <- clean %>%
  filter(ESTIMATE_TYPE == "Deaths per 100,000 resident population, age adjusted",
         GROUP == "Total",
         SUBTOPIC == "Drug overdose deaths involving synthetic opioids other than methadone") %>%
  select(TIME_PERIOD, synthetic_opioids = ESTIMATE)

trend3d <- inner_join(all_rate, syn_rate, by = "TIME_PERIOD") %>% arrange(TIME_PERIOD)

Comprehensive Visual, Continued

Conclusions

  • The overall age-adjusted overdose death rate is higher after 2019 than before, and the fitted linear trend rises by about 2 deaths per 100,000 per year.
  • The largest increase is in deaths involving synthetic opioids other than methadone.
  • Overdose death rates increased for both males and females, with a larger increase among males.
  • Post-pandemic crude overdose rates are highest among working-age adults, especially in the 35–44 age range.

Limitations

  • This is a descriptive pre/post pandemic comparison; it does not show that the pandemic caused the changes.
  • The dataset reports rates per 100,000, not raw death counts.
  • Flagged estimates were removed programmatically to reduce instability in small subgroups.
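  • The pre-pandemic window (2012–2019, eight years) is much longer than the post-pandemic window (2020–2022, three years), so the two period means rest on different amounts of data.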

References