Introduction

The goal of this project is to find notable trends for opioid-related deaths in Washington, D.C., over 9 years (2015-2023). To do this, I will use a dataset published by the National Center for Health Statistics, which contains overdose death statistics for all fifty states and Washington, D.C., over this time span. The dataset includes relevant information such as the type of substance present in the overdose, the month, year, and state in which it was reported, and the percent completeness of aggregated reports for a given month in a given state. The dataset includes a comprehensive breakdown of the type of opioid that was present in the reported overdose. Most notably, they are broken down between synthetic opioids, non-synthetic & semi-synthetic opioids, heroin, methadone, and an overall category of all opioid-related overdoses regardless of type. The dataset also includes information on cocaine overdoses, which will be used to compare trends between overall opioid and cocaine use in order to gather a greater understanding of drug-use in Washington D.C.

Data Analysis (part 1)

My approach to analyzing the data will begin with cleaning up the dataset and making sure all the variables I need are in a format that I can use effectively. I will then be assigning the relevant variables to their own dataframes, making them easier to work with when performing calculations and creating the visualizations.

library (dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library (ggplot2)
library (lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
drug_overdose_info <- read.csv("DrugOverdoseRates.csv")
#examining the layout of the dataset

str(drug_overdose_info)
## 'data.frame':    78120 obs. of  12 variables:
##  $ State                        : chr  "AK" "AK" "AK" "AK" ...
##  $ Year                         : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ Month                        : chr  "January" "February" "March" "April" ...
##  $ Period                       : chr  "12 month-ending" "12 month-ending" "12 month-ending" "12 month-ending" ...
##  $ Indicator                    : chr  "Cocaine (T40.5)" "Cocaine (T40.5)" "Cocaine (T40.5)" "Cocaine (T40.5)" ...
##  $ Data.Value                   : chr  "" "" "" "" ...
##  $ Percent.Complete             : int  100 100 100 100 100 100 100 100 100 100 ...
##  $ Percent.Pending.Investigation: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ State.Name                   : chr  "Alaska" "Alaska" "Alaska" "Alaska" ...
##  $ Footnote                     : chr  "Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality." "Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality." "Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality." "Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality." ...
##  $ Footnote.Symbol              : chr  "**" "**" "**" "**" ...
##  $ Predicted.Value              : chr  "" "" "" "" ...
head(drug_overdose_info)
##   State Year    Month          Period       Indicator Data.Value
## 1    AK 2015  January 12 month-ending Cocaine (T40.5)           
## 2    AK 2015 February 12 month-ending Cocaine (T40.5)           
## 3    AK 2015    March 12 month-ending Cocaine (T40.5)           
## 4    AK 2015    April 12 month-ending Cocaine (T40.5)           
## 5    AK 2015      May 12 month-ending Cocaine (T40.5)           
## 6    AK 2015     June 12 month-ending Cocaine (T40.5)           
##   Percent.Complete Percent.Pending.Investigation State.Name
## 1              100                             0     Alaska
## 2              100                             0     Alaska
## 3              100                             0     Alaska
## 4              100                             0     Alaska
## 5              100                             0     Alaska
## 6              100                             0     Alaska
##                                                                                                                   Footnote
## 1 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
## 2 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
## 3 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
## 4 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
## 5 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
## 6 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
##   Footnote.Symbol Predicted.Value
## 1              **                
## 2              **                
## 3              **                
## 4              **                
## 5              **                
## 6              **
#cleaning up the names of the columns, making the data_value column (number of deaths) as numeric rather than character, and using lubridate to create a date column in date format

names(drug_overdose_info) <- gsub("[(). \\-]", "_", names(drug_overdose_info))
names(drug_overdose_info) <- tolower(names(drug_overdose_info))

drug_overdose_info <- drug_overdose_info %>%
  mutate(data_value = as.numeric(data_value))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `data_value = as.numeric(data_value)`.
## Caused by warning:
## ! NAs introduced by coercion
drug_overdose_info <- drug_overdose_info %>%
  mutate(date = make_date(year, match(month, month.name), 1))

#assigning the six different drug types relevant to my report into their own dataframes, filtering to make them all for DC specifically and making sure to exclude any incomplete reports

dc_nonsynth_df <- drug_overdose_info %>%
  filter(state == "DC" & indicator == "Natural & semi-synthetic opioids (T40.2)" & footnote != "Underreported due to incomplete data.")

dc_synth_df <- drug_overdose_info %>%
  filter(state == "DC" & indicator == "Synthetic opioids, excl. methadone (T40.4)" & footnote != "Underreported due to incomplete data.")

dc_opioid_df <- drug_overdose_info %>%
  filter(state == "DC" & indicator == "Opioids (T40.0-T40.4,T40.6)" & footnote != "Underreported due to incomplete data.")

dc_cocaine_df <- drug_overdose_info %>%
  filter(state == "DC" & indicator == "Cocaine (T40.5)" & footnote != "Underreported due to incomplete data.")

dc_methadone_df <- drug_overdose_info %>%
  filter(state == "DC" & indicator == "Methadone (T40.3)" & footnote != "Underreported due to incomplete data.")

dc_heroin_df <- drug_overdose_info %>%
  filter(state == "DC" & indicator == "Heroin (T40.1)" & footnote != "Underreported due to incomplete data.")

head(drug_overdose_info)
##   state year    month          period       indicator data_value
## 1    AK 2015  January 12 month-ending Cocaine (T40.5)         NA
## 2    AK 2015 February 12 month-ending Cocaine (T40.5)         NA
## 3    AK 2015    March 12 month-ending Cocaine (T40.5)         NA
## 4    AK 2015    April 12 month-ending Cocaine (T40.5)         NA
## 5    AK 2015      May 12 month-ending Cocaine (T40.5)         NA
## 6    AK 2015     June 12 month-ending Cocaine (T40.5)         NA
##   percent_complete percent_pending_investigation state_name
## 1              100                             0     Alaska
## 2              100                             0     Alaska
## 3              100                             0     Alaska
## 4              100                             0     Alaska
## 5              100                             0     Alaska
## 6              100                             0     Alaska
##                                                                                                                   footnote
## 1 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
## 2 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
## 3 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
## 4 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
## 5 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
## 6 Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality.
##   footnote_symbol predicted_value       date
## 1              **                 2015-01-01
## 2              **                 2015-02-01
## 3              **                 2015-03-01
## 4              **                 2015-04-01
## 5              **                 2015-05-01
## 6              **                 2015-06-01
sum(is.na(drug_overdose_info$data_value))
## [1] 32743

Data Analysis (part 2)

Now that I have all of my data cleaned and organized, I will need to focus on creating new dataframes that highlight essential trends in the use. For example, I will be creating a new dataframe called dc_diff_df to find the difference between the amounts of synthetic opioid overdoses and non-synthetic and semi-synthetic opioid overdoses. I will also make new dataframes for both overall opioid overdoses and cocaine overdoses to calculate the yearly trends rather than the monthly trends. Additionally, I will create new dataframes that calculate the proportions of each drug in the total amount of overdose deaths for three different time spans over the nine years. I am doing all of this to garner a greater understanding of the breakdown and trends of drug-related overdoses in DC, as the simple monthly trends are not enough to get a clear picture of all the trends.

#calculating the difference between synthetic and non-synthetic & semi-synthetic opioid overdoses.
dc_diff_df <- data.frame(
  state = dc_synth_df$state,
  month = dc_synth_df$month,
  year = dc_synth_df$year,
  date = dc_synth_df$date,
  diff_value = dc_synth_df$data_value - dc_nonsynth_df$data_value
)

#calculating the yearly totals of opioid overdoses and the raw & percent changes of them over the previous year
yearly_opioid_deaths <- dc_opioid_df %>%
  mutate(year = year(date)) %>%
  group_by(year) %>%
  summarise(total_deaths = sum(data_value))

yearly_opioid_deaths <- yearly_opioid_deaths %>%
  mutate(
    change = total_deaths - lag(total_deaths),
    percent_change = (change / lag(total_deaths)) * 100
  )

#calculating the yearly totals of cocaine overdoses and the raw & percent changes of them over the previous year
yearly_cocaine_deaths <- dc_cocaine_df %>%
  mutate(year = year(date)) %>%
  group_by(year) %>%
  summarise(total_deaths = sum(data_value))

yearly_cocaine_deaths <- yearly_cocaine_deaths %>%
  mutate(
    change = total_deaths - lag(total_deaths),
    percent_change = (change / lag(total_deaths)) * 100
  )

#calculating the proportions of each drug in the total amounts of overdoses over the entire 9-year span
drug_totals <- data.frame(
  drug_type = c("Heroin", "Cocaine", "Methadone", "Non-synthetic & Semi-synthetic Opioids", "Synthetic Opioids"),
  total_deaths = c(
    sum(dc_heroin_df$data_value, na.rm = TRUE),
    sum(dc_cocaine_df$data_value, na.rm = TRUE),
    sum(dc_methadone_df$data_value, na.rm = TRUE),
    sum(dc_nonsynth_df$data_value, na.rm = TRUE),
    sum(dc_synth_df$data_value, na.rm = TRUE)
  )
)

drug_totals <- drug_totals %>%
  mutate(percent = round(100 * total_deaths / sum(total_deaths), 1),
         legend = paste0(drug_type, " (", percent, "%)")) %>%
  arrange(desc(percent)) %>%
  mutate(legend = factor(legend, levels = legend))

#calculating the proportions of each drug in the total amounts of overdoses between 2015 and 2019
drug_totals_pre2020 <- data.frame(
  drug_type = c("Heroin", "Cocaine", "Methadone", "Non-synthetic & Semi-synthetic Opioids", "Synthetic Opioids"),
  total_deaths = c(
    sum(filter(dc_heroin_df, year < 2020)$data_value, na.rm = TRUE),
    sum(filter(dc_cocaine_df, year < 2020)$data_value, na.rm = TRUE),
    sum(filter(dc_methadone_df, year < 2020)$data_value, na.rm = TRUE),
    sum(filter(dc_nonsynth_df, year < 2020)$data_value, na.rm = TRUE),
    sum(filter(dc_synth_df, year < 2020)$data_value, na.rm = TRUE)
  )
)

drug_totals_pre2020 <- drug_totals_pre2020 %>%
  mutate(percent = round(100 * total_deaths / sum(total_deaths), 1),
         legend = paste0(drug_type, " (", percent, "%)")) %>%
  arrange(desc(percent)) %>%
  mutate(legend = factor(legend, levels = legend))

#calculating the proportions of each drug in the total amounts of overdoses between 2020 and 2023
drug_totals_post2020 <- data.frame(
  drug_type = c("Heroin", "Cocaine", "Methadone", "Non-synthetic & Semi-synthetic Opioids", "Synthetic Opioids"),
  total_deaths = c(
    sum(filter(dc_heroin_df, year >= 2020)$data_value, na.rm = TRUE),
    sum(filter(dc_cocaine_df, year >= 2020)$data_value, na.rm = TRUE),
    sum(filter(dc_methadone_df, year >= 2020)$data_value, na.rm = TRUE),
    sum(filter(dc_nonsynth_df, year >= 2020)$data_value, na.rm = TRUE),
    sum(filter(dc_synth_df, year >= 2020)$data_value, na.rm = TRUE)
  )
)

drug_totals_post2020 <- drug_totals_post2020 %>%
  mutate(percent = round(100 * total_deaths / sum(total_deaths), 1),
         legend = paste0(drug_type, " (", percent, "%)")) %>%
  arrange(desc(percent)) %>%
  mutate(legend = factor(legend, levels = legend))

Visualizations

This section of the report will show us the trends of drug overdoses in DC through the use of visuals.

Proportional Breakdown of Overdose Deaths by Substance

ggplot(drug_totals, aes(x = "", y = total_deaths, fill = legend)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(theta = "y") +
  labs(
    title = "Overdose Deaths in DC by Substance \n(2015–2023)",
    fill = "Drug Type"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    legend.title = element_text(face = "bold"),
    legend.text = element_text(size = 10)
  ) +
  scale_fill_brewer(palette = "Set1")


As we can see in the pie chart above, roughly 70% of reported overdoses in Washington, D.C., between 2015 and 2023 are due to opioids. The main subset of opioids causing this are synthetic opioids, which are responsible for 45% of the overdose deaths in these nine years. However, nine years is quite a long time and this pie chart fails to show any trends. To get a better picture, we must break up this time period into shorter intervals.


ggplot(drug_totals_pre2020, aes(x = "", y = total_deaths, fill = legend)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(theta = "y") +
  labs(
    title = "Overdose Deaths in DC by Substance \n(2015–2019)",
    fill = "Drug Type"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    legend.title = element_text(face = "bold"),
    legend.text = element_text(size = 10)
  ) +
  scale_fill_brewer(palette = "Set2")


We see a significantly different breakdown between 2015 and 2019. Synthetic opioids were considerably less prevalent (though still the leading cause of death) pre-2020. Heroin is substantially more prevalent, now responsible for nearly 28% of overdose deaths in this time period.


ggplot(drug_totals_post2020, aes(x = "", y = total_deaths, fill = legend)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(theta = "y") +
  labs(
    title = "Overdose Deaths in DC by Substance \n(2020–2023)",
    fill = "Drug Type"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    legend.title = element_text(face = "bold"),
    legend.text = element_text(size = 10)
  ) +
  scale_fill_brewer(palette = "Set3")


The second half of this time period, 2020 and later, shows a concerning trend with synthetic opioids. Heroin, which was responsible for nearly 28% of overdose deaths between 2015 and 2019, is now responsible for roughly 10% of overdose deaths during the 2020s. That is approximately an 18% shift. Synthetic opioids seem to have replaced heroin as we see around a 15% increase in the proportion of synthetic opioid-related overdoses. The proportion of cocaine use has also increased a bit, albeit less dramatically, with around a 7% increase.


Conclusion

The breakdown and trends of drug overdoses in D.C. between 2015 and 2023 show us a few key findings. The most concerning finding from this data would be the sharp rise in opioid related deaths, seemingly caused by the meteoric rise of synthetic opioids in the region. In January of 2015, only 15 individuals died from a synthetic opioid-related overdose. In December of 2023, 493 individuals died from a synthetic opioid-related overdose. That is a 3287% increase. Opioid deaths, except for 2017 to 2018, have only been on the rise, and the trend shows no signs of stopping.

Contrary to that, heroin, non-synthetic, and semi-synthetic opioids have not seen the same increase that synthetic opioids have seen. Heroin overdoses spiked alongside synthetic opioid overdoses during the first significant increase in opioid overdoses between 2016 and 2018. However, instead of continuing to rise like synthetic opioids, heroin overdoses trended down to the previous baseline and have remained mostly stable there since. Non-synthetic and semi-synthetic opioids have remained stable over the entire 9-year span. It is clear that the driving force behind the increase in opioid overdoses is the synthetic opioids, and that is where D.C. should direct its attention.

Future research should be focused on why we are seeing such a stark increase in synthetic opioid use in D.C., whether these trends are reflected in other states, and how we can address the rise of synthetic opioid overdoses. Finding out what caused the decrease between 2017 and 2018 could reveal other vital variables that were not present in this dataset.

Another notable trend that warrants further research was the similar trend in cocaine overdoses in D.C. There should be follow-up research on what the cause of this is; whether it’s simply an overall increase in drug use in D.C., or if there is another factor at play, such as cocaine supply being contaminated with these highly lethal synthetic opioids.

References

Compute lagged or leading values - lead-lag. dplyr. (n.d.). https://dplyr.tidyverse.org/reference/lead-lag.html
National Center for Health Statistics. (2018, June 3). VSRR Provisional Drug Overdose Death Counts https://data.cdc.gov/National-Center-for-Health-Statistics/VSRR-Provisional-Drug-Overdose-Death-Counts/xkb8-kh2a/about_data
Rinker, T. (2013, September 19). Paste, paste0, and sprintf. TRinker’s R Blog. https://trinkerrstuff.wordpress.com/2013/09/15/paste-paste0-and-sprintf-2/