Drug Shortages Analysis

Question

By using the data available on Drug Shortages Canada and Drug Product Database (DPD), identify the percentage of drugs that had an actual drug shortage reported to Health Canada but did not have a prior report of an anticipated or actual shortage within a 12-month period for the same drug. Restrict your collection of data to the period June 2019 to June 2021.
For those drugs found in a), what was the average age on the market? Use the most recent marketing date found in the Drug Product Database.

library(tidyverse)
library(docstring)
library(magrittr)
library(janitor)
library(lubridate)
library(rdrugshortages)
library(knitr)

#' Disable scientific notation
options(scipen = 999)

#' This is the download location of the data we are interested in from the Drug Product Database:
dpd_file_location <- "https://www.canada.ca/content/dam/hc-sc/documents/services/drug-product-database/allfiles.zip"

#' Start and end dates for Drug Shortages Canada Web API:
my_start_date <- ymd(20190601, tz = "America/Toronto")
my_end_date <- ymd(20210601, tz = "America/Toronto")

#' Create date interval object, used for filtering records after we've pulled them using the API:
my_date_range <- interval(start = my_start_date,
                          end = my_end_date,
                          tzone = "America/Toronto")

Retrieving Data

I’m going to start by pulling data from Drug Shortages Canada using it’s API. For more information about the API, check out the API’s documentation.

There is an R package called rdrugshortages, I’m going to use that. Later I’ll filter the data to exclude records added outside the following date range: 2019-06-01 to 2021-06-01.

So let’s go ahead and get some drug shortages data:

drug_shortages_data <- dsc_search()

I’m going to quickly format some columns as dates.

drug_shortages_data$created_date %<>% as.Date()
drug_shortages_data$anticipated_start_date %<>% as.Date()
drug_shortages_data$actual_start_date %<>% as.Date()

It’s less elegant, but for the Drug Product Database (DPD), we have to download text files, but it’s not so bad.

temp <- tempfile()

download.file(dpd_file_location, temp)

dpd_drug <- read_csv(
  file = unz(temp, "drug.txt"),
  col_names = colnames_drug,
  col_types = coltypes_drug)

dpd_status <- read_csv(
  file = unz(temp, "status.txt"),
  col_names = colnames_status,
  col_types = coltypes_status)

unlink(temp)

I’ll subset the data to keep only the records from 2019-06-01 to 2021-06-01.

drug_shortages_data_within_range <- drug_shortages_data %>%
  filter(ymd(created_date) %within% my_date_range)

Therefore, there were 5788 shortage reports between 2019-06-01 and 2021-06-01.

Filtering Data

Since I’m interested in the percentage of actual drug shortage reports to Health Canada that did not have a prior report of an anticipated or actual shortage within a 12-month period for the same drug, I’m going to filter through the data to find the subsets of interest.

Finding the Actual Shortages

I’m going to filter through the data I’ve pulled from the website and keep only the records that have a status of active_confirmed.

actual_shortage <- drug_shortages_data_within_range %>%
  filter(status == "active_confirmed")

Therefore, there were 593 actual shortages between 2019-06-01 and 2021-06-01.

Drug Shortages Without a Prior Report within a 12 Month Period

Next I’m going to keep only drug shortage reports to Health Canada that did not have a prior report of an anticipated or actual shortage within a 12-month period for the same drug.

This means I’ll take the data, keep only the records that had either an active or anticipated shortage and then keep only drug dins which had reports more than 12 months apart.

more_than_12_months <- drug_shortages_data_within_range %>%
  filter(din %in% actual_shortage$din) %>%
  filter(status %in% c("active_confirmed", "anticipated_shortage")) %>%
  group_by(din) %>%
  arrange(created_date) %>%
  mutate(difference = created_date - lag(created_date)) %>%
  filter(difference >= 365)
  
more_than_12_months_df <- drug_shortages_data_within_range %>% 
  filter(din %in% more_than_12_months$din)

Therefore, these are the drugs which had actual shortages within our time frame of interest but also had a 12 month or more gap between reports.

din	status	created_date
02220091	active_confirmed	2020-01-22
02220091	active_confirmed	2021-01-22
02220105	active_confirmed	2020-01-23
02220105	active_confirmed	2021-01-22
02221314	active_confirmed	2020-01-23
02221314	active_confirmed	2021-01-22
02418223	active_confirmed	2019-10-11
02418223	active_confirmed	2021-04-07

Age on Market

Now we want the average age on the market for the drugs found in the table above.

The first thing I’ll do is merge together the DPD datasets, since we have to work with two tables. DINs are stored in dpd_drug and status is stored in dpd_status. We need status because it will tell us whether or not the drug has been marketed. I’m also going to filter out any records that say CURRENT_STATUS_FLAG == N because I’m only interested in the most up to date marketing date.

#' Mapping between DIN and DRUG_CODE:
din_drug_code_mapping <- dpd_drug %>% select(DRUG_CODE, DRUG_IDENTIFICATION_NUMBER)

#' Add DIN to table of interest:
my_drug_product_database_all_files_status <- merge(dpd_status,
                                                   din_drug_code_mapping,
                                                   by.y = "DRUG_CODE")


#' Ensure types are the same:
my_drug_product_database_all_files_status$DRUG_IDENTIFICATION_NUMBER %<>%
  as.character(my_drug_product_database_all_files_status$DRUG_IDENTIFICATION_NUMBER)

#' Fix up formatting of din
more_than_12_months_df$din %<>%
  as.character(more_than_12_months_df$din) %>%
  str_pad(8, pad = "0")

#' Find the drugs of interest in the status table
my_drug_product_database_status <- my_drug_product_database_all_files_status %>%
  filter(DRUG_IDENTIFICATION_NUMBER %in% more_than_12_months_df$din)

#' Find relevant data and calculate time on market:
my_drug_product_database_status_marketed <- my_drug_product_database_status %>%
  filter(STATUS == "MARKETED") %>%
  filter(CURRENT_STATUS_FLAG == "Y") %>%
  mutate(HISTORY_DATE = dmy(HISTORY_DATE),
         AGE_ON_MARKET = lubridate::today() - HISTORY_DATE)

Conclusion

There were 5788 shortage reports between 2019-06-01 and 2021-06-01.

Of those reports, there were 593 actual shortages, for 528 unique drugs.

Of the actual shortages, there were 4 drugs with a 12 month or more gap between reports.

The proportion of drugs that have a 12 month or more gap between reports is 0.0075758.

The average age on the market for these drugs was 4.35 years.