Question
By using the data available on Drug Shortages Canada and Drug Product Database (DPD), identify the percentage of drugs that had an actual drug shortage reported to Health Canada but did not have a prior report of an anticipated or actual shortage within a 12-month period for the same drug. Restrict your collection of data to the period June 2019 to June 2021.
For those drugs found in a), what was the average age on the market? Use the most recent marketing date found in the Drug Product Database.
library(tidyverse)
library(docstring)
library(magrittr)
library(janitor)
library(lubridate)
library(rdrugshortages)
library(knitr)
#' Disable scientific notation
options(scipen = 999)
#' This is the download location of the data we are interested in from the Drug Product Database:
<- "https://www.canada.ca/content/dam/hc-sc/documents/services/drug-product-database/allfiles.zip"
dpd_file_location
#' Start and end dates for Drug Shortages Canada Web API:
<- ymd(20190601, tz = "America/Toronto")
my_start_date <- ymd(20210601, tz = "America/Toronto")
my_end_date
#' Create date interval object, used for filtering records after we've pulled them using the API:
<- interval(start = my_start_date,
my_date_range end = my_end_date,
tzone = "America/Toronto")
Retrieving Data
I’m going to start by pulling data from Drug Shortages Canada using it’s API. For more information about the API, check out the API’s documentation.
There is an R
package called rdrugshortages
, I’m going to use that. Later I’ll filter the data to exclude records added outside the following date range: 2019-06-01 to 2021-06-01.
So let’s go ahead and get some drug shortages data:
<- dsc_search() drug_shortages_data
I’m going to quickly format some columns as dates.
$created_date %<>% as.Date()
drug_shortages_data$anticipated_start_date %<>% as.Date()
drug_shortages_data$actual_start_date %<>% as.Date() drug_shortages_data
It’s less elegant, but for the Drug Product Database (DPD), we have to download text files, but it’s not so bad.
<- tempfile()
temp
download.file(dpd_file_location, temp)
<- read_csv(
dpd_drug file = unz(temp, "drug.txt"),
col_names = colnames_drug,
col_types = coltypes_drug)
<- read_csv(
dpd_status file = unz(temp, "status.txt"),
col_names = colnames_status,
col_types = coltypes_status)
unlink(temp)
I’ll subset the data to keep only the records from 2019-06-01 to 2021-06-01.
<- drug_shortages_data %>%
drug_shortages_data_within_range filter(ymd(created_date) %within% my_date_range)
Therefore, there were 5788 shortage reports between 2019-06-01 and 2021-06-01.
Filtering Data
Since I’m interested in the percentage of actual drug shortage reports to Health Canada that did not have a prior report of an anticipated or actual shortage within a 12-month period for the same drug, I’m going to filter through the data to find the subsets of interest.
Finding the Actual Shortages
I’m going to filter through the data I’ve pulled from the website and keep only the records that have a status
of active_confirmed
.
<- drug_shortages_data_within_range %>%
actual_shortage filter(status == "active_confirmed")
Therefore, there were 593 actual shortages between 2019-06-01 and 2021-06-01.
Drug Shortages Without a Prior Report within a 12 Month Period
Next I’m going to keep only drug shortage reports to Health Canada that did not have a prior report of an anticipated or actual shortage within a 12-month period for the same drug.
This means I’ll take the data, keep only the records that had either an active or anticipated shortage and then keep only drug dins which had reports more than 12 months apart.
<- drug_shortages_data_within_range %>%
more_than_12_months filter(din %in% actual_shortage$din) %>%
filter(status %in% c("active_confirmed", "anticipated_shortage")) %>%
group_by(din) %>%
arrange(created_date) %>%
mutate(difference = created_date - lag(created_date)) %>%
filter(difference >= 365)
<- drug_shortages_data_within_range %>%
more_than_12_months_df filter(din %in% more_than_12_months$din)
Therefore, these are the drugs which had actual shortages within our time frame of interest but also had a 12 month or more gap between reports.
din | status | created_date |
---|---|---|
02220091 | active_confirmed | 2020-01-22 |
02220091 | active_confirmed | 2021-01-22 |
02220105 | active_confirmed | 2020-01-23 |
02220105 | active_confirmed | 2021-01-22 |
02221314 | active_confirmed | 2020-01-23 |
02221314 | active_confirmed | 2021-01-22 |
02418223 | active_confirmed | 2019-10-11 |
02418223 | active_confirmed | 2021-04-07 |
Age on Market
Now we want the average age on the market for the drugs found in the table above.
The first thing I’ll do is merge together the DPD datasets, since we have to work with two tables. DINs
are stored in dpd_drug
and status
is stored in dpd_status
. We need status
because it will tell us whether or not the drug has been marketed
. I’m also going to filter out any records that say CURRENT_STATUS_FLAG == N
because I’m only interested in the most up to date marketing date.
#' Mapping between DIN and DRUG_CODE:
<- dpd_drug %>% select(DRUG_CODE, DRUG_IDENTIFICATION_NUMBER)
din_drug_code_mapping
#' Add DIN to table of interest:
<- merge(dpd_status,
my_drug_product_database_all_files_status
din_drug_code_mapping,by.y = "DRUG_CODE")
#' Ensure types are the same:
$DRUG_IDENTIFICATION_NUMBER %<>%
my_drug_product_database_all_files_statusas.character(my_drug_product_database_all_files_status$DRUG_IDENTIFICATION_NUMBER)
#' Fix up formatting of din
$din %<>%
more_than_12_months_dfas.character(more_than_12_months_df$din) %>%
str_pad(8, pad = "0")
#' Find the drugs of interest in the status table
<- my_drug_product_database_all_files_status %>%
my_drug_product_database_status filter(DRUG_IDENTIFICATION_NUMBER %in% more_than_12_months_df$din)
#' Find relevant data and calculate time on market:
<- my_drug_product_database_status %>%
my_drug_product_database_status_marketed filter(STATUS == "MARKETED") %>%
filter(CURRENT_STATUS_FLAG == "Y") %>%
mutate(HISTORY_DATE = dmy(HISTORY_DATE),
AGE_ON_MARKET = lubridate::today() - HISTORY_DATE)
Conclusion
There were 5788 shortage reports between 2019-06-01 and 2021-06-01.
Of those reports, there were 593 actual shortages, for 528 unique drugs.
Of the actual shortages, there were 4 drugs with a 12 month or more gap between reports.
The proportion of drugs that have a 12 month or more gap between reports is 0.0075758.
The average age on the market for these drugs was 4.35 years.