This is a data report on the “Drug Prices” data received from Blue Shield. Original file (Drug Pricing Data Combined Final - V2.xlsx) rearranged and saved as (Updated BlueShield.xlsx) - by splitting into 3 tables and rearranging some columns.
drugs <- read_excel("/cloud/project/DATA/Updated BlueShieldMain.xlsx", sheet = "Table 1") %>% unique()
drug.prices.wide <- read_excel("/cloud/project/DATA/Updated BlueShieldMain.xlsx", sheet = "Table 2")
kable(head(drugs))
NDC | LABEL_NAME | MFR_NAME | DRUG_GENERIC_NAME | DOSAGE_FORM_DESC | ROUTE_OF_ADMIN_DESC | FORM_TYPE_CODE_DESC | PACKAGE_SIZE | PACKAGE_DESC | MAINTENANCE_CODE_DESC | DRUG_GROUPING | DRUG_BRAND_OR_GENERIC |
---|---|---|---|---|---|---|---|---|---|---|---|
00002323560 | CYMBALTA CAP 20MG | LILLY | DULOXETINE HCL | Capsule Delayed Release Particles | Oral | Solid | 00060000 | BOTTLE | Maintenance Drug | Antidepressant | Brand |
00002324030 | CYMBALTA CAP 30MG | LILLY | DULOXETINE HCL | Capsule Delayed Release Particles | Oral | Solid | 00030000 | BOTTLE | Maintenance Drug | Antidepressant | Brand |
00002324090 | CYMBALTA CAP 30MG | LILLY | DULOXETINE HCL | Capsule Delayed Release Particles | Oral | Solid | 00090000 | BOTTLE | Maintenance Drug | Antidepressant | Brand |
00002327004 | CYMBALTA CAP 60MG | LILLY | DULOXETINE HCL | Capsule Delayed Release Particles | Oral | Solid | 01000000 | BOTTLE | Maintenance Drug | Antidepressant | Brand |
00002327030 | CYMBALTA CAP 60MG | LILLY | DULOXETINE HCL | Capsule Delayed Release Particles | Oral | Solid | 00030000 | BOTTLE | Maintenance Drug | Antidepressant | Brand |
00008083321 | EFFEXOR XR CAP 75MG | PFIZER U.S. | VENLAFAXINE HCL | Capsule Extended Release 24 Hour | Oral | Solid | 00030000 | BOTTLE | Maintenance Drug | Antidepressant | Brand |
kable(head(drug.prices.wide))
NDC | LABEL_NAME | LOB | DOSAGE_FORM_DESC | PACKAGE_SIZE | AWP_UNIT_PRICE1 | AWP_EFFECTIVE_DATE1 | AWP_UNIT_PRICE2 | AWP_EFFECTIVE_DATE2 | AWP_UNIT_PRICE3 | AWP_EFFECTIVE_DATE3 | AWP_UNIT_PRICE4 | AWP_EFFECTIVE_DATE4 | AWP_UNIT_PRICE5 | AWP_EFFECTIVE_DATE5 | AWP_UNIT_PRICE6 | AWP_EFFECTIVE_DATE6 | awp_per_30ds | WAC_UNIT_PRICE1 | WAC_EFFECTIVE_DATE1 | WAC_UNIT_PRICE2 | WAC_EFFECTIVE_DATE2 | WAC_UNIT_PRICE3 | WAC_EFFECTIVE_DATE3 | WAC_UNIT_PRICE4 | WAC_EFFECTIVE_DATE4 | WAC_UNIT_PRICE5 | WAC_EFFECTIVE_DATE5 | WAC_UNIT_PRICE6 | WAC_EFFECTIVE_DATE6 | wac_per_30ds |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
00002323560 | CYMBALTA CAP 20MG | Commercial | Capsule Delayed Release Particles | 00060000 | 9.4 | 2021-01-01 | 9.1560000000000006 | 43861 | 8.8919999999999995 | 43515 | 8.5440000000000005 | 43084 | 7.7759999999999998 | 41453 | 7.0739999999999998 | 41255 | 428 | 7.9 | 2021-01-01 | 7.63 | 43861 | 7.41 | 43515 | 7.12 | 43084 | 6.48 | 41453 | 5.8949999999999996 | 41255 | 357 |
00002324030 | CYMBALTA CAP 30MG | Commercial | Capsule Delayed Release Particles | 00030000 | 10.6 | 2021-01-01 | 10.272 | 43861 | 9.9719999999999995 | 43515 | 9.5879999999999992 | 43084 | 8.7240000000000002 | 41453 | 7.9379999999999997 | 41255 | 437 | 8.8 | 2021-01-01 | 8.56 | 43861 | 8.31 | 43515 | 7.99 | 43084 | 7.27 | 41453 | 6.6150000000000002 | 41255 | 364 |
00002324090 | CYMBALTA CAP 30MG | Commercial | Capsule Delayed Release Particles | 00090000 | 10.6 | 2021-01-01 | 10.272 | 43861 | 9.9719999999999995 | 43515 | 9.5879999999999992 | 43084 | 8.7240000000000002 | 41453 | 7.9379999999999997 | 41255 | 565 | 8.8 | 2021-01-01 | 8.56 | 43861 | 8.31 | 43515 | 7.99 | 43084 | 7.27 | 41453 | 6.6150000000000002 | 41255 | 470 |
00002327004 | CYMBALTA CAP 60MG | Commercial | Capsule Delayed Release Particles | 01000000 | 10.6 | 2021-01-01 | 10.272 | 43861 | 9.9719999999999995 | 43515 | 9.5879999999999992 | 43084 | 8.7240000000000002 | 41453 | 7.9379999999999997 | 41255 | 358 | 8.8 | 2021-01-01 | 8.56 | 43861 | 8.31 | 43515 | 7.99 | 43084 | 7.27 | 41453 | 6.6150000000000002 | 41255 | 298 |
00002327030 | CYMBALTA CAP 60MG | Commercial | Capsule Delayed Release Particles | 00030000 | 10.6 | 2021-01-01 | 10.272 | 43861 | 9.9719999999999995 | 43515 | 9.5879999999999992 | 43084 | 8.7240000000000002 | 41453 | 7.9379999999999997 | 41255 | 401 | 8.8 | 2021-01-01 | 8.56 | 43861 | 8.31 | 43515 | 7.99 | 43084 | 7.27 | 41453 | 6.6150000000000002 | 41255 | 334 |
00008083321 | EFFEXOR XR CAP 75MG | Commercial | Capsule Extended Release 24 Hour | 00030000 | 19.5 | 2021-01-01 | 18.601330000000001 | 43831 | 17.732330000000001 | 43480 | 16.888000000000002 | 43297 | 18.482330000000001 | 43282 | 16.888000000000002 | 43101 | 764 | 16.3 | 2021-01-01 | 15.500999999999999 | 43831 | 14.776999999999999 | 43480 | 14.07333 | 43297 | 15.401999999999999 | 43282 | 14.07333 | 43101 | 637 |
The drug prices are stated as up to 6 (Date,Price) pairs. That is there are 6 dates for which prices are given for each item. For each date, there are two types of prices: awp (average wholesale price) and wac (wholesale acqusition cost). In addition there are two additional price numbers: awp_per_30ds and wac_per_30ds which state the average of the 30-day-supply prices (using most recent price).
First rename some columns to make it more convenient for coding. In the drugs table, this is straightforward, we just want to make some names shorter and simpler.
names(drugs) <- c("NDC", "Label_Name", "Mfr_Name", "Generic_Name", "Form", "How_Taken", "Drug_State", "Qty", "Package_Type", "Maintenance", "Category", "Brand")
In the drug prices table, there’s a little more. The first few “drug” columns can be renamed similar to above.
The remaining - repeating - “price and date” column names (of the form AWP_UNIT_PRICE1) can be simplified (to Price1_AWP), and they can be combined for simplification.
# Names for initial columns of the drug.prices.wide table
drug.prices.wide <- drug.prices.wide %>% rename("NDC" = 1, "Label_Name" = 2, "LOB" =3, "Form"=4, "Qty"=5)
# Simplify the price and date column names
#eg convert AWP_UNIT_PRICE1 to AWPPrice_1
colnames(drug.prices.wide) <- gsub('_UNIT_PRICE','Price_',colnames(drug.prices.wide))
colnames(drug.prices.wide) <- gsub('_EFFECTIVE_DATE','Date_',colnames(drug.prices.wide))
Notice the use of “.” and “" in the names … this will be useful in the wide-to-long conversion below. The "." is just to make the name readable, and the "” will help separate out the repeated readings of Price and Date.
Excel data format does not always preserve data type (class), so before doing any operations it is useful to coerce columns to the right type. Strangely the first date columns (among AWP and WAC) are both transmitted correctly, but the rest are not.
Let’s first identify and coerce the price columns.
# class(drug.prices.wide$AWPPrice_2)
# class(drug.prices.wide$AWPDate_2)
price.cols <- names(drug.prices.wide %>% select(contains("PRICE")) )
drug.prices.wide <- drug.prices.wide %>% mutate(across(all_of(price.cols), as.numeric)) # (cols = awp.cols, numeric)
And now identify and coerce the date columns (except the 1st and 7th), converting from character to numeric, then to Date.
date.cols <- names(drug.prices.wide %>% select(contains("DATE")) )
drug.prices.wide <- drug.prices.wide %>%
mutate(across(all_of(date.cols[c(-1,-7)]), as.numeric)) %>%
mutate(across(all_of(date.cols[c(-1,-7)]), as.Date, origin="1900-01-01")) %>%
mutate(across(all_of(date.cols), as.Date))
# as_datetime(x, origin = lubridate::origin, tz = "UTC")
Convert Qty column, which seems to be displayed with a multiple of 1000, into numeric. Finally convert character columns to factor.
drugs2 <- drugs %>% mutate(Qty = as.numeric(Qty)/1000)
drug.prices.wide2 <- drug.prices.wide %>% mutate(Qty = as.numeric(Qty)/1000)
drugs <- drugs %>% mutate_if(is.character,as.factor)
drug.prices.wide <- drug.prices.wide %>% mutate_if(is.character,as.factor)
(** This is not necessary because the BSC database has more detailed columns identifying Form **)
The “Form” values in the data set are sometimes compound. For instance, “Capsule Extended Release 12 Hour”, mixes the Form (“Capsule”) and release mechanism (“Extended Release 12 Hour”). We will split these, so that then one can compare prices for “Capsule Extended Release 12 Hour” and “Tablet Extended Release 12 Hour”, but not compare “Capsule Extended Release 12 Hour” against “Tablet Extended Release 24 Hour”.
drug.prices.wide$Mechanism <- str_extract(drug.prices.wide$Form, "\\s\\w*") #eg Tab Cap
The Label_Name column has compound values. For instance “CYMBALTA CAP 30M” is the medicine CYMBALTA in CAP type and 30MG Dosage. We’ll split it into 3 new columns: DrugName, FormType, and Dosage
# remove extra whitespaces
drug.prices.wide <- drug.prices.wide %>% mutate(Label_Name = gsub("\\s+"," ",drug.prices.wide$Label_Name) )
# Drug Name: first remove FormType ("\\s\\w+") and Dosage ("\\d+[^\ ]*")
# drug.prices.wide <- drug.prices.wide %>% mutate(DrugName = sub("\\s\\w+", " ", drug.prices.wide$Label_Name))
# drug.prices.wide <- drug.prices.wide %>% mutate(DrugName = sub("\\d+[^\ ]*","", drug.prices.wide$DrugName))
# get the "main form" by pulling the first word in Label_Name before a whitespace and followed by 0 or more "word" characters
drug.prices.wide$FormType <- str_extract(drug.prices.wide$Label_Name, "\\s\\w+") #eg Tab Cap
# For Dosage: match digit+nonspacealpha
# https://stackoverflow.com/questions/2192316/extract-a-regular-expression-match
drug.prices.wide$Dosage <- str_extract(drug.prices.wide$Label_Name, "\\d+[^\ ]*")
The goal is to take the 6 pairs of columns of AWP Prices and Dates and reduce them to 2 columns (AWP Price and AWP Date) and up to 6 rows; do the same for WAC. To apply the pivot_longer command efficiently these columns should be named like AWP1_AWP.Price and AWP1_AWP.Date etc.).
drugs.def <- drug.prices.wide %>% select(c(NDC:Qty), awp_per_30ds, wac_per_30ds, Dosage) #, DrugName, FormType, Dosage)
drugs.prices.only <- drug.prices.wide %>% select(c(NDC, contains("Price")))
drugs.dates.only <- drug.prices.wide %>% select(c(NDC, contains("Date")))
drug.prices.long <- pivot_longer(
drugs.prices.only,
cols = AWPPrice_1:WACPrice_6,
names_to = c(".value", "Type"), # notice the swap in order
names_pattern = "(.+)_(.+)",
# names_sep = "_",
values_drop_na = TRUE)
drug.dates.long <- pivot_longer(
drugs.dates.only,
cols = AWPDate_1:WACDate_6,
names_to = c(".value", "Type"),
names_sep = "_",
values_drop_na = TRUE)
Now re-join the 3 split tables.
drugs.long <- merge(drug.dates.long, drug.prices.long, by=c("NDC", "Type")) %>% unique()
drugs.long <- merge(drugs.def, drugs.long, by="NDC")
# drugs.long.1 <- drugs.long %>% filter(Type == "1")
Finally, add the Generic_Name and Mfr_Name columns from the drugs table.
drugs.final <- merge(drugs.long, drugs[, c("NDC", "Category", "Generic_Name", "Mfr_Name", "Brand")], by = "NDC", all.x=TRUE)
nameorder <- c("NDC", "Category", "Generic_Name", "Qty", "Dosage", "Mfr_Name", "Form", "LOB", "AWPPrice", "WACPrice", "AWPDate", "WACDate", "awp_per_30ds", "wac_per_30ds", "Label_Name", "Brand") #, "DrugName", "FormType", "FormType")
drugs.final <- drugs.final %>% mutate_if(is.character,as.factor) %>% select(nameorder) %>% ungroup()
drugs.final <- drugs.final %>%
mutate(NDC = as.character(NDC), Generic_Name = as.character(Generic_Name),
Mfr_Name = as.character(Mfr_Name), Dosage = as.character(Dosage),
Label_Name = as.character(Label_Name)) %>% arrange(Generic_Name, Qty, Dosage, Mfr_Name, Form)
Finally, save the data frames needed for analysis into an R data object.
kable(head(drugs.final))
NDC | Category | Generic_Name | Qty | Dosage | Mfr_Name | Form | LOB | AWPPrice | WACPrice | AWPDate | WACDate | awp_per_30ds | wac_per_30ds | Label_Name | Brand |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
47781034201 | Antidiabetics | ACARBOSE | 00100000 | 100MG | ALVOGEN | Tablet | Commercial | 1.2 | 0.62 | 2013-06-07 | 2013-06-07 | 104 | 56 | ACARBOSE TAB 100MG | Generic |
00054014225 | Antidiabetics | ACARBOSE | 00100000 | 100MG | HIKMA | Tablet | Medicare | 1.2 | 0.60 | 2008-05-07 | 2013-12-20 | 97 | 50 | ACARBOSE TAB 100MG | Generic |
00054014225 | Antidiabetics | ACARBOSE | 00100000 | 100MG | HIKMA | Tablet | Medicare | NA | 0.94 | NA | 2012-01-12 | 97 | 50 | ACARBOSE TAB 100MG | Generic |
00054014225 | Antidiabetics | ACARBOSE | 00100000 | 100MG | HIKMA | Tablet | Commercial | 1.2 | 0.60 | 2008-05-07 | 2013-12-20 | 96 | 50 | ACARBOSE TAB 100MG | Generic |
00054014225 | Antidiabetics | ACARBOSE | 00100000 | 100MG | HIKMA | Tablet | Commercial | NA | 0.94 | NA | 2012-01-12 | 96 | 50 | ACARBOSE TAB 100MG | Generic |
69543012210 | Antidiabetics | ACARBOSE | 00100000 | 100MG | VIRTUS PHARMACEUTICALS | Tablet | Medicare | 1.2 | 0.60 | 2015-05-20 | 2015-05-20 | 101 | 53 | ACARBOSE TAB 100MG | Generic |
save(drugs.final, drug.prices.wide, file="/cloud/project/DATA/BSC-GH-2021.RData")