Introduction
I recently watched the Hulu series Dopesick, which depicted the shocking true story of America’s opioid epidemic. The show focuses on the large-scale over-prescribing of prescription opioids, Oxycontin in a fictional Appalachian town. The series speaks to me because I was once prescribed a giant bottle of hydrocodone after a jaw surgery in 2015. Aware of its mind-altering properties, I opted not to use it. One year later, at a professional conference in Pittsburgh, I learned about the FBI’s effort to tackle the epidemic in the region, including the crack-down of small pain clinics. It struck me how prevalent opioid addiction was and how easily an ordinary patient can become an opioid victim because of greedy pharmaceutical companies such as Purdue Pharma.
Because of the show and my personal experience, I am curious to investigate prescription opioid transactions in my community, Hamilton County of the State of Ohio. The questions I would like to answer are:
Who are the distributors of pain pills?
Who are the manufacturers of pain pills?
What is the scale of pain pills sales?
Which cities/areas are they sold to?
Which factors affect the number of pain pills shipped?
The original data set is from Washington Post. A team of reporters sifted through almost 380 million transactions from 2006 through 2014 in the Drug Enforcement Administration’s database and made the data available at state and county levels to help the public understand the national crisis. These transactions represent shipments from sales of oxycodone and hydrocodone pills to retail pharmacies, chain pharmacies and practitioners
The entire data set for Hamilton county includes 42 variables and 741,222 entries. Due to my computer’s memory capacity, I sampled 18 variables and 150,000 entries for my project. After parsing from the complete data set, I selected 18 variables and sampled 150,000 entries. The sample data included the following fields.
Data Dictionary
| Column | Variable | Definition |
|---|---|---|
| 1 | REPORTER_BUS_ACT | Type of business the reporter does, typically distributors or manufacturers. |
| 2 | REPORTER_NAME | Name of entity reporting shipments to the DEA. |
| 3 | REPORTER_STATE | State of entity reporting shipments to the DEA. |
| 4 | BUYER_BUS_ACT | Type of business the reporter does. |
| 5 | BUYER_CITY | City of entity receiving shipments from reporter. |
| 6 | BUYER_ZIP | Zip code of entity receiving shipments from reporter. |
| 7 | DRUG_NAME | Name of drug corresponding with DRUG_CODE. |
| 8 | QUANTITY | Number of packages, weight or volume of shipment. |
| 9 | UNIT | Unit of measurement for QUANTITY. Values include 1: Micrograms, 2: Milligrams, 3: Grams, 4: Kilograms, 5: Milliliters, 6: Liters, D: Dozens, K: Thousands. |
| 10 | STRENGTH | One of three values: “(1) the purity of a bulk rawmaterial (2) the fractional portion of a standard NDC package size or (3) the percentage by which a package exceeds a standard NDC package size.” |
| 11 | TRANSACTION_DATE | Date shipment occurred. |
| 12 | CALC_BASE_WT_IN_GM | DEA added field indicating the total active weight of the drug in the transaction, in grams. |
| 13 | DOSAGE_UNIT | DEA calculated field indicating number of pills, patches or lozenges, among others, shipped as part of the transaction. |
| 14 | TRANSACTION_ID | Unique record of transaction. |
| 15 | Product_Name | Trade name of NDC_NO. |
| 16 | Ingredient_Name | Name of the active ingredient in the drug shipped. |
| 17 | MME_Conversion_Factor | Morphine Milligram Equivalent, or how the specific drug compares to a morphine equivilent. |
| 18 | dos_str | Strength of dose in milligrams. |
Below is a data table for the first 20 entries of the sample data.
Within the sample data, there are 66 million pills shipped. The total active weight of the drug is approximately 490kg (1,080 pounds).
| Total number of pills shipped | Total active weight of the drug in grams |
|---|---|
| 66,341,335 | 489,975.80 |
Part 2: Descriptive Analysis
2.1 Top 5 Distributors and Top 5 Manufacturers for Pain Pills
With the sample data, we can see Walgreens ordered approximately 28% of total pills (18 million pills) in Hamilton county from 2006 to 2014, quadrupling its competitor CVS. Cardinal Health is a multinational health care services company. It is based in Dublin, Ohio, therefore it has a strong presence in Ohio. Cardinal Health ordered 17 million pills. The top 5 distributors ordered about 56 million pills, which represented 84% of the total pills shipped.
In 2013, Walgreens agreed to pay a record settlement of $80 million for committing an unprecedented number of record-keeping and dispensing violations.
## # A tibble: 5 x 2
## `hamilton$REPORTER_NAME` COUNTS
## <chr> <dbl>
## 1 WALGREEN CO 18498990
## 2 CARDINAL HEALTH 17011470
## 3 KROGER LIMITED PARTNERSHIP II 8903090
## 4 AMERISOURCEBERGEN DRUG CORP 7045990
## 5 CVS TN DISTRIBUTION, LLC 4460200
The scale of manufacturers’ orders is much smaller. Below is the top five manufacturers.
## # A tibble: 5 x 2
## `hamilton$REPORTER_NAME` COUNTS
## <chr> <dbl>
## 1 THE COGHLAN GROUP, INC. 2800
## 2 DRX PHARMACEUTICAL CONSULTANTS, INC. 2559
## 3 FISHER CLINICAL SERVICES 2476
## 4 A-S MEDICATION SOLUTIONS LLC 1695
## 5 DISPENSING SOLUTIONS 1224
2.2.1 The Number of Pain Pills Shipped by Year
From 2006 to 2014, the peak of pain pills transactions appeared in 2011. A total of 8 million pills were shipped. Based on the CDC’s data on overdose death rates involving opioids in the United States, the death rate also reached 4.9 deaths per 100,100 people in 2011. The national overdose death rates trend mimicked the pain pills sales trend during the same time frame.
2.2.2 Ingredients Distribution by Year
There are three active ingredients in the drug shipped: hydrocodone, oxycodone and polyethylene glycol (PEG). Hydrocodone was the most common active ingredient in 2009, while oxycodone becomes increasingly widespread in 2011 and 2012. Polyethylene glycol is a type of osmotic laxative. Opioids are notorious for causing constipation. We can find a few hundred pills in 2012 and 2013 contain this ingredient.
2.3 Top Buyer Cities within Hamilton County
It is not surprising that buyers in Cincinnati ordered the highest number of pain pills since it has a population of 301,394 compared to Harrison (12,006). The outlier in the list is the small village Cleves. The population was only 3,354 in 2014. But its total pain pills quantity exceeded Sharonville, which has a population of 14,133.
## # A tibble: 5 x 2
## `hamilton$BUYER_CITY` COUNTS
## <chr> <dbl>
## 1 CINCINNATI 51955453
## 2 HARRISON 2984220
## 3 NORWOOD 2951610
## 4 CLEVES 1583310
## 5 SHARONVILLE 1134510
2.4 On What Day of the Week Are Transactions Most Often Reported?
The bar chart below shows that transactions were mainly reported on weekdays, especially Tuesdays.
2.5.1 The Average Total Active Weight by Year in Milligrams
I was interested to learn the change of the total active weight in drugs over time. By calculating the average active weight by year, I noticed there was a 12% decrease in active weight in 2011, dropping from 10.77mg to 9.44mg. 2011 is the year CDC calls deaths from prescription opioids “an epidemic”.
## # A tibble: 9 x 2
## Year average
## <fct> <dbl>
## 1 2006 10.7
## 2 2007 10.6
## 3 2008 10.4
## 4 2009 10.6
## 5 2010 10.8
## 6 2011 9.44
## 7 2012 9.23
## 8 2013 8.86
## 9 2014 9.54
2.5.2 Box Plots of MME/unit by Dosage Levels
Besides the active ingredient weight in each pill, I was also curious about the number of different dosage levels. Dosage level is a new dummy variable I created by dividing dos_str by DOSAGE_UNIT.
CDC states doses beyond 50 MME/day increase the chances of addiction to opioids while adding little benefit, and clinicians should avoid increasing dosage to beyond 90 MME/day.
Although we do not know how many unit patients eventually take per day, it is still helpful to visualize a side-by-side comparison by dosage level.
The median for high dosage level (> 50 MME/unit) is around 80 MME/unit, and the max can go up to 160 MME/unit. There is also more variance in MME/unit for the high dosage level. The median for the low dosage level is approximately 10 MME/unit.
Part 3: Secondary Data Source
Although I examined the top buyer cities in part 2.3, I wanted to have a more intuitive visual demonstrating where pain pills were shipped to. I used Google Maps Geocoding API to obtain the latitude and longitude information for all unique zip codes in the data, then added points with latitude/longitude coordinates on the Hamilton County map. The size of the red dots represente the quantity of pain pills shipped.
I also used one of the unstructured vector, Product_Name, to examine the number of pills that contain acetaminophen (Tylenol, generic). A study in 2012 has shown a special form of liver injury linked to opioid use occurs with their fixed drug combinations with acetaminophen. It turned out 45 million out of 66 million pills contains acetaminophen.
## # A tibble: 2 x 2
## ACET sum
## <dbl> <dbl>
## 1 0 20386375
## 2 1 45954960
Part 4: Prescriptive Analysis
The question I want to solve is which factors may affect the number of pain pills shipped. I suspect the origirnal data set will have many cases of multicollinearity due to the nature of the variables. For example, QUANTITY (Number of packages, weight, or volume of shipment) will directly affect the DOSAGE_UNIT (total number of pills). Therefore, I decided to introduce an additional source, the monthly Cincinnati unemployment rate, and aggregate the potential variables by calculating the monthly total or average. The data set for part 4 consists Year-Month, Sum Dosage, Avg Dosage, Sum Quantity, Avg Active Weight, Avg MME, Avg Dose Strength and Unemployment Rate.
By plotting the basic correlation matrix, besides the obvious cases of multicollinearity (e.g. Avg Dosage and Sum Quantity), it appears Sum Dosage is most highly correlated with Unemployment Rate and Avg Dose Strength.
A more in-depth look of these two independent variables.
##
## Call:
## lm(formula = hamilton_monthly_numeric$`Sum Dosage` ~ ., data = hamilton_monthly_reg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -178938 -39584 681 39191 158455
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 245895.92 70419.39 3.492 0.000703 ***
## `Avg Dose Strength` 53.59 16.92 3.167 0.002016 **
## `Unemployment Rate` 17978.79 3811.20 4.717 0.00000739 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 69410 on 105 degrees of freedom
## Multiple R-squared: 0.3247, Adjusted R-squared: 0.3118
## F-statistic: 25.24 on 2 and 105 DF, p-value: 0.000000001122
##
## Regression Results
## ===============================================
## Dependent variable:
## ---------------------------
## `Sum Dosage`
## -----------------------------------------------
## `Avg Dose Strength` 53.588*** (16.920)
## `Unemployment Rate` 17,978.790*** (3,811.196)
## Constant 245,895.900*** (70,419.390)
## -----------------------------------------------
## Observations 108
## R2 0.325
## Adjusted R2 0.312
## Residual Std. Error 69,408.340 (df = 105)
## F Statistic 25.238*** (df = 2; 105)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
By subsetting the selected two variables, the Adjusted \(R^2\) is .31. Although both variables are statistically significant, the adjusted R squared indicates the model only explains 31% of the variability of the response data around its mean.
The 2 variable model is as follows: \[Sum Dosage = \alpha_i + Avg Dose Strength_i + Unemployment Rate_i\]