Introduction

The Centers for Medicare & Medicaid Services (CMS) is the government agency that manages both Medicare and Medicaid. Medicare is traditionally used by those who are over 65 years of age or have qualifying disabilities. Medicare Part D is specifically the program that reimburses drug costs. Drug costs do not make up a majority of healthcare spending in the US (healthcare as a whole accounts for 17.9% of GDP), but they have been outpacing the growth of other areas. Are we able to decipher any insights that help explain the rising drug costs?

Details of the Data

The Part D Prescriber Public Use File (PUF) provides information on prescription drugs prescribed by individual physicians and other health care providers and paid for under the Medicare Part D Prescription Drug Program. Medicare Part D is the prescription drug component of Medicare, which is mostly utilized by those over 65 years of age, along with those who have a permanent disability or other specific conditions.

Summary of the dataset

Variables in Dataset

| Variable | Type | Explanation |
|---|---|---|
| Prscrbr_Geo_Lvl | Character | Identifies the level of geography to which the data in the row has been aggregated. A value of ‘State’ indicates the data in the row is aggregated to a single state identified in the Referring Provider State column for a given HCPCS Code Level. A value of ‘National’ indicates the data in the row is aggregated across all states for a given HCPCS Code Level. |
| Prscrbr_Geo_Cd | Character | FIPS code of the referring provider state. This variable is blank when reported at the national level. |
| Prscrbr_Geo_Desc | Character | The state name where the provider is located, as reported in NPPES. The values include the 50 United States, District of Columbia, U.S. territories, Armed Forces areas, Unknown and Foreign Country. Data aggregated at the National level are identified by the word ‘National’. |
| Brnd_Name | Character | The trademarked name of the drug filled. |
| Gnrc_Name | Character | A term referring to the chemical ingredient of a drug rather than the trademarked brand name under which the drug is sold. |
| Tot_Prscrbrs | Number | Number of unique prescribers of Medicare Part D claims. |
| Tot_Clms | Number | The number of Medicare Part D claims. This includes original prescriptions and refills. Aggregated records based on Tot_Clms fewer than 11 are not included in the data file. |
| Tot_30_Day_Fills | Number | Number of standardized 30-day fills, including refills. |
| Tot_Drug_Cst | Number | The aggregate drug cost paid for all associated claims. This amount includes ingredient cost, dispensing fee, sales tax, and any applicable vaccine administration fees and is based on the amounts paid by the Part D plan, Medicare beneficiary, government subsidies, and any other third-party payers. |
| Tot_Benes | Number | The total number of unique Medicare Part D beneficiaries with at least one claim for the drug. Counts fewer than 11 are suppressed and are indicated by a blank. |
| GE65_Sprsn_Flag | Number | A flag that indicates the reason the GE65_Tot_Clms, GE65_Tot_30day_Fills, GE65_Tot_Drug_Cst and GE65_Tot_Day_Suply variables are suppressed. |
| GE65_Tot_Clms | Number | The number of Medicare Part D claims for beneficiaries age 65 and older. This includes original prescriptions and refills. A blank indicates the value is suppressed. See GE65_Sprsn_Flag regarding suppression of data. |
| GE65_Tot_30day_Fills | Number | The number of Medicare Part D standardized 30-day fills for beneficiaries age 65 and older. The standardized 30-day fill is derived from the number of days supplied on each Part D claim divided by 30. Standardized 30-day fill values less than 1.0 were bottom-coded with a value of 1.0 and standardized 30-day fill values greater than 12.0 were top-coded with a value of 12.0. If GE65_Tot_Clms is suppressed, this variable is suppressed. A blank indicates the value is suppressed. See GE65_Sprsn_Flag regarding suppression of data. |
| GE65_Tot_Drug_Cst | Number | The aggregate total drug cost paid for all associated claims for beneficiaries age 65 and older. This amount includes ingredient cost, dispensing fee, sales tax, and any applicable vaccine administration fees and is based on the amounts paid by the Part D plan, Medicare beneficiary, government subsidies, and any other third-party payers. If GE65_Tot_Clms is suppressed, this variable is suppressed. A blank indicates the value is suppressed. See GE65_Sprsn_Flag regarding suppression of data. |
| GE65_Bene_Sprsn_Flag | Character | A flag indicating the reason the GE65_Tot_Benes variable is suppressed. |
| GE65_Tot_Benes | Character | The total number of unique Medicare Part D beneficiaries age 65 and older with at least one claim for the drug. A blank indicates the value is suppressed. See GE65_Bene_Sprsn_Flag regarding suppression of data. |
| LIS_Bene_Cst_Shr | Number | The aggregate total cost that beneficiaries using a drug, with a low-income subsidy, paid during the year. |
| NonLIS_Bene_Cst_Shr | Number | The aggregate total cost that beneficiaries using a drug, with no low-income subsidy, paid during the year. |
| Opioid_Drug_Flag | Character | A flag indicating whether drugs in this Drug Name/Generic Name combination are identified as an opioid drug. The list of opioids is based upon drugs included in the Medicare Part D Overutilization Monitoring System (OMS). The list originates from the Centers for Disease Control and Prevention. |
| Opioid_LA_Drug_Flag | Character | A flag indicating whether drugs in this Drug Name/Generic Name combination are identified as a long-acting opioid drug. The list of long-acting opioids is based upon drugs included in the Medicare Part D Overutilization Monitoring System (OMS). Those drugs were then identified by the National Center for Injury Prevention and Control. CDC compilation of benzodiazepines, muscle relaxants, stimulants, zolpidem, and opioid analgesics with oral morphine milligram equivalent conversion factors, 2018 version. |
| Antbtc_Drug_Flag | Character | A flag indicating whether drugs in this Drug Name/Generic Name combination are identified as an antibiotic drug. The list of antibiotics was created by identifying antibiotic subcategories with the exclusion of the following types of products: tuberculosis agents, antimalarials, topical agents (topical ophthalmic, optic, vaginal, and dermatological agents, etc.). |
| Antpsyct_Drug_Flag | Character | A flag indicating whether drugs in this Drug Name/Generic Name combination are identified as an antipsychotic drug. The list of antipsychotics was created by identifying antipsychotic subcategories, including first and second generation antipsychotics, as well as antipsychotics included in combination with other drugs (e.g., OLANZAPINE/FLUOXETINE HCL). |
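
The standardized 30-day fill rule described for GE65_Tot_30day_Fills (days supplied divided by 30, bottom-coded at 1.0 and top-coded at 12.0) can be sketched in a few lines of R. The function name `standardized_fills` and the input values are hypothetical and only illustrate the rule as stated in the data dictionary; this is not CMS's own code.

```r
# Hypothetical helper: convert days supplied on each claim into standardized
# 30-day fills, applying the bottom-coding (1.0) and top-coding (12.0)
# described in the data dictionary.
standardized_fills <- function(days_supply) {
  fills <- days_supply / 30
  pmin(pmax(fills, 1), 12)
}

# Illustrative values only, not taken from the CMS file.
days <- c(7, 30, 90, 400)
fills <- standardized_fills(days)
print(fills)
```

A 7-day supply is raised to one fill and a 400-day supply is capped at twelve, so very short and very long claims do not distort the fill counts.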

Descriptive Analytics

Spending by State - Total Drug Costs

When we look at how spending is distributed across states, it is apparent that the more populous states have the highest total spending. However, when we look at cost per beneficiary, smaller states rise to the top of the list, specifically the District of Columbia (Washington, DC) and Connecticut.

One note: beneficiary counts are suppressed for drugs where the total number of beneficiaries is fewer than 11. However, these drugs represent a very small portion of the total drug costs, so the suppression does not have a significant impact on the analysis. It is important for us to leave the costs of these drugs in the data, as they contribute to the overall cost of Medicare Part D reimbursements in the US.

Spending by State - Cost/Beneficiary

To get a better sense of how spending plays out across the country, we can use cost per beneficiary. The map below shows the total Cost/Beneficiary across the United States. The highest costs per beneficiary occur in the Northeast.
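
As a minimal sketch of how a cost-per-beneficiary figure could be derived from the state-level rows, the R below sums Tot_Drug_Cst and Tot_Benes by state and divides one by the other. The data frame `toy_state` and all of its values are invented for illustration; only the column names come from the data dictionary. Suppressed beneficiary counts are treated as missing while their costs are kept, which is an assumption about how the analysis handled them.

```r
# Toy state-level rows; values are invented, column names follow the dictionary.
toy_state <- data.frame(
  Prscrbr_Geo_Desc = c("Connecticut", "Connecticut", "Texas", "Texas"),
  Tot_Drug_Cst     = c(1200, 300, 5000, 400),
  Tot_Benes        = c(20, NA, 100, 25)   # NA stands in for a suppressed count
)

# Sum costs over every drug, including those with suppressed beneficiary counts.
cost  <- tapply(toy_state$Tot_Drug_Cst, toy_state$Prscrbr_Geo_Desc, sum)
# Sum beneficiaries, skipping suppressed (missing) counts.
benes <- tapply(toy_state$Tot_Benes, toy_state$Prscrbr_Geo_Desc, sum, na.rm = TRUE)

by_state <- data.frame(
  state       = names(cost),
  total_cost  = as.numeric(cost),
  total_benes = as.numeric(benes)
)
by_state$cost_per_bene <- by_state$total_cost / by_state$total_benes

# Rank states from highest to lowest cost per beneficiary.
ranked_states <- by_state[order(-by_state$cost_per_bene), ]
print(ranked_states)
```

Because Tot_Benes counts unique beneficiaries per drug, summing it across drugs counts a person once for every drug they take, so the result is best read as a relative measure between states rather than a true per-person cost.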

Top Drugs

What are the top drugs adding to the cost of Medicare Part D?

Eliquis - Blood Clots and Stroke
Revlimid - Cancer
Xarelto - Blood Clots
Januvia - Diabetes (Sitagliptin)
Lantus Solostar - Diabetes (Insulin)
Imbruvica - Cancer
Trulicity - Diabetes
Lyrica - Nerve Pain (Fibromyalgia)
Symbicort - Steroid (COPD)
Novolog Flexpen - Diabetes (Insulin)

As you can see, the top 10 drugs make up $65bn of the total $365bn in Medicare Part D drug spending (roughly 18%). The repeated appearance of diabetes, cancer, and heart-related conditions suggests that the biggest opportunity to reduce drug costs lies in reducing those diagnoses. Lowering drug prices through negotiation is clearly possible, but the concentration of spending in a few diseases indicates that improving the health of Americans in these areas is likely the more impactful path. Investing in health, rather than only in reducing drug prices, would likely lead to bigger gains.
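
A ranking like the one above, together with the share of spending the top drugs represent, can be sketched as follows. The data frame `toy_drugs`, the drug names, and the costs are placeholders rather than values from the CMS file, and `top_n` is set to 2 only to keep the example small.

```r
# Toy national-level rows; names and costs are placeholders.
toy_drugs <- data.frame(
  Brnd_Name    = c("Drug A", "Drug B", "Drug C", "Drug D"),
  Tot_Drug_Cst = c(500, 300, 150, 50)
)

top_n <- 2

# Sort drugs by total cost, highest first, and keep the top entries.
ranked    <- toy_drugs[order(toy_drugs$Tot_Drug_Cst, decreasing = TRUE), ]
top_drugs <- head(ranked, top_n)

# Share of all spending that the top drugs account for.
top_share <- sum(top_drugs$Tot_Drug_Cst) / sum(toy_drugs$Tot_Drug_Cst)
print(top_drugs)
print(top_share)
```

Applied to the figures quoted in the text, the same ratio is 65 / 365, or about 0.18.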

Distribution of Claim Costs

One interesting view comes from dividing the Total Drug Costs by the Total Claims submitted for each drug. We see a largely expected distribution: the number of drugs in each cost category declines as the cost per claim increases. However, once we get to $500+ the count rises significantly. Why is this? In reviewing the drugs, traditional everyday drugs and generics pull costs down, while specialty drugs and those used for rarer diseases come with a higher cost. One aspect to consider is that the number of people taking a drug potentially affects its cost. Research and development and clinical trial costs have to be absorbed by the manufacturer, and if the number of beneficiaries or claims is low, the price is likely to be higher to offset the investment in the drug.
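
The binning behind this kind of distribution can be sketched with `cut()`. Only the open-ended $500+ category is mentioned in the text; the $100-wide bins below it, the data frame `toy_claims`, and its values are assumptions made for illustration.

```r
# Toy rows; values are invented.
toy_claims <- data.frame(
  Tot_Drug_Cst = c(1000, 9000, 40000, 120000, 700000),
  Tot_Clms     = c(100, 100, 200, 300, 500)
)

# Average cost of a single claim for each drug.
toy_claims$cost_per_claim <- toy_claims$Tot_Drug_Cst / toy_claims$Tot_Clms

# Assumed bin edges: $100-wide bins up to $500, then one open-ended bin.
breaks <- c(0, 100, 200, 300, 400, 500, Inf)
labels <- c("$0-99", "$100-199", "$200-299", "$300-399", "$400-499", "$500+")

# right = FALSE makes each bin include its lower edge, e.g. [100, 200).
toy_claims$cost_bin <- cut(toy_claims$cost_per_claim,
                           breaks = breaks, labels = labels, right = FALSE)

bin_counts <- table(toy_claims$cost_bin)
print(bin_counts)
```

Part of the jump at $500+ may simply reflect that the last bin is open-ended and therefore much wider than the others, so splitting it further would be a reasonable follow-up check.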

Opioids Driving Cost

Do opioids have an impact on the cost of drugs? Comparing the median cost of drugs with and without the opioid flag suggests a potentially higher median cost for drugs identified as opioids. As manufacturers of opioid drugs face continued government scrutiny, is this having an impact on the associated costs? Could higher opioid drug costs lead those who develop addictions to turn to the street for cheaper drugs, with potentially devastating consequences given the risk of lacing with dangerous compounds such as fentanyl?
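
A median comparison of this kind can be sketched with `tapply()`. The text does not say which cost measure was compared, so cost per claim is assumed here; the "Y"/"N" coding of Opioid_Drug_Flag, the data frame `toy_opioid`, and its values are likewise assumptions.

```r
# Toy rows; values and the "Y"/"N" flag coding are assumptions.
toy_opioid <- data.frame(
  Opioid_Drug_Flag = c("Y", "Y", "N", "N", "N"),
  Tot_Drug_Cst     = c(5000, 30000, 1000, 4000, 30000),
  Tot_Clms         = c(100, 200, 100, 200, 100)
)

toy_opioid$cost_per_claim <- toy_opioid$Tot_Drug_Cst / toy_opioid$Tot_Clms

# Median cost per claim within each flag group.
medians <- tapply(toy_opioid$cost_per_claim, toy_opioid$Opioid_Drug_Flag, median)
print(medians)
```

The median is used rather than the mean because a handful of very expensive drugs would otherwise dominate either group.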

Secondary Unstructured Data

Sentiment Analysis - Eliquis Tweets

As we noted in our previous analysis, Eliquis is the highest-cost prescription drug reimbursed by Medicare. As a drug that helps prevent stroke and blood clots, it is critical for patients with the associated risk factors. One interesting view is the overall sentiment of people tweeting about Eliquis. Overall the sentiment is positive, with trust showing a high frequency. However, for a drug, anticipation is potentially a negative signal, as it might indicate that a person is concerned about taking it. Anger and disgust do not appear at the high levels one might expect for a very commonly prescribed drug that accounts for a significant level of spending each year.

Predictive Analytics

Exploratory

Reviewing the correlations between variables shows nothing unexpected, as the total drug costs are driven by the total number of claims, beneficiaries, and prescribers. This is intuitive, as drug costs scale per claim, per beneficiary (each of whom may have multiple claims), and per prescriber (each of whom accounts for both claims and beneficiaries).
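
A correlation review of this kind can be sketched with `cor()`. The data frame `toy_corr` and its values are invented, and `use = "pairwise.complete.obs"` is one assumed way of handling suppressed (blank) beneficiary counts, not necessarily the one used in the analysis.

```r
# Toy rows; values are invented and NA stands in for a suppressed count.
toy_corr <- data.frame(
  Tot_Drug_Cst = c(120, 340, 90, 800, 560, 75),
  Tot_Clms     = c(30, 70, 20, 150, 90, 25),
  Tot_Benes    = c(10, 25, NA, 40, 38, 8),
  Tot_Prscrbrs = c(5, 9, 3, 20, 12, 4)
)

vars <- c("Tot_Drug_Cst", "Tot_Clms", "Tot_Benes", "Tot_Prscrbrs")

# Pairwise Pearson correlations, using every row where both variables are present.
cor_matrix <- cor(toy_corr[, vars], use = "pairwise.complete.obs")
print(round(cor_matrix, 2))
```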

Regression

Using a regression of Total Drug Costs on Beneficiaries, Total Claims, and Total Prescribers, we get to an R-squared of 11%. This is intuitive and not especially insightful. There are likely additional insights to be gained by developing a further regression analysis with data that is not contained within the Medicare Part D dataset. In reviewing the CMS site and their own analytics, there are more components around demographics, along with the conditions the drugs are prescribed for, that are helpful for their analysis of drug costs. Another consideration is that the total number of claims, beneficiaries, and prescribers explaining only 11% of the variation in Total Drug Costs indicates that there is a lot of variability between the different drugs, as you would otherwise expect the R-squared to be much higher.

## 
## Call:
## lm(formula = partd2019national$Tot_Drug_Cst ~ partd2019national$Tot_Benes + 
##     partd2019national$Tot_Clms + partd2019national$Tot_Prscrbrs, 
##     data = partd2019national)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -555584558  -27641846  -26193405  -22893377 6605190082 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     2.618e+07  4.283e+06   6.111  1.1e-09 ***
## partd2019national$Tot_Benes    -2.933e+02  2.525e+01 -11.614  < 2e-16 ***
## partd2019national$Tot_Clms      6.190e+01  5.298e+00  11.682  < 2e-16 ***
## partd2019national$Tot_Prscrbrs  1.414e+03  1.013e+02  13.957  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 231800000 on 3377 degrees of freedom
##   (163 observations deleted due to missingness)
## Multiple R-squared:  0.1105, Adjusted R-squared:  0.1097 
## F-statistic: 139.8 on 3 and 3377 DF,  p-value: < 2.2e-16
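
The call shown in the output repeats the data frame name in every term of the formula. As a sketch, the same model can be written with bare column names and `data =`, which is the more usual form and also lets `predict()` work on new data. The data frame `toy_reg` and its values are invented stand-ins; on the real national-level data this form should give the same fit as the output above.

```r
# Toy rows; values are invented and NA stands in for a suppressed count.
toy_reg <- data.frame(
  Tot_Drug_Cst = c(120, 340, 90, 800, 560, 75, 410, 230),
  Tot_Benes    = c(10, 25, NA, 40, 38, 8, 30, 20),
  Tot_Clms     = c(30, 70, 20, 150, 90, 25, 100, 45),
  Tot_Prscrbrs = c(5, 9, 3, 20, 12, 4, 15, 7)
)

# Column names are resolved inside `data`, so no `$` prefixes are needed.
fit <- lm(Tot_Drug_Cst ~ Tot_Benes + Tot_Clms + Tot_Prscrbrs, data = toy_reg)

# Rows with missing values are dropped by default, which is what the
# "observations deleted due to missingness" line in the output reports.
n_used <- nobs(fit)

# Pull R-squared out of the summary instead of reading it off the printout.
r2     <- summary(fit)$r.squared
adj_r2 <- summary(fit)$adj.r.squared
print(c(n_used = n_used, r2 = r2, adj_r2 = adj_r2))
```

Since the three predictors are strongly correlated with one another, individual coefficients such as the negative sign on Tot_Benes in the output above are probably best interpreted with caution.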