AmeriSourceBergen

Peipei

2019/5/6

1 Background

Pharmacies typically engage with a single pharmaceutical distributor to purchase all the drugs they need. This type of arrangement is governed by a contract called a Prime Vendor Agreement (PVA), which the pharmacy enters into with the distributor for a specific number of years defined in the agreement itself. The PVA also stipulates what percentage of its drugs a pharmacy must buy from the distributor in order to be compliant with the agreement. A pharmacy may choose to purchase drugs from a distributor other than the one it has a PVA with; this is referred to as “leakage,” and it can sometimes be significant. Here we try to figure out which drugs are being bought outside the PVA from other distributors, hypothesize why this is happening, and recommend where AmerisourceBergen should concentrate efforts to reduce this leakage and regain lost revenue and profit.

2 Data

Extracted, anonymized and modified from the AmerisourceBergen Data Warehouse. The basic data set is Dispense Usage Transactions Top 50 GCN Seq Nbrs Week 201836. This is an Excel file that only contains 1 tab called “Trans”. Included in the data set are descriptions of key data elements or code fields

3 Data Process

3.1 Data Loading

library(readxl)
Dispense_Usage_Transactions_Top_50_GCN_Seq_Nbrs_Week_201836 <- read_excel("~/Desktop/Dispense Usage Transactions Top 50 GCN Seq Nbrs Week 201836.xlsx")
library(tidyverse) 
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0       ✔ purrr   0.3.1  
## ✔ tibble  2.0.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.3       ✔ stringr 1.4.0  
## ✔ readr   1.3.1       ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
data <- Dispense_Usage_Transactions_Top_50_GCN_Seq_Nbrs_Week_201836
print (Dispense_Usage_Transactions_Top_50_GCN_Seq_Nbrs_Week_201836)
## # A tibble: 89,516 x 43
##    WK_NBR WK_BEGIN_DTE WK_END_DTE NCPDP_TOKEN PHRMCY_NAME BUY_GRP_NAM NDC  
##    <chr>  <chr>        <chr>      <chr>       <chr>       <chr>       <chr>
##  1 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0007…
##  2 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0017…
##  3 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       1249…
##  4 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0000…
##  5 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0040…
##  6 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0059…
##  7 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0059…
##  8 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       5045…
##  9 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0007…
## 10 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0000…
## # … with 89,506 more rows, and 36 more variables: NDC_DESC <chr>,
## #   BRX_GRX_CD <chr>, NDC_QTY_MTRC <dbl>, NDC_PKG_SZ <dbl>,
## #   NDC_CASE_PK_QTY <dbl>, MFG_NAM <chr>, GCNSEQ_NBR <chr>,
## #   GCNSEQ_DESC <chr>, GCNSEQ_RTE_OF_ADMIN_DESC <chr>,
## #   GCNSEQ_DSG_FORM_CD <chr>, GCNSEQ_STR_DESC <chr>,
## #   DSPN_QTY_NDC_1WK <dbl>, DSPN_QTY_NDC_4WKS <dbl>,
## #   DSPN_QTY_NDC_13WKS <dbl>, DSPN_QTY_NDC_26WKS <dbl>,
## #   DSPN_QTY_GCN_1WK <dbl>, DSPN_QTY_GCN_4WKS <dbl>,
## #   DSPN_QTY_GCN_13WKS <dbl>, DSPN_QTY_GCN_26WKS <dbl>,
## #   PURCH_QTY_NDC_1WK_MTRC <dbl>, PURCH_QTY_NDC_4WKS_MTRC <dbl>,
## #   PURCH_QTY_NDC_13WKS_MTRC <dbl>, PURCH_QTY_NDC_26WKS_MTRC <dbl>,
## #   PURCH_QTY_GCN_1WK_MTRC <dbl>, PURCH_QTY_GCN_4WKS_MTRC <dbl>,
## #   PURCH_QTY_GCN_13WKS_MTRC <dbl>, PURCH_QTY_GCN_26WKS_MTRC <dbl>,
## #   COGS_PRC_MTRC <dbl>, DSPN_COGS_NDC_1WK <dbl>,
## #   DSPN_COGS_NDC_4WKS <dbl>, DSPN_COGS_NDC_13WKS <dbl>,
## #   DSPN_COGS_NDC_26WKS <dbl>, DSPN_COGS_GCN_1WK <dbl>,
## #   DSPN_COGS_GCN_4WKS <dbl>, DSPN_COGS_GCN_13WKS <dbl>,
## #   DSPN_COGS_GCN_26WKS <dbl>

3.2 Modified and Extracting the data

The data set contains NA values was omit, then get the mean of COGS PRC MTC(Unit Price products purchased from ABD) based on NDC_DESC group.with that with na value. The Code for this Process was listed below:

data_1 <- na.omit(data)
mean <- data_1 %>% 
  group_by (NDC_DESC) %>% 
  summarise ( Mean_COGS = mean(COGS_PRC_MTRC))

join_table <- inner_join (data_1, mean ,
   by = c ("NDC_DESC" = "NDC_DESC"))

print(join_table)
## # A tibble: 85,770 x 44
##    WK_NBR WK_BEGIN_DTE WK_END_DTE NCPDP_TOKEN PHRMCY_NAME BUY_GRP_NAM NDC  
##    <chr>  <chr>        <chr>      <chr>       <chr>       <chr>       <chr>
##  1 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0007…
##  2 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0017…
##  3 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       1249…
##  4 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0000…
##  5 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0040…
##  6 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0059…
##  7 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0059…
##  8 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       5045…
##  9 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0007…
## 10 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0000…
## # … with 85,760 more rows, and 37 more variables: NDC_DESC <chr>,
## #   BRX_GRX_CD <chr>, NDC_QTY_MTRC <dbl>, NDC_PKG_SZ <dbl>,
## #   NDC_CASE_PK_QTY <dbl>, MFG_NAM <chr>, GCNSEQ_NBR <chr>,
## #   GCNSEQ_DESC <chr>, GCNSEQ_RTE_OF_ADMIN_DESC <chr>,
## #   GCNSEQ_DSG_FORM_CD <chr>, GCNSEQ_STR_DESC <chr>,
## #   DSPN_QTY_NDC_1WK <dbl>, DSPN_QTY_NDC_4WKS <dbl>,
## #   DSPN_QTY_NDC_13WKS <dbl>, DSPN_QTY_NDC_26WKS <dbl>,
## #   DSPN_QTY_GCN_1WK <dbl>, DSPN_QTY_GCN_4WKS <dbl>,
## #   DSPN_QTY_GCN_13WKS <dbl>, DSPN_QTY_GCN_26WKS <dbl>,
## #   PURCH_QTY_NDC_1WK_MTRC <dbl>, PURCH_QTY_NDC_4WKS_MTRC <dbl>,
## #   PURCH_QTY_NDC_13WKS_MTRC <dbl>, PURCH_QTY_NDC_26WKS_MTRC <dbl>,
## #   PURCH_QTY_GCN_1WK_MTRC <dbl>, PURCH_QTY_GCN_4WKS_MTRC <dbl>,
## #   PURCH_QTY_GCN_13WKS_MTRC <dbl>, PURCH_QTY_GCN_26WKS_MTRC <dbl>,
## #   COGS_PRC_MTRC <dbl>, DSPN_COGS_NDC_1WK <dbl>,
## #   DSPN_COGS_NDC_4WKS <dbl>, DSPN_COGS_NDC_13WKS <dbl>,
## #   DSPN_COGS_NDC_26WKS <dbl>, DSPN_COGS_GCN_1WK <dbl>,
## #   DSPN_COGS_GCN_4WKS <dbl>, DSPN_COGS_GCN_13WKS <dbl>,
## #   DSPN_COGS_GCN_26WKS <dbl>, Mean_COGS <dbl>

3.3 Combine data

Combining group_by data set with original data, replace na value with the mean of COGS_PRC_MTRC.

join_table <- inner_join (data_1, mean ,
   by = c ("NDC_DESC" = "NDC_DESC"))

Data_2 <- join_table %>% 
  mutate(COGS_PRC_MTRC =
           ifelse( is.na(COGS_PRC_MTRC) == T,
                        Mean_COGS , COGS_PRC_MTRC))
print(Data_2)
## # A tibble: 85,770 x 44
##    WK_NBR WK_BEGIN_DTE WK_END_DTE NCPDP_TOKEN PHRMCY_NAME BUY_GRP_NAM NDC  
##    <chr>  <chr>        <chr>      <chr>       <chr>       <chr>       <chr>
##  1 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0007…
##  2 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0017…
##  3 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       1249…
##  4 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0000…
##  5 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0040…
##  6 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0059…
##  7 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0059…
##  8 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       5045…
##  9 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0007…
## 10 201836 2018-09-02   2018-09-08 4410677998… PHARMACY 1… SPURS       0000…
## # … with 85,760 more rows, and 37 more variables: NDC_DESC <chr>,
## #   BRX_GRX_CD <chr>, NDC_QTY_MTRC <dbl>, NDC_PKG_SZ <dbl>,
## #   NDC_CASE_PK_QTY <dbl>, MFG_NAM <chr>, GCNSEQ_NBR <chr>,
## #   GCNSEQ_DESC <chr>, GCNSEQ_RTE_OF_ADMIN_DESC <chr>,
## #   GCNSEQ_DSG_FORM_CD <chr>, GCNSEQ_STR_DESC <chr>,
## #   DSPN_QTY_NDC_1WK <dbl>, DSPN_QTY_NDC_4WKS <dbl>,
## #   DSPN_QTY_NDC_13WKS <dbl>, DSPN_QTY_NDC_26WKS <dbl>,
## #   DSPN_QTY_GCN_1WK <dbl>, DSPN_QTY_GCN_4WKS <dbl>,
## #   DSPN_QTY_GCN_13WKS <dbl>, DSPN_QTY_GCN_26WKS <dbl>,
## #   PURCH_QTY_NDC_1WK_MTRC <dbl>, PURCH_QTY_NDC_4WKS_MTRC <dbl>,
## #   PURCH_QTY_NDC_13WKS_MTRC <dbl>, PURCH_QTY_NDC_26WKS_MTRC <dbl>,
## #   PURCH_QTY_GCN_1WK_MTRC <dbl>, PURCH_QTY_GCN_4WKS_MTRC <dbl>,
## #   PURCH_QTY_GCN_13WKS_MTRC <dbl>, PURCH_QTY_GCN_26WKS_MTRC <dbl>,
## #   COGS_PRC_MTRC <dbl>, DSPN_COGS_NDC_1WK <dbl>,
## #   DSPN_COGS_NDC_4WKS <dbl>, DSPN_COGS_NDC_13WKS <dbl>,
## #   DSPN_COGS_NDC_26WKS <dbl>, DSPN_COGS_GCN_1WK <dbl>,
## #   DSPN_COGS_GCN_4WKS <dbl>, DSPN_COGS_GCN_13WKS <dbl>,
## #   DSPN_COGS_GCN_26WKS <dbl>, Mean_COGS <dbl>

3.4 Modified the data

Purchase quantity should not negative,remove all records where there are negative values in any of the following fields: PURCH_QTY_NDC_1WK_MTRC,PURCH_QTY_NDC_4WK_MTRC, PURCH_QTY_NDC_13WK_MTRC , and PURCH_QTY_NDC_26WK_MTRC)

data_3 <- Data_2 %>% 
  filter( PURCH_QTY_NDC_1WK_MTRC > 0, PURCH_QTY_NDC_4WKS_MTRC> 0, PURCH_QTY_NDC_13WKS_MTRC>0,PURCH_QTY_NDC_26WKS_MTRC>0) 
print(data_3)
## # A tibble: 32,609 x 44
##    WK_NBR WK_BEGIN_DTE WK_END_DTE NCPDP_TOKEN PHRMCY_NAME BUY_GRP_NAM NDC  
##    <chr>  <chr>        <chr>      <chr>       <chr>       <chr>       <chr>
##  1 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0017…
##  2 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    1249…
##  3 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0059…
##  4 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0007…
##  5 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0016…
##  6 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0018…
##  7 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    5374…
##  8 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    5931…
##  9 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0000…
## 10 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0016…
## # … with 32,599 more rows, and 37 more variables: NDC_DESC <chr>,
## #   BRX_GRX_CD <chr>, NDC_QTY_MTRC <dbl>, NDC_PKG_SZ <dbl>,
## #   NDC_CASE_PK_QTY <dbl>, MFG_NAM <chr>, GCNSEQ_NBR <chr>,
## #   GCNSEQ_DESC <chr>, GCNSEQ_RTE_OF_ADMIN_DESC <chr>,
## #   GCNSEQ_DSG_FORM_CD <chr>, GCNSEQ_STR_DESC <chr>,
## #   DSPN_QTY_NDC_1WK <dbl>, DSPN_QTY_NDC_4WKS <dbl>,
## #   DSPN_QTY_NDC_13WKS <dbl>, DSPN_QTY_NDC_26WKS <dbl>,
## #   DSPN_QTY_GCN_1WK <dbl>, DSPN_QTY_GCN_4WKS <dbl>,
## #   DSPN_QTY_GCN_13WKS <dbl>, DSPN_QTY_GCN_26WKS <dbl>,
## #   PURCH_QTY_NDC_1WK_MTRC <dbl>, PURCH_QTY_NDC_4WKS_MTRC <dbl>,
## #   PURCH_QTY_NDC_13WKS_MTRC <dbl>, PURCH_QTY_NDC_26WKS_MTRC <dbl>,
## #   PURCH_QTY_GCN_1WK_MTRC <dbl>, PURCH_QTY_GCN_4WKS_MTRC <dbl>,
## #   PURCH_QTY_GCN_13WKS_MTRC <dbl>, PURCH_QTY_GCN_26WKS_MTRC <dbl>,
## #   COGS_PRC_MTRC <dbl>, DSPN_COGS_NDC_1WK <dbl>,
## #   DSPN_COGS_NDC_4WKS <dbl>, DSPN_COGS_NDC_13WKS <dbl>,
## #   DSPN_COGS_NDC_26WKS <dbl>, DSPN_COGS_GCN_1WK <dbl>,
## #   DSPN_COGS_GCN_4WKS <dbl>, DSPN_COGS_GCN_13WKS <dbl>,
## #   DSPN_COGS_GCN_26WKS <dbl>, Mean_COGS <dbl>

3.5 Creating the Leakage

Each time interval Leakage are different, AmeriSourceBergen need the data to see if the pharmacy has a tendency to leakage, then standardize to a per week metric.

 data_4 <- data_3 %>% 
    mutate(Leakage_1WK = DSPN_COGS_NDC_1WK - PURCH_QTY_GCN_1WK_MTRC, 
           Leakage_4WK = DSPN_QTY_NDC_4WKS - PURCH_QTY_NDC_4WKS_MTRC,
           Leakage_13WK = DSPN_QTY_NDC_13WKS - PURCH_QTY_GCN_13WKS_MTRC,
           Leakage_26KW = DSPN_COGS_GCN_26WKS - PURCH_QTY_GCN_26WKS_MTRC)


data_5 <- data_4 %>% 
  mutate(Leakage_per_week_4WK = Leakage_4WK/4,
 Leakage_per_week_13WK = Leakage_13WK / 13,
 Leakage_per_week_26KW = Leakage_26KW/26)

print(data_5)
## # A tibble: 32,609 x 51
##    WK_NBR WK_BEGIN_DTE WK_END_DTE NCPDP_TOKEN PHRMCY_NAME BUY_GRP_NAM NDC  
##    <chr>  <chr>        <chr>      <chr>       <chr>       <chr>       <chr>
##  1 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0017…
##  2 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    1249…
##  3 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0059…
##  4 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0007…
##  5 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0016…
##  6 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0018…
##  7 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    5374…
##  8 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    5931…
##  9 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0000…
## 10 201836 2018-09-02   2018-09-08 5471446827… PHARMACY 1… CLIPPERS    0016…
## # … with 32,599 more rows, and 44 more variables: NDC_DESC <chr>,
## #   BRX_GRX_CD <chr>, NDC_QTY_MTRC <dbl>, NDC_PKG_SZ <dbl>,
## #   NDC_CASE_PK_QTY <dbl>, MFG_NAM <chr>, GCNSEQ_NBR <chr>,
## #   GCNSEQ_DESC <chr>, GCNSEQ_RTE_OF_ADMIN_DESC <chr>,
## #   GCNSEQ_DSG_FORM_CD <chr>, GCNSEQ_STR_DESC <chr>,
## #   DSPN_QTY_NDC_1WK <dbl>, DSPN_QTY_NDC_4WKS <dbl>,
## #   DSPN_QTY_NDC_13WKS <dbl>, DSPN_QTY_NDC_26WKS <dbl>,
## #   DSPN_QTY_GCN_1WK <dbl>, DSPN_QTY_GCN_4WKS <dbl>,
## #   DSPN_QTY_GCN_13WKS <dbl>, DSPN_QTY_GCN_26WKS <dbl>,
## #   PURCH_QTY_NDC_1WK_MTRC <dbl>, PURCH_QTY_NDC_4WKS_MTRC <dbl>,
## #   PURCH_QTY_NDC_13WKS_MTRC <dbl>, PURCH_QTY_NDC_26WKS_MTRC <dbl>,
## #   PURCH_QTY_GCN_1WK_MTRC <dbl>, PURCH_QTY_GCN_4WKS_MTRC <dbl>,
## #   PURCH_QTY_GCN_13WKS_MTRC <dbl>, PURCH_QTY_GCN_26WKS_MTRC <dbl>,
## #   COGS_PRC_MTRC <dbl>, DSPN_COGS_NDC_1WK <dbl>,
## #   DSPN_COGS_NDC_4WKS <dbl>, DSPN_COGS_NDC_13WKS <dbl>,
## #   DSPN_COGS_NDC_26WKS <dbl>, DSPN_COGS_GCN_1WK <dbl>,
## #   DSPN_COGS_GCN_4WKS <dbl>, DSPN_COGS_GCN_13WKS <dbl>,
## #   DSPN_COGS_GCN_26WKS <dbl>, Mean_COGS <dbl>, Leakage_1WK <dbl>,
## #   Leakage_4WK <dbl>, Leakage_13WK <dbl>, Leakage_26KW <dbl>,
## #   Leakage_per_week_4WK <dbl>, Leakage_per_week_13WK <dbl>,
## #   Leakage_per_week_26KW <dbl>

4 Summarise

the mean per week leakage for each time interval

data_6 <- data_5 %>% 
          summarise (Mean_Leakage_1WK = mean(Leakage_1WK), 
               MeanLeakage_4WK = mean(Leakage_per_week_4WK),
               MeanLeakage_13WK = mean(Leakage_per_week_13WK),
               MeanLeakage_26KW = mean(Leakage_per_week_26KW))
               
print(data_6)
## # A tibble: 1 x 4
##   Mean_Leakage_1WK MeanLeakage_4WK MeanLeakage_13WK MeanLeakage_26KW
##              <dbl>           <dbl>            <dbl>            <dbl>
## 1            2071.           -5.95            -50.6            1209.

5 Prediction and Analyzing

Here is the prediction of the leakage changes with different interval

As we can see from the table 1wk,4wks,13wks,26wks, Leakage number are more less with the incrasing of the time interval. 1.first week can be leakage more, because they can have inventory from last month. 2.4 weeks leakage has least leakage, due to sold out the inventory, 3 13 weeks as it best weeks to make a decision as it is quarter and no inventory, no return orders.

6 Forcasting and Conslusion

Conslusion: We find the Pharmcy 1596 has 35% leackage based on 13WKS Leakage, It’s just need a a phone call will solve the problems. the rest are Pharmcy957, Pharmcy928,Pharmcy2439,Pharmcy385, those have the leakage percentage 2-3%. it is acceptable within 5%.

data_7 <- data_5 %>% 
   select( PHRMCY_NAME, DSPN_COGS_NDC_1WK,DSPN_COGS_NDC_4WKS, DSPN_COGS_NDC_13WKS, DSPN_COGS_NDC_26WKS,Leakage_13WK) 
           
data_8 <- data_7 %>% 
    group_by (PHRMCY_NAME) %>% 
        mutate(sum_DSPN_COGS_NDC_1WK = sum(DSPN_COGS_NDC_1WK),
                   sum_DSPN_COGS_NDC_4WKS = sum(DSPN_COGS_NDC_4WKS),
                   sum_DSPN_COGS_NDC_13WKS = sum(DSPN_COGS_NDC_13WKS),
                   sum_DSPN_COGS_NDC_26WKS = sum(DSPN_COGS_NDC_26WKS),
                   sum_Leakage_13WK = sum (Leakage_13WK))
data_9 <- data_8 %>% 
        distinct(data_9.PHRMCY_NAME, .keep_all = TRUE)
## Warning: Trying to compute distinct() for variables not found in the data:
## - `data_9.PHRMCY_NAME`
## This is an error, but only a warning is raised for compatibility reasons.
## The operation will return the input unchanged.
data_10 <-data_9 %>% 
    mutate(Total_DSPN_COGS_NDC = sum_DSPN_COGS_NDC_1WK + sum_DSPN_COGS_NDC_4WKS + sum_DSPN_COGS_NDC_13WKS + sum_DSPN_COGS_NDC_26WKS) %>% 
    mutate(percent_Leakage = sum_Leakage_13WK / Total_DSPN_COGS_NDC ) 
     
sort_by_percent_leakage <- arrange(data_10, desc(percent_Leakage))

print(sort_by_percent_leakage)
## # A tibble: 2,590 x 13
## # Groups:   PHRMCY_NAME [2,590]
##    PHRMCY_NAME DSPN_COGS_NDC_1… DSPN_COGS_NDC_4… DSPN_COGS_NDC_1…
##    <chr>                  <dbl>            <dbl>            <dbl>
##  1 PHARMACY 1…              0               130.             434.
##  2 PHARMACY 9…           2043.            10623.           29070.
##  3 PHARMACY 9…            809.             3235.            7885.
##  4 PHARMACY 2…           1610.            10062            24900.
##  5 PHARMACY 3…            153.              535.            1832.
##  6 PHARMACY 1…           1524.             8636.           25909.
##  7 PHARMACY 2…            699.              699.            2098.
##  8 PHARMACY 1…            764.             4239.           18137.
##  9 PHARMACY 5…             52.1             469.             886.
## 10 PHARMACY 1…              0                 0              569.
## # … with 2,580 more rows, and 9 more variables: DSPN_COGS_NDC_26WKS <dbl>,
## #   Leakage_13WK <dbl>, sum_DSPN_COGS_NDC_1WK <dbl>,
## #   sum_DSPN_COGS_NDC_4WKS <dbl>, sum_DSPN_COGS_NDC_13WKS <dbl>,
## #   sum_DSPN_COGS_NDC_26WKS <dbl>, sum_Leakage_13WK <dbl>,
## #   Total_DSPN_COGS_NDC <dbl>, percent_Leakage <dbl>