Aktana Case Study

Business Question:

Let’s pretend we are helping our customer, a pharmaceutical company, optimize their sales and marketing activity.

Our customer sells Product A, and it competes against Product B and Product C

Our customer has a team of sales representatives that go to visit doctors on a regular basis and promote Product A.

In 2017 our customer developed a new sales message, but only gave it to half of their sales representatives (those who cover doctors in group #2). They told those sales representatives to use the new message all year, while the sales representatives who cover doctors in group #1 continued to use the same message as in the previous year"

Hypothesis:

1 - Explain any trends you find in the data and think are interesting
2 - Try to answer the question "“Was the new message (that was used on group #2) a successful message?”

Now that we have a working hypothesis, let’s take a look at the data.

Data Processing

Loading the required packages

library(readr)
library(reshape)
library(dplyr)
library(ggplot2)
library(ggpubr)
library(plotly)
library(tidyverse)

List_of_doctors <- read_csv("case study - List of doctors.csv")

Data Preparation

After loading into R, List of doctors dataframe has 9 columns with last 5 columns being NULL. Hence deleting last 5 columns

List_of_doctors <-  List_of_doctors[,1:4]

The data look something like this

head(List_of_doctors,5)

## # A tibble: 5 x 4
##   `Doctor ID` Role         Group   `Place of work`    
##         <dbl> <chr>        <chr>   <chr>              
## 1         101 Specialist   Group 1 Group practice     
## 2         102 Specialist   Group 1 Individual practice
## 3         103 Primary Care Group 2 Hospital           
## 4         104 Specialist   Group 2 Group practice     
## 5         105 Primary Care Group 1 Individual practice

Before doing data munging, let’s take a look at the structure

str(List_of_doctors)

## Classes 'tbl_df', 'tbl' and 'data.frame':    48 obs. of  4 variables:
##  $ Doctor ID    : num  101 102 103 104 105 106 107 108 109 110 ...
##  $ Role         : chr  "Specialist" "Specialist" "Primary Care" "Specialist" ...
##  $ Group        : chr  "Group 1" "Group 1" "Group 2" "Group 2" ...
##  $ Place of work: chr  "Group practice" "Individual practice" "Hospital" "Group practice" ...

summary(List_of_doctors)

##    Doctor ID         Role              Group           Place of work     
##  Min.   :101.0   Length:48          Length:48          Length:48         
##  1st Qu.:112.8   Class :character   Class :character   Class :character  
##  Median :124.5   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :124.5                                                           
##  3rd Qu.:136.2                                                           
##  Max.   :148.0

Looking at the structure of the data, most of the variables are considered as charecters . These should be categorial variables. Let’s use R factor function to convert these to categorial variables.

List_of_doctors$Role <- as.factor(List_of_doctors$Role)
List_of_doctors$Group <- as.factor(List_of_doctors$Group)
List_of_doctors$`Place of work` <- as.factor(List_of_doctors$`Place of work`)
List_of_doctors$`Doctor ID` <- as.factor(List_of_doctors$`Doctor ID`)

Checking the summary again after converting to factors

summary(List_of_doctors)

##    Doctor ID            Role        Group                Place of work
##  101    : 1   Primary Care:29   Group 1:22   Group practice     :16   
##  102    : 1   Specialist  :19   Group 2:26   Hospital           :16   
##  103    : 1                                  Individual practice:16   
##  104    : 1                                                           
##  105    : 1                                                           
##  106    : 1                                                           
##  (Other):42

Doctors by role
Primary Care 29 60%
Specialist 19 40%
It is usual that PCP’s are more than specialist doctors.

How many doctors were tried with the new message?

List_of_doctors %>% group_by(Group) %>% summarise(Doctors=n()) %>% mutate('% per'=Doctors/sum(Doctors) *100)

## # A tibble: 2 x 3
##   Group   Doctors `% per`
##   <fct>     <int>   <dbl>
## 1 Group 1      22    45.8
## 2 Group 2      26    54.2

new message is tried on more than half of the doctors.

Is there any preference for Role in trying the new message?

List_of_doctors %>% group_by(Role,Group) %>% summarise(Doctors=n())  %>% mutate('% per'=Doctors/sum(Doctors) *100)

## # A tibble: 4 x 4
## # Groups:   Role [2]
##   Role         Group   Doctors `% per`
##   <fct>        <fct>     <int>   <dbl>
## 1 Primary Care Group 1      10    34.5
## 2 Primary Care Group 2      19    65.5
## 3 Specialist   Group 1      12    63.2
## 4 Specialist   Group 2       7    36.8

PCP’s are preferred over specialist with more than 2/3rd of them have been tried with new message and only 1/3rd of specialist’s.

Loading the Prescriptions Data

Prescription_volumes <- read_csv("case study - Prescription volumes.csv")

Looking at the summary of data

summary(Prescription_volumes)

##    Doctor ID       Product             Month                nRx        
##  Min.   :101.0   Length:3459        Length:3459        Min.   :-20.00  
##  1st Qu.:113.0   Class :character   Class :character   1st Qu.: 13.00  
##  Median :125.0   Mode  :character   Mode  :character   Median : 20.00  
##  Mean   :124.5                                         Mean   : 21.27  
##  3rd Qu.:137.0                                         3rd Qu.: 29.00  
##  Max.   :148.0                                         Max.   :141.00  
##                                                        NA's   :6

Converting the charecter variables to factors and checking the summary of dataset

Prescription_volumes$Product <- as.factor(Prescription_volumes$Product)
Prescription_volumes$Month <- as.factor(Prescription_volumes$Month)
Prescription_volumes$`Doctor ID` <- as.factor(Prescription_volumes$`Doctor ID`)

summary(Prescription_volumes)

##    Doctor ID       Product         Month           nRx        
##  147    :  74   Brand A:1151   2016-09: 145   Min.   :-20.00  
##  115    :  73   Brand B:1154   2017-07: 145   1st Qu.: 13.00  
##  140    :  73   Brand C:1154   2017-12: 145   Median : 20.00  
##  101    :  72                  2016-01: 144   Mean   : 21.27  
##  102    :  72                  2016-02: 144   3rd Qu.: 29.00  
##  103    :  72                  2016-03: 144   Max.   :141.00  
##  (Other):3023                  (Other):2592   NA's   :6

Few Observations

Assuming that all 48 doctors having one record per month per product there should be 48*24 = 1,152 records per product. But, we are seeing extra records for Brand B and Brand C.
NRx having negative values, lets look at the distribution of NRx
NA’s in NRx column

Are the records repeated in Doctor, product and month combinations?

Prescription_volumes <- group_by(Prescription_volumes,Product,`Doctor ID`, Month) %>% mutate(s=1:n())  
Prescription_volumes %>% filter(s >1)

## # A tibble: 4 x 5
## # Groups:   Product, Doctor ID, Month [3]
##   `Doctor ID` Product Month     nRx     s
##   <fct>       <fct>   <fct>   <dbl> <int>
## 1 115         Brand C 2017-12    16     2
## 2 140         Brand C 2016-09     2     2
## 3 147         Brand B 2017-07     7     2
## 4 147         Brand B 2017-07     7     3

Deleting duplicate records

Prescription_volumes <- Prescription_volumes[! (Prescription_volumes$s > 1) , ]
Prescription_volumes$s <- NULL

Looking at the Distribution of NRx variable

qplot(Prescription_volumes$nRx,
      geom="histogram",
      binwidth = 5,  
      main = "Distribution of NRx", 
      xlab = "NRx", fill=I("blue"), col=I("blue"), 
      alpha=I(.2))

## Warning: Removed 6 rows containing non-finite values (stat_bin).

One reason for negative values in NRx could be restatements i.e scripts got rejected after the entire claim life cycle and was adjusted in database as negative values.

For this demonstration i am removing these records

Deleting negative NRx records

Prescription_volumes <-  subset(Prescription_volumes, nRx >= 0 | is.na(nRx))
summary(Prescription_volumes)

##    Doctor ID       Product         Month           nRx        
##  101    :  72   Brand A:1149   2016-02: 144   Min.   :  0.00  
##  102    :  72   Brand B:1150   2016-03: 144   1st Qu.: 13.00  
##  103    :  72   Brand C:1151   2016-04: 144   Median : 20.00  
##  104    :  72                  2016-05: 144   Mean   : 21.33  
##  105    :  72                  2016-06: 144   3rd Qu.: 29.00  
##  106    :  72                  2016-07: 144   Max.   :141.00  
##  (Other):3018                  (Other):2586   NA's   :6

Dealing with NA’s (missing values) in NRx.
For every doctor, we will calculate the median of NRx by product
And fill the missing values with median NRx for that product by same doctor.

NRx_by_Doc_and_Prod <- Prescription_volumes %>%  group_by(`Doctor ID`, Product) %>%
  summarise(median = median(nRx, na.rm = TRUE))
head(NRx_by_Doc_and_Prod,6)

## # A tibble: 6 x 3
## # Groups:   Doctor ID [2]
##   `Doctor ID` Product median
##   <fct>       <fct>    <dbl>
## 1 101         Brand A   20  
## 2 101         Brand B   19  
## 3 101         Brand C   21  
## 4 102         Brand A   18  
## 5 102         Brand B   17  
## 6 102         Brand C   20.5

NRx_NA <- which(is.na(Prescription_volumes$nRx))
head(NRx_NA,5)

## [1] 1139 2817 3039 3206 3279

For loop to replace each missing value in NRx with Median NRx for that product by same doctor

for(i in NRx_NA){
  Doc_ID_NA <- Prescription_volumes$`Doctor ID`[i]
  Product_NA <- Prescription_volumes$Product[i]
  NRx_median <- NRx_by_Doc_and_Prod[(NRx_by_Doc_and_Prod$`Doctor ID`==Doc_ID_NA[1]) & (NRx_by_Doc_and_Prod$Product == Product_NA[1]),'median']
  Prescription_volumes[i,'nRx'] = NRx_median[[1]]
}

There are two NRx values which are more than 100, which seems to be odd,we could replace them with median as the same procedure above

## [1] 3119 3404

Now that the dataset is clean. Checking the summary of dataset

summary(Prescription_volumes)

##    Doctor ID       Product         Month           nRx       
##  101    :  72   Brand A:1149   2016-02: 144   Min.   : 0.00  
##  102    :  72   Brand B:1150   2016-03: 144   1st Qu.:13.00  
##  103    :  72   Brand C:1151   2016-04: 144   Median :20.00  
##  104    :  72                  2016-05: 144   Mean   :21.26  
##  105    :  72                  2016-06: 144   3rd Qu.:29.00  
##  106    :  72                  2016-07: 144   Max.   :46.00  
##  (Other):3018                  (Other):2586

Exploratory Data Analysis

Let us join the datasets to get Master Dataset

PV_Master<-merge(x=Prescription_volumes,y=List_of_doctors,by="Doctor ID",all.x=TRUE)
summary(PV_Master)

##    Doctor ID       Product         Month           nRx       
##  101    :  72   Brand A:1149   2016-02: 144   Min.   : 0.00  
##  102    :  72   Brand B:1150   2016-03: 144   1st Qu.:13.00  
##  103    :  72   Brand C:1151   2016-04: 144   Median :20.00  
##  104    :  72                  2016-05: 144   Mean   :21.26  
##  105    :  72                  2016-06: 144   3rd Qu.:29.00  
##  106    :  72                  2016-07: 144   Max.   :46.00  
##  (Other):3018                  (Other):2586                  
##            Role          Group                  Place of work 
##  Primary Care:2084   Group 1:1582   Group practice     :1152  
##  Specialist  :1366   Group 2:1868   Hospital           :1146  
##                                     Individual practice:1152  
##                                                               
##                                                               
##                                                               
##

Looking at NRx Sales Overtime across products

NRx_by_Product_Mon <- PV_Master %>% group_by(Product,Month) %>% summarise(NRx_Total =sum(nRx),Docs =n_distinct(`Doctor ID`), NRx_Depth = sum(nRx)/n_distinct(`Doctor ID`))

P_NRx_by_Prod_Mon <- ggplot(data=NRx_by_Product_Mon, aes(x=Month, y=NRx_Total, group=Product)) +
  geom_line(aes(color=Product))+  geom_point(aes(color=Product)) + ylim(0, 1500)

P_NRx_by_Prod_Mon

We already saw this in workbook, and can see here that Brand A and and Brand B are performing well overtime where as Brand C isn’t(did it went LOE)? or the increase in sales among Brand A and B have heavily affected the sales of Brand C in the year 2017

As we saw change overtime, calculating the Year Over Year NRx Volume change for 3 Brands

adding year column to find year over year growth in NRx sales

PV_Master <- PV_Master %>% mutate(year = substr(PV_Master$Month,1,4))
PV_Master$year <- as.factor(PV_Master$year)

YOY_NRx_Vol_by_Prodcuct <- PV_Master %>% group_by(Product,year) %>% summarise(NRx_Total =sum(nRx))
YOY_NRx_Vol_by_Prodcuct <- YOY_NRx_Vol_by_Prodcuct %>% spread(year,NRx_Total, convert = FALSE) %>%   mutate(vol_change = `2017`-`2016`, per_change=((`2017`-`2016`)*100)/`2016`)
YOY_NRx_Vol_by_Prodcuct

## # A tibble: 3 x 5
## # Groups:   Product [3]
##   Product `2016` `2017` vol_change per_change
##   <fct>    <dbl>  <dbl>      <dbl>      <dbl>
## 1 Brand A   8647  14043       5396       62.4
## 2 Brand B  10387  16249       5862       56.4
## 3 Brand C  12928  11103      -1825      -14.1

Bullet chart to visualise change in volume

P_YOY_NRx_by_Prod

Green Bar is 2017 NRx Sales, Black Line is 2016 NRx Sales

Looking at the Depth of Writing(Productivity) = Scripts per Doctor , Has the productivity improved overtime across the Product by Role?

YOY_NRx_Depth_by_Product_Role <- PV_Master %>% group_by(Product,year, Role) %>% summarise(NRx_Total =sum(nRx),Docs =n_distinct(`Doctor ID`), NRx_Depth = sum(nRx)/n_distinct(`Doctor ID`))

YOY_NRx_Depth_by_Product_Role

## # A tibble: 12 x 6
## # Groups:   Product, year [6]
##    Product year  Role         NRx_Total  Docs NRx_Depth
##    <fct>   <fct> <fct>            <dbl> <int>     <dbl>
##  1 Brand A 2016  Primary Care      5015    29      173.
##  2 Brand A 2016  Specialist        3632    19      191.
##  3 Brand A 2017  Primary Care      8924    29      308.
##  4 Brand A 2017  Specialist        5119    19      269.
##  5 Brand B 2016  Primary Care      6339    29      219.
##  6 Brand B 2016  Specialist        4048    19      213.
##  7 Brand B 2017  Primary Care     10539    29      363.
##  8 Brand B 2017  Specialist        5710    19      301.
##  9 Brand C 2016  Primary Care      8268    29      285.
## 10 Brand C 2016  Specialist        4660    19      245.
## 11 Brand C 2017  Primary Care      6984    29      241.
## 12 Brand C 2017  Specialist        4119    19      217.

P_YOY_NRx_Depth_by_Product_Role <- ggplot(data=YOY_NRx_Depth_by_Product_Role, 
                                          aes(x=factor(year), y=NRx_Depth, 
                                              group=Role, shape=Role, color=Role)) + 
  geom_line() + 
  geom_point() +
  scale_x_discrete("Year") +
  scale_y_continuous("NRx Depth (Productivity)") + 
  facet_grid(.~Product )

P_YOY_NRx_Depth_by_Product_Role

Earlier we saw that new message was deployed more among PCP’s which could be the reason for phenominal increase in Depth(Productivity) of writing.

Now, let us try to answer the question Was the new message (that was used on group #2) a successful message?"

Insights

1 Filtering Brand A NRx sales

Prod_A_PV <- PV_Master %>% filter(Product=="Brand A")
Prod_A_PV_by_Grp_Year <- Prod_A_PV%>% group_by(Group,year) %>% summarise(nRx=sum(nRx))
Prod_A_PV_by_Grp_Year %>% spread(year, nRx, convert = FALSE) %>% mutate(vol_change =`2017`-`2016` ,Increase_percentage=((`2017`-`2016`)*100)/`2016`)

## # A tibble: 2 x 5
## # Groups:   Group [2]
##   Group   `2016` `2017` vol_change Increase_percentage
##   <fct>    <dbl>  <dbl>      <dbl>               <dbl>
## 1 Group 1   4198   6066       1868                44.5
## 2 Group 2   4449   7977       3528                79.3

Group 1 sales increased by 44% from previous year
Group 2 sales increased by ~80% from previous year
45% of the doctors belongs to Group 1
55% of the doctors belongs to Group 2
So ultimately message desgined to test the Group 2 doctors to increase the sales has worked well

2

Prod_A_PV_by_Grp_Year_Role <- Prod_A_PV %>% group_by(Group,year,Role) %>% summarise(nRx=sum(nRx))
Prod_A_PV_by_Grp_Year_Role  <- Prod_A_PV_by_Grp_Year_Role %>% spread(year, nRx, convert = FALSE) %>% mutate(vol_change =`2017`-`2016`, increase_percentage=((`2017`-`2016`)*100)/`2016`)
Prod_A_PV_by_Grp_Year_Role

## # A tibble: 4 x 6
## # Groups:   Group [2]
##   Group   Role         `2016` `2017` vol_change increase_percentage
##   <fct>   <fct>         <dbl>  <dbl>      <dbl>               <dbl>
## 1 Group 1 Primary Care   1929   3079       1150                59.6
## 2 Group 1 Specialist     2269   2987        718                31.6
## 3 Group 2 Primary Care   3086   5845       2759                89.4
## 4 Group 2 Specialist     1363   2132        769                56.4

P_Prod_A_PV_by_Grp_Year_Role <- ggplot(Prod_A_PV_by_Grp_Year_Role, aes(x=Group, y=increase_percentage,fill = Role)) + geom_bar(stat="identity", position = "dodge") + ggtitle( 'YoY % change in NRx sales by Group')

P_Prod_A_PV_by_Grp_Year_Role

The result shows that with new message for Group 2 the percentage of increase in sales from previous year is higher than Group 1 sales
Especially PCP’s have contributed more than specialist with higher number of sales due to higher number in count as 60% of the doctors are from primary care

3

Prod_A_PV_by_Grp_Year_Role_POW

## # A tibble: 12 x 7
## # Groups:   Group, Role [4]
##    Group  Role      `Place of work`   `2016` `2017` vol_change increase_percent…
##    <fct>  <fct>     <fct>              <dbl>  <dbl>      <dbl>             <dbl>
##  1 Group… Primary … Group practice       764   1253        489              64.0
##  2 Group… Primary … Hospital             233    332         99              42.5
##  3 Group… Primary … Individual pract…    932   1494        562              60.3
##  4 Group… Speciali… Group practice       704    783         79              11.2
##  5 Group… Speciali… Hospital             862   1238        376              43.6
##  6 Group… Speciali… Individual pract…    703    966        263              37.4
##  7 Group… Primary … Group practice       792   1561        769              97.1
##  8 Group… Primary … Hospital            1350   2409       1059              78.4
##  9 Group… Primary … Individual pract…    944   1875        931              98.6
## 10 Group… Speciali… Group practice       783   1112        329              42.0
## 11 Group… Speciali… Hospital             354    694        340              96.0
## 12 Group… Speciali… Individual pract…    226    326        100              44.2

We can clearly see all the doctors from group 2 have made higher number of sales compared to the doctors in Group 1.
The sales of product A by the PCP doctors in Individual practice workplace reached the maximum of 98% with the new message strategy.

4 Filtering Brand B NRx sales

Prod_B_PV <- PV_Master %>% filter(Product=="Brand B")
Prod_B_PV_by_Grp_Year_Role_POW  <- Prod_B_PV%>% group_by(Group,year,Role,`Place of work`) %>% summarise(nRx=sum(nRx))
Prod_B_PV_by_Grp_Year_Role_POW %>% spread(year, nRx, convert = FALSE) %>% mutate(vol_change =`2017`-`2016` ,Increase_percentage=((`2017`-`2016`)*100)/`2016`)

## # A tibble: 12 x 7
## # Groups:   Group, Role [4]
##    Group  Role      `Place of work`   `2016` `2017` vol_change Increase_percent…
##    <fct>  <fct>     <fct>              <dbl>  <dbl>      <dbl>             <dbl>
##  1 Group… Primary … Group practice       943   1423        480              50.9
##  2 Group… Primary … Hospital             269    353         84              31.2
##  3 Group… Primary … Individual pract…   1051   1741        690              65.7
##  4 Group… Speciali… Group practice       601    843        242              40.3
##  5 Group… Speciali… Hospital            1003   1300        297              29.6
##  6 Group… Speciali… Individual pract…    781   1018        237              30.3
##  7 Group… Primary … Group practice      1089   1896        807              74.1
##  8 Group… Primary … Hospital            1645   2938       1293              78.6
##  9 Group… Primary … Individual pract…   1342   2188        846              63.0
## 10 Group… Speciali… Group practice      1010   1352        342              33.9
## 11 Group… Speciali… Hospital             455    850        395              86.8
## 12 Group… Speciali… Individual pract…    198    347        149              75.3

From the results we can see that Brand B is bigger brand than Brand A in terms of total sales and can see the increase in sale percentage in group 2 doctors is higher compared to group 1.
But the increase percentage is not higher as much as group 2 for Brand A sales, which informs that new message is working for Brand A.

Descriptive Statistics

ANOVA : Analysis of Variance: To compare if there is a (significant) difference between means of sample groups (Variability of means of samples groups).

As we have 2 categorical variables and with more than 2 categories , hence we will use ANOVA here

Prod_A_PV <- PV_Master %>% filter(Product=="Brand A")
Prod_A_PV <- Prod_A_PV %>% group_by(Month,Group) %>% summarise(nRx=sum(nRx,na.rm = T))
Prod_A_PV$year <- substr(Prod_A_PV$Month, 1, 4) 

Prod_A_PV$level <- ifelse((Prod_A_PV$year == "2016") & (Prod_A_PV$Group == "Group 1"), "Group1_2016", 
                      ifelse((Prod_A_PV$year == "2016") & (Prod_A_PV$Group == "Group 2"), "Group2_2016",
                             ifelse((Prod_A_PV$year == "2017") & (Prod_A_PV$Group == "Group 1"), "Group1_2017", "Group2_2017")))
Prod_A_PV$level <- ordered(Prod_A_PV$level,
                       levels = c("Group1_2016", "Group2_2016", "Group1_2017","Group2_2017"))
group_by(Prod_A_PV, level) %>%
  summarise(
    count = n(),
    mean = mean(nRx, na.rm = TRUE),
    sd = sd(nRx, na.rm = TRUE)
  )

## # A tibble: 4 x 4
##   level       count  mean    sd
##   <ord>       <int> <dbl> <dbl>
## 1 Group1_2016    12  350.  78.1
## 2 Group2_2016    12  371. 116. 
## 3 Group1_2017    12  506.  78.8
## 4 Group2_2017    12  665.  74.1

ggboxplot(Prod_A_PV, x = "level", y = "nRx", 
          color = "level", palette = c("#00AFBB", "#E7B800", "#00AFBB","#E7B800"),
          order = c("Group1_2016", "Group2_2016", "Group1_2017","Group2_2017"),
          ylab = "nRx(Mean)", xlab = "Group")+ggtitle('Distribution of Mean across groups')

Compute the Analysis of variance

res.aov <- aov(nRx ~ level, data = Prod_A_PV)

Summary of the analysis

summary(res.aov)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## level        3 761389  253796   32.57 3.01e-11 ***
## Residuals   44 342817    7791                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As the p-value is less than the significance level 0.05, we can conclude that there are significant differences between the groups.

Comparing among Group 2 doctors

Prod_A_PV <- Prod_A_PV %>% filter(level==c("Group2_2016")|level==c ("Group2_2017"))
res.aov_same_groups <- aov(nRx ~ level, data = Prod_A_PV)

Summary of the analysis

summary(res.aov_same_groups)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## level        1 518616  518616   55.03 2.02e-07 ***
## Residuals   22 207326    9424                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As the p-value is less than the significance level 0.05,we can conclude that there are significant differences between the groups

Conclusions

New message is tried on more than half of the doctors.
PCP’s are preferred over specialist with more than 2/3rd of them have been tried with new message and only 1/3rd of specialist’s.
Brand A and and Brand B are performing well overtime where as Brand C isn’t(did it went LOE)? or the increase in sales among Brand A and B have heavily affected the sales of Brand C in the year 2017.
There has been phenominal increase in Depth(Productivity) of writing among PCP’s.
Group 1 sales increased by 44% from previous year where as Group 2 sales increased by ~80% from previous year. So ultimately message desgined to test the Group 2 doctors to increase the sales has worked well
Especially PCP’s have contributed more than specialist with higher number of sales
The sales of product A by the PCP doctors in Individual practice workplace reached the maximum of 98% with the new message strategy.
As the p-value is less than the significance level 0.05, we can conclude that there are significant differences between the groups.

Next steps..

Gather more data and do predictive modelling at Doctor level to find the important factors among the doctors whose NRx sales has decreased in 2017 from 2016.
Create a binary Target column for each doctor based on 2017 - 2016 Nrx sales

Aktana Case Study

Neeraj Kumar G

1/23/2020

Business Question:

Hypothesis:

Now that we have a working hypothesis, let’s take a look at the data.

Data Processing

Data Preparation

Before doing data munging, let’s take a look at the structure

Looking at the structure of the data, most of the variables are considered as charecters . These should be categorial variables. Let’s use R factor function to convert these to categorial variables.

How many doctors were tried with the new message?

Is there any preference for Role in trying the new message?

PCP’s are preferred over specialist with more than 2/3rd of them have been tried with new message and only 1/3rd of specialist’s.

Loading the Prescriptions Data

Few Observations

Looking at the Distribution of NRx variable

Exploratory Data Analysis

As we saw change overtime, calculating the Year Over Year NRx Volume change for 3 Brands

Looking at the Depth of Writing(Productivity) = Scripts per Doctor , Has the productivity improved overtime across the Product by Role?

Earlier we saw that new message was deployed more among PCP’s which could be the reason for phenominal increase in Depth(Productivity) of writing.

Now, let us try to answer the question Was the new message (that was used on group #2) a successful message?"

Insights

1

Filtering Brand A NRx sales

2

3

4

Filtering Brand B NRx sales

Descriptive Statistics

ANOVA : Analysis of Variance: To compare if there is a (significant) difference between means of sample groups (Variability of means of samples groups).

Compute the Analysis of variance

Summary of the analysis

As the p-value is less than the significance level 0.05, we can conclude that there are significant differences between the groups.

Comparing among Group 2 doctors

Summary of the analysis

As the p-value is less than the significance level 0.05,we can conclude that there are significant differences between the groups

Conclusions

Next steps..

Neeraj Kumar : neerajkumar990@gmail.com