Stephen senn has written very eloquently about consequences of dichotomizing a continuous measure and often illustrates it with terrible use of dichotomy in a cochrane review where it was “claimed” that only one in 10 patients respond to paracetamol.
He considers a counterfactual thought experiment where he simulates 6000 headaches which follow an exponential distrbtuion with mean of 3 hours and tries to treat every headache with paracetamol and placebo.
Suppose Paracetamol reduced every headache by three-fourth. But if we use improper international society of headache criteria as response being “pain-free at two hours”, we will reach this erroneous conclusion that only one patient out of 10 tension-headaches responds to paracetamol.
I found it very instructive and tried to redo the graph in R
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
set.seed(7)
placebo=rexp(n = 6000, rate = 1/2.97) # sampling an exponential distribution of headache with mean 2.97
paracetamol = placebo*0.755 # suppose paracetamol reduced duration of each headache almose 3/4th () , Thus paracetamol has benefit of 45 minutes for each headache
duration = c(placebo,paracetamol)
probability = c(ecdf(placebo)(placebo),ecdf(paracetamol)(paracetamol)) # calculating cumulative probabilities
group= c(rep("Placebo",6000),rep("Paracetamol",6000))
headache= data.frame(duration,probability,group)
a=ecdf(placebo)(2)
b=ecdf(paracetamol)(2)
headache %>% ggplot(aes(x=duration,y=probability,colour=group))+
geom_line()+xlim(0,7) + geom_vline(xintercept = 2, linetype="dashed")+
geom_hline(yintercept = a,linetype="dashed")+geom_hline(yintercept = b,linetype="dashed")+
annotate("text",x=6,y=a-0.05,label=paste(round(a,2),""))+
annotate("text",x=6,y=a+0.05,label=paste(round(b,2),""))+
labs(title= "Probability Of Response to Headache treatment",
y= "Probability of Response(Cumulative)",
x= " Time(hours)",
caption = "Data from Painful Dichotomies-Stephen Senn")
“The curve given for placebo is what we would expect to find for the simple exponential model if it were the case that mean time to response were 2.97 hours when a patient was given placebo. The curve for paracetamol has a mean of 2.24 hours. It is important to understand that this is perfectly compatible with this being the long term average response time (that is to say averaged over many many headaches) for every patient and this means that any patient at any time feeling the symptoms of headache could expect to shorten that headache by 2.97-2.24=0.73 hrs or just under 45 minutes.” - Senn
Lets now try to understand what happens when we dichotomize oucome to pain free at two hours and classifying those who became pain free at 2 hours as responders and others as non responders.
headache1 = headache %>% mutate(responder=ifelse(duration<=2,"responder","non-responder"))
First, we sample 20 headaches of our counterfactual experiment which were treated with both paracetamol and placebo. we see each headache duration decreased by three-fourth.
data_frame(placebo= headache1$duration[1:20], paracetamol= headache1$duration[6001:6020]) %>%
mutate(id=row_number()) %>% # add row_number as id
gather(group,duration,-id) %>% # note - sign to keep out id, key value neednt be explicit
mutate(responder= if_else(duration<2,"responder","non-responder")) %>%
ggplot( aes(x=as.factor(id), y=duration)) + geom_point(aes(color=group)) + geom_line()+ labs( title= "Counter Factual trial",
x = "Patient",
y = "Duration of Headache")+ coord_flip()
Here we can determine individual response . As each headache has been treated twice.
But since we will not be able to see such a trial where each headache is twice treated ( can be approximated with repeated cross over trial) , here is what we see in real life
data_frame(placebo= headache1$duration[1:20], paracetamol= headache1$duration[6001:6020]) %>%
mutate(id=row_number()) %>% # add row_number as id
gather(group,duration,-id) %>% # note - sign to keep out id, key value neednt be explicit
mutate(responder= if_else(duration<2,"responder","non-responder")) %>%
#filter((group=="paracetamol"&id%%2==0)|(group=="placebo"&id%%2!=0)) %>%
slice(sample(n(),20)) %>%
ggplot( aes(x=as.factor(id), y=duration)) + geom_point(aes(color=group)) + labs( title= "Parallel trial",
x = "Patient",
y = "Duration of Headache")+ coord_flip()
Here we randomly erased some variables, so now individual response is difficult to determine, all we can comment on is on average.
Now if we engage in responder fallacy, we will be calculating
difference = ecdf(paracetamol)(2)-ecdf(placebo)(2)
NNT = 1/difference
difference
## [1] 0.1023333
NNT
## [1] 9.771987
Thus difference in proportions is around 11% and NNT to be pain free at two hours is 9 , we can see this fallacy even as we know headache has decreased by three-fourth the amount.
Same headache cant be treated twice hence lets randomly sample 3000 patients like in parallel trial
placebo = sample(placebo,size=3000) #
paracetamol = sample(paracetamol,size=3000)
duration = c(placebo,paracetamol)
probability = c(ecdf(placebo)(placebo),ecdf(paracetamol)(paracetamol))
group= c(rep("Placebo",3000),rep("Paracetamol",3000))
headache= data.frame(duration,probability,group)
a=ecdf(placebo)(2)
b=ecdf(paracetamol)(2)
headache %>% ggplot(aes(x=duration,y=probability,colour=group))+
geom_line()+xlim(0,7) + geom_vline(xintercept = 2, linetype="dashed")+
geom_hline(yintercept = a,linetype="dashed")+geom_hline(yintercept = b,linetype="dashed")+
annotate("text",x=6,y=a-0.05,label=paste(round(a,2),""))+
annotate("text",x=6,y=a+0.05,label=paste(round(b,2),""))+
labs(title= "Probability Of Response to Headache treatment",
y= "Probability of Response(Cumulative)",
x= " Time(hours)",
caption = "Data from Painful Dichotomies-Stephen Senn")