Stephen senn has written an excellent article about hype of precision medicine and how the responder-fallacy has led to false dawn of precisionomics. In a highly instructive article - he shows how use of term “response” is often abused and can mislead us. We will try to simulate this example in R
Let us consider a counter-factual experiment in which 1000 patients with mean FEV1 ( a measure of pulmonary function ) of 2.3 L and standard deviation of 0.26 L are given an inhaler which increases each value on average by 0.2 L with sd of 0.03L
pre = rnorm(1000,2.3,0.26)
post= pre+ rnorm(1000,0.5,0.03)
Let’s look at histogram of FEV1 at baseline
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
inhale = data.frame(pre,post)
inhale %>% ggplot(aes(pre))+geom_histogram(fill="blue")+ geom_vline(xintercept = mean(pre))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Let’s look at histogram of FEV1 in post(2nd reading)
inhale %>% ggplot(aes(post))+geom_histogram(fill="blue")+ geom_vline(xintercept = mean(pre))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Now let’s sample 24 patients from this population
data_frame(placebo= pre[1:24], active= post[1:24]) %>%
mutate(id=row_number()) %>% # add row_number as id
gather(group,FEV1,-id) %>% # note - sign to keep out id, key value neednt be explicit
ggplot( aes(x=as.factor(id), y=FEV1)) + geom_point(aes(color=group)) + geom_line()+ labs( title= "Counter Factual trial",
x = "Patient",
y = "FEV1")+ coord_flip()
Here we can see individual response. Now of course such a counterfactual trial is not possible , we have parallel group trial where each member belongs to either placebo or active treatment group
data_frame(placebo= pre[1:24], active= post[1:24]) %>%
mutate(id=row_number()) %>% # add row_number as id
gather(group,FEV1,-id) %>% # note - sign to keep out id, key value neednt be explicit
filter((group=="placebo"&id%%2==0)|(group=="active"&id%%2!=0)) %>%
ggplot( aes(x=as.factor(id), y=FEV1)) + geom_point(aes(color=group)) + labs( title= "parallel trial",
x = "Patient",
y = "FEV1")+ coord_flip()
Thus in this real world parallel trial it is very difficult to determine response as was seen in counterfactual trial
So How do we determine response by crossover trials ..
Let’s consider a crossover trial with first crossover with active-placebo difference of 0.5 with sd of 0.2 L .
Now suppose we arbitrarily classify responder as those having more than 0.3 L increase in FEV1
set.seed(7)
crossover1 = rnorm(1000,0.5,0.2)
kil = ecdf(crossover1)(0.3)*100
data_frame(crossover1 = rnorm(1000,0.5,0.2)) %>% mutate(x=if_else(crossover1<0.3,"non-responder","responder")) %>%
ggplot(aes(crossover1,fill=factor(x)))+geom_histogram()+geom_vline(xintercept = 0.3)+ # change color of histogram
annotate("text",x=1,y=90,label=paste(kil," % non-responders"))+
theme(legend.title=element_blank()) # remove legend
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Thus we see that while every patient has some increase in FEV1, almost 14.8% will be labelled non-responder due to arbitrary dichotomization , one of the reasons to avoid dichomotization which can lead to erroneous conclusions.
So how do we judge response then let’s see the second crossover trial in which there is an element of response i.e. the placebo-active difference is highly correlated(r=0.9) with first crossover trial
simcor = function(x, r,m,s){ r2 = r**2
ve = 1-r2
SD = sqrt(ve)
e = rnorm(length(x), mean=0, sd=SD)
y = r*x + e
y1 = m+y*s
return(y1) }
x=scale(crossover1)
crossover2 = simcor(x=x,r=0.9,m=0.5,s=0.2)
kil1 = ecdf(crossover2)(0.3)*100
data_frame(crossover2 = as.vector(crossover2)) %>% mutate(x=if_else(crossover2<0.3,"non-responder","responder")) %>% # as.vector for crossover2
ggplot(aes(crossover2,fill=factor(x)))+geom_histogram()+geom_vline(xintercept = 0.3)+ # change color of histogram
annotate("text",x=1,y=90,label=paste(kil1," % non-responders"))+
theme(legend.title=element_blank()) # remove legend
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
So now we see 16.2 % in second crossover trial( which are highly correlated with first cross-over trial) are non-responders( response being defined as greater than 0.3 L increase)
Let us see a table of responder vs non responder in cross over 1 vs crossover 2
data_frame(crossover1 = as.vector(crossover1),crossover2 = as.vector(crossover2)) %>%
mutate( response1 = case_when(crossover1<0.3 ~ "non-responder",
TRUE~"responder"),response2 = case_when(crossover2<0.3 ~ "non-responder",
TRUE~"responder")) %>% select(response1,response2) %>% group_by(response1,response2) %>% summarise(n=n(),percent = 100*( n() / nrow(.)) )
## # A tibble: 4 x 4
## # Groups: response1 [?]
## response1 response2 n percent
## <chr> <chr> <int> <dbl>
## 1 non-responder non-responder 112 11.2
## 2 non-responder responder 36 3.6
## 3 responder non-responder 50 5.0
## 4 responder responder 802 80.2
Note : absolute numbers might vary due to simulation Thus we see that out of ~850 who responded in first 800 responded in second(~95%), while out of ~150 who didnt respond in first around ~32 did respond it in second trial (~20%) . ~8.5 % have inconsistent response in this crossover trial with an element of personal response . Thus marginal probability is around ~85%
Let’s visualise it
p = data_frame(crossover1 = as.vector(crossover1),crossover2 = as.vector(crossover2)) %>%
mutate(group = case_when(
crossover1 >0.3 & crossover2 >0.3 ~ "responders",
crossover1 >0.3 & crossover2 <0.3 ~ "inconsistent",
crossover1 <0.3 & crossover2 >0.3 ~"inconsistent",
crossover1 <0.3 & crossover2 <0.3 ~ "non-responders"
)) %>% ggplot(aes(crossover1,crossover2,color=group))+geom_point()+geom_hline(yintercept=0.3,linetype="dotted")+geom_vline(xintercept=0.3,linetype="dotted")+geom_hline(yintercept=0.5)+geom_vline(xintercept=0.5)
p
ggExtra::ggMarginal(p,type = "histogram")
Now consider a situation with no element of personal response
crossover2 = simcor(x=x,r=0.02,m=0.5,s=0.2)
data_frame(crossover1 = as.vector(crossover1),crossover2 = as.vector(crossover2)) %>% mutate( response1 = case_when(crossover1<0.3 ~ "non-responder",
TRUE~"responder"),response2 = case_when(crossover2<0.3 ~ "non-responder",
TRUE~"responder")) %>% select(response1,response2) %>% group_by(response1,response2) %>% summarise(n=n(),percent = 100*( n() / nrow(.)) )
## # A tibble: 4 x 4
## # Groups: response1 [?]
## response1 response2 n percent
## <chr> <chr> <int> <dbl>
## 1 non-responder non-responder 23 2.3
## 2 non-responder responder 125 12.5
## 3 responder non-responder 138 13.8
## 4 responder responder 714 71.4
Note : absolute numbers might vary due to simulation Compare this situation to original simulation with no personal response.
Here also marginal probability of response is ~85% Thus we see that out of ~850 who responded in first 714 responded in second(~85%), while out of ~150 who didnt respond in first around ~23 did respond it in second trial (~20%)
Thus we see almost there is almost ~25% inconsistent response in these crossover trial as opposed to the case in which there was element of personal response (~8% inconsistency )
“The important point to note is that had we only run one cross-over trial, that is to say only using one period we could not have drawn a scatter plot of response (if this is defined as the difference between treatment with an active treatment and treatment with placebo). All we could have drawn is a marginal distribution. However, the marginal distributions on the X axis in scatter diagrams are indistinguishable or at least, given either you could not tell which of the two cases it repre- sented” ~ Senn
Thus key to identification is adequate replication.
q = data_frame(crossover1 = as.vector(crossover1),crossover2 = as.vector(crossover2)) %>% mutate(group = case_when(
crossover1 >0.3 & crossover2 >0.3 ~ "responders",
crossover1 >0.3 & crossover2 <0.3 ~ "inconsistent",
crossover1 <0.3 & crossover2 >0.3 ~"inconsistent",
crossover1 <0.3 & crossover2 <0.3 ~ "non-responders"
)) %>% ggplot(aes(crossover1,crossover2,color=group))+geom_point()+geom_hline(yintercept=0.3,linetype="dotted")+geom_vline(xintercept=0.3,linetype="dotted")+geom_hline(yintercept=0.5)+geom_vline(xintercept=0.5)
q
ggExtra::ggMarginal(q, type = "histogram")