2024-02-06

Exercise 5.6

Download the clinical_trials.csv data set. In this synthetic data set, a company Badatta claims that they have developed a new drug Kilvir that can kill the common cold virus and have conducted clinical trials to prove this claim.

Here, each row in the data set represents data for one patient, including their

  • age group,

  • whether they have been treated with the Kilvir drug or a placebo (which has no therapeutic effect), and

  • whether the patient feels the drug has effectively improved their conditions.

  1. Extract a subset of the data containing only patients in the 31-35 age group.

  2. Construct a contingency table of treatment drug versus treatment effect for this age group.

  3. Is there evidence that Kilvir is effective in this age group?

Now have a look at the other age groups.

  1. Will you be concerned if Badatta only publishes their clinical trial results in the 31-35 age group?

Load the data set:

ct <- read.csv ("clinical_trials.csv")

Take a look at it:

head (ct)
##   PatientID AgeGroup    Drug      Result
## 1        P1    56-60  Kilvir   Effective
## 2        P2    26-30 Placebo   Effective
## 3        P3    61-65 Placebo Ineffective
## 4        P4    61-65  Kilvir   Effective
## 5        P5    16-20 Placebo   Effective
## 6        P6    81-85 Placebo   Effective

What should be my null hypothesis?

  • \(H_0\): the new drug Kilvir is not effective.
  • \(H_1\): the new drug Kilvir is effective.
  • As a rule, we put what we want to show in the alternative hypothesis.

Extract age group: 31-35:

Here are the first five age groups:

table (ct$AgeGroup) [1:5]
## 
## 16-20 21-25 26-30 31-35 36-40 
##    68    46    84    34    52
  • Age group 31-35 is in the fourth position, so name it D.
ctrowD <- which(ct$AgeGroup == "31-35")

Construct a contingency table of treatment drug versus treatment effect for this age group.

  • First, extract the rows for patients aged 31-35.
dfD <- ct [ ctrowD, ]

contD <- table (dfD$Drug, dfD$Result)

contD
##          
##           Effective Ineffective
##   Kilvir          9           8
##   Placebo         2          15
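Raw counts are easier to compare as proportions within each treatment arm. The sketch below rebuilds the table from the counts printed above (the matrix is typed in by hand here, so it does not need the CSV file; `propD` is a name introduced for illustration) and computes row-wise proportions with `prop.table`.

```r
# Counts copied from the contingency table for the 31-35 age group.
contD <- matrix(c(9, 2, 8, 15), nrow = 2,
                dimnames = list(c("Kilvir", "Placebo"),
                                c("Effective", "Ineffective")))

# Row-wise proportions: share of each treatment arm reporting each result.
propD <- prop.table(contD, margin = 1)
round(propD, 3)
```

About 53% of the Kilvir patients (9 of 17) report an effective result, against about 12% of the placebo patients (2 of 17).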

Is there evidence that Kilvir is effective in this age group?

Let’s use different tests!

  • \(\chi^2\) test

  • Fisher’s exact test

\(\chi^2\) test

chisq.test(contD)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  contD
## X-squared = 4.8379, df = 1, p-value = 0.02784

Fisher’s exact test

fisher.test(contD)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  contD
## p-value = 0.02551
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   1.217552 92.482179
## sample estimates:
## odds ratio 
##   7.882191

Effective in this age group?

chisq.test(contD)$p.value
## [1] 0.02784006
fisher.test(contD)$p.value
## [1] 0.02550983

At the 5% significance level, there is sufficient evidence to reject \(H_0\) and conclude that the new drug Kilvir is effective in this age group.

At a stricter 1% significance level (as might be required when testing drugs), there is insufficient evidence to reject \(H_0\), so we cannot conclude that Kilvir is effective.
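The alternative hypothesis stated earlier ("Kilvir is effective") is one-sided, whereas the tests above are two-sided. As a hedged sketch, and assuming a one-sided test is considered appropriate here, `fisher.test` accepts `alternative = "greater"`, which tests whether the odds ratio for the top-left cell (Kilvir, Effective) exceeds 1. The table is rebuilt by hand from the printed counts; `p_two` and `p_one` are illustrative names.

```r
# Counts copied from the contingency table for the 31-35 age group.
contD <- matrix(c(9, 2, 8, 15), nrow = 2,
                dimnames = list(c("Kilvir", "Placebo"),
                                c("Effective", "Ineffective")))

# Two-sided test (the default) and one-sided test (odds ratio > 1).
p_two <- fisher.test(contD)$p.value
p_one <- fisher.test(contD, alternative = "greater")$p.value

c(two.sided = p_two, greater = p_one)
```

Because both arms have 17 patients, the one-sided p-value is half the two-sided one in this table, which still leaves it above the 1% level.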

Odds ratio?

## odds ratio 
##   7.882191
  • An odds ratio greater than 1 indicates a positive association.

  • An odds ratio of 7.88 indicates a strong positive association between taking Kilvir and reporting an effective result, although the confidence interval (1.22 to 92.48) is very wide.

  • Here the drug is the explanatory variable and the reported effect is the response variable: given the drug a patient received, how likely are they to report that it was effective?

  • The result says that, in this age group, the odds of an effective result are about 8 times higher with Kilvir than with the placebo.
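To make the odds ratio concrete, it can be computed by hand from the four counts. This is a minimal sketch with the table typed in from the output above; `odds_kilvir`, `odds_placebo`, `odds_ratio` and `risk_ratio` are names introduced for illustration.

```r
# Counts copied from the contingency table for the 31-35 age group.
contD <- matrix(c(9, 2, 8, 15), nrow = 2,
                dimnames = list(c("Kilvir", "Placebo"),
                                c("Effective", "Ineffective")))

# Odds of an effective result within each arm.
odds_kilvir  <- contD["Kilvir", "Effective"]  / contD["Kilvir", "Ineffective"]   # 9 / 8
odds_placebo <- contD["Placebo", "Effective"] / contD["Placebo", "Ineffective"]  # 2 / 15

# Sample odds ratio (cross-product ratio).
odds_ratio <- odds_kilvir / odds_placebo
odds_ratio  # 8.4375

# For comparison: ratio of the proportions reporting an effective result.
risk_ratio <- (9 / 17) / (2 / 17)
risk_ratio  # 4.5
```

The sample odds ratio (8.44) differs slightly from the 7.88 reported by `fisher.test`, which returns a conditional maximum likelihood estimate rather than the cross-product ratio. The ratio of proportions is 4.5, so "8 times" describes the odds, not the probability of an effective result.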

Is it okay to publish only the results from 31-35-year-olds?

  • No. 
  • Because publishing results for only one age group may not provide a representative picture of the overall population;
  • It might give a misleading impression of the overall findings if results are cherry-picked without considering the broader context.
  • Results from one age group may be confounded by other variables.
  • Even if only the 31-35 data are published, the p-value (about 0.03) is not very low and the group has only 34 patients, so the result may not hold up as more people take Kilvir.
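The cherry-picking concern can be quantified with a short calculation. The sketch below assumes that 14 age groups are each tested at the 5% level (the number tested in this write-up), that the tests are independent, and that the drug has no effect in any group. The variable names are illustrative.

```r
alpha    <- 0.05  # significance level used for each test
n_groups <- 14    # number of age groups tested (assumption stated above)

# Probability that at least one group looks "significant" purely by chance.
fwer <- 1 - (1 - alpha)^n_groups
round(fwer, 3)  # 0.512

# Expected number of falsely significant groups.
expected_false_positives <- alpha * n_groups
expected_false_positives  # 0.7
```

Under these assumptions there is roughly a 51% chance of seeing at least one p-value below 0.05 even if Kilvir does nothing, so a single significant age group is weak evidence on its own.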

Further looking: other age groups

  • How do the other age groups perform?

The two youngest age groups:

##          
##           Effective Ineffective
##   Kilvir         18          39
##   Placebo        24          33
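The code that pools the two age groups is not shown. One way it might be done is with `%in%`, sketched below. `cont_for` is a hypothetical helper, and `toy` is a small made-up data frame that only mimics the column names of `ct`; its values are not from the real data set.

```r
# Hypothetical helper: contingency table for one or more age groups.
cont_for <- function(data, groups) {
  rows <- data$AgeGroup %in% groups
  table(data$Drug[rows], data$Result[rows])
}

# Toy data with the same column names as ct (values made up for illustration).
toy <- data.frame(
  AgeGroup = c("16-20", "21-25", "16-20", "31-35", "21-25", "16-20"),
  Drug     = c("Kilvir", "Placebo", "Placebo", "Kilvir", "Kilvir", "Placebo"),
  Result   = c("Effective", "Ineffective", "Effective",
               "Effective", "Ineffective", "Ineffective")
)

# Pool the two youngest groups; the 31-35 row is excluded.
contAB_toy <- cont_for(toy, c("16-20", "21-25"))
contAB_toy
```

With the real data, `cont_for(ct, c("16-20", "21-25"))` would be the analogous call.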

Perform two tests:

chisq.test(contAB)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  contAB
## X-squared = 0.94246, df = 1, p-value = 0.3316

fisher.test(contAB)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  contAB
## p-value = 0.3317
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.2737906 1.4623951
## sample estimates:
## odds ratio 
##  0.6371842

The two oldest age groups:

##          
##           Effective Ineffective
##   Kilvir          4           9
##   Placebo         4           9

Perform two tests:

chisq.test(contNO)
## 
##  Pearson's Chi-squared test
## 
## data:  contNO
## X-squared = 0, df = 1, p-value = 1

fisher.test(contNO)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  contNO
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.1373006 7.2832873
## sample estimates:
## odds ratio 
##          1

Let’s see all age groups together

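# Run both tests for one age group and return the two p-values.
test <- function(age){
  
  # rows of ct belonging to this age group
  ctrowi <- which(ct$AgeGroup == age)
  
  # contingency table of drug versus result for this age group
  dfi <- ct [ ctrowi, ]
  conti <- table (dfi$Drug, dfi$Result)
  
  chi <- chisq.test(conti)
  fisher <- fisher.test(conti)
  
  return (list ("p-value (chi-squared)" = chi$p.value,
                "p-value (fisher)" = fisher$p.value)
  )
}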

age.index <- c ("16-20", "26-30", "31-35", "36-40",
                "41-45", "46-50", "51-55", "56-60",  
                "61-65", "66-70", "71-75", "76-80", 
                "81-85", "86-90")  # age groups to test

p.vec <- sapply(age.index, test)  
     # apply both tests to each age group
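Because `test` returns a list, `sapply` produces a matrix of list elements. As an alternative sketch, a function that returns a named numeric vector can be combined with `vapply` to give a plain numeric matrix. To keep the example runnable without the CSV file, it rebuilds patient-level rows from two tables printed earlier. `make_rows`, `p_values_for` and the pooled label "81-90" are hypothetical, and `suppressWarnings` hides the small-count warning from `chisq.test`.

```r
# Expand one cell count into patient-level rows (hypothetical helper).
make_rows <- function(age, drug, result, n) {
  data.frame(AgeGroup = rep(age, n), Drug = rep(drug, n), Result = rep(result, n))
}

# Stand-in for ct, built from the counts of the 31-35 table and the
# pooled table for the two oldest groups (labelled "81-90" here).
toy <- rbind(
  make_rows("31-35", "Kilvir",  "Effective",    9),
  make_rows("31-35", "Kilvir",  "Ineffective",  8),
  make_rows("31-35", "Placebo", "Effective",    2),
  make_rows("31-35", "Placebo", "Ineffective", 15),
  make_rows("81-90", "Kilvir",  "Effective",    4),
  make_rows("81-90", "Kilvir",  "Ineffective",  9),
  make_rows("81-90", "Placebo", "Effective",    4),
  make_rows("81-90", "Placebo", "Ineffective",  9)
)

# Return both p-values for one age group as a named numeric vector.
p_values_for <- function(age, data) {
  sub <- data[data$AgeGroup == age, ]
  tab <- table(sub$Drug, sub$Result)
  c(chisq  = suppressWarnings(chisq.test(tab))$p.value,
    fisher = fisher.test(tab)$p.value)
}

# One row per age group, one column per test.
p_mat <- t(vapply(c("31-35", "81-90"), p_values_for,
                  FUN.VALUE = c(chisq = 0, fisher = 0), data = toy))
round(p_mat, 4)
```

A numeric matrix like `p_mat` can be passed directly to functions such as `round` or `p.adjust`, which is less convenient with a matrix of lists.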

The p-values for each age group:

##       p-value (chi-squared) p-value (fisher)
## 16-20 0.1240666             0.1231997       
## 26-30 0.6403202             0.64078         
## 31-35 0.02784006            0.02550983      
## 36-40 0.3482841             0.3487349       
## 41-45 0.7125793             0.713982        
## 46-50 1                     1               
## 51-55 1                     1               
## 56-60 0.7300697             0.7310941       
## 61-65 0.3906743             0.3911247       
## 66-70 1                     1               
## 71-75 0.6932818             0.6945843       
## 76-80 0.3990752             0.4003231       
## 81-85 0.6170751             0.6199095       
## 86-90 0.4142162             0.4285714
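When many groups are tested, the p-values can be adjusted for multiple comparisons. The sketch below applies `p.adjust` to the Fisher p-values copied from the table above. The choice of Bonferroni and Holm corrections is an assumption for illustration, not part of the original analysis.

```r
# Fisher p-values copied from the table above.
p_fisher <- c("16-20" = 0.1231997, "26-30" = 0.64078,   "31-35" = 0.02550983,
              "36-40" = 0.3487349, "41-45" = 0.713982,  "46-50" = 1,
              "51-55" = 1,         "56-60" = 0.7310941, "61-65" = 0.3911247,
              "66-70" = 1,         "71-75" = 0.6945843, "76-80" = 0.4003231,
              "81-85" = 0.6199095, "86-90" = 0.4285714)

# Adjust for having run 14 tests.
p_bonf <- p.adjust(p_fisher, method = "bonferroni")
p_holm <- p.adjust(p_fisher, method = "holm")

round(cbind(raw = p_fisher, bonferroni = p_bonf, holm = p_holm), 3)
```

After either adjustment the 31-35 p-value rises to about 0.36, so no age group remains significant at the 5% level.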

Conclusion

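  • All age groups other than 31-35 have much higher p-values.
  • In three age groups (46-50, 51-55 and 66-70), both p-values are 1, so the data show no sign of any difference between Kilvir and the placebo.
  • The remaining age groups have p-values above 10%, some around 70%.
  • Across the age groups, there is insufficient evidence to reject \(H_0\): the new drug Kilvir is not effective.
  • With 14 groups tested, one result below 5% is something chance alone could easily produce, so the 31-35 result is not convincing evidence that Kilvir works.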