Scenario 1: Medication A vs Medication B

A medical research team created a new medication to reduce headaches (Medication A). They want to determine whether Medication A is more effective at reducing headaches than the current medication on the market (Medication B). Participants were randomly assigned to take either Medication A or Medication B. Data were collected for 30 days through an app, with participants reporting each day whether or not they had a headache. Was there a difference in the number of headache days between the groups?

HYPOTHESIS

Null Hypothesis: There is no difference in the number of headaches between participants taking Medication A and those taking Medication B.

Alternative Hypothesis: There is a difference in the number of headaches between participants taking Medication A and those taking Medication B.

options(repos = c(CRAN = "https://cloud.r-project.org"))
# Install readxl only if it is missing, so an already-installed copy is not overwritten on every run
if (!requireNamespace("readxl", quietly = TRUE)) install.packages("readxl")
library(readxl)
A6R1 <- read_excel("C:\\Users\\sweth\\Downloads\\A6R1.xlsx")
# Install dplyr only if it is missing
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
A6R1 %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50
# Histogram of headache days for the Medication A group
hist(A6R1$HeadacheDays[A6R1$Medication == "A"],
     main = "Histogram of A HeadacheDays",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

# Histogram of headache days for the Medication B group
hist(A6R1$HeadacheDays[A6R1$Medication == "B"],
     main = "Histogram of B HeadacheDays",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

The histogram of HeadacheDays for Medication A is slightly positively skewed: most values fall in the middle of the range, but a few extend toward the higher end and form a right-hand tail. The overall shape is also relatively flat and spread out rather than a tall bell, which suggests platykurtosis. The histogram for Medication B is also slightly positively skewed, with a few higher values stretching the distribution to the right. Unlike Medication A, however, Medication B’s values are more concentrated around the center, which suggests leptokurtosis (a sharper peak with heavier tails). In short, both distributions are slightly positively skewed, with Medication A flatter and Medication B more peaked.
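The visual reading of skew and kurtosis can be backed by numbers. The sketch below defines moment-based skewness and excess kurtosis in base R. The helper names `skewness` and `excess_kurtosis` and the `example_days` vector are illustrative assumptions, not part of the original analysis; the vector is a stand-in, not the study data.

```r
# Moment-based skewness and excess kurtosis in base R (no extra packages).
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Hypothetical stand-in values. To check the real data, replace with
# A6R1$HeadacheDays[A6R1$Medication == "A"] (or "B").
example_days <- c(4, 6, 7, 7, 8, 8, 8, 9, 10, 11, 15)

skew_example <- skewness(example_days)
kurt_example <- excess_kurtosis(example_days)
print(c(skewness = skew_example, excess_kurtosis = kurt_example))
```

A skewness above zero indicates a right-hand tail. Excess kurtosis below zero points toward a flatter (platykurtic) shape and above zero toward a more peaked (leptokurtic) one.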

shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "B"]
## W = 0.98758, p-value = 0.8741

The Shapiro-Wilk tests give no evidence that either group departs from normality. For Medication A, W = 0.979, p = 0.4913, well above the conventional significance level of 0.05, so we do not reject the null hypothesis that the HeadacheDays data come from a normal distribution. For Medication B, the p-value is higher still (W = 0.988, p = 0.8741). A non-significant result does not prove normality, but together with the only slight skew seen in the histograms it indicates that the normality assumption is reasonable for both groups, which supports the use of the independent samples t-test.
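Normality is one assumption of the pooled-variance t-test; roughly equal group variances is another. As a rough sketch, the F ratio of the two variances can be computed from the summary statistics reported above. The variable names are illustrative, and the SDs are the rounded values from the summary table, so the result is approximate.

```r
# Summary statistics as reported in the group_by() output (rounded).
sd_a <- 2.81; n_a <- 50
sd_b <- 3.59; n_b <- 50

# F ratio with the larger variance in the numerator.
f_ratio <- max(sd_a, sd_b)^2 / min(sd_a, sd_b)^2
df_num <- n_b - 1  # group B has the larger SD here
df_den <- n_a - 1

# Two-sided p-value for H0: the population variances are equal.
p_value <- min(1, 2 * pf(f_ratio, df_num, df_den, lower.tail = FALSE))
print(c(F = f_ratio, p = p_value))
```

With the raw data, `var.test(HeadacheDays ~ Medication, data = A6R1)` runs the same F test directly. If equal variances look doubtful, `t.test()` with its default `var.equal = FALSE` gives the Welch version of the test.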

# Install the plotting packages only if they are missing
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
if (!requireNamespace("ggpubr", quietly = TRUE)) install.packages("ggpubr")
library(ggplot2)
library(ggpubr)

ggboxplot(A6R1, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

In the boxplots for both Medication A and Medication B, a few points fall beyond the whiskers, indicating possible outliers. These points lie close to the whiskers, however, and none is far from the main cluster of values. Because the outliers are few and not extreme, they are not considered severe, and the independent samples t-test remains appropriate for comparing the two medication groups.
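Because `add = "jitter"` draws every observation as a point, dots on the plot are not in themselves outlier flags, so a numeric check can help. The sketch below applies the usual 1.5 × IQR boxplot rule with base R's `boxplot.stats()`. The helper `flag_outliers` and the `example_days` vector are illustrative assumptions, not the study data.

```r
# Flag values lying more than coef * IQR beyond the hinges (the usual boxplot rule).
flag_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  boxplot.stats(x, coef = coef)$out
}

# Hypothetical stand-in values. To check the real data, replace with
# A6R1$HeadacheDays[A6R1$Medication == "A"] (or "B").
example_days <- c(5, 6, 7, 8, 8, 9, 9, 10, 11, 18)

# Mild outliers under the default 1.5 * IQR rule.
mild <- flag_outliers(example_days)
print(mild)

# A stricter 3 * IQR threshold separates extreme outliers from mild ones.
extreme <- flag_outliers(example_days, coef = 3)
print(extreme)
```

Values flagged at 1.5 × IQR but not at 3 × IQR correspond to the "not severe" outliers described above.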

INDEPENDENT T-TEST

t.test(HeadacheDays ~ Medication, data = A6R1, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6
# Install effectsize only if it is missing
if (!requireNamespace("effectsize", quietly = TRUE)) install.packages("effectsize")
library(effectsize)

cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = A6R1, pooled_sd = TRUE)
print(cohens_d_result)
## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.

The estimated effect size was Cohen’s d = -1.40, 95% CI [-1.83, -0.96], which is a large effect by conventional benchmarks (|d| ≥ 0.8): the group means differ by about 1.4 pooled standard deviations. The sign is negative because the difference is computed as Group A minus Group B, so Group B had the higher mean number of headache days (12.6 versus 8.1).
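As a cross-check, the t statistic, confidence interval, and Cohen's d can be recomputed by hand from the reported group means, SDs, and sample sizes. This is a minimal sketch with illustrative variable names. It uses the rounded summary values, so small differences from the `t.test()` and `cohens_d()` output are expected.

```r
# Summary statistics as reported above (SDs rounded to two decimals).
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Pooled standard deviation and standard error of the mean difference.
df <- n_a + n_b - 2
pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / df)
se_diff <- pooled_sd * sqrt(1 / n_a + 1 / n_b)
mean_diff <- mean_a - mean_b

# Two-sided pooled-variance t-test, 95% confidence interval, and Cohen's d.
t_stat <- mean_diff / se_diff
p_value <- 2 * pt(-abs(t_stat), df)
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * se_diff
cohens_d <- mean_diff / pooled_sd

print(round(c(t = t_stat, d = cohens_d, ci_low = ci[1], ci_high = ci[2]), 2))
```

The recomputed values should agree with the reported t, confidence interval, and d to within rounding error, which confirms that the effect size is about 1.4 pooled standard deviations.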

INDEPENDENT T-TEST

An independent samples t-test compared the number of headache days for participants taking Medication A (n = 50) with those taking Medication B (n = 50). Participants taking Medication A reported fewer headache days (M = 8.10, SD = 2.81) than those taking Medication B (M = 12.60, SD = 3.59). The difference between groups was statistically significant, t(98) = −6.99, p < .001, 95% CI [−5.78, −3.22], so the null hypothesis of no difference is rejected. The effect size was large, Cohen’s d = −1.40, indicating a substantial difference in headache days between the two medication groups.