Scenario 1: Medication A vs Medication B

A medical research team created a new medication to reduce headaches (Medication A). They want to determine whether Medication A is more effective at reducing headaches than the current medication on the market (Medication B). Participants were randomly assigned to take either Medication A or Medication B. Data were collected for 30 days through an app, with participants reporting each day whether or not they had a headache. Was there a difference in the number of headache days between the groups?

H0: There is no difference in the number of headache days between Medication A and Medication B.

H1: There is a difference in the number of headache days between Medication A and Medication B.

Descriptive Statistics and Normality Test

#install.packages("readxl") 
#install.packages("dplyr")

library(readxl)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Importing Excel file (using the name of the uploaded file)
A6R1 <- read_excel("C:/Users/akshi/OneDrive/Desktop/A6R1.xlsx")

# DESCRIPTIVE STATISTICS
A6R1 %>%
  group_by(Medication) %>% # IV: medication group
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE), # DV: number of headache days
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50
# QUESTION
# What are the null and alternate hypotheses for YOUR research scenario?
# H0: There is no difference in the number of headache days between Medication A and Medication B.
# H1: There is a difference in the number of headache days between Medication A and Medication B.

# HISTOGRAM FOR MEDICATION A
hist(A6R1$HeadacheDays[A6R1$Medication == "A"], 
     main = "Histogram of Headache Days (Medication A)",
     xlab = "Number of Headache Days",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

# HISTOGRAM FOR MEDICATION B
hist(A6R1$HeadacheDays[A6R1$Medication == "B"], 
     main = "Histogram of Headache Days (Medication B)",
     xlab = "Number of Headache Days",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

# QUESTIONS (visual check of the histograms; the Shapiro-Wilk results reported further down are cited as supporting evidence)
# Answer the questions below as comments within the R script:

# Q1) Check the SKEWNESS of the VARIABLE 1 histogram (Medication A).
# ANSWER: The histogram looks approximately symmetrical. This agrees with the Shapiro-Wilk test, which did not reject normality (p = 0.4913).

# Q2) Check the KURTOSIS of the VARIABLE 1 histogram (Medication A).
# ANSWER: The histogram has a proper bell curve (mesokurtic). This agrees with the Shapiro-Wilk test, which did not reject normality (p = 0.4913).

# Q3) Check the SKEWNESS of the VARIABLE 2 histogram (Medication B).
# ANSWER: The histogram looks approximately symmetrical. This agrees with the Shapiro-Wilk test, which did not reject normality (p = 0.8741).

# Q4) Check the KURTOSIS of the VARIABLE 2 histogram (Medication B).
# ANSWER: The histogram has a proper bell curve (mesokurtic). This agrees with the Shapiro-Wilk test, which did not reject normality (p = 0.8741).
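Skewness and kurtosis can also be checked numerically rather than only by eye. The sketch below is a minimal illustration, assuming the simple moment-based definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3). The helper names `sample_skewness` and `sample_excess_kurtosis` and the vector `example_days` are hypothetical and are not part of the assignment script or its data.

```r
# Hypothetical helpers (not from the original script): moment-based
# skewness and excess kurtosis, using divide-by-n central moments.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Illustrative vector only; NOT the A6R1 data.
# It is symmetric around its mean of 8, so its skewness is 0.
example_days <- c(4, 6, 7, 8, 8, 9, 10, 12)
sample_skewness(example_days)
sample_excess_kurtosis(example_days)
```

With the real data, the same helpers could be called on `A6R1$HeadacheDays[A6R1$Medication == "A"]` and on the Medication B subset. Values near 0 for both statistics would be consistent with a roughly symmetrical, mesokurtic histogram.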

# SHAPIRO-WILK TEST for Medication A
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "A"]
## W = 0.97852, p-value = 0.4913
# SHAPIRO-WILK TEST for Medication B
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "B"]
## W = 0.98758, p-value = 0.8741
# QUESTION
# Was the data normally distributed for Variable 1?
# Yes. For Medication A, the Shapiro-Wilk test was not significant (W = 0.97852, p = 0.4913 > 0.05), so normality is not rejected.
# Was the data normally distributed for Variable 2?
# Yes. For Medication B, the Shapiro-Wilk test was not significant (W = 0.98758, p = 0.8741 > 0.05), so normality is not rejected.

# INSTALL and LOAD REQUIRED PACKAGES
#install.packages("ggplot2")
#install.packages("ggpubr")

library(ggplot2)
library(ggpubr)

# CREATE THE BOXPLOT
ggboxplot(A6R1, x = "Medication", y = "HeadacheDays", # IV and DV placeholders
          color = "Medication",
          palette = "jco",
          add = "jitter")

# QUESTION
# Answer the questions below as a comment within the R script. Answer the questions for EACH boxplot:
# Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot (check if there are any dots past the lines on the boxes) or are they very far away?
# There are only a few dots outside the whiskers (two or fewer), and they are close to the whiskers, so we proceed with the Independent t-test.
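The dots outside a boxplot are the points that fall more than 1.5 × IQR beyond the hinges. As a rough sketch, base R's `boxplot.stats()` can list them for each group, which gives a count to go with the visual check. The helper name `outliers_by_group` and the small `example` data frame are hypothetical stand-ins, not the A6R1 data.

```r
# Hypothetical helper: return the values flagged as boxplot outliers
# (more than 1.5 * IQR beyond the hinges) for each level of a grouping column.
outliers_by_group <- function(values, groups) {
  lapply(split(values, groups), function(v) boxplot.stats(v)$out)
}

# Illustrative data only; NOT the A6R1 data.
example <- data.frame(
  Medication   = rep(c("A", "B"), each = 6),
  HeadacheDays = c(7, 8, 8, 9, 9, 20, 11, 12, 12, 13, 13, 14)
)

# Named list with one vector of flagged values per medication group
flagged <- outliers_by_group(example$HeadacheDays, example$Medication)
flagged
```

Applied to the real data, the call would be `outliers_by_group(A6R1$HeadacheDays, A6R1$Medication)`. How far the flagged values sit from the rest of their group is still a judgment call.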

INDEPENDENT T-TEST

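# var.equal = TRUE runs the pooled-variance (Student's) t-test, which assumes equal group variances
t.test(HeadacheDays ~ Medication, data = A6R1, var.equal = TRUE)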
## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6
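`var.equal = TRUE` assumes the two groups share a common variance. One way this assumption might be examined is an F test of the variance ratio (`var.test()`) alongside Welch's t-test (`var.equal = FALSE`), which does not require equal variances. The sketch below runs on simulated data whose means and SDs merely mimic the descriptive statistics above. The object `sim` and the seed are assumptions, and its output is not the study's result.

```r
set.seed(1)  # arbitrary seed so the simulated example is reproducible

# Simulated stand-in for A6R1 (NOT the real data).
sim <- data.frame(
  Medication   = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rnorm(50, mean = 8.1, sd = 2.81),
                   rnorm(50, mean = 12.6, sd = 3.59))
)

# F test comparing the two group variances
variance_check <- var.test(HeadacheDays ~ Medication, data = sim)

# Welch's t-test: does not assume equal variances
welch <- t.test(HeadacheDays ~ Medication, data = sim, var.equal = FALSE)

variance_check$p.value
welch
```

With equal group sizes, the Student and Welch t statistics coincide and only the degrees of freedom differ, so the two approaches usually lead to the same conclusion.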
# EFFECT SIZE (COHEN'S D) - Only run if p < .05
#install.packages("effectsize")
library(effectsize) 
cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = A6R1, pooled_sd = TRUE)
print(cohens_d_result)
## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.
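As a consistency check, Cohen's d can be recomputed by hand from the printed summary statistics. The sketch below assumes the standard pooled-SD formula and uses the rounded values shown in the output above, so small rounding differences from the `effectsize` result are expected. All variable names are illustrative.

```r
# Values copied from the output above (rounded as printed).
n_a <- 50
n_b <- 50
mean_a <- 8.1
sd_a <- 2.81
mean_b <- 12.6
sd_b <- 3.59
t_reported <- -6.9862

# Pooled standard deviation across the two groups
pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))

# Cohen's d from the mean difference and the pooled SD
d_from_means <- (mean_a - mean_b) / pooled_sd

# Cohen's d recovered from the reported t statistic
d_from_t <- t_reported * sqrt(1 / n_a + 1 / n_b)

round(c(pooled_sd = pooled_sd, d_from_means = d_from_means, d_from_t = d_from_t), 2)
```

Both routes round to d = -1.40, matching the `cohens_d()` output. The negative sign only reflects that group A is subtracted first (A minus B).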
# QUESTIONS
# Answer the questions below as a comment within the R script:

# Q1) What is the size of the effect?
# The effect size is very large (d = -1.40).

# Q2) Which group had the higher average score?
# Medication B had the higher average score (Mean = 12.6 headache days, compared with 8.1 for Medication A).

REPORT

An independent t-test was conducted to compare the number of headache days between participants who took Medication A (n = 50) and participants who took Medication B (n = 50). Participants who took Medication A reported significantly fewer headache days (M = 8.10, SD = 2.81) than participants who took Medication B (M = 12.60, SD = 3.59), t(98) = -6.99, p < .001, 95% CI [-5.78, -3.22]. The effect size was very large (d = -1.40). Overall, Medication A was significantly more effective, resulting in a much lower average number of headache days.