Scenario 1: Medication A vs Medication B
A medical research team created a new medication to reduce headaches (Medication A). They want to determine whether Medication A is more effective at reducing headaches than the current medication on the market (Medication B). Participants were randomly assigned to take either Medication A or Medication B. Data were collected for 30 days through an app, with participants reporting each day whether or not they had a headache. Was there a difference in the number of headache days between the groups?
H0: There is no difference in the number of headache days between Medication A and Medication B.
H1: There is a difference in the number of headache days between Medication A and Medication B.
#install.packages("readxl")
#install.packages("dplyr")
library(readxl)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Importing Excel file (using the name of the uploaded file)
A6R1 <- read_excel("C:/Users/akshi/OneDrive/Desktop/A6R1.xlsx")
# DESCRIPTIVE STATISTICS
A6R1 %>%
  group_by(Medication) %>%                          # IV: medication group (A or B)
  summarise(
    Mean   = mean(HeadacheDays, na.rm = TRUE),      # DV: headache days over 30 days
    Median = median(HeadacheDays, na.rm = TRUE),
    SD     = sd(HeadacheDays, na.rm = TRUE),
    N      = n()                                    # group size (counts rows, including any NA)
  )
## # A tibble: 2 × 5
## Medication Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 A 8.1 8 2.81 50
## 2 B 12.6 12.5 3.59 50
# QUESTION
# What are the null and alternate hypotheses for YOUR research scenario?
# H0: There is no difference in the number of headache days between Medication A and Medication B.
# H1: There is a difference in the number of headache days between Medication A and Medication B.
# HISTOGRAM FOR MEDICATION A
hist(A6R1$HeadacheDays[A6R1$Medication == "A"],
     main = "Histogram of Medication A Headache Days",
     xlab = "Number of Headache Days",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)
# HISTOGRAM FOR MEDICATION B
hist(A6R1$HeadacheDays[A6R1$Medication == "B"],
     main = "Histogram of Medication B Headache Days",
     xlab = "Number of Headache Days",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)
# QUESTIONS (Based on Shapiro-Wilk results indicating Normality)
# Answer the questions below as comments within the R script:
# Q1) Check the SKEWNESS of the VARIABLE 1 histogram (Medication A).
# ANSWER: The histogram is approximately symmetrical. This is consistent with the Shapiro-Wilk test, which did not reject normality (p = 0.4913).
# Q2) Check the KURTOSIS of the VARIABLE 1 histogram (Medication A).
# ANSWER: The histogram has a proper bell curve (mesokurtic). This is consistent with the Shapiro-Wilk test, which did not reject normality (p = 0.4913).
# Q3) Check the SKEWNESS of the VARIABLE 2 histogram (Medication B).
# ANSWER: The histogram is approximately symmetrical. This is consistent with the Shapiro-Wilk test, which did not reject normality (p = 0.8741).
# Q4) Check the KURTOSIS of the VARIABLE 2 histogram (Medication B).
# ANSWER: The histogram has a proper bell curve (mesokurtic). This is consistent with the Shapiro-Wilk test, which did not reject normality (p = 0.8741).
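The skewness and kurtosis answers above are judged by eye from the histograms. As a rough numeric cross-check, the moment-based statistics can be computed in base R. This is a minimal sketch: the helper names `sample_skewness` and `excess_kurtosis` and the `toy` vector are hypothetical and are not part of the assignment or the A6R1 data.

```r
# Moment-based sample skewness and excess kurtosis (base R only).
# Helper names are hypothetical, introduced for illustration.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m  <- mean(x)
  m2 <- mean((x - m)^2)  # second central moment
  m3 <- mean((x - m)^3)  # third central moment
  m3 / m2^1.5            # 0 for a perfectly symmetric sample
}

excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m  <- mean(x)
  m2 <- mean((x - m)^2)
  m4 <- mean((x - m)^4)  # fourth central moment
  m4 / m2^2 - 3          # 0 for a normal distribution (mesokurtic)
}

# Hypothetical symmetric toy data, NOT the A6R1 values.
toy <- c(4, 6, 7, 8, 8, 8, 9, 10, 12)
sample_skewness(toy)   # 0, because the values are symmetric around 8
excess_kurtosis(toy)
```

With the real data, the same helpers could be applied to `A6R1$HeadacheDays[A6R1$Medication == "A"]` and to the Medication B subset. Values near 0 for both statistics would agree with the visual reading of the histograms.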
# SHAPIRO-WILK TEST for Medication A
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "A"])
##
## Shapiro-Wilk normality test
##
## data: A6R1$HeadacheDays[A6R1$Medication == "A"]
## W = 0.97852, p-value = 0.4913
# SHAPIRO-WILK TEST for Medication B
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "B"])
##
## Shapiro-Wilk normality test
##
## data: A6R1$HeadacheDays[A6R1$Medication == "B"]
## W = 0.98758, p-value = 0.8741
# QUESTION
# Was the data normally distributed for Variable 1?
# Yes. The Shapiro-Wilk test did not reject normality for Medication A (W = 0.97852, p = 0.4913, p > 0.05).
# Was the data normally distributed for Variable 2?
# Yes. The Shapiro-Wilk test did not reject normality for Medication B (W = 0.98758, p = 0.8741, p > 0.05).
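The two Shapiro-Wilk calls can also be run in one pass per group, which keeps the decision rule (p > .05 means normality is not rejected) in a single place. The sketch below assumes a data frame with the same column names as A6R1. The `demo` data frame is simulated stand-in data, so its p-values will not match the ones reported above.

```r
# Hypothetical stand-in data with the same column names as A6R1;
# the values are simulated, NOT the study data.
set.seed(1)
demo <- data.frame(
  Medication   = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rnorm(50, mean = 8, sd = 3), rnorm(50, mean = 12, sd = 3))
)

# Run shapiro.test() once per group and keep only the p-values.
p_values <- tapply(demo$HeadacheDays, demo$Medication,
                   function(x) shapiro.test(x)$p.value)

# TRUE means normality was not rejected at alpha = .05.
normal_ok <- p_values > 0.05
print(p_values)
print(normal_ok)
```

Replacing `demo` with `A6R1` should reproduce the p-values of the two separate calls, since each group is passed to `shapiro.test()` unchanged.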
# INSTALL and LOAD REQUIRED PACKAGES
#install.packages("ggplot2")
#install.packages("ggpubr")
library(ggplot2)
library(ggpubr)
# CREATE THE BOXPLOT
ggboxplot(A6R1, x = "Medication", y = "HeadacheDays",  # IV on the x-axis, DV on the y-axis
          color = "Medication",
          palette = "jco",
          add = "jitter")
# QUESTION
# Answer the questions below as a comment within the R script. Answer the questions for EACH boxplot:
# Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot (check if there are any dots past the lines on the boxes) or are they very far away?
# There are a few dots outside the boxplot (two or fewer per group), and they are close to the whiskers, so we are continuing with the Independent t-test.
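The "dots past the whiskers" check can also be done numerically. Base R's `boxplot.stats()` flags points that lie more than 1.5 times the interquartile range beyond the hinges, which is the usual boxplot outlier rule. The sketch below uses a hypothetical toy vector rather than the A6R1 data, and exact quartile definitions can differ slightly between plotting functions.

```r
# Hypothetical toy values, NOT the A6R1 data; 25 is planted as an outlier.
toy_days <- c(5, 6, 7, 8, 8, 9, 10, 11, 25)

# boxplot.stats() returns the whisker ends, hinges and median in $stats
# and any points beyond 1.5 * IQR from the hinges in $out.
bs <- boxplot.stats(toy_days, coef = 1.5)
bs$out   # 25

# How far past the upper whisker is each flagged point, in IQR units?
iqr <- bs$stats[4] - bs$stats[2]
(bs$out - bs$stats[5]) / iqr
```

For the real data, the same call could be applied to each group's `HeadacheDays` subset. A flagged point that sits only a fraction of an IQR past the whisker corresponds to a dot that is "close to the whiskers" in the plot.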
t.test(HeadacheDays ~ Medication, data = A6R1, var.equal = TRUE)
##
## Two Sample t-test
##
## data: HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B
## 8.1 12.6
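As a sanity check, the t statistic and confidence interval can be approximately reproduced from the summary statistics in the descriptive table. This sketch uses the rounded means and standard deviations printed there, so small differences from the `t.test()` output are expected. The variable names are introduced here for illustration.

```r
# Summary statistics as printed in the descriptive table (rounded).
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Pooled variance (equal-variance assumption, matching var.equal = TRUE).
df <- n_a + n_b - 2
pooled_var <- ((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / df

# Standard error of the difference in means, then the t statistic.
se_diff <- sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat  <- (mean_a - mean_b) / se_diff

# Two-sided p-value and 95% confidence interval for the mean difference.
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (mean_a - mean_b) + c(-1, 1) * qt(0.975, df) * se_diff

round(t_stat, 2)  # about -6.98; t.test() on the raw data gave -6.9862
round(ci, 2)
```

The sign is negative because group A is listed first and has the lower mean, so the difference is A minus B.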
# EFFECT SIZE (COHEN'S D) - Only run if p < .05
#install.packages("effectsize")
library(effectsize)
cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = A6R1, pooled_sd = TRUE)
print(cohens_d_result)
## Cohen's d | 95% CI
## --------------------------
## -1.40 | [-1.83, -0.96]
##
## - Estimated using pooled SD.
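Cohen's d with a pooled SD is the mean difference divided by the pooled standard deviation. For an equal-variance t-test it can also be recovered from the t statistic as d = t * sqrt(1/n1 + 1/n2). The sketch below checks both routes against the reported value, using the rounded summary statistics and the t value from the output above. The variable names are illustrative.

```r
# Rounded summary statistics from the descriptive table.
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Route 1: mean difference over the pooled standard deviation.
pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))
d_direct  <- (mean_a - mean_b) / pooled_sd

# Route 2: convert the reported t statistic (t = -6.9862, equal variances).
t_reported <- -6.9862
d_from_t   <- t_reported * sqrt(1 / n_a + 1 / n_b)

round(c(direct = d_direct, from_t = d_from_t), 2)  # both round to -1.4
```

Both routes agree with the `cohens_d()` output of -1.40 to two decimal places. The tiny gap between them comes from rounding in the table.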
# QUESTIONS
# Answer the questions below as a comment within the R script:
# Q1) What is the size of the effect?
# The effect size is very large (d = -1.40).
# Q2) Which group had the higher average score?
# Medication B had the higher average score (Mean = 12.6).
An Independent t-test was conducted to compare the number of headache days between participants who took Medication A (n = 50) and participants who took Medication B (n = 50). Participants who took Medication A reported significantly fewer headache days (M = 8.10, SD = 2.81) than participants who took Medication B (M = 12.60, SD = 3.59), t(98) = -6.99, p < .001. The effect size was very large (d = -1.40, 95% CI [-1.83, -0.96]). Overall, Medication A was more effective, with about 4.5 fewer headache days on average over the 30-day period (95% CI for the mean difference [-5.78, -3.22]).