A medical research team created a new medication to reduce headaches (Medication A). They want to determine if Medication A is more effective at reducing headaches than the current medication on the market (Medication B). A group of participants were randomly assigned to either take Medication A or Medication B. Data was collected for 30 days through an app and participants reported each day if they did or did not have a headache. Was there a difference in the number of headaches between the groups?
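The independent t-test is used to test whether there is a difference between the means of two independent groups.
Null hypothesis: There is no difference in mean headache days between Group A (Medication A) and Group B (Medication B).
Alternative hypothesis: There is a difference in mean headache days between Group A (Medication A) and Group B (Medication B).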
Purpose: Import your Excel dataset into R to conduct analyses.
# install.packages("readxl")
library(readxl)
## Warning: package 'readxl' was built under R version 4.5.2
dataset <- read_excel("C:/Users/Murari_Lakshman/Downloads/A6R1.xlsx")
PURPOSE: Calculate the mean, median, SD, and sample size for each group.
# install.packages("dplyr")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
dataset %>%
group_by(Medication) %>%
summarise(
Mean = mean(HeadacheDays, na.rm = TRUE),
Median = median(HeadacheDays, na.rm = TRUE),
SD = sd(HeadacheDays, na.rm = TRUE),
N = n()
)
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50
Purpose: Visually check the normality of the scores for each group.
hist(dataset$HeadacheDays[dataset$Medication == "A"],
main = "Histogram of Group A Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)
hist(dataset$HeadacheDays[dataset$Medication == "B"],
main = "Histogram of Group B Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)
Q1) Check the SKEWNESS of the Group A histogram. In your opinion,
does the histogram look symmetrical, positively skewed, or negatively
skewed?
A) The histogram for Group A looks symmetrical
Q2) Check the KURTOSIS of the Group A histogram. In your opinion, does
the histogram look too flat, too tall, or does it have a proper bell
curve?
A) The histogram has a proper bell-shaped curve.
Q3) Check the SKEWNESS of the Group B histogram. In your opinion, does
the histogram look symmetrical, positively skewed, or negatively
skewed?
A) The histogram for Group B looks symmetrical.
Q4) Check the KURTOSIS of the Group B histogram. In your opinion,
does the histogram look too flat, too tall, or does it have a proper
bell curve?
A) The histogram has a proper bell-shaped curve.
Purpose: Statistically check the normality of each group’s scores. The Shapiro-Wilk test is sensitive to both skewness and kurtosis. The test asks, “Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternative hypothesis)?” For this test, if p is GREATER than .05 (p > .05), there is no significant departure from normality and the data is treated as NORMAL. If p is LESS than .05 (p < .05), the data is NOT normal.
shapiro.test(dataset$HeadacheDays[dataset$Medication == "A"])
##
## Shapiro-Wilk normality test
##
## data: dataset$HeadacheDays[dataset$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(dataset$HeadacheDays[dataset$Medication == "B"])
##
## Shapiro-Wilk normality test
##
## data: dataset$HeadacheDays[dataset$Medication == "B"]
## W = 0.98758, p-value = 0.8741
Was the data normally distributed for Group A?
Yes, the data is normally distributed for Group A (W = 0.98, p = .491).
Was the data normally distributed for Group B?
Yes, the data is normally distributed for Group B (W = 0.99, p = .874).
If p > 0.05 (P-value is GREATER than .05) this means the data is NORMAL. Continue to the box-plot test below. If p < 0.05 (P-value is LESS than .05) this means the data is NOT normal (switch to Mann-Whitney U).
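The decision rule above can be sketched in code. This is only an illustration under stated assumptions: the `sim` data frame is simulated, hypothetical data standing in for the real dataset, and the group means and SDs used to generate it are arbitrary. `shapiro.test()`, `t.test()`, and `wilcox.test()` (the Mann-Whitney U test) are base R functions from the stats package.

```r
# Hypothetical simulated data standing in for the real dataset
set.seed(123)
sim <- data.frame(
  Medication   = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rnorm(50, mean = 8, sd = 3),
                   rnorm(50, mean = 12, sd = 3.5))
)

# Shapiro-Wilk p-value for each group
p_values <- tapply(sim$HeadacheDays, sim$Medication,
                   function(x) shapiro.test(x)$p.value)

# Use the t-test only if neither group departs significantly from normality;
# otherwise fall back to the Mann-Whitney U test
if (all(p_values > 0.05)) {
  result <- t.test(HeadacheDays ~ Medication, data = sim, var.equal = TRUE)
} else {
  result <- wilcox.test(HeadacheDays ~ Medication, data = sim)
}

print(p_values)
print(result$method)
```

With the real data, `sim` would be replaced by `dataset`; both Shapiro-Wilk p-values reported above are greater than .05, so the t-test branch would be taken.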
Purpose: Check for any outliers impacting the mean for each group’s scores.
# install.packages("ggplot2")
# install.packages("ggpubr")
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.5.2
ggboxplot(dataset, x = "Medication", y = "HeadacheDays",
color = "Medication",
palette = "jco",
add = "jitter")
Q1) Were there any dots outside of the boxplot? Are these dots close
to the whiskers of the boxplot or are they very far away?
[NOTE: If there are no dots, continue with the Independent t-test. If
there are a few dots (two or fewer), and they are close to the whiskers,
continue with the Independent t-test. If there are a few dots (two or
fewer), and they are far away from the whiskers, consider switching to
the Mann-Whitney U test. If there are many dots (more than two) and
they are very far away from the whiskers, you should switch to the
Mann-Whitney U test.]
A) For both boxplots, there are only a few dots and they are
close to the whiskers, so we continue with the Independent t-test.
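The visual check can be backed up with a count of the points that fall beyond the whiskers. The sketch below assumes the usual boxplot convention (points more than 1.5 × IQR outside the quartiles count as outliers). `count_outliers()` is a hypothetical helper written for this illustration, and `example_days` is made-up data, not the study data.

```r
# Count values beyond the boxplot whiskers (1.5 * IQR rule)
count_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}

# Hypothetical example values, not the study data
example_days <- c(5, 6, 7, 7, 8, 8, 8, 9, 9, 10, 11, 25)
count_outliers(example_days)  # only the value 25 lies beyond the whiskers
```

For the real data, the same helper could be applied per group with `tapply(dataset$HeadacheDays, dataset$Medication, count_outliers)`.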
PURPOSE: Test if there was a difference between the means of the two groups.
t.test(HeadacheDays ~ Medication, data = dataset, var.equal = TRUE)
##
## Two Sample t-test
##
## data: HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B
## 8.1 12.6
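As a cross-check, the pooled-variance t statistic can be recomputed from the summary statistics in the descriptive table. This is a sketch, not part of the original analysis. Because the SDs in that table are rounded to two decimals, the result (roughly -6.98) differs slightly from the -6.9862 that `t.test()` computes from the raw data.

```r
# Summary statistics as reported in the descriptive table (SDs are rounded)
m_a <- 8.1;  sd_a <- 2.81; n_a <- 50
m_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Degrees of freedom for the pooled-variance (Student's) t-test
deg_free <- n_a + n_b - 2

# Pooled standard deviation and standard error of the mean difference
sp <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / deg_free)
se <- sp * sqrt(1 / n_a + 1 / n_b)

# t statistic, two-sided p-value, and 95% confidence interval
t_stat  <- (m_a - m_b) / se
p_value <- 2 * pt(-abs(t_stat), deg_free)
ci      <- (m_a - m_b) + c(-1, 1) * qt(0.975, deg_free) * se

round(c(t = t_stat, df = deg_free), 2)
round(ci, 2)
```

The degrees of freedom come out as n_a + n_b - 2 = 98, which is the value to report alongside the t statistic.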
If results were statistically significant (p < .05), continue to effect size section below. If results were NOT statistically significant (p > .05), skip to reporting section below.
NOTE: Getting results that are not statistically significant does NOT mean you switch to Mann-Whitney U. The Mann-Whitney U test is only for data that is not normally distributed; the choice of test does not depend on whether the outcome is significant.
PURPOSE: Determine how big of a difference there was between the group means.
# install.packages("effectsize")
library(effectsize)
## Warning: package 'effectsize' was built under R version 4.5.2
cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = dataset, pooled_sd = TRUE)
print(cohens_d_result)
## Cohen's d | 95% CI
## --------------------------
## -1.40 | [-1.83, -0.96]
##
## - Estimated using pooled SD.
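Cohen's d with a pooled SD is the mean difference divided by the pooled standard deviation. The sketch below recomputes it from the rounded summary statistics in the descriptive table, as a sanity check on the `cohens_d()` output rather than a replacement for it.

```r
# Summary statistics as reported in the descriptive table (SDs are rounded)
m_a <- 8.1;  sd_a <- 2.81; n_a <- 50
m_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Pooled standard deviation across the two groups
sp <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))

# Cohen's d: standardized mean difference (Group A minus Group B)
d <- (m_a - m_b) / sp
round(d, 2)
```

The negative sign only reflects the order of subtraction (A minus B); the magnitude, about 1.40, is what describes the size of the effect.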
Q1) What is the size of the effect?
A) A Cohen’s d of -1.40 indicates the difference between the
group averages was very large.
Q2) Which group had the higher average score?
A) Group B (Medication B) had the higher average score (M = 12.6 vs. M = 8.1 for Group A).
An Independent t-test was conducted to compare the number of headache days between participants taking Medication A (n = 50) and Medication B (n = 50). Participants taking Medication B had significantly more headache days (M = 12.6, SD = 3.59) than participants taking Medication A (M = 8.1, SD = 2.81), t(98) = -6.99, p < .001. The effect size was very large (d = -1.40), indicating a very large difference in headache days between the two medications. Overall, Medication A was associated with significantly fewer headache days than Medication B.
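Building the statistics string from the output values, instead of typing it by hand, helps avoid transcription slips in the degrees of freedom or rounding. The following is a minimal sketch; the variable names are hypothetical and the values are copied from the t-test and effect size output above.

```r
# Values copied from the t-test and Cohen's d output
t_stat   <- -6.9862
deg_free <- 98
p_value  <- 3.431e-10
d        <- -1.40

# Report very small p-values as "p < .001" instead of the exact value
p_text <- if (p_value < .001) "p < .001" else sprintf("p = %.3f", p_value)

# Assemble the statistics portion of the write-up
report <- sprintf("t(%d) = %.2f, %s, d = %.2f", deg_free, t_stat, p_text, d)
print(report)
```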