A medical research team created a new medication to reduce headaches (Medication A). They want to determine if Medication A is more effective at reducing headaches than the current medication on the market (Medication B). A group of participants were randomly assigned to either take Medication A or Medication B. Data was collected for 30 days through an app and participants reported each day if they did or did not have a headache. Was there a difference in the number of headaches between the groups?
The independent t-test is used to test whether there is a statistically significant difference between the means of two independent groups.
Null hypothesis (H0): There is no difference in the number of headaches between participants taking Medication A and those taking Medication B.
Alternative hypothesis (H1): There is a difference in the number of headaches between participants taking Medication A and those taking Medication B.
## install.packages("readxl")
chooseCRANmirror(graphics = FALSE, ind = 1)
install.packages("readxl")
## Installing package into 'C:/Users/manit/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'readxl' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\manit\AppData\Local\Temp\RtmpIFFKei\downloaded_packages
library(readxl)
A6R1 <- read_excel("C:\\Users\\manit\\OneDrive\\Desktop\\A6R1.xlsx")
head(A6R1)
## # A tibble: 6 × 3
## ParticipantID Medication HeadacheDays
## <dbl> <chr> <dbl>
## 1 1 A 6
## 2 2 A 7
## 3 3 A 13
## 4 4 A 8
## 5 5 A 8
## 6 6 A 13
Calculate the mean, median, SD, and sample size for each group.
## INSTALL THE REQUIRED PACKAGE
#install.packages("dplyr")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
A6R1 %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
## Medication Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 A 8.1 8 2.81 50
## 2 B 12.6 12.5 3.59 50
hist(A6R1$HeadacheDays[A6R1$Medication == "A"],
main = "Histogram of Medication A Headache Days",
xlab = "Headache Days",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)
hist(A6R1$HeadacheDays[A6R1$Medication == "B"],
main = "Histogram of Medication B Headache Days",
xlab = "Headache Days",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)
Answer the questions below as comments within the R script:
Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
Ans) The histogram is slightly positively skewed, indicating a longer tail on the right.
Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
Ans) The histogram is moderately flat, showing no extreme peaks or outliers.
Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
Ans) The histogram is slightly positively skewed, with a tail extending to the right.
Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
Ans) The histogram has a moderately flat distribution, with no sharp peaks.
Check the normality of each group's scores statistically. The Shapiro-Wilk test checks skewness and kurtosis at the same time. It asks: "Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternative hypothesis)?" For this test, if p is GREATER than .05 (p > .05), the data are treated as NORMAL. If p is LESS than .05 (p < .05), the data are NOT normal.
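The decision rule described above can be sketched as a small helper. This is only an illustration: the function name `is_normal` and the simulated vectors are assumptions made for the example and are not part of the A6R1 data or the original script.

```r
# Hypothetical helper: apply the p > .05 rule from the text to a Shapiro-Wilk p-value
is_normal <- function(p, alpha = 0.05) {
  p > alpha
}

# Simulated data for illustration only (NOT the A6R1 dataset)
set.seed(123)
bell_shaped  <- rnorm(50, mean = 10, sd = 3)  # drawn from a normal distribution
right_skewed <- rexp(50, rate = 0.2)          # drawn from a strongly skewed distribution

# shapiro.test() returns a list; the p-value is stored in $p.value
p_bell   <- shapiro.test(bell_shaped)$p.value
p_skewed <- shapiro.test(right_skewed)$p.value

# Apply the rule to the simulated samples
is_normal(p_bell)
is_normal(p_skewed)

# Apply the same rule to the two p-values reported in this document
is_normal(0.4913)  # Medication A
is_normal(0.8741)  # Medication B
```

Keeping the rule in one function means the same threshold is applied to every group, which avoids reading p-values off the console by eye.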
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "A"])
##
## Shapiro-Wilk normality test
##
## data: A6R1$HeadacheDays[A6R1$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "B"])
##
## Shapiro-Wilk normality test
##
## data: A6R1$HeadacheDays[A6R1$Medication == "B"]
## W = 0.98758, p-value = 0.8741
Answer the questions below as a comment within the R script:
Was the data normally distributed for Variable 1?
Ans) Yes, the data for Medication A is normally distributed because the p-value (0.4913) is greater than 0.05.
Was the data normally distributed for Variable 2?
Ans) Yes, the data for Medication B is normally distributed because the p-value (0.8741) is greater than 0.05.
Check for any outliers impacting the mean for each group’s scores.
install.packages("ggplot2")
## Installing package into 'C:/Users/manit/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'ggplot2' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\manit\AppData\Local\Temp\RtmpIFFKei\downloaded_packages
install.packages("ggpubr")
## Installing package into 'C:/Users/manit/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'ggpubr' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\manit\AppData\Local\Temp\RtmpIFFKei\downloaded_packages
library(ggplot2)
library(ggpubr)
ggboxplot(A6R1, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")
Answer the questions below as a comment within the R script. Answer the questions for EACH boxplot:
Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away?
Ans) Yes, there are a few dots outside the boxplots for both Medication A and Medication B, but they are not very far from the whiskers. Since the outliers are few and close to the whiskers, it is reasonable to proceed with the independent t-test.
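The visual check can be backed up numerically with the 1.5 × IQR rule that boxplots use to flag points. The sketch below uses base R's `boxplot.stats()` on a small made-up vector; the values are hypothetical, and the quartile definition in `boxplot.stats()` may differ slightly from the one ggplot2 uses when drawing the plot.

```r
# Made-up headache-day counts for illustration only (NOT taken from A6R1)
example_days <- c(6, 7, 13, 8, 8, 13, 30)

# boxplot.stats() returns the five-number summary in $stats
# and any points beyond 1.5 * IQR from the hinges in $out
stats <- boxplot.stats(example_days)

flagged <- stats$out  # values a boxplot would draw as separate dots
flagged

# How far is each flagged value from the nearest whisker end?
whisker_low  <- stats$stats[1]
whisker_high <- stats$stats[5]
distance <- pmin(abs(flagged - whisker_low), abs(flagged - whisker_high))
distance
```

With the real data, the same call could be run once per group, for example on `A6R1$HeadacheDays[A6R1$Medication == "A"]`.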
Test if there was a difference between the means of the two groups.
t.test(HeadacheDays ~ Medication, data = A6R1, var.equal = TRUE)
##
## Two Sample t-test
##
## data: HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B
## 8.1 12.6
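To see where the t value and the degrees of freedom in this output come from, the pooled-variance calculation can be sketched by hand. The inputs below are the rounded group means, SDs, and sample sizes from the summary table earlier in this document, so the result is only approximately equal to the value `t.test()` reports from the raw data.

```r
# Rounded summary statistics copied from the grouped summary table
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Degrees of freedom for the equal-variance (Student) t-test
df <- n_a + n_b - 2

# Pooled standard deviation: weighted average of the two group variances
pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / df)

# Standard error of the difference between the two means
se_diff <- pooled_sd * sqrt(1 / n_a + 1 / n_b)

# t statistic and two-sided p-value
t_stat  <- (mean_a - mean_b) / se_diff
p_value <- 2 * pt(-abs(t_stat), df)

df       # 98
t_stat   # approximately -6.98 (the raw-data value is -6.9862)
p_value  # far below .001
```

The small gap between the hand calculation and the reported t comes from using SDs rounded to two decimals.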
If results were statistically significant (p < .05), continue to effect size section below. If results were NOT statistically significant (p > .05), skip to reporting section below.
NOTE: Getting results that are not statistically significant does NOT mean you switch to Mann-Whitney U. The Mann-Whitney U test is only for data that are not normally distributed; the choice is not based on outcome significance.
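The point of the note is that the choice of test is made from the normality check, before looking at the outcome. A minimal sketch of that ordering, using simulated data (the `demo` data frame is hypothetical and only mimics the column names of A6R1):

```r
# Simulated data for illustration only (NOT the A6R1 dataset)
set.seed(42)
demo <- data.frame(
  Medication   = rep(c("A", "B"), each = 30),
  HeadacheDays = c(rpois(30, lambda = 8), rpois(30, lambda = 12))
)

# Step 1: Shapiro-Wilk p-value for each group
p_by_group <- tapply(
  demo$HeadacheDays, demo$Medication,
  function(x) shapiro.test(x)$p.value
)

# Step 2: pick the test from the normality result only
if (all(p_by_group > 0.05)) {
  # Both groups look normal: independent t-test
  result <- t.test(HeadacheDays ~ Medication, data = demo, var.equal = TRUE)
} else {
  # At least one group is not normal: Mann-Whitney U (wilcox.test in R)
  # exact = FALSE uses the normal approximation, which tolerates tied values
  result <- wilcox.test(HeadacheDays ~ Medication, data = demo, exact = FALSE)
}

result$method   # which test was run
result$p.value  # interpreted only after the test has been chosen
```

The significance of `result` never feeds back into the choice of test.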
Determine how big of a difference there was between the group means.
install.packages("effectsize")
## Installing package into 'C:/Users/manit/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'effectsize' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\manit\AppData\Local\Temp\RtmpIFFKei\downloaded_packages
library(effectsize)
cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = A6R1, pooled_sd = TRUE)
print(cohens_d_result)
## Cohen's d | 95% CI
## --------------------------
## -1.40 | [-1.83, -0.96]
##
## - Estimated using pooled SD.
Q1) What is the size of the effect?
Ans) A Cohen's d of -1.40 indicates the difference between the group averages was very large.
Q2) Which group had the higher average score?
Ans) Group B had the higher average score.
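As a cross-check on the effect size, Cohen's d with a pooled SD can be sketched by hand from the rounded summary statistics reported earlier. The `label_effect` helper and its cut-offs (0.2, 0.5, 0.8, Cohen's conventional benchmarks) are assumptions added for illustration; course materials may use finer labels.

```r
# Rounded summary statistics copied from the grouped summary table
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Pooled standard deviation across the two groups
pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))

# Cohen's d: mean difference expressed in pooled-SD units
d <- (mean_a - mean_b) / pooled_sd
d  # approximately -1.40

# Hypothetical helper: label |d| using Cohen's conventional benchmarks
label_effect <- function(d) {
  size <- abs(d)
  if (size < 0.2) {
    "negligible"
  } else if (size < 0.5) {
    "small"
  } else if (size < 0.8) {
    "medium"
  } else {
    "large"
  }
}

label_effect(d)
```

The sign of d only reflects the order of subtraction (A minus B); the negative value means Medication A had the lower mean. At about 1.40 in absolute value, d sits well above the 0.8 benchmark, which is consistent with describing the effect as very large.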
An independent t-test was conducted to compare the number of headache days between participants taking Medication A (n = 50) and those taking Medication B (n = 50). Participants taking Medication B had more headache days on average (M = 12.6, SD = 3.59) than those taking Medication A (M = 8.1, SD = 2.81), t(98) = -6.99, p < .001. The effect size was very large (d = -1.40), indicating a very large difference in headache days between the two medications. Overall, participants taking Medication A had significantly fewer headache days than those taking Medication B.