Research Scenario 1

A medical research team created a new medication to reduce headaches (Medication A). They want to determine if Medication A is more effective at reducing headaches than the current medication on the market (Medication B). A group of participants were randomly assigned to either take Medication A or Medication B. Data was collected for 30 days through an app and participants reported each day if they did or did not have a headache. Was there a difference in the number of headaches between the groups?

PURPOSE

Test whether there is a statistically significant difference in headache days between two independent groups (Medication A and Medication B).

NULL HYPOTHESIS

There is no difference in the number of headaches between participants taking Medication A and those taking Medication B.

ALTERNATE HYPOTHESIS

There is a difference in the number of headaches between participants taking Medication A and those taking Medication B.

#install.packages("readxl")

chooseCRANmirror(graphics = FALSE, ind = 1) 
install.packages("readxl")
## Installing package into 'C:/Users/manit/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'readxl' successfully unpacked and MD5 sums checked
## Warning: cannot remove prior installation of package 'readxl'
## Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
## C:\Users\manit\AppData\Local\R\win-library\4.5\00LOCK\readxl\libs\x64\readxl.dll
## to C:\Users\manit\AppData\Local\R\win-library\4.5\readxl\libs\x64\readxl.dll:
## Permission denied
## Warning: restored 'readxl'
## 
## The downloaded binary packages are in
##  C:\Users\manit\AppData\Local\Temp\RtmpkXoof9\downloaded_packages

LOAD THE PACKAGE

library(readxl)

IMPORT THE EXCEL FILE INTO R STUDIO

A6R1 <- read_excel("C:\\Users\\manit\\OneDrive\\Desktop\\A6R1.xlsx")
head(A6R1)
## # A tibble: 6 × 3
##   ParticipantID Medication HeadacheDays
##           <dbl> <chr>             <dbl>
## 1             1 A                     6
## 2             2 A                     7
## 3             3 A                    13
## 4             4 A                     8
## 5             5 A                     8
## 6             6 A                    13

DESCRIPTIVE STATISTICS

Calculate the mean, median, SD, and sample size for each group.

INSTALL THE REQUIRED PACKAGE

#install.packages("dplyr")

LOAD THE PACKAGE

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

CALCULATE THE DESCRIPTIVE DATA

A6R1 %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50

HISTOGRAMS

Purpose: Visually check the normality of the scores for each group.

hist(A6R1$HeadacheDays[A6R1$Medication == "A"],
     main = "Histogram of Medication A Headache Days",
     xlab = "Headache Days",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A6R1$HeadacheDays[A6R1$Medication == "B"],
     main = "Histogram of Medication B Headache Days",
     xlab = "Headache Days",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

QUESTIONS

Answer the questions below as comments within the R script:
Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
Ans) The histogram is slightly positively skewed, indicating a longer tail on the right.
Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
Ans) The histogram is moderately flat, showing no extreme peaks or outliers.
Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
Ans) The histogram is slightly positively skewed, with a tail extending to the right.
Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
Ans) The histogram has a moderately flat distribution, with no sharp peaks.
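
The visual judgements above can be cross-checked with numbers. The following is a minimal sketch in base R, assuming moment-based definitions of skewness and excess kurtosis; the helper names `sample_skewness` and `sample_excess_kurtosis` and the example values are made up for illustration and are not part of the assignment or the A6R1 data.

```r
# Illustrative values only; these are NOT the A6R1 data.
headache_days <- c(6, 7, 13, 8, 8, 13, 5, 9, 10, 7)

# Moment-based sample skewness: 0 for symmetric data,
# > 0 for a longer right tail, < 0 for a longer left tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)
  m3 <- mean((x - mean(x))^3)
  m3 / m2^1.5
}

# Moment-based excess kurtosis: about 0 for a normal curve,
# < 0 for flatter distributions, > 0 for more peaked or heavy-tailed ones.
sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)
  m4 <- mean((x - mean(x))^4)
  m4 / m2^2 - 3
}

sample_skewness(headache_days)
sample_excess_kurtosis(headache_days)
```

To apply this to the real data, the same helpers could be called on `A6R1$HeadacheDays[A6R1$Medication == "A"]` and on the Medication B subset.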

SHAPIRO-WILK TEST

Purpose:

Check the normality of each group’s scores statistically. The Shapiro-Wilk test checks whether the data depart from a normal distribution, which captures problems with skewness and kurtosis at the same time. The test asks: “Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternate hypothesis)?” If p is GREATER than .05 (p > .05), there is no evidence against normality and the data are treated as NORMAL. If p is LESS than .05 (p < .05), the data are NOT normal.

shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "B"]
## W = 0.98758, p-value = 0.8741

QUESTION

Answer the questions below as a comment within the R script:
Was the data normally distributed for Variable 1?
Ans) Yes, the data for Medication A can be treated as normally distributed because the p-value (0.4913) is greater than 0.05.
Was the data normally distributed for Variable 2?
Ans) Yes, the data for Medication B can be treated as normally distributed because the p-value (0.8741) is greater than 0.05.
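
The decision rule above (p > .05 means normal, p < .05 means not normal) can also be applied to both groups in one step. This is a minimal sketch using simulated data, not the A6R1 file; the data frame `toy` and the column `normality_rejected` are names introduced here for illustration only.

```r
set.seed(1)  # simulated data, NOT the A6R1 data

toy <- data.frame(
  Medication   = rep(c("A", "B"), each = 30),
  HeadacheDays = c(rnorm(30, mean = 8, sd = 3),
                   rnorm(30, mean = 12, sd = 3))
)

# Run the Shapiro-Wilk test once per group and keep only the p-values.
p_values <- tapply(toy$HeadacheDays, toy$Medication,
                   function(x) shapiro.test(x)$p.value)

# Apply the .05 cutoff described above to each group.
data.frame(
  Medication         = names(p_values),
  p_value            = as.numeric(p_values),
  normality_rejected = as.numeric(p_values) < 0.05
)
```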

BOXPLOT

Purpose:

Check for any outliers impacting the mean for each group’s scores.

INSTALL REQUIRED PACKAGE

install.packages("ggplot2")
## Installing package into 'C:/Users/manit/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'ggplot2' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\manit\AppData\Local\Temp\RtmpkXoof9\downloaded_packages
install.packages("ggpubr")
## Installing package into 'C:/Users/manit/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'ggpubr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\manit\AppData\Local\Temp\RtmpkXoof9\downloaded_packages

LOAD THE PACKAGE

Always reload the package you want to use.

library(ggplot2)
library(ggpubr)

CREATE THE BOXPLOT

ggboxplot(A6R1, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

QUESTION

Answer the questions below as a comment within the R script. Answer the questions for EACH boxplot:
Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away?
Ans) Yes, there are dots outside of the boxplot for both Medication A and Medication B.
The dots for Medication A are relatively close to the whiskers, while the dots for Medication B are further away from the whiskers, indicating potential outliers.
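
Because the jittered points in the plot can make outliers hard to judge by eye, the points beyond the whiskers can also be listed directly. This is a minimal sketch using base R's `boxplot.stats()`, which flags values more than 1.5 times the box length beyond the hinges; the data frame `toy` holds made-up values, not the A6R1 data.

```r
# Illustrative values only; these are NOT the A6R1 data.
toy <- data.frame(
  Medication   = rep(c("A", "B"), each = 8),
  HeadacheDays = c(6, 7, 8, 8, 9, 9, 10, 20,
                   10, 11, 12, 12, 13, 13, 14, 15)
)

# boxplot.stats() returns, in $out, the points lying beyond the whiskers.
outliers_by_group <- lapply(
  split(toy$HeadacheDays, toy$Medication),
  function(x) boxplot.stats(x)$out
)
outliers_by_group
```

In this toy example only the value 20 in group A is flagged; group B has no flagged points.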

MANN-WHITNEY U TEST

Purpose: Test whether there was a difference between the distributions of the two groups.

wilcox.test(HeadacheDays ~ Medication, data = A6R1, exact = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  HeadacheDays by Medication
## W = 394.5, p-value = 3.358e-09
## alternative hypothesis: true location shift is not equal to 0
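
To see where the W statistic comes from, it can be rebuilt from the rank sum of the first group. This is a minimal sketch with made-up values for two small groups (not the A6R1 data); the vectors `a` and `b` are hypothetical.

```r
# Illustrative values only; these are NOT the A6R1 data.
a <- c(6, 7, 13, 8, 9)      # hypothetical group A
b <- c(12, 15, 11, 14, 10)  # hypothetical group B

# Rank all observations together (ties would receive average ranks).
ranks <- rank(c(a, b))

# W = (sum of ranks in the first group) - n1 * (n1 + 1) / 2
n1 <- length(a)
w_manual <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Compare with the statistic reported by wilcox.test().
w_test <- unname(wilcox.test(a, b, exact = FALSE)$statistic)
c(manual = w_manual, wilcox_test = w_test)
```

Both values are 3 here. A small W relative to the maximum possible value (n1 * n2) means the first group tends to have the lower ranks.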

DETERMINE STATISTICAL SIGNIFICANCE

If results were statistically significant (p < .05), continue to the effect size section below. If results were NOT statistically significant (p > .05), skip to the reporting section below.

EFFECT SIZE

Purpose: Determine how big of a difference there was between the group distributions.

INSTALL REQUIRED PACKAGE

install.packages("effectsize")
## Installing package into 'C:/Users/manit/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'effectsize' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\manit\AppData\Local\Temp\RtmpkXoof9\downloaded_packages

LOAD THE PACKAGE

library(effectsize)

CALCULATE EFFECT SIZE (R VALUE)

rank_biserial(HeadacheDays ~ Medication, data = A6R1, exact = FALSE)
## r (rank biserial) |         95% CI
## ----------------------------------
## -0.68             | [-0.79, -0.54]

QUESTIONS

Answer the questions below as a comment within the R script:
Q1) What is the size of the effect?
Ans) The rank-biserial correlation is -0.68 with a 95% confidence interval of [-0.79, -0.54]. This indicates a large effect because its absolute value (0.68) is above 0.50, meaning the difference between the two groups is substantial.
Q2) Which group had the higher average rank?
Ans) Since the rank-biserial correlation is negative (-0.68), Medication A had lower ranks on average and Medication B had higher ranks. Therefore, Medication B had the higher average rank.
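
As a quick check, the reported effect size can be reproduced by hand from the test statistic and the group sizes. This sketch assumes the common formula r = 2U / (n1 * n2) - 1, with W = 394.5 from the Mann-Whitney output and 50 participants per group from the descriptive statistics table.

```r
# Values copied from the output above.
W  <- 394.5  # Wilcoxon rank sum statistic for the first group (Medication A)
n1 <- 50     # participants on Medication A
n2 <- 50     # participants on Medication B

# Rank-biserial correlation from the U statistic: r = 2 * U / (n1 * n2) - 1
r_rb <- 2 * W / (n1 * n2) - 1
round(r_rb, 2)
```

The result rounds to -0.68, matching the `rank_biserial()` output. The negative sign reflects that the first group (Medication A) has the lower ranks.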

WRITTEN REPORT FOR MANN-WHITNEY U TEST

Write a paragraph summarizing your findings.

1) REVIEW YOUR OUTPUT

Mann-Whitney U Test Report: A Mann-Whitney U test was conducted to compare HeadacheDays between participants who took Medication A (n = 50) and those who took Medication B (n = 50). The results indicated a statistically significant difference between the two groups (U = 394.5, p < .001). Specifically, participants who took Medication B had significantly higher ranks for HeadacheDays compared to those who took Medication A. The median number of HeadacheDays for Medication A was 8.00, while the median for Medication B was 12.50. The rank-biserial correlation (effect size) was -0.68, indicating a large effect. This suggests that the difference in HeadacheDays between the two groups is substantial.

In summary, Medication B resulted in higher HeadacheDays compared to Medication A, with a large effect size, and the results were statistically significant.

REPORT YOUR DATA AS A PARAGRAPH

A Mann-Whitney U test was conducted to compare HeadacheDays between participants who took Medication A (n = 50) and those who took Medication B (n = 50). Participants who took Medication B had significantly higher median HeadacheDays (Mdn = 12.50) compared to those who took Medication A (Mdn = 8.00), U = 394.5, p < .001. The effect size was large (r = -0.68), indicating a substantial difference between the groups. Overall, Medication B resulted in a significantly higher number of HeadacheDays than Medication A.