Research Scenario

A medical research team needs to determine which of the two medications it developed was more effective at reducing headaches.

Hypothesis

Null Hypothesis (H0): There is no difference in mean headache days between Group A (Medication A) and Group B (Medication B).
Alternative Hypothesis (H1): There is a difference in mean headache days between Group A (Medication A) and Group B (Medication B).

Summary

An independent-samples t-test was conducted to compare headache days between Medication A (n = 50) and Medication B (n = 50). Participants who took Medication B reported significantly more headache days (M = 12.60, SD = 3.59) than participants who took Medication A (M = 8.10, SD = 2.81), t(98) = -6.99, p < .001. The effect size was very large (d = -1.40, 95% CI [-1.83, -0.96]); the negative sign reflects the direction of the comparison (A minus B). Overall, participants who took Medication A had fewer headache days than those who took Medication B.

In-Between Groups

Code

# install.packages("readxl")
library(readxl)
A6R1 <- read_excel("C:/Users/armil/Downloads/A6R1.xlsx")

Descriptive Statistics

# install.packages("dplyr")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
A6R1 %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50
# Histograms

hist(A6R1$HeadacheDays[A6R1$Medication == "A"],
main = "Histogram of Medication A Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)

hist(A6R1$HeadacheDays[A6R1$Medication == "B"],
main = "Histogram of Medication B Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)

# QUESTIONS

# Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Symmetric

# Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# Bell Curve

# Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Symmetric

# Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# Bell Curve
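
The visual judgments above can be cross-checked numerically. The following is a minimal sketch, assuming moment-based definitions of skewness and excess kurtosis. `shape_stats` is a hypothetical helper name, and `demo` is simulated stand-in data, not the study data. Values near 0 for both statistics are consistent with a symmetric, bell-shaped histogram.

```r
# Moment-based sample skewness and excess kurtosis, base R only.
# `shape_stats` is a hypothetical helper, not part of any package.
shape_stats <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  m <- mean(x)
  m2 <- mean((x - m)^2)      # second central moment
  m3 <- mean((x - m)^3)      # third central moment
  m4 <- mean((x - m)^4)      # fourth central moment
  c(skewness = m3 / m2^1.5,
    excess_kurtosis = m4 / m2^2 - 3)
}

# Simulated stand-in data (assumption: NOT the A6R1 data)
set.seed(1)
demo <- rnorm(50, mean = 8, sd = 3)
shape_stats(demo)

# With the real data, the call would look like:
# shape_stats(A6R1$HeadacheDays[A6R1$Medication == "A"])
```
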
# SHAPIRO-WILK TEST

shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "B"]
## W = 0.98758, p-value = 0.8741

For both groups, p > .05, so the Shapiro-Wilk test gives no evidence against normality and the data are treated as normally distributed.

# QUESTION
# Was the data normally distributed for Variable 1?
# Yes
# Was the data normally distributed for Variable 2?
# Yes
# BOXPLOT

# install.packages("ggplot2")
# install.packages("ggpubr")

library(ggplot2)
library(ggpubr)

ggboxplot(A6R1, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

# QUESTION
# For Medication A:
# Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away?
# No, there are no dots outside of the boxplot. So we are going to continue with Independent t-test.

# For Medication B: 
# Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away?
# Yes, there are two dots outside of the boxplot. Given their distance from the whiskers, we are going to continue with the Independent t-test.
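
The dots drawn outside a boxplot are conventionally the values beyond 1.5 times the interquartile range from the quartiles. The sketch below shows that rule on a small hypothetical vector (the values in `x` are made up for illustration and are not the study data), and reports how far each flagged value sits beyond the fence, which is one way to judge "close to the whiskers" versus "very far away".

```r
# Hypothetical example values (NOT the study data); 25 is deliberately extreme
x <- c(9, 10, 11, 12, 12, 13, 13, 14, 15, 25)

# Quartiles and interquartile range
q <- unname(quantile(x, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Values outside the fences are the ones plotted as separate dots
flagged <- x[x < lower_fence | x > upper_fence]

# Distance of each flagged value beyond the nearest fence, in IQR units
distance_iqr <- pmax(lower_fence - flagged, flagged - upper_fence) / iqr

flagged
distance_iqr

# With the real data, x would be e.g.:
# A6R1$HeadacheDays[A6R1$Medication == "B"]
```
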
# INDEPENDENT T-TEST 

t.test(HeadacheDays ~ Medication, data = A6R1, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6
# The results were statistically significant (p < .05), so we are going to continue with effect size. 
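
As a cross-check, the t statistic, degrees of freedom, and confidence interval can be approximately reproduced from the group means, SDs, and sample sizes reported above. This is a sketch of the standard equal-variance (pooled) formula. Because the SDs are rounded to two decimals, the result should land close to, but not exactly on, the `t.test` output.

```r
# Summary statistics as reported in the descriptive output
m_a <- 8.1;  sd_a <- 2.81; n_a <- 50
m_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Degrees of freedom and pooled SD under the equal-variance assumption
df <- n_a + n_b - 2
sd_pooled <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / df)

# Standard error of the mean difference (A minus B)
se_diff <- sd_pooled * sqrt(1 / n_a + 1 / n_b)

# t statistic and two-sided p-value
t_stat <- (m_a - m_b) / se_diff
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the mean difference
ci <- (m_a - m_b) + c(-1, 1) * qt(0.975, df) * se_diff

round(c(t = t_stat, df = df), 2)
p_value
round(ci, 2)
```
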

# EFFECT-SIZE

# install.packages("effectsize")

library(effectsize)

cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = A6R1, pooled_sd = TRUE)
print(cohens_d_result)
## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.
# QUESTIONS
# Q1) What is the size of the effect?
# Cohen's d is -1.40, so the difference is very large. 

# Q2) Which group had the higher average score?
# Group with Medication B had the higher average score. 
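
For two independent groups with a pooled SD, Cohen's d can also be recovered from the t statistic as d = t * sqrt(1/n1 + 1/n2). The sketch below applies that identity to the values reported above. The sign is negative only because the comparison is A minus B and group A has the lower mean, so the magnitude is what describes the size of the effect.

```r
# Values taken from the t-test output above
t_stat <- -6.9862
n_a <- 50
n_b <- 50

# Cohen's d from t for an equal-variance two-sample test
d_from_t <- t_stat * sqrt(1 / n_a + 1 / n_b)

round(d_from_t, 2)       # signed effect size (A minus B)
round(abs(d_from_t), 2)  # magnitude used to judge the size of the effect
```
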

REVIEW OF YOUR OUTPUT

#    1. The name of the inferential test used
# Independent t-test
#    2. The names of the IV and DV. 
# IV - Medication and DV - HeadacheDays
#    3. The sample size for each group (labeled as "n").
# Medication A - 50
# Medication B - 50
#    4. Whether the inferential test results were statistically significant (p < .05) or not (p > .05)
# The results were statistically significant. 
#    5. The mean and SD for each group's score on the DV (rounded to two places after the decimal)
# Medication A: Mean - 8.10 and SD - 2.81
# Medication B: Mean - 12.60 and SD - 3.59
#    7. Degrees of freedom (labeled as "df")
# df = 98
#    8. t-value (labeled as "t" in output)
# t = -6.9862
#    9. EXACT p-value to three decimals. NOTE: If p > .05, just report p > .05 If p < .001, just report p < .001
# p < .001 (exact value from output: 3.431e-10)
#   10. Effect size (Cohen’s d) ** Only if the results were significant
# Cohen's d = -1.40
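
The p-value reporting rule in item 9 can be written as a small helper. This is a minimal sketch. `format_p` is a hypothetical function name, not part of any package, and it follows only the rule stated above (three decimals, with cut-offs at .001 and .05).

```r
# Format a p-value following the reporting rule in item 9.
# `format_p` is a hypothetical helper written for this report.
format_p <- function(p) {
  if (p < .001) return("p < .001")
  if (p > .05) return("p > .05")
  # Three decimals, leading zero dropped
  paste0("p = ", sub("^0", "", formatC(p, format = "f", digits = 3)))
}

format_p(3.431e-10)  # "p < .001"  (the t-test p-value above)
format_p(0.0123)     # "p = .012"  (made-up value for illustration)
format_p(0.4913)     # "p > .05"   (Shapiro-Wilk p-value for Medication A)
```
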