Hypotheses:

H0: There is no difference in the mean number of headache days between participants taking Medication A and participants taking Medication B.

H1: There is a difference in the mean number of headache days between participants taking Medication A and participants taking Medication B.
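Written formally, the two-sided hypotheses compare the population mean number of headache days for each medication. The symbols \mu_A and \mu_B below are notation introduced here for illustration:

```latex
H_0:\ \mu_A = \mu_B \qquad\qquad H_1:\ \mu_A \neq \mu_B
```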

Execution:

library(readxl)      # read_excel()
library(dplyr)       # %>%, group_by(), summarise()
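library(ggplot2)     # plotting backend used by ggpubr
library(ggpubr)      # ggboxplot()
library(effectsize)  # cohens_d()
# Read the data (adjust the path to wherever A6R1.xlsx is saved)
dataset <- read_excel("/Users/patel777/Desktop/Week6/A6R1.xlsx")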

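# Convenience vectors for the outcome and the grouping variable
score <- dataset$HeadacheDays
group <- dataset$Medication

# Descriptive statistics for each medication group
dataset %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays),
    Median = median(HeadacheDays),
    SD = sd(HeadacheDays),
    N = n()
  )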
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50
# Histogram of headache days for Group A
hist(dataset$HeadacheDays[dataset$Medication == "A"],
     main = "Histogram: Medication A",
     xlab = "Headache Days",
     col = "lightblue", border = "black", breaks = 20)

# Histogram of headache days for Group B
hist(dataset$HeadacheDays[dataset$Medication == "B"],
     main = "Histogram: Medication B",
     xlab = "Headache Days",
     col = "lightgreen", border = "black", breaks = 20)

QUESTIONS:

Q1) Check the SKEWNESS of the Group A histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

  1. The histogram for Group A looks symmetrical

Q2) Check the KURTOSIS of the Group A histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

  1. The histogram for Group A has a proper bell-shaped curve.

Q3) Check the SKEWNESS of the Group B histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

  1. The histogram for Group B looks symmetrical.

Q4) Check the KURTOSIS of the Group B histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

  1. The histogram for Group B has a proper bell-shaped curve.
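The questions above ask for a visual judgment of skewness and kurtosis. As an optional numeric cross-check, the sketch below computes moment-based sample skewness and excess kurtosis in base R. The function names `sample_skewness()` and `sample_excess_kurtosis()` are hypothetical helpers written for this example, and the simple moment formulas are an assumption; packages such as e1071 or moments offer bias-adjusted variants that give slightly different numbers. Values near 0 on both measures correspond to a symmetrical, bell-shaped histogram.

```r
# Moment-based sample skewness (hypothetical helper, base R only).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)  # second central moment
  m3 <- mean((x - m)^3)  # third central moment
  m3 / m2^1.5            # 0 = symmetrical, > 0 = positive skew, < 0 = negative skew
}

# Moment-based excess kurtosis (hypothetical helper, base R only).
sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m4 <- mean((x - m)^4)  # fourth central moment
  m4 / m2^2 - 3          # 0 = normal; < 0 flatter, > 0 more peaked / heavier tails
}

# Apply to each medication group, only if the dataset from above is loaded.
if (exists("dataset")) {
  for (g in c("A", "B")) {
    x <- dataset$HeadacheDays[dataset$Medication == g]
    cat("Group", g,
        "skewness =", round(sample_skewness(x), 2),
        "excess kurtosis =", round(sample_excess_kurtosis(x), 2), "\n")
  }
}
```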

SHAPIRO-WILK TEST:

Purpose: Check the normality of each group’s scores statistically. The Shapiro-Wilk test is sensitive to departures from a normal shape, including skewness and kurtosis, so it effectively checks both at the same time. The null hypothesis is that the scores come from a normal distribution; the alternative hypothesis is that they do not. If p is GREATER than .05 (p > .05), the data do not differ significantly from normal, so normality can be assumed. If p is LESS than .05 (p < .05), the data are NOT normal.

shapiro.test(dataset$HeadacheDays[dataset$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$HeadacheDays[dataset$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(dataset$HeadacheDays[dataset$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$HeadacheDays[dataset$Medication == "B"]
## W = 0.98758, p-value = 0.8741

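QUESTIONS:

1. Was the data normally distributed for Group A?

Yes. The Shapiro-Wilk test was not significant for Group A (W = 0.979, p = .491), so the data can be treated as normally distributed.

2. Was the data normally distributed for Group B?

Yes. The Shapiro-Wilk test was not significant for Group B (W = 0.988, p = .874), so the data can be treated as normally distributed.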

NOTE:

If p > .05 (the p-value is GREATER than .05), the data are NORMAL; continue to the boxplot check below. If p < .05 (the p-value is LESS than .05), the data are NOT normal; switch to the Mann-Whitney U test.
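The decision rule in the note above can be sketched as a small function: run Shapiro-Wilk on each group, use Student's t-test when every group has p > .05, and otherwise fall back to the Mann-Whitney U test (`wilcox.test()` in base R). The function name `choose_two_group_test()` is hypothetical, and the example runs on simulated data, not the study data.

```r
# Hypothetical helper implementing the normality-based test choice.
choose_two_group_test <- function(score, group, alpha = 0.05) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  # Shapiro-Wilk p-value for each group separately
  sw_p <- tapply(score, group, function(x) shapiro.test(x)$p.value)
  if (all(sw_p > alpha)) {
    test <- t.test(score ~ group, var.equal = TRUE)  # normal: independent t-test
  } else {
    test <- wilcox.test(score ~ group)               # not normal: Mann-Whitney U
  }
  list(shapiro_p = sw_p, test = test)
}

# Example on simulated, roughly normal data (not the study data)
set.seed(1)
sim_score <- c(rnorm(50, mean = 8, sd = 3), rnorm(50, mean = 12, sd = 3))
sim_group <- rep(c("A", "B"), each = 50)
result <- choose_two_group_test(sim_score, sim_group)
result$shapiro_p
result$test$method
```

With the study data loaded, the same call would be `choose_two_group_test(dataset$HeadacheDays, dataset$Medication)`.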

BOXPLOT:

Purpose: Check each group’s scores for outliers that could distort the group mean.

# Boxplot of headache days by medication, with individual points jittered on top
ggboxplot(dataset, x = "Medication", y = "HeadacheDays",
          color = "Medication", palette = "jco", add = "jitter")
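Boxplot points drawn beyond the whiskers follow the usual 1.5 × IQR rule: a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below lists those values per group instead of reading them off the plot. The function name `flag_outliers()` is hypothetical; it uses `quantile()` with its default type, which, as far as I know, matches how ggplot2 computes box hinges, while base R's `boxplot.stats()` uses Tukey's hinges, so borderline cases can differ slightly.

```r
# Hypothetical helper: values outside the 1.5 x IQR boxplot fences.
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)  # Q1 and Q3
  fence <- coef * (q[2] - q[1])                                 # 1.5 x IQR
  x[!is.na(x) & (x < q[1] - fence | x > q[2] + fence)]
}

# Outliers in each medication group, only if the dataset from above is loaded.
if (exists("dataset")) {
  tapply(dataset$HeadacheDays, dataset$Medication, flag_outliers)
}
```

An empty result for a group means no points fall outside the whiskers.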

DETERMINE STATISTICAL SIGNIFICANCE:

If the results are statistically significant (p < .05), continue to the effect size section below. If the results are NOT statistically significant (p > .05), skip to the reporting section below.

NOTE: Getting results that are not statistically significant does NOT mean you switch to the Mann-Whitney U test. The Mann-Whitney U test is chosen only when the data are not normally distributed; the choice never depends on whether the result is significant.

# Independent-samples (Student's) t-test, assuming equal variances in both groups
t.test(HeadacheDays ~ Medication, data = dataset, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6
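Student's t-test divides the difference in group means by its standard error, which is built from a pooled SD: s_p² = ((n_A − 1)s_A² + (n_B − 1)s_B²) / (n_A + n_B − 2), SE = s_p·√(1/n_A + 1/n_B), with df = n_A + n_B − 2. As a hand check, the sketch below recomputes t, df, p, and the 95% CI from the rounded means and SDs in the descriptive table. Because those SDs are rounded to two decimals, the results should land close to, but not exactly on, the `t.test()` output above.

```r
# Group summaries copied from the descriptive statistics table (rounded)
m_a <- 8.1;  s_a <- 2.81; n_a <- 50
m_b <- 12.6; s_b <- 3.59; n_b <- 50

dof <- n_a + n_b - 2                                       # degrees of freedom
sp  <- sqrt(((n_a - 1) * s_a^2 + (n_b - 1) * s_b^2) / dof) # pooled SD
se  <- sp * sqrt(1 / n_a + 1 / n_b)                        # SE of the mean difference
t_stat <- (m_a - m_b) / se                                 # t for A minus B
p_val  <- 2 * pt(-abs(t_stat), dof)                        # two-sided p-value
ci <- (m_a - m_b) + c(-1, 1) * qt(0.975, dof) * se         # 95% CI for A minus B

round(c(t = t_stat, df = dof, p = p_val), 4)
round(ci, 2)
```

Note that df is 98 (two groups of 50, minus 2), not the total sample size of 100.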
# Cohen's d for the difference between groups, using the pooled SD
cohens_d_result <- cohens_d(HeadacheDays ~ Medication,
                            data = dataset, pooled_sd = TRUE)
cohens_d_result
## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.

QUESTIONS:

Q1) What is the size of the effect?

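  1. A Cohen’s d of -1.40 indicates that the difference between the group means is very large: |d| = 1.40 is well above the conventional 0.8 cut-off for a large effect. The negative sign only reflects the order of subtraction (Group A minus Group B).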
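Cohen's d is the mean difference divided by the pooled SD, so it can be checked by hand from the descriptive table. The sketch below does that and attaches a size label. The helper names `pooled_sd()` and `label_d()` are hypothetical, and the cut-offs are an assumption: Cohen's conventional 0.2 / 0.5 / 0.8 benchmarks extended with the "very large" (1.2) and "huge" (2.0) categories proposed by Sawilowsky (2009). Other conventions exist; the effectsize package also offers `interpret_cohens_d()` with several rule sets.

```r
# Hypothetical helper: pooled SD from two groups' SDs and sample sizes
pooled_sd <- function(s1, n1, s2, n2) {
  sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
}

# Hypothetical helper: label |d| using the cut-offs described above
label_d <- function(d) {
  cuts   <- c(0, 0.2, 0.5, 0.8, 1.2, 2.0, Inf)
  labels <- c("very small", "small", "medium", "large", "very large", "huge")
  as.character(cut(abs(d), breaks = cuts, labels = labels, right = FALSE))
}

# Means and SDs copied from the descriptive table (rounded)
d <- (8.1 - 12.6) / pooled_sd(2.81, 50, 3.59, 50)  # negative because mean A < mean B
round(d, 2)
label_d(d)
```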

Q2) Which group had the higher average score?

  1. Group B had the higher average score (M = 12.6 headache days, compared with M = 8.1 for Group A).

WRITTEN REPORT FOR INDEPENDENT T-TEST:

An independent-samples t-test was performed to compare the number of headache days between participants taking Medication A (n = 50) and participants taking Medication B (n = 50). Participants taking Medication B had more headache days on average (M = 12.6, SD = 3.59) than participants taking Medication A (M = 8.1, SD = 2.81), t(98) = -6.99, p < .001, so the null hypothesis was rejected. The effect size was very large (d = -1.40), indicating a very large difference in headache days between Medication A and Medication B. Overall, participants taking Medication B had significantly more headache days than participants taking Medication A.
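To keep the reported statistics consistent with the R output, the t-test line of the write-up can be generated directly from the `t.test()` result. The sketch below is a hypothetical helper, `format_t_apa()`, that reads df, t, and p from the returned `htest` object. It follows common APA conventions (two decimals for t, no leading zero for p, and "p < .001" for very small values), which are an assumption about the style required here.

```r
# Hypothetical helper: APA-style "t(df) = t, p" string from a t.test() result
format_t_apa <- function(tt) {
  df_val <- unname(tt$parameter)
  t_val  <- unname(tt$statistic)
  p      <- tt$p.value
  # Whole-number df (Student's t) printed as an integer; Welch df keep decimals
  df_txt <- if (abs(df_val - round(df_val)) < 1e-8) {
    sprintf("%d", as.integer(round(df_val)))
  } else {
    sprintf("%.2f", df_val)
  }
  # No leading zero for p, and "p < .001" for very small values
  p_txt <- if (p < 0.001) "p < .001" else paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
  sprintf("t(%s) = %.2f, %s", df_txt, t_val, p_txt)
}

# Example on small made-up vectors (not the study data)
format_t_apa(t.test(c(1, 2, 3, 4, 5), c(3, 4, 5, 6, 7), var.equal = TRUE))

# With the study data loaded
if (exists("dataset")) {
  format_t_apa(t.test(HeadacheDays ~ Medication, data = dataset, var.equal = TRUE))
}
```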