INDEPENDENT T-TEST & MANN-WHITNEY U TEST

QUESTION

What are the null and alternate hypotheses for YOUR research scenario?

H0: There is no difference in the mean number of headache days between the two medication groups.

H1: There is a difference in the mean number of headache days between the two medication groups.

IMPORT EXCEL FILE

#install.packages("readxl")

LOAD THE PACKAGE

library(readxl)
dataset <- read_excel("C:\\Users\\navya\\Downloads\\A6R1.xlsx")

DESCRIPTIVE STATISTICS

PURPOSE: Calculate the mean, median, SD, and sample size for each group.

#install.packages("dplyr")
#install.packages("tidyverse")

LOAD THE PACKAGE

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)

CALCULATE THE DESCRIPTIVE STATISTICS

dataset %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50

HISTOGRAMS

hist(dataset$HeadacheDays[dataset$Medication == "A"],
    main = "Histogram of A",
    xlab = "Value",
    ylab = "Frequency",
    col = "lightblue",
    border = "black",
    breaks = 20)

hist(dataset$HeadacheDays[dataset$Medication == "B"],
    main = "Histogram of B",
    xlab = "Value",
    ylab = "Frequency",
    col = "lightgreen",
    border = "black",
    breaks = 20)

# QUESTIONS
#Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
#Answer: The histogram of A appears to be slightly positively skewed. The tail of the distribution extends longer toward the right.

#Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
#Answer: The histogram of A looks too tall: the values are concentrated around the mean, with thick tails, so it does not have a proper bell shape.

#Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
#Answer: The histogram of B appears roughly symmetrical, with at most a slight positive skew.

#Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
#Answer: The histogram of B has a slightly peaked shape that is close to a proper bell curve.

SHAPIRO-WILK TEST

shapiro.test(dataset$HeadacheDays[dataset$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$HeadacheDays[dataset$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(dataset$HeadacheDays[dataset$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$HeadacheDays[dataset$Medication == "B"]
## W = 0.98758, p-value = 0.8741
# QUESTION
#Was the data normally distributed for Variable 1?
#Yes. Because p = 0.49 > 0.05, the Shapiro-Wilk test does not reject normality, so the Group A data are treated as normally distributed.

#Was the data normally distributed for Variable 2?
#Yes. Because p = 0.87 > 0.05, the Shapiro-Wilk test does not reject normality, so the Group B data are treated as normally distributed.

BOXPLOT

#install.packages("ggplot2")
#install.packages("ggpubr")
library(ggplot2)
library(ggpubr)

#CREATE THE BOXPLOT

ggboxplot(dataset, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

# QUESTION
#Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot (check if there are any dots past the lines on the boxes) or are they very far away?
#Answer: 
#BOXPLOT A
#Yes, there is one dot outside the boxplot. It sits just above the upper whisker, so we can continue with the Independent t-test.
#BOXPLOT B
#Yes, there are two dots outside the box. They are also close to the whiskers, so we can continue with the Independent t-test.

#Conclusion: Both groups passed the normality check and their outlying dots are close to the whiskers, so we should continue with the Independent t-test.
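
The choice between the Independent t-test and the Mann-Whitney U test can be sketched in code. This is a simplified illustration, not the assignment's full procedure: it uses only the Shapiro-Wilk p-values (the boxplot outlier check is still a visual judgment), and the data frame `demo` is simulated, hypothetical data rather than the real spreadsheet.

```r
# Hypothetical stand-in data (NOT the real A6R1.xlsx values).
set.seed(42)
demo <- data.frame(
  Medication   = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rnorm(50, mean = 8, sd = 3),
                   rnorm(50, mean = 12.5, sd = 3.5))
)

# Shapiro-Wilk p-value for each medication group.
normality_p <- tapply(demo$HeadacheDays, demo$Medication,
                      function(x) shapiro.test(x)$p.value)

# If both groups look normal (p > 0.05), run the Independent t-test;
# otherwise fall back to the Mann-Whitney U test (wilcox.test in base R).
if (all(normality_p > 0.05)) {
  result <- t.test(HeadacheDays ~ Medication, data = demo, var.equal = TRUE)
} else {
  result <- wilcox.test(HeadacheDays ~ Medication, data = demo)
}

print(normality_p)
print(result)
```

The `var.equal = TRUE` argument mirrors the call used in this report. Leaving it out makes `t.test()` run Welch's version, which does not assume equal variances.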

INDEPENDENT T-TEST

t.test(HeadacheDays ~ Medication, data = dataset, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6

DETERMINE STATISTICAL SIGNIFICANCE
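
Statistical significance is read from the p-value in the t-test output above. As a rough cross-check, the sketch below recomputes the two-sided p-value from the reported t statistic and degrees of freedom and compares it to a significance level. The 0.05 level is an assumption here, chosen to match the threshold used for the Shapiro-Wilk checks.

```r
# Values copied from the t.test() output above.
t_value <- -6.9862
df      <- 98
alpha   <- 0.05   # assumed significance level

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_value), df)

# Decision rule: reject H0 when p < alpha.
reject_h0 <- p_value < alpha

cat("p =", signif(p_value, 4), "| reject H0:", reject_h0, "\n")
```

Because the p-value is far below 0.05, the null hypothesis of no difference in mean headache days is rejected.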

EFFECT-SIZE

#install.packages("effectsize")

LOAD THE PACKAGE

library(effectsize)

CALCULATE COHEN’S D

cohens_d_result <- cohens_d(HeadacheDays ~ Medication, data = dataset, pooled_sd = TRUE)
print(cohens_d_result)
## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.
# QUESTIONS
#Q1) What is the size of the effect?
#Cohen's d is -1.40. Its magnitude falls in the "+/- 1.30 and above" range, so the size of the effect is very large.

# Q2) Which group had the higher average score?
#Group B had the higher average score (M = 12.60), compared to Group A (M = 8.10).
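
The Cohen's d reported by `cohens_d()` can be cross-checked by hand from the group means and standard deviations in the descriptive statistics table. The sketch below uses the pooled-SD formula; the variable names are illustrative, and the inputs are the rounded values from the table, so the result may differ slightly in later decimal places.

```r
# Summary statistics copied from the descriptive statistics table.
n_a <- 50; mean_a <- 8.1;  sd_a <- 2.81
n_b <- 50; mean_b <- 12.6; sd_b <- 3.59

# Pooled standard deviation: each group's variance weighted by its
# degrees of freedom.
sd_pooled <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) /
                  (n_a + n_b - 2))

# Cohen's d: standardized difference between the means (A minus B).
d <- (mean_a - mean_b) / sd_pooled

print(round(d, 2))
```

The negative sign only reflects the order of subtraction (A minus B). The size of the effect is judged from the absolute value.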

WRITTEN REPORT FOR INDEPENDENT T-TEST

  1. FINAL REPORT An Independent t-test was conducted to compare the mean number of headache days (dependent variable) between participants taking Medication A and Medication B (independent variable). Group A (N = 50, M = 8.10, SD = 2.81) reported significantly fewer headache days than Group B (N = 50, M = 12.60, SD = 3.59), t(98) = -6.99, p < .001, 95% CI [-5.78, -3.22]. The effect size was very large (Cohen's d = -1.40). These results suggest that Medication A was more effective than Medication B at reducing headache days.