# INDEPENDENT T-TEST & MANN-WHITNEY U TEST

# HYPOTHESIS TESTED:
# Used to test if there is a difference between the means of two groups.

# QUESTION

# What are the null and alternate hypotheses for YOUR research scenario?
# H0: There is no difference in the number of headache days between participants who take Medication A and those who take Medication B.

# H1: There is a difference in the number of headache days between participants who take Medication A and those who take Medication B.


# IMPORT EXCEL FILE
# Purpose: Import your Excel dataset into R to conduct analyses.

# INSTALL REQUIRED PACKAGE

# install.packages("readxl")

# LOAD THE PACKAGE

library(readxl)

A6R1 <- read_excel("C:\\Users\\deept\\Documents\\Zoom\\A6R1.xlsx")
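
# OPTIONAL CHECK (not part of the original assignment)
# A minimal sketch for confirming the import worked before any analysis is run.
# It assumes the two columns used later in this script (Medication and
# HeadacheDays). check_columns() is a hypothetical helper name.

check_columns <- function(data, required = c("Medication", "HeadacheDays")) {
  # List any required column that is absent from the data frame
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Only runs when the dataset has actually been loaded
if (exists("A6R1")) {
  check_columns(A6R1)
  str(A6R1)  # shows each column's type and first few values
}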

# DESCRIPTIVE STATISTICS
# PURPOSE: Calculate the mean, median, SD, and sample size for each group.

# INSTALL REQUIRED PACKAGE
# install.packages("dplyr")

# LOAD THE PACKAGE

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# CALCULATE THE DESCRIPTIVE STATISTICS

A6R1 %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),  # average the outcome, not the grouping variable
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50
# HISTOGRAMS
# Purpose: Visually check the normality of the scores for each group.

hist(A6R1$HeadacheDays[A6R1$Medication == "A"],
     main = "Histogram of A Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A6R1$HeadacheDays[A6R1$Medication == "B"],
     main = "Histogram of B Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

# QUESTIONS
# Answer the questions below as comments within the R script:

# Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# The histogram looks symmetrical.

# Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# The histogram has a proper bell curve.

# Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# The histogram is negatively skewed

# Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# The histogram looks too tall.


# SHAPIRO-WILK TEST
# Purpose: Check the normality for each group's score statistically.

# CONDUCT THE SHAPIRO-WILK TEST

shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "B"]
## W = 0.98758, p-value = 0.8741
# QUESTION
# Answer the questions below as a comment within the R script:
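# Was the data normally distributed for Variable 1?
# Yes. Medication A: W = 0.979, p = .491. Because p > .05, the data is normally distributed.
# Was the data normally distributed for Variable 2?
# Yes. Medication B: W = 0.988, p = .874. Because p > .05, the data is normally distributed.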

# If p > 0.05 (P-value is GREATER than .05) this means the data is NORMAL. Continue to the box-plot test below.
# If p < 0.05 (P-value is LESS than .05) this means the data is NOT normal (switch to Mann-Whitney U).
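
# OPTIONAL SKETCH (not part of the original assignment)
# The decision rule above can be expressed in code. pick_test() is a
# hypothetical helper name, and the .05 cutoff comes from the rule stated above.
# A p-value of exactly .05 is treated here as NOT normal, which is an assumption.

pick_test <- function(scores, groups, alpha = 0.05) {
  # Shapiro-Wilk p-value for each group
  p_values <- tapply(scores, groups, function(x) shapiro.test(x)$p.value)
  if (all(p_values > alpha)) "independent t-test" else "Mann-Whitney U"
}

if (exists("A6R1")) {
  print(pick_test(A6R1$HeadacheDays, A6R1$Medication))
  # If the rule points to Mann-Whitney U, the base R call would be:
  # wilcox.test(HeadacheDays ~ Medication, data = A6R1)
}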


# BOXPLOT
# Purpose: Check for any outliers impacting the mean for each group's scores.

# INSTALL REQUIRED PACKAGE
# If previously installed, put a hashtag in front of the code.

# install.packages("ggplot2")
# install.packages("ggpubr")

# LOAD THE PACKAGE
# Always reload the package you want to use. 

library(ggplot2)
library(ggpubr)

# CREATE THE BOXPLOT
# The dataset is A6R1 (the file name without .xlsx).
# x is the independent variable (Medication).
# y is the dependent variable (HeadacheDays).


ggboxplot(A6R1, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

# QUESTION
# Answer the questions below as a comment within the R script:
# Q1) Were there any dots outside of the boxplots? These dots represent participants with extreme scores.
# There are dots outside the boxplots 
# Q2) If there are outliers, in your opinion are the scores of those dots changing the mean so much that the mean no longer accurately represents the average score?
# No
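
# OPTIONAL SKETCH (not part of the original assignment)
# Because add = "jitter" draws every participant as a dot, it can help to list
# the outliers numerically instead of judging them by eye. This sketch uses
# base R's boxplot.stats(), which applies the 1.5 x IQR rule. Its hinges can
# differ slightly from the ones ggplot2 draws, so treat the result as a guide.
# flag_outliers() is a hypothetical helper name.

flag_outliers <- function(scores, groups) {
  # Returns, for each group, the scores that fall beyond the whiskers
  tapply(scores, groups, function(x) boxplot.stats(x)$out)
}

if (exists("A6R1")) {
  print(flag_outliers(A6R1$HeadacheDays, A6R1$Medication))
}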


# INDEPENDENT T-TEST 
# PURPOSE: Test if there was a difference between the means of the two groups.

t.test(HeadacheDays ~ Medication, data = A6R1, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6
# DETERMINE STATISTICAL SIGNIFICANCE

# The results were statistically significant (p < .05), continue to effect size section below.
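
# OPTIONAL SKETCH (not part of the original assignment)
# Very small p-values such as 3.431e-10 should not be written as "p = 0.00".
# A common reporting convention is "p < .001". format_p() is a hypothetical
# helper name that illustrates that convention.

format_p <- function(p) {
  if (p < 0.001) {
    "p < .001"
  } else {
    # Three decimals with the leading zero dropped, e.g. "p = .491"
    paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
  }
}

format_p(3.431e-10)  # "p < .001"
format_p(0.4913)     # "p = .491"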


# EFFECT-SIZE
# PURPOSE: Determine how big of a difference there was between the group means.

# INSTALL REQUIRED PACKAGE

# install.packages("effectsize")

# LOAD THE PACKAGE
# Always load the package you want to use.

library(effectsize)

# CALCULATE COHEN’S D

cohen_d_result <- cohens_d(HeadacheDays ~ Medication, data = A6R1, pooled_sd = TRUE)
print(cohen_d_result)
## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.
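
# OPTIONAL SKETCH (not part of the original assignment)
# Cohen's d with a pooled SD can be approximated by hand from the group
# statistics reported earlier in this script: d = (M1 - M2) / pooled SD.
# pooled_d() is a hypothetical helper name. The inputs below are rounded
# values, so the result only approximates the cohens_d() output (about -1.40).

pooled_d <- function(m1, m2, sd1, sd2, n1, n2) {
  # Pooled SD weights each group's variance by its degrees of freedom
  sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / sd_pooled
}

pooled_d(m1 = 8.1, m2 = 12.6, sd1 = 2.81, sd2 = 3.59, n1 = 50, n2 = 50)
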
# QUESTIONS
# Answer the questions below as a comment within the R script:

# Q1) What is the size of the effect?
# A Cohen's d of -1.40 indicates a very large difference between the group means. The negative sign shows that the Medication A mean was lower than the Medication B mean.

# Q2) Which group had the higher average score?
# Group B (Medication B) had the higher average score (M = 12.6 vs. M = 8.1 for Medication A).

# WRITTEN REPORT FOR INDEPENDENT T-TEST
# Write a paragraph summarizing your findings.

# 2) REPORT YOUR DATA AS A PARAGRAPH

#    An independent t-test was conducted to compare the number of headache days
#    between participants who took Medication A (n = 50) and those who took Medication B (n = 50).
#    Participants who took Medication B reported significantly more headache days (M = 12.6, SD = 3.59) than
#    participants who took Medication A (M = 8.1, SD = 2.81), t(98) = -6.99, p < .001.
#    The effect size was very large (d = -1.40, 95% CI [-1.83, -0.96]).
#    Overall, participants who took Medication B had more headache days than those who took Medication A.