# INDEPENDENT T-TEST & MANN-WHITNEY U TEST

# HYPOTHESIS TESTED:
# Used to test if there is a difference between the means of two groups.

# NULL HYPOTHESIS (H0)
# The null hypothesis below is ALWAYS used.
# There is no difference between the scores of Group A and Group B.

# ALTERNATE HYPOTHESIS (H1)
# Choose ONE of the three options below (based on your research scenario):
# 1) NON-DIRECTIONAL ALTERNATE HYPOTHESIS: There is a difference between the scores of Group A and Group B.
# 2) DIRECTIONAL ALTERNATE HYPOTHESIS ONE: Group A has higher scores than Group B.
# 3) DIRECTIONAL ALTERNATE HYPOTHESIS TWO: Group B has higher scores than Group A.

# QUESTION
# What are the null and alternate hypotheses for YOUR research scenario?
# H0:
# H1: 


# IMPORT EXCEL FILE
# Purpose: Import your Excel dataset into R to conduct analyses.

# INSTALL REQUIRED PACKAGE
# If never installed, remove the hashtag before the install code.
# If previously installed, leave the hashtag in front of the code.

# install.packages("readxl")

# LOAD THE PACKAGE
# Always reload the package you want to use. 

library(readxl)

# IMPORT EXCEL FILE INTO R STUDIO
# Download the Excel file from OneDrive and save it to your desktop.
# Right-click the Excel file and click “Copy as path” from the menu.
# In RStudio, replace the example path below with your actual path.
# Replace backslashes \ with forward slashes / or double them \\:
# ✘ WRONG   "C:\Users\Joseph\Desktop\mydata.xlsx"
# ✔ CORRECT "C:/Users/Joseph/Desktop/mydata.xlsx"
# ✔ CORRECT "C:\\Users\\Joseph\\Desktop\\mydata.xlsx"
# Replace "dataset" with the name of your Excel data (without the .xlsx)

A6R1 <- read_excel("C:/Users/Luqman ullah/Downloads/A6R1.xlsx")
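
# OPTIONAL: CHECK THE IMPORT
# A minimal sketch, not part of the original template.
# The helper name missing_columns() is hypothetical.
# It assumes the imported data should contain the two columns used later
# in this script (Medication and HeadacheDays) and returns the names of
# any required columns that are absent.

missing_columns <- function(data, required) {
  # Names listed in `required` that do not appear in the data
  setdiff(required, names(data))
}

# This part runs only if A6R1 was imported successfully above.
if (exists("A6R1")) {
  # character(0) means nothing is missing
  print(missing_columns(A6R1, c("Medication", "HeadacheDays")))
  # Number of rows in each group
  print(table(A6R1$Medication))
}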


# DESCRIPTIVE STATISTICS
# PURPOSE: Calculate the mean, median, SD, and sample size for each group.

# INSTALL REQUIRED PACKAGE
# If never installed, remove the hashtag before the install code.
# If previously installed, leave the hashtag in front of the code.

# install.packages("dplyr")

# LOAD THE PACKAGE
# Always reload the package you want to use. 

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# CALCULATE THE DESCRIPTIVE STATISTICS
# Replace "dataset" with your dataset name (without .xlsx)
# Replace "score" with your dependent variable R code name (example: USD)
# Replace "group" with your independent variable R code name (example: Country)
# NOTE: Do NOT edit "group_by"

A6R1 %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50
# HISTOGRAMS
# Purpose: Visually check the normality of the scores for each group.


# CREATE THE HISTOGRAMS 
# Replace "dataset" with your dataset name (without .xlsx)
# Replace "score" with your dependent variable R code name (example: USD)
# Replace "group" with your independent variable R code name (example: Country)
# Replace "Group1" with the R code name for your first group (example: USA)
# Replace "Group2" with the R code name for your second group (example: India)

hist(A6R1$HeadacheDays[A6R1$Medication == "A"],
     main = "Histogram of Group 1 Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A6R1$HeadacheDays[A6R1$Medication == "B"],
     main = "Histogram of Group 2 Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

# QUESTIONS
# Answer the questions below as comments within the R script:

# Q1) Check the SKEWNESS of the GROUP 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Q2) Check the KURTOSIS of the GROUP 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# Q3) Check the SKEWNESS of the GROUP 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Q4) Check the KURTOSIS of the GROUP 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
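
# OPTIONAL: NUMERIC CHECK OF SKEWNESS AND KURTOSIS
# A minimal sketch to back up the visual judgement above; it is not part
# of the original template. The function names sample_skewness() and
# sample_excess_kurtosis() are hypothetical, and the formulas are the
# simple moment-based versions (packages may apply small-sample corrections,
# so their values can differ slightly).
# Skewness near 0 = symmetrical; positive = longer right tail; negative = longer left tail.
# Excess kurtosis near 0 = bell-shaped; negative = flatter; positive = taller with heavier tails.

sample_skewness <- function(x) {
  x <- x[!is.na(x)]                 # drop missing scores
  deviations <- x - mean(x)
  mean(deviations^3) / mean(deviations^2)^(3 / 2)
}

sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]                 # drop missing scores
  deviations <- x - mean(x)
  mean(deviations^4) / mean(deviations^2)^2 - 3
}

# Example with made-up values (NOT the A6R1 data):
demo_scores <- c(1, 2, 3, 4, 5)
sample_skewness(demo_scores)
sample_excess_kurtosis(demo_scores)

# To apply it to one group of your own data, pass that group's scores, for example:
# sample_skewness(A6R1$HeadacheDays[A6R1$Medication == "A"])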


# SHAPIRO-WILK TEST
# Purpose: Check the normality for each group's score statistically.
# The Shapiro-Wilk test checks whether the scores depart from a normal distribution; both skewness and kurtosis can cause it to flag non-normality.
# The test is checking "Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternate hypothesis)?"
# For this test, if p is GREATER than .05 (p > .05), the data is NORMAL.
# If p is LESS than .05 (p < .05), the data is NOT normal.

# CONDUCT THE SHAPIRO-WILK TEST
# Replace "dataset" with your dataset name (without .xlsx)
# Replace "score" with your dependent variable R code name (example: USD)
# Replace "group" with your independent variable R code name (example: Country)
# Replace "Group1" with the R code name for your first group (example: USA)
# Replace "Group2" with the R code name for your second group (example: India)

shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "B"]
## W = 0.98758, p-value = 0.8741
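
# OPTIONAL: RUN SHAPIRO-WILK FOR EVERY GROUP IN ONE STEP
# A minimal sketch, not part of the original template. It uses simulated
# stand-in data (demo_data, with hypothetical columns "group" and "score")
# so it runs on its own. tapply() splits the scores by group and returns
# one Shapiro-Wilk p-value per group.

set.seed(1)  # makes the simulated scores reproducible
demo_data <- data.frame(
  group = rep(c("A", "B"), each = 30),
  score = c(rnorm(30, mean = 8, sd = 3), rnorm(30, mean = 12, sd = 3))
)

shapiro_p_values <- tapply(demo_data$score, demo_data$group,
                           function(x) shapiro.test(x)$p.value)
shapiro_p_values

# With your own data the same idea would look like:
# tapply(A6R1$HeadacheDays, A6R1$Medication, function(x) shapiro.test(x)$p.value)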
# QUESTION
# Answer the questions below as a comment within the R script:
# Was the data normally distributed for Group 1?
# Was the data normally distributed for Group 2?

# If p > 0.05 (P-value is GREATER than .05) this means the data is NORMAL. Continue to the box-plot test below.
# If p < 0.05 (P-value is LESS than .05) this means the data is NOT normal (switch to Mann-Whitney U).


# BOXPLOT
# Purpose: Check for any outliers impacting the mean for each group's scores.

# INSTALL REQUIRED PACKAGE
# If never installed, remove the hashtag before the install code.
# If previously installed, leave the hashtag in front of the code.

# install.packages("ggplot2")
# install.packages("ggpubr")

# LOAD THE PACKAGE
# Always reload the package you want to use. 

library(ggplot2)
library(ggpubr)

# CREATE THE BOXPLOT
# Replace "dataset" with your dataset name (without .xlsx)
# Replace "score" with your dependent variable R code name (example: USD)
# Replace "group" with your independent variable R code name (example: Country)


ggboxplot(A6R1, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

# QUESTION
# Answer the questions below as a comment within the R script:
# Q1) Were there any dots outside of the boxplots? These dots represent participants with extreme scores.
# Q2) If there are outliers, in your opinion are the scores of those dots changing the mean so much that the mean no longer accurately represents the average score?

# If there were no extreme outliers, treat the data as NORMAL. Continue to the Independent t-test.
# If there WERE any extreme outliers, treat the data as NOT normal. Switch to the Mann-Whitney U test.
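
# OPTIONAL: LIST THE OUTLIERS NUMERICALLY
# A minimal sketch, not part of the original template. Because add = "jitter"
# overlays the individual scores as dots on the plot, it can be hard to tell
# by eye which dots are outliers. Base R's boxplot.stats() returns the values
# that fall more than 1.5 x IQR beyond the box (the common boxplot rule).
# The values below are made up (NOT the A6R1 data).

demo_values <- c(1, 2, 3, 4, 5, 100)
demo_outliers <- boxplot.stats(demo_values)$out
demo_outliers

# With your own data, check one group at a time, for example:
# boxplot.stats(A6R1$HeadacheDays[A6R1$Medication == "A"])$out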

# INDEPENDENT T-TEST

# PURPOSE: Test if there was a difference between the means of the two groups.

# Replace "dataset" with your dataset name (without .xlsx)
# Replace "score" with your dependent variable R code name (example: USD)
# Replace "group" with your independent variable R code name (example: Country)

t.test(HeadacheDays ~ Medication, data = A6R1, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6
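
# OPTIONAL: MATCHING THE ALTERNATE HYPOTHESIS TO THE CODE
# A minimal sketch, not part of the original template. By default t.test()
# runs the NON-DIRECTIONAL test (alternative = "two.sided"). For a directional
# hypothesis, alternative = "greater" tests whether the FIRST group's mean is
# higher than the second's, and alternative = "less" tests whether it is lower.
# With the formula interface, the first group is the one that comes first in
# the factor levels (alphabetical order for text columns).
# demo_data and its columns "group" and "score" are simulated stand-ins.

set.seed(1)  # makes the simulated scores reproducible
demo_data <- data.frame(
  group = rep(c("A", "B"), each = 30),
  score = c(rnorm(30, mean = 8, sd = 3), rnorm(30, mean = 12, sd = 3))
)

# H1 option 1: there is a difference between Group A and Group B
p_two_sided <- t.test(score ~ group, data = demo_data, var.equal = TRUE)$p.value
# H1 option 2: Group A has higher scores than Group B
p_a_higher <- t.test(score ~ group, data = demo_data, var.equal = TRUE,
                     alternative = "greater")$p.value
# H1 option 3: Group B has higher scores than Group A
p_b_higher <- t.test(score ~ group, data = demo_data, var.equal = TRUE,
                     alternative = "less")$p.value

c(two_sided = p_two_sided, a_higher = p_a_higher, b_higher = p_b_higher)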
# DETERMINE STATISTICAL SIGNIFICANCE

# If results were statistically significant (p < .05), continue to effect size section below.
# If results were NOT statistically significant (p > .05), skip to reporting section below.

# NOTE: Getting results that are not statistically significant does NOT mean you switch to Mann-Whitney U.
# The Mann-Whitney U test is only for data that are not normally distributed; the choice does not depend on whether the outcome was significant.


# EFFECT-SIZE
# PURPOSE: Determine how big of a difference there was between the group means.

# INSTALL REQUIRED PACKAGE
# If never installed, remove the hashtag before the install code.
# If previously installed, leave the hashtag in front of the code.

# install.packages("effectsize")

# LOAD THE PACKAGE
# Always load the package you want to use.

library(effectsize)

# CALCULATE COHEN’S D
# Replace "dataset" with your dataset name (without .xlsx)
# Replace "score" with your dependent variable R code name (example: USD)
# Replace "group" with your independent variable R code name (example: Country)

cohen_d_result <- cohens_d(HeadacheDays ~ Medication, data = A6R1, pooled_sd = TRUE)
print(cohen_d_result)
## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.
# QUESTIONS
# Answer the questions below as a comment within the R script:

# Q1) What is the size of the effect?
# The effect means how big or small was the difference between the group averages.
# ± 0.00 to 0.19 = ignore
# ± 0.20 to 0.49 = small
# ± 0.50 to 0.79 = moderate
# ± 0.80 to 1.29 = large
# ± 1.30 and above = very large
# Example 1) A Cohen's D of 0.10 indicates the difference between the group averages was not truly meaningful. There was no effect.
# Example 2) A Cohen's D of 0.22 indicates the difference between the group averages was small.

# Q2) Which group had the higher average score?
# You will notice that this effect size is either positive or negative. This tells us whether Group A or Group B had a higher score.
# R treats the group that comes first in the factor levels (alphabetical order for text columns) as Group A, and the other group as Group B.
# However, it can be confusing to remember which group is A and which group is B.
# To make things easy, just look at the means of each group to see which group had the higher score. 
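
# OPTIONAL: COHEN'S D BY HAND
# A minimal sketch, not part of the original template, showing what
# "pooled SD" means. It uses the ROUNDED means, SDs, and sample sizes from
# the descriptive statistics output above, so the result can differ slightly
# from cohens_d(). The helper name effect_label() is hypothetical and simply
# encodes the size table listed above.

mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50   # Medication A
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50   # Medication B

# Pooled SD: a weighted average of the two group variances, then square-rooted
pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))

# Cohen's d: the difference between the means in pooled-SD units
cohens_d_by_hand <- (mean_a - mean_b) / pooled_sd
round(cohens_d_by_hand, 2)

effect_label <- function(d) {
  # The sign only shows direction, so the size is judged on the absolute value
  as.character(cut(abs(d),
                   breaks = c(0, 0.2, 0.5, 0.8, 1.3, Inf),
                   labels = c("ignore", "small", "moderate", "large", "very large"),
                   right = FALSE))
}

effect_label(cohens_d_by_hand)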


# WRITTEN REPORT FOR INDEPENDENT T-TEST
# Write a paragraph summarizing your findings.

# 1) REVIEW YOUR OUTPUT
#    Collect the information below from your output:
#    1. The name of the inferential test used (Independent t-test)
#    2. The names of the IV and DV (their proper names, not their R code names).
#    3. The sample size for each group (labeled as "n").
#    4. Whether the inferential test results were statistically significant (p < .05) or not (p > .05)
#    5. The mean and SD for each group's score on the DV (rounded to two places after the decimal)
#    6. Degrees of freedom (labeled as "df")
#    7. t-value (labeled as "t" in output)
#    8. EXACT p-value to three decimals. NOTE: If p > .05, just report p > .05. If p < .001, just report p < .001.
#    9. Effect size (Cohen’s d) ** Only if the results were significant
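
#    OPTIONAL: FORMAT THE P-VALUE FOR THE REPORT
#    A minimal sketch of the p-value reporting rule above; it is not part of
#    the original template. The function name format_p_value() is hypothetical,
#    and dropping the leading zero (".031" rather than "0.031") is an assumed
#    APA-style convention.

format_p_value <- function(p) {
  if (p < .001) return("p < .001")   # very small p-values are not reported exactly
  if (p > .05) return("p > .05")     # non-significant results
  # Otherwise report the exact value to three decimals, without the leading zero
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

format_p_value(3.431e-10)  # the p-value from the t-test output above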


# 2) REPORT YOUR DATA AS A PARAGRAPH
#    An example report is provided below. You should copy the paragraph and just edit/ replace words with your information.
#    This is not considered plagiarizing because science has a specific format for reporting information.
#    
#    EXAMPLE
#    An Independent t-test was conducted to compare 
#    exam scores between students who attended a review session (n = 60) and students who did not (n = 60). 
#    Students who attended the review session scored significantly higher (M = 85.31, SD = 6.12) than 
#    students who did not attend a review session (M = 78.21, SD = 7.42), t(118) = 4.25, p < .001.
#    The effect size was moderate (d = 0.78), indicating a moderate difference between student exam scores.
#    Overall, attending the review session resulted in higher exam scores.
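

# MANN-WHITNEY U TEST (SKETCH)
# The steps above say to switch to the Mann-Whitney U test when the data are
# not normal, but the template does not show the code. This is a minimal
# sketch, not the template's own method. It uses base R's wilcox.test(),
# which performs the Mann-Whitney U (Wilcoxon rank-sum) test for two
# independent groups; R labels the test statistic "W".
# demo_data and its columns "group" and "score" are simulated, skewed
# stand-in data (NOT the A6R1 data).

set.seed(1)  # makes the simulated scores reproducible
demo_data <- data.frame(
  group = rep(c("A", "B"), each = 30),
  score = c(rexp(30, rate = 1 / 8), rexp(30, rate = 1 / 12))
)

mann_whitney_result <- wilcox.test(score ~ group, data = demo_data)
mann_whitney_result

# With your own data the call would look like:
# wilcox.test(HeadacheDays ~ Medication, data = A6R1)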
