# DEPENDENT T-TEST & WILCOXON SIGN RANK
# Used to test if there is a difference between Before Training and After Training (comparing the means).

# NULL HYPOTHESIS (H0)
# The null hypothesis is ALWAYS used.
# There is no difference between the Before Training and After Training scores.

# ALTERNATE HYPOTHESIS (H1)
# Choose ONE of the three options below (based on your research scenario):

# 1) NON-DIRECTIONAL ALTERNATE: There is a difference between the Before Training and After Training scores.

# 2) DIRECTIONAL ALTERNATE HYPOTHESIS ONE: Before Training is higher than After Training.
# 3) DIRECTIONAL ALTERNATE HYPOTHESIS TWO: After Training is higher than Before Training.
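
# ........................................................

# ILLUSTRATIVE SKETCH (not part of the assignment template)
#    • The three alternate hypotheses above correspond to the "alternative"
#      argument of t.test(). This is a minimal sketch of that mapping.
#    • ex_before and ex_after are made-up scores, not the A6R3 data.
#    • With Before listed first, the test works on Before minus After,
#      so "less" matches "After is higher than Before".

ex_before <- c(55, 60, 58, 62, 57, 61)
ex_after  <- c(63, 66, 65, 70, 64, 69)

# 1) Non-directional: there is a difference
p_two <- t.test(ex_before, ex_after, paired = TRUE,
                alternative = "two.sided")$p.value

# 2) Directional one: Before is higher than After
p_greater <- t.test(ex_before, ex_after, paired = TRUE,
                    alternative = "greater")$p.value

# 3) Directional two: After is higher than Before
p_less <- t.test(ex_before, ex_after, paired = TRUE,
                 alternative = "less")$p.value

# Compare the three p-values side by side
c(two.sided = p_two, greater = p_greater, less = p_less)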

# ========================================================
#                >> IMPORT EXCEL FILE <<
# ========================================================

# Import your Excel dataset into R to conduct analyses.


# 1) INSTALL REQUIRED PACKAGE
#    • If never installed, remove the hashtag before the install code.
#    • If previously installed, leave the hashtag in front of the code.

#install.packages("readxl")

# ........................................................

# 2) LOAD THE PACKAGE
#    • Always reload the package you want to use. 


library(readxl)

# ........................................................

# 3) IMPORT EXCEL FILE INTO R STUDIO

A6R3 <- read_excel("C:\\Users\\leena\\Desktop\\SLU\\Sem 3 Fall 1\\Week 6\\A6R3.xlsx")
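
# ........................................................

# ILLUSTRATIVE SKETCH (not part of the assignment template)
#    • After importing, it can help to confirm that the expected columns exist
#      before any analysis runs.
#    • check_columns() is a hypothetical helper written for this sketch.
#    • ex_data is a made-up stand-in for an imported dataset.

check_columns <- function(data, needed) {
  # List any required column names that are absent from the data
  missing_cols <- setdiff(needed, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

ex_data <- data.frame(PreTraining = c(55, 60), PostTraining = c(63, 66))
check_columns(ex_data, c("PreTraining", "PostTraining"))

# The same check could be applied to the real import, for example:
# check_columns(A6R3, c("PreTraining", "PostTraining"))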


# ============================================
#   >> CALCULATE THE DIFFERENCE SCORES <<
# ============================================

# Calculate the difference between the Before scores and the After scores.

# ............................................

# 1) RENAME THE VARIABLES
#    • Replace "dataset" with your dataset name (without .xlsx)
#    • Replace "pre" with name of your variable for before scores.
#    • Replace "post" with name of your variable for after scores.

Before <- A6R3$PreTraining
After <- A6R3$PostTraining

Differences <- After - Before
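
# ........................................................

# ILLUSTRATIVE SKETCH (not part of the assignment template)
#    • A difference score only exists when a participant has BOTH scores.
#    • This sketch keeps complete pairs only, using made-up vectors
#      (ex_pre and ex_post are not the A6R3 data).

ex_pre  <- c(55, 60, NA, 62)
ex_post <- c(63, NA, 65, 70)

# TRUE only where neither score is missing
ex_complete <- complete.cases(ex_pre, ex_post)

# After minus Before, restricted to complete pairs
ex_diff <- ex_post[ex_complete] - ex_pre[ex_complete]

sum(ex_complete)   # number of usable pairs
ex_diff            # difference scores for those pairs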


# ========================================================
#                    >> HISTOGRAM <<
# ========================================================

# Create a histogram for difference scores to visually check skewness and kurtosis.

# .........................................................

# 1) CREATE THE HISTOGRAMS
#    • You do not need to edit this code.

hist(Differences,
     main = "Histogram of Difference Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "blue",
     border = "black",
     breaks = 20)

# ........................................................

# 2) WRITE THE REPORT
#    Answer the questions below as a comment within the R script:
#    Q1) Is the histogram symmetrical, positively skewed, or negatively skewed?
#Ans: The histogram is symmetrical.
#    Q2) Did the histogram look too flat, too tall, or did it have a proper bell curve?
#Ans: It has a proper bell curve.
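
# ........................................................

# ILLUSTRATIVE SKETCH (not part of the assignment template)
#    • The visual check above can be backed up with numbers.
#    • shape_stats() is a hypothetical helper that computes moment-based
#      skewness and excess kurtosis in base R (population formulas).
#    • Values near 0 for both suggest a roughly symmetrical bell shape.

shape_stats <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  pop_sd <- sqrt(mean(centered^2))   # population standard deviation
  z <- centered / pop_sd
  c(skewness = mean(z^3), excess_kurtosis = mean(z^4) - 3)
}

# Made-up example: evenly spaced values are symmetrical but flatter than a bell curve
shape_stats(c(1, 2, 3, 4, 5))

# It could be applied to the difference scores with: shape_stats(Differences)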


# ========================================================
#                >> SHAPIRO-WILK TEST <<
# ========================================================

# Check the normality of the difference scores.

# ........................................................

# 1) CONDUCT SHAPIRO-WILK TEST
#    • You do not need to edit the code.

shapiro.test(Differences)
## 
##  Shapiro-Wilk normality test
## 
## data:  Differences
## W = 0.98773, p-value = 0.21
# ........................................................

# 2) WRITE THE REPORT
#    Answer the questions below as a comment within the R script:
#    Q1) Was the data normally distributed or abnormally distributed?
#        If p > 0.05 (P-value is GREATER than .05) this means the data is NORMAL (continue with Dependent t-test).
#        If p < 0.05 (P-value is LESS than .05) this means the data is NOT normal (switch to Wilcoxon Sign Rank).
# Ans: Data is normally distributed. (W= 0.988, p=0.21 > 0.05)
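
# ........................................................

# ILLUSTRATIVE SKETCH (not part of the assignment template)
#    • The decision rule above can be written as a small function.
#    • choose_paired_test() is a hypothetical helper; alpha = 0.05 follows the rule above.
#    • ex_normal and ex_skewed are made-up difference scores.

choose_paired_test <- function(differences, alpha = 0.05) {
  sw <- shapiro.test(differences)
  if (sw$p.value > alpha) "Dependent t-test" else "Wilcoxon Sign Rank"
}

ex_normal <- qnorm(ppoints(30))   # evenly spaced normal quantiles
ex_skewed <- exp(2 * ex_normal)   # strongly right-skewed values

choose_paired_test(ex_normal)
choose_paired_test(ex_skewed)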


# ========================================================
#                     >> BOXPLOT <<
# ========================================================

# Check for any outliers impacting the mean. 

# ........................................................

# 1) CREATE THE BOXPLOT
#    • You do not need to edit this code

boxplot(Before, After,
        names = c("Before", "After"),
        main = "Boxplot of Before and After Scores",
        col = c("lightblue", "lightgreen"))

# ........................................................

# 2) WRITE THE REPORT
#    Answer the questions below as a comment within the R script:
#    Q1) Were there any dots outside of the boxplots? These dots represent participants with extreme scores.
#Ans: No
#    Q2) If there are outliers, are they changing the mean so much that the mean no longer accurately represents the average score?
#Ans: No, the mean and median are nearly identical, so the averages represent the data well.
#    Q3) Make a decision. If the outliers are extreme, you will need to switch to a Wilcoxon Sign Rank. 
#        If there are not outliers, or the outliers are not extreme, continue with Dependent t-test.
#Ans: Continuing with the Dependent t-test
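
# ........................................................

# ILLUSTRATIVE SKETCH (not part of the assignment template)
#    • The dots on a boxplot can also be listed as values.
#    • boxplot.stats() applies the same 1.5 x IQR rule that boxplot() uses by default.
#    • flag_outliers() is a hypothetical helper; ex_scores is made-up data.

flag_outliers <- function(x) {
  boxplot.stats(x)$out   # values beyond the whiskers
}

ex_scores <- c(58, 60, 61, 59, 62, 60, 95)
flag_outliers(ex_scores)
## [1] 95

# It could be applied to each set of scores: flag_outliers(Before); flag_outliers(After)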



# ========================================================
#               >> DESCRIPTIVE STATISTICS <<
# ========================================================

# Calculate the mean, median, SD, and sample size for each group.

# ........................................................

# 1) DESCRIPTIVES FOR BEFORE SCORES
#    • You do not need to edit this code

mean(Before, na.rm = TRUE)
## [1] 59.73333
median(Before, na.rm = TRUE)
## [1] 60
sd(Before, na.rm = TRUE)
## [1] 7.966091
length(Before)
## [1] 150
# ........................................................

# 2) DESCRIPTIVES FOR AFTER SCORES
#    • You do not need to edit this code

mean(After, na.rm = TRUE)
## [1] 69.24
median(After, na.rm = TRUE)
## [1] 69.5
sd(After, na.rm = TRUE)
## [1] 9.481653
length(After)
## [1] 150
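
# ........................................................

# ILLUSTRATIVE SKETCH (not part of the assignment template)
#    • The four descriptives above can be collected in one call.
#    • describe_scores() is a hypothetical helper.
#    • Note: length() counts missing values too, so this sketch counts
#      only non-missing scores for n.

describe_scores <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    n      = sum(!is.na(x)))
}

# Made-up example with one missing score
describe_scores(c(50, 60, 70, NA))

# It could be applied to the real scores: describe_scores(Before); describe_scores(After)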
# ========================================================
#     >> DEPENDENT T-TEST & WILCOXON SIGN RANK TEST <<
# ========================================================

# Check if the means from Before and After are different.

# ........................................................

# 1) CHOOSE THE TEST
#    • If difference scores were normally distributed, use Dependent t-test.
#    • If difference scores were NOT normally distributed, use Wilcoxon Sign Rank test.
#Ans: Dependent t-test
# ........................................................

# 2) CONDUCT THE PROPER TEST
#    • Replace "dataset" with your dataset name (without .xlsx)
#    • Replace "score" with your dependent variable R code name (example: USD)
#    • Replace "group" with your independent variable R code name (example: Country)


# OPTION 1: DEPENDENT T-TEST
#           • Note: The Dependent t-test is also called the Paired Samples t-test.
#           • Remove the hashtag to use the code
#           • There are no other edits you need to make to the code.

t.test(Before, After, paired = TRUE)
## 
##  Paired t-test
## 
## data:  Before and After
## t = -23.285, df = 149, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -10.313424  -8.699909
## sample estimates:
## mean difference 
##       -9.506667
# OPTION 2: WILCOXON SIGN RANK TEST
#           • Remove the hashtag to use the code
#           • There are no other edits you need to make to the code.

#wilcox.test(Before, After, paired = TRUE)


# .......................................................

# 3) DETERMINE STATISTICAL SIGNIFICANCE
#    • If results were statistically significant (p < .05), continue to effect size section below.
#    • If results were NOT statistically significant (p > .05), skip to reporting section below.
#    • NOTE: Getting results that are not statistically significant does NOT mean you switch to Wilcoxon Sign Rank.
#      The Wilcoxon Sign Rank test is only for abnormally distributed data — not based on outcome significance.
#Ans: Results were statistically significant, p < 0.001
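
# ........................................................

# ILLUSTRATIVE SKETCH (not part of the assignment template)
#    • The values needed for the report can be pulled out of the test result
#      instead of being copied by hand.
#    • ex_before and ex_after are made-up scores, not the A6R3 data.
#    • The last lines show that the paired test equals a one-sample test on
#      Before minus After, which is why t is negative when After is higher.

ex_before <- c(55, 60, 58, 62, 57, 61)
ex_after  <- c(63, 66, 65, 70, 64, 69)

ex_result <- t.test(ex_before, ex_after, paired = TRUE)

ex_result$statistic   # t-value
ex_result$parameter   # degrees of freedom (number of pairs minus 1)
ex_result$p.value     # p-value
ex_result$conf.int    # 95% confidence interval of the mean difference
ex_result$estimate    # mean difference (Before minus After)

# Same t-value from a one-sample t-test on the difference scores
ex_check <- t.test(ex_before - ex_after)
all.equal(unname(ex_result$statistic), unname(ex_check$statistic))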


# ========================================================
#        >> EFFECT SIZE FOR DEPENDENT T-TEST <<
# ========================================================

# Determine how big of a difference there was between the group means.
# • Remove the hashtags to use the code below.

# ........................................................

# 1) INSTALL REQUIRED PACKAGE
#    • If never installed, remove the hashtag before the install code.
#    • If previously installed, leave the hashtag in front of the code.

#install.packages("effectsize")

# ........................................................

# 2) LOAD THE PACKAGE
#    Always reload the package you want to use. 

library(effectsize)

# ........................................................

# 3) CALCULATE COHEN’S D
#    • You do not need to edit the code

cohens_d(Before, After, paired = TRUE)
## For paired samples, 'repeated_measures_d()' provides more options.
## Cohen's d |         95% CI
## --------------------------
## -1.90     | [-2.17, -1.63]
# ........................................................

# 4) WRITE THE REPORT
#    Answer the questions below as a comment within the R script:
#
#    Q1) What is the size of the effect?
#        The effect means how big or small was the difference between the group averages.
#         ± 0.00 to 0.19 = ignore
#         ± 0.20 to 0.49 = small
#         ± 0.50 to 0.79 = moderate
#         ± 0.80 to 1.29 = large
#         ± 1.30 to +   = very large
#        Examples:
#            A Cohen's d of 0.10 indicates the difference between the group averages was not truly meaningful. There was no effect.
#            A Cohen's d of 0.22 indicates the difference between the group averages was small.
#Ans:  t = -23.285, n = 150
#     Cohen's d = -1.90; its absolute value (1.90) indicates a very large effect.
#
#     Q2) Which group had the higher average score?
#         - t.test() and cohens_d() were run as (Before, After), so they work on Before minus After: a negative value means the After scores were higher.
#         - A positive value means the Before scores were higher. (The Differences variable is After minus Before, so its sign is the opposite.)
#         - You can also easily look at the means and tell which scores were higher.
#Ans: Before Training: M = 59.73, SD = 7.97
#     After Training: M = 69.24, SD = 9.48
# The After Training scores were higher.
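
# ........................................................

# ILLUSTRATIVE SKETCH (not part of the assignment template)
#    • Assumption: for paired data, Cohen's d here is the mean of the difference
#      scores divided by their standard deviation. The output above is consistent
#      with this, because that version of d also equals t / sqrt(n).
#    • paired_d() is a hypothetical helper, not a function from the effectsize package.

paired_d <- function(x, y) {
  d <- x - y
  d <- d[!is.na(d)]
  mean(d) / sd(d)
}

# Made-up example: differences are -1, -2, -3 (mean = -2, SD = 1)
paired_d(c(1, 2, 3), c(2, 4, 6))
## [1] -2

# Check against the reported results: t = -23.285 with n = 150 pairs
round(-23.285 / sqrt(150), 2)
## [1] -1.9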

# ================================================================================================
# Research Report on Results: Dependent t-test (Paired Samples t-test)
# ================================================================================================

# Goal: Write a paragraph summarizing your findings

# Directions:

# For your results summary, you should report the following information:
# 1. The name of the inferential test used (Dependent t-test or Paired Samples t-test) -> Dependent t-test
# 2. The names of the two related conditions or time points you analyzed (use proper labels) -> Before training versus After training
# 3. The sample size (n) -> n=150
# 4. Whether the test was statistically significant (p < .05) or not (p > .05) -> p< 0.001
# 5. The mean (M) and standard deviation (SD) for each condition -> Before Training M = 59.73, SD = 7.97
# After Training M = 69.24, SD = 9.48
# 6. Whether scores significantly increased, decreased, or stayed the same across time/conditions
#Ans: Scores increased significantly.
# 7. Degrees of freedom (df)
#Ans: df = 149
# 8. t-value
#Ans: t = -23.29
# 9. p-value (exact value if > .001, or p < .001)
#Ans: p< 0.001
# 10. If there was a significant difference, report the effect size (Cohen’s d) and interpretation (small, medium, large)
#Ans: Effect size: Cohen's d = -1.90 (absolute value 1.90, very large)

# ================================================================================================
# Paragraph
# A dependent t-test was conducted to compare scores before and after training among 150 participants. Results showed that post-training scores (M = 69.24, SD = 9.48) were significantly higher than pre-training scores (M = 59.73, SD = 7.97), t(149) = -23.29, p < .001. The mean difference (Before minus After) was -9.51 points, with a 95% confidence interval from -10.31 to -8.70. The effect size was Cohen's d = -1.90 (absolute value 1.90), indicating a very large effect. These results suggest that the training program produced a substantial improvement in participant scores.
#===================================================================================================
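
# ILLUSTRATIVE SKETCH (not part of the assignment template)
#    • The statistics in the paragraph above follow a fixed pattern, t(df) = value, p.
#    • format_t_report() is a hypothetical helper that builds that string,
#      using "p < .001" for very small p-values and dropping the leading zero otherwise.

format_t_report <- function(t_value, df, p_value) {
  p_text <- if (p_value < .001) {
    "p < .001"
  } else {
    sub("0\\.", ".", sprintf("p = %.3f", p_value))
  }
  sprintf("t(%d) = %.2f, %s", as.integer(df), t_value, p_text)
}

# Made-up values
format_t_report(2.5, 10, 0.03)
## [1] "t(10) = 2.50, p = .030"
format_t_report(-4, 20, 0.00002)
## [1] "t(20) = -4.00, p < .001"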
# DEPENDENT T-TEST
# Note: The Dependent t-test is also called the Paired Samples t-test.
# Remove the hashtag to use the code
# There are no other edits you need to make to the code.

t.test(Before, After, paired = TRUE)
## 
##  Paired t-test
## 
## data:  Before and After
## t = -23.285, df = 149, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -10.313424  -8.699909
## sample estimates:
## mean difference 
##       -9.506667
# DETERMINE STATISTICAL SIGNIFICANCE
# If results were statistically significant (p < .05), continue to effect size section below.
# If results were NOT statistically significant (p > .05), skip to reporting section below.
# NOTE: Getting results that are not statistically significant does NOT mean you switch to Wilcoxon Sign Rank.
# The Wilcoxon Sign Rank test is only for abnormally distributed data — not based on outcome significance.


# EFFECT SIZE FOR DEPENDENT T-TEST

# Purpose: Determine how big of a difference there was between the group means.

# INSTALL REQUIRED PACKAGE
# If never installed, remove the hashtag before the install code.
# If previously installed, leave the hashtag in front of the code.

#install.packages("effectsize")

# LOAD THE PACKAGE
# Always reload the package you want to use. 

library(effectsize)

# CALCULATE COHEN’S D
# You do not need to edit the code.
# Just remove the hashtag.

cohens_d(Before, After, paired = TRUE)
## For paired samples, 'repeated_measures_d()' provides more options.
## Cohen's d |         95% CI
## --------------------------
## -1.90     | [-2.17, -1.63]
# QUESTIONS
# Answer the questions below as a comment within the R script:
#
# Q1) What is the size of the effect?
# The effect means how big or small was the difference between the group averages.
# ± 0.00 to 0.19 = ignore
# ± 0.20 to 0.49 = small
# ± 0.50 to 0.79 = moderate
# ± 0.80 to 1.29 = large
# ± 1.30 to +   = very large
# Ans: Cohen’s d = -1.90. The absolute value of 1.90 indicates a VERY LARGE effect size.
#
# Q2) Which group had the higher average score?
# Ans: The After Training group had the higher average (M = 69.24, SD = 9.48) compared to Before Training (M = 59.73, SD = 7.97).

# Research Report on Results: Dependent t-test
# Goal: Write a paragraph summarizing your findings

# Directions:

# For your results summary, you should report the following information:
# 1. The name of the inferential test used (Dependent t-test or Paired Samples t-test)
# Ans: Dependent t-test (Paired Samples t-test)
# 2. The names of the two related conditions or time points you analyzed (use proper labels)
# Ans: Before Training and After Training
# 3. The sample size (n)
# Ans: n = 150 participants
# 4. Whether the test was statistically significant (p < .05) or not (p > .05)
# Ans: Yes, statistically significant (p < .001)
# 5. The mean (M) and standard deviation (SD) for each condition
# Ans: Before Training: M = 59.73, SD = 7.97
# After Training: M = 69.24, SD = 9.48
# 6. Whether scores significantly increased, decreased, or stayed the same across time/conditions
# Ans: Scores significantly increased after training
# 7. Degrees of freedom (df)
# Ans: df = 149
# 8. t-value
# Ans: t = -23.29
# 9. p-value (exact value if > .001, or p < .001)
# Ans: p < .001
# 10. If there was a significant difference, report the effect size (Cohen’s d) and interpretation (small, medium, large)
# Ans: Cohen's d = -1.90 (absolute value 1.90), very large effect

# Answer:
# A dependent t-test was conducted to compare training scores before and after training among 150 participants. 
# Results showed that post-training scores (M = 69.24, SD = 9.48) were significantly higher than pre-training scores 
# (M = 59.73, SD = 7.97), t(149) = -23.29, p < .001. 
# The mean difference (Before minus After) was -9.51 points. 
# The effect size was Cohen's d = -1.90 (absolute value 1.90), indicating a very large effect. 
# These findings suggest that the training program produced a substantial improvement in participant scores.