# DEPENDENT T-TEST & WILCOXON SIGNED-RANK TEST
# Used to test whether there is a difference between Before scores and After scores
# (the t-test compares the means; the Wilcoxon test compares the ranked differences).

# NULL HYPOTHESIS (H0)
# The null hypothesis is ALWAYS used.
# There is no difference between the Before scores and After scores.

# ALTERNATE HYPOTHESIS (H1)
# Choose ONE of the three options below (based on your research scenario):

# 1) NON-DIRECTIONAL ALTERNATE: There is a difference between the Before scores and After scores.

# 2) DIRECTIONAL ALTERNATE HYPOTHESIS ONE: Before scores are higher than After scores.
# 3) DIRECTIONAL ALTERNATE HYPOTHESIS TWO: After scores are higher than Before scores.

# ========================================================
#                >> IMPORT EXCEL FILE <<
# ========================================================

# Import your Excel dataset into R to conduct analyses.


# 1) INSTALL REQUIRED PACKAGE
#    • If never installed, remove the hashtag before the install code.
#    • If previously installed, leave the hashtag in front of the code.

#install.packages("readxl")

# ........................................................

# 2) LOAD THE PACKAGE
#    • Always reload the package you want to use. 


library(readxl)

# ........................................................

# 3) IMPORT EXCEL FILE INTO R STUDIO
#    • Download the Excel file from One Drive and save it to your desktop.
#    • Right-click the Excel file and click “Copy as path” from the menu.
#    • In RStudio, replace the example path below with your actual path.
#    • Replace backslashes \ with forward slashes / or double them \\:
#         ✘ WRONG   "C:\Users\Joseph\Desktop\mydata.xlsx"
#         ✔ CORRECT "C:/Users/Joseph/Desktop/mydata.xlsx"
#         ✔ CORRECT "C:\\Users\\Joseph\\Desktop\\mydata.xlsx"
#    • Replace "dataset" with the name of your Excel file (without the .xlsx). Here the dataset is named A6R4.


A6R4 <- read_excel("C:\\Users\\leena\\Desktop\\SLU\\Sem 3 Fall 1\\Week 6\\A6R4.xlsx")
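
# ........................................................

# OPTIONAL SKETCH: FIXING A COPIED WINDOWS PATH IN CODE
#    • Illustrative only; not part of the required steps.
#    • Uses the example path from the instructions above, not a real file.
#    • toy_path_raw and toy_path_fixed are hypothetical names chosen for this sketch.

# The copied path, typed with doubled backslashes so R can parse it
toy_path_raw <- "C:\\Users\\Joseph\\Desktop\\mydata.xlsx"

# Convert every backslash to a forward slash
toy_path_fixed <- gsub("\\", "/", toy_path_raw, fixed = TRUE)
print(toy_path_fixed)

# file.exists() returns TRUE only if the file is really at that location,
# which is a quick check to run before calling read_excel()
print(file.exists(toy_path_fixed))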


# ============================================
#   >> CALCULATE THE DIFFERENCE SCORES <<
# ============================================

# Calculate the difference between the Before scores versus the after scores.

# ............................................

# 1) RENAME THE VARIABLES
#    • Replace "dataset" with your dataset name (without .xlsx)
#    • Replace "pre" with name of your variable for before scores.
#    • Replace "post" with name of your variable for after scores.

Before <- A6R4$PreCampaignSales
After <- A6R4$PostCampaignSales

Differences <- After - Before
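
# ............................................

# OPTIONAL SKETCH: HOW DIFFERENCE SCORES BEHAVE
#    • Illustrative only, using made-up toy numbers (not the A6R4 data).
#    • toy_before, toy_after, toy_diff and toy_n_pairs are hypothetical names.
#    • Shows that a missing score in either column makes the difference NA.

toy_before <- c(10, 12, NA, 15, 11)
toy_after  <- c(13, 11, 14, 15, 18)

# Element-wise subtraction: one difference per pair
toy_diff <- toy_after - toy_before
print(toy_diff)

# Number of complete pairs (pairs where both scores are present)
toy_n_pairs <- sum(!is.na(toy_diff))
print(toy_n_pairs)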


# ========================================================
#                    >> HISTOGRAM <<
# ========================================================

# Create a histogram for difference scores to visually check skewness and kurtosis.

# .........................................................

# 1) CREATE THE HISTOGRAMS
#    • You do not need to edit this code.

hist(Differences,
     main = "Histogram of Difference Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "blue",
     border = "black",
     breaks = 20)

# ........................................................

# 2) WRITE THE REPORT
#    Answer the questions below as a comment within the R script:
#    Q1) Is the histogram symmetrical, positively skewed, or negatively skewed?
#    Ans: The histogram is positively skewed.
#    Q2) Did the histogram look too flat, too tall, or did it have a proper bell curve?
#    Ans: The histogram is too tall.
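
# ........................................................

# OPTIONAL SKETCH: PUTTING NUMBERS ON SKEWNESS AND KURTOSIS
#    • Illustrative only; the visual check above remains the required step.
#    • toy_shape() is a hypothetical helper written for this sketch. It uses the
#      simple moment-based formulas; packages may use slightly different versions.
#    • Rough reading: skewness > 0 suggests positive skew, < 0 negative skew;
#      excess kurtosis > 0 suggests "too tall", < 0 suggests "too flat".

toy_shape <- function(x) {
  x <- x[!is.na(x)]                 # drop missing values
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))        # population standard deviation
  c(skewness = mean((x - m)^3) / s^3,
    excess_kurtosis = mean((x - m)^4) / s^4 - 3)
}

toy_symmetric    <- c(1, 2, 3, 4, 5)
toy_right_skewed <- c(1, 1, 1, 2, 10)

print(toy_shape(toy_symmetric))
print(toy_shape(toy_right_skewed))

# To apply it to the real data, one could run: toy_shape(Differences)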


# ========================================================
#                >> SHAPIRO-WILK TEST <<
# ========================================================

# Check the normality for the difference between the groups.

# ........................................................

# 1) CONDUCT SHAPIRO-WILK TEST
#    • You do not need to edit the code.

shapiro.test(Differences)
## 
##  Shapiro-Wilk normality test
## 
## data:  Differences
## W = 0.94747, p-value = 0.01186
# ........................................................

# 2) WRITE THE REPORT
#    Answer the questions below as a comment within the R script:
#    Q1) Was the data normally distributed or not normally distributed?
#        If p > 0.05 (P-value is GREATER than .05) this means the data is NORMAL (continue with Dependent t-test).
#        If p < 0.05 (P-value is LESS than .05) this means the data is NOT normal (switch to Wilcoxon Sign Rank).
# Ans: p = 0.01186, which is less than 0.05, so the difference scores are not normally distributed. We will use the Wilcoxon Sign Rank test.
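
# ........................................................

# OPTIONAL SKETCH: APPLYING THE p > .05 RULE IN CODE
#    • Illustrative only, using made-up toy difference scores.
#    • toy_differences, toy_sw and toy_choice are hypothetical names.
#    • The .05 cutoff is the rule stated in Q1 above.

toy_differences <- c(-3, 1, 2, 2, 4, 5, 5, 6, 8, 40)

# shapiro.test() returns a list; the p-value is stored in $p.value
toy_sw <- shapiro.test(toy_differences)

toy_choice <- if (toy_sw$p.value > 0.05) {
  "Dependent t-test"
} else {
  "Wilcoxon Sign Rank test"
}

cat("Shapiro-Wilk p =", round(toy_sw$p.value, 4), "-> use", toy_choice, "\n")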


# ========================================================
#                     >> BOXPLOT <<
# ========================================================

# Check for any outliers impacting the mean. 

# ........................................................

# 1) CREATE THE BOXPLOT
#    • You do not need to edit this code

boxplot(Before, After,
        names = c("Before", "After"),
        main = "Boxplot of Before and After Scores",
        col = c("lightblue", "lightgreen"))

# ........................................................

# 2) WRITE THE REPORT
#    Answer the questions below as a comment within the R script:
#    Q1) Were there any dots outside of the boxplots? These dots represent participants with extreme scores.
# Ans: There were some dots outside the boxplots.
#    Q2) If there are outliers, are they changing the mean so much that the mean no longer accurately represents the average score?
# Ans: The outliers are not changing the mean very much.
#    Q3) Make a decision. If the outliers are extreme, you will need to switch to a Wilcoxon Sign Rank.
#        If there are no outliers, or the outliers are not extreme, continue with Dependent t-test.
# Ans: The distribution is not normal (p < 0.05), so we need to use the Wilcoxon Sign Rank test.
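
# ........................................................

# OPTIONAL SKETCH: LISTING THE OUTLIERS BEHIND THE DOTS
#    • Illustrative only, using made-up toy scores.
#    • boxplot.stats() is the base R function that boxplot() uses to decide
#      which points are drawn as dots (1.5 x IQR rule by default).
#    • toy_scores, toy_outliers and toy_mean_* are hypothetical names.

toy_scores <- c(10, 12, 11, 13, 12, 11, 40)

# $out holds the values that would appear as dots on the boxplot
toy_outliers <- boxplot.stats(toy_scores)$out
print(toy_outliers)

# Compare the mean with and without the flagged values to judge their impact
toy_mean_all   <- mean(toy_scores)
toy_mean_clean <- mean(toy_scores[!toy_scores %in% toy_outliers])
cat("Mean with outliers:", toy_mean_all, " Mean without:", toy_mean_clean, "\n")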



# ========================================================
#               >> DESCRIPTIVE STATISTICS <<
# ========================================================

# Calculate the mean, median, SD, and sample size for each group.

# ........................................................

# 1) DESCRIPTIVES FOR BEFORE SCORES
#    • You do not need to edit this code

mean(Before, na.rm = TRUE)
## [1] 25154.53
median(Before, na.rm = TRUE)
## [1] 24624
sd(Before, na.rm = TRUE)
## [1] 12184.4
length(Before)
## [1] 60
# ........................................................

# 2) DESCRIPTIVES FOR AFTER SCORES
#    • You do not need to edit this code

mean(After, na.rm = TRUE)
## [1] 26873.45
median(After, na.rm = TRUE)
## [1] 25086
sd(After, na.rm = TRUE)
## [1] 14434.37
length(After)
## [1] 60
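
# ........................................................

# OPTIONAL SKETCH: ALL DESCRIPTIVES IN ONE CALL
#    • Illustrative only, using made-up toy scores.
#    • toy_describe() is a hypothetical helper written for this sketch.
#    • Note that length() counts missing values, while sum(!is.na(x)) counts
#      only the scores that are actually present.

toy_describe <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    n      = sum(!is.na(x)))
}

toy_scores_na <- c(10, 12, NA, 15, 11)

print(toy_describe(toy_scores_na))
print(length(toy_scores_na))   # includes the NA

# To apply it to the real data, one could run: toy_describe(Before); toy_describe(After)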
# ========================================================
#     >> DEPENDENT T-TEST & WILCOXON SIGN RANK TEST <<
# ========================================================

# Check if the means from Before and After are different.

# ........................................................

# 1) CHOOSE THE TEST
#    • If difference scores were normally distributed, use Dependent t-test.
#    • If difference scores were NOT normally distributed, use Wilcoxon Sign Rank test.
# Ans: Wilcoxon Sign Rank Test

# ........................................................

# 2) CONDUCT THE PROPER TEST
#    • Replace "dataset" with your dataset name (without .xlsx)
#    • Replace "score" with your dependent variable R code name (example: USD)
#    • Replace "group" with your independent variable R code name (example: Country)


# OPTION 1: DEPENDENT T-TEST
#           • Note: The Dependent t-test is also called the Paired Samples t-test.
#           • Remove the hashtag to use the code
#           • There are no other edits you need to make to the code.

# t.test(Before, After, paired = TRUE)


# OPTION 2: WILCOXON SIGN RANK TEST
#           • The code below is already active (no hashtag) because this test was chosen.
#           • There are no other edits you need to make to the code.

wilcox.test(Before, After, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  Before and After
## V = 640, p-value = 0.0433
## alternative hypothesis: true location shift is not equal to 0
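
# .......................................................

# OPTIONAL SKETCH: TESTING A DIRECTIONAL ALTERNATE HYPOTHESIS
#    • Illustrative only, using made-up toy scores (not the A6R4 data).
#    • The test above is two-sided (non-directional). The "alternative" argument
#      of wilcox.test() and t.test() selects a directional hypothesis instead.
#    • With Before as the first argument:
#        alternative = "less"    -> Before is lower, so After scores are higher
#        alternative = "greater" -> Before scores are higher
#    • toy_pre, toy_post, toy_two_sided and toy_one_sided are hypothetical names.

toy_pre  <- c(10, 12, 9, 15, 11, 14)
toy_post <- c(13, 16, 14, 21, 18, 22)

toy_two_sided <- wilcox.test(toy_pre, toy_post, paired = TRUE)
toy_one_sided <- wilcox.test(toy_pre, toy_post, paired = TRUE,
                             alternative = "less")

cat("Two-sided p:", toy_two_sided$p.value, "\n")
cat("One-sided p:", toy_one_sided$p.value, "\n")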
# .......................................................

# 3) DETERMINE STATISTICAL SIGNIFICANCE
#    • If results were statistically significant (p < .05), continue to effect size section below.
#    • If results were NOT statistically significant (p > .05), skip to reporting section below.
#    • NOTE: Getting results that are not statistically significant does NOT mean you switch to Wilcoxon Sign Rank.
#      The Wilcoxon Sign Rank test is chosen only when the data are not normally distributed, not because of outcome significance.


# ========================================================
#     >> EFFECT SIZE FOR WILCOXON SIGN RANK TEST <<
# ========================================================

# Determine how big of a difference there was between the group means.

# ........................................................

# 1) INSTALL REQUIRED PACKAGE
#    - If never installed, remove the hashtag before the install code.
#    - If previously installed, leave the hashtag in front of the code.

#install.packages("rstatix")
#install.packages("coin")

# ........................................................

# 2) LOAD THE PACKAGE
#    Always reload the package you want to use. 

library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
# ........................................................

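# 3) CALCULATE RANK BISERIAL CORRELATION (EFFECT SIZE)
#    - You do not need to edit this code.
#    - wilcox_effsize() needs long-format data (one row per score), so the
#      Before and After scores are stacked into one data frame first.
#    - Note: the rstatix documentation describes this value as r = Z / sqrt(N);
#      this script refers to it as the Rank Biserial Correlation.

df_long <- data.frame(
  id = rep(seq_along(Before), 2),
  time = rep(c("Before", "After"), each = length(Before)),
  score = c(Before, After)
)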


wilcox_effsize(df_long, score ~ time, paired = TRUE)
## # A tibble: 1 × 7
##   .y.   group1 group2 effsize    n1    n2 magnitude
## * <chr> <chr>  <chr>    <dbl> <int> <int> <ord>    
## 1 score After  Before   0.261    60    60 small
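
# ........................................................

# OPTIONAL SKETCH: TWO EFFECT SIZES COMPUTED BY HAND
#    • Illustrative only, using made-up toy scores (not the A6R4 data).
#    • Shows (a) the matched-pairs rank-biserial correlation and
#      (b) r = |Z| / sqrt(N) using a plain normal approximation.
#    • Assumption: no tie or continuity correction is applied here, so (b) may
#      differ somewhat from what wilcox_effsize() reports.
#    • All toy_* names are hypothetical.

toy_pre_es  <- c(10, 12, 9, 15, 11, 14)
toy_post_es <- c(13, 11, 14, 21, 18, 22)

toy_d <- toy_post_es - toy_pre_es        # After minus Before
toy_d <- toy_d[toy_d != 0]               # zero differences carry no sign
toy_ranks <- rank(abs(toy_d))            # rank the absolute differences

toy_w_pos <- sum(toy_ranks[toy_d > 0])   # sum of ranks where After is higher
toy_w_neg <- sum(toy_ranks[toy_d < 0])   # sum of ranks where Before is higher
toy_n <- length(toy_d)

# (a) Matched-pairs rank-biserial correlation
toy_rank_biserial <- (toy_w_pos - toy_w_neg) / (toy_w_pos + toy_w_neg)

# (b) r = |Z| / sqrt(N), with Z from the normal approximation of the rank sum
toy_mu    <- toy_n * (toy_n + 1) / 4
toy_sigma <- sqrt(toy_n * (toy_n + 1) * (2 * toy_n + 1) / 24)
toy_z     <- (toy_w_pos - toy_mu) / toy_sigma
toy_r     <- abs(toy_z) / sqrt(toy_n)

cat("Rank-biserial:", round(toy_rank_biserial, 3),
    " r = |Z|/sqrt(N):", round(toy_r, 3), "\n")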
# ........................................................

# 4) WRITE THE REPORT
#    Answer the questions below as a comment within the R script:
#
#    Q1) What is the size of the effect?
#        ± 0.00 to 0.09  = no meaningful effect
#        ± 0.10 to 0.29  = small
#        ± 0.30 to 0.49  = moderate
#        ± 0.50 to 1.00  = large
#        Examples:
#            A Rank Biserial Correlation of 0.05 indicates the difference between the group averages was not truly meaningful. There was no effect.
#            A Rank Biserial Correlation of 0.22 indicates the difference between the group averages was small.
# Ans: 0.261 = small effect.
#  
#    Q2) Which group had the higher average score?
#        - With the way we calculated differences (After minus Before), a positive average difference means the After scores were higher.
#        - A negative average difference means the Before scores were higher.
#        - You can also easily look at the means and tell which scores were higher.
# Ans: After > Before; the After scores are higher.

# ================================================================================================
# Research Report on Results: Wilcoxon Signed-Rank Test
# ================================================================================================

# Goal: Write a paragraph summarizing your findings

# Directions:

# For your results summary, you should report the following information:
# 1. The name of the inferential test used
# Ans: Wilcoxon Signed-Rank Test

# 2. The names of the two related conditions or time points you analyzed
# Ans: Pre-Campaign Sales and Post-Campaign Sales

# 3. The sample size (n)
# Ans: n = 60

# 4. Whether the test was statistically significant (p < .05) or not (p > .05)
# Ans: Yes, the test was statistically significant (p = 0.0433 < 0.05)

# 5. The median for each condition
# Ans: Pre-Campaign Sales Median = 24,624; Post-Campaign Sales Median = 25,086

# 6. Whether scores significantly increased, decreased, or stayed the same
# Ans: Scores significantly increased after the campaign

# 7. The test statistic (W or V depending on your R output)
# Ans: V = 640

# 8. The p-value (exact if > .001, or report p < .001)
# Ans: p = 0.0433

# 9. If significant, the direction of the difference
# Ans: Post-Campaign Sales were significantly higher than Pre-Campaign Sales

# 10. The effect size (Rank Biserial Correlation) and its interpretation
# Ans: Rank Biserial Correlation = 0.261 → Small effect
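
# OPTIONAL SKETCH: FORMATTING THE P-VALUE FOR THE REPORT
#    • Illustrative only; follows the rule in item 8 above
#      (exact value if p is at least .001, otherwise "p < .001").
#    • toy_format_p() is a hypothetical helper written for this sketch.

toy_format_p <- function(p) {
  if (p < 0.001) {
    "p < .001"
  } else {
    # Round to three decimals and drop the leading zero
    paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
  }
}

print(toy_format_p(0.0433))
print(toy_format_p(0.00004))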

# ================================================================================================
# Ans:
# A Wilcoxon Signed-Rank Test was conducted to compare sales before and after the
# marketing campaign among 60 participants. Median post-campaign sales (Md = 25,086)
# were significantly higher than pre-campaign sales (Md = 24,624), V = 640, p = .043.
# The effect size was r = 0.26, indicating a small effect. These results suggest that
# sales showed a statistically significant but modest increase following the campaign.