# INDEPENDENT T-TEST & MANN-WHITNEY U TEST

# H0: There is no difference in the average satisfaction scores between customers served by human agents and those served by the AI chatbot.
# H1: The average satisfaction scores are higher for customers served by human agents than those served by the AI chatbot. 

# ========================================================
#                >> IMPORT EXCEL FILE <<
# ========================================================

#install.packages("readxl")

# LOAD THE PACKAGE

library(readxl)

A6R2 <- read_excel("C:/Users/lesle/Downloads/A6R2.xlsx")

# DESCRIPTIVE STATISTICS
# PURPOSE: Calculate the mean, median, SD, and sample size for each group.

#install.packages("dplyr")

# LOAD THE PACKAGE

library(dplyr)
# CALCULATE THE DESCRIPTIVE STATISTICS

A6R2 %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
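# Base-R cross-check (a sketch, not part of the original template): the same
# per-group summary can be computed without dplyr using split() and lapply().
# "demo" is a HYPOTHETICAL simulated dataset standing in for A6R2; these are
# not the real scores. Swap in A6R2 to summarise the actual data.
set.seed(42)
demo <- data.frame(
  ServiceType       = rep(c("AI", "Human"), each = 50),
  SatisfactionScore = sample(1:10, 100, replace = TRUE)  # simulated scores
)

# One row per group (AI first, then Human): Mean, Median, SD, and N.
# N counts all rows in the group, like dplyr::n().
desc <- do.call(rbind, lapply(split(demo$SatisfactionScore, demo$ServiceType), function(x) {
  data.frame(
    Mean   = mean(x, na.rm = TRUE),
    Median = median(x, na.rm = TRUE),
    SD     = sd(x, na.rm = TRUE),
    N      = length(x)
  )
}))
desc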
## # A tibble: 2 × 5
##   ServiceType  Mean Median    SD     N
##   <chr>       <dbl>  <dbl> <dbl> <int>
## 1 AI           3.6       3  1.60   100
## 2 Human        7.42      8  1.44   100
# HISTOGRAMS
# Purpose: Visually check the normality of the scores for each group.
# CREATE THE HISTOGRAMS
# Replace "dataset" with your dataset name (without .xlsx). Here: A6R2
# Replace "score" with your dependent variable R code name (example: USD). Here: SatisfactionScore
# Replace "group" with your independent variable R code name (example: Country). Here: ServiceType
# Replace "Group1" with the R code name for your first group (example: USA). Here: Human
# Replace "Group2" with the R code name for your second group (example: India). Here: AI

hist(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"],
     main = "Histogram of Human Scores",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "green",
     border = "black",
     breaks = 20)

hist(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"],
     main = "Histogram of AI Scores",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "yellow",
     border = "black",
     breaks = 20)

# QUESTIONS
# Answer the questions below as comments within the R script:

# Q1) Check the SKEWNESS of the VARIABLE 1 (Human) histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Ans: Negatively skewed.
# Q2) Check the KURTOSIS of the VARIABLE 1 (Human) histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# Ans: Too tall.
# Q3) Check the SKEWNESS of the VARIABLE 2 (AI) histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Ans: Positively skewed.
# Q4) Check the KURTOSIS of the VARIABLE 2 (AI) histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# Ans: Too flat.
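# The skewness and kurtosis judgements above are visual. As a rough numeric
# cross-check (an addition to the template, not required by it), the sketch
# below computes moment-based sample skewness and excess kurtosis in base R.
# skewness_g1() and excess_kurtosis_g2() are HYPOTHETICAL helper names.
# Rough reading: skewness < 0 = negatively skewed, > 0 = positively skewed;
# excess kurtosis > 0 = taller/heavier-tailed than normal, < 0 = flatter.
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)  # second central moment
  m3 <- mean((x - mean(x))^3)  # third central moment
  m3 / m2^1.5
}

excess_kurtosis_g2 <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)  # second central moment
  m4 <- mean((x - mean(x))^4)  # fourth central moment
  m4 / m2^2 - 3                # subtract 3 so a normal curve scores about 0
}

# Toy vectors (not the A6R2 data):
skewness_g1(1:5)                 # symmetric, so 0
skewness_g1(c(1, 1, 1, 2, 10))   # long right tail, so positive
excess_kurtosis_g2(1:5)          # evenly spread values: flatter than normal (-1.3)

# With the real data, for example:
# skewness_g1(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])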

# SHAPIRO-WILK TEST
# Purpose: Check the normality for each group's score statistically.
# The Shapiro-Wilk Test checks whether the overall shape of the distribution departs from normal, so it is sensitive to skewness and kurtosis at the same time.
# The test asks: "Could this variable have come from a normal distribution (null hypothesis), or is it DIFFERENT from normal data (alternative hypothesis)?"
# For this test, if p is GREATER than .05 (p > .05), we fail to reject normality and treat the data as NORMAL.
# If p is LESS than .05 (p < .05), the data is NOT normal.

# CONDUCT THE SHAPIRO-WILK TEST

shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R2$SatisfactionScore[A6R2$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
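# The two shapiro.test() calls above can also be run in one step per group
# with tapply(), applying the p > .05 rule described above. A sketch on
# HYPOTHETICAL simulated data ("demo"), not the A6R2 scores: one group is
# drawn from a normal distribution and the other from a skewed (exponential) one.
set.seed(1)
demo <- data.frame(
  ServiceType       = rep(c("AI", "Human"), each = 100),
  SatisfactionScore = c(rnorm(100, mean = 4, sd = 1.5), rexp(100, rate = 0.5))
)

# Shapiro-Wilk p-value for each group (shapiro.test needs 3 to 5000 values)
sw_p <- tapply(demo$SatisfactionScore, demo$ServiceType,
               function(x) shapiro.test(x)$p.value)
sw_p

# Apply the decision rule: p > .05 -> treat as normal, otherwise NOT normal
ifelse(sw_p > 0.05, "treat as normal", "NOT normal")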
# QUESTION
# Answer the questions below as a comment within the R script:
# Were the data normally distributed for Variable 1 (Human)?
# Ans: Not normal (W = 0.937, p < .001).
# Were the data normally distributed for Variable 2 (AI)?
# Ans: Not normal (W = 0.911, p < .001).

# If p > .05 (the p-value is GREATER than .05), the data is NORMAL. Continue to the boxplot check below.

# BOXPLOT
# Purpose: Check for any outliers impacting the mean for each group's scores.

# INSTALL REQUIRED PACKAGE
# If previously installed, put a hashtag in front of the code.

#install.packages("ggplot2")
#install.packages("ggpubr")

# LOAD THE PACKAGE
# Always reload the package you want to use. 

library(ggplot2)
library(ggpubr)

# CREATE THE BOXPLOT

ggboxplot(A6R2, x = "ServiceType", y = "SatisfactionScore",
          color = "ServiceType",  # colour by group so the "jco" palette is applied
          palette = "jco",
          add = "jitter")         # jitter draws every observation as a dot

# QUESTION
# Answer the questions below as a comment within the R script. Answer the questions for EACH boxplot:
# NOTE: Because add = "jitter" draws every observation as a dot, only dots beyond the ends of the whiskers count as outliers.
# Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot, or are they very far away?
# Ans: There are many dots, and they are very far from the whiskers, so we should switch to the Mann-Whitney U test.
# Q2) If there are outliers, in your opinion are the scores of those dots changing the mean so much that the mean no longer accurately represents the average score?
# Ans: Yes. There are many dots outside the boxplot, and in my opinion they are changing the mean.
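# A numeric companion to the visual outlier check (a sketch, not part of the
# template): boxplot.stats() flags values more than 1.5 x IQR beyond the
# hinges, the usual whisker rule. ggplot2 computes its hinges slightly
# differently, so the flagged values may not match the plot exactly.
# "demo" is HYPOTHETICAL simulated data with two planted extreme values.
set.seed(7)
demo <- data.frame(
  ServiceType       = rep(c("AI", "Human"), each = 30),
  SatisfactionScore = c(rnorm(30, mean = 4, sd = 1), rnorm(30, mean = 7, sd = 1))
)
demo$SatisfactionScore[c(1, 31)] <- c(15, -5)  # planted outliers: one per group

by_group <- split(demo$SatisfactionScore, demo$ServiceType)

# Values flagged as outliers in each group
outliers <- lapply(by_group, function(x) boxplot.stats(x)$out)
outliers

# How far do the flagged values pull the mean away from the median?
sapply(by_group, function(x) c(Mean = mean(x), Median = median(x)))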


# If p < 0.05 (P-value is LESS than .05) this means the data is NOT normal (switch to Mann-Whitney U).

# MANN-WHITNEY U TEST
# PURPOSE: Test if there was a difference between the distributions of the two groups.

wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
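# H1 at the top of the script is directional (Human > AI), but wilcox.test()
# runs a two-sided test by default ("true location shift is not equal to 0").
# A one-sided version is sketched below as an option, not as the template's
# method. With the formula interface the groups follow the factor level order,
# which is alphabetical for character data: "AI" is the first group and
# "Human" the second, so "AI lower than Human" is alternative = "less".
# "demo" is HYPOTHETICAL simulated data, not A6R2.
set.seed(123)
demo <- data.frame(
  ServiceType       = rep(c("AI", "Human"), each = 40),
  SatisfactionScore = c(sample(1:6, 40, replace = TRUE),    # simulated AI scores
                        sample(5:10, 40, replace = TRUE))   # simulated Human scores
)

levels(factor(demo$ServiceType))  # "AI" "Human": AI is the first group

one_sided <- wilcox.test(SatisfactionScore ~ ServiceType, data = demo,
                         alternative = "less", exact = FALSE)
one_sided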
# DETERMINE STATISTICAL SIGNIFICANCE
# p < .001 (R prints p < 2.2e-16), so the result is statistically significant.

# If results were statistically significant (p < .05), continue to effect size section below.
# If results were NOT statistically significant (p > .05), skip to reporting section below.

# NOTE: The Mann-Whitney U test is used when your data is not normally distributed
# or when the assumptions of the t-test are not met.
# It is not chosen based on whether the t-test was significant.
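# The decision rule in these notes (t-test when both groups look normal,
# Mann-Whitney U otherwise) can be sketched as a small function.
# choose_and_run() is a HYPOTHETICAL helper, not part of any package. It uses
# only the Shapiro-Wilk rule and ignores the boxplot outlier check, and its
# t-test branch uses R's default Welch t-test (unequal variances).
choose_and_run <- function(score, group, alpha = 0.05) {
  # Shapiro-Wilk p-value for each group
  sw_p <- tapply(score, group, function(x) shapiro.test(x)$p.value)
  if (all(sw_p > alpha)) {
    list(test = "Independent t-test", shapiro_p = sw_p,
         result = t.test(score ~ group))
  } else {
    list(test = "Mann-Whitney U", shapiro_p = sw_p,
         result = wilcox.test(score ~ group, exact = FALSE))
  }
}

# Deterministic toy data (not A6R2): evenly spaced normal vs. exponential quantiles
g <- rep(c("A", "B"), each = 50)
choose_and_run(c(qnorm(ppoints(50)), qnorm(ppoints(50)) + 2), g)$test
choose_and_run(c(qexp(ppoints(50)), qexp(ppoints(50)) + 1), g)$test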

# EFFECT-SIZE
# PURPOSE: Determine how large the difference was between the group distributions.

# INSTALL REQUIRED PACKAGE
# If never installed, remove the hashtag before the install code.
# If previously installed, leave the hashtag in front of the code.

#install.packages("effectsize")

# LOAD THE PACKAGE
# Always load the package you want to use.

library(effectsize)

# CALCULATE EFFECT SIZE (R VALUE)

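# Store the test object so its statistic and p-value can be reused (e.g., mw_results$statistic).
mw_results <- wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)

# rank_biserial() has no 'exact' argument, so it is omitted here.
rank_biserial(SatisfactionScore ~ ServiceType, data = A6R2)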
## r (rank biserial) |         95% CI
## ----------------------------------
## -0.90             | [-0.93, -0.87]
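# The rank-biserial correlation can be reproduced by hand from the W statistic
# as r = 2W / (n1 * n2) - 1, where W is the U statistic for the first group
# ("AI") and n1, n2 are the group sizes. The values below are copied from the
# output above; this sign convention matches the rank_biserial() result
# (negative when the first group tends to rank lower).
W  <- 497  # W from wilcox.test() above
n1 <- 100  # AI group size
n2 <- 100  # Human group size

r_rb <- 2 * W / (n1 * n2) - 1
round(r_rb, 2)  # -0.9, matching rank_biserial()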
# QUESTIONS
# Answer the questions below as a comment within the R script:

# Q1) What is the size of the effect?
# The effect size indicates how large or small the difference between the two groups was.
# ± 0.00 to 0.10 = ignore
# ± 0.10 to 0.30 = small
# ± 0.30 to 0.50 = moderate
# ± 0.50 and above = large
# Example 1) A rank-biserial correlation of 0.05 indicates the difference between the groups was not meaningful. There was no effect.
# Example 2) A rank-biserial correlation of 0.32 indicates the difference between the groups was moderate.
# Ans: Large. The rank-biserial correlation of -0.90 (|r| = 0.90) is well above 0.50.
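# The cut-offs above can be written as a small lookup. classify_rb() is a
# HYPOTHETICAL helper. Because the listed ranges share their end points
# (e.g. 0.10 appears in two rows), this sketch ASSUMES each lower bound is
# inclusive, so 0.10 counts as "small" and 0.50 as "large".
classify_rb <- function(r) {
  a <- abs(r)  # the sign only gives direction, not size
  if (a < 0.10) {
    "ignore"
  } else if (a < 0.30) {
    "small"
  } else if (a < 0.50) {
    "moderate"
  } else {
    "large"
  }
}

classify_rb(0.05)   # Example 1: "ignore"
classify_rb(0.32)   # Example 2: "moderate"
classify_rb(-0.90)  # this analysis: "large"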

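# Q2) Which group had the higher average rank?
# The Mann-Whitney U test does not compare means directly. Instead, it looks at whether one group tends to have higher scores than the other.
# To determine which group ranked higher, look at the group medians or means in your dataset. The negative r also shows that the first group alphabetically (AI) tends to rank lower.
# Ans: Human. Customers served by human agents had the higher average rank (Mdn = 8 vs. 3 for AI).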
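# Because the Mann-Whitney U test works on ranks, the "average rank" in Q2
# can also be computed directly: rank all scores together, then average the
# ranks within each group. "demo" is HYPOTHETICAL simulated data in which
# every simulated Human score is above every simulated AI score.
set.seed(99)
demo <- data.frame(
  ServiceType       = rep(c("AI", "Human"), each = 20),
  SatisfactionScore = c(sample(1:5, 20, replace = TRUE),   # simulated AI scores
                        sample(6:10, 20, replace = TRUE))  # simulated Human scores
)

# rank() gives tied scores the average of their positions (ties.method = "average")
demo$Rank <- rank(demo$SatisfactionScore)

mean_ranks <- tapply(demo$Rank, demo$ServiceType, mean)
mean_ranks                    # AI 10.5, Human 30.5 for this simulated data
names(which.max(mean_ranks))  # "Human"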


# WRITTEN REPORT FOR MANN-WHITNEY U TEST
# Write a paragraph summarizing your findings.

# 1) REVIEW YOUR OUTPUT
#    Collect the information below from your output:
#    1. The name of the inferential test used 
# Ans: Mann-Whitney U test.
#    2. The names of the IV and DV (their proper names, not their R code names).
# Ans: IV is Service Type (human agent vs. AI chatbot); DV is Satisfaction Score.
#    3. The sample size for each group (labeled as "n").
# Ans: n = 100 for each group (Human and AI chatbot).
#    4. Whether the inferential test results were statistically significant (p < .05) or not (p > .05).
# Ans: p < .05, so the results are statistically significant.
#    5. The median for each group's score on the DV (rounded to two places after the decimal).
# Ans: Mdn = 8.00 for Human and Mdn = 3.00 for AI Chatbot.
#    6. U statistic (from output).
# Ans: U = 497 (R labels it W; it is the U statistic for the first group, AI).
#    7. EXACT p-value to three decimals. NOTE: If p > .05, just report p > .05. If p < .001, just report p < .001.
# Ans: p < .001
#    8. Effect size (rank-biserial correlation) ** Only if the results were significant.
# Ans: Rank-biserial correlation = -0.90
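# Two reporting details from the checklist above, sketched in base R.
# (1) R prints the statistic as W. With the formula interface W is the U
#     statistic for the first group ("AI"); the U for the other group is
#     n1 * n2 - W. Some textbooks report the smaller of the two, which here
#     is the same value, 497.
# (2) The p-value rule in item 7. format_p() is a HYPOTHETICAL helper.
W  <- 497  # from the wilcox.test() output above
n1 <- 100  # AI
n2 <- 100  # Human

U_AI    <- W
U_Human <- n1 * n2 - W
c(U_AI = U_AI, U_Human = U_Human, U_smaller = min(U_AI, U_Human))

format_p <- function(p) {
  if (p < 0.001) {
    "p < .001"
  } else if (p > 0.05) {
    "p > .05"
  } else {
    # three decimals without the leading zero, e.g. "p = .023"
    paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
  }
}

format_p(2.2e-16)  # "p < .001", as reported above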

# 2) REPORT YOUR DATA AS A PARAGRAPH
# A Mann–Whitney U test was conducted to compare customer satisfaction scores between those served by human agents (n = 100) and those served by the AI chatbot (n = 100). Customers served by human agents had significantly higher median satisfaction scores (Mdn = 8.00) than those served by the AI chatbot (Mdn = 3.00), U = 497, p < .001. The effect size was large (rank-biserial r = –0.90, 95% CI [–0.93, –0.87]), indicating a substantial and meaningful difference in satisfaction, with customers reporting much greater satisfaction when served by human agents compared to the AI chatbot.