Research Scenario

A customer service firm wants to test whether customer satisfaction scores differ between those served by human agents versus those served by an AI chatbot. After interactions, customers rate their satisfaction (single-item rating scale). Is there a difference in the average satisfaction scores of the two groups?

Hypothesis

NULL HYPOTHESIS (H0): There is no difference between the satisfaction scores of customers served by Human agents and those served by an AI.

ALTERNATE HYPOTHESIS (H1): There is a difference between the satisfaction scores of customers served by Human agents and those served by an AI.

FINAL REPORT

A Mann-Whitney U test was conducted to compare satisfaction scores between users who received Human service (n = 100) and those who received AI service (n = 100).
Users who received Human service had significantly higher median satisfaction scores (Mdn = 8.00) than users who received AI service (Mdn = 3.00), U = 497, p < .001.
The effect size was large (rank-biserial r = -0.90), indicating a substantial difference between the two service types.
Overall, Human service resulted in significantly higher satisfaction compared to AI service.

Load the package

library(readxl)

Import

dataset <- read_excel("C:\\Users\\rohit\\Downloads\\A6R2.xlsx")

Descriptive Statistics

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Calculate the Descriptive Statistics

dataset %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   ServiceType  Mean Median    SD     N
##   <chr>       <dbl>  <dbl> <dbl> <int>
## 1 AI           3.6       3  1.60   100
## 2 Human        7.42      8  1.44   100

Histograms

hist(dataset$SatisfactionScore[dataset$ServiceType == "Human"],
main = "Histogram of Human Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)

hist(dataset$SatisfactionScore[dataset$ServiceType == "AI"],
main = "Histogram of AI Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)

# Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# ANSWER: The histogram appears slightly negatively skewed.

# Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# ANSWER: The histogram is too tall.

# Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# ANSWER: The histogram is positively skewed.

# Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# ANSWER: The histogram is too tall.
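
The visual judgments above can be cross-checked numerically. The following is a minimal sketch in base R, assuming the simple moment-based definitions of skewness and excess kurtosis. The helper names `skewness_moment` and `excess_kurtosis_moment` and the two small vectors are hypothetical illustrations, not part of the assignment data.

```r
# Moment-based skewness: negative = longer left tail, positive = longer right tail
skewness_moment <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based excess kurtosis: about 0 for a normal curve,
# positive = more peaked / heavier tails, negative = flatter
excess_kurtosis_moment <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Hypothetical example vectors (NOT the study data)
symmetric_scores   <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
left_skewed_scores <- c(2, 6, 7, 7, 8, 8, 8, 9, 9)

skewness_moment(symmetric_scores)        # 0: perfectly symmetric
skewness_moment(left_skewed_scores)      # negative: tail points toward low scores
excess_kurtosis_moment(symmetric_scores)
```

To apply this to the study data, the same helpers could be called on `dataset$SatisfactionScore[dataset$ServiceType == "Human"]` and on the corresponding AI subset.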

SHAPIRO-WILK TEST

shapiro.test(dataset$SatisfactionScore[dataset$ServiceType == "Human"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$SatisfactionScore[dataset$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(dataset$SatisfactionScore[dataset$ServiceType == "AI"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$SatisfactionScore[dataset$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
# Was the data normally distributed for Variable 1?
# ANSWER: No, the data for Variable 1 (Human Satisfaction Scores) is not normally distributed.
# The Shapiro-Wilk test returned a p-value of 0.0001344, which is less than 0.05, indicating a significant deviation from normality.


# Was the data normally distributed for Variable 2?
# ANSWER: No, the data for Variable 2 (AI Satisfaction Scores) is also not normally distributed.
# The Shapiro-Wilk test returned a p-value of 5.083e-06, which is well below 0.05, confirming non-normality.
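
The two Shapiro-Wilk calls above can also be run in one pass over the groups. This is a sketch only: the `demo` data frame below is simulated, hypothetical data that borrows the column names `ServiceType` and `SatisfactionScore` so the block runs on its own. It is not the A6R2 dataset and its p-values say nothing about the study.

```r
# Simulated stand-in data (hypothetical, NOT the study data)
set.seed(1)
demo <- data.frame(
  ServiceType = rep(c("AI", "Human"), each = 50),
  SatisfactionScore = c(sample(1:10, 50, replace = TRUE),
                        sample(1:10, 50, replace = TRUE))
)

# Split the scores by group, then extract the Shapiro-Wilk p-value for each
p_values <- sapply(
  split(demo$SatisfactionScore, demo$ServiceType),
  function(scores) shapiro.test(scores)$p.value
)

# TRUE means the normality assumption is rejected at the 0.05 level
p_values < 0.05
```

Replacing `demo` with `dataset` would reproduce the two p-values reported above as a single named vector.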

Boxplot

# install.packages("ggplot2")
# install.packages("ggpubr")


library(ggplot2)
library(ggpubr)

Create the Boxplot

ggboxplot(dataset, x = "ServiceType", y = "SatisfactionScore",
          color = "ServiceType",
          palette = "jco",
          add = "jitter")

# Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away?

# ANSWER: Yes. There are many dots (more than one or two), and they are very far away from the whiskers, so the Mann-Whitney U test is the appropriate choice.
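
Because `add = "jitter"` draws every observation as a dot, it can help to list the outliers explicitly rather than judging them by eye. The sketch below uses base R's `boxplot.stats()`, which flags points more than 1.5 times the hinge spread beyond the hinges. The `scores` vector is a hypothetical example, not the study data, and `ggboxplot()` may compute its quartiles slightly differently, so the flagged points are an approximation of what the plot shows.

```r
# Hypothetical satisfaction scores for one group (NOT the study data)
scores <- c(7, 8, 8, 7, 9, 8, 6, 8, 7, 1)

# boxplot.stats() returns the five-number summary ($stats)
# and any points beyond the whiskers ($out)
stats <- boxplot.stats(scores)

stats$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
out <- stats$out
out           # 1: the only value beyond the 1.5 x hinge-spread fences
```

For the real data the same call could be made once per group, for example on `dataset$SatisfactionScore[dataset$ServiceType == "AI"]`.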

MANN-WHITNEY U TEST

wilcox.test(SatisfactionScore ~ ServiceType , data = dataset, exact = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
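
To make the W statistic above less of a black box, the sketch below recomputes it by hand on a tiny example: rank all observations together, sum the ranks of the first group, and subtract n1(n1 + 1)/2. The vectors `x` and `y` are hypothetical toy values, not the study data.

```r
# Toy data (hypothetical): x plays the role of the first group, y the second
x <- c(3, 4, 2, 5)
y <- c(7, 8, 6, 9, 5)

# Rank all observations jointly; ties receive the average rank
all_ranks <- rank(c(x, y))
n1 <- length(x)

# W = (sum of ranks in the first group) - n1 * (n1 + 1) / 2
W_manual <- sum(all_ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Compare with the statistic reported by wilcox.test()
W_test <- unname(wilcox.test(x, y, exact = FALSE)$statistic)

W_manual  # 0.5
W_test    # 0.5
```

A W close to 0 means the first group almost always ranks below the second. In the output above, W = 497 out of a possible 100 x 100 = 10000 tells the same story for the AI group relative to the Human group.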
# EFFECT-SIZE

library(effectsize)

# CALCULATE EFFECT SIZE (R VALUE)
# Here the dependent variable is SatisfactionScore and the independent (grouping) variable is ServiceType

rank_biserial(SatisfactionScore ~ ServiceType , data = dataset, exact = FALSE)
## r (rank biserial) |         95% CI
## ----------------------------------
## -0.90             | [-0.93, -0.87]
# Q1) What is the size of the effect?
# ANSWER: A rank-biserial correlation of -0.90 indicates the difference between the groups was LARGE.

# Q2) Which group had the higher average rank?
# ANSWER: The Human group had the higher average rank.
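
As a consistency check, the rank-biserial correlation can be recovered from the W statistic reported earlier. This is a sketch assuming the common relation r = 2W / (n1 * n2) - 1, where W belongs to the first group in the formula. With a character grouping variable R orders the levels alphabetically, so the first group here is assumed to be AI.

```r
# Values taken from the output above
W  <- 497   # Wilcoxon rank sum statistic
n1 <- 100   # AI group size
n2 <- 100   # Human group size

# Rank-biserial correlation from W (assumed relation: r = 2W / (n1 * n2) - 1)
r_rb <- 2 * W / (n1 * n2) - 1

r_rb            # -0.9006
round(r_rb, 2)  # -0.9, matching the rank_biserial() output
```

The negative sign says the first group (AI) tends to rank below the second (Human), which agrees with the answer to Q2.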