Research Scenario 2: Human vs. AI Service
A customer service firm wants to test whether customer satisfaction scores differ between customers served by human agents and customers served by an AI chatbot. After each interaction, customers rate their satisfaction on a single-item rating scale. Is there a difference in the average satisfaction scores of the two groups?
Hypotheses
H₀: There is no difference in the average customer satisfaction scores between those served by human agents and those served by the AI chatbot.
H₁: There is a difference in the average customer satisfaction scores between those served by human agents and those served by the AI chatbot.
Load Required Library
library(readxl)
Read dataset
A6R2 <- read_excel("C:/Users/saisa/Downloads/A6R2.xlsx")
Descriptive Statistics
# Load the package (install it once with install.packages("dplyr") if needed)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Calculate the Descriptive Statistics
A6R2 %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
## ServiceType Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 AI 3.6 3 1.60 100
## 2 Human 7.42 8 1.44 100
Histograms
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"],
     main = "Histogram of Human Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"],
     main = "Histogram of AI Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)
Shapiro-Wilk Test
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
##
## Shapiro-Wilk normality test
##
## data: A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
##
## Shapiro-Wilk normality test
##
## data: A6R2$SatisfactionScore[A6R2$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
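Normality Test Results:
- Human: W = 0.93741, p-value = 0.0001344 → NOT normally distributed
- AI: W = 0.91143, p-value = 5.083e-06 → NOT normally distributed
Decision: Since the scores in both groups are not normally distributed (p < 0.05 in each Shapiro-Wilk test), we will use the Mann-Whitney U test instead of the independent-samples t-test.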
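The decision rule above can be sketched in code. This is a minimal illustration only: the data frame `demo` is simulated (it is not the A6R2 dataset), and the 0.05 cutoff and the variable names `p_values` and `test_to_use` are assumptions made for the example.

```r
# Simulated stand-in data (NOT the A6R2 dataset), same column names
set.seed(123)
demo <- data.frame(
  ServiceType = rep(c("AI", "Human"), each = 100),
  SatisfactionScore = c(sample(1:10, 100, replace = TRUE),
                        sample(1:10, 100, replace = TRUE))
)

# Run the Shapiro-Wilk test separately for each group
by_group <- split(demo$SatisfactionScore, demo$ServiceType)
p_values <- sapply(by_group, function(x) shapiro.test(x)$p.value)

# Assumed rule: use the t-test only if every group looks normal (p > 0.05)
alpha <- 0.05
test_to_use <- if (all(p_values > alpha)) "t.test" else "wilcox.test"

print(p_values)
print(test_to_use)
```

Running the test per group in one step avoids reporting one group and forgetting the other.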
Determine Statistical Significance
wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
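As a hedged aid to reading this output: the W reported by wilcox.test is the Mann-Whitney U statistic for the first group, which here is assumed to be AI because R orders the ServiceType levels alphabetically. The sketch below uses only the reported W and the group sizes from the descriptive table; the variable names are illustrative.

```r
# Values taken from the output above
W  <- 497   # statistic reported by wilcox.test (first group, assumed AI)
n1 <- 100   # AI customers
n2 <- 100   # Human customers

# U for each group; the two always sum to n1 * n2
U_ai    <- W
U_human <- n1 * n2 - W

# The U conventionally reported is the smaller of the two
U <- min(U_ai, U_human)

# Share of all AI-vs-Human pairs in which the AI score is higher
# (ties count as half a pair)
prop_ai_higher <- W / (n1 * n2)

print(c(U_ai = U_ai, U_human = U_human, U = U))
print(prop_ai_higher)
```

Under these assumptions, an AI-served customer gave the higher rating in only about 5% of all possible AI-Human pairings.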
Effect size
# Load the package (install it once with install.packages("effectsize") if needed)
library(effectsize)
Calculate the effect size
rank_biserial(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## r (rank biserial) | 95% CI
## ----------------------------------
## -0.90 | [-0.93, -0.87]
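As a quick arithmetic check, the rank-biserial correlation can be recovered from the Wilcoxon statistic with the standard relation r = 2W / (n1 × n2) − 1. The sketch below plugs in W = 497 and the group sizes of 100 from the output above; it assumes W refers to the AI group (the first level alphabetically), which is why the sign comes out negative.

```r
# Values taken from the earlier output
W  <- 497   # Wilcoxon rank sum statistic (assumed to refer to the AI group)
n1 <- 100   # AI customers
n2 <- 100   # Human customers

# Rank-biserial correlation from the U/W statistic
r <- 2 * W / (n1 * n2) - 1

print(round(r, 2))
```

The result agrees with the -0.90 reported by rank_biserial(); the negative sign indicates that the AI group tends to have lower scores than the Human group.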
FINAL REPORT
A Mann-Whitney U test was conducted to compare customer satisfaction scores between customers served by human agents and customers served by an AI chatbot. Customers served by human agents had significantly higher median scores (Mdn = 8.00) than those served by the AI chatbot (Mdn = 3.00), U = 497, p < .001. The effect size was large (r = -0.90; the sign is negative because the AI group, which comes first alphabetically, had the lower scores), indicating a substantial difference in satisfaction between the two groups. Overall, customers served by human agents were more satisfied than customers served by the AI chatbot.