Research Scenario 2: Human vs. AI Service
A customer service firm wants to test whether customer satisfaction scores differ between those served by human agents versus those served by an AI chatbot. After interactions, customers rate their satisfaction (single-item rating scale). Is there a difference in the average satisfaction scores of the two groups?
Hypotheses
H₀: There is no difference in the average customer satisfaction scores between those served by human agents and those served by the AI chatbot.
H₁: There is a difference in the average customer satisfaction scores between those served by human agents and those served by the AI chatbot.
Load Required Library
library(readxl)
Read dataset
A6R2 <- read_excel("C:/Users/saisa/Downloads/A6R2.xlsx")
Descriptive Statistics
# Load the package (install it first with install.packages("dplyr") if needed)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Calculate the Descriptive Statistics
A6R2 %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   ServiceType  Mean Median    SD     N
##   <chr>       <dbl>  <dbl> <dbl> <int>
## 1 AI           3.6       3  1.60   100
## 2 Human        7.42      8  1.44   100
Histograms
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"],
     main = "Histogram of Human Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"],
     main = "Histogram of AI Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)
Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed? Negatively skewed
Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve? Bell curve
Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed? Positively skewed
Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve? Flat
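The visual judgments above can be cross-checked numerically. The following is a minimal sketch using moment-based formulas in base R; the helper names `sample_skewness` and `excess_kurtosis` and the toy vector are illustrative assumptions, not part of the original analysis, and other packages may use slightly different (bias-corrected) formulas.

```r
# Hypothetical helpers (not from the original report): moment-based
# skewness and excess kurtosis, computed with base R only.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Toy data for illustration only (not the A6R2 data): a long right tail
toy_scores <- c(1, 2, 2, 3, 3, 3, 4, 10)
sample_skewness(toy_scores)   # positive value: right (positive) skew
sample_skewness(-toy_scores)  # negative value: left (negative) skew
excess_kurtosis(toy_scores)   # above 0 suggests "too tall", below 0 "too flat"

# Applied to the report's data, the calls would look like:
# sample_skewness(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
# excess_kurtosis(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
```

A skewness near 0 is consistent with a symmetrical histogram; the sign indicates the direction of the longer tail.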
Shapiro-Wilk Test
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
##
## Shapiro-Wilk normality test
##
## data: A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
##
## Shapiro-Wilk normality test
##
## data: A6R2$SatisfactionScore[A6R2$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
Normality Test Results:
- Human: W = 0.93741, p-value = 0.0001344 → NOT normally distributed
- AI: W = 0.91143, p-value = 5.083e-06 → NOT normally distributed
Decision: Since neither group's scores are normally distributed, we will use the Mann-Whitney U test instead of the independent-samples t-test.
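The decision rule above can be sketched as a small function. This is an illustration under stated assumptions: the helper name `choose_two_group_test`, the 0.05 threshold, and the toy data are all hypothetical, and the parametric alternative for comparing two independent groups is taken here to be the independent-samples t-test.

```r
# Hypothetical helper (not part of the original analysis): pick a two-group
# test from per-group Shapiro-Wilk results. alpha = 0.05 is an assumed cutoff.
choose_two_group_test <- function(score, group, alpha = 0.05) {
  # One Shapiro-Wilk p-value per group
  p_values <- tapply(score, group, function(x) shapiro.test(x)$p.value)
  if (all(p_values > alpha)) {
    "independent-samples t-test"
  } else {
    "Mann-Whitney U test"
  }
}

# Toy data for illustration only (not the A6R2 data)
toy_group  <- rep(c("AI", "Human"), each = 100)
toy_normal <- rep(qnorm(ppoints(100)), times = 2)  # near-perfect normal quantiles
toy_skewed <- rep((1:100)^4, times = 2)            # strongly right-skewed values

choose_two_group_test(toy_normal, toy_group)
choose_two_group_test(toy_skewed, toy_group)

# Applied to the report's data, the call would look like:
# choose_two_group_test(A6R2$SatisfactionScore, A6R2$ServiceType)
```

With the Shapiro-Wilk p-values reported above (both far below 0.05), this rule would select the Mann-Whitney U test.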
Determine Statistical Significance
wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
Effect Size
# Load the package (install it first with install.packages("effectsize") if needed)
library(effectsize)
Calculate the effect size
rank_biserial(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## r (rank biserial) | 95% CI
## ----------------------------------
## -0.90 | [-0.93, -0.87]
What is the size of the effect? A rank-biserial correlation of -0.90 indicates a large difference between the groups.
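As a rough cross-check, the rank-biserial value can be recovered by hand from the Wilcoxon statistic, assuming the common relationship r = 2W / (n1 × n2) − 1, where W is the statistic for the first group (AI, by alphabetical order). The variable names below are illustrative; the numbers are copied from the output above.

```r
# Values copied from the report's output
W  <- 497   # Wilcoxon rank sum statistic (first group: AI)
n1 <- 100   # AI group size
n2 <- 100   # Human group size

# Assumed relationship between W and the rank-biserial correlation
r_rb <- 2 * W / (n1 * n2) - 1
r_rb   # -0.9006, which rounds to the reported -0.90
```

The negative sign reflects the ordering of the groups: the first group (AI) tends to have lower ranks than the second (Human).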
Which group had the higher average rank? Human agents group
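One way to confirm which group has the higher average rank is to rank all scores together and average the ranks within each group. The sketch below uses a small made-up data set (not the A6R2 data) so that it runs on its own; the object names are hypothetical.

```r
# Toy data for illustration only (not the A6R2 data)
toy <- data.frame(
  ServiceType       = rep(c("AI", "Human"), each = 3),
  SatisfactionScore = c(3, 2, 4, 8, 7, 9)
)

# Rank all scores together, then average the ranks within each group
mean_ranks <- tapply(rank(toy$SatisfactionScore), toy$ServiceType, mean)
mean_ranks
# AI = 2, Human = 5 for this toy data

# Applied to the report's data, the call would look like:
# tapply(rank(A6R2$SatisfactionScore), A6R2$ServiceType, mean)
```

The group with the larger mean rank is the one whose scores tend to be higher.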
FINAL REPORT
A Mann-Whitney U test was conducted to compare customer satisfaction scores between customers served by human agents and customers served by an AI chatbot. Customers served by human agents had significantly higher median scores (Mdn = 8.00) than those served by the AI chatbot (Mdn = 3.00), U = 497, p < .001. The effect size was large (r = -0.90), indicating a meaningful difference in satisfaction scores. Overall, customers served by human agents were more satisfied than customers served by the AI chatbot.