Research Scenario 2: Human vs. AI Service

A customer service firm wants to test whether satisfaction scores differ between customers served by human agents and customers served by an AI chatbot. After each interaction, customers rate their satisfaction on a single-item rating scale. Is there a difference in the average satisfaction scores of the two groups?

Hypotheses

H₀: There is no difference in the average customer satisfaction scores between those served by human agents and those served by the AI chatbot.

H₁: There is a difference in the average customer satisfaction scores between those served by human agents and those served by the AI chatbot.

Load Required Library

library(readxl)

Read Dataset

A6R2 <- read_excel("C:/Users/saisa/Downloads/A6R2.xlsx")

Descriptive Statistics

# Load the package (install once with install.packages("dplyr") if needed)

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Calculate the Descriptive Statistics

A6R2 %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   ServiceType  Mean Median    SD     N
##   <chr>       <dbl>  <dbl> <dbl> <int>
## 1 AI           3.6       3  1.60   100
## 2 Human        7.42      8  1.44   100

Histograms

hist(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"],
     main = "Histogram of Human Scores",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"],
     main = "Histogram of AI Scores",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

Shapiro-Wilk Test

shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R2$SatisfactionScore[A6R2$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06

Normality Test Results:
- Human: W = 0.93741, p-value = 0.0001344 (p < 0.05) → NOT normally distributed
- AI: W = 0.91143, p-value = 5.083e-06 (p < 0.05) → NOT normally distributed

Decision: Since neither group's scores are normally distributed, we will use the Mann-Whitney U test instead of an independent-samples t-test.
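The decision rule above can be written as a small R function. This is a minimal sketch: `choose_two_group_test()` is a hypothetical helper, and the threshold `alpha = 0.05` is an assumed conventional cutoff rather than something stated in the analysis. If either group's Shapiro-Wilk p-value falls below alpha, it falls back to the nonparametric test.

```r
# Hypothetical helper (not from any package): choose a two-group test
# from the Shapiro-Wilk p-values of each group.
# alpha = 0.05 is an assumed conventional threshold.
choose_two_group_test <- function(p_group1, p_group2, alpha = 0.05) {
  if (p_group1 < alpha || p_group2 < alpha) {
    # At least one group departs from normality
    "Mann-Whitney U (wilcox.test)"
  } else {
    # Both groups consistent with normality
    "Independent-samples t-test (t.test)"
  }
}

# Shapiro-Wilk p-values reported above (Human, AI)
choose_two_group_test(0.0001344, 5.083e-06)
```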

Determine Statistical Significance

wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
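R reports the Mann-Whitney U statistic as `W`: the rank sum of the first group minus n1(n1 + 1)/2. The sketch below checks this on small hypothetical toy vectors (`ai` and `human` are invented for illustration, not taken from A6R2). With the formula interface, groups follow factor-level order, so with the labels "AI" and "Human" the AI group comes first. That explains why W = 497 is small relative to its maximum possible value of n1 × n2 = 100 × 100 = 10,000: AI scores mostly rank below Human scores.

```r
# Hypothetical toy data, NOT from A6R2
ai    <- c(2, 3, 3, 5, 6)
human <- c(6, 7, 8, 8, 9)

# Pool the scores and rank them (ties get average ranks)
scores <- c(ai, human)
r  <- rank(scores)
n1 <- length(ai)

# U for the first group: its rank sum minus n1(n1 + 1)/2
W_manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Same statistic as reported by wilcox.test()
W_R <- unname(wilcox.test(ai, human, exact = FALSE)$statistic)

W_manual  # 0.5: only the tied pair (6 vs 6) counts, as a half
W_R
```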

Effect Size

# Load the package (install once with install.packages("effectsize") if needed)

library(effectsize)

Calculate the effect size

rank_biserial(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## r (rank biserial) |         95% CI
## ----------------------------------
## -0.90             | [-0.93, -0.87]
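As a cross-check, the rank-biserial correlation for two independent groups can be computed directly from the Mann-Whitney statistic: r = 2W / (n1 × n2) - 1. Plugging in the reported W = 497 and n = 100 per group gives -0.9006, which rounds to the -0.90 shown above. This sketch assumes AI is the first group (alphabetical factor order), which is why the sign is negative; the magnitude is what indicates the size of the effect.

```r
# Values taken from the wilcox.test() output and group sizes above
W  <- 497
n1 <- 100  # AI (first group in alphabetical factor order)
n2 <- 100  # Human

# Rank-biserial correlation from the Mann-Whitney statistic
r_rb <- 2 * W / (n1 * n2) - 1

r_rb            # -0.9006
round(r_rb, 2)  # -0.9, matching rank_biserial()
```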

FINAL REPORT

A Mann-Whitney U test was conducted to determine whether customer satisfaction scores differ between customers served by human agents and those served by an AI chatbot. Customers served by human agents reported significantly higher satisfaction (Mdn = 8.00) than those served by the AI chatbot (Mdn = 3.00), U = 497, p < 0.001. The effect size was large (rank-biserial r = -0.90, 95% CI [-0.93, -0.87]; the negative sign reflects only that the AI group was entered first), indicating a meaningful difference in satisfaction between the two groups. Overall, customers were more satisfied when served by human agents than by the AI chatbot.