Research Scenario 2: Human vs. AI Service

A customer service firm wants to test whether customer satisfaction scores differ between customers served by human agents and customers served by an AI chatbot. After each interaction, customers rate their satisfaction on a single-item rating scale. Is there a difference in the average satisfaction scores of the two groups?

Hypotheses

H₀: There is no difference in the average customer satisfaction scores between those served by human agents and those served by the AI chatbot.

H₁: There is a difference in the average customer satisfaction scores between those served by human agents and those served by the AI chatbot.

Load Required Library

library(readxl)

Read dataset

A6R2 <- read_excel("C:/Users/saisa/Downloads/A6R2.xlsx")

Descriptive Statistics

# Load the package (run install.packages("dplyr") first if it is not installed)

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Calculate the Descriptive Statistics

A6R2 %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   ServiceType  Mean Median    SD     N
##   <chr>       <dbl>  <dbl> <dbl> <int>
## 1 AI           3.6       3  1.60   100
## 2 Human        7.42      8  1.44   100

Histograms

hist(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"],
main = "Histogram of Human Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)

hist(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"],
main = "Histogram of AI Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)

Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed? Answer: Negatively skewed

Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve? Answer: Bell curve

Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed? Answer: Positively skewed

Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve? Answer: Flat
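The visual judgements in Q1 to Q4 can be cross-checked numerically. The following is a minimal sketch using base R only. The helper names `sample_skewness` and `sample_excess_kurtosis` and the vector `toy_scores` are hypothetical and are not part of the original analysis. The formulas are the simple moment-based versions, so other packages may report slightly different values.

```r
# Moment-based sample skewness: negative = tail toward low values,
# positive = tail toward high values, 0 = symmetrical.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)   # second central moment
  m3 <- mean((x - m)^3)   # third central moment
  m3 / m2^1.5
}

# Moment-based excess kurtosis: negative = flatter than a normal curve,
# positive = taller/heavier-tailed than a normal curve.
sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)   # second central moment
  m4 <- mean((x - m)^4)   # fourth central moment
  m4 / m2^2 - 3
}

# Hypothetical ratings (NOT taken from A6R2.xlsx): mostly high scores
# with a tail toward low scores.
toy_scores <- c(2, 5, 7, 7, 8, 8, 8, 9, 9, 10)

sample_skewness(toy_scores)          # negative, i.e. negatively skewed
sample_excess_kurtosis(toy_scores)
```

On the real data, the same helpers could be applied to `A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]` and to the corresponding `"AI"` subset.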

Shapiro-Wilk Test

shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R2$SatisfactionScore[A6R2$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06

Normality Test Results:
- Human: W = 0.93741, p-value = 0.0001344 → NOT normally distributed
- AI: W = 0.91143, p-value = 5.083e-06 → NOT normally distributed

Decision: Since neither group's scores are normally distributed (both p-values are below 0.05), we will use the Mann-Whitney U test instead of the independent-samples t-test.
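The decision rule can be written out explicitly. This is a minimal sketch, assuming the conventional 0.05 cutoff. The function `choose_two_group_test` is a hypothetical helper introduced for illustration. It is not part of any package.

```r
# Hypothetical helper: pick the two-group comparison test from the
# Shapiro-Wilk p-values of the two groups.
choose_two_group_test <- function(p_group1, p_group2, alpha = 0.05) {
  # Both groups must look normal (p > alpha) to justify the t-test.
  if (p_group1 > alpha && p_group2 > alpha) {
    "independent-samples t-test"
  } else {
    "Mann-Whitney U test"
  }
}

# p-values copied from the Shapiro-Wilk output (Human, then AI)
choose_two_group_test(p_group1 = 0.0001344, p_group2 = 5.083e-06)
# [1] "Mann-Whitney U test"
```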

Determine Statistical Significance

wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

Effect size

# Load the package (run install.packages("effectsize") first if it is not installed)

library(effectsize)

Calculate the effect size

rank_biserial(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## r (rank biserial) |         95% CI
## ----------------------------------
## -0.90             | [-0.93, -0.87]

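What is the size of the effect? A rank-biserial correlation of -0.90 indicates a large difference between the groups. The negative sign reflects the group order: AI, the first level of ServiceType, tends to rank lower than Human.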
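As a sanity check, the reported effect size can be approximately reproduced by hand from the test statistic and the group sizes. This is a sketch under two assumptions: that W from `wilcox.test` is the U statistic for the first group (AI), and that the rank-biserial correlation is the difference between the two U statistics divided by the number of possible pairs. The variable names are illustrative.

```r
# Values taken from the output above
W       <- 497   # test statistic reported by wilcox.test
n_ai    <- 100   # group sizes from the descriptive statistics
n_human <- 100

# U for each group; the two always sum to n_ai * n_human
u_ai    <- W
u_human <- n_ai * n_human - W   # 9503

# Rank-biserial correlation: share of pairs favouring AI minus
# share of pairs favouring Human
r_rb <- (u_ai - u_human) / (n_ai * n_human)
r_rb
# [1] -0.9006
```

The hand-computed value rounds to the -0.90 reported by `rank_biserial()`, which supports reading the negative sign as "AI scores rank lower than Human scores".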

Which group had the higher average rank? Human agents group

FINAL REPORT

A Mann-Whitney U test was conducted to compare customer satisfaction scores between customers served by human agents (n = 100) and customers served by an AI chatbot (n = 100). Customers served by human agents had significantly higher median scores (Mdn = 8.00) than those served by the AI chatbot (Mdn = 3.00), U = 497, p < .001. The effect size was large (r = -0.90), indicating a meaningful difference in satisfaction scores. Overall, customers served by human agents were more satisfied than those served by the AI chatbot.