A customer service firm wants to test whether customer satisfaction scores differ between customers served by human agents and those served by an AI chatbot. After each interaction, customers rate their satisfaction on a single-item rating scale. Is there a difference in the average satisfaction scores of the two groups?
Null Hypothesis (H0): There is no difference in average customer satisfaction scores between customers served by human agents and those served by the AI chatbot.
Alternative Hypothesis (H1): There is a difference in average customer satisfaction scores between customers served by human agents and those served by the AI chatbot.
# install.packages("readxl")
# Load required packages
library(readxl)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Import the Excel file
A6R2 <- read_excel("C:/Users/sravz/Downloads/A6R2.xlsx")
# Calculate descriptive statistics
A6R2 %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
## ServiceType Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 AI 3.6 3 1.60 100
## 2 Human 7.42 8 1.44 100
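In the summary above, `N = n()` counts rows per group, while the mean, median, and SD drop missing values through `na.rm = TRUE`. The following is a minimal sketch of how the two counts could be compared. The data frame `toy` and its values are hypothetical stand-ins for `A6R2`, not data from the assignment.

```r
# Hypothetical toy data standing in for A6R2 (values are made up for illustration)
toy <- data.frame(
  ServiceType = c("AI", "AI", "AI", "Human", "Human", "Human"),
  SatisfactionScore = c(3, 4, NA, 8, 7, 9)
)

# Rows per group: this is what n() reports, missing scores included
rows_per_group <- table(toy$ServiceType)

# Non-missing scores per group: the values actually used by mean(), median(), sd()
valid_per_group <- tapply(
  toy$SatisfactionScore, toy$ServiceType,
  function(x) sum(!is.na(x))
)

print(rows_per_group)
print(valid_per_group)
```

If the two counts match for the real data, the reported N of 100 per group is also the number of scores that entered each statistic.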
HISTOGRAMS
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"],
     main = "Histogram of Human Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightpink",
     border = "black",
     breaks = 20)
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"],
     main = "Histogram of AI Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "orange",
     border = "black",
     breaks = 20)
QUESTIONS
Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
The histogram looks negatively skewed.
Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
The histogram appears to be too tall.
Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
The histogram appears to be positively skewed.
Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
The histogram appears to be too tall.
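The visual judgments above can be cross-checked numerically. This is a sketch using the simple moment-based definitions of skewness and excess kurtosis in base R. The helper names `skewness` and `excess_kurtosis` and the example vector are hypothetical, and packages that compute these quantities may apply small-sample corrections that give slightly different numbers.

```r
# Moment-based skewness: third central moment over the 1.5 power of the variance
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based excess kurtosis: fourth central moment over squared variance, minus 3
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Hypothetical scores with a long left tail (made up for illustration)
left_tailed <- c(2, 6, 7, 8, 8, 8, 9, 9)

print(skewness(left_tailed))        # negative value: tail points to the left
print(excess_kurtosis(left_tailed))
```

Applied to `A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]`, a negative skewness would agree with a negatively skewed histogram, and a positive excess kurtosis would agree with a peak that is taller than a normal curve.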
SHAPIRO-WILK TEST
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
##
## Shapiro-Wilk normality test
##
## data: A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
##
## Shapiro-Wilk normality test
##
## data: A6R2$SatisfactionScore[A6R2$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
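QUESTIONS
Was the data normally distributed for Variable 1?
No. The Shapiro-Wilk test was significant (W = 0.937, p < .001), so the data are NOT normally distributed.
Was the data normally distributed for Variable 2?
No. The Shapiro-Wilk test was significant (W = 0.911, p < .001), so the data are NOT normally distributed.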
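The decision rule behind these answers can be written out explicitly. The sketch below reuses the two p-values printed by `shapiro.test()` above. The significance level `alpha = 0.05` is an assumption, since the original does not state one.

```r
# Assumed significance level (not stated in the original)
alpha <- 0.05

# p-values copied from the Shapiro-Wilk output above
p_values <- c(Human = 0.0001344, AI = 5.083e-06)

# TRUE means the null hypothesis of normality is rejected for that group
reject_normality <- p_values < alpha
print(reject_normality)
```

Both p-values fall below 0.05, so normality is rejected for both groups, which supports choosing a nonparametric test.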
library(ggplot2)
library(ggpubr)
BOXPLOT
ggboxplot(A6R2, x = "ServiceType", y = "SatisfactionScore",
          color = "ServiceType",
          palette = "jco",
          add = "jitter")
QUESTIONS
Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away?
Yes. There are many dots, and they are very far from the whiskers, so the analysis switched to the Mann-Whitney U test.
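Because `add = "jitter"` overlays every observation on the plot, not every dot is an outlier. One way to list only the points beyond the whiskers is base R's `boxplot.stats()`, which applies the usual 1.5 × IQR rule. The vector `scores` below is hypothetical and only illustrates the call.

```r
# Hypothetical scores for one group (made up for illustration)
scores <- c(1, 6, 7, 7, 8, 8, 8, 9, 9, 10)

# boxplot.stats() returns the five-number summary and the points beyond the whiskers
bs <- boxplot.stats(scores)

print(bs$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$out)    # observations flagged as outliers by the 1.5 * IQR rule
```

Running the same call on each group, for example `boxplot.stats(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])$out`, would show how many observations are true outliers and how far they sit from the whiskers.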
MANN-WHITNEY U TEST
wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
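To make the `W` statistic less opaque, here is a small sketch of how `wilcox.test()` derives it from ranks: the rank sum of the first group minus its minimum possible value. The vectors `ai` and `human` are hypothetical toy data. With the formula interface, the first group is the first factor level, which for `ServiceType` is assumed to be "AI" (alphabetical order).

```r
# Hypothetical toy scores (made up for illustration)
ai    <- c(2, 3, 3, 4, 5)
human <- c(6, 7, 8, 8, 4)

# Rank all observations together; ties receive the average of their ranks
ranks <- rank(c(ai, human))
n1 <- length(ai)

# W = rank sum of the first group minus n1 * (n1 + 1) / 2
W_manual <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

# The same statistic as reported by wilcox.test()
W_r <- unname(wilcox.test(ai, human, exact = FALSE)$statistic)

# Mean rank per group: the group with the larger mean rank tends to score higher
mean_rank_ai    <- mean(ranks[seq_len(n1)])
mean_rank_human <- mean(ranks[-seq_len(n1)])

print(c(W_manual = W_manual, W_r = W_r))
print(c(AI = mean_rank_ai, Human = mean_rank_human))
```

Read this way, a small `W` relative to its maximum of n1 × n2 means the first group's scores mostly rank below the second group's.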
library(effectsize)
EFFECT SIZE (R VALUE)
rank_biserial(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## r (rank biserial) | 95% CI
## ----------------------------------
## -0.90 | [-0.93, -0.87]
QUESTIONS
Q1) What is the size of the effect?
A rank-biserial correlation of -0.90 indicates that the difference between the groups was large.
Q2) Which group had the higher average rank?
The group served by human agents had the higher average rank.
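The reported effect size can be reproduced by hand from the test statistic. The sketch below uses one common formula for the rank-biserial correlation, r = 2W / (n1 × n2) − 1, with W = 497 from the `wilcox.test()` output and 100 customers per group from the descriptive statistics table. It assumes no scores are missing and that "AI" is the first group, and it is not taken from the `effectsize` source code.

```r
# Values copied from the outputs above
W       <- 497   # Wilcoxon rank sum statistic for the first group (assumed to be AI)
n_ai    <- 100   # group size from the descriptive statistics table
n_human <- 100

# Rank-biserial correlation: ranges from -1 to 1, 0 means no group difference
r_rb <- 2 * W / (n_ai * n_human) - 1
print(round(r_rb, 2))
```

The negative sign reflects the group order: AI scores rank below human-agent scores in nearly all pairwise comparisons, which matches the answer that the human-agent group had the higher average rank.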
A Mann-Whitney U test was conducted to test whether customer satisfaction scores differed between customers served by human agents and those served by an AI chatbot (N = 200; n = 100 per group). Customers served by human agents had significantly higher satisfaction scores (Mdn = 8) than those served by the AI chatbot (Mdn = 3), U = 497, p < .001. The effect size was large (r = -0.90), indicating a substantial difference between the two groups. Overall, customer satisfaction was higher for customers served by human agents.