A customer service firm wants to test whether customer satisfaction scores differ between customers served by human agents and customers served by an AI chatbot. After each interaction, customers rate their satisfaction on a single-item rating scale. Is there a difference in the average satisfaction scores of the two groups?
The independent t-test is used to test whether there is a difference between the means of two groups.
Null hypothesis: There is no difference between the scores of Group A (Human) and Group B (AI).
Alternative hypothesis: There is a difference between the scores of Group A (Human) and Group B (AI).
Purpose: Import your Excel dataset into R to conduct analyses.
# install.packages("readxl")
library(readxl)
## Warning: package 'readxl' was built under R version 4.5.2
dataset <- read_excel("C:/Users/Murari_Lakshman/Downloads/A6R2.xlsx")
PURPOSE: Calculate the mean, median, SD, and sample size for each group.
# install.packages("dplyr")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
dataset %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
## ServiceType Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 AI 3.6 3 1.60 100
## 2 Human 7.42 8 1.44 100
Purpose: Visually check the normality of the scores for each group.
hist(dataset$SatisfactionScore[dataset$ServiceType == "Human"],
     main = "Histogram of Human Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)
hist(dataset$SatisfactionScore[dataset$ServiceType == "AI"],
     main = "Histogram of AI Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)
Q1) Check the SKEWNESS of the Human Scores histogram. In your
opinion, does the histogram look symmetrical, positively skewed, or
negatively skewed?
A) The histogram for the Human scores looks negatively skewed.
Q2) Check the KURTOSIS of the Human Scores histogram. In your opinion,
does the histogram look too flat, too tall, or does it have a proper
bell curve?
A) The histogram does not have a proper bell-shaped curve.
Q3) Check the SKEWNESS of the AI Scores histogram. In your opinion, does
the histogram look symmetrical, positively skewed, or negatively
skewed?
A) The histogram for the AI scores looks positively skewed.
Q4) Check the KURTOSIS of the AI Scores histogram. In your opinion, does
the histogram look too flat, too tall, or does it have a proper bell
curve?
A) The histogram does not have a proper bell-shaped curve.
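The visual judgments above can be cross-checked numerically. The following is a minimal sketch using base R only; `skew_kurt()` is a hypothetical helper written for illustration (it is not part of the original assignment), and the example vector is made up rather than taken from A6R2.xlsx.

```r
# Moment-based sample skewness and excess kurtosis (base R only).
# skew_kurt() is a hypothetical helper, not a function from any package.
skew_kurt <- function(x) {
  x  <- x[!is.na(x)]            # drop missing values
  m  <- mean(x)
  m2 <- mean((x - m)^2)         # second central moment
  m3 <- mean((x - m)^3)         # third central moment
  m4 <- mean((x - m)^4)         # fourth central moment
  c(skewness = m3 / m2^1.5,     # < 0: longer left tail, > 0: longer right tail
    excess_kurtosis = m4 / m2^2 - 3)  # 0 for a normal distribution
}

# Illustrative values only (NOT the A6R2 data): scores bunched near the top
left_tailed <- c(2, 5, 7, 8, 8, 9, 9, 9, 10, 10)
skew_kurt(left_tailed)
```

With the real data, the same helper could be applied to `dataset$SatisfactionScore[dataset$ServiceType == "Human"]` and to the AI subset; a negative skewness would agree with a negatively skewed histogram.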
Purpose: Statistically check the normality of each group’s scores.
The Shapiro-Wilk test is a normality test that is sensitive to both
skewness and kurtosis. The test asks: “Is this variable the SAME as
normal data (null hypothesis) or DIFFERENT from normal data (alternate
hypothesis)?”
For this test, if p is GREATER than .05 (p > .05), there is no evidence
against normality and the data is treated as NORMAL. If p is LESS than
.05 (p < .05), the data is NOT normal.
shapiro.test(dataset$SatisfactionScore[dataset$ServiceType == "Human"])
##
## Shapiro-Wilk normality test
##
## data: dataset$SatisfactionScore[dataset$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(dataset$SatisfactionScore[dataset$ServiceType == "AI"])
##
## Shapiro-Wilk normality test
##
## data: dataset$SatisfactionScore[dataset$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
Was the data normally distributed for Group A?
No, the Human scores are not normally distributed (p < .001).
Was the data normally distributed for Group B?
No, the AI scores are not normally distributed (p < .001).
If p > 0.05 (P-value is GREATER than .05) this means the data is NORMAL. Continue to the box-plot test below. If p < 0.05 (P-value is LESS than .05) this means the data is NOT normal (switch to Mann-Whitney U).
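The decision rule above can be sketched as a small function. This is only an illustration of the rule as stated here: `choose_test()` is a hypothetical helper, the .05 cutoff is the one used in this report, and the example vectors are made up rather than taken from A6R2.xlsx.

```r
# Sketch of the normality decision rule: run Shapiro-Wilk on each group
# and report which path the analysis should follow.
# choose_test() is a hypothetical helper, not part of any package.
choose_test <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x > alpha && p_y > alpha) {
    "Independent t-test"   # both groups pass; the boxplot check comes next
  } else {
    "Mann-Whitney U"       # at least one group is not normal
  }
}

# Illustrative data only (NOT the A6R2 scores)
roughly_normal <- qnorm(ppoints(30), mean = 5, sd = 1)  # evenly spaced normal quantiles
right_skewed   <- exp(qnorm(ppoints(30)) * 2)           # strongly right-skewed values

choose_test(roughly_normal, roughly_normal + 1)
choose_test(roughly_normal, right_skewed)
```

For this report, both Shapiro-Wilk p-values were below .05, so the rule points to the Mann-Whitney U test.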
Purpose: Check for any outliers impacting the mean for each group’s scores.
# install.packages("ggplot2")
# install.packages("ggpubr")
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.5.2
ggboxplot(dataset, x = "ServiceType", y = "SatisfactionScore",
          color = "ServiceType",
          palette = "jco",
          add = "jitter")
Q1) Were there any dots outside of the boxplot? Are these dots close
to the whiskers of the boxplot or are they very far away?
[NOTE: If there are no dots, continue with Independent t-test. If there
are a few dots (two or less), and they are close to the whiskers,
continue with the Independent t-test. If there are a few dots (two or
less), and they are far away from the whiskers, consider switching to
Mann Whitney U test. If there are many dots (more than one or two) and
they are very far away from the whiskers, you should switch to the Mann
Whitney U test.]
A) For the Human scores, the boxplot shows many dots far away from the
whiskers, while the AI scores show a smaller proportion of dots outside
the whiskers. The analysis therefore switches to the Mann-Whitney U test.
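Counting the points beyond the whiskers can back up the visual reading of the boxplot. The sketch below uses base R's `boxplot.stats()`, which applies the usual 1.5 × IQR rule on Tukey hinges; the plotting package may compute quartiles slightly differently, so counts can differ by a point or two from what the plot shows. `count_outliers()` is a hypothetical helper and the data are made up.

```r
# Count points beyond the boxplot whiskers in each group (1.5 * IQR rule).
# count_outliers() is a hypothetical helper, not part of any package.
count_outliers <- function(values, groups) {
  tapply(values, groups, function(v) length(boxplot.stats(v)$out))
}

# Illustrative data only (NOT the A6R2 scores)
scores <- c(1, 2, 3, 4, 5, 100,   # G1: one extreme value
            1, 2, 3, 4, 5, 6)     # G2: no extreme values
groups <- rep(c("G1", "G2"), each = 6)

count_outliers(scores, groups)
```

With the real data, the call would be `count_outliers(dataset$SatisfactionScore, dataset$ServiceType)`.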
PURPOSE: Test if there was a difference between the distributions of the two groups.
wilcox.test(SatisfactionScore ~ ServiceType, data = dataset, exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
If results were statistically significant (p < .05), continue to effect size section below. If results were NOT statistically significant (p > .05), skip to reporting section below.
NOTE: The Mann-Whitney U test is used when your data is not normally distributed or when other assumptions of the t-test are not met. It is not chosen based on whether the t-test was significant.
PURPOSE: Determine how big of a difference there was between the group distributions.
# install.packages("effectsize")
library(effectsize)
## Warning: package 'effectsize' was built under R version 4.5.2
rank_biserial(SatisfactionScore ~ ServiceType, data = dataset, exact = FALSE)
## r (rank biserial) | 95% CI
## ----------------------------------
## -0.90 | [-0.93, -0.87]
Q1) What is the size of the effect?
A) A rank-biserial correlation of -0.90 indicates that the difference
between the groups was large.
Q2) Which group had the higher average rank?
A) The Human service group had the higher average rank. The negative
sign of r reflects that AI, the first group alphabetically, ranked lower.
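As a cross-check, the rank-biserial correlation can be recovered from the test statistic with r = 2W / (n1 × n2) - 1, where W is the statistic for the first group. The sketch below applies that standard formula to the values reported above; `rank_biserial_from_w()` is a hypothetical helper, and the small x/y vectors are made up for illustration.

```r
# Rank-biserial correlation from the Wilcoxon rank sum statistic.
# rank_biserial_from_w() is a hypothetical helper, not part of effectsize.
rank_biserial_from_w <- function(w, n1, n2) {
  2 * w / (n1 * n2) - 1
}

# Values reported in this analysis: W = 497, 100 customers per group
rank_biserial_from_w(w = 497, n1 = 100, n2 = 100)
# -0.9006, which rounds to the reported -0.90

# Illustrative check on made-up data: every x is below every y, so r = -1
x <- c(1, 2, 3)
y <- c(4, 5, 6)
w <- unname(wilcox.test(x, y)$statistic)
rank_biserial_from_w(w, length(x), length(y))
```

The sign depends on which group comes first: swapping the groups would give +0.90 with the same interpretation.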
A Mann-Whitney U test was conducted to compare satisfaction scores between customers served by human agents (n = 100) and customers served by the AI chatbot (n = 100). Customers served by human agents had a higher median satisfaction score (Mdn = 8) than those served by the AI chatbot (Mdn = 3), U = 497, p < .001. The effect size was large (r = -0.90), indicating a meaningful difference between the satisfaction scores of the two groups. Overall, satisfaction was higher for human customer service than for the AI chatbot.