H0: There is no difference in customer satisfaction scores between customers served by human agents and those served by the AI chatbot.
H1: There is a difference in customer satisfaction scores between customers served by human agents and those served by the AI chatbot.
library(readxl)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(ggpubr)
library(effectsize)
dataset <- read_excel("/Users/patel777/Desktop/Week6/A6R2.xlsx")
score <- dataset$SatisfactionScore
group <- dataset$ServiceType
dataset %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
## ServiceType Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 AI 3.6 3 1.60 100
## 2 Human 7.42 8 1.44 100
hist(dataset$SatisfactionScore[dataset$ServiceType == "Human"],
     main = "Histogram of Satisfaction Scores (Human)",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 10)
hist(dataset$SatisfactionScore[dataset$ServiceType == "AI"],
     main = "Histogram of Satisfaction Scores (AI)",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 10)
Q1) Check the SKEWNESS of the Human Scores histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
A) The histogram for the Human scores looks negatively skewed.
Q2) Check the KURTOSIS of the Human Scores histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
A) The histogram does not have a proper bell-shaped curve.
Q3) Check the SKEWNESS of the AI Scores histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
A) The histogram for the AI scores looks positively skewed.
Q4) Check the KURTOSIS of the AI Scores histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
A) The histogram does not have a proper bell-shaped curve.
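The visual judgments above can be cross-checked numerically. The following is a minimal sketch, assuming the simple moment-based definitions of skewness and excess kurtosis; the function names `sample_skewness` and `sample_excess_kurtosis` and the example vector are hypothetical and are not part of the original analysis or the A6R2 data.

```r
# Moment-based sample skewness: negative = longer left tail, positive = longer right tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)   # second central moment
  m3 <- mean((x - m)^3)   # third central moment
  m3 / m2^1.5
}

# Moment-based excess kurtosis: about 0 for a bell curve,
# negative = flatter than normal, positive = taller / heavier tails.
sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m4 <- mean((x - m)^4)   # fourth central moment
  m4 / m2^2 - 3
}

# Hypothetical scores for illustration only (NOT the A6R2 data).
left_tailed <- c(2, 5, 7, 8, 8, 8, 9, 9, 9, 10)

sample_skewness(left_tailed)        # negative: long tail toward low scores
sample_skewness(11 - left_tailed)   # mirrored data: same size, positive sign
sample_excess_kurtosis(left_tailed)
```

With the real data, the same functions could be applied to `dataset$SatisfactionScore[dataset$ServiceType == "Human"]` and to the AI subset to see whether the signs agree with the histogram readings.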
Purpose: Check the normality of each group’s scores statistically. The Shapiro-Wilk test detects departures from normality, including those caused by skewness and kurtosis. The test asks “Is this variable consistent with normal data (null hypothesis) or DIFFERENT from normal data (alternative hypothesis)?” If p is GREATER than .05 (p > .05), there is no evidence against normality and the data are treated as NORMAL. If p is LESS than .05 (p < .05), the data are NOT normal.
shapiro.test(dataset$SatisfactionScore[dataset$ServiceType == "Human"])
##
## Shapiro-Wilk normality test
##
## data: dataset$SatisfactionScore[dataset$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(dataset$SatisfactionScore[dataset$ServiceType == "AI"])
##
## Shapiro-Wilk normality test
##
## data: dataset$SatisfactionScore[dataset$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
Q1) Was the data normally distributed for Group A (Human)?
A) No. The Human scores were not normally distributed (W = 0.937, p < .001).
Q2) Was the data normally distributed for Group B (AI)?
A) No. The AI scores were not normally distributed (W = 0.911, p < .001).
If p > 0.05 (P-value is GREATER than .05) this means the data is NORMAL. Continue to the box-plot test below. If p < 0.05 (P-value is LESS than .05) this means the data is NOT normal (switch to Mann-Whitney U).
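The decision rule for the normality step can be written as a small helper. This is only a sketch: `choose_test` is a hypothetical function name, and the p-values passed in are the two Shapiro-Wilk results reported above. It covers the normality check only; the boxplot check for outliers is still done separately.

```r
# Hypothetical helper: pick the test suggested by the Shapiro-Wilk p-values.
# All groups must have p > alpha to stay with the independent t-test.
choose_test <- function(p_values, alpha = 0.05) {
  if (all(p_values > alpha)) {
    "Independent t-test"
  } else {
    "Mann-Whitney U"
  }
}

# Shapiro-Wilk p-values copied from the output above.
shapiro_p <- c(Human = 0.0001344, AI = 5.083e-06)

choose_test(shapiro_p)   # "Mann-Whitney U"
```

Because both p-values are below .05, the rule points to the Mann-Whitney U test for this dataset.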
ggboxplot(dataset, x = "ServiceType", y = "SatisfactionScore",
color = "ServiceType",
palette = "jco",
add = "jitter")
Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away? [NOTE: If there are no dots, continue with the Independent t-test. If there are a few dots (two or fewer), and they are close to the whiskers, continue with the Independent t-test. If there are a few dots (two or fewer), and they are far away from the whiskers, consider switching to the Mann-Whitney U test. If there are many dots (more than two) and they are very far away from the whiskers, you should switch to the Mann-Whitney U test.]
A) For the Human scores, the boxplot shows many dots far away from the whiskers, while the AI scores show fewer dots outside the whiskers. The analysis therefore switches to the Mann-Whitney U test.
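Counting the dots by eye can be backed up with a numeric check. The sketch below assumes the common 1.5 × IQR rule for whisker length; `flag_outliers` is a hypothetical helper and the example scores are made up, not taken from the A6R2 data. The quartiles here come from `quantile()` with its default settings, so the flagged points may differ slightly from what a particular plotting function draws.

```r
# Hypothetical helper: return the values that fall outside the
# k * IQR fences (k = 1.5 is the usual boxplot convention).
flag_outliers <- function(x, k = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.75), names = FALSE)   # first and third quartiles
  fence <- k * (q[2] - q[1])                       # k times the IQR
  x[x < q[1] - fence | x > q[2] + fence]
}

# Made-up scores for illustration only.
scores <- c(7, 8, 8, 7, 9, 8, 7, 8, 1)

flag_outliers(scores)          # 1 is flagged; 9 is inside the upper fence
flag_outliers(scores, k = 3)   # a stricter "far from the whiskers" check
```

With the real data, something like `tapply(dataset$SatisfactionScore, dataset$ServiceType, flag_outliers)` would list the flagged scores for each service type.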
t.test(SatisfactionScore ~ ServiceType, data = dataset, var.equal = TRUE)
##
## Two Sample t-test
##
## data: SatisfactionScore by ServiceType
## t = -17.792, df = 198, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group AI and group Human is not equal to 0
## 95 percent confidence interval:
## -4.243396 -3.396604
## sample estimates:
## mean in group AI mean in group Human
## 3.60 7.42
wilcox_result <- wilcox.test(SatisfactionScore ~ ServiceType,
data = dataset,
exact = FALSE)
wilcox_result
##
## Wilcoxon rank sum test with continuity correction
##
## data: SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
If results were statistically significant (p < .05), continue to effect size section below. If results were NOT statistically significant (p > .05), skip to reporting section below.
NOTE: The Mann-Whitney U test is used when the data are not normally distributed or when other assumptions of the t-test are not met. It is not chosen based on whether the t-test was significant.
rb_result <- rank_biserial(SatisfactionScore ~ ServiceType,
data = dataset)
rb_result
## r (rank biserial) | 95% CI
## ----------------------------------
## -0.90 | [-0.93, -0.87]
Q1) What is the size of the effect?
A) A rank-biserial correlation of -0.90 indicates a large difference between the groups.
Q2) Which group had the higher average rank?
A) The Human service group had the higher average rank.
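The reported effect size can be checked by hand from the test statistic. This sketch assumes the standard identity r = 2W / (n1 × n2) - 1, where W is the `wilcox.test` statistic for the first group (AI, as shown in the t-test output) and n1, n2 are the group sizes. The variable names are illustrative, and the `effectsize` package may compute the value through a different but equivalent route.

```r
# Values copied from the output above.
W  <- 497   # Wilcoxon rank sum statistic (first group = AI)
n1 <- 100   # AI group size
n2 <- 100   # Human group size

# Share of AI/Human pairs in which the AI score is higher (ties count as half).
prop_ai_higher <- W / (n1 * n2)

# The complementary U statistic, computed for the Human group.
U_other <- n1 * n2 - W

# Rank-biserial correlation from the identity r = 2W / (n1 * n2) - 1.
r_rb <- 2 * W / (n1 * n2) - 1

prop_ai_higher    # 0.0497
U_other           # 9503
round(r_rb, 2)    # -0.9
```

The negative sign reflects the group order (AI first): in about 5% of AI/Human pairs the AI score is the higher one, which is why the Human group has the higher average rank.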
A Mann-Whitney U test was conducted to determine whether satisfaction scores differed between customers served by human agents (n = 100) and customers served by the AI chatbot (n = 100). Customers served by human agents had a significantly higher median satisfaction score (Mdn = 8) than customers served by the AI chatbot (Mdn = 3), U = 497, p < .001. The effect size was large (rank-biserial r = -0.90), indicating a substantial difference between the two service types. Overall, customers were more satisfied with human agents than with the AI chatbot.