QUESTION
What are the null and alternate hypotheses for YOUR research scenario?
H0: There is no difference in the average customer satisfaction scores between customers served by human agents and those served by an AI chatbot.
H1: There is a difference in the average customer satisfaction scores between customers served by human agents and those served by an AI chatbot.
library(readxl)
A6R2 <- read_excel("C:\\Users\\kuppi\\OneDrive\\Desktop\\A6R2.xlsx")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
A6R2 %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
## ServiceType Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 AI 3.6 3 1.60 100
## 2 Human 7.42 8 1.44 100
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"],
     main = "Histogram of Human Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"],
     main = "Histogram of AI Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)
QUESTIONS
Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
A) The histogram of VARIABLE 2 looks too flat.
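The visual judgments above can be cross-checked numerically. The following is a minimal base-R sketch, not part of the original analysis; `shape_stats` is a hypothetical helper that uses the simple moment-based definitions of skewness and excess kurtosis. A skewness near 0 suggests symmetry, and a negative excess kurtosis suggests a distribution flatter than the normal curve.

```r
# Hypothetical helper (not from the original report): moment-based
# sample skewness and excess kurtosis, using the population SD.
shape_stats <- function(x) {
  x <- x[!is.na(x)]                      # drop missing values
  dev <- x - mean(x)                     # deviations from the mean
  z <- dev / sqrt(mean(dev^2))           # standardized values
  c(skewness = mean(z^3),                # about 0 means symmetric
    excess_kurtosis = mean(z^4) - 3)     # below 0 means flatter than normal
}

# Illustrative call on a small made-up vector (not the A6R2 data)
demo <- shape_stats(c(1, 2, 3, 4, 5))
print(demo)

# With the assignment data loaded, the same helper could be applied as:
# shape_stats(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
```

The made-up vector is perfectly symmetric and evenly spread, so it gives a skewness of 0 and a negative excess kurtosis, the numerical counterpart of a histogram that looks "too flat".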
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
##
## Shapiro-Wilk normality test
##
## data: A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
##
## Shapiro-Wilk normality test
##
## data: A6R2$SatisfactionScore[A6R2$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
QUESTION
Was the data normally distributed for Variable 1?
Was the data normally distributed for Variable 2?
If p > 0.05 (P-value is GREATER than .05) this means the data is NORMAL.
If p < 0.05 (P-value is LESS than .05) this means the data is NOT normal.
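The decision rule above can be applied directly to the two p-values printed by `shapiro.test()`. This is a small sketch that copies the p-values from the output above; the object names `p_values`, `alpha`, and `decision` are illustrative.

```r
# p-values copied from the Shapiro-Wilk output above
p_values <- c(Human = 0.0001344, AI = 5.083e-06)
alpha <- 0.05

# p > .05 means normal; p < .05 means not normal
decision <- ifelse(p_values > alpha, "normal", "not normal")
print(decision)
```

Both p-values are below .05, so under this rule neither group's satisfaction scores are normally distributed.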
library(ggplot2)
library(ggpubr)
ggboxplot(A6R2, x = "ServiceType", y = "SatisfactionScore",
color = "ServiceType",
palette = "jco",
add = "jitter")
QUESTION
Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot (check if there are any dots past the lines on the boxes) or are they very far away?
If there are no dots, continue with the Independent t-test.
If there are a few dots (two or fewer), and they are close to the whiskers, continue with the Independent t-test.
If there are a few dots (two or fewer), and they are far away from the whiskers, consider switching to the Mann-Whitney U test.
If there are many dots (more than one or two) and they are very far away from the whiskers, you should switch to the Mann-Whitney U test.
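Because the jittered points can make outliers hard to count by eye, the number of points beyond the whiskers can also be counted in code. The sketch below uses base R's `boxplot.stats()`, which flags values more than 1.5 times the hinge spread beyond the box; this may differ slightly from the quartile rule used by the plotting package. The `toy` data frame is made up for illustration and only borrows the column names from A6R2.

```r
# Made-up illustration data (not the A6R2 data)
toy <- data.frame(
  ServiceType = rep(c("AI", "Human"), each = 6),
  SatisfactionScore = c(2, 3, 3, 3, 4, 10, 6, 7, 7, 8, 8, 9)
)

# Count points beyond the whiskers in each group
outlier_counts <- tapply(
  toy$SatisfactionScore,
  toy$ServiceType,
  function(x) length(boxplot.stats(x)$out)
)
print(outlier_counts)

# With the assignment data loaded, replace `toy` with `A6R2`.
```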
wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
library(effectsize)
rank_biserial(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## r (rank biserial) | 95% CI
## ----------------------------------
## -0.90 | [-0.93, -0.87]
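As a sanity check, the rank-biserial value can be recovered by hand from the W statistic and the group sizes. The sketch below assumes one common formula for two independent groups, r = 2W / (n1 * n2) - 1, where W is the statistic reported for the first group (here AI); the variable names are illustrative.

```r
# Values copied from the output above
W <- 497          # Wilcoxon rank sum statistic (first group: AI)
n_ai <- 100       # AI group size
n_human <- 100    # Human group size

# Assumed formula: proportion of favorable pairs rescaled to [-1, 1]
r_rb <- 2 * W / (n_ai * n_human) - 1
print(round(r_rb, 2))
```

The result rounds to -0.90, matching the `rank_biserial()` output. The negative sign reflects that the first group (AI) tends to score lower than the second (Human).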
QUESTIONS
Q1) What is the size of the effect? The effect means how big or small was the difference between the two groups.
± 0.00 to 0.10 = ignore
± 0.10 to 0.30 = small
± 0.30 to 0.50 = moderate
± 0.50 and above = large
Example 1) A rank-biserial correlation of 0.05 indicates the difference between the groups was not meaningful. There was no effect.
Example 2) A rank-biserial correlation of 0.32 indicates the difference between the groups was moderate.
Answer: The size of the effect is very large (r = -0.90), well beyond the ± 0.50 threshold for a large effect.
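The effect-size bands from Q1 can be written as a small lookup. In this sketch `label_effect` is a hypothetical helper; assigning each boundary value (0.10, 0.30, 0.50) to the higher band is an assumption, since the bands as given overlap at their edges.

```r
# Hypothetical helper: map |r| to the bands listed in Q1.
# Boundary values go to the higher band (an assumption).
label_effect <- function(r) {
  bands <- cut(abs(r),
               breaks = c(0, 0.10, 0.30, 0.50, Inf),
               labels = c("ignore", "small", "moderate", "large"),
               right = FALSE)
  as.character(bands)
}

# The two examples from Q1, plus the value observed in this analysis
labels <- label_effect(c(0.05, 0.32, -0.90))
print(labels)
```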
Q2) Which group had the higher average rank? The Mann-Whitney U test does not compare means directly. Instead, it looks at whether one group tends to have higher scores than the other. To determine which group ranked higher, look at the group means or medians in your dataset.
Answer: The Human group had the higher average rank (Mdn = 8, compared with Mdn = 3 for the AI group).
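The average rank of each group can also be computed directly rather than inferred from the means or medians. The sketch below ranks all scores together and then averages the ranks within each group; the `toy` data frame is made up for illustration and only borrows the column names from A6R2.

```r
# Made-up illustration data (not the A6R2 data)
toy <- data.frame(
  ServiceType = rep(c("AI", "Human"), each = 5),
  SatisfactionScore = c(2, 3, 3, 4, 5, 6, 7, 8, 8, 9)
)

# Rank all scores together (ties receive the average rank),
# then average the ranks within each group
toy$Rank <- rank(toy$SatisfactionScore)
mean_ranks <- tapply(toy$Rank, toy$ServiceType, mean)
print(mean_ranks)

# With the assignment data loaded, replace `toy` with `A6R2`.
```

The group with the larger mean rank is the one that tends to have higher scores.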
A Mann-Whitney U test was conducted to compare customer satisfaction scores between customers served by an AI chatbot (n = 100) and those served by human agents (n = 100). Customers served by human agents had significantly higher median scores (Mdn = 8) than those served by the AI chatbot (Mdn = 3), U = 497, p < 0.001. The effect size was very large (r = -0.90), indicating a substantial difference between the two service types. Customers interacting with human agents reported much higher satisfaction than those interacting with the AI chatbot.