MANN-WHITNEY U TEST

This analysis is for RESEARCH SCENARIO 2 from Assignment 6. It tests whether the distribution of customer satisfaction scores differed between customers served by human agents and customers served by an AI chatbot.

Hypotheses

  • H0 (Null Hypothesis): There is no difference in the distribution of Satisfaction Scores between Human and AI service types.
  • H1 (Alternate Hypothesis): There is a difference in the distribution of Satisfaction Scores between Human and AI service types.

Result paragraph

A Mann-Whitney U test was conducted to compare customer satisfaction scores between customers served by human agents (n = 100) and customers served by an AI chatbot (n = 100). Customers served by human agents had significantly higher median satisfaction scores (Mdn = 8.00) than customers served by an AI chatbot (Mdn = 3.00), U = 497, p < 0.001. The effect size was large (rank-biserial r = -0.90), indicating a substantial difference in satisfaction scores between the two groups. Overall, customers served by human agents reported higher satisfaction.

R code and Analysis

CHECK NORMAL DISTRIBUTION

IMPORT EXCEL FILE Purpose: Import your Excel dataset into R to conduct analyses.

# INSTALL REQUIRED PACKAGE

# install.packages("readxl")

# LOAD THE PACKAGE

library(readxl)

dataset <- read_excel("//apporto.com/dfs/SLU/Users/minhoku_slu/Downloads/A6R2.xlsx")

DESCRIPTIVE STATISTICS PURPOSE: Calculate the mean, median, SD, and sample size for each group.

# INSTALL REQUIRED PACKAGE

# install.packages("dplyr")

# LOAD THE PACKAGE

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# CALCULATE THE DESCRIPTIVE STATISTICS

dataset %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   ServiceType  Mean Median    SD     N
##   <chr>       <dbl>  <dbl> <dbl> <int>
## 1 AI           3.6       3  1.60   100
## 2 Human        7.42      8  1.44   100

HISTOGRAMS Purpose: Visually check the normality of the scores for each group.

# CREATE THE HISTOGRAMS 

hist(dataset$SatisfactionScore[dataset$ServiceType == "Human"],
     main = "Histogram of Group 1 Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(dataset$SatisfactionScore[dataset$ServiceType == "AI"],
     main = "Histogram of Group 2 Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

QUESTIONS

  • Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
  • The histogram looks negatively skewed.
  • Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
  • The histogram looks too tall.
  • Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
  • The histogram looks positively skewed.
  • Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
  • The histogram looks too flat.
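
The visual judgments above can be cross-checked numerically. The following is a minimal sketch, using only base R, of moment-based sample skewness and excess kurtosis. The helper names `sample_skewness` and `sample_excess_kurtosis` and the `example_scores` vector are hypothetical and are not part of the assignment data. Negative skewness suggests a longer left tail, and positive excess kurtosis suggests a taller peak than a normal curve.

```r
# Moment-based skewness: third central moment / (second central moment)^(3/2)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: fourth central moment / (second central moment)^2, minus 3
sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Hypothetical scores with a long left tail (illustration only)
example_scores <- c(2, 6, 7, 7, 8, 8, 8, 9, 9, 9)

skew_value <- sample_skewness(example_scores)
kurt_value <- sample_excess_kurtosis(example_scores)
print(c(skewness = skew_value, excess_kurtosis = kurt_value))
```

With the real data, the same helpers could be applied to `dataset$SatisfactionScore[dataset$ServiceType == "Human"]` and to the corresponding "AI" subset.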

SHAPIRO-WILK TEST Purpose: Check the normality for each group’s score statistically.

# CONDUCT THE SHAPIRO-WILK TEST

shapiro.test(dataset$SatisfactionScore[dataset$ServiceType == "Human"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$SatisfactionScore[dataset$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(dataset$SatisfactionScore[dataset$ServiceType == "AI"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$SatisfactionScore[dataset$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06

QUESTION

  • Was the data normally distributed for Variable 1?
  • The data were not normally distributed for Variable 1 (p < 0.05).
  • Was the data normally distributed for Variable 2?
  • The data were not normally distributed for Variable 2 (p < 0.05).
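
As a possible shortcut, the two Shapiro-Wilk calls can be run in one step per group. The sketch below uses base R `tapply()` on a small hypothetical data frame (`toy_data` is made up for illustration and stands in for the real dataset) and compares each p-value with an assumed alpha of 0.05.

```r
# Hypothetical toy data standing in for the real dataset (illustration only)
toy_data <- data.frame(
  ServiceType = rep(c("AI", "Human"), each = 8),
  SatisfactionScore = c(2, 3, 3, 4, 5, 1, 3, 4,
                        7, 8, 8, 9, 6, 8, 7, 10)
)

# Run shapiro.test() within each group and keep only the p-values
shapiro_p <- tapply(
  toy_data$SatisfactionScore,
  toy_data$ServiceType,
  function(scores) shapiro.test(scores)$p.value
)

# TRUE means normality is rejected at the assumed alpha of 0.05
print(shapiro_p)
print(shapiro_p < 0.05)
```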

BOXPLOT Purpose: Check for any outliers impacting the mean for each group’s scores.

# INSTALL REQUIRED PACKAGE

# install.packages("ggplot2")
# install.packages("ggpubr")

# LOAD THE PACKAGE

library(ggplot2)
library(ggpubr)

# CREATE THE BOXPLOT

ggboxplot(dataset, x = "ServiceType", y = "SatisfactionScore",
          color = "ServiceType",
          palette = "jco",
          add = "jitter")

QUESTION

  • Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away?
  • There were many dots outside the boxes, and they were not close to the whiskers.
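
One caveat when reading this plot: `add = "jitter"` overlays every observation as a dot, so a dot outside the box is not necessarily an outlier. A common convention flags only values more than 1.5 × IQR beyond the hinges. The sketch below shows how base R's `boxplot.stats()` reports such values; the `example_scores` vector is hypothetical and only illustrates the rule.

```r
# Hypothetical scores with one extreme value (illustration only)
example_scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# boxplot.stats() applies the usual 1.5 * IQR rule (coef = 1.5 by default)
stats_result <- boxplot.stats(example_scores)

# $stats holds lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats_result$stats)

# $out holds the values beyond the whiskers, i.e. the flagged outliers
flagged <- stats_result$out
print(flagged)
```

Applied to each group's `SatisfactionScore` values, an empty `$out` would suggest that no points fall beyond the whiskers under this rule.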

MANN-WHITNEY U TEST

PURPOSE: Test if there was a difference between the distributions of the two groups.

wilcox.test(SatisfactionScore ~ ServiceType, data = dataset, exact = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
# DETERMINE STATISTICAL SIGNIFICANCE
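
One way to make the significance decision explicit is to store the test result and compare its p-value with a chosen threshold. This is a minimal sketch on hypothetical data: `toy_data` is made up for illustration, and alpha = 0.05 is an assumed conventional cutoff rather than something stated in the assignment.

```r
# Hypothetical toy data standing in for the real dataset (illustration only)
toy_data <- data.frame(
  ServiceType = rep(c("AI", "Human"), each = 8),
  SatisfactionScore = c(2, 3, 3, 4, 5, 1, 3, 4,
                        7, 8, 8, 9, 6, 8, 7, 10)
)

# Store the result so its components can be inspected
mw_result <- wilcox.test(SatisfactionScore ~ ServiceType,
                         data = toy_data, exact = FALSE)

alpha <- 0.05  # assumed conventional threshold
is_significant <- mw_result$p.value < alpha

cat("W =", unname(mw_result$statistic), "\n")
cat("p =", format.pval(mw_result$p.value, digits = 3), "\n")
cat("Significant at alpha = 0.05:", is_significant, "\n")
```

In the real analysis the reported p-value (< 2.2e-16) is far below 0.05, so the null hypothesis of identical distributions is rejected.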

EFFECT-SIZE PURPOSE: Determine how big of a difference there was between the group distributions.

# INSTALL REQUIRED PACKAGE

# install.packages("effectsize")

# LOAD THE PACKAGE

library(effectsize)

# CALCULATE EFFECT SIZE (R VALUE)

rank_biserial(SatisfactionScore ~ ServiceType, data = dataset, exact = FALSE)
## r (rank biserial) |         95% CI
## ----------------------------------
## -0.90             | [-0.93, -0.87]
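
As a consistency check, the rank-biserial value can be recovered by hand from the test statistic and the group sizes. The sketch below uses one common formulation, the difference between the two U values divided by the number of pairs, and assumes that the reported W = 497 belongs to the first factor level (AI, by alphabetical order). The variable names are illustrative.

```r
# Values taken from the output above
W  <- 497   # statistic reported by wilcox.test(), assumed to be for the AI group
n1 <- 100   # AI group size
n2 <- 100   # Human group size

# U for the other group follows from U1 + U2 = n1 * n2
U_other <- n1 * n2 - W

# Rank-biserial correlation as the scaled difference between the two U values
r_rb <- (W - U_other) / (n1 * n2)
print(round(r_rb, 2))
```

The result, (497 - 9503) / 10000 = -0.9006, rounds to the -0.90 reported by `rank_biserial()`. The negative sign reflects that the first group (AI) tended to have lower scores than the second group (Human).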

QUESTIONS

  • Q1) What is the size of the effect?

  • A rank-biserial correlation of -0.90 indicates the difference between the groups was large.

  • Q2) Which group had the higher average rank?

  • The Human group had the higher average rank.