R2

SCENARIO 2

Human vs. AI Service Satisfaction: The CEO of a company conducted a study to compare the effectiveness of customer service agents (Human) versus an AI chatbot (AI) on customer satisfaction (multiple-item rating scale). The CEO wants to determine if there is a difference in the average satisfaction scores between the two groups. Is there a difference in customer satisfaction between human and AI service?

Null Hypothesis (H₀): There is no difference in the average customer satisfaction scores between customers served by human agents and those served by an AI chatbot.

Alternative Hypothesis (H₁): There is a difference in the average customer satisfaction scores between customers served by human agents and those served by an AI chatbot.

# INDEPENDENT T-TEST & MANN-WHITNEY U TEST

# QUESTION
# What are the null and alternate hypotheses for YOUR research scenario?
# H0: There is no difference in the average customer satisfaction scores between customers served by human agents and those served by an AI chatbot.
# H1: There is a difference in the average customer satisfaction scores between customers served by human agents and those served by an AI chatbot. 


# --- SETUP AND DATA IMPORT ---

# INSTALL REQUIRED PACKAGE (Uncomment and run only if needed)
# install.packages("readxl")

# LOAD THE PACKAGE
library(readxl)

# IMPORT EXCEL FILE INTO R STUDIO
# REPLACE THE FILE PATH BELOW with your actual path.
satisfaction_data <- read_excel("C:/Users/sahit/Downloads/A6R2.xlsx")


# --- DESCRIPTIVE STATISTICS ---

# INSTALL REQUIRED PACKAGE (Uncomment and run only if needed)
# install.packages("dplyr")

# LOAD THE PACKAGE
library(dplyr)

# CALCULATE THE DESCRIPTIVE STATISTICS
satisfaction_data %>% 
  group_by(ServiceType) %>% 
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE), 
    Median = median(SatisfactionScore, na.rm = TRUE), 
    SD = sd(SatisfactionScore, na.rm = TRUE), 
    N = n()
  )

## # A tibble: 2 × 5
##   ServiceType  Mean Median    SD     N
##   <chr>       <dbl>  <dbl> <dbl> <int>
## 1 AI           3.6       3  1.60   100
## 2 Human        7.42      8  1.44   100

# --- ASSUMPTION CHECKS: NORMALITY AND OUTLIERS ---

# HISTOGRAMS
# CREATE THE HISTOGRAMS 
hist(satisfaction_data$SatisfactionScore[satisfaction_data$ServiceType == "Human"],
     main = "Histogram of Human Agent Scores",
     xlab = "SatisfactionScore",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(satisfaction_data$SatisfactionScore[satisfaction_data$ServiceType == "AI"],
     main = "Histogram of AI Chatbot Scores",
     xlab = "SatisfactionScore",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

# SHAPIRO-WILK TEST
# CONDUCT THE SHAPIRO-WILK TEST
# Result was: p < 0.05 for both groups (NOT normal)
shapiro.test(satisfaction_data$SatisfactionScore[satisfaction_data$ServiceType == "Human"])

## 
##  Shapiro-Wilk normality test
## 
## data:  satisfaction_data$SatisfactionScore[satisfaction_data$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344

shapiro.test(satisfaction_data$SatisfactionScore[satisfaction_data$ServiceType == "AI"])

## 
##  Shapiro-Wilk normality test
## 
## data:  satisfaction_data$SatisfactionScore[satisfaction_data$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06

# BOXPLOT
# INSTALL REQUIRED PACKAGE (Uncomment and run only if needed)
# install.packages("ggplot2")
# install.packages("ggpubr")

# LOAD THE PACKAGE
library(ggplot2)
library(ggpubr)

# CREATE THE BOXPLOT
ggboxplot(satisfaction_data, x = "ServiceType", y = "SatisfactionScore",
          color = "ServiceType",
          palette = "jco",
          add = "jitter")

# --- INFERENTIAL TEST: MANN-WHITNEY U TEST ---

# DECISION: The Shapiro-Wilk test for both groups resulted in p < 0.05. Proceed with Mann-Whitney U test.
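
# A minimal sketch of the decision rule above, assuming the usual 0.05 cutoff.
# choose_test() is a hypothetical helper written for illustration; it is not part
# of any package. The two p-values passed in are the Shapiro-Wilk results reported above.

```r
# Hypothetical helper: pick the two-sample test from two Shapiro-Wilk p-values.
# If both groups look normal (p >= alpha), use the independent t-test;
# otherwise fall back to the Mann-Whitney U test.
choose_test <- function(p_group1, p_group2, alpha = 0.05) {
  if (p_group1 >= alpha && p_group2 >= alpha) {
    "independent t-test"
  } else {
    "Mann-Whitney U test"
  }
}

# Shapiro-Wilk p-values reported above (Human, AI)
chosen_test <- choose_test(0.0001344, 5.083e-06)
print(chosen_test)
```

# With n = 100 per group the Shapiro-Wilk test is quite sensitive, so the
# histograms and boxplot are worth reading alongside the p-values.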

# MANN-WHITNEY U TEST (Result: W = 497, p < 2.2e-16)
wilcox_result <- wilcox.test(SatisfactionScore ~ ServiceType, data = satisfaction_data, exact = FALSE)
print(wilcox_result)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

# EFFECT-SIZE
# INSTALL REQUIRED PACKAGE (Uncomment and run only if needed)
# install.packages("effectsize")

# LOAD THE PACKAGE
library(effectsize)

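# CALCULATE EFFECT SIZE (RANK-BISERIAL r; reported magnitude = 0.90)
# NOTE: The sign of r depends on the order of the groups, so report the absolute value.
rank_biserial(SatisfactionScore ~ ServiceType, data = satisfaction_data)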

# --- FINAL ANSWERS AND REPORTING ---

# QUESTIONS
# Answer the questions below as a comment within the R script:

# Q1) What is the size of the effect? 
# The effect size is r = 0.90. This is a **large effect**.
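
# As a cross-check, the rank-biserial r can be recovered by hand from the reported
# test statistic. This sketch assumes W = 497 belongs to the AI group (the first
# level alphabetically) and that all 100 scores per group were used.

```r
# Values taken from the output above
W       <- 497   # Wilcoxon rank sum statistic
n_ai    <- 100   # AI group size
n_human <- 100   # Human group size

# W / (n_ai * n_human) is the share of AI-Human pairs in which the AI score is higher
# (ties count as half). Rank-biserial r = share favoring AI - share favoring Human.
r_rb <- 2 * W / (n_ai * n_human) - 1

print(round(r_rb, 2))       # about -0.90: AI scores rank lower than Human scores
print(round(abs(r_rb), 2))  # magnitude reported in the answer above
```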

# Q2) Which group had the higher average rank? 
# The **Human Agent** group had the higher average rank, consistent with its higher median (Mdn = 8.00 vs. 3.00 for the AI group).
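
# The average ranks themselves can be worked out from the test statistic. A sketch,
# assuming W = 497 refers to the AI group (first level alphabetically) and that
# no scores were missing, so all 200 observations were ranked together.

```r
# Values taken from the output above
W       <- 497
n_ai    <- 100
n_human <- 100
n_total <- n_ai + n_human

# R reports W as the AI rank sum minus its minimum possible value, n(n + 1) / 2
rank_sum_ai    <- W + n_ai * (n_ai + 1) / 2
# All ranks 1..n_total sum to n_total(n_total + 1) / 2; the remainder belongs to Human
rank_sum_human <- n_total * (n_total + 1) / 2 - rank_sum_ai

mean_rank_ai    <- rank_sum_ai / n_ai
mean_rank_human <- rank_sum_human / n_human

print(c(AI = mean_rank_ai, Human = mean_rank_human))
```

# Under these assumptions the Human group's mean rank (about 145.5) is well above
# the AI group's (about 55.5).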

Result Paragraph

A Mann-Whitney U test was conducted to compare customer satisfaction scores between customers served by human agents (\(n = 100\)) and those served by an AI chatbot (\(n = 100\)). Satisfaction scores were significantly higher in the human agent group (\(\text{Mdn} = 8.00\)) than in the AI chatbot group (\(\text{Mdn} = 3.00\)), \(U = 497\), \(p < .001\), so the null hypothesis was rejected. The rank-biserial effect size was large (\(r = 0.90\)). Overall, customers reported much higher satisfaction when served by a human agent than by an AI chatbot.

Team 4

2025-11-19