INDEPENDENT T-TEST & MANN-WHITNEY U TEST

H0: There is no difference in the average satisfaction scores between customers served by human agents and those served by the AI chatbot.

H1: There is a difference in the average satisfaction scores between customers served by human agents and those served by the AI chatbot.

========================================================

>> IMPORT EXCEL FILE <<

========================================================

#install.packages("readxl")

LOAD THE PACKAGE

library(readxl)
A6R2 <- read_excel("C:/Users/lesle/Downloads/A6R2.xlsx")

DESCRIPTIVE STATISTICS

PURPOSE: Calculate the mean, median, SD, and sample size for each group.

#install.packages("dplyr")

LOAD THE PACKAGE

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

CALCULATE THE DESCRIPTIVE STATISTICS

A6R2 %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
)
## # A tibble: 2 × 5
##   ServiceType  Mean Median    SD     N
##   <chr>       <dbl>  <dbl> <dbl> <int>
## 1 AI           3.6       3  1.60   100
## 2 Human        7.42      8  1.44   100

HISTOGRAMS

Purpose: Visually check the normality of the scores for each group.

CREATE THE HISTOGRAMS

Replace “dataset” with your dataset name (without .xlsx)

Replace “score” with your dependent variable R code name (example: USD)

Replace “group” with your independent variable R code name (example: Country)

Replace “Group1” with the R code name for your first group (example: USA)

Replace “Group2” with the R code name for your second group (example: India)

# Histogram of satisfaction scores for customers served by human agents
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"],
     main = "Histogram of Human Scores",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "green",
     border = "black",
     breaks = 20)

# Histogram of satisfaction scores for customers served by the AI chatbot
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"],
     main = "Histogram of AI Scores",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "yellow",
     border = "black",
     breaks = 20)

QUESTIONS

Answer the questions below as comments within the R script:

Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

Ans. Negatively skewed.

Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

Ans. Too tall.

Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

Ans. Positively skewed.

Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

Ans. Too flat.
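The visual judgments above can be cross-checked numerically. The sketch below is a minimal base R version, assuming the simple moment-based formulas for skewness (third central moment divided by the second central moment to the power 1.5) and excess kurtosis (fourth central moment divided by the squared second central moment, minus 3). The helper names `skewness_m` and `excess_kurtosis_m` are hypothetical, and the example vectors are illustrative, not the A6R2 data.

```r
# Hypothetical helper: moment-based sample skewness
# (0 = symmetrical, > 0 = positively skewed, < 0 = negatively skewed)
skewness_m <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)   # second central moment
  m3 <- mean((x - mean(x))^3)   # third central moment
  m3 / m2^1.5
}

# Hypothetical helper: moment-based excess kurtosis
# (0 = normal bell curve, < 0 = flatter, > 0 = taller / heavier tails)
excess_kurtosis_m <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)   # second central moment
  m4 <- mean((x - mean(x))^4)   # fourth central moment
  m4 / m2^2 - 3
}

# Illustrative vectors (not the A6R2 data)
symmetric  <- c(1, 2, 3, 4, 5)
right_tail <- c(1, 1, 1, 2, 10)

skewness_m(symmetric)          # 0: symmetrical
skewness_m(right_tail)         # positive: long tail to the right
excess_kurtosis_m(symmetric)   # -1.3: flatter than a bell curve
```

To apply it to a group, pass that group's scores, for example `skewness_m(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])`.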

SHAPIRO-WILK TEST

Purpose: Check the normality for each group’s score statistically.

The Shapiro-Wilk test checks for any overall departure from normality, so it is sensitive to skewness and kurtosis at the same time.

The test asks: “Is this variable consistent with normally distributed data (null hypothesis), or does it DIFFER from normal data (alternative hypothesis)?”

For this test, if p is GREATER than .05 (p > .05), there is no significant departure from normality, so the data can be treated as NORMAL.

If p is LESS than .05 (p < .05), the data is NOT normal.
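The decision rule above can be tried on simulated data before applying it to real scores. This is a minimal sketch, assuming only base R's `shapiro.test()`; the helper `sw_decision` and both simulated samples are hypothetical illustrations, not part of the A6R2 analysis.

```r
set.seed(42)  # make the simulated draws reproducible

normal_like <- rnorm(100, mean = 5, sd = 1.5)  # drawn from a normal distribution
skewed      <- rexp(100, rate = 1)             # strongly right-skewed draws

# Hypothetical helper applying the rule: p > .05 -> normal, otherwise not normal
sw_decision <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value
  if (p > alpha) "normal" else "not normal"
}

# Truly normal data pass this check in roughly 95% of samples
sw_decision(normal_like)

# Exponential data of this size are almost always flagged as non-normal
sw_decision(skewed)
```

Note that with a 5% significance level, normal data will still be flagged as "not normal" about 1 time in 20 by chance, which is one reason to read the histograms alongside the test.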

CONDUCT THE SHAPIRO-WILK TEST

Replace “dataset” with your dataset name (without .xlsx)

Replace “score” with your dependent variable R code name (example: USD)

Replace “group” with your independent variable R code name (example: Country)

Replace “Group1” with the R code name for your first group (example: USA)

Replace “Group2” with the R code name for your second group (example: India)

shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R2$SatisfactionScore[A6R2$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06

QUESTION

Answer the questions below as a comment within the R script:

Was the data normally distributed for Variable 1?

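Ans: Not normal (Human: W = 0.93741, p = 0.0001344, which is less than .05).

Was the data normally distributed for Variable 2?

Ans: Not normal (AI: W = 0.91143, p = 5.083e-06, which is less than .05).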

If p > .05 (the p-value is GREATER than .05), the data is NORMAL. Continue to the box-plot test below.

If p < .05 (the p-value is LESS than .05), the data is NOT normal, so switch to the Mann-Whitney U test.
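The branching rule above can be written as one function. The sketch below is hypothetical (the name `compare_two_groups` is illustrative, not from any package): it runs `shapiro.test()` on each group and uses `t.test()` only if both groups pass, otherwise `wilcox.test()`. It assumes R's default `t.test()`, which is the Welch version, and it skips the box-plot step. The simulated data are for illustration only.

```r
# Hypothetical helper: check normality per group, then choose the test
compare_two_groups <- function(score, group, alpha = 0.05) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2)   # exactly two groups required

  # Shapiro-Wilk p-value for each group
  sw_p <- tapply(score, group, function(x) shapiro.test(x)$p.value)

  if (all(sw_p > alpha)) {
    # Both groups look normal: independent-samples t-test (Welch by default)
    list(test = "Independent t-test", sw_p = sw_p,
         result = t.test(score ~ group))
  } else {
    # At least one group is not normal: Mann-Whitney U test
    list(test = "Mann-Whitney U", sw_p = sw_p,
         result = wilcox.test(score ~ group, exact = FALSE))
  }
}

# Illustrative skewed data (not the A6R2 data)
set.seed(1)
score <- c(rexp(50, rate = 1), rexp(50, rate = 0.5))
group <- rep(c("A", "B"), each = 50)

out <- compare_two_groups(score, group)
out$test
out$sw_p
```

With the worksheet's data, the call would be `compare_two_groups(A6R2$SatisfactionScore, A6R2$ServiceType)`.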

MANN-WHITNEY U TEST

PURPOSE: Test if there was a difference between the distributions of the two groups.

Replace “dataset” with your dataset name (without .xlsx)

Replace “score” with your dependent variable R code name (example: USD)

Replace “group” with your independent variable R code name (example: Country)

wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
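R labels the statistic W, but it is the Mann-Whitney U for the first group. With the formula interface, groups are ordered by factor level (alphabetically, so "AI" comes before "Human"), meaning W = 497 is U for the AI group; the Human group's U would be 100 × 100 − 497 = 9503, because the two U values always sum to n1 × n2. The sketch below shows, on small made-up vectors (not the A6R2 data), that U computed by hand from pooled ranks matches `wilcox.test()`.

```r
# Illustrative toy scores (not the A6R2 data)
ai    <- c(2, 3, 3, 4, 6)
human <- c(5, 7, 8, 8, 9)

# Rank all scores together; tied values receive their average rank
ranks <- rank(c(ai, human))
n_ai  <- length(ai)

# U for the first group = its rank sum minus n(n + 1) / 2
u_ai <- sum(ranks[seq_len(n_ai)]) - n_ai * (n_ai + 1) / 2

# wilcox.test reports the same quantity as W
wt <- wilcox.test(ai, human, exact = FALSE)
c(manual_U = u_ai, wilcox_W = unname(wt$statistic))  # both equal 1
```

A small U for the first group means its scores mostly rank below the second group's.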

DETERMINE STATISTICAL SIGNIFICANCE

p < .05 (p < 2.2e-16), so the result is statistically significant.

If results were statistically significant (p < .05), continue to effect size section below.

If results were NOT statistically significant (p > .05), skip to reporting section below.

NOTE: The Mann-Whitney U test is used when your data is not normally distributed or when other assumptions of the t-test are not met. It is not chosen based on whether the t-test was significant.

EFFECT-SIZE

PURPOSE: Determine how large the difference between the group distributions was.

INSTALL REQUIRED PACKAGE

If never installed, remove the hashtag before the install code.

If previously installed, leave the hashtag in front of the code.

#install.packages("effectsize")

LOAD THE PACKAGE

Always load the package you want to use.

library(effectsize)

CALCULATE EFFECT SIZE (R VALUE)

Replace “dataset” with your dataset name (without .xlsx)

Replace “score” with your dependent variable R code name (example: USD)

Replace “group” with your independent variable R code name (example: Country)

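# Store the Mann-Whitney U test results (same test as above)
mw_results <- wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
# Rank-biserial correlation: the effect size for the Mann-Whitney U test
rank_biserial(SatisfactionScore ~ ServiceType, data = A6R2)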
## r (rank biserial) |         95% CI
## ----------------------------------
## -0.90             | [-0.93, -0.87]
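As a worked check, the rank-biserial correlation for two independent groups can be computed from U with the standard formula r = 2U / (n1 × n2) − 1, where U is taken for the first group (AI here). Plugging in the values from the output above reproduces the reported −0.90; the variable names are illustrative.

```r
u  <- 497   # W from wilcox.test (U for the AI group)
n1 <- 100   # AI group size
n2 <- 100   # Human group size

# Rank-biserial correlation from U
r_rb <- 2 * u / (n1 * n2) - 1
r_rb            # -0.9006
round(r_rb, 2)  # -0.9
```

The sign only reflects group order: if U for the Human group (9503) were used instead, the same formula would give +0.90.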

QUESTIONS

Answer the questions below as a comment within the R script:

Q1) What is the size of the effect?

The effect size tells you how large the difference between the two groups was. Judge it by the absolute value of r:

±0.00 to 0.10 = negligible (ignore)

±0.10 to 0.30 = small

±0.30 to 0.50 = moderate

±0.50 and above = large
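The thresholds above can be applied automatically. This is a minimal sketch using base R's `cut()` on the absolute value of r; the helper name `interpret_rrb` is hypothetical, the lowest band is labeled "negligible" (the "ignore" band above), and values that fall exactly on a boundary such as 0.30 are assigned to the higher band, which is an assumption since the bands above share their endpoints.

```r
# Hypothetical helper: label a rank-biserial correlation using the bands above
interpret_rrb <- function(r) {
  size <- cut(abs(r),
              breaks = c(0, 0.10, 0.30, 0.50, Inf),
              labels = c("negligible", "small", "moderate", "large"),
              right = FALSE,          # intervals are [lower, upper)
              include.lowest = TRUE)
  as.character(size)
}

interpret_rrb(c(0.05, 0.32, -0.90))  # "negligible" "moderate" "large"
```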

Example 1) A rank-biserial correlation of 0.05 indicates the difference between the groups was not meaningful. There was no effect.

Example 2) A rank-biserial correlation of 0.32 indicates the difference between the groups was moderate.

Ans: A rank-biserial correlation of -0.90 indicates the difference between the groups was large.

Q2) Which group had the higher average rank?

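The Mann-Whitney U test does not compare means directly. Instead, it ranks all scores together and tests whether one group tends to have higher scores than the other.

To determine which group ranked higher, look at the group means or medians in your dataset. With the formula interface, a negative rank-biserial correlation means the first group in alphabetical order (AI) tended to score lower than the second (Human).

Ans: Customers served by human agents had the higher average rank (median 8 vs. 3 for the AI chatbot).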
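The average rank of each group can also be computed directly, since the test is based on pooled ranks. A minimal base R sketch on made-up scores (not the A6R2 data):

```r
# Illustrative toy data (not the A6R2 data)
score <- c(2, 3, 3, 4, 6, 5, 7, 8, 8, 9)
group <- rep(c("AI", "Human"), each = 5)

# Rank all scores together, then average the ranks within each group
mean_ranks <- tapply(rank(score), group, mean)
mean_ranks                    # AI 3.2, Human 7.8

names(which.max(mean_ranks))  # group with the higher average rank
```

For the worksheet data, the equivalent call is `tapply(rank(A6R2$SatisfactionScore), A6R2$ServiceType, mean)`.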

WRITTEN REPORT FOR MANN-WHITNEY U TEST

Write a paragraph summarizing your findings.

1) REVIEW YOUR OUTPUT

Collect the information below from your output:

1. The name of the inferential test used

Ans: Mann-Whitney U test

2. The names of the IV and DV (their proper names, not their R code names).

Ans: The IV is Service Type (human agent vs. AI chatbot); the DV is Satisfaction Score.

3. The sample size for each group (labeled as “n”).

Ans: Human agents n = 100; AI chatbot n = 100.

4. Whether the inferential test results were statistically significant (p < .05) or not (p > .05).

Ans: p < .05, so the results are statistically significant.

5. The median for each group’s score on the DV (rounded to two places after the decimal).

Ans: Human agents Mdn = 8.00; AI chatbot Mdn = 3.00.

6. U statistic (from output).

Ans: U = 497 (reported by R as W)

7. EXACT p-value to three decimals. NOTE: If p > .05, just report p > .05 If p < .001, just report p < .001

Ans: p < .001

8. Effect size (rank-biserial correlation) ** Only if the results were significant.

Ans: Rank-biserial correlation = -0.90
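The p-value reporting rule in item 7 can be captured in a small function. This is a hypothetical helper (`format_p` is not from any package), assuming APA style, which drops the leading zero because a p-value cannot exceed 1.

```r
# Hypothetical helper implementing the reporting rule in item 7
format_p <- function(p) {
  if (p < 0.001) return("p < .001")
  if (p > 0.05)  return("p > .05")
  # Three decimals, leading zero removed (APA style)
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

format_p(2.2e-16)  # "p < .001"
format_p(0.0321)   # "p = .032"
format_p(0.20)     # "p > .05"
```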

2) REPORT YOUR DATA AS A PARAGRAPH

A Mann-Whitney U test was conducted to compare satisfaction scores between customers served by human agents (n = 100) and those served by the AI chatbot (n = 100). Customers served by human agents had significantly higher satisfaction scores (Mdn = 8.00) than those served by the AI chatbot (Mdn = 3.00), U = 497, p < .001. The effect size was large (rank-biserial r = -0.90, 95% CI [-0.93, -0.87]), indicating that customers were substantially more satisfied when served by human agents than by the AI chatbot.