INDEPENDENT T-TEST & MANN-WHITNEY U TEST

QUESTIONS

What are the null and alternate hypotheses for YOUR research scenario?

H0: There is no difference in customer satisfaction scores between customers served by human agents and those served by AI chatbots.

H1: There is a difference in customer satisfaction scores between customers served by human agents and those served by AI chatbots.

IMPORT EXCEL FILE

Purpose

Import your Excel dataset into R to conduct analyses.

INSTALL REQUIRED PACKAGE

install.packages("readxl")

LOAD THE PACKAGE

Always reload the package you want to use.

library(readxl)

IMPORT EXCEL FILE INTO R STUDIO

A6R2 <- read_excel("/Users/alfred/Desktop/A6R2.xlsx")

DESCRIPTIVE STATISTICS

PURPOSE

Calculate the mean, median, SD, and sample size for each group.

INSTALL REQUIRED PACKAGE

install.packages("dplyr")

LOAD THE PACKAGE

Always reload the package you want to use.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

CALCULATE THE DESCRIPTIVE STATISTICS

A6R2 %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   ServiceType  Mean Median    SD     N
##   <chr>       <dbl>  <dbl> <dbl> <int>
## 1 AI           3.6       3  1.60   100
## 2 Human        7.42      8  1.44   100

HISTOGRAMS

Purpose

Visually check the normality of the scores for each group.

CREATE THE HISTOGRAMS

hist(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"],
main = "Histogram of AI Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)

hist(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"],
main = "Histogram of Human Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)

QUESTIONS

Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

- The histogram looks positively skewed (longer tail to the right).

Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

- The histogram looks a bit too flat, not a perfect bell curve.

Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

- The histogram looks slightly negatively skewed (tail to the left).

Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

- The histogram looks slightly too tall, more peaked than a normal bell curve.
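The visual impressions above can be cross-checked numerically. The sketch below is one way to do that in base R using moment-based skewness and excess kurtosis. The helper names `skewness` and `excess_kurtosis` and the example vectors are hypothetical and are not part of the original analysis or the A6R2 data.

```r
# Moment-based skewness: positive = longer right tail, negative = longer left tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Excess kurtosis: negative = flatter than a normal curve, positive = more peaked.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Hypothetical example scores (NOT the A6R2 data)
right_tailed <- c(1, 2, 2, 3, 3, 3, 4, 9)
symmetric    <- c(1, 2, 3, 4, 5)

skewness(right_tailed)      # positive: tail to the right
skewness(symmetric)         # zero: symmetric
excess_kurtosis(symmetric)  # negative: flatter than a bell curve
```

To apply this to the real data, pass a group's scores, for example `skewness(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])`.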

SHAPIRO-WILK TEST

Purpose

Check the normality for each group’s score statistically.

CONDUCT THE SHAPIRO-WILK TEST

shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R2$SatisfactionScore[A6R2$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344

QUESTIONS

Q1) Was the data normally distributed for Variable 1?

- No. The Shapiro-Wilk test was significant (W = 0.911, p < .001), which means the AI scores are NOT normally distributed.

Q2) Was the data normally distributed for Variable 2?

- No. The Shapiro-Wilk test was significant (W = 0.937, p < .001), which means the Human scores are NOT normally distributed.

BOXPLOT

Purpose

Check for any outliers impacting the mean for each group’s scores.

INSTALL REQUIRED PACKAGE

install.packages("ggplot2")
install.packages("ggpubr")

LOAD THE PACKAGE

Always reload the package you want to use.

library(ggplot2)
library(ggpubr)

CREATE THE BOXPLOT

ggboxplot(A6R2, x = "ServiceType", y = "SatisfactionScore",
          color = "ServiceType",
          palette = "jco",
          add = "jitter")

QUESTIONS

Q1) Were there any dots outside of the boxplots? These dots represent participants with extreme scores.

- Yes, there were a few dots outside the boxplots, which represent potential outliers.

Q2) If there are outliers, in your opinion are the scores of those dots changing the mean so much that the mean no longer accurately represents the average score?

- Yes, the outliers appear to affect the distribution, and combined with the non-normal Shapiro-Wilk results, the mean is not the best representation of central tendency. The Mann-Whitney U test is more appropriate.
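Because `add = "jitter"` overlays the individual observations on the boxplot, it can be hard to tell by eye which dots fall beyond the whiskers. The sketch below shows one way to list them numerically with the same 1.5 × IQR rule that boxplots use, via base R's `boxplot.stats()`, and to see how much they move each group's mean. The `toy` data frame is hypothetical and only mimics the column names of A6R2.

```r
# Hypothetical data with the same column names as A6R2 (NOT the real data)
toy <- data.frame(
  ServiceType = rep(c("AI", "Human"), each = 9),
  SatisfactionScore = c(2, 3, 3, 3, 4, 4, 4, 5, 10,
                        6, 7, 7, 8, 8, 8, 9, 9, 1)
)

# Split the scores by group
by_group <- split(toy$SatisfactionScore, toy$ServiceType)

# Values beyond 1.5 * IQR from the hinges, per group
outliers <- lapply(by_group, function(x) boxplot.stats(x)$out)
outliers

# How far each group's mean moves when the flagged values are dropped
mean_shift <- sapply(by_group, function(x) {
  flagged <- boxplot.stats(x)$out
  mean(x) - mean(x[!x %in% flagged])
})
mean_shift
```

Replacing `toy` with `A6R2` would run the same check on the real scores. A large shift suggests the mean is being pulled by a few extreme values.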

MANN-WHITNEY U TEST

PURPOSE

Test if there was a difference between the distributions of the two groups.

wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

DETERMINE STATISTICAL SIGNIFICANCE

If results were statistically significant (p < .05), continue to effect size section below.

If results were NOT statistically significant (p ≥ .05), skip to reporting section below.

NOTE:

The Mann-Whitney U test is used when your data is not normally distributed or when other assumptions of the t-test are not met.

It is not chosen based on whether the t-test was significant.
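The rule in the note can be written as a small decision helper. This is a simplified sketch, not the assignment's required procedure: it looks only at the Shapiro-Wilk result for each group and ignores the histogram and boxplot checks. The function name `compare_groups` and the example vectors are hypothetical.

```r
# Choose the test from the normality check, never from the t-test's own p-value.
compare_groups <- function(scores, groups, alpha = 0.05) {
  # Shapiro-Wilk p-value for each group
  p_normal <- tapply(scores, groups, function(x) shapiro.test(x)$p.value)

  if (all(p_normal > alpha)) {
    # Both groups look normal: independent t-test (Welch version by default)
    t.test(scores ~ groups)
  } else {
    # At least one group is non-normal: Mann-Whitney U test
    wilcox.test(scores ~ groups, exact = FALSE)
  }
}

# Hypothetical example data (NOT the A6R2 data)
normalish <- qnorm(ppoints(20))  # evenly spaced normal quantiles
skewed    <- exp(2 * normalish)  # strongly right-skewed values
groups    <- rep(c("A", "B"), each = 20)

res_normal <- compare_groups(c(normalish, normalish + 1), groups)
res_skewed <- compare_groups(c(normalish, skewed), groups)
res_normal$method
res_skewed$method
```

With the real data the call would be `compare_groups(A6R2$SatisfactionScore, A6R2$ServiceType)`, which should take the Mann-Whitney branch given the Shapiro-Wilk results reported above.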

EFFECT-SIZE

PURPOSE

Determine how big of a difference there was between the group distributions.

INSTALL REQUIRED PACKAGE

install.packages("effectsize")

LOAD THE PACKAGE

Always load the package you want to use.

library(effectsize)

CALCULATE EFFECT SIZE (R VALUE)

rank_biserial(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## r (rank biserial) |         95% CI
## ----------------------------------
## -0.90             | [-0.93, -0.87]

QUESTIONS

Q1) What is the size of the effect?

The effect size indicates how big or small the difference between the two groups was.

± 0.00 to 0.10 = ignore

± 0.10 to 0.30 = small

± 0.30 to 0.50 = moderate

± 0.50 or greater = large

- The rank-biserial correlation was r = -0.90, 95% CI [-0.93, -0.87], which is a large effect size. This means there was a strong difference between the two groups’ satisfaction scores.
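As a cross-check, the rank-biserial value can be recovered by hand from the Mann-Whitney output using r = 2W / (n1 × n2) - 1. The sketch below assumes all 100 scores in each group entered the test (no missing values) and that W = 497 refers to the AI group, which comes first alphabetically. The helper `effect_label` is hypothetical and simply encodes the thresholds listed above.

```r
# Values taken from the output reported earlier in this document
W       <- 497  # Mann-Whitney statistic for the first group (AI)
n_ai    <- 100  # assumes no missing AI scores
n_human <- 100  # assumes no missing Human scores

# Rank-biserial correlation from W
r_rb <- 2 * W / (n_ai * n_human) - 1
round(r_rb, 2)

# Map |r| onto the interpretation thresholds used in this worksheet
effect_label <- function(r) {
  size <- abs(r)
  if (size < 0.10) {
    "ignore"
  } else if (size < 0.30) {
    "small"
  } else if (size < 0.50) {
    "moderate"
  } else {
    "large"
  }
}

effect_label(r_rb)
```

The hand calculation agrees with the `rank_biserial()` output to two decimal places.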

Q2) Which group had the higher average rank?

- Looking at the group medians and means, the Human group (Median = 8.00, Mean = 7.42) had higher satisfaction scores than the AI group (Median = 3.00, Mean = 3.60). The negative rank-biserial value (r = -0.90, with AI as the first group alphabetically) points the same way. Therefore, the Human group had the higher average rank.
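The average ranks themselves can be worked out rather than inferred from the medians. The sketch below derives them from the reported W = 497, assuming 100 non-missing scores per group and AI as the first group, and then shows a generic way to get mean ranks directly from raw data. The function name `mean_ranks` and the example vectors are hypothetical.

```r
# From the reported test statistic
W       <- 497
n_ai    <- 100  # assumes no missing AI scores
n_human <- 100  # assumes no missing Human scores
n_total <- n_ai + n_human

# wilcox.test() reports W = (rank sum of first group) - n1 * (n1 + 1) / 2
rank_sum_ai    <- W + n_ai * (n_ai + 1) / 2
rank_sum_total <- n_total * (n_total + 1) / 2
rank_sum_human <- rank_sum_total - rank_sum_ai

mean_rank_ai    <- rank_sum_ai / n_ai
mean_rank_human <- rank_sum_human / n_human
c(AI = mean_rank_ai, Human = mean_rank_human)

# Generic version for raw data: rank all scores together, then average by group
mean_ranks <- function(scores, groups) {
  tapply(rank(scores), groups, mean)
}

# Hypothetical example (NOT the A6R2 data)
example_ranks <- mean_ranks(c(1, 2, 3, 4, 5, 6), c("A", "A", "A", "B", "B", "B"))
example_ranks
```

Under these assumptions the Human group's mean rank comes out well above the AI group's, consistent with the answer above. On the real data, `mean_ranks(A6R2$SatisfactionScore, A6R2$ServiceType)` would give the same comparison directly.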