INDEPENDENT T-TEST & MANN-WHITNEY U TEST
QUESTIONS
What are the null and alternative hypotheses for YOUR research scenario?
H0: There is no difference in customer satisfaction scores between customers served by human agents and those served by AI chatbots.
H1: There is a difference in customer satisfaction scores between customers served by human agents and those served by AI chatbots.
IMPORT EXCEL FILE
Purpose
Import your Excel dataset into R to conduct analyses.
INSTALL REQUIRED PACKAGE
install.packages("readxl")
LOAD THE PACKAGE
Always reload the package you want to use.
library(readxl)
IMPORT EXCEL FILE INTO R STUDIO
A6R2 <- read_excel("/Users/alfred/Desktop/A6R2.xlsx")
DESCRIPTIVE STATISTICS
PURPOSE
Calculate the mean, median, SD, and sample size for each group.
INSTALL REQUIRED PACKAGE
install.packages("dplyr")
LOAD THE PACKAGE
Always reload the package you want to use.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
CALCULATE THE DESCRIPTIVE STATISTICS
A6R2 %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   ServiceType  Mean Median    SD     N
##   <chr>       <dbl>  <dbl> <dbl> <int>
## 1 AI           3.6       3  1.60   100
## 2 Human        7.42      8  1.44   100
HISTOGRAMS
Purpose
Visually check the normality of the scores for each group.
CREATE THE HISTOGRAMS
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"],
     main = "Histogram of AI Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)
hist(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"],
     main = "Histogram of Human Scores",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)
QUESTIONS
Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
- The histogram looks positively skewed (longer tail to the right).
Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
- The histogram looks a bit too flat, not a perfect bell curve.
Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
- The histogram looks slightly negatively skewed (tail to the left).
Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
- The histogram looks slightly too tall, more peaked than a normal bell curve.
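The visual judgments above can be cross-checked numerically. The following is a minimal sketch, assuming moment-based definitions of skewness and excess kurtosis; the helper names `skewness` and `excess_kurtosis` and the `toy_scores` values are hypothetical and not part of the original dataset. In practice the same functions could be applied to `A6R2$SatisfactionScore` for each group.

```r
# Hypothetical helpers (not from the original analysis):
# moment-based sample skewness and excess kurtosis in base R.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5      # > 0: right tail longer, < 0: left tail longer
}

excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3    # > 0: more peaked, < 0: flatter than normal
}

# Made-up scores for illustration only (one high value creates a right tail).
toy_scores <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 9)

skewness(toy_scores)         # positive for this right-tailed example
excess_kurtosis(toy_scores)
```

A value near 0 on either measure is roughly what a normal distribution would give; how far from 0 counts as a problem is a judgment call and should be read alongside the histograms.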
SHAPIRO-WILK TEST
Purpose
Check the normality for each group’s score statistically.
CONDUCT THE SHAPIRO-WILK TEST
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "AI"])
##
## Shapiro-Wilk normality test
##
## data: A6R2$SatisfactionScore[A6R2$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
shapiro.test(A6R2$SatisfactionScore[A6R2$ServiceType == "Human"])
##
## Shapiro-Wilk normality test
##
## data: A6R2$SatisfactionScore[A6R2$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
QUESTIONS
Q1) Was the data normally distributed for Variable 1?
- No. The Shapiro-Wilk test was significant (W = 0.911, p < .001), which means the AI scores are NOT normally distributed.
Q2) Was the data normally distributed for Variable 2?
- No. The Shapiro-Wilk test was significant (W = 0.937, p < .001), which means the Human scores are NOT normally distributed.
BOXPLOT
Purpose
Check for any outliers impacting the mean for each group’s scores.
INSTALL REQUIRED PACKAGE
install.packages("ggplot2")
install.packages("ggpubr")
LOAD THE PACKAGE
Always reload the package you want to use.
library(ggplot2)
library(ggpubr)
CREATE THE BOXPLOT
ggboxplot(A6R2, x = "ServiceType", y = "SatisfactionScore",
          color = "ServiceType",
          palette = "jco",
          add = "jitter")
QUESTIONS
Q1) Were there any dots outside of the boxplots? These dots represent participants with extreme scores.
- Yes, there were a few dots outside the boxplots, which represent potential outliers.
Q2) If there are outliers, in your opinion are the scores of those dots changing the mean so much that the mean no longer accurately represents the average score?
- Yes, the outliers appear to affect the distribution, and combined with the non-normal Shapiro-Wilk results, the mean is not the best representation of central tendency. The Mann-Whitney U test is more appropriate.
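The outlier judgment above can also be checked numerically. This is a minimal sketch using base R's `boxplot.stats()`, which flags values more than 1.5 times the hinge spread beyond the hinges. The `toy_scores` vector is made up for illustration, and the flagged points may differ slightly from the dots drawn by `ggboxplot()`, which computes its quartiles a little differently.

```r
# Made-up scores for illustration only: one very low score among high ones.
toy_scores <- c(7, 7, 8, 8, 8, 8, 9, 9, 9, 1)

# Values flagged by the 1.5 x IQR rule (the dots beyond the whiskers).
outliers <- boxplot.stats(toy_scores)$out

# Compare the mean with and without the flagged values; the median barely moves.
mean_all     <- mean(toy_scores)
mean_trimmed <- mean(toy_scores[!toy_scores %in% outliers])
median_all   <- median(toy_scores)

outliers
c(mean_all = mean_all, mean_trimmed = mean_trimmed, median_all = median_all)
```

If the mean shifts noticeably once the flagged values are set aside while the median stays put, that supports describing the groups by their medians.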
MANN-WHITNEY U TEST
PURPOSE
Test if there was a difference between the distributions of the two groups.
wilcox.test(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
DETERMINE STATISTICAL SIGNIFICANCE
If results were statistically significant (p < .05), continue to effect size section below.
If results were NOT statistically significant (p ≥ .05), skip to reporting section below.
NOTE:
The Mann-Whitney U test is used when your data are not normally distributed or when other assumptions of the t-test are not met.
It is not chosen based on whether the t-test was significant.
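The rule in the note can be written out as code. This is a minimal sketch, assuming the choice is driven only by the Shapiro-Wilk p-values of the two groups; in practice the histograms and boxplots should be weighed as well. The helper `choose_test` and the `toy` data frame are hypothetical; only the column names `ServiceType` and `SatisfactionScore` come from the dataset above.

```r
# Hypothetical helper (not part of the original workflow): pick the test from
# the normality check, never from whether the t-test happened to be significant.
choose_test <- function(data, alpha = 0.05) {
  groups   <- split(data$SatisfactionScore, data$ServiceType)
  p_values <- sapply(groups, function(x) shapiro.test(x)$p.value)

  if (all(p_values >= alpha)) {
    # Both groups look normal: independent t-test (Welch version by default).
    t.test(SatisfactionScore ~ ServiceType, data = data)
  } else {
    # At least one group is non-normal: Mann-Whitney U (Wilcoxon rank sum).
    wilcox.test(SatisfactionScore ~ ServiceType, data = data, exact = FALSE)
  }
}

# Made-up, clearly non-normal scores for illustration only.
toy <- data.frame(
  ServiceType       = rep(c("AI", "Human"), each = 18),
  SatisfactionScore = c(rep(1, 15), 2, 3, 10, rep(9, 15), 8, 7, 1)
)

result <- choose_test(toy)
result$method
```

With the real data, `choose_test(A6R2)` would be expected to take the Mann-Whitney branch, since both Shapiro-Wilk tests above were significant.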
EFFECT-SIZE
PURPOSE
Determine how big of a difference there was between the group distributions.
INSTALL REQUIRED PACKAGE
install.packages("effectsize")
LOAD THE PACKAGE
Always load the package you want to use.
library(effectsize)
CALCULATE EFFECT SIZE (R VALUE)
rank_biserial(SatisfactionScore ~ ServiceType, data = A6R2, exact = FALSE)
## r (rank biserial) |         95% CI
## ----------------------------------
## -0.90             | [-0.93, -0.87]
QUESTIONS
Q1) What is the size of the effect?
The effect means how big or small was the difference between the two groups.
± 0.00 to 0.10 = negligible (ignore)
± 0.10 to 0.30 = small
± 0.30 to 0.50 = moderate
± 0.50 and above = large
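- The rank-biserial correlation was r = -0.90, 95% CI [-0.93, -0.87], which is a large effect size. This means there was a strong difference between the two groups' satisfaction scores. The negative sign reflects the group order: AI is the first group, and its scores tended to be lower than the Human scores.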
Q2) Which group had the higher average rank?
- Looking at the group medians and means, the Human group (Median = 8.00, Mean = 7.42) had higher satisfaction scores than the AI group (Median = 3.00, Mean = 3.60). Together with the negative rank-biserial correlation (r = -0.90), this indicates that the Human group had the higher average rank.
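The effect size and the average ranks can be recovered by hand from the reported test statistic. The sketch below uses W = 497 from the `wilcox.test()` output and N = 100 per group from the descriptive statistics. It assumes that no scores were missing (so both groups contributed 100 values to the test) and that AI is the first group because it comes first alphabetically. The variable names are illustrative only.

```r
# Values reported earlier in this document.
W  <- 497   # wilcox.test statistic for the first group (assumed to be AI)
n1 <- 100   # AI group size (assumes no missing scores)
n2 <- 100   # Human group size (assumes no missing scores)

# Rank-biserial correlation from W: r = 2W / (n1 * n2) - 1
r_rb <- 2 * W / (n1 * n2) - 1              # -0.9006, matching the reported -0.90

# Rank sums: W is the first group's rank sum minus its minimum possible value.
rank_sum_ai    <- W + n1 * (n1 + 1) / 2               # 5547
rank_sum_total <- (n1 + n2) * (n1 + n2 + 1) / 2       # 20100
rank_sum_human <- rank_sum_total - rank_sum_ai        # 14553

# Average rank per group.
mean_rank_ai    <- rank_sum_ai / n1        # 55.47
mean_rank_human <- rank_sum_human / n2     # 145.53
```

Under these assumptions the Human group's average rank (145.53) is well above the AI group's (55.47), and the hand-computed r rounds to the -0.90 reported by `rank_biserial()`.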