INDEPENDENT T-TEST & MANN-WHITNEY U TEST
NULL HYPOTHESIS (H0)
There is no difference between the scores of Group A and Group B.
ALTERNATE HYPOTHESIS (H1)
1) NON-DIRECTIONAL ALTERNATE HYPOTHESIS: There is a difference between the scores of Group A and Group B.
2) DIRECTIONAL ALTERNATE HYPOTHESIS ONE: Group A has higher scores than Group B.
3) DIRECTIONAL ALTERNATE HYPOTHESIS TWO: Group B has higher scores than Group A.
QUESTION
What are the null and alternate hypotheses for YOUR research scenario?
H0:
H1:
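In R, the choice among these hypotheses maps onto the `alternative` argument of `t.test()` and `wilcox.test()`. The sketch below uses a made-up `toy` data frame (hypothetical, not from A6R2.xlsx). With the formula interface, `"greater"` tests whether the first factor level (here A) has higher scores than the second.

```r
# Hypothetical toy scores for two groups (illustration only, not from A6R2.xlsx)
toy <- data.frame(
  score = c(5, 6, 7, 6, 8, 3, 4, 5, 4, 2),
  group = factor(rep(c("A", "B"), each = 5))
)

# Non-directional H1: Group A and Group B differ
p_two <- t.test(score ~ group, data = toy, alternative = "two.sided")$p.value

# Directional H1 one: Group A (first factor level) has higher scores than Group B
p_greater <- t.test(score ~ group, data = toy, alternative = "greater")$p.value

# Directional H1 two: Group B has higher scores than Group A
p_less <- t.test(score ~ group, data = toy, alternative = "less")$p.value

# The same argument works for the Mann-Whitney U test, for example:
# wilcox.test(score ~ group, data = toy, alternative = "greater", exact = FALSE)

round(c(two.sided = p_two, greater = p_greater, less = p_less), 4)
```

Pick the direction before looking at the data; the two one-sided p-values always sum to 1, and the two-sided p-value is twice the smaller one.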
LOAD THE PACKAGE
# Always load the package you want to use (once per R session).
library(readxl)
IMPORT EXCEL FILE INTO R STUDIO
dataset <- read_excel("C:\\Users\\rohit\\Downloads\\A6R2.xlsx")
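Before running any statistics, it can help to confirm that the import worked: the expected columns exist, missing values are counted, and the group labels are spelled consistently. The sketch below uses a small hypothetical stand-in for the imported sheet (made-up rows). Storing `ServiceType` as a factor also makes the group order explicit: levels default to alphabetical order (AI, then Human), and formula-based tests such as `wilcox.test()` treat the first level as the first group.

```r
# Hypothetical stand-in for the imported sheet (made-up rows, not the real A6R2.xlsx data)
dataset <- data.frame(
  ServiceType = c("Human", "AI", "Human", "AI", "AI"),
  SatisfactionScore = c(8, 3, 7, 4, NA)
)

str(dataset)                  # column names and types
colSums(is.na(dataset))       # missing values per column
table(dataset$ServiceType)    # group labels and group sizes

# Store the grouping variable as a factor; levels default to alphabetical order
dataset$ServiceType <- factor(dataset$ServiceType)
levels(dataset$ServiceType)
```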
DESCRIPTIVE STATISTICS
PURPOSE: Calculate the mean, median, SD, and sample size for each group.
LOAD THE PACKAGE
Always load the package you want to use (once per R session).
library(dplyr)
CALCULATE THE DESCRIPTIVE STATISTICS
dataset %>%
  group_by(ServiceType) %>%
  summarise(
    Mean = mean(SatisfactionScore, na.rm = TRUE),
    Median = median(SatisfactionScore, na.rm = TRUE),
    SD = sd(SatisfactionScore, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
## ServiceType Mean Median SD N
## <chr> <dbl> <dbl> <dbl> <int>
## 1 AI 3.6 3 1.60 100
## 2 Human 7.42 8 1.44 100
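Note that `n()` counts every row in a group, including rows where `SatisfactionScore` is missing, whereas `mean()`, `median()`, and `sd()` above drop NAs. If any scores are missing, `N` can overstate the number of scores behind the other columns. A base-R sketch on a hypothetical toy data frame shows the difference; in the dplyr pipeline, something like `N_valid = sum(!is.na(SatisfactionScore))` would give the non-missing count.

```r
# Hypothetical toy data with one missing AI score (not from A6R2.xlsx)
toy <- data.frame(
  ServiceType = c("AI", "AI", "AI", "Human", "Human", "Human"),
  SatisfactionScore = c(2, 4, NA, 7, 8, 9)
)

# Rows per group: what dplyr::n() reports, NAs included
rows_per_group <- tapply(toy$SatisfactionScore, toy$ServiceType, length)

# Non-missing scores per group: the sample actually used by mean(), median(), sd()
valid_per_group <- tapply(toy$SatisfactionScore, toy$ServiceType,
                          function(x) sum(!is.na(x)))

mean_per_group <- tapply(toy$SatisfactionScore, toy$ServiceType, mean, na.rm = TRUE)

rbind(rows = rows_per_group, valid = valid_per_group, mean = mean_per_group)
```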
HISTOGRAMS
Purpose: Visually check the normality of the scores for each group.
CREATE THE HISTOGRAMS
hist(dataset$SatisfactionScore[dataset$ServiceType == "Human"],
     main = "Histogram of Human Scores",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(dataset$SatisfactionScore[dataset$ServiceType == "AI"],
     main = "Histogram of AI Scores",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

# QUESTIONS
# Answer the questions below as comments within the R script:
# Q1) Check the SKEWNESS of the VARIABLE 1 (Human) histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Q2) Check the KURTOSIS of the VARIABLE 1 (Human) histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# Q3) Check the SKEWNESS of the VARIABLE 2 (AI) histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Q4) Check the KURTOSIS of the VARIABLE 2 (AI) histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
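To back up the visual judgment with numbers, sample skewness and excess kurtosis can be computed directly. The helpers below (`skewness_m`, `excess_kurtosis_m`) are hypothetical names using simple moment-based formulas; other software may apply small-sample corrections, so values can differ slightly. Roughly: skewness near 0 suggests symmetry, positive means positively skewed (long right tail), negative means negatively skewed; excess kurtosis near 0 suggests a bell shape, negative means flatter, positive means taller and heavier-tailed.

```r
# Hypothetical helpers: moment-based sample skewness and excess kurtosis
skewness_m <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

excess_kurtosis_m <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Toy checks (made-up numbers): a symmetric, flat set and a right-skewed set
skewness_m(c(1, 2, 3, 4, 5))          # 0: symmetrical
excess_kurtosis_m(c(1, 2, 3, 4, 5))   # -1.3: flatter than a bell curve
skewness_m(c(1, 1, 1, 2, 10))         # > 0: positively skewed

# With the real data, for example:
# skewness_m(dataset$SatisfactionScore[dataset$ServiceType == "Human"])
```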
SHAPIRO-WILK TEST
Purpose: Check the normality of each group's scores statistically.
shapiro.test(dataset$SatisfactionScore[dataset$ServiceType == "Human"])
##
## Shapiro-Wilk normality test
##
## data: dataset$SatisfactionScore[dataset$ServiceType == "Human"]
## W = 0.93741, p-value = 0.0001344
shapiro.test(dataset$SatisfactionScore[dataset$ServiceType == "AI"])
##
## Shapiro-Wilk normality test
##
## data: dataset$SatisfactionScore[dataset$ServiceType == "AI"]
## W = 0.91143, p-value = 5.083e-06
# QUESTION
# Answer the questions below as a comment within the R script:
# Were the data normally distributed for Variable 1 (Human)?
# Were the data normally distributed for Variable 2 (AI)?
# If p > 0.05 (p-value is GREATER than .05), the scores do not differ significantly from a normal distribution, so treat them as NORMAL. If this holds for both groups, continue to the boxplot check below.
# If p < 0.05 (p-value is LESS than .05), the scores are NOT normal. If this happens for either group, switch to the Mann-Whitney U test.
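The decision rule above can be written as a small function so it is applied the same way every time. `choose_test()` is a hypothetical helper, not part of any package; it takes the two Shapiro-Wilk p-values and returns the test the worksheet recommends. Calling it with the two p-values printed above gives the Mann-Whitney U test, which is the path this worksheet follows.

```r
# Hypothetical helper: the worksheet's rule, using both groups' Shapiro-Wilk p-values
choose_test <- function(p_group1, p_group2, alpha = 0.05) {
  if (p_group1 > alpha && p_group2 > alpha) {
    "Independent t-test (check boxplots for outliers next)"
  } else {
    "Mann-Whitney U test"
  }
}

# p-values printed above: Human = 0.0001344, AI = 5.083e-06
choose_test(0.0001344, 5.083e-06)

# With the real data, both p-values could be collected at once, for example:
# p <- tapply(dataset$SatisfactionScore, dataset$ServiceType,
#             function(x) shapiro.test(x)$p.value)
# choose_test(p[["Human"]], p[["AI"]])
```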
BOXPLOT
Purpose: Check for any outliers impacting the mean for each group's scores.
#install.packages("ggplot2")
#install.packages("ggpubr")
# LOAD THE PACKAGE
# Always load the package you want to use (once per R session).
library(ggplot2)
library(ggpubr)
CREATE THE BOXPLOT
ggboxplot(dataset, x = "ServiceType", y = "SatisfactionScore",
          color = "ServiceType",
          palette = "jco",
          add = "jitter")

# QUESTION
# Answer the questions below as a comment within the R script. Answer the questions for EACH boxplot:
# Q1) Were there any dots outside of the boxplot? Are these dots close to the whiskers of the boxplot or are they very far away?
# If there are no dots, continue with Independent t-test.
# If there are a few dots (two or less), and they are close to the whiskers, continue with the Independent t-test.
# If there are a few dots (two or less), and they are far away from the whiskers, consider switching to the Mann-Whitney U test.
# If there are many dots (more than two) and they are very far away from the whiskers, you should switch to the Mann-Whitney U test.
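The dots beyond the whiskers follow Tukey's rule: values more than 1.5 times the hinge spread (roughly the IQR) beyond the box. Base R's `boxplot.stats()` lists them, which gives a numeric check on "close to" versus "far from" the whiskers. The toy vector is made up, and the distance measure in whisker-to-dot IQR units is a hypothetical yardstick, not a standard cut-off. ggplot2 computes its hinges with `quantile()`, so borderline points may differ slightly from `boxplot.stats()`.

```r
# Toy scores (made-up, not from A6R2.xlsx): nine typical values and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

stats <- boxplot.stats(x)     # Tukey rule: points beyond 1.5 * hinge spread are flagged
stats$stats                   # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out                     # values drawn as dots

# Hypothetical measure: distance from the nearest whisker, in hinge-spread units
hinge_spread <- stats$stats[4] - stats$stats[2]
pmax(stats$stats[1] - stats$out, stats$out - stats$stats[5]) / hinge_spread

# With the real data, per group, for example:
# tapply(dataset$SatisfactionScore, dataset$ServiceType, function(s) boxplot.stats(s)$out)
```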
MANN-WHITNEY U TEST
PURPOSE: Test if there was a difference between the distributions of the two groups.
wilcox.test(SatisfactionScore ~ ServiceType, data = dataset, exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: SatisfactionScore by ServiceType
## W = 497, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
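The `W` that R prints is the Mann-Whitney U statistic for the first factor level (here AI): the number of (AI, Human) pairs in which the AI score is higher, with ties counted as one half. It can be recomputed from the rank sum of the first group, as the sketch below does on made-up toy data without ties. For the output above, W = 497 out of 100 × 100 = 10,000 possible pairs, so AI scores were higher in only about 5% of pairs.

```r
# Toy groups (made-up, no ties): AI is the first factor level
toy <- data.frame(
  score = c(1, 3, 5, 2, 4, 6, 7),
  group = factor(c("AI", "AI", "AI", "Human", "Human", "Human", "Human"))
)

n1 <- sum(toy$group == "AI")
r  <- rank(toy$score)                    # ranks in the combined sample
R1 <- sum(r[toy$group == "AI"])          # rank sum of the first level
U1 <- R1 - n1 * (n1 + 1) / 2             # Mann-Whitney U for the first level

W <- unname(wilcox.test(score ~ group, data = toy, exact = FALSE)$statistic)
c(U1 = U1, W = W)
```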
# DETERMINE STATISTICAL SIGNIFICANCE
# If results were statistically significant (p < .05), continue to effect size section below.
# If results were NOT statistically significant (p > .05), skip to reporting section below.
# NOTE: The Mann-Whitney U test is used when your data are not normally distributed
# or when other assumptions of the t-test are not met (for example, extreme outliers).
# It is not chosen based on whether the t-test was significant.
# EFFECT-SIZE
# PURPOSE: Determine how big of a difference there was between the group distributions.
# INSTALL REQUIRED PACKAGE
# If never installed, remove the hashtag before the install code.
# If previously installed, leave the hashtag in front of the code.
#install.packages("effectsize")
# LOAD THE PACKAGE
# Always load the package you want to use.
library(effectsize)
# CALCULATE EFFECT SIZE (R VALUE)
# Replace "dataset" with your dataset name (without .xlsx)
# Replace "score" with your dependent variable R code name (example: USD)
# Replace "group" with your independent variable R code name (example: Country)
rank_biserial(SatisfactionScore ~ ServiceType, data = dataset)
## r (rank biserial) | 95% CI
## ----------------------------------
## -0.90 | [-0.93, -0.87]
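The point estimate can be cross-checked by hand. One standard formula for the rank-biserial correlation is r = 2U / (n1 × n2) - 1, where U is the statistic for the first group. Plugging in W = 497 from the Mann-Whitney output and n1 = n2 = 100 from the descriptive statistics table reproduces the printed -0.90 (the confidence interval still needs `rank_biserial()`).

```r
# Values printed above: W from wilcox.test() and 100 scores per group
W  <- 497
n1 <- 100   # AI (first factor level)
n2 <- 100   # Human

# Rank-biserial correlation from the U statistic of the first group
r_rb <- 2 * W / (n1 * n2) - 1
round(r_rb, 2)
```

A value near -1 means that in almost every (AI, Human) pair the Human score was higher.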
# QUESTIONS
# Answer the questions below as a comment within the R script:
# Q1) What is the size of the effect?
# The effect size indicates how large or small the difference between the two groups was.
# ± 0.00 to 0.10 = ignore
# ± 0.10 to 0.30 = small
# ± 0.30 to 0.50 = moderate
# ± 0.50 and above = large
# Example 1) A rank-biserial correlation of 0.05 indicates the difference between the groups was negligible (effectively no effect).
# Example 2) A rank-biserial correlation of 0.32 indicates the difference between the groups was moderate.
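The cut-offs above can be applied with a short helper. `label_effect()` is a hypothetical name; because the listed ranges share their boundaries (0.10, 0.30, 0.50), this sketch assumes a boundary value belongs to the higher category. It uses the absolute value, since the sign of r only shows which group tended to score higher.

```r
# Hypothetical helper: label |r| using the worksheet's cut-offs
label_effect <- function(r) {
  a <- abs(r)
  if (a < 0.10) {
    "ignore"
  } else if (a < 0.30) {
    "small"
  } else if (a < 0.50) {
    "moderate"
  } else {
    "large"
  }
}

label_effect(0.05)    # Example 1
label_effect(0.32)    # Example 2
label_effect(-0.90)   # rank_biserial() result above
```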
# Q2) Which group had the higher average rank?
# The Mann-Whitney U test does not compare means directly. Instead, it looks at whether one group tends to have higher scores than the other.
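# To determine which group ranked higher, compare the group medians (or mean ranks). The sign of r also tells you: with ServiceType levels in alphabetical order (AI first), a negative r means the AI group tended to have lower scores.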
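The average rank can also be computed directly: rank all scores together, then average the ranks within each group. The toy data below are made up; for the real data, the commented line keeps missing scores out of the ranking with `na.last = "keep"`.

```r
# Toy data (made-up, not from A6R2.xlsx)
toy <- data.frame(
  ServiceType = c("AI", "AI", "AI", "Human", "Human", "Human"),
  SatisfactionScore = c(2, 3, 4, 7, 8, 9)
)

# Rank all scores together (ties get average ranks), then average within each group
toy$Rank <- rank(toy$SatisfactionScore)
mean_ranks <- tapply(toy$Rank, toy$ServiceType, mean)
mean_ranks

names(which.max(mean_ranks))   # group with the higher average rank

# With the real data, for example:
# tapply(rank(dataset$SatisfactionScore, na.last = "keep"), dataset$ServiceType, mean, na.rm = TRUE)
```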
WRITTEN REPORT FOR MANN-WHITNEY U TEST