INDEPENDENT T-TEST & MANN-WHITNEY U TEST

QUESTION

What are the null and alternate hypotheses for YOUR research scenario?

H0: There is no difference in the number of headaches between participants taking Medication A and Medication B.

H1: There is a difference in the number of headaches between participants taking Medication A and Medication B.

IMPORT EXCEL FILE

Purpose

Import your Excel dataset into R to conduct analyses.

INSTALL REQUIRED PACKAGE

install.packages("readxl")

LOAD THE PACKAGE

library(readxl)
A6R1 <- read_excel("/Users/alfred/Desktop/A6R1.xlsx")

DESCRIPTIVE STATISTICS

PURPOSE

Calculate the mean, median, SD, and sample size for each group.

install.packages("dplyr")

LOAD THE PACKAGE

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

CALCULATE THE DESCRIPTIVE STATISTICS

A6R1 %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50

HISTOGRAMS

Purpose

Visually check the normality of the scores for each group.

hist(A6R1$HeadacheDays[A6R1$Medication == "A"],
main = "Histogram of Group A Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)

hist(A6R1$HeadacheDays[A6R1$Medication == "B"],
main = "Histogram of Group B Scores",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)

QUESTIONS

Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

- The histogram for Variable 1 (Medication A) looks fairly symmetrical. The values are centered around the mean with no strong skew.

Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

- The histogram for Variable 1 (Medication A) shows a proper bell curve. It is neither too flat nor too tall.

Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

- The histogram for Variable 2 (Medication B) looks fairly symmetrical. The distribution is balanced with no extreme skew.

Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

- The histogram for Variable 2 (Medication B) shows a proper bell curve. It resembles a normal distribution with moderate kurtosis.
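The visual impressions above can be cross-checked with numbers. The sketch below is only an illustration and was not part of the original analysis: it uses simulated values (the A6R1 file is not reproduced here), and the two helper functions, sample_skewness and sample_excess_kurtosis, are hypothetical names built from the standard moment-based formulas. Values near 0 on both measures are consistent with a symmetrical, bell-shaped histogram.

```r
# Moment-based skewness: third standardized moment (0 for a symmetric distribution).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^(3 / 2)
}

# Excess kurtosis: fourth standardized moment minus 3 (0 for a normal distribution).
sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / (mean((x - m)^2))^2 - 3
}

# Simulated stand-in scores (NOT the real A6R1 values).
set.seed(1)
sim_scores <- rnorm(50, mean = 8, sd = 3)

sample_skewness(sim_scores)
sample_excess_kurtosis(sim_scores)
```

With the real data, the same two calls could be applied to A6R1$HeadacheDays for each Medication group.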

SHAPIRO-WILK TEST

Purpose

Statistically check the normality of each group’s scores.

CONDUCT THE SHAPIRO-WILK TEST

shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(A6R1$HeadacheDays[A6R1$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  A6R1$HeadacheDays[A6R1$Medication == "B"]
## W = 0.98758, p-value = 0.8741

QUESTION

Q1) Was the data normally distributed for Variable 1?

- Yes, the data for Variable 1 (Medication A) was normally distributed (Shapiro-Wilk p = 0.4913 > .05).

Q2) Was the data normally distributed for Variable 2?

- Yes, the data for Variable 2 (Medication B) was normally distributed (Shapiro-Wilk p = 0.8741 > .05).

NOTE

If p > 0.05 (P-value is GREATER than .05), the test found no evidence against normality, so treat the data as NORMAL. Continue to the boxplot check below.

If p < 0.05 (P-value is LESS than .05), the data is NOT normal. Switch to the Mann-Whitney U test.
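The decision rule in this note can be written as a small helper. This is a minimal sketch under stated assumptions: normality_decision is a hypothetical function name, the .05 cutoff comes from the note, and the two vectors are simulated stand-ins rather than the A6R1 scores.

```r
# Hypothetical helper: apply the p > .05 rule from the note to one group's scores.
normality_decision <- function(x, alpha = 0.05) {
  result <- shapiro.test(x)
  if (result$p.value > alpha) {
    "no evidence against normality: continue to the boxplot"
  } else {
    "normality rejected: switch to Mann-Whitney U"
  }
}

# Simulated stand-in scores (NOT the real A6R1 values).
set.seed(42)
sim_normal <- rnorm(50, mean = 8, sd = 3)   # drawn from a normal distribution
sim_skewed <- rexp(50, rate = 0.2)          # drawn from a strongly skewed distribution

normality_decision(sim_normal)
normality_decision(sim_skewed)
```

With the real data, the helper would be called once per group, for example on A6R1$HeadacheDays[A6R1$Medication == "A"].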

BOXPLOT

Purpose

Check for any outliers impacting the mean for each group’s scores.

INSTALL REQUIRED PACKAGE

install.packages("ggplot2")
install.packages("ggpubr")

LOAD THE PACKAGE

library(ggplot2)
library(ggpubr)

CREATE THE BOXPLOT

ggboxplot(A6R1, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

QUESTION

Q1) Were there any dots outside of the boxplots? These dots represent participants with extreme scores.

- Yes, there were a few dots outside the boxplots, which represent participants with extreme scores.

Q2) If there are outliers, in your opinion are the scores of those dots changing the mean so much that the mean no longer accurately represents the average score?

- No, the outliers do not appear extreme enough to distort the group means. The means of both groups still represent the central tendency well.

NOTE

If there were no extreme outliers, this means the data is NORMAL. Continue to the Independent t-test.

If there WERE any extreme outliers, this means the data is NOT normal. Switch to the Mann-Whitney U test.
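Because add = "jitter" overlays every observation on the plot, it can be hard to tell by eye which dots are true outliers. One possible numeric cross-check is sketched below. It relies on base R's boxplot.stats(), which flags values more than 1.5 times the interquartile range beyond the hinges. The data frame sim_data is simulated for illustration (NOT the real A6R1 values), and one extreme score of 25 is deliberately planted in group A.

```r
# Simulated stand-in data (NOT the real A6R1 values); the 25 is a planted extreme score.
set.seed(7)
sim_data <- data.frame(
  Medication = rep(c("A", "B"), each = 50),
  HeadacheDays = c(pmax(0, round(rnorm(49, 8, 3))), 25,
                   pmax(0, round(rnorm(50, 12, 3.5))))
)

# boxplot.stats()$out returns the values lying beyond 1.5 * IQR from the hinges.
outliers_by_group <- lapply(
  split(sim_data$HeadacheDays, sim_data$Medication),
  function(x) boxplot.stats(x)$out
)
outliers_by_group

# Compare the group A mean with and without the flagged values
# to judge how much the outliers move the mean.
group_a <- sim_data$HeadacheDays[sim_data$Medication == "A"]
mean(group_a)
mean(group_a[!group_a %in% outliers_by_group$A])
```

If the two means are close, the outliers are not distorting the average much, which is the judgment Q2 asks for.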

INDEPENDENT T-TEST

PURPOSE

Test if there was a difference between the means of the two groups.

t.test(HeadacheDays ~ Medication, data = A6R1, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  HeadacheDays by Medication
## t = -6.9862, df = 98, p-value = 3.431e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -5.778247 -3.221753
## sample estimates:
## mean in group A mean in group B 
##             8.1            12.6
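As a sanity check, the t statistic and confidence interval above can be approximately reproduced from the descriptive statistics alone. The sketch below uses the rounded means and SDs from the descriptive table, so it should agree with the t.test() output only to about two decimal places. The variable names are chosen for illustration.

```r
# Summary statistics copied from the descriptive table (rounded values).
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Pooled SD, as used by t.test() when var.equal = TRUE.
deg_free <- n_a + n_b - 2
sd_pooled <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / deg_free)

# Standard error of the mean difference and the t statistic.
se_diff <- sd_pooled * sqrt(1 / n_a + 1 / n_b)
t_stat <- (mean_a - mean_b) / se_diff

# Two-sided p-value and 95% confidence interval for the mean difference (A minus B).
p_value <- 2 * pt(-abs(t_stat), deg_free)
ci <- (mean_a - mean_b) + c(-1, 1) * qt(0.975, deg_free) * se_diff

round(c(t = t_stat, df = deg_free), 2)
round(ci, 2)
p_value
```

The sign of t is negative because R subtracts the second group (B) from the first (A).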

DETERMINE STATISTICAL SIGNIFICANCE

If results were statistically significant (p < .05), continue to effect size section below.

If results were NOT statistically significant (p > .05), skip to reporting section below.

NOTE

Getting results that are not statistically significant does NOT mean you switch to Mann-Whitney U.

The Mann-Whitney U test is used only when the data is not normally distributed. The choice of test does not depend on whether the results are significant.
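For reference, if the normality or outlier checks had failed, the Mann-Whitney U test could be run along these lines. This is a sketch only and was not needed for this dataset: the data frame sim_data is simulated (NOT the real A6R1 values) and deliberately skewed. Base R's wilcox.test() performs the Wilcoxon rank-sum test, which is equivalent to the Mann-Whitney U test.

```r
# Simulated, deliberately skewed stand-in data (NOT the real A6R1 values).
set.seed(123)
sim_data <- data.frame(
  Medication = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rexp(50, rate = 1 / 8), rexp(50, rate = 1 / 12))
)

# With a two-level grouping variable, wilcox.test() compares the two groups by ranks.
mw_result <- wilcox.test(HeadacheDays ~ Medication, data = sim_data)

mw_result$statistic  # W, the rank-based test statistic
mw_result$p.value    # interpreted against the same .05 cutoff
```

With the real data, the call would take data = A6R1 in place of the simulated data frame.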

EFFECT-SIZE

PURPOSE

Determine how big of a difference there was between the group means.

install.packages("effectsize")

LOAD THE PACKAGE

library(effectsize)

CALCULATE COHEN’S D

cohen_d_result <- cohens_d(HeadacheDays ~ Medication, data = A6R1, pooled_sd = TRUE)
print(cohen_d_result)
## Cohen's d |         95% CI
## --------------------------
## -1.40     | [-1.83, -0.96]
## 
## - Estimated using pooled SD.
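Cohen's d with a pooled SD is the mean difference divided by the pooled standard deviation. The sketch below recomputes it by hand from the rounded values in the descriptive table, as a check on the cohens_d() output. Because the inputs are rounded, agreement is expected only to about two decimals.

```r
# Summary statistics copied from the descriptive table (rounded values).
mean_a <- 8.1;  sd_a <- 2.81; n_a <- 50
mean_b <- 12.6; sd_b <- 3.59; n_b <- 50

# Pooled standard deviation across the two groups.
sd_pooled <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))

# Cohen's d: mean difference (A minus B) in pooled-SD units.
d <- (mean_a - mean_b) / sd_pooled
round(d, 2)  # about -1.40, in line with the reported value
```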

QUESTIONS

Q1) What is the size of the effect?

The effect size describes how big or small the difference between the group averages was.

± 0.00 to 0.19 = ignore

± 0.20 to 0.49 = small

± 0.50 to 0.79 = moderate

± 0.80 to 1.29 = large

± 1.30 and above = very large

- The effect size is very large because Cohen’s d = -1.40, which falls into the ±1.30 and above range. This indicates a very large difference between the two groups’ average headache days.
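The scale above can also be applied programmatically. The helper below is a hypothetical sketch (classify_effect is not part of the effectsize package). It treats the cutoffs as continuous boundaries, so a value such as 0.195, which falls between two listed ranges, is labeled "ignore".

```r
# Hypothetical helper: map the absolute value of d onto the labels from the scale above.
classify_effect <- function(d) {
  cut(abs(d),
      breaks = c(0, 0.20, 0.50, 0.80, 1.30, Inf),
      labels = c("ignore", "small", "moderate", "large", "very large"),
      right = FALSE)  # intervals are closed on the left, e.g. [0.20, 0.50)
}

as.character(classify_effect(-1.40))  # "very large"
```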

Q2) Which group had the higher average score?

- Group B had the higher average score (M = 12.6) compared to Group A (M = 8.1). Because d is calculated as Group A minus Group B, the negative sign of Cohen’s d (-1.40) indicates that Group B’s scores were higher than Group A’s.