INDEPENDENT T-TEST & MANN-WHITNEY U TEST

HYPOTHESIS TESTED:

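Used to test whether there is a difference between the scores of two independent groups. The independent t-test compares the group means; the Mann-Whitney U test compares the groups' ranked scores, so it does not require normally distributed data.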

NULL HYPOTHESIS (H0)

The null hypothesis below is ALWAYS used.

There is no difference between the scores of Group A and Group B.

ALTERNATE HYPOTHESIS (H1)

Choose ONE of the three options below (based on your research scenario):

1) NON-DIRECTIONAL ALTERNATE HYPOTHESIS: There is a difference between the scores of Group A and Group B.

2) DIRECTIONAL ALTERNATE HYPOTHESIS ONE: Group A has higher scores than Group B.

3) DIRECTIONAL ALTERNATE HYPOTHESIS TWO: Group B has higher scores than Group A.
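When you reach the test itself, the alternate hypothesis you choose sets the `alternative` argument of R's `t.test()` and `wilcox.test()` (base R's version of the Mann-Whitney U test). The sketch below assumes simulated placeholder data in a hypothetical data frame `toy` that reuses the example column names. With the formula interface, R compares the first group (A) against the second group (B).

```r
# Hypothetical placeholder data (simulated, NOT the course dataset)
set.seed(42)
toy <- data.frame(
  Medication   = rep(c("A", "B"), each = 20),
  HeadacheDays = c(rnorm(20, mean = 8, sd = 3), rnorm(20, mean = 12, sd = 3))
)

# H1 option 1 (non-directional): A and B differ
t_two  <- t.test(HeadacheDays ~ Medication, data = toy, alternative = "two.sided")
# H1 option 2 (directional): A has higher scores than B
t_AgtB <- t.test(HeadacheDays ~ Medication, data = toy, alternative = "greater")
# H1 option 3 (directional): B has higher scores than A
t_BgtA <- t.test(HeadacheDays ~ Medication, data = toy, alternative = "less")

# The Mann-Whitney U test accepts the same three `alternative` values
mw_two <- wilcox.test(HeadacheDays ~ Medication, data = toy, alternative = "two.sided")

c(two.sided = t_two$p.value, greater = t_AgtB$p.value, less = t_BgtA$p.value)
```

By default `t.test()` runs the Welch version, which does not assume equal variances; adding `var.equal = TRUE` gives the classic Student's t-test if your course calls for it.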

QUESTION

What are the null and alternate hypotheses for YOUR research scenario?

H0:

H1:

IMPORT EXCEL FILE

Purpose: Import your Excel dataset into R to conduct analyses.

INSTALL REQUIRED PACKAGE

If the package has never been installed, run the install code below.

If it was previously installed, put a hashtag (#) in front of the install code so it does not run again.

options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("readxl")
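One way to avoid toggling hashtags is to install a package only when it is missing. A minimal sketch: `install_if_missing()` is a hypothetical helper name, while `requireNamespace()` and `install.packages()` are base R functions.

```r
# Hypothetical helper: install each package only if it is not already installed
install_if_missing <- function(pkgs) {
  # requireNamespace() quietly returns FALSE when a package is not installed
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!is_installed]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  invisible(missing)  # names of the packages that had to be installed
}

# Usage for this script:
# install_if_missing(c("readxl", "dplyr", "ggplot2", "ggpubr"))
```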

LOAD THE PACKAGE

Load the package at the start of every R session, even if it is already installed.

library(readxl)

IMPORT EXCEL FILE INTO RSTUDIO

Download the Excel file from OneDrive and save it to your desktop.

Right-click the Excel file and click “Copy as path” from the menu.

In RStudio, replace the example path below with your actual path.

Replace each backslash \ with a forward slash /, or double it as \\:

✘ WRONG "C:\Users\Joseph\Desktop\mydata.xlsx"

✔ CORRECT "C:/Users/Joseph/Desktop/mydata.xlsx"

✔ CORRECT "C:\\Users\\Joseph\\Desktop\\mydata.xlsx"
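If you would rather paste the path exactly as Windows copies it, R 4.0.0 and later also accept raw strings, written `r"(...)"`, in which backslashes are kept literally. This small sketch, using the Joseph example path from above, shows that the three spellings produce the same file path.

```r
# Forward slashes (work on Windows too)
p1 <- "C:/Users/Joseph/Desktop/mydata.xlsx"
# Doubled backslashes: each "\\" stands for one literal backslash
p2 <- "C:\\Users\\Joseph\\Desktop\\mydata.xlsx"
# Raw string (R >= 4.0.0): paste the copied path, without its surrounding quotes,
# between r"( and )"
p3 <- r"(C:\Users\Joseph\Desktop\mydata.xlsx)"

identical(p2, p3)                     # TRUE: same characters
identical(gsub("\\\\", "/", p3), p1)  # TRUE: backslashes swapped for "/"
```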

Replace “dataset” with the name of your Excel data (without the .xlsx), and use that same name everywhere “dataset” appears below.

dataset <- read_excel("C:\\Users\\tsury\\Downloads\\A6R1.xlsx")
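Before moving on, it can help to confirm the import worked and that the column names match the ones used below (R is case-sensitive, so `headachedays` is not `HeadacheDays`). A minimal sketch: `check_columns()` is a hypothetical helper, and `toy` is placeholder data standing in for your imported `dataset`.

```r
# Hypothetical helper: stop with a clear message if any needed column is missing
check_columns <- function(df, needed) {
  missing <- setdiff(needed, names(df))
  if (length(missing) > 0) {
    stop("Column(s) not found: ", paste(missing, collapse = ", "),
         ". Check spelling and capitalization with names().")
  }
  invisible(TRUE)
}

# Placeholder data standing in for the imported Excel sheet
toy <- data.frame(Medication = c("A", "B"), HeadacheDays = c(7, 12))
str(toy)                                           # column types and first values
check_columns(toy, c("Medication", "HeadacheDays"))

# With your data: check_columns(dataset, c("Medication", "HeadacheDays"))
```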

DESCRIPTIVE STATISTICS

Purpose: Calculate the mean, median, SD, and sample size for each group.

INSTALL REQUIRED PACKAGE

If the package has never been installed, run the install code below.

If it was previously installed, put a hashtag (#) in front of the install code so it does not run again.

install.packages("dplyr")

LOAD THE PACKAGE

Load the package at the start of every R session, even if it is already installed.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

CALCULATE THE DESCRIPTIVE STATISTICS

Replace “dataset” with your dataset name (without .xlsx)

Replace “HeadacheDays” with your dependent variable (score) R code name (example: USD)

Replace “Medication” with your independent variable (group) R code name (example: Country)

NOTE: Do NOT edit the function name “group_by”; only replace the variable name inside its parentheses.

dataset %>%
  group_by(Medication) %>%
  summarise(
    Mean = mean(HeadacheDays, na.rm = TRUE),
    Median = median(HeadacheDays, na.rm = TRUE),
    SD = sd(HeadacheDays, na.rm = TRUE),
    N = n()
  )
## # A tibble: 2 × 5
##   Medication  Mean Median    SD     N
##   <chr>      <dbl>  <dbl> <dbl> <int>
## 1 A            8.1    8    2.81    50
## 2 B           12.6   12.5  3.59    50
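As a cross-check that does not need dplyr, base R's `aggregate()` can produce the same per-group summaries. A sketch on hypothetical placeholder data `toy`; with your data, swap in `dataset` and your own column names.

```r
# Placeholder data (hypothetical values, not the course dataset)
toy <- data.frame(
  Medication   = c("A", "A", "A", "B", "B", "B"),
  HeadacheDays = c(6, 8, 10, 11, 12, 16)
)

# Returns one named value per statistic for a single group
summary_fn <- function(x) c(Mean = mean(x), Median = median(x), SD = sd(x), N = length(x))

# One row per group; the statistics are stored as columns of HeadacheDays
desc <- aggregate(HeadacheDays ~ Medication, data = toy, FUN = summary_fn)
desc
```

One difference to keep in mind: the formula version of `aggregate()` drops rows with missing scores before summarizing, so its N counts non-missing values, whereas dplyr's `n()` counts every row in the group.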

HISTOGRAMS

Purpose: Visually check the normality of the scores for each group.

CREATE THE HISTOGRAMS

Replace “dataset” with your dataset name (without .xlsx)

Replace “HeadacheDays” with your dependent variable (score) R code name (example: USD)

Replace “Medication” with your independent variable (group) R code name (example: Country)

Replace “A” with the R code name for your first group (example: USA)

Replace “B” with the R code name for your second group (example: India)

hist(dataset$HeadacheDays[dataset$Medication == "A"],
     main = "Histogram of Medication A Scores",
     xlab = "Headache Days",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(dataset$HeadacheDays[dataset$Medication == "B"],
     main = "Histogram of Medication B Scores",
     xlab = "Headache Days",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)
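Because the two histograms are drawn separately, R chooses their bins and x-axes independently, which can make the groups look more (or less) alike than they are. This sketch draws both panels side by side with shared bins, assuming hypothetical simulated data in `toy`; `par(mfrow = ...)` and `pretty()` are base R.

```r
# Hypothetical simulated data (not the course dataset)
set.seed(1)
toy <- data.frame(
  Medication   = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rpois(50, 8), rpois(50, 12))
)
a <- toy$HeadacheDays[toy$Medication == "A"]
b <- toy$HeadacheDays[toy$Medication == "B"]

# One set of break points covering both groups, so the bins line up
brks <- pretty(range(c(a, b)), n = 20)

old_par <- par(mfrow = c(1, 2))   # 1 row, 2 columns of plots
hist(a, breaks = brks, main = "Medication A", xlab = "Headache Days",
     col = "lightblue", border = "black")
hist(b, breaks = brks, main = "Medication B", xlab = "Headache Days",
     col = "lightgreen", border = "black")
par(old_par)                      # restore the previous plot layout
```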

QUESTIONS

Answer the questions below as comments within the R script:

Q1) Check the SKEWNESS of the GROUP 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

Q2) Check the KURTOSIS of the GROUP 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

Q3) Check the SKEWNESS of the GROUP 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

Q4) Check the KURTOSIS of the GROUP 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
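To back up the visual judgments with numbers, skewness and excess kurtosis can be computed from their moment definitions: g1 = m3 / m2^1.5 and g2 = m4 / m2^2 - 3, where mk is the average of the k-th power of the deviations from the mean. Skewness above 0 suggests positive skew and below 0 negative skew; excess kurtosis above 0 suggests "too tall" and below 0 "too flat". A minimal base-R sketch: `skew_kurt()` is a hypothetical helper, and add-on packages may use slightly different (bias-adjusted) formulas, so their values can differ a little.

```r
# Hypothetical helper: moment-based skewness (g1) and excess kurtosis (g2)
skew_kurt <- function(x) {
  x  <- x[!is.na(x)]            # drop missing values
  d  <- x - mean(x)             # deviations from the mean
  m2 <- mean(d^2)               # 2nd central moment
  m3 <- mean(d^3)               # 3rd central moment
  m4 <- mean(d^4)               # 4th central moment
  c(skewness        = m3 / m2^1.5,      # > 0 positive skew, < 0 negative skew
    excess_kurtosis = m4 / m2^2 - 3)    # > 0 "too tall", < 0 "too flat"
}

skew_kurt(c(1, 2, 3, 4, 5))   # skewness 0, excess kurtosis -1.3 (flat)
skew_kurt(c(1, 1, 1, 2, 10))  # positive skewness (long right tail)

# With your data:
# skew_kurt(dataset$HeadacheDays[dataset$Medication == "A"])
```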

SHAPIRO-WILK TEST

Purpose: Statistically check whether each group’s scores are normally distributed.

The Shapiro-Wilk test checks for any departure from a normal distribution, so it is sensitive to skewness and kurtosis at the same time.

The test asks: “Are these scores consistent with a normal distribution (null hypothesis), or are they DIFFERENT from a normal distribution (alternate hypothesis)?”

For this test, if p is GREATER than .05 (p > .05), there is no significant departure from normality, so treat the data as NORMAL.

If p is LESS than .05 (p < .05), the data is NOT normal.

CONDUCT THE SHAPIRO-WILK TEST

Replace “dataset” with your dataset name (without .xlsx)

Replace “HeadacheDays” with your dependent variable (score) R code name (example: USD)

Replace “Medication” with your independent variable (group) R code name (example: Country)

Replace “A” with the R code name for your first group (example: USA)

Replace “B” with the R code name for your second group (example: India)

shapiro.test(dataset$HeadacheDays[dataset$Medication == "A"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$HeadacheDays[dataset$Medication == "A"]
## W = 0.97852, p-value = 0.4913
shapiro.test(dataset$HeadacheDays[dataset$Medication == "B"])
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$HeadacheDays[dataset$Medication == "B"]
## W = 0.98758, p-value = 0.8741
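To get both results in one step, `tapply()` can run the Shapiro-Wilk test on every group and return just the p-values. A sketch using hypothetical simulated data `toy`; note that `shapiro.test()` requires between 3 and 5000 non-missing values per group.

```r
# Hypothetical simulated data: group B is deliberately right-skewed
set.seed(7)
toy <- data.frame(
  Medication   = rep(c("A", "B"), each = 50),
  HeadacheDays = c(rnorm(50, mean = 8, sd = 3), rexp(50, rate = 1 / 12))
)

# Shapiro-Wilk p-value for each group
sw_p <- tapply(toy$HeadacheDays, toy$Medication,
               function(x) shapiro.test(x)$p.value)
round(sw_p, 4)

# Apply the p > .05 rule from this section to each group
ifelse(sw_p > 0.05, "normal", "NOT normal")
```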

QUESTION

Answer the questions below as comments within the R script:

Was the data normally distributed for Group 1?

Was the data normally distributed for Group 2?

If p > .05 (p-value is GREATER than .05) for both groups, the data is NORMAL. Continue to the boxplot check below.

If p < .05 (p-value is LESS than .05) for either group, the data is NOT normal (switch to the Mann-Whitney U test).

BOXPLOT

Purpose: Check each group’s scores for outliers that could distort the mean.

INSTALL REQUIRED PACKAGE

If never installed, run the install code below. If previously installed, put a hashtag (#) in front of the code.

install.packages("ggplot2")
install.packages("ggpubr")

LOAD THE PACKAGE

Load the package at the start of every R session, even if it is already installed.

library(ggplot2)
library(ggpubr)

CREATE THE BOXPLOT

Replace “dataset” with your dataset name (without .xlsx)

Replace “HeadacheDays” with your dependent variable (score) R code name (example: USD)

Replace “Medication” (both places) with your independent variable (group) R code name (example: Country)

ggboxplot(dataset, x = "Medication", y = "HeadacheDays",
          color = "Medication",
          palette = "jco",
          add = "jitter")

QUESTION

Answer the questions below as comments within the R script. Answer them for EACH group’s box in the plot:

Q1) Because add = "jitter" draws every score as a dot, look only at dots beyond the ends of the whiskers. Were there any? Are these dots close to the whiskers, or very far away?

If there are no dots beyond the whiskers, continue with the Independent t-test.

If there are a few dots (two or fewer) and they are close to the whiskers, continue with the Independent t-test.

If there are a few dots (two or fewer) and they are far away from the whiskers, consider switching to the Mann-Whitney U test.

If there are many dots (more than two) and they are very far away from the whiskers, you should switch to the Mann-Whitney U test.
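To put numbers on “close to” versus “far from” the whiskers, one common convention (Tukey’s fences) flags points more than 1.5 times the IQR beyond the box as outliers and points more than 3 times the IQR beyond it as extreme. The 3 times IQR cutoff is an assumption added here, not part of the rules above. A minimal sketch using base R’s `boxplot.stats()`: `classify_outliers()` is a hypothetical helper, and because `boxplot.stats()` computes the box edges with `fivenum()`, the points it flags can differ slightly from the dots `ggboxplot()` draws.

```r
# Hypothetical helper: split outliers into mild (1.5 to 3 IQR) and extreme (> 3 IQR)
classify_outliers <- function(x) {
  x <- x[!is.na(x)]
  all_out <- boxplot.stats(x, coef = 1.5)$out   # beyond the whiskers
  extreme <- boxplot.stats(x, coef = 3)$out     # far beyond the whiskers
  list(mild    = all_out[!all_out %in% extreme],
       extreme = extreme)
}

classify_outliers(c(1:9, 20, 30))   # mild: 20, extreme: 30

# With your data, one result per group:
# lapply(split(dataset$HeadacheDays, dataset$Medication), classify_outliers)
```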