Amoud University

Abstract

This primer provides an overview of 25 hypothesis testing methods, both parametric and non-parametric, with reproducible R code. The tests are categorized by research question: analysis of effects, analysis of association, analysis of difference, and analysis of dependency.

For each test, we provide a definition, the hypotheses tested, typical uses, and real-life applications in research. We also present R code examples that illustrate the hypothesis testing process, from data preparation and test selection through test execution and result interpretation.

In the analysis of effects category, we cover tests such as the t-test, ANOVA, and MANOVA, which are used to determine the significance of a treatment or intervention. In the analysis of association category, we cover tests such as the Pearson correlation and chi-squared tests, which are used to determine the relationship between two variables.

In the analysis of difference category, we cover tests such as the Wilcoxon signed-rank, Kruskal-Wallis, and Friedman tests, which are used to determine the difference between two or more groups. In the analysis of dependency category, we cover tests such as the McNemar and Cochran’s Q tests, which are used to determine the dependency between two categorical variables.

The primer emphasizes the importance of reproducibility in hypothesis testing and demonstrates how to achieve this using R. We also discuss the assumptions and limitations of each test and provide guidance on how to choose the appropriate test based on the research question and data type.

Overall, this primer provides a practical guide to hypothesis testing using R, suitable for researchers and data analysts at all levels. The primer covers a wide range of tests and provides R code examples that can be easily adapted to suit individual research needs.

Module 7: Statistical Hypothesis Testing

Statistical tools and software for hypothesis testing:

PARAMETRIC TESTS

Dependent t-test

  • Definition: a test that compares the means of two related groups (e.g., pre-treatment and post-treatment) to determine whether there is a significant difference.
  • Assumptions: normality of the differences, homogeneity of variance.
  • Application: comparing the effectiveness of a new drug treatment by measuring the pre-treatment and post-treatment blood pressure of patients.
  • Real-life example: comparing the average commute times of employees before and after a change in the company’s transportation policy.

Independent t-test

  • Definition: a test that compares the means of two independent groups to determine whether there is a significant difference.
  • Assumptions: normality of the data, homogeneity of variance.
  • Application: comparing the effectiveness of two different brands of a pain reliever by measuring the pain levels of patients who receive each brand.
  • Real-life example: comparing the average sales figures of two different stores that sell the same product.

Paired z-test

  • Definition: a test that compares the means of two related groups (e.g., pre-treatment and post-treatment) to determine whether there is a significant difference, using the normal distribution.
  • Assumptions: normality of the differences, known population standard deviation.
  • Application: comparing the effectiveness of a new diet plan by measuring the pre-diet and post-diet weights of participants.
  • Real-life example: comparing the average scores of students before and after a tutoring program.

Unpaired z-test

  • Definition: a test that compares the means of two independent groups to determine whether there is a significant difference, using the normal distribution.
  • Assumptions: normality of the data, known population standard deviation.
  • Application: comparing the effectiveness of two different teaching methods by measuring the test scores of students who receive each method.
  • Real-life example: comparing the average salaries of male and female employees in a company.

One-way ANOVA

  • Definition: a test that compares the means of three or more independent groups to determine whether there is a significant difference.
  • Assumptions: normality of the data, homogeneity of variance.
  • Application: comparing the effectiveness of three different types of fertilizers by measuring the yield of crops grown with each fertilizer.
  • Real-life example: comparing the average customer satisfaction scores for three different airlines.

Two-way ANOVA

  • Definition: a test that compares the means of two or more independent groups, considering the effects of two or more categorical variables.
  • Assumptions: normality of the data, homogeneity of variance.
  • Application: comparing the effectiveness of two different advertising campaigns for three different products.
  • Real-life example: comparing the average salaries of employees in different departments of a company, considering the effects of both job title and years of experience.

MANOVA

  • Definition: a test that compares the means of two or more groups on two or more dependent variables.
  • Assumptions: normality of the data, homogeneity of variance-covariance matrices.
  • Application: comparing the effectiveness of three different treatments for a particular medical condition, measuring both pain levels and quality of life.
  • Real-life example: comparing the average scores of students on multiple tests across different subjects.

ANCOVA

  • Definition: a test that compares the means of two or more groups on a dependent variable, while controlling for the effects of one or more continuous variables.
  • Assumptions: normality of the data, homogeneity of regression slopes, homogeneity of variance.
  • Application: comparing the effectiveness of two different teaching methods, while controlling for the effect of student age.
  • Real-life example: comparing the average salaries of employees in different departments of a company, while controlling for the effect of years of experience.

MANCOVA

  • Definition: a test that compares the means of two or more groups on two or more dependent variables, while controlling for the effects of one or more continuous variables.
  • Assumptions: normality of the data, homogeneity of regression slopes, homogeneity of variance-covariance matrices.
  • Application: comparing the effectiveness of two different training programs on multiple measures of job performance, while controlling for the effect of years of experience.
  • Real-life example: comparing the average test scores of students across multiple subjects, while controlling for the effect of socioeconomic status.

Overall, these parametric tests provide useful tools for data analysis in a wide range of applications. By understanding their definitions, assumptions, and real-life examples, researchers can choose the appropriate test for their data and draw valid conclusions from their analyses.

Real-life Applications using R software

Examples of how to perform each of the tests listed above using R software:

# 1.    Dependent t-test:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
t.test(group1, group2, paired=TRUE)
## 
##  Paired t-test
## 
## data:  group1 and group2
## t = 6.3246, df = 4, p-value = 0.003198
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.122011 2.877989
## sample estimates:
## mean of the differences 
##                       2
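The paired test's key assumption is normality of the paired differences (see the bullet list above); a quick check, as a sketch (output not shown):

shapiro.test(group1 - group2)   # Shapiro-Wilk test on the paired differences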
# 2.    Independent t-test:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
t.test(group1, group2)
## 
##  Welch Two Sample t-test
## 
## data:  group1 and group2
## t = 0.62684, df = 7.938, p-value = 0.5484
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.367585  9.367585
## sample estimates:
## mean of x mean of y 
##      18.8      16.8
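Note that t.test() defaults to the Welch unequal-variance form, as the output header above shows. When the homogeneity-of-variance assumption is judged reasonable, the classical pooled test is one argument away:

t.test(group1, group2, var.equal=TRUE)   # classical pooled-variance two-sample t-test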
# 3.    Paired z-test:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
library(BSDA)
## Warning: package 'BSDA' was built under R version 4.1.3
## Loading required package: lattice
## 
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
## 
##     Orange
# a paired z-test works on the paired differences (BSDA's z.test has no
# paired argument); the sample SD of the differences stands in for the
# known population sigma the z-test assumes
d <- group1 - group2
z.test(d, mu=0, sigma.x=sd(d))
## 
##  One-sample z-Test
## 
## data:  d
## z = 6.3246, p-value = 2.54e-10
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.380205 2.619795
## sample estimates:
## mean of x 
##         2
# 4.    Unpaired z-test:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
# the sample SDs stand in for the known population SDs the z-test assumes
z.test(group1, group2, alternative="two.sided", mu=0, sigma.x=sd(group1), sigma.y=sd(group2))
## 
##  Two-sample z-Test
## 
## data:  group1 and group2
## z = 0.62684, p-value = 0.5308
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.253483  8.253483
## sample estimates:
## mean of x mean of y 
##      18.8      16.8
# 5.    One-way ANOVA:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
group3 <- c(8, 12, 16, 19, 21)
scores <- c(group1, group2, group3)
groups <- factor(rep(c("Group 1", "Group 2", "Group 3"), each=5))
anova <- aov(scores ~ groups)
summary(anova)
##             Df Sum Sq Mean Sq F value Pr(>F)
## groups       2  32.53   16.27   0.621  0.554
## Residuals   12 314.40   26.20
# 6.    Two-way ANOVA:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
group3 <- c(8, 12, 16, 19, 21)
factor1 <- rep(c("A", "B", "C"), each=5)
factor2 <- rep(c("X", "Y", "Z"), each=5)
# note: factor1 and factor2 define exactly the same grouping of the 15
# observations (both repeat each level 5 times), so factor2 is aliased with
# factor1 and drops out of the table below; a crossed design is sketched
# after the output
data <- data.frame(group=c(group1, group2, group3), factor1=factor1, factor2=factor2)
anova <- aov(group ~ factor1 * factor2, data=data)
summary(anova)
##             Df Sum Sq Mean Sq F value Pr(>F)
## factor1      2  32.53   16.27   0.621  0.554
## Residuals   12 314.40   26.20
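A two-way ANOVA needs the two factors to vary independently of each other (a crossed design) rather than in lockstep. A minimal sketch of a corrected version, reusing the vectors above and treating the position within each group as an assumed blocking factor (output not shown):

yield <- c(group1, group2, group3)
fertilizer <- factor(rep(c("A", "B", "C"), each=5))   # first factor
block <- factor(rep(1:5, times=3))                    # second factor, crossed with the first
# one observation per cell, so fit the additive (no-interaction) model
summary(aov(yield ~ fertilizer + block))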
# 7.    MANOVA:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
group3 <- c(8, 12, 16, 19, 21)
x <- c(4, 5, 6, 4, 5)
y <- c(3, 4, 5, 4, 5)
# note: x and y (length 5) are recycled to match the 15 group values, and
# group enters as a numeric predictor rather than a grouping factor, which
# is why the table below shows 1 Df for group; a factor-based MANOVA is
# sketched after the output
data <- data.frame(group=c(group1, group2, group3), x=x, y=y)
manova <- manova(cbind(x, y) ~ group, data=data)
summary(manova)
##           Df  Pillai approx F num Df den Df    Pr(>F)    
## group      1 0.89175   49.427      2     12 1.609e-06 ***
## Residuals 13                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
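For a classical MANOVA, the predictor should be a grouping factor and each case should have its own scores on both dependent variables. A minimal sketch with illustrative (made-up) pain and quality-of-life scores for three treatments, matching the bullet description above (output not shown):

pain <- c(5, 6, 7, 6, 8, 4, 5, 5, 6, 5, 3, 4, 4, 5, 4)                 # DV 1, illustrative values
qol  <- c(60, 62, 58, 65, 61, 70, 68, 72, 69, 71, 75, 78, 74, 77, 76)  # DV 2, illustrative values
treatment <- factor(rep(c("T1", "T2", "T3"), each=5))
fit <- manova(cbind(pain, qol) ~ treatment)
summary(fit)   # Pillai's trace by default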
# 8.    ANCOVA:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
age <- c(25, 30, 35, 40, 45, 20, 25, 30, 35, 40)
# note: this model regresses the pooled outcomes on age alone and never
# compares the two groups; a full ANCOVA with the grouping factor is
# sketched after the output
data <- data.frame(group=c(group1, group2), age=age)
model <- lm(group ~ age, data=data)
summary(model)
## 
## Call:
## lm(formula = group ~ age, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2889 -0.3500 -0.2889  0.6889  1.7111 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.8444     1.4141  -1.304    0.228    
## age           0.6044     0.0424  14.257 5.71e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.006 on 8 degrees of freedom
## Multiple R-squared:  0.9621, Adjusted R-squared:  0.9574 
## F-statistic: 203.3 on 1 and 8 DF,  p-value: 5.711e-07
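As written, lm(group ~ age) fits age alone, so no group comparison is made. An ANCOVA needs the grouping factor alongside the covariate. A minimal sketch reusing the vectors from the chunk above, with hypothetical group labels (output not shown):

score <- c(group1, group2)
method <- factor(rep(c("Method A", "Method B"), each=5))
fit <- aov(score ~ age + method)   # covariate first, so the method effect is age-adjusted
summary(fit)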
# 9.    MANCOVA:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
x <- c(4, 5, 6, 4, 5)
y <- c(3, 4, 5, 4, 5)
age <- c(25, 30, 35, 40, 45, 20, 25, 30, 35, 40)
# note: as in the MANOVA example, x and y (length 5) are recycled to length
# 10 and group is a numeric predictor; a factor-based form is sketched
# after the output
data <- data.frame(group=c(group1, group2), x=x, y=y, age=age)
manova <- manova(cbind(x, y) ~ group + age, data=data)
summary(manova)
##           Df  Pillai approx F num Df den Df    Pr(>F)    
## group      1 0.94336   49.970      2      6 0.0001817 ***
## age        1 0.29999    1.286      2      6 0.3430082    
## Residuals  7                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
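A factor-based MANCOVA would model two genuine dependent variables on a grouping factor plus the covariate. A minimal sketch with illustrative (made-up) outcome scores and a hypothetical program factor, reusing age from the chunk above (output not shown):

dv1 <- c(4, 5, 6, 4, 5, 6, 7, 6, 8, 7)   # illustrative scores on outcome 1
dv2 <- c(3, 4, 5, 4, 5, 5, 6, 6, 7, 6)   # illustrative scores on outcome 2
program <- factor(rep(c("Program 1", "Program 2"), each=5))
fit <- manova(cbind(dv1, dv2) ~ age + program)   # age covariate entered first
summary(fit)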

NON-PARAMETRIC TESTS

Below are some common non-parametric tests, each with the parametric alternative it replaces, its typical application, and a real-life example:

Wilcoxon rank-sum test (Mann-Whitney U test)

  • Alternative: t-test
  • Application: comparing the medians of two independent groups when the assumptions of normality or equal variances are not met.
  • Real-life example: comparing the prices of two different brands of the same product when the prices are not normally distributed.

Wilcoxon signed-rank test

  • Alternative: paired t-test
  • Application: comparing the medians of two dependent groups when the assumptions of normality or equal variances are not met.
  • Real-life example: comparing the pre- and post-treatment scores of patients in a clinical trial when the scores are not normally distributed.

Kruskal-Wallis test

  • Alternative: one-way ANOVA
  • Application: comparing the medians of three or more independent groups when the assumptions of normality or equal variances are not met.
  • Real-life example: comparing the salaries of employees across three different departments of a company when the salaries are not normally distributed.

Friedman test

  • Alternative: repeated measures ANOVA
  • Application: comparing the medians of three or more dependent groups when the assumptions of normality or equal variances are not met.
  • Real-life example: comparing the satisfaction scores of customers for three different products over time when the scores are not normally distributed.

Spearman rank correlation

  • Alternative: Pearson correlation
  • Application: measuring the strength of association between two variables when one or both variables are ordinal.
  • Real-life example: measuring the correlation between the rankings of different restaurants based on their ratings and the number of customers they have.

Chi-squared test

  • Alternative: z-test or t-test for proportions
  • Application: testing the independence of two categorical variables.
  • Real-life example: determining whether there is a significant association between smoking status and lung cancer diagnosis.

Wilcoxon-Mann-Whitney test

  • Alternative: independent-samples t-test
  • Application: comparing the distributions of two independent samples when the assumptions of normality or equal variances are not met (this is the same procedure as the Wilcoxon rank-sum test above).
  • Real-life example: comparing the distributions of the size of fish caught in two different fishing spots.

Kolmogorov-Smirnov test

  • Alternative: two-sample t-test (when only a difference in means is of interest) or, in the one-sample case, the Shapiro-Wilk test of normality
  • Application: testing whether two or more samples are drawn from the same distribution.
  • Real-life example: determining whether the heights of male and female students in a school are drawn from the same distribution.

Permutation test

  • Alternative: t-test or ANOVA
  • Application: testing the significance of a difference between two or more groups by randomly permuting the labels of the observations.
  • Real-life example: determining whether there is a significant difference in the mean scores of students in two different schools on a standardized test.

Sign test

  • Alternative: one-sample t-test (or paired t-test for paired data)
  • Application: testing whether the median of a sample is equal to a specified value.
  • Real-life example: testing whether the median age of patients in a hospital is equal to 50 years.

Kendall rank correlation

  • Alternative: Pearson correlation
  • Application: measuring the strength of association between two variables when one or both variables are ordinal, and assessing the degree of similarity of rankings between two or more judges or raters.
  • Real-life example: measuring the correlation between the rankings of different cities based on their quality of life scores.

Runs test

  • Alternative: no direct parametric equivalent; parametric checks of serial correlation (e.g., the Durbin-Watson test) address similar questions
  • Application: testing for randomness or independence in a sequence of observations, by counting the number of runs (maximal stretches of consecutive values on the same side of a reference point, such as the median) in the sequence.
  • Real-life example: testing whether the sequence of daily stock prices for a particular stock follows a random pattern.

Siegel-Tukey test

  • Alternative: F test of equal variances or Bartlett's test
  • Application: testing for differences in dispersion or variance between two groups, by ranking observations from the extremes of the combined sample inward and applying a rank-sum test to those ranks.
  • Real-life example: comparing the variability of the salaries of employees in different departments of a company.

Mood’s median test

  • Alternative: t-test or one-way ANOVA
  • Application: testing for differences in medians between two or more groups, by classifying each observation as above or below the grand median and testing the resulting counts.
  • Real-life example: comparing the median ages of participants in different treatment groups in a clinical trial.

Cramer-von Mises test

  • Alternative: Kolmogorov-Smirnov test (or the Shapiro-Wilk test when checking normality)
  • Application: testing whether a sample comes from a specified distribution, by comparing the empirical cumulative distribution function (CDF) to the theoretical CDF.
  • Real-life example: testing whether the distribution of heights of students in a school follows a normal distribution.

Overall, non-parametric tests provide useful alternatives to parametric tests and can be used in a wide range of applications. By understanding the alternatives, applications, and real-life examples of non-parametric tests, researchers can choose the appropriate test for their data and draw valid conclusions from their analyses.

Real-life Examples using R software

Examples of how to perform each of the 15 non-parametric tests using R software:

#  1.   Wilcoxon rank-sum test (Mann-Whitney U test)
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
# ties between the samples force the normal approximation (hence the
# warning below); adding exact=FALSE requests it explicitly
wilcox.test(group1, group2)
## Warning in wilcox.test.default(group1, group2): cannot compute exact p-value
## with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  group1 and group2
## W = 16, p-value = 0.5284
## alternative hypothesis: true location shift is not equal to 0
# 2.    Wilcoxon signed-rank test
before <- c(8, 9, 10, 12, 14)
after <- c(10, 11, 12, 13, 15)
wilcox.test(before, after, paired=TRUE)
## Warning in wilcox.test.default(before, after, paired = TRUE): cannot compute
## exact p-value with ties
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  before and after
## V = 0, p-value = 0.05334
## alternative hypothesis: true location shift is not equal to 0
# 3.    Kruskal-Wallis test
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
group3 <- c(8, 12, 16, 19, 21)
kruskal.test(list(group1, group2, group3))
## 
##  Kruskal-Wallis rank sum test
## 
## data:  list(group1, group2, group3)
## Kruskal-Wallis chi-squared = 1.302, df = 2, p-value = 0.5215
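kruskal.test also accepts a formula with a grouping factor, which scales better than a list when the data live in a data frame; a quick sketch reusing the vectors above (output not shown):

scores <- c(group1, group2, group3)
groups <- factor(rep(c("Group 1", "Group 2", "Group 3"), each=5))
kruskal.test(scores ~ groups)   # identical test to the list form above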
# 4.    Friedman test
before <- c(8, 9, 10, 12, 14)
after1 <- c(10, 11, 12, 13, 15)
after2 <- c(11, 12, 13, 14, 16)
# rows of the matrix are the subjects (blocks); columns are the repeated conditions
friedman.test(cbind(before, after1, after2))
## 
##  Friedman rank sum test
## 
## data:  cbind(before, after1, after2)
## Friedman chi-squared = 10, df = 2, p-value = 0.006738
# 5.    Spearman rank correlation
x <- c(10, 20, 30, 40, 50)
y <- c(5, 15, 25, 35, 45)
# the rankings agree perfectly, so rho = 1; the S statistic below is 0 up
# to floating-point rounding
cor.test(x, y, method="spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  x and y
## S = 4.4409e-15, p-value = 0.01667
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho 
##   1
# 6.    Chi-squared test
table <- matrix(c(10, 20, 30, 15, 25, 35), nrow=2)
chisq.test(table)
## 
##  Pearson's Chi-squared test
## 
## data:  table
## X-squared = 9.8283, df = 2, p-value = 0.007342
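The unlabeled matrix works, but named dimensions make contingency tables far easier to read. A sketch with hypothetical labels loosely matching the smoking example from the list above (same statistic as the output just shown):

counts <- matrix(c(10, 20, 30, 15, 25, 35), nrow=2,
                 dimnames=list(smoking=c("Smoker", "Non-smoker"),
                               diagnosis=c("None", "Suspected", "Confirmed")))
chisq.test(counts)   # the labels only aid interpretation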
# 7.    Wilcoxon-Mann-Whitney test
# (another name for the Wilcoxon rank-sum test in example 1; wilcox.test
# implements both)
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
wilcox.test(group1, group2)
## Warning in wilcox.test.default(group1, group2): cannot compute exact p-value
## with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  group1 and group2
## W = 16, p-value = 0.5284
## alternative hypothesis: true location shift is not equal to 0
# 8.    Kolmogorov-Smirnov test
x <- rnorm(100, mean=0, sd=1)
y <- rnorm(100, mean=1, sd=2)
# the draws are random; without set.seed() the output below will vary from
# run to run
ks.test(x, y)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  x and y
## D = 0.43, p-value = 1.866e-08
## alternative hypothesis: two-sided
# 9.    Permutation test
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
obs.diff <- median(group1) - median(group2)
perm.samples <- replicate(10000, {
  permuted <- sample(c(group1, group2), replace=FALSE)
  perm.diff <- median(permuted[1:5]) - median(permuted[6:10])
  return(perm.diff)
})
# two-sided p-value: share of permuted differences at least as extreme as
# the observed one
p.value <- mean(abs(perm.samples) >= abs(obs.diff))
p.value
## [1] 1
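The p-value of 1 above is not a bug: with these ten values, the median of any five of them differs from the median of the remaining five by at least the observed 2, so every relabeling looks at least as extreme. Medians are coarse statistics in samples this small; a mean-based variant of the same test is more informative (a sketch, output not shown):

set.seed(123)   # for reproducibility
obs.diff <- mean(group1) - mean(group2)
perm.samples <- replicate(10000, {
  permuted <- sample(c(group1, group2))
  mean(permuted[1:5]) - mean(permuted[6:10])
})
mean(abs(perm.samples) >= abs(obs.diff))   # two-sided permutation p-value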
# 10.   Sign test
before <- c(8, 9, 10, 12, 14)
after <- c(10, 11, 12, 13, 15)
library(BSDA)
SIGN.test(before, after, mu=0)
## 
##  Dependent-samples Sign-Test
## 
## data:  before and after
## S = 0, p-value = 0.0625
## alternative hypothesis: true median difference is not equal to 0
## 93.75 percent confidence interval:
##  -2 -1
## sample estimates:
## median of x-y 
##            -2
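The bullet list above describes the one-sample form of the sign test (is the median equal to a specified value?); SIGN.test covers that case via its md argument. A sketch with hypothetical patient ages (output not shown):

ages <- c(42, 55, 47, 60, 38, 51, 49, 63, 45, 58)   # illustrative values
SIGN.test(ages, md=50)   # tests whether the median age equals 50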
# 11.   Kendall rank correlation
x <- c(10, 20, 30, 40, 50)
y <- c(5, 15, 25, 35, 45)
cor.test(x, y, method="kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  x and y
## T = 10, p-value = 0.01667
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau 
##   1
# 12.   Runs test
x <- c(1, 2, 3, 2, 1, 3, 2, 1, 3, 2)
library(DescTools)
## Warning: package 'DescTools' was built under R version 4.1.3
# RunsTest dichotomizes the series around its median by default before
# counting runs
RunsTest(x)
## 
##  Runs Test for Randomness
## 
## data:  x
## runs = 7, m = 7, n = 3, p-value = 0.25
## alternative hypothesis: true number of runs is not equal the expected number
## sample estimates:
## median(x) 
##         2
# 13.   Siegel-Tukey test
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
# base R has no Siegel-Tukey test; the parametric F test of equal variances
# is used as a stand-in here (see the sketch after the output)
var.test(group1, group2)
## 
##  F test to compare two variances
## 
## data:  group1 and group2
## F = 1.194, num df = 4, denom df = 4, p-value = 0.8677
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##   0.1243127 11.4674775
## sample estimates:
## ratio of variances 
##           1.193966
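For the actual Siegel-Tukey procedure, the DescTools package (already attached above for the runs test) provides an implementation; a minimal sketch, assuming your DescTools version ships SiegelTukeyTest (output not shown):

library(DescTools)
SiegelTukeyTest(group1, group2)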
# 14.   Mood's median test
set.seed(123)
response <- c(rnorm(10,3,1.5),rnorm(10,5.5,2))
fact <- gl(2,10,labels=LETTERS[1:2])
library(RVAideMemoire)
## *** Package RVAideMemoire v 0.9-83 ***
mood.medtest(response~fact)
## 
##  Mood's median test
## 
## data:  response by fact
## p-value = 0.02301
# 15.   Cramer-von Mises test
x <- rnorm(100, mean=0, sd=1)
y <- rnorm(100, mean=1, sd=2)   # y is generated but unused: cvm.test below is one-sample
library(nortest)
# nortest's cvm.test checks x against a normal distribution; the draws are
# random, so the output varies without set.seed()
cvm.test(x)
## 
##  Cramer-von Mises normality test
## 
## data:  x
## W = 0.030388, p-value = 0.8401

Thanks for your attention