This primer provides an overview of 25 different hypothesis testing methods, including parametric and non-parametric tests, using reproducible R software. The tests are categorized based on research questions, including analysis of effects, analysis of association, analysis of difference, and analysis of dependency.
For each test, we provide a definition, the hypotheses tested, typical uses, and real-life applications in research. We also present R code examples that illustrate the hypothesis testing process, including data preparation, test selection, test execution, and result interpretation.
In the analysis of effects category, we cover tests such as the t-test, ANOVA, and MANOVA, which are used to determine the significance of a treatment or intervention. In the analysis of association category, we cover tests such as the Pearson correlation and chi-squared tests, which are used to determine the relationship between two variables.
In the analysis of difference category, we cover tests such as the Wilcoxon signed-rank, Kruskal-Wallis, and Friedman tests, which are used to determine the difference between two or more groups. In the analysis of dependency category, we cover tests such as the McNemar and Cochran’s Q tests, which are used to determine the dependency between two categorical variables.
The primer emphasizes the importance of reproducibility in hypothesis testing and demonstrates how to achieve this using R. We also discuss the assumptions and limitations of each test and provide guidance on how to choose the appropriate test based on the research question and data type.
Overall, this primer provides a practical guide to hypothesis testing using R, suitable for researchers and data analysts at all levels. The primer covers a wide range of tests and provides R code examples that can be easily adapted to suit individual research needs.
Statistical tools and software for hypothesis testing:
ANCOVA (analysis of covariance)
Definition: a test that compares the means of two or more groups on a dependent variable, while controlling for the effects of one or more continuous variables.
Assumptions: normality of the data, homogeneity of regression slopes, homogeneity of variance.
Application: comparing the effectiveness of two different teaching methods, while controlling for the effect of student age.
Real-life example: comparing the average salaries of employees in different departments of a company, while controlling for the effect of years of experience.
Examples of how to perform each of the tests listed above using R software:
# 1. Dependent t-test:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
t.test(group1, group2, paired=TRUE)
##
## Paired t-test
##
## data: group1 and group2
## t = 6.3246, df = 4, p-value = 0.003198
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.122011 2.877989
## sample estimates:
## mean of the differences
## 2
# 2. Independent t-test:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
t.test(group1, group2)
##
## Welch Two Sample t-test
##
## data: group1 and group2
## t = 0.62684, df = 7.938, p-value = 0.5484
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.367585 9.367585
## sample estimates:
## mean of x mean of y
## 18.8 16.8
# 3. Paired z-test:
# Note: BSDA's z.test() has no paired argument, so the call below is in
# fact an unpaired two-sample z-test (hence the "Two-sample z-Test"
# output); a paired version is sketched after this example.
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
library(BSDA)
## Warning: package 'BSDA' was built under R version 4.1.3
## Loading required package: lattice
##
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
##
## Orange
z.test(group1, group2, alternative="two.sided", sigma.x=sd(group1), sigma.y=sd(group2))
##
## Two-sample z-Test
##
## data: group1 and group2
## z = 0.62684, p-value = 0.5308
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.253483 8.253483
## sample estimates:
## mean of x mean of y
## 18.8 16.8
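An actual paired z-test can be run by applying the one-sample form of z.test() to the within-pair differences. A minimal sketch (output omitted), with the sample standard deviation of the differences standing in for the usually unknown population value:

diff <- group1 - group2
# one-sample z-test of H0: mean difference = 0
z.test(diff, mu=0, sigma.x=sd(diff))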
# 4. Unpaired z-test:
# (mu=0 is the hypothesized difference in means; sigma.x and sigma.y are
# estimated here by the sample standard deviations)
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
z.test(group1, group2, alternative="two.sided", mu=0, sigma.x=sd(group1), sigma.y=sd(group2))
##
## Two-sample z-Test
##
## data: group1 and group2
## z = 0.62684, p-value = 0.5308
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.253483 8.253483
## sample estimates:
## mean of x mean of y
## 18.8 16.8
# 5. One-way ANOVA:
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
group3 <- c(8, 12, 16, 19, 21)
# Combine the scores and build a grouping factor
score <- c(group1, group2, group3)
group <- factor(rep(c("Group 1", "Group 2", "Group 3"), each=5))
anova <- aov(score ~ group)
summary(anova)
##             Df Sum Sq Mean Sq F value Pr(>F)
## group        2  32.53   16.27   0.621  0.554
## Residuals   12 314.40   26.20
# 6. Two-way ANOVA:
# Note: factor1 and factor2 below are built identically (each=5 over the
# same 15 observations), so they are perfectly confounded; aov() drops
# factor2 and the interaction, which is why the output shows factor1 only.
# A properly crossed design is sketched after this example.
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
group3 <- c(8, 12, 16, 19, 21)
factor1 <- rep(c("A", "B", "C"), each=5)
factor2 <- rep(c("X", "Y","Z"), each=5)
data <- data.frame(group=c(group1, group2, group3), factor1=factor1, factor2=factor2)
anova <- aov(group ~ factor1 * factor2, data=data)
summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor1 2 32.53 16.27 0.621 0.554
## Residuals 12 314.40 26.20
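A genuine two-way ANOVA needs two crossed factors, so that the levels of one factor vary within the levels of the other. A minimal sketch with hypothetical factor codings (output omitted):

score <- c(group1, group2, group3)
f1 <- factor(rep(c("A", "B", "C"), each=5))    # first factor, 3 levels
f2 <- factor(rep(c("X", "Y"), length.out=15))  # second factor, crossed with f1
anova <- aov(score ~ f1 * f2)
summary(anova)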
# 7. MANOVA:
# Note: x and y (length 5) are silently recycled to length 15 inside
# data.frame(), and 'group' enters as a numeric predictor rather than a
# grouping factor (note Df = 1 in the output). A corrected sketch follows
# this example.
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
group3 <- c(8, 12, 16, 19, 21)
x <- c(4, 5, 6, 4, 5)
y <- c(3, 4, 5, 4, 5)
data <- data.frame(group=c(group1, group2, group3), x=x, y=y)
manova <- manova(cbind(x, y) ~ group, data=data)
summary(manova)
## Df Pillai approx F num Df den Df Pr(>F)
## group 1 0.89175 49.427 2 12 1.609e-06 ***
## Residuals 13
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
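In a MANOVA the predictor should be a grouping factor and the columns of cbind() should be the outcome measures, one value per subject. A minimal sketch with hypothetical outcome data (output omitted):

set.seed(123)                  # for reproducibility
x <- rnorm(15, mean=10, sd=2)  # hypothetical first outcome measure
y <- rnorm(15, mean=5, sd=1)   # hypothetical second outcome measure
group <- factor(rep(c("Group 1", "Group 2", "Group 3"), each=5))
summary(manova(cbind(x, y) ~ group))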
# 8. ANCOVA:
# Note: the model below regresses the pooled scores on age alone, so it is
# a simple linear regression rather than an ANCOVA; the grouping factor is
# missing. An ANCOVA sketch with the factor included follows this example.
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
age <- c(25, 30, 35, 40, 45, 20, 25, 30, 35, 40)
data <- data.frame(group=c(group1, group2), age=age)
model <- lm(group ~ age, data=data)
summary(model)
##
## Call:
## lm(formula = group ~ age, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2889 -0.3500 -0.2889 0.6889 1.7111
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.8444 1.4141 -1.304 0.228
## age 0.6044 0.0424 14.257 5.71e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.006 on 8 degrees of freedom
## Multiple R-squared: 0.9621, Adjusted R-squared: 0.9574
## F-statistic: 203.3 on 1 and 8 DF, p-value: 5.711e-07
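An ANCOVA includes the grouping factor alongside the covariate, so the group effect is tested after adjusting for age. A minimal sketch (output omitted; "Method A" and "Method B" are hypothetical labels for the two groups):

score <- c(group1, group2)
treatment <- factor(rep(c("Method A", "Method B"), each=5))
ancova <- lm(score ~ age + treatment)
anova(ancova)   # age enters first, so treatment is tested adjusted for age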
# 9. MANCOVA:
# Note: as in the MANOVA example, x and y (length 5) are recycled to
# length 10 and the numeric scores serve as the predictor; a MANCOVA needs
# a grouping factor plus the covariate, as sketched after this example.
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
x <- c(4, 5, 6, 4, 5)
y <- c(3, 4, 5, 4, 5)
age <- c(25, 30, 35, 40, 45, 20, 25, 30, 35, 40)
data <- data.frame(group=c(group1, group2), x=x, y=y, age=age)
manova <- manova(cbind(x, y) ~ group + age, data=data)
summary(manova)
## Df Pillai approx F num Df den Df Pr(>F)
## group 1 0.94336 49.970 2 6 0.0001817 ***
## age 1 0.29999 1.286 2 6 0.3430082
## Residuals 7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
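The same correction applies here: a MANCOVA needs a grouping factor plus the covariate, with full-length outcome vectors. A minimal sketch with hypothetical outcomes (output omitted):

set.seed(123)                 # for reproducibility
x <- rnorm(10, mean=4, sd=1)  # hypothetical first outcome measure
y <- rnorm(10, mean=4, sd=1)  # hypothetical second outcome measure
treatment <- factor(rep(c("Method A", "Method B"), each=5))
summary(manova(cbind(x, y) ~ age + treatment))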
List of some common non-parametric tests, along with the parametric alternative each one replaces, a typical application, and a real-life example:
Median test (e.g., Mood's median test)
Alternative: t-test or ANOVA
Application: testing for differences in medians between two or more groups, by comparing the medians of the data within each group.
Real-life example: comparing the median ages of participants in different treatment groups in a clinical trial.
Kolmogorov-Smirnov test
Alternative: t-test or ANOVA
Application: testing whether a sample comes from a specified distribution, by comparing the empirical cumulative distribution function (CDF) to the theoretical CDF.
Real-life example: testing whether the distribution of heights of students in a school follows a normal distribution.
Overall, non-parametric tests provide useful alternatives to parametric tests and can be used in a wide range of applications. By understanding the alternatives, applications, and real-life examples of non-parametric tests, researchers can choose the appropriate test for their data and draw valid conclusions from their analyses.
Examples of how to perform each of the 15 non-parametric tests using R software:
# 1. Wilcoxon rank-sum test (Mann-Whitney U test)
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
wilcox.test(group1, group2)
## Warning in wilcox.test.default(group1, group2): cannot compute exact p-value
## with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: group1 and group2
## W = 16, p-value = 0.5284
## alternative hypothesis: true location shift is not equal to 0
# 2. Wilcoxon signed-rank test
before <- c(8, 9, 10, 12, 14)
after <- c(10, 11, 12, 13, 15)
wilcox.test(before, after, paired=TRUE)
## Warning in wilcox.test.default(before, after, paired = TRUE): cannot compute
## exact p-value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: before and after
## V = 0, p-value = 0.05334
## alternative hypothesis: true location shift is not equal to 0
# 3. Kruskal-Wallis test
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
group3 <- c(8, 12, 16, 19, 21)
kruskal.test(list(group1, group2, group3))
##
## Kruskal-Wallis rank sum test
##
## data: list(group1, group2, group3)
## Kruskal-Wallis chi-squared = 1.302, df = 2, p-value = 0.5215
# 4. Friedman test
before <- c(8, 9, 10, 12, 14)
after1 <- c(10, 11, 12, 13, 15)
after2 <- c(11, 12, 13, 14, 16)
friedman.test(cbind(before, after1, after2))
##
## Friedman rank sum test
##
## data: cbind(before, after1, after2)
## Friedman chi-squared = 10, df = 2, p-value = 0.006738
# 5. Spearman rank correlation
# (x and y increase together perfectly, so rho = 1; the S statistic is
# exactly 0 in theory and prints as ~4.4e-15 due to floating-point error)
x <- c(10, 20, 30, 40, 50)
y <- c(5, 15, 25, 35, 45)
cor.test(x, y, method="spearman")
##
## Spearman's rank correlation rho
##
## data: x and y
## S = 4.4409e-15, p-value = 0.01667
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 1
# 6. Chi-squared test
# (2x3 contingency table of counts; matrix() fills column-wise)
table <- matrix(c(10, 20, 30, 15, 25, 35), nrow=2)
chisq.test(table)
##
## Pearson's Chi-squared test
##
## data: table
## X-squared = 9.8283, df = 2, p-value = 0.007342
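The expected counts behind the statistic can be extracted from the returned test object, a quick way to check the common rule of thumb that expected cell counts should exceed 5:

chisq.test(table)$expected   # expected counts under independence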
# 7. Wilcoxon-Mann-Whitney test
# (the same test as #1: "Mann-Whitney U" and "Wilcoxon rank-sum" are two
# names for the same procedure, so the call and output repeat)
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
wilcox.test(group1, group2)
## Warning in wilcox.test.default(group1, group2): cannot compute exact p-value
## with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: group1 and group2
## W = 16, p-value = 0.5284
## alternative hypothesis: true location shift is not equal to 0
# 8. Kolmogorov-Smirnov test
# (no seed is set, so each run draws different samples; the output below
# is from one particular draw)
x <- rnorm(100, mean=0, sd=1)
y <- rnorm(100, mean=1, sd=2)
ks.test(x, y)
##
## Two-sample Kolmogorov-Smirnov test
##
## data: x and y
## D = 0.43, p-value = 1.866e-08
## alternative hypothesis: two-sided
# 9. Permutation test
# (two-sided test of the difference in medians: the null distribution is
# built by reshuffling the pooled data into two groups of five)
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
obs.diff <- median(group1) - median(group2)
perm.samples <- replicate(10000, {
permuted <- sample(c(group1, group2), replace=FALSE)
perm.diff <- median(permuted[1:5]) - median(permuted[6:10])
return(perm.diff)
})
p.value <- mean(abs(perm.samples) >= abs(obs.diff))
p.value
## [1] 1
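With only five observations per group the median is very coarse: no split of these ten values can give an absolute median difference smaller than the observed 2, which is why the p-value above is 1. A sketch of the same procedure using the difference in means instead (the seed is an arbitrary choice for reproducibility):

set.seed(123)
obs.diff.mean <- mean(group1) - mean(group2)
perm.mean <- replicate(10000, {
  permuted <- sample(c(group1, group2), replace=FALSE)
  mean(permuted[1:5]) - mean(permuted[6:10])
})
mean(abs(perm.mean) >= abs(obs.diff.mean))  # two-sided permutation p-value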
# 10. Sign test
before <- c(8, 9, 10, 12, 14)
after <- c(10, 11, 12, 13, 15)
library(BSDA)
SIGN.test(before, after, md=0)   # md is the hypothesized median difference
##
## Dependent-samples Sign-Test
##
## data: before and after
## S = 0, p-value = 0.0625
## alternative hypothesis: true median difference is not equal to 0
## 93.75 percent confidence interval:
## -2 -1
## sample estimates:
## median of x-y
## -2
# 11. Kendall rank correlation
x <- c(10, 20, 30, 40, 50)
y <- c(5, 15, 25, 35, 45)
cor.test(x, y, method="kendall")
##
## Kendall's rank correlation tau
##
## data: x and y
## T = 10, p-value = 0.01667
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 1
# 12. Runs test
x <- c(1, 2, 3, 2, 1, 3, 2, 1, 3, 2)
library(DescTools)
## Warning: package 'DescTools' was built under R version 4.1.3
RunsTest(x)
##
## Runs Test for Randomness
##
## data: x
## runs = 7, m = 7, n = 3, p-value = 0.25
## alternative hypothesis: true number of runs is not equal the expected number
## sample estimates:
## median(x)
## 2
# 13. Siegel-Tukey test
# Note: var.test() below is the parametric F test for equal variances,
# often used in place of the rank-based Siegel-Tukey test; the
# Siegel-Tukey test itself is sketched after this example.
group1 <- c(12, 15, 20, 22, 25)
group2 <- c(10, 14, 18, 20, 22)
var.test(group1, group2)
##
## F test to compare two variances
##
## data: group1 and group2
## F = 1.194, num df = 4, denom df = 4, p-value = 0.8677
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.1243127 11.4674775
## sample estimates:
## ratio of variances
## 1.193966
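The Siegel-Tukey rank test itself is available in the DescTools package (already loaded for the runs test above). A minimal sketch, assuming the installed DescTools version provides SiegelTukeyTest() (output omitted):

SiegelTukeyTest(group1, group2)   # rank-based test for a difference in scale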
# 14. Mood's median test
set.seed(123)
response <- c(rnorm(10,3,1.5),rnorm(10,5.5,2))
fact <- gl(2,10,labels=LETTERS[1:2])
library(RVAideMemoire)
## *** Package RVAideMemoire v 0.9-83 ***
mood.medtest(response~fact)
##
## Mood's median test
##
## data: response by fact
## p-value = 0.02301
# 15. Cramer-von Mises test
# (nortest's cvm.test() tests a single sample for normality, so only x is
# needed; set a seed before rnorm() for a reproducible draw)
x <- rnorm(100, mean=0, sd=1)
library(nortest)
cvm.test(x)
##
## Cramer-von Mises normality test
##
## data: x
## W = 0.030388, p-value = 0.8401