Parametric tests can be more powerful than non-parametric tests because they exploit assumptions about the parameters of the underlying population distribution from which a sample is drawn. When those assumptions hold, or are at least reasonably satisfied, parametric tests should be more powerful than their non-parametric counterparts. Assessing the validity of those assumptions can be difficult, however, and conducting a parametric test when its assumptions are not reasonably satisfied may lead to unsound conclusions because the analysis has been carried out on false premises.
Non-parametric testing (sometimes referred to as distribution-free testing) relaxes the assumptions about the population distribution from which a sample comes. Fewer, less restrictive assumptions make these methods more widely applicable, and non-parametric results typically remain valid in the presence of outliers, influential points, small sample sizes, non-normality, and similar problems. In settings with any of those data maladies, non-parametric alternatives typically outperform their situationally shoddy parametric counterparts. Parametric tests cannot always be used because the underlying assumptions are not always known to be true.
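To make that trade-off concrete, here is a minimal simulation sketch (not part of the original analysis; the heavy-tailed Cauchy setting, sample sizes, and seed are assumptions chosen purely for illustration) comparing the rejection rates of a two-sample t-test and a Wilcoxon rank-sum test when the normality assumption fails.

# Empirical power of the t-test vs. the Wilcoxon rank-sum test on heavy-tailed data
set.seed(1)                                    # arbitrary seed for reproducibility
pvals <- replicate(1000, {
  g1 <- rcauchy(20)                            # heavy-tailed sample, location 0
  g2 <- rcauchy(20, location = 1)              # same shape, shifted location
  c(t = t.test(g1, g2)$p.value,
    w = wilcox.test(g1, g2)$p.value)
})
rowMeans(pvals < 0.05)                         # rejection rate of each test at alpha = 0.05

In this kind of setting the rank-based test tends to reject the false null more often than the t-test, illustrating the point above.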
data("anscombe")
kable(anscombe)
| x1 | x2 | x3 | x4 | y1 | y2 | y3 | y4 |
|---|---|---|---|---|---|---|---|
| 10 | 10 | 10 | 8 | 8.04 | 9.14 | 7.46 | 6.58 |
| 8 | 8 | 8 | 8 | 6.95 | 8.14 | 6.77 | 5.76 |
| 13 | 13 | 13 | 8 | 7.58 | 8.74 | 12.74 | 7.71 |
| 9 | 9 | 9 | 8 | 8.81 | 8.77 | 7.11 | 8.84 |
| 11 | 11 | 11 | 8 | 8.33 | 9.26 | 7.81 | 8.47 |
| 14 | 14 | 14 | 8 | 9.96 | 8.10 | 8.84 | 7.04 |
| 6 | 6 | 6 | 8 | 7.24 | 6.13 | 6.08 | 5.25 |
| 4 | 4 | 4 | 19 | 4.26 | 3.10 | 5.39 | 12.50 |
| 12 | 12 | 12 | 8 | 10.84 | 9.13 | 8.15 | 5.56 |
| 7 | 7 | 7 | 8 | 4.82 | 7.26 | 6.42 | 7.91 |
| 5 | 5 | 5 | 8 | 5.68 | 4.74 | 5.73 | 6.89 |
library(corrplot)     # corrplot.mixed()
library(GGally)       # ggpairs()

# Full correlation matrix; the x-vs-y block sits in rows 1:4, columns 5:8
rr <- cor(anscombe); r <- rr[1:4, 5:8]
corrplot.mixed(rr, lower = "number",
               upper = "square",
               lower.col = "black",
               title = "Pearson Correlation of Anscombe's Quartet")
ggpairs(anscombe,
        columns = c(1, 5, 2, 6, 3, 7, 4, 8),   # pair up x1/y1, x2/y2, x3/y3, x4/y4
        title = "Pearson Correlation of Anscombe's Quartet",
        axisLabels = "show")
Assuming continuous variables, independence between observations, normality, homoskedasticity, and the absence of outliers, the Pearson correlation is a measure of the linear association between two variables (X & Y) taking values in [-1, 1]. It is calculated by dividing the covariance of X and Y by the product of their standard deviations. If the relationship between X & Y is anything other than linear, the Pearson correlation is a vague metric and can be misleading. Anscombe’s quartet is composed of four artificially constructed datasets, each with nearly identical descriptive statistics, including a Pearson correlation of 0.816, which indicates a strong, positive, linear relationship between X & Y. When the data are visualized, however, it is clear that each X/Y pair follows a distinctly different pattern (curvature, an outlier, a single high-leverage point) despite the Pearson correlation coefficients being identical (to 3 decimal places!) for every pair.
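In symbols (added here for reference), the sample Pearson correlation described above is

\[ r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_{X}\, s_{Y}} = \frac{\sum_{i=1}^{n}(x_{i} - \bar{x})(y_{i} - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_{i} - \bar{x})^{2}}\sqrt{\sum_{i=1}^{n}(y_{i} - \bar{y})^{2}}} \]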
When nonlinear relationships are present in the data, the Pearson product-moment correlation is difficult to interpret, and different descriptive measures are called for. Rather than using the raw quantities from the sample, an alternative measure of correlation ranks each variable separately before computing the covariance and standard deviations of X and Y. This measure, known as the Spearman correlation, quantifies the strength & direction of the monotonic relationship between X and Y; that is, Spearman correlation measures the tendency of Y to increase as X increases or to decrease as X decreases, rather than purely measuring the linear relationship between X & Y.
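The description above amounts to computing the Pearson correlation on the ranks; a quick sketch of that equivalence (using the first Anscombe pair purely for illustration):

# Spearman's rho equals the Pearson correlation of the ranked data
with(anscombe, cor(x1, y1, method = "spearman"))
with(anscombe, cor(rank(x1), rank(y1), method = "pearson"))   # same value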
# Approximate y values read off the plot (eye-balled)
# The exact values do not matter for Spearman/Kendall correlation:
# only the ranks are used, and the ranks are assumed to be correct
# despite the uncertainty in the true y values
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(0, 25, -20, 50, 200))
df$rx <- rank(df$x); df$ry <- rank(df$y)
kable(df)
| x | y | rx | ry |
|---|---|---|---|
| 1 | 0 | 1 | 2 |
| 2 | 25 | 2 | 3 |
| 3 | -20 | 3 | 1 |
| 4 | 50 | 4 | 4 |
| 5 | 200 | 5 | 5 |
# Spearman Correlation Coefficient
cor(df$x, df$y, method = "spearman")
## [1] 0.7
# Kendall Correlation Coefficient
cor(df$x, df$y, method = "kendall")
## [1] 0.6
# Spearman Correlation Coefficient
cor.test(df$x, df$y, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: df$x and df$y
## S = 6, p-value = 0.2333
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.7
# Kendall Correlation Coefficient
cor.test(df$x, df$y, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: df$x and df$y
## T = 8, p-value = 0.2333
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.6
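As a check on the output above, both coefficients can be reproduced by hand from the ranks: the S = 6 reported by cor.test is exactly \(\sum_{i} d_{i}^{2}\), the sum of squared rank differences, and T = 8 is the number of concordant pairs.

# Spearman's rho from the squared rank differences: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1))
d <- df$rx - df$ry
1 - 6 * sum(d^2) / (nrow(df) * (nrow(df)^2 - 1))    # 0.7, with sum(d^2) = 6 (the S above)

# Kendall's tau from concordant (C) and discordant (D) pairs: tau = (C - D) / choose(n, 2)
idx <- combn(nrow(df), 2)                            # all pairs of observations i < j
s   <- sign(df$x[idx[2, ]] - df$x[idx[1, ]]) * sign(df$y[idx[2, ]] - df$y[idx[1, ]])
(sum(s > 0) - sum(s < 0)) / choose(nrow(df), 2)      # (8 - 2) / 10 = 0.6, with C = 8 (the T above)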
Spearman’s \(\rho\) & Kendall’s \(\tau\) are more appropriate to use here since the plot appears to show a non-linear relationship with a small sample size; the Pearson product-moment correlation may be misleading when a descriptive statistic for linearity is applied to a seemingly non-linear situation.
df <- data.frame(before = c(102, 86, 127, 140, 80),
                 after  = c(124, 110, 123, 134, 154))
df$diff <- df$before - df$after
kable(df)
| before | after | diff |
|---|---|---|
| 102 | 124 | -22 |
| 86 | 110 | -24 |
| 127 | 123 | 4 |
| 140 | 134 | 6 |
| 80 | 154 | -74 |
\(H_{0}: \mu_{b} = \mu_{a} ; \mu_{d} = 0\)
\(H_{A}: \mu_{b} \neq \mu_{a}; \mu_{d} \neq 0\)
Assumptions: The only assumption is that, under the null hypothesis, the distribution of “before” is the same as the distribution of “after”.
n <- 5; xbar <- mean(df$diff)                  # observed mean difference (-22)
# All 2^5 possible sign assignments for the five paired differences
signs <- expand.grid(rep(list(c(-1, 1)), 5))
# Mean difference under each sign assignment (the exact permutation distribution)
perms <- as.matrix(signs) %*% as.matrix(df$diff) / n
perms <- sort(perms)
# p-value: proportion of permutation means at least as small as the observed mean
1 - sum(perms > xbar)/length(perms)
## [1] 0.125
# p-value, different but the same!
2*(2/(2^5))
## [1] 0.125
P-value: 0.125
Conclusion: With a relatively high p-value, greater than the generally accepted but arbitrary cutoff of \(\alpha = 0.05\), we fail to reject the null hypothesis; there is not statistically significant evidence that the means before and after the event differ.
The p-value found in the paired-comparison permutation test (0.125) was calculated exactly using every possible sign assignment; this cannot always be done, because enumerating every permutation becomes too computationally intensive as the sample size increases.
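In those cases a common workaround, sketched below (not part of the original analysis; the resample count and seed are arbitrary), is to approximate the permutation distribution by drawing random sign assignments rather than enumerating all \(2^{n}\) of them.

# Monte Carlo approximation of the sign-flipping permutation distribution
set.seed(42)                                   # arbitrary seed for reproducibility
B <- 10000                                     # number of random sign assignments
perm_means <- replicate(B, mean(sample(c(-1, 1), n, replace = TRUE) * df$diff))
# Proportion of resampled means at least as small as the observed mean,
# mirroring the exact calculation above (should land close to 0.125)
mean(perm_means <= xbar)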
x <- c(124,-1,-114,142,-77,124,-28,101,-99,18,26,212)
Binomial Exact Test
\(H_{0}: \theta = 50; p = 0.50\)
\(H_{A}: \theta > 50; p > 0.50\)
Assumptions: Under the null we assume observations are mutually independent & the probability of an observation being greater than 50 is the same for all observations.
# Number of observations above (xu) and at or below (xl) the hypothesized median of 50
xu <- length(x[x > 50])
xl <- length(x[x <= 50])
bin_test <- binom.test(x = xu,
                       n = length(x),
                       p = 0.5,
                       alternative = "greater")
Conclusion: With a p-value of 0.8061523, we fail to reject the null hypothesis; there is not statistically significant evidence that the median is greater than 50.
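As a quick check (a sketch, not part of the original analysis), the same one-sided p-value falls straight out of the binomial distribution: it is the probability of seeing at least xu of the 12 observations above 50 when each exceeds 50 with probability 0.5.

# Upper-tail binomial probability P(X >= xu) with X ~ Binomial(12, 0.5)
pbinom(xu - 1, size = length(x), prob = 0.5, lower.tail = FALSE)   # matches binom.test above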
\(H_{0}:\) The population distribution functions for each of the three iris species are identical.
\(H_{A}:\) At least one of the populations tends to yield larger observations than at least one of the other populations.
t <- kruskal.test(Petal.Width ~ Species, data = iris); t
##
## Kruskal-Wallis rank sum test
##
## data: Petal.Width by Species
## Kruskal-Wallis chi-squared = 131.19, df = 2, p-value < 2.2e-16
P-Value: \(3.2617956 \times 10^{-29}\)
Conclusion: With a p-value of \(3.2617956 \times 10^{-29}\), we reject the null hypothesis with statistically significant evidence in the direction of the alternative. That is, the results of the Kruskal-Wallis test indicate that at least one iris species tends to yield wider petals than at least one of the other species.
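For intuition about where that statistic comes from, the sketch below recomputes the Kruskal-Wallis statistic \(H = \frac{12}{N(N+1)}\sum_{j}\frac{R_{j}^{2}}{n_{j}} - 3(N+1)\) directly from the pooled ranks; kruskal.test additionally applies a correction for the many tied petal widths, so its reported value is slightly larger than this uncorrected one.

# Uncorrected Kruskal-Wallis statistic from pooled ranks
rk <- rank(iris$Petal.Width)              # pooled ranks across all 150 flowers (ties get average ranks)
N  <- length(rk)
Rj <- tapply(rk, iris$Species, sum)       # rank sum per species
nj <- tapply(rk, iris$Species, length)    # 50 flowers per species
12 / (N * (N + 1)) * sum(Rj^2 / nj) - 3 * (N + 1)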
Having rejected the null hypothesis in part a, it seems necessary to investigate the pairwise differences between iris species further.
pairwise.wilcox.test(x = iris$Petal.Width,
                     g = iris$Species,
                     p.adjust.method = "bonferroni",
                     paired = TRUE)
##
## Pairwise comparisons using Wilcoxon signed rank test
##
## data: iris$Petal.Width and iris$Species
##
## setosa versicolor
## versicolor 2.2e-09 -
## virginica 2.2e-09 3.3e-09
##
## P value adjustment method: bonferroni
# Parametric comparison for contrast: one-way ANOVA followed by Tukey's HSD
fit <- lm(Petal.Width ~ Species, data = iris)
av <- aov(fit)
tukey <- TukeyHSD(x = av, conf.level = 0.95)
plot(tukey)
tukey
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = fit)
##
## $Species
## diff lwr upr p adj
## versicolor-setosa 1.08 0.9830903 1.1769097 0
## virginica-setosa 1.78 1.6830903 1.8769097 0
## virginica-versicolor 0.70 0.6030903 0.7969097 0
library(agricolae)    # LSD.test()
LSD.test(y = av,
         trt = "Species",
         console = TRUE,
         group = TRUE,
         main = "Fisher's LSD")
##
## Study: Fisher's LSD
##
## LSD t Test for Petal.Width
##
## Mean Square Error: 0.04188163
##
## Species, means and individual ( 95 %) CI
##
## Petal.Width std r LCL UCL Min Max
## setosa 0.246 0.1053856 50 0.1888041 0.3031959 0.1 0.6
## versicolor 1.326 0.1977527 50 1.2688041 1.3831959 1.0 1.8
## virginica 2.026 0.2746501 50 1.9688041 2.0831959 1.4 2.5
##
## Alpha: 0.05 ; DF Error: 147
## Critical Value of t: 1.976233
##
## least Significant Difference: 0.08088724
##
## Treatments with the same letter are not significantly different.
##
## Petal.Width groups
## virginica 2.026 a
## versicolor 1.326 b
## setosa 0.246 c
Analysis of the pairwise differences in petal width by iris species using Bonferroni-corrected p-values, Tukey’s HSD, & Fisher’s LSD yielded similar results in each instance: each species appears to have a significantly different petal width from the other two.
df <- data.frame(Y = c(21.308064, 6.800487, 2.742283,
                       34.083525, 29.941135, 11.941155,
                       51.397883, 36.453405, 20.557651),
                 trt = rep(c(1, 2, 3), times = 3),
                 blk = c(1, 1, 1, 2, 2, 2, 3, 3, 3))
kable(df)
| Y | trt | blk |
|---|---|---|
| 21.308064 | 1 | 1 |
| 6.800487 | 2 | 1 |
| 2.742283 | 3 | 1 |
| 34.083525 | 1 | 2 |
| 29.941135 | 2 | 2 |
| 11.941155 | 3 | 2 |
| 51.397883 | 1 | 3 |
| 36.453405 | 2 | 3 |
| 20.557651 | 3 | 3 |
\(H_{0}:\) The population distribution functions for each of the three groups are identical.
\(H_{A}:\) At least one of the populations tends to yield larger observations than at least one of the other populations.
boxplot(Y ~ trt, data = df, horizontal = TRUE, border = "red")
t <- kruskal.test(Y ~ trt, data = df); t
##
## Kruskal-Wallis rank sum test
##
## data: Y by trt
## Kruskal-Wallis chi-squared = 3.8222, df = 2, p-value = 0.1479
P-Value: 0.1479159
Conclusion: With a p-value of 0.1479159, we fail to reject the null hypothesis, lacking statistically significant evidence in the direction of the alternative. That is, the Kruskal-Wallis test provides no evidence that the population distributions of the groups differ; this test, however, is not appropriate for settings in which more than one factor is involved (it ignores the blocking), so its result may be misleading here.
f <- friedman.test(y = df$Y, groups = df$trt, blocks = df$blk); f
##
## Friedman rank sum test
##
## data: df$Y, df$trt and df$blk
## Friedman chi-squared = 6, df = 2, p-value = 0.04979
Conclusion: With a p-value of 0.0497871, just slightly less than \(\alpha = 0.05\), we narrowly reject the null hypothesis with statistically significant evidence in the direction of the alternative hypothesis; that is, when accounting for the treatments and blocking with the Friedman test, we conclude that the population distribution for at least one of the treatment groups differs from the others.
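As a sanity check on that statistic, the Friedman chi-squared value can be reproduced directly from the within-block ranks: in every block, treatment 1 ranks highest and treatment 3 lowest, giving rank sums of 9, 6, and 3.

# Reproducing the Friedman chi-squared statistic from within-block ranks
b <- 3; k <- 3                                          # number of blocks and treatments
rk <- ave(df$Y, df$blk, FUN = rank)                     # rank the treatments within each block
Rj <- tapply(rk, df$trt, sum)                           # rank sums per treatment: 9, 6, 3
12 / (b * k * (k + 1)) * sum(Rj^2) - 3 * b * (k + 1)    # 6, matching the output above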
Theoretically, since blocking is used when treatment effects are thought to depend in some way on the experimental unit to which they are applied (i.e., when nuisance factors appear to be present), it would be acceptable to ignore the blocking effect in the model when nuisance factors are known not to be at play, or when their effect is deemed negligible.