Content you should have understood before watching this video:
- Number 2, ‘Variables’
- Number 3, ‘Variation in data’
- Number 4, ‘Basic statistical metrics’
- Number 5, ‘Standard deviation and standard error’
- Number 6, ‘Populations, samples, hypotheses’
- Number 7, ‘Distributions’
- Number 8, ‘Quantiles and probabilities’
- Number 12, ‘Error types’
- Number 15, ‘The t-test’
What if the assumption of normally distributed data is violated in a t-test?
- The assumption of normality is often difficult to assess in small sample sizes. In a t-test with n = 3 for example, this is virtually impossible
Alternatives:
- One possibility is to transform the data
- A non-parametric alternative (in the case of the t-test) is the Wilcoxon test
Non-parametric means you are not assuming anything about the parameters of the population distribution (‘distribution-free’ tests)
Data transformation
The log transformation
Consider the size of fish (x). A log transformation makes the distribution look much more normal:
hist(x)
hist(log(x))
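A minimal sketch of this effect, using simulated data rather than real fish measurements. It assumes the sizes are log-normally distributed; the sample size and the values `meanlog = 3` and `sdlog = 0.6` are invented purely for illustration.

```r
set.seed(1)
# Hypothetical fish sizes: right-skewed, log-normal by construction
x <- rlnorm(1000, meanlog = 3, sdlog = 0.6)

par(mfrow = c(1, 2))
hist(x, main = "Raw sizes")
hist(log(x), main = "Log-transformed sizes")

# In a symmetric distribution, mean and median nearly coincide,
# so their difference is a rough indicator of skew
skew_raw <- mean(x) - median(x)
skew_log <- mean(log(x)) - median(log(x))
c(raw = skew_raw, log = skew_log)
```

The gap between mean and median should shrink markedly after the transformation, which matches what the two histograms show visually.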
Data transformation
The square root transformation
Consider a variable x that shows a ratio (e.g. mark in %). A square root transformation makes the distribution look much more normal:
hist(x)
hist(sqrt(x))
The Wilcoxon test
A two-sample test of non-normally distributed samples
- Robust against the inflated type I and type II error probabilities that a violation of the normality assumption can cause in a t-test
- Based on ranks, the absolute numbers are irrelevant
- Therefore we do not require the data to be normally distributed
- Testing the two samples a = c(2, 3, 5) against b = c(3, 6, 8) will yield exactly the same results as testing x = c(0, 3, 4) against y = c(3, 23, 700). This is why:
a = c(2, 3, 5); b = c(3, 6, 8)
x = c(0, 3, 4); y = c(3, 23, 700)
rank(c(a, b))
[1] 1.0 2.5 4.0 2.5 5.0 6.0
rank(c(x, y))
[1] 1.0 2.5 4.0 2.5 5.0 6.0
If you understand the above code, you will understand how the Wilcoxon test works!
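To make the link between ranks and the test explicit, the sketch below computes the rank sum statistic W by hand and compares it with `wilcox.test()`. The helper name `rank_sum_w` is hypothetical and only used here; `exact = FALSE` is chosen so that R uses the normal approximation and does not warn about ties.

```r
a <- c(2, 3, 5); b <- c(3, 6, 8)
x <- c(0, 3, 4); y <- c(3, 23, 700)

# W = rank sum of the first sample minus its smallest possible value
# (rank_sum_w is an illustrative helper, not part of base R)
rank_sum_w <- function(s1, s2) {
  r  <- rank(c(s1, s2))   # joint ranks, ties receive the average rank
  n1 <- length(s1)
  sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
}

w_ab <- rank_sum_w(a, b)
w_xy <- rank_sum_w(x, y)

test_ab <- wilcox.test(a, b, exact = FALSE)
test_xy <- wilcox.test(x, y, exact = FALSE)

# Same ranks, therefore same statistic and same p-value
c(w_ab, w_xy, test_ab$statistic, test_xy$statistic)
c(test_ab$p.value, test_xy$p.value)
```

Both pairs of samples give W = 1.5 and identical p-values, even though y contains a value of 700.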
The Wilcoxon test (example)
Are x and y from the same population?
- Check for normality of the samples visually
x <- c(1.83, 0.50, 3.64, 2.48, 2.68, 3.88, 1.55, 3.06, 2.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
par(mfrow = c(1, 2))
qqnorm(x)
qqline(x)
qqnorm(y)
qqline(y)
The Wilcoxon test (example)
- Check for normality of the samples using the Shapiro-Wilk test
shapiro.test(x)
Shapiro-Wilk normality test
data: x
W = 0.97555, p-value = 0.9376
shapiro.test(y)
Shapiro-Wilk normality test
data: y
W = 0.81992, p-value = 0.03439
What would you conclude?
If you are unsure, using both a parametric and a non-parametric test is a good idea:
The Wilcoxon test (example)
t.test(x, y)
Welch Two Sample t-test
data: x and y
t = 2.4915, df = 14.934, p-value = 0.02498
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.1587134 2.0428422
sample estimates:
mean of x mean of y
2.435556 1.334778
wilcox.test(x, y)
Wilcoxon rank sum test with continuity correction
data: x and y
W = 64, p-value = 0.04205
alternative hypothesis: true location shift is not equal to 0
The Wilcoxon test (example)
- What do we conclude from the tests?
- Both tests are significant, so we can be reasonably confident that there is a difference between the two groups
- If the Wilcoxon test is more flexible, why not always use that test? Because it uses only the ranks and therefore sacrifices some information in the data, which typically costs some power when the data are in fact normal. That’s the price we have to pay for greater robustness
- Note that you can also specify one- or two-sided testing, as well as independent/paired tests (see the video on the t-test)
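As a sketch of these options, the calls below reuse the example data and the `alternative` and `paired` arguments of `wilcox.test()`. The paired call is shown for syntax only: it assumes the i-th values of x and y belong together, which nothing in this example suggests. `exact = FALSE` is used to avoid the warning about ties in y.

```r
x <- c(1.83, 0.50, 3.64, 2.48, 2.68, 3.88, 1.55, 3.06, 2.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)

# Two-sided (default): is there any location shift?
p_two <- wilcox.test(x, y, exact = FALSE)$p.value

# One-sided: is x shifted towards larger values than y?
p_greater <- wilcox.test(x, y, alternative = "greater",
                         exact = FALSE)$p.value

# Paired version (Wilcoxon signed rank test); illustration only,
# valid only if observations are genuinely matched pairs
p_paired <- wilcox.test(x, y, paired = TRUE, exact = FALSE)$p.value

c(two_sided = p_two, greater = p_greater, paired = p_paired)
```

A one-sided alternative should be chosen before looking at the data, not because it yields the smaller p-value.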
Statistical power in a two-sample test
Statistical power is the probability of detecting a significant difference between two samples, given that there is a true difference between the populations they come from
- By convention, this probability should be at least 80%
- If it is less, we have little chance (power) to detect a difference, should there be one
- By conducting a pilot study, we can estimate the variation and the expected difference between samples (in the case of a t-test)
- This allows us to conduct a power test and plan the required sample size
Remember the example with the two boxes?
What was your power in this example?
Statistical power in a t-test
There is a trade-off between sample size, standard deviation, expected difference, type I error probability (\(\alpha\)), and type II error probability (\(\beta\)), and hence the power (\(1-\beta\))!
Power t-test: example
You are trying to find out whether it is possible to detect a 20% change in transpiration if you apply a certain treatment to forest trees, and your funding restricts you to a sample size of 8 trees.
In a pilot study you find a mean of 4 \(\mathrm{L\,d^{-1}}\) (liters per day) with a standard deviation of 2 \(\mathrm{L\,d^{-1}}\). The expected difference is 1 \(\mathrm{L\,d^{-1}}\)
We simulate a t-test based on these assumptions. What do we conclude?
set.seed(0)
t.test(rnorm(8, mean = 4, sd = 2), rnorm(8, mean = 5, sd = 2))
Welch Two Sample t-test
data: rnorm(8, mean = 4, sd = 2) and rnorm(8, mean = 5, sd = 2)
t = -0.6851, df = 13.997, p-value = 0.5045
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.124121 1.611485
sample estimates:
mean of x mean of y
4.297588 5.053906
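A single simulated experiment says little on its own, because another seed could give a different result. A rough sketch of how power can be estimated by repeating the simulation many times is shown below; the number of repetitions (`n_sim = 2000`) and the helper name `one_p_value` are arbitrary choices for illustration.

```r
set.seed(0)
n_sim <- 2000   # number of simulated experiments (arbitrary choice)

# One simulated experiment: 8 trees per group, true difference of 1
one_p_value <- function() {
  control   <- rnorm(8, mean = 4, sd = 2)
  treatment <- rnorm(8, mean = 5, sd = 2)
  t.test(control, treatment)$p.value
}

p_values <- replicate(n_sim, one_p_value())

# Estimated power: share of experiments with a significant result
power_sim <- mean(p_values < 0.05)
power_sim
```

The share of significant results estimates the power under these assumptions and should land near the roughly 15% quoted later in this example.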
Power t-test: example continued
Given the above, a t-test is unlikely to detect a potential difference, so how big would our sample size need to be?
power.t.test(delta = 1, sd = 2, power = 0.8)
Two-sample t test power calculation
n = 63.76576
delta = 1
sd = 2
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
This tells you that you need a sample size of about 64 per group (n = 63.8, rounded up) to achieve a power of 80% (i.e. an 80% chance of detecting the difference if there is one)
Conclusion: Don’t even start the experiment if your sample size is restricted to 8 per group. You only have about a 15% chance of detecting a potential difference (see next slide)!
Power t-test: example continued
power.t.test(n = 8, delta = 1, sd = 2, sig.level = 0.05)
Two-sample t test power calculation
n = 8
delta = 1
sd = 2
sig.level = 0.05
power = 0.1521558
alternative = two.sided
NOTE: n is number in *each* group
Again, sample size, expected difference, standard deviation, \(\alpha\) (type I error probability), and \(\beta\) (type II error probability) are traded off against each other: you can’t have your cake and eat it!
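The trade-off can be made visible by calling `power.t.test()` over a range of inputs. The sketch below keeps the assumptions of the transpiration example (sd = 2, \(\alpha\) = 0.05); the grid of sample sizes and the alternative difference of 2 are arbitrary values chosen for illustration.

```r
n_per_group <- c(8, 16, 32, 64, 128)

# Power of a two-sided, two-sample t-test for each sample size
power_by_n <- sapply(n_per_group, function(n) {
  power.t.test(n = n, delta = 1, sd = 2, sig.level = 0.05)$power
})

data.frame(n_per_group, power = round(power_by_n, 3))

# Same n = 8, but twice the expected difference
power_big_delta <- power.t.test(n = 8, delta = 2, sd = 2,
                                sig.level = 0.05)$power
power_big_delta
```

Power rises with sample size and with the expected difference, so a small experiment is only worthwhile if the effect is large relative to the standard deviation.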
In a nutshell
Consider the Wilcoxon rank sum test for non-normal data or small sample sizes
The Wilcoxon test depends only on the ranks of the data, not on their actual values or their distribution
You can also specify one-sided and paired tests, just like for the standard t-test
Power tests are important to identify the necessary sample size, or the usefulness of an experiment in the first place (if sample size is restricted)