15 - Alternatives to the t-test, the power t-test

Department of Environmental Science, AUT

The t-test: Prerequisites

The t-test II

Content you should have understood before watching this video:

Number 2, ‘Variables’
Number 3, ‘Variation in data’
Number 4, ‘Basic statistical metrics’
Number 5, ‘Standard deviation and standard error’
Number 6, ‘Populations, samples, hypotheses’
Number 7, ‘Distributions’
Number 8, ‘Quantiles and probabilities’
Number 12, ‘Error types’
Number 15, ‘The t-test’

What if the assumption of normally distributed data is violated in a t-test?

The assumption of normality is often difficult to assess with small samples. In a t-test with n = 3, for example, it is virtually impossible.

Alternatives:

One possibility is to transform the data
A non-parametric alternative (in the case of the t-test) is the Wilcoxon test

Non-parametric means that you do not assume a particular distribution (and hence its parameters) for the population; such tests are also called ‘distribution-free’ tests

Data transformation

The log transformation

Consider the size of fish (x). A log transformation makes the distribution look much more normal:

hist(x)
hist(log(x))
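
As a minimal sketch of this effect, the following simulates right-skewed sizes from a log-normal distribution (an assumption made for illustration, not the fish data shown on the slide) and compares the Shapiro-Wilk W statistic before and after the transformation. Values of W closer to 1 indicate a more normal-looking sample. The variable names `size`, `w_raw` and `w_log` are hypothetical.

```r
set.seed(1)
# Hypothetical fish sizes: log-normal, hence right-skewed on the raw scale
size <- rlnorm(200, meanlog = 3, sdlog = 0.6)

# Shapiro-Wilk W statistic: the closer to 1, the closer to a normal distribution
w_raw <- unname(shapiro.test(size)$statistic)
w_log <- unname(shapiro.test(log(size))$statistic)

c(raw = w_raw, log = w_log)
```

The same comparison can be run with sqrt(size) to judge whether a square root transformation helps for a given variable.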

Data transformation

The square root transformation

Consider a variable x that represents a percentage (e.g. a mark in %). A square root transformation makes the distribution look much more normal:

hist(x)
hist(sqrt(x))

The Wilcoxon test

A two-sample test of non-normally distributed samples

Robust to violations of the normality assumption, which would otherwise increase the probability of type I and type II errors in a t-test
Based on ranks: the absolute values are irrelevant, only their order matters
Therefore the data do not need to be normally distributed
Testing the two samples a = c(2, 3, 5) against b = c(3, 6, 8) yields exactly the same result as testing x = c(0, 3, 4) against y = c(3, 23, 700). This is why:

a = c(2, 3, 5); b = c(3, 6, 8)
x = c(0, 3, 4); y = c(3, 23, 700)
rank(c(a, b))
[1] 1.0 2.5 4.0 2.5 5.0 6.0
rank(c(x, y))
[1] 1.0 2.5 4.0 2.5 5.0 6.0

If you understand the above code, you will understand how the Wilcoxon test works!
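
To make the link to the test explicit, the sketch below computes the test statistic W by hand from the ranks (the rank sum of the first sample minus its smallest possible value) and compares it with the output of `wilcox.test()`. The helper `w_from_ranks` is a made-up name for this illustration, and `exact = FALSE` is set only to avoid the warning that tied values trigger.

```r
a <- c(2, 3, 5); b <- c(3, 6, 8)
x <- c(0, 3, 4); y <- c(3, 23, 700)

# W = rank sum of the first sample minus its minimum possible value n1 * (n1 + 1) / 2
w_from_ranks <- function(s1, s2) {
  r <- rank(c(s1, s2))
  n1 <- length(s1)
  sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
}

w_ab <- w_from_ranks(a, b)
w_xy <- w_from_ranks(x, y)

# Identical ranks give an identical statistic and an identical p-value
res_ab <- wilcox.test(a, b, exact = FALSE)
res_xy <- wilcox.test(x, y, exact = FALSE)
c(w_ab, w_xy, unname(res_ab$statistic), unname(res_xy$statistic))
```

All four values are 1.5, because both pairs of samples have the ranks 1, 2.5 and 4 in the first group.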

The Wilcoxon test (example)

Are x and y from the same population?

Check for normality of the samples visually

x <- c(1.83,  0.50,  3.64,  2.48, 2.68, 3.88, 1.55, 3.06, 2.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
par(mfrow = c(1, 2))
qqnorm(x)
qqline(x)
qqnorm(y)
qqline(y)

The Wilcoxon test (example)

Check for normality of the samples using the Shapiro-Wilk test

shapiro.test(x)

    Shapiro-Wilk normality test

data:  x
W = 0.97555, p-value = 0.9376
shapiro.test(y)

    Shapiro-Wilk normality test

data:  y
W = 0.81992, p-value = 0.03439

What would you conclude?

If you are unsure, using both a parametric and a non-parametric test is a good idea:

The Wilcoxon test (example)

t.test(x, y)

    Welch Two Sample t-test

data:  x and y
t = 2.4915, df = 14.934, p-value = 0.02498
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1587134 2.0428422
sample estimates:
mean of x mean of y 
 2.435556  1.334778

wilcox.test(x, y)

    Wilcoxon rank sum test with continuity correction

data:  x and y
W = 64, p-value = 0.04205
alternative hypothesis: true location shift is not equal to 0

The Wilcoxon test (example)

What do we conclude from the tests?
Both tests are significant at the 5% level (t-test: p = 0.025; Wilcoxon test: p = 0.042), so we can be reasonably confident that there is a difference between the two groups
If the Wilcoxon test is more flexible, why not always use it? Because its robustness comes at a price: by reducing the data to ranks it discards information, so it usually has somewhat less power than the t-test when the data really are normally distributed
Note that you can also specify one- or two-sided testing, as well as independent or paired tests (see the video on the t-test)
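
A minimal sketch of these options, using the `alternative` and `paired` arguments of `wilcox.test()`. x and y are the samples from the example; the one-sided direction and the `before`/`after` measurements are assumptions made up for illustration.

```r
x <- c(1.83, 0.50, 3.64, 2.48, 2.68, 3.88, 1.55, 3.06, 2.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)

# Two-sided (default) versus one-sided: is x shifted towards larger values than y?
two_sided <- wilcox.test(x, y, exact = FALSE)
one_sided <- wilcox.test(x, y, alternative = "greater", exact = FALSE)

# Paired (Wilcoxon signed rank test): hypothetical measurements on the same five units
before <- c(10.2, 11.5, 9.8, 12.1, 10.9)
after  <- c(11.0, 12.4, 9.9, 13.5, 11.4)
paired <- wilcox.test(before, after, paired = TRUE)

c(two_sided = two_sided$p.value,
  one_sided = one_sided$p.value,
  paired    = paired$p.value)
```

The one-sided p-value is smaller than the two-sided one here because the observed shift is in the hypothesised direction. Choose the direction before looking at the data.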

Statistical power in a two-sample test

Statistical power is the probability of detecting a significant difference between two samples, given that there is a true difference between them

By convention, this probability should be at least 80%
If it is lower, we have little chance (power) of detecting a difference, should there be one
By conducting a pilot study, we can estimate the variation and the expected difference between samples (in the case of a t-test)
This allows us to conduct a power test and plan the required sample size

Remember the example with the two boxes?

What was your power in this example?

Statistical power in a t-test

There is a trade-off between sample size, standard deviation, expected difference, type I error probability (\(\alpha\)), type II error probability (\(\beta\)) and hence the power (\(1-\beta\))!
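
The trade-off can be made concrete with `power.t.test()`: with the expected difference, standard deviation and \(\alpha\) held fixed, power rises with sample size. The values delta = 1 and sd = 2 and the grid of sample sizes are illustrative assumptions.

```r
# Sample sizes per group to compare (illustrative choice)
n_per_group <- c(5, 10, 20, 40, 80)

# Power of a two-sided, two-sample t-test for each n; delta, sd and alpha are held fixed
power <- sapply(n_per_group, function(n) {
  power.t.test(n = n, delta = 1, sd = 2, sig.level = 0.05)$power
})

data.frame(n_per_group, power = round(power, 2))
```

Changing `delta`, `sd` or `sig.level` instead of `n` shows the other sides of the same trade-off.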

Power t-test: example

You want to find out whether it is possible to detect a 25% change in transpiration when a certain treatment is applied to forest trees, and your funding restricts you to a sample size of 8 trees per group.

In a pilot study you find a mean of 4 \(\mathrm{L\,d^{-1}}\) (liters per day) with a standard deviation of 2 \(\mathrm{L\,d^{-1}}\). The expected difference is therefore 1 \(\mathrm{L\,d^{-1}}\).

We simulate a t-test based on these assumptions. What do we conclude?

set.seed(0)
t.test(rnorm(8, mean = 4, sd = 2), rnorm(8, mean = 5, sd = 2))

    Welch Two Sample t-test

data:  rnorm(8, mean = 4, sd = 2) and rnorm(8, mean = 5, sd = 2)
t = -0.6851, df = 13.997, p-value = 0.5045
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.124121  1.611485
sample estimates:
mean of x mean of y 
 4.297588  5.053906

Power t-test: example continued

The simulated t-test did not detect the difference (p = 0.50), so how large would our sample size need to be?

power.t.test(delta = 1, sd = 2, power = 0.8)

     Two-sample t test power calculation 

              n = 63.76576
          delta = 1
             sd = 2
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

This tells you that you need a sample size of about 64 per group to achieve a power of 80% (i.e. an 80% chance of detecting the difference if there is one)

Conclusion: Don’t even start the experiment if your sample size is restricted to 8 per group; you would only have about a 15% chance of detecting a potential difference (see next slide)!

Power t-test: example continued

power.t.test(n = 8, delta = 1, sd = 2, sig.level = 0.05)

     Two-sample t test power calculation 

              n = 8
          delta = 1
             sd = 2
      sig.level = 0.05
          power = 0.1521558
    alternative = two.sided

NOTE: n is number in *each* group

Again, sample size, expected difference, standard deviation, \(\alpha\) (type I error probability) and \(\beta\) (type II error probability) are traded off against each other: you can’t have your cake and eat it!
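
The analytical power for n = 8 can be cross-checked by repeating the simulated t-test many times and counting how often p < 0.05. This is a sketch under the same assumptions as the example (8 trees per group, means of 4 and 5, standard deviation of 2). The seed, the number of repetitions and the variable names are arbitrary choices. Because `t.test()` defaults to the Welch variant, the estimate should be close to, but not identical with, the `power.t.test()` value.

```r
set.seed(42)
n_sim <- 5000  # number of simulated experiments (arbitrary)

# Run the same experiment n_sim times and keep each p-value
p_values <- replicate(n_sim, {
  control   <- rnorm(8, mean = 4, sd = 2)
  treatment <- rnorm(8, mean = 5, sd = 2)
  t.test(control, treatment)$p.value
})

# Estimated power: proportion of experiments that reach significance
sim_power <- mean(p_values < 0.05)
sim_power
```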

In a nutshell

Consider the Wilcoxon rank sum test for non-normal data or small sample sizes
The Wilcoxon test depends only on the ranks of the data, not on their distribution
You can also specify one-sided and paired tests, just like for the standard t-test
Power tests are important for identifying the necessary sample size, or for judging whether an experiment is worthwhile in the first place (if sample size is restricted)