September 19, 2018

Timetable second half (tentative)

Week 8
Date Week Lab Report Topic
12.9. 7 5 - Ploting techniques
19.9. 8 6 5 Wilcoxon test, statistical power, power test
26.9. 9 7 6 Chi-squared test, plotting/tabulating data correctly
3.10. 10 8 7 Correlation analysis, scientific reporting
10.10. 11 9 8 Linear Regression
17.10. 12 10 9 Revision
24.10. 13 - 10 NA, lab 10 due on Thursday that week

What will we do today

Week 8
  • Midsemester test
  • Quick recap on t-test
  • Data transformations
  • t-test extensions (Wilcoxon test)
  • What is a power test?
  • The power t-test
  • Plotting data in R (bar plots)

Histogram of midsemester test results, quick look at the questions

Week 8

Recap on t-test

Week 8
  • What is the t-statistic (basics of the formula)?
  • What is a t-distribution?
  • Independent t-test vs. paired t-test
  • one vs. two-tailed t-test
  • How does the t-value link to the p-value?

Rationale for the t-test

Week 8

We need a metric (a test statistic!) that puts the difference between the samples into perspective with the standard deviations of the two samples !

This is called the t-statistic in a two sample test:

 

\[t = \frac{\text{Mean of sample 1 - mean of sample 2}}{\text{Pooled estimate of the standard deviations}}\]

 

So the larger the numerator or the smaller the denominator, the larger t will turn out. Think what this means in terms of the p-value!

Example question

Week 8

The formula for calculating a t-value makes sure a t-value becomes larger (and is thus more likely to produce a significant p-value) as…

  1. …the observed difference between samples becomes larger
  2. …the estimated standard deviations of the samples become larger
  3. …the expected difference between samples becomes smaller
  4. all of the above

The assumptions of a t-test

A t-test will give trustworthy results if the underlying assumptions are met:

Week 8
  • The two populations that are compared should follow a normal distribution
  • The two samples should be taken independently (e.g. not clustered)
  • The variances of the two samples can be unequal (set 'var.equal' argument to 'FALSE', which is the default)

But how can we test for normality of the data?

  • Using a test (e.g. the Shapiro-Wilk test: shapiro.test())
  • Looking at the so-called 'qqplot' using the function qqnorm(): do the points sit on a straight line?
  • Looking at the histogram of the variable: does it look bell-shaped?

Example: Are the marks of the mid-semester test normally distributed?

Using the Shapiro-Wilk test

Week 8

The null hypothesis of the Shapiro-Wilk test is: 'The data are not significantly different from a normal distribution'

shapiro.test(ms$marks) # your midsemester test marks
    Shapiro-Wilk normality test

data:  ms$marks
W = 0.93872, p-value = 2.396e-07

What do we conclude?

Looking at the qq-plot

Week 8

The quantile-quantile plot (qq-plot) plots the theoretical quantiles (of a perfect normal distribution) against the quantiles of the distribution in question. Here is a simple example:

x <- c(1, 7, 8, 9, 9, 9, 10, 10, 10)
theoretical_quantiles <- qnorm(p = c(.06, .17, .28, .39, .5, .61, .72, .83, .94))
plot(theoretical_quantiles, x)

In other words: the qq-plot is nothing but a calibration curve! If the distribution of \(x\) is approximately normal, the dots sit on a straight line! Use qqnorm() and qqline() to create this plot.

Looking at the qq-plot

Week 8

qq-plot for the midsemester test results using the built-in qqnorm function:

qqnorm(ms$marks, main = NA)
qqline(ms$marks) # adds line where quantiles should match

- What do we conclude? The interpretation of a qq-plot needs experience, dots systematically deviating from the line indicate that your sample is not normally distributed. Question: can percentages be normally distributed?

What if the assumption of normally distributed data is violated?

Week 8
  • The assumption of normality is often difficult to assess in small sample sizes. In a t-test with n = 3 for example, this is virtually impossible
  • Once you are sure that you cannot/should not assume normality in your data, consider alternatives
  • One possibility is to transform the data
  • Another alternative (in the case of the t-test) is the Wilcoxon test
  • For more complicated models (linear regression, ANOVA) there are a multitude of ways to deal with non-normality (not dicussed here)

Data transformation

The log transformation

Week 8

Consider the size of fish (x). A log transformation makes the distribution look much more normal:

hist(x)
hist(log(x))

Data transformation

The square root transformation

Week 8

Consider a variable x that shows a ratio (e.g. mark in %). A square root transformation makes the distribution look much more normal:

hist(x)
hist(sqrt(x))

The Wilcoxon test

A two-sample test of non-normally distributed samples

Week 8
  • Robust against an increased probability for type I and type II errors due to the violation of the assumption of normally distributed samples
  • Based on ranks, the absolute numbers are irrelevant
  • Therefore we do not require the data to be normally distributed
  • E.g., testing the two samples a = c(2, 3, 5) against b = c(3, 6, 8) will yield the very same results than testing x = c(0, 3, 4) against y = c(3, 23, 700). This is why:
a <- c(2, 3, 5); b <- c(3, 6, 8)
x <- c(0, 3, 4); y <- c(3, 23, 700)
rank(c(a, b))
[1] 1.0 2.5 4.0 2.5 5.0 6.0
rank(c(x, y))
[1] 1.0 2.5 4.0 2.5 5.0 6.0

You should be able to understand and interpret the above code!

The Wilcoxon test (example)

Week 8

Did the first Wednesday stream do better than the Friday stream?

  • Check for normality of the samples visually
qqnorm(ms$marks[ms$stream == "wed1.3"], main = "Wednesday 1 - 3", font = 1)
qqline(ms$marks[ms$stream == "wed1.3"])

qqnorm(ms$marks[ms$stream == "fri"], main = "Friday")
qqline(ms$marks[ms$stream == "fri"])

The Wilcoxon test (example)

Week 8
  • Check for normality of the samples using the Shapiro-Wilk test
## Test for normality of the Wednesday marks
shapiro.test(ms$marks[ms$stream == "wed1.3"])

    Shapiro-Wilk normality test

data:  ms$marks[ms$stream == "wed1.3"]
W = 0.87303, p-value = 0.0001104

## Test for normality of the Friday marks
shapiro.test(ms$marks[ms$stream == "fri"])

    Shapiro-Wilk normality test

data:  ms$marks[ms$stream == "fri"]
W = 0.89852, p-value = 0.0007417

What would you conclude?

If you are unsure, using both a parametric and a non-parametric test is a good idea:

The Wilcoxon test (example)

Week 8
t.test(x = ms$marks[ms$stream == "wed1.3"], 
       y = ms$marks[ms$stream == "fri"])

    Welch Two Sample t-test

data:  ms$marks[ms$stream == "wed1.3"] and ms$marks[ms$stream == "fri"]
t = 1.2364, df = 90.695, p-value = 0.2195
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.99162 12.85452
sample estimates:
mean of x mean of y 
 78.35319  73.42174 
wilcox.test(x = ms$marks[ms$stream == "wed1.3"], 
            y = ms$marks[ms$stream == "fri"])

    Wilcoxon rank sum test with continuity correction

data:  ms$marks[ms$stream == "wed1.3"] and ms$marks[ms$stream == "fri"]
W = 1265.5, p-value = 0.1567
alternative hypothesis: true location shift is not equal to 0

The Wilcoxon test (example)

Week 8
  • Note the use of [] and ==. What does it do?
ms$marks[ms$stream == "wed1.3"] # vector subsetting same as...
ms[ms$stream == 'wed1.3', 'marks'] # data frame subsetting
  • What do we conclude from the test?
  • Both tests are non-significant, so we can be reasonably confident that there is no difference between the two groups

Statistical power in a two-sample test

Week 8

Statistical power is the probability to detect a significant difference between two samples, given there is a true difference between them

  • This probability should normally be >80%
  • If it is less, we have little power to detect a difference, should there be one
  • By conducting a pilot study, we can estimate the variation and the expected difference between samples (in the case of a t-test)
  • This allows us to conduct a power test and plan the required sample size

Remember the example with the two boxes?

What was your power in this example?

Week 8

Statistical power in a t-test

Week 8

There is a trade-off between sample size (or standard deviations), expected difference, type I error probability (\(\alpha\)) and type II error probability (\(\beta\)) and the power (\(1-\beta\))!

Power t-test: example

Week 8

You are trying to find out whether it is possible to detect a 20 % change in transpiration if you apply a certain treatment to forest trees, and your funding restricts you to a sample size of 8 trees:

In a pilot study you find a mean of 4 \(L~d^{-1}\) (liters per day) with a standard deviation of 2 \(L~d^{-1}\). The expected difference is 1 \(L~d^{-1}\)

We simulate a t-test based on these assumptions, what do we conclude?

set.seed(0)
t.test(rnorm(n = 8, mean = 4, sd = 2), rnorm(n = 8, mean = 5, sd = 2))

    Welch Two Sample t-test

data:  rnorm(n = 8, mean = 4, sd = 2) and rnorm(n = 8, mean = 5, sd = 2)
t = -0.6851, df = 13.997, p-value = 0.5045
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.124121  1.611485
sample estimates:
mean of x mean of y 
 4.297588  5.053906 

Power t-test: example continuted

Week 8

Given the above, a t-test is unlikely to detect a potential difference, so how big would our sample size need to be?

power.t.test(delta = 1, sd = 2, power = 0.8)

     Two-sample t test power calculation 

              n = 63.76576
          delta = 1
             sd = 2
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

This tells you that you need a sample size of ca. 65 to achieve a power of 80 % (i.e. 80 % change to detect the difference if there is one).

Conclusion: Don't even start the experiment if your sample size is restricted to 8, you only have about a 15% chance to detect a potential difference (see next slide)!

Power t-test: example continuted

Week 8
power.t.test(n = 8, delta = 1, sd = 2, sig.level = 0.05)

     Two-sample t test power calculation 

              n = 8
          delta = 1
             sd = 2
      sig.level = 0.05
          power = 0.1521558
    alternative = two.sided

NOTE: n is number in *each* group

Again, sample size, expected difference, standard deviations, \(\alpha\) (type I error probability), and \(\beta\) are traded off, you can't have the cake and eat it!

Example question

Week 8

What can you do to improve your power in a t-test?

  1. You can try to reduce unsystematic variation in order to obtain lower standard deviations
  2. You can decrease your \(\alpha\)-level
  3. You can reduce the sample size
  4. All of the above is correct

Bar plots

Week 7

Plotting the results of a t-test

Week 8

Create a simple data frame, one categorical, one continuous variable:

set.seed(132)
dat <- data.frame(gender = rep(c('M', 'F'), each = 10), 
                  height = c(rnorm(n = 10, mean = 177, sd = 9), 
                             rnorm(n = 10, mean = 160, sd = 9)))
head(dat, n = 12)
   gender   height
1       M 181.2670
2       M 172.0055
3       M 176.9104
4       M 186.6029
5       M 170.5899
6       M 174.1140
7       M 165.8587
8       M 167.2362
9       M 184.8210
10      M 197.8447
11      F 162.8337
12      F 148.5576

Plotting the results of a t-test

Week 8

Conduct the t-test (as seen earlier):

t.test(height ~ gender, data = dat)

For a quick look at the data, use the exploratory plotting function boxplot:

boxplot(height ~ gender, data = dat, ylab = "Height (cm)", boxwex = 0.5)

Plotting the results of a t-test (customised plot)

Week 8

First calculate the mean and standard errors per group:

means <- tapply(dat$height, dat$gender, mean) # calculating the mean per group
ses <- tapply(dat$height, dat$gender, se) # calculating the sd per group

and plot it:

bp <- barplot(means, ylim = c(0, 200), ylab = "Height (cm)")
arrows(x0 = bp, y0 = means - ses, x1 = bp, y1 = means + ses, angle = 90, code = 3,
       length = 0.04)

…and plot it more nicely!

Week 8
par(mar = c(4, 4, 1.5, 2.5), bg = NA)
bp <- barplot(means, ylim = c(0, 200), col = c("white", "black"), mgp = c(2, 0.3, 0.8), 
              tcl = 0.25, las = 1, axisnames = F)
mtext("Height (cm)", side = 2, line = 2.8)
mtext(c("F", "M"), side = 1, line = 0.3, at = bp)
arrows(x0 = bp, y0 = means - ses, x1 = bp, y1 = means + ses, angle = 90, code = 3, 
       length = 0.04)
arrows(x0 = bp[2, ], y0 = means[2], x1 = bp[2, ], y1 = means[2] - ses[2], 
       angle = 90, code = 2, length = 0.04, col = "white")
text(x = bp, y = means + ses + 15, labels = c("a", "b"), xpd = NA)

What will we have learnt by the end of this week?

Week 8
  • How to do a simple data transformation
  • How to perform a Wilcoxon test (and when it makes sense to do so)
  • How to test for normality visually or with a test
  • What statistical power is
  • How to perform a power-t-test to determine the required sample size
  • How to plot the results from a t-test quickly
  • How to customise a bar plot using some of the many arguments to the function plot()

Glossary

Week 8
  • log transformation
  • square root transformation
  • Wilcoxon rank-sum test
  • qq-plot (qqnorm(), qqline())
  • power, sample size, significance level (\(\alpha\)), type II error probability (1-\(\beta\))
  • boxplot()
  • barplot()
  • arrows()