September 18, 2014

Timetable

Week Lecture/Lab
1 introduction, introduction to R, handling R
2 the research process, kinds of data, variables, hypotheses, statistical models, variability, the mean
3 importing/organising data, distributions, histograms, population vs. sample, measuring variability, the standard deviation, the standard error, degrees of freedom, confidence intervals
4 NO LECTURE
5 statistical hypothesis testing, type I and type II error, contingency tables, Chi-square test
6 comparing two means (t-test), midsemester test preparation
7 revision, tips on handling R, practice questions MIDSEMESTER TEST
- Midsemester Break

Timetable

Tentative schedule for second half of semester

Week Lecture/Lab
8 t-test and extensions, power test, scientific plotting
9 correlation, regression, scientific plotting
10 comparing several means (ANOVA)
11 ANOVA extensions, ANOVA power test
12 Revision
13-15 Exam weeks

What will we do today

  • Recap on t-test
  • t-test extensions (Wilcoxon test)
  • What is a power test?
  • The power t-test
  • Plotting data in R (bar plots)
  • R demonstration continued

Recap on t-test

  • What is the t-statistic (basics of the formula)?
  • What is a t-distribution?
  • Independent t-test vs. paired t-test
  • one vs. two-tailed t-test
  • How does the t-value link to the p-value?

Rationale for the t-test

We need a metric (a test statistic!) that puts the difference between the samples into perspective with the standard deviations of the two samples !

This is called the t-statistic in a two sample test:

\[t = \frac{\text{observed difference - expected difference}}{\text{estimate of the standard error}}\]

So the larger the numerator or the smaller the denominator, the larger t will turn out. Think what this means in terms of the p-value!

Example question

The formula for calculating a t-value makes sure a t-value becomes larger (and is thus more likely to produce a significant p-value) as…

  1. the observed difference between samples becomes larger
  2. the estimated standard deviations of the samples become smaller
  3. the expected difference between samples becomes smaller
  4. all of the above

The assumptions of a t-test

A t-test will give trustworthy results, if the underlying assumptions are met:

  • The two populations that are compared should follow a normal distribution
  • The two samples should be taken independently (e.g. not clustered)
  • The variances of the two samples can be unequal (set 'var.equal' argument to 'FALSE', which is the default)

But how can we test for normality of the data?

  • Using a test (e.g. the Shapiro-Wilk test: shapiro.test())
  • Looking at the so-called 'qqplot' using the function qqnorm()

Example: Are the marks of the mid-semester test normally distributed?

Using the Shapiro-Wilk test

The null hypothesis of the Shapiro-Wilk test is: 'The data follow a normal distribution'

shapiro.test(test$Mark)
    Shapiro-Wilk normality test

data:  test$Mark
W = 0.9624, p-value = 0.02131

What do we conclude?

Looking at the qq-plot

The qq-plot plots the theoretical quantiles (of a perfect normal distribution) against the quantiles of the distribution in question.

qqnorm(test$Mark, main = NA)
qqline(test$Mark) #adds line where quantiles should match

plot of chunk unnamed-chunk-2 - What do we conclude? The interpretation of a qq-plot needs experience, dots systematically deviating from the line indicate that your sample is not normally distributed. Question: can percentages be normally distributed?

The Wilcoxon test

A two-sample test of non-normally distributed samples

  • Robust against an increased probability for type I and type II errors due to the violation of the assumption of normally distributed samples
  • Based on ranks, the absolute numbers are not important
  • Therefore we do not required the data to be normally distributed
  • E.g., testing the two samples c(2, 3, 5) against c(3, 6, 8) will yield the very same results than testing c(0, 3, 4) against c(3, 23, 700). This is why:
rank(c(c(2, 3, 5),c(3, 6, 8)))
[1] 1.0 2.5 4.0 2.5 5.0 6.0
rank(c(c(0, 3, 4),c(3, 23, 700)))
[1] 1.0 2.5 4.0 2.5 5.0 6.0

You should be able to understand and interpret the above code!

The Wilcoxon test

Example

Did the food science students do better than the environmental sciences students? (We assume non-normality and therefore use the Wilcoxon test)

wilcox.test(test$Mark[test$Pathway == 'Food Science'],
            test$Mark[test$Pathway == 'Env Sci'])
    Wilcoxon rank sum test with continuity correction

data:  test$Mark[test$Pathway == "Food Science"] and test$Mark[test$Pathway == "Env Sci"]
W = 136.5, p-value = 0.4437
alternative hypothesis: true location shift is not equal to 0

Note the use of [] and ==. What does it do?

What do we conclude from the test?

Statistical power

Statistical power is the probability to detect a significant difference

  • This probability should usually not drop below 80%
  • If it does, we have little power to detect a difference, should there be one
  • By conducting a pilot study, we can estimate the variation and the expected difference between samples (in the case of a t-test)
  • This allows us to conduct a power test and plan the required sample size

Statistical power in a t-test

There is a trade-off between sample size (or standard deviations), expected difference, type-I error probability (\(\alpha\)) and type-II error probability (\(\beta\)) and the power (\(1-\beta\))!

Power t-test: example

You are trying to find out whether it is possible to detect a 10% change in transpiration if you apply a certain treatment to forest trees, and your funding restricts you to a sample size of 8 trees:

In a pilot study you find a mean of 4 \(Ld^{-1}\) (liters per day) with a standard deviation of 2 \(Ld^{-1}\). The expected difference is 1 \(Ld^{-1}\)

We simulate a t-test based on these assumptions, what do we conclude?

t.test(rnorm(8, mean = 4, sd = 2), rnorm(8, mean = 5, sd = 2))

    Welch Two Sample t-test

data:  rnorm(8, mean = 4, sd = 2) and rnorm(8, mean = 5, sd = 2)
t = 0.4066, df = 13.86, p-value = 0.6905
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.943  2.851
sample estimates:
mean of x mean of y 
    4.884     4.430 

Power t-test: example continuted

Given the above, a t-test is unlikely to detect a potential difference, so how big would our sample size need to be?

power.t.test(delta = 1, sd = 2, power = 0.8)

     Two-sample t test power calculation 

              n = 63.77
          delta = 1
             sd = 2
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

This tells you that you need a sample size of c. 65 to achieve a power of 80% (i.e. 80% change to detect the difference if there is one)

Conclusion: Don't even start the experiment if your sample size is restricted to 8, you only have about a 15% chance to detect a potential difference (see next slide)!

Power t-test: example continuted

power.t.test(n = 8, delta = 1, sd = 2, sig.level = 0.05)

     Two-sample t test power calculation 

              n = 8
          delta = 1
             sd = 2
      sig.level = 0.05
          power = 0.1522
    alternative = two.sided

NOTE: n is number in *each* group

Again, sample size, expected difference, standard deviations, \(\alpha\) (type I error probability), and \(\beta\) are traded off, you can't have it all!

Example question

What can you do to improve your power in a t-test?

  1. You can try to reduce unsystematic variation in order to obtain lower standard deviations
  2. You can decrease your \(\alpha\)-level
  3. You can reduce the sample size
  4. All of the above is correct

Plotting the results of a t-test

Create a simple data frame, one categorical, one continuous variable:

d1 <- data.frame(gender = rep(c('M', 'F'), each = 10),
                      height = c(rnorm(n = 10, mean = 177, sd = 9),
                                 rnorm(10, 160, 9)))
head(d1, 11)
   gender height
1       M  171.3
2       M  180.6
3       M  160.3
4       M  173.5
5       M  165.2
6       M  168.2
7       M  179.3
8       M  192.0
9       M  178.9
10      M  191.5
11      F  168.6

Plotting the results of a t-test

Conduct the t-test (as seen earlier):

t.test(d1$height ~ d1$gender)

For a quick look at the data, use the exploratory plotting function boxplot:

boxplot(d1$height ~ d1$gender)

plot of chunk unnamed-chunk-10

Plotting the results of a t-test (customised plot)

First calculate the mean and standard errors per group:

m <- tapply(d1$height, d1$gender, mean) #calculating the mean per group
s <- tapply(d1$height, d1$gender, sd) #calculating the sd per group

and plot it:

hh <- barplot(m, ylim = c(0, 200))
segments(hh,m,hh,m+s)

plot of chunk unnamed-chunk-12

What will we have learnt by the end of this week?

  • How to perform a Wilcoxon test (and when it makes sense to do so)
  • How to test for normality visually or with a test
  • What statistical power is
  • How to perform a power-t-test to determine the required sample size
  • How to plot the results from a t-test quickly
  • How to customise a bar plot using some of the many arguments to the function plot()

Glossary

  • Wilcoxon rank-sum test
  • qq-plot (qqnorm(), qqline())
  • power, sample size, significance level (\(\alpha\)), type-II error probability (1-\(\beta\))
  • boxplot()
  • barplot()
  • segments()

Midsemester test results

  • Passrate: 87%
  • Highest mark: 100%
  • Lowest mark: 30%
  • Median: 70%
  • Mean: 70%
  • ANOVA on majors: ns, p=0.5

Midsemester test results

plot of chunk unnamed-chunk-13

Midsemester test results

plot of chunk unnamed-chunk-14