| Date | Week | Lab | Assignment | Topic |
|---|---|---|---|---|
| 29.8. | 7 | - | - | Week of midsemester test |
| 19.9. | 8 | 6 | - | Statistical power, power test, plotting/tabulating data correctly |
| 26.9. | 9 | 7 | - | The Chi-squared test |
| 3.10. | 10 | 8 | - | Correlation analysis, reporting statistical results |
| 10.10. | 11 | 9 | - | Linear Regression |
| 17.10. | 12 | 10 | 2 | Exam week, Assignment 2 due on Friday |
What will we do today
- Midsem test
- Quick recap on t-test
- Data transformations
- non-parametric t-test alternatives (Wilcoxon test)
- What is a power test?
- The power t-test
- Plotting data in R (bar plots)
Histogram of midsemester test results
Recap on t-test
- What is the t-statistic (basics of the formula)?
- What is a t-distribution?
- Independent t-test vs. paired t-test
- one vs. two-tailed t-test
- How does the t-value link to the p-value?
Rationale for the t-test
We need a metric (a test statistic!) that puts the difference between the samples into perspective with the standard deviations of the two samples !
This is called the t-statistic in a two sample test:
\[t = \frac{\text{observed difference - expected difference}}{\text{estimate of the standard deviations}}\]
So the larger the numerator or the smaller the denominator, the larger t will turn out. Think what this means in terms of the p-value!
Example question
The formula for calculating a t-value makes sure a t-value becomes larger (and is thus more likely to produce a significant p-value) as…
- …the observed difference between samples becomes larger
- …the estimated standard deviations of the samples become larger
- …the observed difference between samples becomes smaller
- all of the above
The assumptions of a t-test
A t-test will give trustworthy results if the underlying assumptions are met:
- The two samples that are compared should follow a normal distribution
- The two samples should be taken independently (e.g. not clustered)
- The variances of the two samples can be unequal (set ‘var.equal’ argument to ‘FALSE’, which is the default)
But how can we test for normality of the data?
- Using a test (e.g. the Shapiro-Wilk test:
shapiro.test()) - Looking at the so-called ‘qqplot’ using the function
qqnorm(): do the points sit on a straight line? - Looking at the histogram of the variable: does it look bell-shaped?
Example: Are the marks of the mid-semester test normally distributed?
Using the Shapiro-Wilk test
The null hypothesis of the Shapiro-Wilk test is: ‘The data are not significantly different from a normal distribution’
shapiro.test(test$mark) #your midsemester test marks
Shapiro-Wilk normality test
data: test$mark
W = 0.96243, p-value = 0.02131
What do we conclude?
Looking at the qq-plot
The qq-plot plots the theoretical quantiles (of a perfect normal distribution) against the quantiles of the distribution in question, simple example:
x = c(1, 7, 8, 9, 9, 9, 10, 10, 10) theoretical_quantiles = qnorm(p = c(.06, .17, .28, .39, .5, .61, .72, .83, .94)) plot(theoretical_quantiles, x)
If the distribution of \(x\) is approximately normal, the dots sit on a straight line! Use qqnorm() and qqline() to create this plot.
Looking at the qq-plot
qq-plot for the midsemester test results:
qqnorm(test$mark, main = NA) qqline(test$mark) #adds line where quantiles should match
- What do we conclude? The interpretation of a qq-plot needs experience, dots systematically deviating from the line indicate that your sample is not normally distributed. Question: can percentages be normally distributed?
What if the assumption of normally distributed data is violated in a t-test?
- The assumption of normality is often difficult to assess in small sample sizes. In a t-test with n = 3 for example, this is virtually impossible
- Once you are sure that you cannot/should not assume normality in your data, consider alternatives
- One possibility is to transform the data
- A Non-parametric alternative (in the case of the t-test) is the Wilcoxon test
- For more complicated models (linear regression, ANOVA) there are a multitude of ways to deal with non-normality (not discussed here)
Non-parametric means you are not assuming anything about the parameters of population (‘distribution-free’ tests)
Data transformation
The log transformation
Consider the size of fish (x). A log transformation makes the distribution look much more normal:
hist(x) hist(log(x))
Data transformation
The square root transformation
Consider a variable x that shows a ratio (e.g. mark in %). A square root transformation makes the distribution look much more normal:
hist(x) hist(sqrt(x))
The Wilcoxon test
A two-sample test of non-normally distributed samples
- Robust against an increased probability for type I and type II errors due to the violation of the assumption of normally distributed samples
- Based on ranks, the absolute numbers are irrelevant
- Therefore we do not required the data to be normally distributed
- E.g., testing the two samples
a = c(2, 3, 5)againstb = c(3, 6, 8)will yield the very same results than testingx = c(0, 3, 4)againsty = c(3, 23, 700). This is why:
a = c(2, 3, 5); b = c(3, 6, 8) x = c(0, 3, 4); y = c(3, 23, 700) rank(c(a, b)) [1] 1.0 2.5 4.0 2.5 5.0 6.0 rank(c(x, y)) [1] 1.0 2.5 4.0 2.5 5.0 6.0
You should be able to understand and interpret the above code!
The Wilcoxon test (example)
Did the food science students do better than the environmental sciences students?
- Check for normality of the samples visually
qqnorm(test$mark[test$Pathway == 'Food Science']) qqline(test$mark[test$Pathway == 'Food Science']) qqnorm(test$mark[test$Pathway == 'Env Sci']) qqline(test$mark[test$Pathway == 'Env Sci'])
The Wilcoxon test (example)
- Check for normality of the samples using the Shapiro-Wilk test
shapiro.test(test$mark[test$Pathway == 'Food Science'])
Shapiro-Wilk normality test
data: test$mark[test$Pathway == "Food Science"]
W = 0.94241, p-value = 0.4889
shapiro.test(test$mark[test$Pathway == 'Env Sci'])
Shapiro-Wilk normality test
data: test$mark[test$Pathway == "Env Sci"]
W = 0.91797, p-value = 0.119
What would you conclude?
If you are unsure, using both a parametric and a non-parametric test is a good idea:
The Wilcoxon test (example)
t.test(test$mark[test$Pathway == 'Food Science'],
test$mark[test$Pathway == 'Env Sci'])
Welch Two Sample t-test
data: test$mark[test$Pathway == "Food Science"] and test$mark[test$Pathway == "Env Sci"]
t = 1.3396, df = 28.199, p-value = 0.1911
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.919604 18.748664
sample estimates:
mean of x mean of y
77.69231 70.27778
wilcox.test(test$mark[test$Pathway == 'Food Science'],
test$mark[test$Pathway == 'Env Sci'])
Wilcoxon rank sum test with continuity correction
data: test$mark[test$Pathway == "Food Science"] and test$mark[test$Pathway == "Env Sci"]
W = 136.5, p-value = 0.4437
alternative hypothesis: true location shift is not equal to 0
The Wilcoxon test (example)
- Note the use of
[]and==. What does it do?
test$mark[test$Pathway == 'Food Science'] #same as test[test$Pathway == 'Env Sci', 'Mark']
- What do we conclude from the test?
- Both tests are non-significant, so we can be reasonably confident that there is no difference between the two groups
- The Wilcoxon test sacrifices some information in the data (that’s the price we have to pay) but it is more robust
Statistical power in a two-sample test
Statistical power is the probability to detect a significant difference between two samples, given there is a true difference between them
- This probability should normally be >80%
- If it is less, we have little chance (power) to detect a difference, should there be one
- By conducting a pilot study, we can estimate the variation and the expected difference between samples (in the case of a t-test)
- This allows us to conduct a power test and plan the required sample size
Remember the example with the two boxes?
What was your power in this example?
Statistical power in a t-test
There is a trade-off between sample size (or standard deviations), expected difference, type-I error probability (\(\alpha\)) and type-II error probability (\(\beta\)) and the power (\(1-\beta\))!
Power t-test: example
You are trying to find out whether it is possible to detect a 20% change in transpiration if you apply a certain treatment to forest trees, and your funding restricts you to a sample size of 8 trees:
In a pilot study you find a mean of 4 \(Ld^{-1}\) (liters per day) with a standard deviation of 2 \(Ld^{-1}\). The expected difference is 1 \(Ld^{-1}\)
We simulate a t-test based on these assumptions, what do we conclude?
set.seed(0)
t.test(rnorm(8, mean = 4, sd = 2), rnorm(8, mean = 5, sd = 2))
Welch Two Sample t-test
data: rnorm(8, mean = 4, sd = 2) and rnorm(8, mean = 5, sd = 2)
t = -0.6851, df = 13.997, p-value = 0.5045
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.124121 1.611485
sample estimates:
mean of x mean of y
4.297588 5.053906
Power t-test: example continuted
Given the above, a t-test is unlikely to detect a potential difference, so how big would our sample size need to be?
power.t.test(delta = 1, sd = 2, power = 0.8)
Two-sample t test power calculation
n = 63.76576
delta = 1
sd = 2
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
This tells you that you need a sample size of c. 65 to achieve a power of 80% (i.e. 80% change to detect the difference if there is one)
Conclusion: Don’t even start the experiment if your sample size is restricted to 8, you only have about a 15% chance to detect a potential difference (see next slide)!
Power t-test: example continuted
power.t.test(n = 8, delta = 1, sd = 2, sig.level = 0.05)
Two-sample t test power calculation
n = 8
delta = 1
sd = 2
sig.level = 0.05
power = 0.1521558
alternative = two.sided
NOTE: n is number in *each* group
Again, sample size, expected difference, standard deviations, \(\alpha\) (type I error probability), and \(\beta\) are traded off, you can’t have the cake and eat it!
Power t-test: visualisation
http://www.statstudio.net/free-tools/power-analysis/
Example question
What can you do to improve your power in a t-test?
- You can try to reduce unsystematic variation in order to obtain lower standard deviations
- You can decrease your \(\alpha\)-level
- You can reduce the sample size
- All of the above is correct
Bar plots
Plotting the results of a t-test
Create a simple data frame, one categorical, one continuous variable:
d1 = data.frame(sex = rep(c('M', 'F'), each = 10),
height = c(rnorm(n = 10, mean = 177, sd = 9),
rnorm(10, 160, 9)))
head(d1, 14)
sex height
1 M 179.2700
2 M 168.9727
3 M 180.9211
4 M 165.8622
5 M 174.9816
6 M 180.3966
7 M 178.2000
8 M 184.2377
9 M 176.4860
10 M 181.5325
11 F 169.7719
12 F 153.7814
13 F 148.4386
14 F 160.4205
Plotting the results of a t-test
Conduct the t-test (as seen earlier):
t.test(d1$height ~ d1$sex)
For a quick look at the data, use the exploratory plotting function boxplot:
boxplot(d1$height ~ d1$sex)
Plotting the results of a t-test (customised plot)
First calculate the mean and standard errors per group:
m = tapply(X = d1$height, INDEX = d1$sex, FUN = mean) #mean per group s = tapply(X = d1$height, INDEX = d1$sex, FUN = sd) #calculating the sd per group
and plot it:
hh = barplot(m, ylim = c(0, 200)) segments(hh, m, hh, m+s)
What will we have learnt by the end of this week?
- How to do a simple data transformation
- How to perform a Wilcoxon test (and when it makes sense to do so)
- How to test for normality visually or with a test
- What statistical power is
- How to perform a power-t-test to determine the required sample size
- How to plot the results from a t-test quickly
- How to plot a customised a bar plot
Glossary
- log transformation
- square root transformation
- Wilcoxon rank-sum test
- non-parametric tests
- qq-plot (
qqnorm(),qqline()) - power, sample size, significance level (\(\alpha\)), type-II error probability (1-\(\beta\))
boxplot()barplot()segments()