- Recap week 8: Example questions Wilcoxon rank sum test and power test, interactive R app
- Chi-squared test
- Playing with other distributions: Chi-squared, Poisson, Uniform, Binomial
- Distribution R app
Recap week 8
- The Wilcoxon test (when to use it)
- Testing for normality visually and with the Shapiro-Wilk test
- Statistical power: what is it? (Definition?)
- Perform a power-t-test to determine the required sample size
- Means to improve statistical power
- How to aggregate data (e.g. calculate the mean or sd per group)
- How to plot the results from a t-test quickly using barplot()
- How to customise a bar plot adding error bars etc.
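As a reminder of the last recap points, here is a minimal sketch of aggregating per group and adding error bars to a bar plot. It uses R's built-in `sleep` data set purely as a stand-in (an assumption, not the data from the lab), and shows mean ± 1 standard deviation; other error measures are equally possible.

```r
# Built-in 'sleep' data (stand-in example): 'extra' hours of sleep in two groups
means <- aggregate(extra ~ group, data = sleep, FUN = mean)
sds   <- aggregate(extra ~ group, data = sleep, FUN = sd)

# Make sure the y axis covers the full length of the error bars
y_range <- range(0, means$extra - sds$extra, means$extra + sds$extra)

# barplot() returns the x positions of the bar midpoints
mids <- barplot(means$extra, names.arg = as.character(means$group),
                ylim = y_range, xlab = "group", ylab = "mean extra sleep (h)")

# Error bars (mean +/- 1 sd) drawn as arrows with flat heads at both ends
arrows(x0 = mids, y0 = means$extra - sds$extra,
       x1 = mids, y1 = means$extra + sds$extra,
       angle = 90, code = 3, length = 0.1)
```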
Some advice on studying for BIOL501
- Listen to the lecture. There are no shortcuts in this business.
- Read the lab in full, attempt it independently first
- Go back to older material if you identify knowledge gaps
- Ask questions in tutorials
- Look out for announcements and follow instructions closely (e.g. don’t miss deadlines)
If you follow these steps and put in the timetabled 8 hours per week to study, failing this paper is virtually impossible.
Attendance has been low lately. Don’t be fooled into thinking you can watch all the lectures quickly and then just turn up to the exam. Practice is the keyword.
Example Question (1)
A p-value of 0.1 in a Wilcoxon test with \(n\) = 3 should not be overinterpreted because…
- this is the smallest p-value you can expect
- there are only very few possible p-values at such a low \(n\)
- the better test to use at such a small sample size is the t-test
- Both (1) and (2) are correct
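To explore this question yourself, the sketch below runs a Wilcoxon rank sum test on every possible way of splitting six distinct values into two groups of three. The values 1 to 6 are hypothetical; only their ranks matter for this test.

```r
# Six distinct hypothetical values, split into two groups of n = 3
vals <- 1:6

# Most extreme case: all values in one group are smaller than in the other
p_extreme <- wilcox.test(c(1, 2, 3), c(4, 5, 6))$p.value

# All choose(6, 3) = 20 possible splits and their two-sided p-values
splits <- combn(6, 3)
p_all <- apply(splits, 2, function(i) wilcox.test(vals[i], vals[-i])$p.value)

p_extreme                    # p-value of the most extreme split
sort(unique(round(p_all, 3)))  # every p-value that can occur at this n
```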
Example Question (2)
What will be the power of a t-test if your expected difference between samples is 7, your standard deviation is 5, and your sample size is 8? Assume an alpha threshold of 5%.
- It cannot be computed based on this information
- Too high for the experiment to make sense
- Far too low for the experiment to make sense
- Around 75%
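You can check your answer with `power.t.test()`. The sketch assumes a two-sample, two-sided test with 8 observations per group (the function's defaults); the 80% target in the second call is a common convention, not something stated in the question.

```r
# Power for a given sample size (n is per group)
pw <- power.t.test(n = 8, delta = 7, sd = 5, sig.level = 0.05)
pw$power

# The other way round: sample size per group needed for an assumed 80% power
ss <- power.t.test(delta = 7, sd = 5, sig.level = 0.05, power = 0.8)
ceiling(ss$n)
```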
Example Question (3, non-MC)
You have to test whether, in Auckland primary schools, children with siblings show more highly developed social skills than only children. Your sample size is 10 per group (i.e. 20 children in total).
How would you select your 20 subjects?
What could be your metric to measure ‘social skills’?
What test(s) could you use to analyse your data?
Would you use a one- or two-tailed test? Justify your answer.
Would you use a paired or an independent test?
If the p-values of Shapiro-Wilk tests that you conduct on your two samples are both < 0.01, what test would be best suited?
The power t-test app
https://rpsychologist.com/d3/nhst/
Play with this! It will help you understand what statistical power is!
Reminder: test statistics
- In the box example your test statistic was simply the mean
- Test statistics can be more complicated, e.g. for the t-test we needed a metric that reflects
- the difference between the samples
- the standard deviations of the samples
- So for a t-test the test statistic is a t-value that we compare to a random t-distribution
- In ‘Analysis of Variance’ (not treated in this paper), we are using an F-value that we compare to a random F-distribution
The fundamental question always is: ‘How does the test statistic of our sample compare to a random distribution of that test statistic?’
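To make this concrete, here is a minimal sketch that computes a t-value by hand and compares it to the t-distribution with `pt()`. The two samples are simulated (hypothetical data), and the sketch assumes equal variances so that it matches `t.test(..., var.equal = TRUE)`.

```r
set.seed(1)
a <- rnorm(10, mean = 10, sd = 2)  # hypothetical sample 1
b <- rnorm(10, mean = 12, sd = 2)  # hypothetical sample 2

n_a <- length(a); n_b <- length(b)
df  <- n_a + n_b - 2

# Pooled standard deviation combines the spread of both samples
sp <- sqrt(((n_a - 1) * var(a) + (n_b - 1) * var(b)) / df)

# Test statistic: difference between the means, scaled by its standard error
t_manual <- (mean(a) - mean(b)) / (sp * sqrt(1 / n_a + 1 / n_b))

# Two-sided p-value: how extreme is this t-value under the t-distribution?
p_manual <- 2 * pt(-abs(t_manual), df = df)

# The built-in test should give the same numbers
res <- t.test(a, b, var.equal = TRUE)
c(manual = t_manual, builtin = unname(res$statistic))
c(manual = p_manual, builtin = res$p.value)
```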
The Chi-squared test: dancing cats
- We know how to analyse a data set with one continuous and one binomial variable
- Now we are analysing a data set with two categorical variables
- The mean of a categorical variable is often meaningless, because the numeric values you attach to different categories may be arbitrary
Therefore, we analyse frequencies
An example:
- Can animals be trained to line-dance with different rewards?
- Participants: 200 cats
- Training: the animal was trained using either food or affection, not both
- The animal then either learnt to line-dance or it did not
- Outcome: the number of animals (frequency) that could dance or not in each reward category
- We can tabulate these frequencies in a contingency table
The contingency table
head(d1) #long format
reward dance
1 F TRUE
2 A FALSE
3 F FALSE
4 A TRUE
5 F TRUE
6 A TRUE
| Learn to dance? | Food as reward | Affection as reward | Total |
|---|---|---|---|
| YES | 28 | 48 | 76 |
| NO | 10 | 114 | 124 |
| TOTAL | 38 | 162 | 200 |
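Going from the long format to the contingency table can be sketched with `table()`. The data frame below is rebuilt from the counts in the table above and is only a stand-in for `d1`, whose actual row order is not shown.

```r
# Stand-in for d1: one row per cat, rebuilt from the counts in the table
d1 <- data.frame(
  reward = rep(c("F", "A"), times = c(38, 162)),
  dance  = c(rep(c(TRUE, FALSE), times = c(28, 10)),   # food: 28 yes, 10 no
             rep(c(TRUE, FALSE), times = c(48, 114)))  # affection: 48 yes, 114 no
)

# Cross-tabulate the two categorical variables
tab <- table(dance = d1$dance, reward = d1$reward)
tab

# Add the row and column totals
addmargins(tab)
```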
Pearson’s Chi-squared test
- Used to test whether there’s a relationship between two categorical variables
- Compares the frequencies you observe in certain categories to the frequencies you might expect to get in those categories by chance.
\[\chi^2 = \sum \frac{(\text{observed}_{ij} - \text{model}_{ij})^2}{\text{model}_{ij}} \]
- \(i\) represents the rows in the contingency table
- \(j\) represents the columns
The observed data are the frequencies in the contingency table
Pearson’s Chi-squared test
- The ‘model’ is based on ‘expected frequencies’ calculated for each of the cells in the contingency table.
- n is the total number of observations (in this case 200).
\[\text{model}_{ij} = E_{ij} = \frac{\text{row total}_i \times \text{column total}_j}{n}\]
- The sum of all \((\text{observed}_{ij} - E_{ij})^2/E_{ij}\) (the test statistic for the Chi-squared test) is then checked against a random Chi-squared distribution with \((r-1)(c-1)\) degrees of freedom, where \(r\) and \(c\) are the numbers of rows and columns in the contingency table
- If the p-value is significant then there is a significant association between the categorical variables.
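The two formulas can be sketched in a few lines of R for the dancing cats table. `outer()` multiplies every row total with every column total, which gives all expected frequencies at once; the object names are chosen for illustration only.

```r
# Observed frequencies from the contingency table (without the totals)
observed <- matrix(c(28, 10, 48, 114), nrow = 2,
                   dimnames = list(dance  = c("yes", "no"),
                                   reward = c("food", "affection")))
n <- sum(observed)

# E_ij = row total_i * column total_j / n
expected <- outer(rowSums(observed), colSums(observed)) / n
expected

# Test statistic: sum over all cells of (observed - E)^2 / E
chisq <- sum((observed - expected)^2 / expected)

# Degrees of freedom and p-value (upper tail of the Chi-squared distribution)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
c(chisq = chisq, df = df, p = p_value)
```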
Pearson’s Chi-squared test
Why one degree of freedom in a 2x2 contingency table? What are degrees of freedom again?
- Because, given the row and column totals, you can choose exactly one number freely; the remaining numbers are then locked in! Try it out.
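A small sketch of "trying it out" with the totals of the dancing cats table: once one cell is chosen, the other three follow from the totals. The variable names are illustrative.

```r
row_tot <- c(yes = 76, no = 124)         # row totals
col_tot <- c(food = 38, affection = 162) # column totals

# Choose one cell freely, e.g. 'danced and got food'
yes_food <- 28

# Everything else is now fixed by the totals
no_food       <- col_tot[["food"]] - yes_food
yes_affection <- row_tot[["yes"]]  - yes_food
no_affection  <- row_tot[["no"]]   - no_food

filled <- matrix(c(yes_food, no_food, yes_affection, no_affection), nrow = 2,
                 dimnames = list(names(row_tot), names(col_tot)))
filled
```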
What is the null hypothesis?
‘The two variables are not significantly related’
or
‘Variable 1 is independent of variable 2’
or (if you only have one variable, see second example in the lab):
‘The counts per level are not significantly disproportionate’
Pearson’s Chi-squared test in R (manually)
- Calculate the modelled frequencies
- For each cell, square the difference between the observed and the modelled frequency, divide by the modelled frequency, and sum up the results over all cells!
- Compare your computed Chi-squared value (your test statistic) to the distribution of random numbers that follow a Chi-squared distribution
Use the function pchisq() for this purpose (analogous to pnorm(), pt(), etc.):
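model_fy = 76*38/200; model_fn = 124*38/200 # modelled counts, food: dance yes / no
model_ay = 76*162/200; model_an = 124*162/200 # modelled counts, affection: dance yes / no
chisq = sum((28 - model_fy)^2/model_fy, (10 - model_fn)^2/model_fn, (48 - model_ay)^2/model_ay, (114 - model_an)^2/model_an)
1 - pchisq(q = chisq, df = 1) # upper tail of the distribution = p-value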
Pearson’s Chi-squared test in R
1 - pchisq(q = 25.35, df = 1)
[1] 4.781524e-07
Pearson’s Chi-squared test in R
- Create a data frame in R that contains the contingency table you would like to test
- Use the chisq.test() function on it
cats = data.frame(food = c(28, 10), affection = c(48, 114))
cats
food affection
1 28 48
2 10 114
chisq.test(cats)
Pearson's Chi-squared test with Yates' continuity correction
data: cats
X-squared = 23.52, df = 1, p-value = 1.236e-06
Note that the computed chi-squared value is slightly different from the one we calculated. This is due to Yates' continuity correction, which chisq.test() applies to 2x2 tables by default and which we need not worry about now.
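If you want to reproduce the manually calculated value, a minimal sketch is to switch the correction off with the `correct` argument of `chisq.test()`. The table is entered as a matrix here (instead of a data frame) only so that rows and columns can be labelled.

```r
cats <- matrix(c(28, 10, 48, 114), nrow = 2,
               dimnames = list(dance  = c("yes", "no"),
                               reward = c("food", "affection")))

with_corr <- chisq.test(cats)                   # default: Yates' correction
no_corr   <- chisq.test(cats, correct = FALSE)  # plain Pearson statistic

with_corr$statistic  # the value reported in the output above
no_corr$statistic    # the value from the manual calculation
no_corr$expected     # the modelled (expected) frequencies
```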
What does the p-value tell you?
Other distributions: the uniform distribution
hist(runif(n = 1000, min = 0, max = 15), xlim = c(-5, 22))
The uniform distribution, example
What is the chance of waiting 10 minutes or less if a bus comes every 15 minutes?
punif(q = 10, min = 0, max = 15)
[1] 0.6666667
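The same probability can be sketched in two other ways: directly from the formula for a uniform distribution, and by simulating many waiting times. The number of simulated waits and the seed are arbitrary choices.

```r
# Formula: share of the interval [min, max] that lies at or below q
p_formula <- (10 - 0) / (15 - 0)

# Built-in cumulative distribution function
p_punif <- punif(q = 10, min = 0, max = 15)

# Simulation: fraction of random waiting times of 10 minutes or less
set.seed(42)
waits <- runif(n = 1e5, min = 0, max = 15)
p_sim <- mean(waits <= 10)

c(formula = p_formula, punif = p_punif, simulated = p_sim)
```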
Other distributions: the Poisson distribution
hist(rpois(n = 1000, lambda = 3))
The Poisson distribution, example
What is the probability of finding more than 6 rats in traps where on average we find 3 rats?
1 - ppois(q = 6, lambda = 3)
[1] 0.03350854
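'More than 6' means 7 or more, which is why we take one minus the probability of '6 or fewer'. A short sketch of two equivalent ways to get the same number:

```r
# P(X > 6) as the complement of P(X <= 6)
p_complement <- 1 - ppois(q = 6, lambda = 3)

# The same upper tail requested directly
p_upper <- ppois(q = 6, lambda = 3, lower.tail = FALSE)

# Or: one minus the summed probabilities of finding exactly 0, 1, ..., 6 rats
p_from_density <- 1 - sum(dpois(0:6, lambda = 3))

c(p_complement, p_upper, p_from_density)
```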
The binomial distribution
What is the probability of a coin landing on one particular side (head)?
rbinom(n = 1, size = 1, prob = .5)
[1] 0
Do this experiment 40 times:
rbinom(n = 40, size = 1, prob = .5)
[1] 0 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 1 0 1 1 1 0 1 0 1 1 0 1 1 0 0 0 0 0 1
[39] 0 1
- size = number of trials
- prob = probability of ‘success’
The argument ‘q’ in pbinom() means ‘this many successes or fewer’!
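A short sketch of how `q` works in `pbinom()`, using 40 coin tosses as above. The cut-off of 15 heads is an arbitrary value chosen for illustration.

```r
# P(15 heads or fewer in 40 fair coin tosses)
p_cum <- pbinom(q = 15, size = 40, prob = 0.5)

# Same thing, adding up the probabilities of exactly 0, 1, ..., 15 heads
p_sum <- sum(dbinom(0:15, size = 40, prob = 0.5))

# P(more than 15 heads) is the complement
p_more <- 1 - pbinom(q = 15, size = 40, prob = 0.5)

c(at_most_15 = p_cum, summed = p_sum, more_than_15 = p_more)
```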
The distribution R app
https://gallery.shinyapps.io/dist_calc/
Example data analysis
You are working on a pest control project with the goal of treating rats so that their offspring become infertile. You spread a certain substance on one rat-infested island in the Hauraki Gulf, and a control substance on another. Then you collect 20 rats from each island and test whether they are still fertile.
Now you check whether after one year, there is a difference between the islands in terms of rat size.
Make up the appropriate data frames and analyse the data sets!
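One possible way to set this up is sketched below. All numbers are invented for illustration (the counts, the weights in grams, the means and standard deviations), and the choice of a t-test for rat size assumes the weights are roughly normally distributed; try your own values and check that assumption.

```r
# Fertility: two categorical variables (island, fertile or not) -> frequencies
# Made-up counts for 20 rats per island
fertility <- data.frame(treated = c(6, 14), control = c(17, 3),
                        row.names = c("fertile", "infertile"))
fert_test <- chisq.test(fertility)
fert_test

# Rat size: one continuous and one two-level categorical variable
# Made-up body weights in grams, simulated from normal distributions
set.seed(1)
size <- data.frame(
  island = rep(c("treated", "control"), each = 20),
  weight = c(rnorm(20, mean = 250, sd = 30), rnorm(20, mean = 260, sd = 30))
)
size_test <- t.test(weight ~ island, data = size)
size_test
```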
What will we have learnt by the end of this week?
- How to test a data set with two categorical variables
- What a contingency table is
- How to conduct and interpret a Chi-squared test in R
- How to manually conduct a Chi-squared test
- A little bit about other distributions and how to calculate probabilities and draw histograms for those
Glossary
- Contingency table
- Chi-squared test
- Chi-squared distribution
- Uniform distribution
- Poisson distribution
- Binomial distribution