The following example covers Example 8.6 (Mass extinctions) in Whitlock & Schluter’s “Analysis of Biological Data” using matrices only. To see how to work through this example using more sophisticated techniques, click here.
extinctData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter08/chap08e6MassExtinctions.csv")
extinctData
(total <- length(extinctData$numberOfExtinctions))
[1] 76
(mu <- mean(extinctData$numberOfExtinctions))
[1] 4.210526
The following command creates the table.
(extinctTable <- table(extinctData))
extinctData
1 2 3 4 5 6 7 8 9 10 11 14 16 20
13 15 16 7 10 4 2 1 2 1 1 1 2 1
Looking at the table, the problem we have is the non-existence of levels that have no counts, such as 0, 12, and 13. Let’s fix that.
extinctData$numberOfExtinctions <- factor(extinctData$numberOfExtinctions, levels = c(0:20))
(extinctTable <- table(extinctData))
extinctData
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0 13 15 16 7 10 4 2 1 2 1 1 0 0 1 0 2 0 0 0 1
Okay, now let’s create the expected frequencies using the dpois
function. Note that the Poisson distribution is parameterized by \(\lambda\), which is the average number of extinctions we calculated previously. Also, we need the table to be absolute frequencies, not relative frequencies, so we multiply the probabilities by the total we calculated previously.
extinctTable <- t(rbind(extinctTable, total*dpois(0:20, lambda = mu)))
colnames(extinctTable) <- c("obs", "exp")
print(extinctTable)
obs exp
0 0 1.127730e+00
1 13 4.748338e+00
2 15 9.996501e+00
3 16 1.403018e+01
4 7 1.476861e+01
5 10 1.243672e+01
6 4 8.727524e+00
7 2 5.249639e+00
8 1 2.762968e+00
9 2 1.292616e+00
10 1 5.442596e-01
11 1 2.083290e-01
12 0 7.309790e-02
13 0 2.367543e-02
14 1 7.120431e-03
15 0 1.998718e-03
16 2 5.259783e-04
17 0 1.302733e-04
18 0 3.047328e-05
19 0 6.753081e-06
20 1 1.421701e-06
Looking at the expected frequencies, we notice that many of the frequencies are less than one, and some of them are less than five. Following the example given here, we will group 0 and 1 extinctions together, and 8 or more extinctions together.
extinctTable[9,] <- colSums(extinctTable[9:21,])
extinctTable[2,] <- colSums(extinctTable[1:2,])
extinctTable <- extinctTable[2:9,]
rownames(extinctTable) <- c("0 or 1", "2", "3", "4", "5", "6", "7", "8 or more")
extinctTable
obs exp
0 or 1 13 5.876068
2 15 9.996501
3 16 14.030177
4 7 14.768608
5 10 12.436722
6 4 8.727524
7 2 5.249639
8 or more 9 4.914760
The Poisson distribution goes to infinity (and beyond!), which we’ve effectively ignored as our data only went to 20. So, we need to add this probability back in. We do that now by adding the missing frequencies to the last group.
extinctTable["8 or more", "exp"] <- extinctTable["8 or more", "exp"] + (total - sum(extinctTable[,"exp"]))
extinctTable
obs exp
0 or 1 13 5.876068
2 15 9.996501
3 16 14.030177
4 7 14.768608
5 10 12.436722
6 4 8.727524
7 2 5.249639
8 or more 9 4.914761
Finally, we perform the \(\chi^2\) goodness-of-fit test.
saveChiTest <- chisq.test(extinctTable[,"obs"],
p = extinctTable[,"exp"]/total)
Chi-squared approximation may be incorrect
Note that the chisq.test
function assumes the degrees of freedom are number of categories minus one, but we lost one more degree of freedom since we had to estimate the mean of the Poisson distribution from the data. Since our number of categories is 8, we have a degree of freedom given by \(df = 8 - 1 - 1 = 6\). Thus, we will calculate the \(P\)-value manually using pchisq
, but use the already calculated test statistic from the chisq.test
function (which is why we saved the result in saveChiTest
).
pValue <- pchisq(q = saveChiTest$statistic,
df = 6,
lower.tail = FALSE)
unname(pValue)
[1] 0.0005334919
So what’s our conclusion? We reject the null hypothesis that extinctions in the fossil record follow a Poisson distribution.