The following example covers Example 8.6 (Mass extinctions) in Whitlock & Schluter’s “Analysis of Biological Data” using matrices only. To see how to work through this example using more sophisticated techniques, click here.

Load the data

extinctData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter08/chap08e6MassExtinctions.csv")
extinctData

Calculate summary statistics

(total <- length(extinctData$numberOfExtinctions))
[1] 76
(mu <- mean(extinctData$numberOfExtinctions))
[1] 4.210526

Create table

The following command creates the table.

(extinctTable <- table(extinctData))
extinctData
 1  2  3  4  5  6  7  8  9 10 11 14 16 20 
13 15 16  7 10  4  2  1  2  1  1  1  2  1 

Fill in gaps

Looking at the table, the problem we have is the non-existence of levels that have no counts, such as 0, 12, and 13. Let’s fix that.

extinctData$numberOfExtinctions <- factor(extinctData$numberOfExtinctions, levels = c(0:20))
(extinctTable <- table(extinctData))
extinctData
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
 0 13 15 16  7 10  4  2  1  2  1  1  0  0  1  0  2  0  0  0  1 

Compute expected frequencies

Okay, now let’s create the expected frequencies using the dpois function. Note that the Poisson distribution is parameterized by \(\lambda\), which is the average number of extinctions we calculated previously. Also, we need the table to be absolute frequencies, not relative frequencies, so we multiply the probabilities by the total we calculated previously.

extinctTable <- t(rbind(extinctTable, total*dpois(0:20, lambda = mu)))
colnames(extinctTable) <- c("obs", "exp")
print(extinctTable)
   obs          exp
0    0 1.127730e+00
1   13 4.748338e+00
2   15 9.996501e+00
3   16 1.403018e+01
4    7 1.476861e+01
5   10 1.243672e+01
6    4 8.727524e+00
7    2 5.249639e+00
8    1 2.762968e+00
9    2 1.292616e+00
10   1 5.442596e-01
11   1 2.083290e-01
12   0 7.309790e-02
13   0 2.367543e-02
14   1 7.120431e-03
15   0 1.998718e-03
16   2 5.259783e-04
17   0 1.302733e-04
18   0 3.047328e-05
19   0 6.753081e-06
20   1 1.421701e-06

Aggregate groups to satisfy assumptions

Looking at the expected frequencies, we notice that many of the frequencies are less than one, and some of them are less than five. Following the example given here, we will group 0 and 1 extinctions together, and 8 or more extinctions together.

extinctTable[9,] <- colSums(extinctTable[9:21,])
extinctTable[2,] <- colSums(extinctTable[1:2,])
extinctTable <- extinctTable[2:9,]
rownames(extinctTable) <- c("0 or 1", "2", "3", "4", "5", "6", "7", "8 or more")
extinctTable
          obs       exp
0 or 1     13  5.876068
2          15  9.996501
3          16 14.030177
4           7 14.768608
5          10 12.436722
6           4  8.727524
7           2  5.249639
8 or more   9  4.914760

Add in lost probability

The Poisson distribution goes to infinity (and beyond!), which we’ve effectively ignored as our data only went to 20. So, we need to add this probability back in. We do that now by adding the missing frequencies to the last group.

extinctTable["8 or more", "exp"] <- extinctTable["8 or more", "exp"] + (total - sum(extinctTable[,"exp"]))
extinctTable
          obs       exp
0 or 1     13  5.876068
2          15  9.996501
3          16 14.030177
4           7 14.768608
5          10 12.436722
6           4  8.727524
7           2  5.249639
8 or more   9  4.914761

Perform chi-squared goodness-of-fit test

Finally, we perform the \(\chi^2\) goodness-of-fit test.

saveChiTest <- chisq.test(extinctTable[,"obs"], 
                          p = extinctTable[,"exp"]/total)
Chi-squared approximation may be incorrect

Note that the chisq.test function assumes the degrees of freedom are number of categories minus one, but we lost one more degree of freedom since we had to estimate the mean of the Poisson distribution from the data. Since our number of categories is 8, we have a degree of freedom given by \(df = 8 - 1 - 1 = 6\). Thus, we will calculate the \(P\)-value manually using pchisq, but use the already calculated test statistic from the chisq.test function (which is why we saved the result in saveChiTest).

pValue <- pchisq(q = saveChiTest$statistic, 
                 df = 6,
                 lower.tail = FALSE)
unname(pValue)
[1] 0.0005334919

So what’s our conclusion? We reject the null hypothesis that extinctions in the fossil record follow a Poisson distribution.

LS0tCnRpdGxlOiAiTWFzcyBleHRpbmN0aW9ucyB3aXRoIG1hdHJpeCBjb21tYW5kcyBvbmx5IgphdXRob3I6ICJNLiBEcmV3IExhTWFyIgpkYXRlOiAiTWFyY2ggMSwgMjAxNyIKb3V0cHV0OgogIGh0bWxfbm90ZWJvb2s6CiAgICBjb2RlX2ZvbGRpbmc6IHNob3cKICAgIHRvYzogeWVzCiAgICB0b2NfZGVwdGg6IDMKICAgIHRvY19mbG9hdDogeWVzCiAgICB0aGVtZTogcmVhZGFibGUKLS0tCgpgYGB7ciBzZXR1cCwgaW5jbHVkZT1GQUxTRX0Ka25pdHI6Om9wdHNfY2h1bmskc2V0KGVjaG8gPSBUUlVFLCBpbmNsdWRlID0gVFJVRSkKYGBgCgpUaGUgZm9sbG93aW5nIGV4YW1wbGUgY292ZXJzIEV4YW1wbGUgOC42IChNYXNzIGV4dGluY3Rpb25zKSBpbiBXaGl0bG9jayAmIFNjaGx1dGVyJ3MgIkFuYWx5c2lzIG9mIEJpb2xvZ2ljYWwgRGF0YSIgdXNpbmcgbWF0cmljZXMgb25seS4gIFRvIHNlZSBob3cgdG8gd29yayB0aHJvdWdoIHRoaXMgZXhhbXBsZSB1c2luZyBtb3JlIHNvcGhpc3RpY2F0ZWQgdGVjaG5pcXVlcywgW2NsaWNrIGhlcmVdKGh0dHA6Ly9yc3R1ZGlvLXB1YnMtc3RhdGljLnMzLmFtYXpvbmF3cy5jb20vMTU3ODMxX2JlZWJkNTYyOGVhNjQ2MDE5MGI1YzM5ZTliNDEzZTI0Lmh0bWwjZXhhbXBsZS04LjYuLW1hc3MtZXh0aW5jdGlvbnMpLgoKIyBMb2FkIHRoZSBkYXRhCgpgYGB7cn0KZXh0aW5jdERhdGEgPC0gcmVhZC5jc3YoImh0dHA6Ly93aGl0bG9ja3NjaGx1dGVyLnpvb2xvZ3kudWJjLmNhL3dwLWNvbnRlbnQvZGF0YS9jaGFwdGVyMDgvY2hhcDA4ZTZNYXNzRXh0aW5jdGlvbnMuY3N2IikKZXh0aW5jdERhdGEKYGBgCgojIENhbGN1bGF0ZSBzdW1tYXJ5IHN0YXRpc3RpY3MKCmBgYHtyfQoodG90YWwgPC0gbGVuZ3RoKGV4dGluY3REYXRhJG51bWJlck9mRXh0aW5jdGlvbnMpKQoobXUgPC0gbWVhbihleHRpbmN0RGF0YSRudW1iZXJPZkV4dGluY3Rpb25zKSkKYGBgCgojIENyZWF0ZSB0YWJsZQoKVGhlIGZvbGxvd2luZyBjb21tYW5kIGNyZWF0ZXMgdGhlIHRhYmxlLiAgCgpgYGB7cn0KKGV4dGluY3RUYWJsZSA8LSB0YWJsZShleHRpbmN0RGF0YSkpCmBgYAoKIyBGaWxsIGluIGdhcHMKTG9va2luZyBhdCB0aGUgdGFibGUsIHRoZSBwcm9ibGVtIHdlIGhhdmUgaXMgdGhlIG5vbi1leGlzdGVuY2Ugb2YgbGV2ZWxzIHRoYXQgaGF2ZSBubyBjb3VudHMsIHN1Y2ggYXMgMCwgMTIsIGFuZCAxMy4gIExldCdzIGZpeCB0aGF0LgoKYGBge3J9CmV4dGluY3REYXRhJG51bWJlck9mRXh0aW5jdGlvbnMgPC0gZmFjdG9yKGV4dGluY3REYXRhJG51bWJlck9mRXh0aW5jdGlvbnMsIGxldmVscyA9IGMoMDoyMCkpCihleHRpbmN0VGFibGUgPC0gdGFibGUoZXh0aW5jdERhdGEpKQpgYGAKCiMgQ29tcHV0ZSBleHBlY3RlZCBmcmVxdWVuY2llcwpPa2F5LCBub3cgbGV0J3MgY3JlYXRlIHRoZSBleHBlY3RlZCBmcmVxdWVuY2llcyB1c2luZyB0aGUgYGRwb2lzYCBmdW5jdGlvbi4gIE5vdGUgdGhhdCB0aGUgUG9pc3NvbiBkaXN0cmlidXRpb24gaXMgcGFyYW1ldGVyaXplZCBieSAkXGxhbWJkYSQsIHdoaWNoIGlzIHRoZSBhdmVyYWdlIG51bWJlciBvZiBleHRpbmN0aW9ucyB3ZSBjYWxjdWxhdGVkIHByZXZpb3VzbHkuICBBbHNvLCB3ZSBuZWVkIHRoZSB0YWJsZSB0byBiZSBhYnNvbHV0ZSBmcmVxdWVuY2llcywgbm90IHJlbGF0aXZlIGZyZXF1ZW5jaWVzLCBzbyB3ZSBtdWx0aXBseSB0aGUgcHJvYmFiaWxpdGllcyBieSB0aGUgdG90YWwgd2UgY2FsY3VsYXRlZCBwcmV2aW91c2x5LgoKYGBge3J9CmV4dGluY3RUYWJsZSA8LSB0KHJiaW5kKGV4dGluY3RUYWJsZSwgdG90YWwqZHBvaXMoMDoyMCwgbGFtYmRhID0gbXUpKSkKY29sbmFtZXMoZXh0aW5jdFRhYmxlKSA8LSBjKCJvYnMiLCAiZXhwIikKcHJpbnQoZXh0aW5jdFRhYmxlKQpgYGAKCiMgQWdncmVnYXRlIGdyb3VwcyB0byBzYXRpc2Z5IGFzc3VtcHRpb25zCgpMb29raW5nIGF0IHRoZSBleHBlY3RlZCBmcmVxdWVuY2llcywgd2Ugbm90aWNlIHRoYXQgbWFueSBvZiB0aGUgZnJlcXVlbmNpZXMgYXJlIGxlc3MgdGhhbiBvbmUsIGFuZCBzb21lIG9mIHRoZW0gYXJlIGxlc3MgdGhhbiBmaXZlLiAgRm9sbG93aW5nIHRoZSBleGFtcGxlIGdpdmVuIFtoZXJlXShodHRwOi8vcnN0dWRpby1wdWJzLXN0YXRpYy5zMy5hbWF6b25hd3MuY29tLzE1NzgzMV9iZWViZDU2MjhlYTY0NjAxOTBiNWMzOWU5YjQxM2UyNC5odG1sI2V4YW1wbGUtOC42Li1tYXNzLWV4dGluY3Rpb25zKSwgd2Ugd2lsbCBncm91cCAwIGFuZCAxIGV4dGluY3Rpb25zIHRvZ2V0aGVyLCBhbmQgOCBvciBtb3JlIGV4dGluY3Rpb25zIHRvZ2V0aGVyLgpgYGB7cn0KZXh0aW5jdFRhYmxlWzksXSA8LSBjb2xTdW1zKGV4dGluY3RUYWJsZVs5OjIxLF0pCmV4dGluY3RUYWJsZVsyLF0gPC0gY29sU3VtcyhleHRpbmN0VGFibGVbMToyLF0pCmV4dGluY3RUYWJsZSA8LSBleHRpbmN0VGFibGVbMjo5LF0Kcm93bmFtZXMoZXh0aW5jdFRhYmxlKSA8LSBjKCIwIG9yIDEiLCAiMiIsICIzIiwgIjQiLCAiNSIsICI2IiwgIjciLCAiOCBvciBtb3JlIikKZXh0aW5jdFRhYmxlCmBgYAoKIyBBZGQgaW4gbG9zdCBwcm9iYWJpbGl0eQoKVGhlIFBvaXNzb24gZGlzdHJpYnV0aW9uIGdvZXMgdG8gaW5maW5pdHkgKGFuZCBiZXlvbmQhKSwgd2hpY2ggd2UndmUgZWZmZWN0aXZlbHkgaWdub3JlZCBhcyBvdXIgZGF0YSBvbmx5IHdlbnQgdG8gMjAuICBTbywgd2UgbmVlZCB0byBhZGQgdGhpcyBwcm9iYWJpbGl0eSBiYWNrIGluLiAgV2UgZG8gdGhhdCBub3cgYnkgYWRkaW5nIHRoZSBtaXNzaW5nIGZyZXF1ZW5jaWVzIHRvIHRoZSBsYXN0IGdyb3VwLgpgYGB7cn0KZXh0aW5jdFRhYmxlWyI4IG9yIG1vcmUiLCAiZXhwIl0gPC0gZXh0aW5jdFRhYmxlWyI4IG9yIG1vcmUiLCAiZXhwIl0gKyAodG90YWwgLSBzdW0oZXh0aW5jdFRhYmxlWywiZXhwIl0pKQpleHRpbmN0VGFibGUKYGBgCgojIFBlcmZvcm0gY2hpLXNxdWFyZWQgZ29vZG5lc3Mtb2YtZml0IHRlc3QKCkZpbmFsbHksIHdlIHBlcmZvcm0gdGhlICRcY2hpXjIkIGdvb2RuZXNzLW9mLWZpdCB0ZXN0LgpgYGB7cn0Kc2F2ZUNoaVRlc3QgPC0gY2hpc3EudGVzdChleHRpbmN0VGFibGVbLCJvYnMiXSwgCiAgICAgICAgICAgICAgICAgICAgICAgICAgcCA9IGV4dGluY3RUYWJsZVssImV4cCJdL3RvdGFsKQpgYGAKCk5vdGUgdGhhdCB0aGUgYGNoaXNxLnRlc3RgIGZ1bmN0aW9uIGFzc3VtZXMgdGhlIGRlZ3JlZXMgb2YgZnJlZWRvbSBhcmUgbnVtYmVyIG9mIGNhdGVnb3JpZXMgbWludXMgb25lLCBidXQgd2UgbG9zdCBvbmUgbW9yZSBkZWdyZWUgb2YgZnJlZWRvbSBzaW5jZSB3ZSBoYWQgdG8gZXN0aW1hdGUgdGhlIG1lYW4gb2YgdGhlIFBvaXNzb24gZGlzdHJpYnV0aW9uIGZyb20gdGhlIGRhdGEuICBTaW5jZSBvdXIgbnVtYmVyIG9mIGNhdGVnb3JpZXMgaXMgOCwgd2UgaGF2ZSBhIGRlZ3JlZSBvZiBmcmVlZG9tIGdpdmVuIGJ5ICRkZiA9IDggLSAxIC0gMSA9IDYkLiAgVGh1cywgd2Ugd2lsbCBjYWxjdWxhdGUgdGhlICRQJC12YWx1ZSBtYW51YWxseSB1c2luZyBgcGNoaXNxYCwgYnV0IHVzZSB0aGUgYWxyZWFkeSBjYWxjdWxhdGVkIHRlc3Qgc3RhdGlzdGljIGZyb20gdGhlIGBjaGlzcS50ZXN0YCBmdW5jdGlvbiAod2hpY2ggaXMgd2h5IHdlIHNhdmVkIHRoZSByZXN1bHQgaW4gYHNhdmVDaGlUZXN0YCkuCgpgYGB7cn0KcFZhbHVlIDwtIHBjaGlzcShxID0gc2F2ZUNoaVRlc3Qkc3RhdGlzdGljLCAKICAgICAgICAgICAgICAgICBkZiA9IDYsCiAgICAgICAgICAgICAgICAgbG93ZXIudGFpbCA9IEZBTFNFKQp1bm5hbWUocFZhbHVlKQpgYGAKClNvIHdoYXQncyBvdXIgY29uY2x1c2lvbj8gIFdlIHJlamVjdCB0aGUgbnVsbCBoeXBvdGhlc2lzIHRoYXQgZXh0aW5jdGlvbnMgaW4gdGhlIGZvc3NpbCByZWNvcmQgZm9sbG93IGEgUG9pc3NvbiBkaXN0cmlidXRpb24u