Any process in which each event has a small, constant probability of happening, but in which there are a large number of opportunities for it to happen, can be described by the exponential and Poisson distributions.
The exponential distribution describes the waiting time between successive events. The Poisson distribution describes the number of events that occur in a specified time-frame.
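The link between a small per-trial probability and the Poisson distribution can be checked numerically. In the sketch below the number of trials (1000) and the probability (0.0005) are illustrative assumptions, not values taken from any data set; it compares binomial probabilities with Poisson probabilities that have the same mean.

```r
# Illustrative assumption: 1000 independent opportunities, each with a
# small probability of an event, giving a mean of 0.5 events.
n_trials <- 1000
p <- 0.0005
k <- 0:4

binom_prob <- dbinom(k, size = n_trials, prob = p)
pois_prob <- dpois(k, lambda = n_trials * p)

# Side-by-side comparison of the two sets of probabilities
approx_check <- data.frame(k, binom_prob, pois_prob)
approx_check

# Largest absolute difference between the two columns
max_diff <- max(abs(binom_prob - pois_prob))
max_diff
```

The two columns should agree closely, and the agreement should improve as the number of trials grows while the mean is held fixed.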
A classic example is the horse-kick data from von Bortkiewicz, which record the number of Prussian soldiers kicked to death in 14 army corps over a 20-year period. The data are summarised here using the xtabs function:
library(vcd)
Loading required package: grid
data("VonBort")
xtabs(~ deaths, data = VonBort)
deaths
  0   1   2   3   4
144  91  32  11   2
When Fisher analysed the data in 1925 he excluded some of the Corps because of their different organisation.
xtabs(~ deaths, data = VonBort, subset = fisher == "yes")
deaths
  0   1   2   3   4
109  65  22   3   1
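The expected number of observations with x events under the Poisson distribution is:
\[ n\frac{e^{-m}m^{x}}{x!} \]
where n is the total number of observations, m is the mean number of events per observation and x is the number of events.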
For the Fisher subset the mean number of deaths is 0.61.
filtered <- subset(VonBort, fisher=="yes")
deaths <- filtered$deaths
mean(deaths)
[1] 0.61
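The same mean can be recovered from the tabulated counts alone, which is useful when only the summary table is available. A minimal sketch, with the counts copied from the Fisher subset table above and illustrative variable names:

```r
# Counts from the Fisher subset table: observations with 0..4 deaths
deaths_values <- 0:4
counts <- c(109, 65, 22, 3, 1)

# Weighted mean: total deaths divided by total number of observations
n_obs <- sum(counts)
mean_deaths <- sum(deaths_values * counts) / n_obs
mean_deaths
```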
From this you can tabulate the observed counts alongside the counts expected for the 200 observations (10 corps over 20 years) in the subset.
Deaths <- c(0:4)
Count <- c(109,65,22,3,1)
Expected <- 200*(exp(-0.61)*0.61^Deaths)/factorial(Deaths)
horsekicks <- data.frame(Deaths,Count,Expected)
horsekicks
  Deaths Count    Expected
1      0   109 108.6701738
2      1    65  66.2888060
3      2    22  20.2180858
4      3     3   4.1110108
5      4     1   0.6269291
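The expected counts can also be obtained from R's built-in dpois function, and the agreement with the observed counts can be summarised in a single number. The sketch below uses the Pearson statistic, the sum of (observed - expected)^2 / expected, as that summary. This is a choice made here for illustration rather than something taken from the analysis above, and no formal test is attempted because several expected counts are small.

```r
Deaths <- 0:4
Count <- c(109, 65, 22, 3, 1)

# Expected counts from the explicit formula and from dpois()
by_formula <- 200 * exp(-0.61) * 0.61^Deaths / factorial(Deaths)
by_dpois <- 200 * dpois(Deaths, lambda = 0.61)

# TRUE if the two approaches agree to numerical tolerance
all.equal(by_formula, by_dpois)

# Pearson statistic: smaller values indicate closer agreement
pearson <- sum((Count - by_dpois)^2 / by_dpois)
pearson
```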
Compare this to the original unfiltered data, which has 280 observations (14 corps over 20 years) and a mean of 0.7.
Deaths1 <- c(0:4)
Count1 <- c(144,91,32,11,2)
# Scale by 280, the number of observations in the unfiltered data
Expected1 <- 280*(exp(-0.7)*0.7^Deaths1)/factorial(Deaths1)
horsekicks1 <- data.frame(Deaths1,Count1,Expected1)
horsekicks1
  Deaths1 Count1  Expected1
1       0    144 139.043885
2       1     91  97.330720
3       2     32  34.065752
4       3     11   7.948675
5       4      2   1.391018
The fit is looser than for the Fisher subset, most visibly at three deaths (11 observed against about 7.9 expected), which is consistent with Fisher’s choice to remove the Corps with a different organisation.
Another data set that follows the Poisson distribution is “Student’s” counts of yeast cells in a haemocytometer. The counting chamber is divided into many squares and the number of yeast cells in each square is counted. As there are a large number of squares (400), the probability of any given yeast cell falling in a particular square is quite small, but there are very many yeast cells. In this case the mean number of yeast cells per square is 4.68.
Number <- c(0:12)
Observed <- c(0,20,43,53,86,70,54,37,18,10,5,2,2)
Expected <- 400*(exp(-4.68)*4.68^Number)/factorial(Number)
haemocytometer <- data.frame(Number,Observed,Expected)
haemocytometer
   Number Observed   Expected
1       0        0  3.7116056
2       1       20 17.3703140
3       2       43 40.6465348
4       3       53 63.4085942
5       4       86 74.1880552
6       5       70 69.4400197
7       6       54 54.1632154
8       7       37 36.2119783
9       8       18 21.1840073
10      9       10 11.0156838
11     10        5  5.1553400
12     11        2  2.1933628
13     12        2  0.8554115
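A quick consistency check on the Poisson description is that the mean and variance of a Poisson distribution are equal. A minimal sketch, reusing the observed counts above with illustrative variable names, computes both from the frequency table:

```r
Number <- 0:12
Observed <- c(0, 20, 43, 53, 86, 70, 54, 37, 18, 10, 5, 2, 2)

n_squares <- sum(Observed)                        # total number of squares
mean_cells <- sum(Number * Observed) / n_squares  # mean cells per square

# Variance computed from the frequency table (dividing by n_squares)
var_cells <- sum(Observed * (Number - mean_cells)^2) / n_squares

c(mean = mean_cells, variance = var_cells)
```

For these counts the variance comes out slightly below the mean of 4.68, which is broadly in line with what a Poisson distribution would give.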