The Poisson Distribution

Author

Andrew Dalby

Introduction

Any process in which each of a large number of opportunities has a small and constant probability of producing an event can be described by the exponential and Poisson distributions.

The exponential describes the decreasing number of events that occur within the entire population as the population gets smaller. The Poisson distribution describes the number of events that occur in a specified time-frame.

An example of this is the horse-kick data from von Bortkiewicz, which record the number of Prussian soldiers kicked to death in each of 14 army corps over a 20-year period, giving 280 corps-year observations. The counts can be summarised using the xtabs function:

library(vcd)
Loading required package: grid
data("VonBort")
xtabs(~ deaths, data = VonBort)
deaths
  0   1   2   3   4 
144  91  32  11   2 

When Fisher analysed the data in 1925 he excluded some of the corps because of their different organisation, which leaves 200 observations.

xtabs(~ deaths, data = VonBort, subset = fisher == "yes")
deaths
  0   1   2   3   4 
109  65  22   3   1 

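The expected number of observations with \(x\) events under the Poisson distribution is:

\[ n\frac{e^{-m}m^{x}}{x!} \]

where \(n\) is the total number of observations and \(m\) is the mean number of events per observation.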
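The formula can be written directly as an R function and checked against base R's Poisson probability function dpois. The following is a minimal sketch: poisson_expected is a hypothetical helper name, and the values n = 100 and m = 2 are purely illustrative and are not taken from the data.

```r
# Expected count for x events, written directly from n * e^(-m) * m^x / x!
# (poisson_expected is a hypothetical helper, not part of any package)
poisson_expected <- function(x, n, m) {
  n * exp(-m) * m^x / factorial(x)
}

# Illustrative values only: 100 observations with a mean of 2 events each
x <- 0:5
by_formula <- poisson_expected(x, n = 100, m = 2)

# dpois(x, lambda) is base R's Poisson probability mass function,
# so n * dpois(x, m) should reproduce the formula
by_dpois <- 100 * dpois(x, lambda = 2)

all.equal(by_formula, by_dpois)
```

Either form can be used for the expected counts; the explicit formula keeps the calculation visible, while dpois is shorter.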

For the Fisher subset the mean number of deaths per corps per year is 0.61.

filtered <- subset(VonBort, fisher=="yes")
deaths <- filtered$deaths
mean(deaths)
[1] 0.61

With this mean and the 200 observations in the subset, you can tabulate the observed counts alongside the expected counts.

Deaths <- c(0:4)
Count <- c(109,65,22,3,1)
Expected <- 200*(exp(-0.61)*0.61^Deaths)/factorial(Deaths)
horsekicks <- data.frame(Deaths,Count,Expected)
horsekicks
  Deaths Count    Expected
1      0   109 108.6701738
2      1    65  66.2888060
3      2    22  20.2180858
4      3     3   4.1110108
5      4     1   0.6269291
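The constants 200 and 0.61 used above can also be derived from the frequency table itself, which avoids the sample size and the mean getting out of step with the counts. This is a sketch under the assumption that the counts are typed in from the xtabs output; the variable names n and m are chosen here to match the formula.

```r
# Frequency table for the Fisher subset, copied from the xtabs output
Deaths <- 0:4
Count  <- c(109, 65, 22, 3, 1)

# Total number of observations (corps-years) is the sum of the counts
n <- sum(Count)

# Mean deaths per observation is the count-weighted mean of Deaths
m <- sum(Deaths * Count) / n

# Expected counts from the Poisson formula, using the derived n and m
Expected <- n * exp(-m) * m^Deaths / factorial(Deaths)

data.frame(Deaths, Count, Expected)
```

Because n and m come from the same table, the same lines can be reused for another set of counts by changing only Deaths and Count.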

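Compare this with the original unfiltered data, which have a mean of 0.7. The unfiltered data contain 280 observations (14 corps over 20 years), so the expected counts are scaled by 280 rather than 200.

Deaths1 <- c(0:4)
Count1 <- c(144,91,32,11,2)
Expected1 <- 280*(exp(-0.7)*0.7^Deaths1)/factorial(Deaths1)
horsekicks1 <- data.frame(Deaths1,Count1,Expected1)
horsekicks1
  Deaths1 Count1  Expected1
1       0    144 139.043885
2       1     91  97.330720
3       2     32  34.065752
4       3     11   7.948675
5       4      2   1.391018

Once the expected values are scaled to the full 280 observations, the unfiltered counts are also close to the Poisson expectation. The largest differences are in the classes with 1 and 3 deaths, so this comparison alone does not show that Fisher’s choice to remove some of the corps with different organisation was needed to obtain a good fit.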

Another set of data that follows the Poisson distribution is “Student’s” count of yeast cells in a haemocytometer. The counting chamber is divided into many squares and the number of yeast cells in each square is counted. Because there are a large number of squares (400), the probability of any one yeast cell being in a particular square is quite small, but there are very many yeast cells. In this case the mean number of yeast cells per square is 4.68.

Number <- c(0:12)
Observed <- c(0,20,43,53,86,70,54,37,18,10,5,2,2)
Expected <- 400*(exp(-4.68)*4.68^Number)/factorial(Number)
haemocytometer <- data.frame(Number,Observed,Expected)
haemocytometer
   Number Observed   Expected
1       0        0  3.7116056
2       1       20 17.3703140
3       2       43 40.6465348
4       3       53 63.4085942
5       4       86 74.1880552
6       5       70 69.4400197
7       6       54 54.1632154
8       7       37 36.2119783
9       8       18 21.1840073
10      9       10 11.0156838
11     10        5  5.1553400
12     11        2  2.1933628
13     12        2  0.8554115
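One way to see where the observed and expected yeast counts differ most is to look at each class's contribution to the Pearson statistic, (O - E)^2 / E. The sketch below is descriptive only: it does not pool classes with small expected counts and does not compute a p-value, and the names contribution and pearson_total are chosen here for illustration.

```r
# Observed yeast counts per square, copied from the table above
Number   <- 0:12
Observed <- c(0, 20, 43, 53, 86, 70, 54, 37, 18, 10, 5, 2, 2)

n <- sum(Observed)   # total number of squares
m <- 4.68            # mean cells per square, as quoted in the text

# Expected counts from the Poisson formula
Expected <- n * exp(-m) * m^Number / factorial(Number)

# Contribution of each class to the Pearson statistic: (O - E)^2 / E
contribution <- (Observed - Expected)^2 / Expected

# Side-by-side table, rounded for display
data.frame(Number, Observed,
           Expected = round(Expected, 2),
           Contribution = round(contribution, 2))

# Sum of the contributions (descriptive summary, no test performed)
pearson_total <- sum(contribution)
pearson_total
```

Classes with a large contribution are the ones where the observed count departs most from the Poisson expectation relative to its size; here the largest single contribution comes from the squares with no cells, where none were observed but about 3.7 were expected.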