Previously I had looked at the Binomial Distribution examples in Sewall Wright’s book Evolution and the Genetics of Populations (Wright 1984). He also included examples of Poisson-distributed data.
He includes the standard derivation of the Poisson distribution as the limiting case of a Binomial Distribution in which the probability of an event is low and the distribution is highly asymmetrical. The Binomial Distribution is the distribution of the number of successes in multiple Bernoulli Trials of a binary variable. This derivation is fairly standard in textbooks, but when I was at Oxford and planning to teach it this way, the post-doctoral research fellow reviewing the material suggested an alternative view, which comes from Wakeley’s book on Coalescent Theory (Wakeley 2008).
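For reference, the textbook limit can be written out as follows. This is the usual argument rather than a transcription of Wright's own notation: the symbols n (number of trials), p (probability per trial), k (number of events) and λ are chosen here for illustration, with λ = np held fixed as n grows.

```latex
\begin{aligned}
P(X = k) &= \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad p = \frac{\lambda}{n} \\
&= \frac{n!}{(n-k)!\, n^{k}} \cdot \frac{\lambda^{k}}{k!} \cdot \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k} \\
&\xrightarrow[n \to \infty]{} \frac{\lambda^{k} e^{-\lambda}}{k!}
\end{aligned}
```

The first factor and the last factor both tend to 1 for fixed k, and the middle power tends to e^{-λ}, which leaves the Poisson probability.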
Wakeley takes a slightly different view, starting from an exponential time process such as radioactive decay, where the Poisson distribution represents the number of events in a specific time period. This is a better way to think about it if you are working on evolution or radioactive decay. However, consider the decay model at the level of the individual atom. Each radioactive atom is a Bernoulli Trial with a very small probability of decaying, but it is one of a very large number of identical radioactive atoms that can decay. We usually model this decay by half-lives (the time for half of the atoms to have decayed), which is exponential. We could instead count the number of decays in a time period, which follows a Poisson Distribution, or we could look at those many Bernoulli Trials with a very small probability, which is Binomial. It is all the same process seen from different perspectives. Each perspective is equally valid, but only one will capture the aspect that you are interested in.
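The equivalence of these perspectives can be checked with a small simulation. The sketch below is only illustrative: the number of atoms, the per-atom decay probability and the number of intervals are made-up values, and it assumes the pool of atoms is so large that depletion over the simulated period can be ignored.

```r
set.seed(1)

n_atoms     <- 1e6    # hypothetical number of atoms
p_decay     <- 3e-6   # hypothetical chance that one atom decays in one interval
lambda      <- n_atoms * p_decay  # expected decays per interval (3)
n_intervals <- 10000  # hypothetical number of observation intervals

# View 1: every atom is a Bernoulli trial, so decays per interval are Binomial
binom_counts <- rbinom(n_intervals, size = n_atoms, prob = p_decay)

# View 2: exponential waiting times between decays at rate lambda,
# then count how many decays fall in each unit-length interval
waits <- rexp(2 * n_intervals * lambda, rate = lambda)
times <- cumsum(waits)
times <- times[times < n_intervals]
exp_counts <- tabulate(floor(times) + 1, nbins = n_intervals)

# View 3: the Poisson probabilities themselves
k <- 0:8
comparison <- data.frame(
  k           = k,
  binomial    = as.numeric(table(factor(binom_counts, levels = k))) / n_intervals,
  exponential = as.numeric(table(factor(exp_counts, levels = k))) / n_intervals,
  poisson     = dpois(k, lambda)
)
print(round(comparison, 3))
```

The three probability columns should agree closely, up to simulation noise.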
The examples given by Wright are:
Red blood cell counts on a haemocytometer slide so long as there is no clumping.
The number of flour beetles in equal sized cubes of flour in a container.
The numbers of animals in a quadrat.
The number of animals under a board.
The frequencies of chiasmata in the short chromosomes of Vicia faba.
Wright shows that deviations arise when the events are not independent, either because aggregation is favoured or because there is repulsion, as in territorial cases.
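One quick diagnostic for this kind of deviation is the ratio of the sample variance to the sample mean, which is 1 in expectation for Poisson data, above 1 under aggregation and below 1 under repulsion. This index is a common check rather than something taken from Wright's text. The sketch below applies it to the beetle and chiasmata frequencies used in the plotting code in this post; the helper names `expand_counts` and `dispersion_index` are invented for illustration.

```r
# Rebuild a vector of raw counts from a frequency table
expand_counts <- function(values, freqs) rep(values, times = freqs)

# Frequencies taken from the plotting code in this post
beetles   <- expand_counts(0:4, c(237, 161, 45, 3, 2))
chiasmata <- expand_counts(0:6, c(0, 2, 21, 104, 106, 42, 6))

# Index of dispersion: sample variance divided by sample mean
dispersion_index <- function(x) var(x) / mean(x)

dispersion_index(beetles)    # close to 1 suggests Poisson-like scatter
dispersion_index(chiasmata)  # well below 1 suggests repulsion between events
```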
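library(ggplot2)

# Beetle counts per cube: 237 cubes with 0 beetles, 161 with 1, and so on
pois2 <- c(rep(0, 237), rep(1, 161), rep(2, 45), rep(3, 3), rep(4, 2))
pois2 <- as.data.frame(pois2)
colnames(pois2) <- c("Beetles")

# Poisson probabilities for 0 to 5 beetles, with mean 0.598 (the sample mean)
df_dist2 <- data.frame(x = 0:5, prob = dpois(0:5, 0.598))

# Observed proportions (histogram) with the fitted Poisson overlaid (line);
# linewidth replaces the size aesthetic deprecated for lines in ggplot2 3.4.0
ggplot(pois2, aes(x = Beetles, y = after_stat(density))) +
  geom_histogram(fill = "#560591", color = "White", binwidth = 1) +
  geom_line(data = df_dist2, aes(x = x, y = prob), color = "orange", linewidth = 0.7) +
  labs(title = "Beetles in cubes in a jar of flour") +
  xlab("Beetles")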
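# Chiasmata counts: no observations with 0, 2 with 1, 21 with 2, and so on
pois6 <- c(rep(0, 0), rep(1, 2), rep(2, 21), rep(3, 104), rep(4, 106), rep(5, 42), rep(6, 6))
pois6 <- as.data.frame(pois6)
colnames(pois6) <- c("chiasmata")

# Poisson probabilities for 0 to 12 chiasmata, with mean 3.651 (the sample mean)
df_dist6 <- data.frame(x = 0:12, prob = dpois(0:12, 3.651))

# Observed proportions (histogram) with the fitted Poisson overlaid (line);
# linewidth replaces the size aesthetic deprecated for lines in ggplot2 3.4.0
ggplot(pois6, aes(x = chiasmata, y = after_stat(density))) +
  geom_histogram(fill = "#560591", color = "White", binwidth = 1) +
  geom_line(data = df_dist6, aes(x = x, y = prob), color = "orange", linewidth = 0.7) +
  labs(title = "Chiasmata in short chromosomes of Vicia faba") +
  xlab("Chiasmata")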
Most of the calculated theoretical distributions appear to fit well. The exception is the chiasmata, where the assumption of independence is clearly wrong. A chi-squared test shows that the model does not fit the diplopod data either, although this is not obvious from visual inspection alone.
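As an illustration of how such a chi-squared goodness-of-fit test might be set up, here is a minimal sketch using the chiasmata frequencies from the plotting code above. It is one reasonable construction, not necessarily the exact test used for this post: it assumes the Poisson mean is estimated from the data (costing one degree of freedom) and that the final cell absorbs the whole upper tail.

```r
# Frequencies of chiasmata counts 0..6, copied from the plotting code above
observed <- c(0, 2, 21, 104, 106, 42, 6)
k <- 0:6
n <- sum(observed)

# Estimate the Poisson mean from the data (about 3.651)
lambda_hat <- sum(k * observed) / n

# Expected Poisson frequencies; the last cell covers "6 or more"
prob <- dpois(k, lambda_hat)
prob[length(prob)] <- 1 - sum(prob[-length(prob)])
expected <- n * prob

# Pearson chi-squared statistic
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: cells - 1, minus 1 more for the estimated mean
df <- length(observed) - 1 - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

c(chi_sq = chi_sq, df = df, p_value = p_value)
```

A very small p-value here agrees with the visual impression that the chiasmata counts are far more tightly clustered than a Poisson model allows.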
References
Wakeley, John. 2008. Coalescent Theory: An Introduction. Greenwood Village: Roberts and Company.