In one of the first studies of the Poisson distribution, von Bortkiewicz considered the frequency of deaths from horse kicks in the Prussian army corps.
From the study of 14 corps over a 20-year period, he obtained the data shown below.
Source: Ladislaus von Bortkiewicz, Das Gesetz der kleinen Zahlen [The law of small numbers] (Leipzig, Germany: B.G. Teubner, 1898).
4. Beispiel: Die durch Schlag eines Pferdes im preussischen Heere Getöteten.
[4th Example: Those killed in the Prussian army by a horse’s kick.]
Retrieved from https://archive.org/details/dasgesetzderklei00bortrich/page/n63 (Page 24 in source.)
library(knitr)      # kable()
library(kableExtra) # column_spec(), kable_styling(), and the %>% pipe
NumberOfDeaths = 0:4 # Deaths per corps per year range from 0 through 4
# Actual values, tallied from Bortkiewicz's table
actuals = c(144,91,32,11,2)
# Total number of individuals who died from horse kicks
TotalDeaths = as.numeric(NumberOfDeaths %*% actuals) # 196
# Number of observed army corps (14) * years (20)
NumberOfObs = sum(actuals) # 280
# Poisson lambda -- expected (average) deaths per year, in a typical corps
lambda = TotalDeaths/NumberOfObs # 196/280 = 0.7
# Expected counts under Poisson(lambda), rounded to whole numbers
predicted = round(dpois(NumberOfDeaths,lambda)*NumberOfObs)
table1 = cbind(NumberOfDeaths,actuals)
colnames(table1)[2] = "Number of corps with x deaths in a given year"
## display table1
table1 %>%
  kable() %>%
  column_spec(column = 1, width = "10em") %>%
  column_spec(column = 2, width = "30em") %>%
  kable_styling(c("striped", "bordered"))

| NumberOfDeaths | Number of corps with x deaths in a given year |
|---|---|
| 0 | 144 |
| 1 | 91 |
| 2 | 32 |
| 3 | 11 |
| 4 | 2 |
The total number of deaths is 196.
The total number of observations is 14 * 20 = 280
(i.e., 14 Prussian army corps observed over 20 years, from 1875 through 1894).
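As a worked check of these totals, the estimated Poisson rate is the mean number of deaths per corps per year. The symbols \(\hat{\lambda}\), \(X\), and \(k\) are notation introduced here for illustration:

\[
\hat{\lambda} = \frac{0 \cdot 144 + 1 \cdot 91 + 2 \cdot 32 + 3 \cdot 11 + 4 \cdot 2}{280} = \frac{196}{280} = 0.7
\]

Under a Poisson model with this rate, the expected number of corps-years with exactly \(k\) deaths is

\[
280 \cdot P(X = k) = 280 \cdot \frac{e^{-0.7} \, 0.7^{k}}{k!}
\]

For \(k = 0\), for example, this gives \(280 \cdot e^{-0.7} \approx 139.0\), which rounds to 139.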
table2 = cbind(table1, predicted)
colnames(table2)[3]="Poisson prediction"
table2 %>%
  kable() %>%
  column_spec(column = 1:3, width = "10em") %>%
  kable_styling(c("striped", "bordered"))

| NumberOfDeaths | Number of corps with x deaths in a given year | Poisson prediction |
|---|---|---|
| 0 | 144 | 139 |
| 1 | 91 | 97 |
| 2 | 32 | 34 |
| 3 | 11 | 8 |
| 4 | 2 | 1 |
barplot(t(table2[,2:3]),
        beside=TRUE, col=c("blue","green"), names.arg=NumberOfDeaths,
        xlab="Number of deaths in a year (in a single corps)",
        ylab="Number of corps with x deaths in a year",
        main = "Deaths from horse kicks in Prussian army corps, 1875-1894",
        legend.text = c("Actual","Predicted from Poisson(lambda=0.7)"))

### Chi-Squared test
##
## Chi-squared test for given probabilities
##
## data: actuals
## X-squared = 2.78008849, df = 4, p-value = 0.59527466
Because the p-value from the chi-squared test is high (about 0.60), we fail to reject the null hypothesis, which is:
\(H_{0}\): The Poisson distribution with \(\lambda = 0.7\) is appropriate for Bortkiewicz’s data set on deaths from horse kicks.
The data are consistent with the Poisson model: the observed and predicted counts agree closely.
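The call that produced the chi-squared output is not shown above. As a sketch only, and assuming the test was run on the rounded Poisson predictions with `rescale.p = TRUE` (the rounded predictions sum to 279 rather than 280, so they have to be rescaled before they can serve as probabilities), a call of the following form should give a statistic close to the reported 2.78 on 4 degrees of freedom. The names `fit` and `p_df3` are illustrative and do not come from the source.

```r
# Observed counts and rounded Poisson predictions, as tabulated above
actuals   <- c(144, 91, 32, 11, 2)
predicted <- round(dpois(0:4, lambda = 0.7) * sum(actuals))

# Assumption: the rounded predictions are rescaled to sum to 1.
# suppressWarnings() hides R's warning that an expected count is below 5.
fit <- suppressWarnings(chisq.test(actuals, p = predicted, rescale.p = TRUE))
fit$statistic   # chi-squared statistic
fit$parameter   # degrees of freedom: number of categories minus 1
fit$p.value

# lambda was estimated from the same data, so a common adjustment removes
# one further degree of freedom (5 categories - 1 - 1 = 3).
p_df3 <- pchisq(unname(fit$statistic), df = 3, lower.tail = FALSE)
p_df3
```

With the adjusted degrees of freedom the p-value is smaller, but it stays above 0.05, so the conclusion does not change. The expected count for 4 deaths is close to 1, which is why R warns that the chi-squared approximation may be inaccurate; pooling the last two categories is one conventional remedy.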