Prussian Cavalry Example
In 1898, the Russian economist and statistician Ladislaus Josephovich Bortkiewicz published a now-classic analysis of the probability distribution of the number of Prussian soldiers accidentally killed by horse kicks. The data came from ten army corps observed over 20 years, giving 200 corps-year observations in which a total of 122 soldiers were killed by horse kicks. On average, the number of deaths per corps per year is
\[ \lambda = \frac{122}{200} = 0.61\]
Using \(\lambda = 0.61\), Bortkiewicz applied the Poisson formula
\[ P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}\]
to predict the probability of \(x\) deaths in a corps-year, for \(x = 0, 1, 2, 3, 4, 5, 6\):
library(magrittr)  # provides the %>% pipe
dpois(0:6, lambda = 0.61) %>% round(4)
## [1] 0.5434 0.3314 0.1011 0.0206 0.0031 0.0004 0.0000
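As a check on the formula, the probabilities can be computed directly from \(e^{-\lambda}\lambda^{x}/x!\) and compared with `dpois()`. The sketch below also converts them into expected counts out of 200 corps-years and places them beside the frequencies commonly quoted for Bortkiewicz's ten-corps data (109, 65, 22, 3 and 1 corps-years with 0 to 4 deaths). Those observed counts come from the standard textbook presentation of this dataset, not from the text above.

```r
# Poisson pmf written out by hand: P(X = x) = exp(-lambda) * lambda^x / x!
lambda <- 122 / 200          # 0.61 deaths per corps-year
x <- 0:6
p_manual <- exp(-lambda) * lambda^x / factorial(x)

# Should agree with R's built-in dpois() up to floating-point error
all.equal(p_manual, dpois(x, lambda = lambda))

# Expected number of corps-years (out of 200) with 0..4 deaths, next to the
# frequencies commonly quoted for Bortkiewicz's ten-corps data
observed <- c(109, 65, 22, 3, 1)
expected <- 200 * dpois(0:4, lambda = lambda)
cbind(deaths = 0:4, observed, expected = round(expected, 1))
```

The observed and expected columns agree closely, which is a large part of why this dataset became a standard illustration of the Poisson distribution.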
Simulation Exercise
set.seed(12345)
Cavalry <- rpois(200,lambda=0.61)
Cavalry
## [1] 1 2 1 2 0 0 0 0 1 3 0 0 1 0 0 0 0 0 0 2 0 0 2 1 1 0 1 1 0 0 1 0 0 1 0 0 1
## [38] 2 1 0 1 0 2 1 0 0 0 0 0 1 2 1 0 0 1 0 1 0 0 0 1 0 3 1 3 0 2 0 1 2 1 0 0 0
## [75] 0 1 2 1 0 0 1 0 0 0 0 0 1 0 1 0 2 1 0 1 1 0 1 1 0 0 0 1 2 1 0 2 1 1 0 0 1
## [112] 0 1 0 1 2 3 1 0 0 1 0 0 1 1 1 2 0 0 1 0 1 1 1 2 2 1 0 0 1 0 1 0 1 0 2 3 0
## [149] 2 0 1 1 1 0 0 1 1 0 1 1 1 1 1 1 0 0 3 0 1 0 2 0 1 0 0 1 1 2 1 1 0 0 0 0 2
## [186] 1 0 2 0 0 1 2 0 0 3 0 0 1 0 0
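One quick way to compare the simulated sample with the model is to tabulate how many of the 200 simulated corps-years had 0, 1, ..., 6 deaths and set those counts beside the expected counts \(200 \times P(X = x)\). This sketch repeats the seed and draw from the exercise so that it runs on its own; the names `sim_counts` and `expected_sim` are illustrative.

```r
set.seed(12345)
Cavalry <- rpois(200, lambda = 0.61)

# Frequency of 0, 1, ..., 6 deaths; factor() with fixed levels keeps
# categories that never occur in the sample (they get a count of 0)
sim_counts <- table(factor(Cavalry, levels = 0:6))

# Counts predicted by a Poisson(0.61) model for 200 corps-years
expected_sim <- 200 * dpois(0:6, lambda = 0.61)

data.frame(deaths = 0:6,
           simulated = as.vector(sim_counts),
           expected = round(expected_sim, 1))
```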
Mean and Variance
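- For a Poisson distribution the mean and variance both equal \(\lambda\), so the sample mean and sample variance should be numerically close.
- With a larger sample, both would converge more closely to \(\lambda = 0.61\).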
mean(Cavalry);var(Cavalry)
## [1] 0.705
## [1] 0.6612814
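To illustrate the convergence point, the sketch below draws Poisson(0.61) samples of increasing size and reports each sample's mean and variance. The sample sizes and the seed (2024) are arbitrary choices for illustration. The block saves and restores the random-number state so that it does not alter the draws made later in the session.

```r
# Remember the current RNG state (if any) so this detour does not
# change the random numbers drawn later in the session
had_seed <- exists(".Random.seed", envir = globalenv())
if (had_seed) old_seed <- get(".Random.seed", envir = globalenv())

set.seed(2024)  # arbitrary seed, for reproducibility only
lambda <- 0.61
sizes <- c(200, 2000, 20000, 200000)

# For each sample size, draw a Poisson(lambda) sample and record its
# sample mean and sample variance; both should approach lambda
res <- t(sapply(sizes, function(n) {
  X <- rpois(n, lambda = lambda)
  c(n = n, mean = mean(X), var = var(X))
}))
round(res, 4)

# Put the RNG state back the way it was
if (had_seed) assign(".Random.seed", old_seed, envir = globalenv())
```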
Dispersion Parameter
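\[ \theta = \frac{Var(X)}{E(X)} \approx 1\]
For a Poisson variable \(Var(X) = E(X) = \lambda\), so \(\theta = 1\) exactly, and the ratio computed from a sample should be close to 1. A ratio well above 1 suggests over-dispersion; one well below 1 suggests under-dispersion.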
Theta <- var(Cavalry)/mean(Cavalry)
Theta %>% round(3)
## [1] 0.938
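One standard way to judge whether a ratio such as 0.938 is compatible with a Poisson model is the index of dispersion test, which is not part of the original exercise. Under the Poisson assumption, \(D = (n-1)\,\hat\theta\) is approximately chi-square with \(n - 1\) degrees of freedom. The sketch below regenerates the simulated sample and applies the test; doubling the smaller tail to get a two-sided p-value is one common convention among several.

```r
set.seed(12345)
Cavalry <- rpois(200, lambda = 0.61)

n <- length(Cavalry)
theta_hat <- var(Cavalry) / mean(Cavalry)

# Index of dispersion statistic: approximately chi-square(n - 1)
# when the data are Poisson
D <- (n - 1) * theta_hat

# Two-sided p-value: small D points to under-dispersion,
# large D points to over-dispersion
p_value <- 2 * min(pchisq(D, df = n - 1),
                   pchisq(D, df = n - 1, lower.tail = FALSE))
c(D = D, p_value = p_value)
```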
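Simulation Exercise
How much does \(\theta\) vary from sample to sample by chance alone? The loop below draws \(N = 100{,}000\) samples of size 200 from a Poisson distribution with \(\lambda = 5\) and records the dispersion ratio of each sample.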
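N <- 100000                      # number of simulated samples
Theta <- numeric(N)              # preallocate storage for the N ratios
for (i in 1:N) {
  X <- rpois(200, lambda = 5)    # one sample of size 200
  Theta[i] <- var(X) / mean(X)   # its dispersion ratio
}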
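The same simulation can be written without an explicit loop using base R's `replicate()`, which evaluates an expression repeatedly and collects the results into a vector. This sketch uses a smaller, hypothetical `N_rep = 10000` and a separate vector `Theta_rep` so that it runs quickly and leaves the loop's `Theta` untouched; the seed is arbitrary.

```r
set.seed(2024)   # arbitrary seed, for reproducibility only
N_rep <- 10000   # fewer replications than the loop, to keep this quick

# Each evaluation draws one sample of size 200 and returns its
# variance-to-mean ratio; replicate() gathers the N_rep results
Theta_rep <- replicate(N_rep, {
  X <- rpois(200, lambda = 5)
  var(X) / mean(X)
})

summary(Theta_rep)
```

The choice between the two forms is mainly about readability; a preallocated loop like the one above is generally not slower.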
95% Tolerance Interval of Theta Values
hist(Theta, breaks = 100, col=c("lightblue","lightpink"))
quantile(Theta, c(0.025,0.975)) %>% round(3)
##  2.5% 97.5%
## 0.816 1.207
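The simulated interval can be compared with a theoretical approximation, which is an addition to the original exercise. If the data are Poisson, \((n-1)\theta\) is approximately chi-square with \(n - 1\) degrees of freedom (the basis of the index of dispersion test), so dividing chi-square quantiles by \(n - 1 = 199\) gives an approximate central 95% range for \(\theta\).

```r
# Approximate 2.5% and 97.5% points of Theta for samples of size n,
# using (n - 1) * Theta ~ chi-square(n - 1) under a Poisson model
n <- 200
bounds <- qchisq(c(0.025, 0.975), df = n - 1) / (n - 1)
round(bounds, 3)
```

This gives roughly 0.81 and 1.21, close to the simulated bounds of 0.816 and 1.207.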