The Poisson Distribution

Author

Dr Andrew Dalby

The Poisson distribution is a discrete probability distribution that describes the number of events occurring per unit of some quantity, such as time, mass, distance or area. There is no upper limit on the number of events, and the events occur at random, independently of one another, at a constant average rate.

Examples could be:

The number of flaws per square metre of cloth
The number of accidents on a stretch of road
The number of people attending a doctor’s surgery each day
The number of errors in a document
The number of changes in a genetic sequence

If the event is rare (its probability in any small part of the interval is low) and \(\lambda\) is the mean number of events per unit, then the probability of observing x events is:

\(P(X=x) = e^{-\lambda} \dfrac{\lambda^{x}}{x!}\) for \(x = 0, 1, 2, 3, \dots\)
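As a quick check, the formula can be typed in directly and compared with R's built-in dpois(). This is a minimal sketch: poisson_pmf is a hypothetical helper name, and \(\lambda = 2\) is just an illustrative value.

```r
# Hypothetical helper: the Poisson formula e^(-lambda) * lambda^x / x!
poisson_pmf <- function(x, lambda) {
  exp(-lambda) * lambda^x / factorial(x)
}

x <- 0:10
lambda <- 2   # illustrative mean

# Side by side: the formula versus R's built-in dpois()
cbind(x, by_formula = poisson_pmf(x, lambda), dpois = dpois(x, lambda))
```

For large x, factorial() overflows to Inf (beyond x = 170), so dpois() is the safer choice in practice.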

\[ \sum^{\infty}_{x=0}P(X=x)= e^{-\lambda} \left[ 1 + \lambda + \dfrac{\lambda^{2}}{2!}+\dfrac{\lambda^{3}}{3!} + \dots \right] = e^{-\lambda}e^{\lambda}=1 \]

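The series in the square brackets is the Maclaurin series for \(e^{\lambda}\), so the probabilities sum to 1. This means that P(X=x) is a valid probability distribution for the discrete random variable X.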
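The sum-to-one result can also be checked numerically. Here the infinite sum is truncated at x = 100, an assumption that is harmless for these values of \(\lambda\) because the omitted tail is vanishingly small.

```r
# The means used in the plots further down
lambdas <- c(0.5, 2, 4, 8)

# Truncated version of the infinite sum of P(X = x) for each lambda
totals <- sapply(lambdas, function(l) sum(dpois(0:100, l)))
totals
```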

The mean of a Poisson distribution, E(X), is:

\[ E(X)= \sum^{\infty}_{x=0}x e^{-\lambda}\dfrac{\lambda^{x}}{x!} = \lambda e^{-\lambda} e^{\lambda} = \lambda \]

Derivation of the Mean and Variance

\[ E(X)= (0)(e^{-\lambda})+ (1)\lambda(e^{-\lambda}) + (2)\left( \dfrac{\lambda^{2}}{2!} \right)(e^{-\lambda}) +(3)\left (\dfrac{\lambda^{3}}{3!} \right)(e^{-\lambda}) + \dots \]

\[ = e^{-\lambda}\left[ \lambda+\lambda^{2}+\dfrac{\lambda^{3}}{2!}+\dfrac{\lambda^{4}}{3!} + \dots\right ] \]

\[ = \lambda e^{-\lambda}\left [ 1 + \lambda + \dfrac{\lambda^{2}}{2!}+\dfrac{\lambda^{3}}{3!} + \dots \right] = \lambda e^{-\lambda}e^{\lambda} = \lambda \]

\(\text{Var}(X)=E(X^{2})-\mu^{2}\)

\(\mu = E(X) = \lambda\)

\[ E(X^{2})= \sum^{\infty}_{x=0}x^{2}\dfrac{\lambda^{x}}{x!}e^{-\lambda} \]

\[ = e^{-\lambda} \left[(0)(1)+(1)(\lambda) +(4) \left(\dfrac{\lambda^{2}}{2!} \right) + (9) \left (\dfrac{\lambda^{3}}{3!} \right) + (16) \left (\dfrac{\lambda^{4}}{4!} \right) + \dots \right] \]

\[ =\lambda e^{-\lambda} \left[ 1 + 2\lambda + \dfrac{3\lambda^{2}}{2!}+\dfrac{4\lambda^{3}}{3!} + \dots\right ] \]

Now for the tricky bit: the series in the square brackets is the derivative, with respect to \(\lambda\), of another series:

\[ =\lambda e^{-\lambda} \dfrac{d}{d\lambda} \left[ \lambda + \lambda^{2} + \dfrac{\lambda^{3}}{2!} + \dfrac{\lambda^{4}}{3!}+ ... \right] \]

The series being differentiated is \(\lambda\) times the Maclaurin series for \(e^{\lambda}\), so it reduces to:

\[ =\lambda e ^{-\lambda} \dfrac{d}{d \lambda} \left[ \lambda e^{\lambda} \right] \]

\[ =\lambda e^{-\lambda}[e^{\lambda}+ \lambda e^{\lambda}] = \lambda + \lambda^{2} \]

Therefore \(\text{Var}(X) = (\lambda + \lambda^{2}) - \lambda^{2} = \lambda\), so the mean and the variance of a Poisson distribution are both equal to \(\lambda\).
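The mean and variance results can be checked in R. The sketch below uses the illustrative choice \(\lambda = 4\), truncates the infinite sums at x = 200, computes E(X) and \(E(X^{2}) - \mu^{2}\) directly from dpois(), and uses R's D() function (from the stats package, attached by default) to confirm the derivative step \(\dfrac{d}{d\lambda}[\lambda e^{\lambda}] = e^{\lambda} + \lambda e^{\lambda}\).

```r
lambda <- 4          # illustrative mean
x <- 0:200           # truncation point; the tail beyond is negligible
px <- dpois(x, lambda)

mean_x <- sum(x * px)                 # E(X)
var_x  <- sum(x^2 * px) - mean_x^2    # E(X^2) - mu^2
c(mean = mean_x, variance = var_x)

# Symbolic derivative of lambda * e^lambda with respect to lambda
d <- D(quote(lambda * exp(lambda)), "lambda")
d

# lambda * e^(-lambda) times the derivative should equal lambda + lambda^2
lambda * exp(-lambda) * eval(d)
```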

library(ggplot2)
# Bar chart of Poisson probabilities for x = 0, ..., 20 with lambda = 2
x <- 0:20
lambda <- 2
p <- dpois(x, lambda)
data <- data.frame(x, p)
plot <- ggplot(data = data, aes(x = x, y = p)) +
  geom_col(fill = "cornflowerblue") +
  labs(title = "Poisson Probability Distribution for lambda=2", x = "Number", y = "Probability")
plot

library(ggplot2)
# Bar chart of Poisson probabilities for x = 0, ..., 20 with lambda = 4
x <- 0:20
lambda <- 4
p <- dpois(x, lambda)
data <- data.frame(x, p)
plot <- ggplot(data = data, aes(x = x, y = p)) +
  geom_col(fill = "olivedrab") +
  labs(title = "Poisson Probability Distribution for lambda=4", x = "Number", y = "Probability")
plot

library(ggplot2)
# Bar chart of Poisson probabilities for x = 0, ..., 20 with lambda = 8
x <- 0:20
lambda <- 8
p <- dpois(x, lambda)
data <- data.frame(x, p)
plot <- ggplot(data = data, aes(x = x, y = p)) +
  geom_col(fill = "coral") +
  labs(title = "Poisson Probability Distribution for lambda=8", x = "Number", y = "Probability")
plot

library(ggplot2)
# Bar chart of Poisson probabilities for x = 0, ..., 6 with lambda = 0.5
x <- 0:6
lambda <- 0.5
p <- dpois(x, lambda)
data <- data.frame(x, p)
plot <- ggplot(data = data, aes(x = x, y = p)) +
  geom_col(fill = "darkorchid") +
  labs(title = "Poisson Probability Distribution for lambda=0.5", x = "Number", y = "Probability")
plot

Imagine that the mean number of misprints per page of a book is 0.1.

Then the probability that a page contains more than one misprint is:

\[ P(X=x) = \dfrac{0.1^{x}}{x!}e^{-0.1} \]

\[ P(X>1) = 1 - [P(X=0) + P(X=1)] \]

\[ = 1 - e^{-0.1}(1 + 0.1) \approx 0.0047 \]

1-ppois(1,0.1)

[1] 0.00467884

If we change the basic interval from 1 page to 20 pages, the mean becomes \(20 \times 0.1 = 2\) misprints.

Now we can calculate the probability that there will be more than one misprint in a 20-page chapter:

\[ P(Y=y) = \dfrac{2^{y}}{y!}e^{-2} \]

\[ P(Y>1) = 1 - [P(Y=0) + P(Y=1)] \]

\[ = 1 - e^{-2}(1 + 2) \approx 0.594 \]

1-ppois(1,2)

[1] 0.5939942
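Because the mean scales with the length of the interval, the calculation can be wrapped in a small function of the number of pages. This is a sketch assuming the rate of 0.1 misprints per page from the example; prob_more_than_one is a hypothetical name.

```r
# Hypothetical helper: P(more than one misprint) in n pages at a given rate per page
prob_more_than_one <- function(n_pages, rate_per_page = 0.1) {
  lambda <- rate_per_page * n_pages   # the mean scales with the interval
  1 - exp(-lambda) * (1 + lambda)     # 1 - [P(0) + P(1)]
}

# One page and one 20-page chapter, as in the text
prob_more_than_one(c(1, 20))

# The same values from the built-in upper tail
ppois(1, c(0.1, 2), lower.tail = FALSE)
```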

A man travels to work by bus. He always arrives at the bus stop at the same time and if he has to wait more than five minutes for a bus he is late for work. The number of buses arriving at the bus stop is a Poisson variate with a mean of 2 buses every five minutes.

a) Find the probability that on any one day he will be late for work.

b) Find the probability that he will be late for work at least once in a five-day working week.

a) Let X be the number of buses arriving at the stop in a five-minute period. Then

\[ P(X=x)=\dfrac{2^{x}}{x!}e^{-2} \]

The man will be late for work if no bus arrives in the next five minutes, that is, when \(X = 0\).

\(P(X=0)= e^{-2}=0.135\)

ppois(0,2)

[1] 0.1353353

b) In a five-day week the man can be late on anywhere from 0 to 5 days. The number of days on which he is late is a different variable from the one in part a), and it is not Poisson distributed. If Y is the number of days on which he is late then, assuming the days are independent, each day is a Bernoulli trial with the probability calculated in part a), and Y is binomially distributed: \(Y \sim B(5, 0.135)\).

Therefore \(P(Y \ge 1) = 1 - P(Y=0)\)

\[=1 - (1-0.135)^{5} = 0.516\]

1-pbinom(0,5,0.135)

[1] 0.5157378
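The answer above uses the rounded value 0.135. The sketch below carries the exact \(e^{-2}\) through to part b), which gives approximately 0.517, and checks the two-stage model by simulation. Like the binomial model, it assumes the five days are independent; the seed and the number of simulated weeks are arbitrary choices.

```r
# Part a): probability of no bus in five minutes (Poisson, mean 2)
p_late <- dpois(0, 2)                       # equals exp(-2)

# Part b): late days per week ~ Binomial(5, p_late), days assumed independent
p_late_week <- 1 - dbinom(0, 5, p_late)
p_late_week

# Simulation check: bus counts for 5 days in each of many weeks
set.seed(1)
weeks <- 100000
buses <- matrix(rpois(5 * weeks, 2), nrow = weeks)  # one row per week
late_days <- rowSums(buses == 0)                    # days with no bus
mean(late_days >= 1)                                # should be close to p_late_week
```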

Slides are prepared from plant material from two different sources, A and B. The number of cells with two nuclei on a slide is randomly (Poisson) distributed: for slides made from source A the mean number of double nucleus cells is 1 per slide, while slides made from source B have a mean of 0.5 double nucleus cells per slide.

Find the probability that:

1) a slide made from source A has 2 double nucleus cells,
2) a slide made from source B has 1 double nucleus cell,
3) a slide chosen at random has exactly one double nucleus cell if equal numbers of slides are prepared from each source.

dpois(2,1)

[1] 0.1839397

dpois(1,0.5)

[1] 0.3032653

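3) The answer given in the book is the sum, over the two sources, of the probabilities of a slide having more than one double nucleus cell. This is clearly wrong: it gives the probability of two or more cells rather than exactly one, and it adds the probabilities for the two sources instead of weighting each by the chance of choosing that source.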

ppois(1, 1, lower.tail = FALSE) + ppois(1, 0.5, lower.tail = FALSE)

[1] 0.3544451

The answer I think you should use combines the probabilities into a compound event using the law of total probability. A randomly chosen slide comes from source A with probability 0.5 and from source B with probability 0.5, so \(P(\text{exactly one}) = 0.5\,P(X_A = 1) + 0.5\,P(X_B = 1)\):

0.5*(dpois(1,1))+0.5*(dpois(1,0.5))

[1] 0.3355724
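A simulation makes the compound event concrete. In this sketch each simulated slide first picks a source with probability 0.5 each, then draws its count of double nucleus cells from the Poisson mean for that source; the proportion of slides with exactly one such cell should settle near 0.336. The seed and sample size are arbitrary.

```r
set.seed(42)
n_slides <- 100000

# Choose a source for each slide with equal probability
slide_source <- sample(c("A", "B"), n_slides, replace = TRUE)

# Poisson counts with mean 1 for source A and 0.5 for source B
counts <- rpois(n_slides, ifelse(slide_source == "A", 1, 0.5))

# Proportion of simulated slides with exactly one double nucleus cell
mean(counts == 1)

# Exact mixture probability from the text
0.5 * dpois(1, 1) + 0.5 * dpois(1, 0.5)
```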

When I first learnt about the Poisson distribution, it was as a version of the binomial distribution in which the probability of success is very small. The Poisson is the limit of the binomial \(B(n, p)\) as \(n \to \infty\) and \(p \to 0\) with \(np = \lambda\) held fixed, so for large n and small p the Poisson with \(\lambda = np\) is a good approximation to the binomial. This derivation from the binomial is how the Poisson developed historically, but from a mathematical perspective the Poisson is also related to the exponential distribution. If the times between events are independent and exponentially distributed with rate \(\lambda\), as in radioactive decay, then the number of events in a time period of length t is Poisson distributed with mean \(\lambda t\). For radioactive decay the rate declines over long times because less of the original material remains, so a constant rate is only a reasonable assumption over relatively short periods.

The advantage of approximating the binomial by the Poisson was that Poisson probabilities were easier to calculate by hand from the formula.
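The link with the exponential distribution described above can be illustrated by simulation. The sketch below assumes a constant rate of 3 events per unit time (an arbitrary value), so it ignores the slow decline of a real radioactive source. It generates exponential gaps between events with rexp(), counts how many events fall in the interval [0, 1], and compares the counts with the Poisson.

```r
set.seed(7)
rate <- 3          # assumed events per unit time
n_runs <- 20000

# Count events in [0, 1] when the gaps between events are exponential
count_events <- function(rate) {
  t <- 0
  n <- 0
  repeat {
    t <- t + rexp(1, rate)   # waiting time to the next event
    if (t > 1) return(n)
    n <- n + 1
  }
}

counts <- replicate(n_runs, count_events(rate))

# The mean and variance should both be close to the rate
c(mean = mean(counts), variance = var(counts))

# Observed proportions versus Poisson probabilities for 0 to 6 events
rbind(simulated = as.vector(table(factor(counts, levels = 0:6))) / n_runs,
      poisson   = dpois(0:6, rate))
```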

library(ggplot2)
library(cowplot)

# Compare Poisson(lambda = 2) with Binomial(n = 100, p = 0.02); note np = 2
x <- 0:15

lambda <- 2
p <- dpois(x, lambda)
datap <- data.frame(x, p)
Poisson <- ggplot(data = datap, aes(x = x, y = p)) +
  geom_col(fill = "darkorchid") +
  labs(title = "Poisson, \u03BB=2", x = "Number", y = "Probability")

b <- dbinom(x, 100, 0.02)
datab <- data.frame(x, b)
Binomial <- ggplot(data = datab, aes(x = x, y = b)) +
  geom_col(fill = "cornflowerblue") +
  labs(title = "Binomial, n=100, p=0.02", x = "Number", y = "Probability")

# Show the two distributions side by side, labelled A and B
plot_grid(Poisson, Binomial, labels = "AUTO")
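The approximation improves as n grows with np held fixed. The sketch below keeps \(\lambda = np = 2\), as in the plots, and reports the largest absolute difference between the binomial and Poisson probabilities over x = 0 to 15 for increasing n; the particular values of n are arbitrary.

```r
lambda <- 2
x <- 0:15

# Largest gap between Binomial(n, lambda / n) and Poisson(lambda)
max_gap <- function(n) max(abs(dbinom(x, n, lambda / n) - dpois(x, lambda)))

n_values <- c(10, 100, 1000, 10000)
gaps <- sapply(n_values, max_gap)
data.frame(n = n_values, max_gap = gaps)
```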

1) Two dice are thrown 100 times. Find the probability of getting exactly 3 double sixes.

This is a binomial distribution with n = 100 and p = 1/36, which approximates to the Poisson with \(\lambda = np = 100/36 \approx 2.78\).

dbinom(3, 100, 1/36)

[1] 0.2254552

dpois(3, 100/36)

[1] 0.2221098

2) a) In a large population, one person in a hundred has a rhesus negative blood group. If a random sample of 300 people is taken from this population, find approximately the probability that there will be at least 5 people with a rhesus negative blood group.

b) How many people must a sample contain so that the probability of including at least one person with rhesus negative blood is greater than 0.9?

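a) The exact count is binomial, \(B(300, 0.01)\), and the Poisson approximation uses \(\lambda = np = 3\). Since \(P(X \ge 5) = P(X > 4)\), both calculations take the upper tail above 4:

pbinom(4, 300, 0.01, lower.tail = FALSE)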

[1] 0.1838888

ppois(4, 3, lower.tail = FALSE)

[1] 0.1847368

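b) We need \(P(X \ge 1) = 1- P(X=0) > 0.9\)

From the Poisson, \(\lambda = \dfrac{n}{100}\)

\[ 1-e^{-\dfrac{n}{100}} > 0.9 \]

\[ e^{-\dfrac{n}{100}} < 0.1 \]

\[ \dfrac{n}{100} > \ln 10 \approx 2.303 \]

Since n must be a whole number greater than 230.3, the sample must contain n = 231 people.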
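The inequality can also be solved by searching over n. This sketch finds the smallest n that satisfies the condition under the Poisson approximation and, for comparison, under the exact binomial \(B(n, 0.01)\). The exact binomial gives 230 rather than 231: because \(1 - p \le e^{-p}\), the Poisson slightly overstates P(X = 0) and so asks for a marginally larger sample.

```r
n <- 1:400   # candidate sample sizes

# Poisson approximation: lambda = n / 100
n_poisson <- min(n[1 - dpois(0, n / 100) > 0.9])

# Exact binomial with p = 0.01
n_binomial <- min(n[1 - dbinom(0, n, 0.01) > 0.9])

c(poisson = n_poisson, binomial = n_binomial)
```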