set.seed(100)
Poisson distribution is a probability distribution function, which is commonly used to model discrete events. A Poisson process generator is a generator which emits out the events based on Poisson distribution. It can be viewed as a recurrent Bernoulli’s process with occurrence of success indicating occurrence of event.
Bernoulli’s process is the fundamental building block above which Poisson process is built.
A simple example of BP is coin tossing (coin may be unfair also). We suppose that occurrence of HEAD (H) is a success and that an event has occurred. Let us assume that the coin is massively unfair and chances that it will fall HEAD is only 10% (p = 0.1).
We assume that the coin is tossed 100 times in an hour at a constant rate of one toss every 1/100th of hour.
We run a simulation of the whole process 10000 times (meaning simulation is for 10000 hours). The printed runs of “H and”T represent the occurrence of HEAD and TAIL over 1 hour for first 5 hours.
samp_100 <- rerun(10000, sample(c(1, 0), 100, replace = TRUE, prob = c(0.1, 0.9)))
samp_100[1:5]
## [[1]]
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0
## [36] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## [71] 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0
##
## [[2]]
## [1] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
## [36] 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
##
## [[3]]
## [1] 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1
## [71] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
##
## [[4]]
## [1] 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
## [36] 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
## [71] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
##
## [[5]]
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
## [36] 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0
## [71] 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0
We can calculate the sum of occurrence of success (event) in 1 hour for each hour. We will have 10000 values. We show the values for first 100 one hour interval.
sum_success <- samp_100 %>% map_dbl(~ sum(.))
sum_success[1:100]
## [1] 9 12 10 11 14 11 10 11 11 14 13 14 8 9 12 10 12 16 7 12 2 8 9
## [24] 7 10 12 10 7 10 11 10 9 11 13 10 11 10 9 12 13 6 11 10 15 10 12
## [47] 8 6 8 7 12 7 11 10 11 6 8 12 11 7 9 7 8 10 10 15 9 14 17
## [70] 11 7 11 4 11 9 7 11 5 14 12 6 13 10 8 13 6 12 9 12 11 5 9
## [93] 11 7 13 15 9 9 5 12
Before, proceeding further, we can safely assume, that the average of the number of events occurring over 1 hour is 100 (number of coin tosses in an hour) multiplied by 0.1 (chances of success per coin toss) = 10.
The above model is the approximation of Poisson Process. The problem with the above model is that there is a smallest time interval (1/100 of an hour), below which we do not have a value for occurrence of the event.
We can better approximate the Poisson Process by subdividing the 1 hour more times (lets say, 1000) (more the subdivisions, better approximation it is). It means, that we are tossing a different coin 1000 times in 1 hour, which has the chance of success of 0.01 (10/1000), which leads to the same average number of events occurring over 1 hour (1000 * 0.01) of 10.
Lets run the simulation again
samp_1000 <- rerun(10000, sample(c(1, 0), 1000, replace = TRUE, prob = c(0.01, 0.99)))
samp_1000[1]
## [[1]]
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [35] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [69] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [103] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [137] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [171] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [205] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [239] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [273] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [307] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [341] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [375] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## [409] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## [443] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [477] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [511] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [545] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [579] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [613] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [647] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [681] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [715] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [749] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [783] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [817] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [851] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [885] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [919] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [953] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [987] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Lets calculate the number of events occurring in 1 hour
sum_success_1000 <- samp_1000 %>% map_dbl(~ sum(.))
sum_success_1000[1:100]
## [1] 6 13 8 13 8 9 10 15 8 13 8 13 9 11 12 4 8 12 16 16 6 14 10
## [24] 6 10 5 11 14 7 13 10 10 9 6 11 7 13 7 4 11 8 11 14 13 13 15
## [47] 19 10 13 7 4 10 15 12 7 9 13 8 7 11 11 8 12 9 8 9 7 16 5
## [70] 14 8 4 9 12 12 11 3 19 12 11 13 6 14 10 14 7 15 9 12 8 5 9
## [93] 10 12 6 9 7 9 15 8
Below is the histogram of the number of events occurring in 1 hour
hist(sum_success_1000, xlab = "Number of events occuring in 1 hour")
data_frame(binom = sum_success_1000, pois = rpois(10000, lambda = 10)) %>% gather("process", "num_event") %>% ggplot(aes(x = num_event, colour = process)) + geom_density() + xlab("Number of events occurring in 1 hour")
BOTH APPEARS ALMOST SAME.
From above we can intuitively derive the assumptions for us to tell that any event generating process is Poisson process.
When a process follows Poisson distribution, the time to event occurence follows exponential distribution.
We are concerned about the inter-event time interval.
time <- samp_1000 %>% map(~ diff(c(0, which(. == 1)))) %>% unlist() %>% `*`(0.001)
data_frame(bernoulli = time, exp = rexp(length(time), rate = 10)) %>% gather("process", "inter_event_time") %>% ggplot(aes(x = inter_event_time, colour = process)) + geom_density() + xlab("Inter-event time interval")
The above simulation shows that the distribution generated by bernoulli’s process is same as exponential distribution with rate = 10.
One important property of the distribution is that, the time that has elapsed without any event occurring has got no bearing on the time after which next event will occur. All processes, which generate events in a predictable manner do not follow exponential distribution and are not Poisson processes.
Bye.