Random variable X is distributed \(X \sim P(\lambda)\) with mean \(\mu = \lambda\) and variance \(\sigma^2 = \lambda\) , if X=x is the number of successes in n (many) trials when the probability of success \(\lambda/n\) is small. The probability of X=k successes is \(P(X=k) = \frac {(e^{\lambda} * \lambda^k)} {k!}\) .
dpois(x, lambda) is the probability of x successes in a period when the expected number of events is lambda.ppois(q, lambda, lower.tail) is the cumulative probability of less than or equal to q successes.rpois(n, lambda) returns n random numbers from the Poisson distribution x ~ P(lambda)Example 1
What is the probability of making 2 to 4 sales in a week if the average sales rate is 3 per week?
# Using cumulative probability
ppois(q = 4, lambda = 3, lower.tail = TRUE) -
ppois(q = 1, lambda = 3, lower.tail = TRUE)
[1] 0.62
# Using exact probability
dpois(x = 2, lambda = 3) +
dpois(x = 3, lambda = 3) +
dpois(x = 4, lambda = 4)
[1] 0.64
# expected number of sales = lambda = 3
# variance = lambda = 3
library(ggplot2)
library(dplyr)
options(scipen = 999, digits = 2) # sig digits
events <- 0:10
density <- dpois(x = events, lambda = 3)
prob <- ppois(q = events, lambda = 3, lower.tail = TRUE)
df <- data.frame(events, density, prob)
df %>% ggplot( aes(x = factor(events), y = density)) +
geom_col() +
geom_text(
aes(label = round(density,2), y = density + 0.01),
position = position_dodge(0.9),
size = 3,
vjust = 0
) +
labs(title = "PMF and CDF of Poisson Distribution",
subtitle = "P(3).",
x = "Events (x)",
y = "Density") +
geom_line(aes(x = events, y = prob))
Example 2 Suppose a baseball player has a p=.3 batting average. What is the probability of X<=150 hits in n=500 at bats? X=150? X>150?
# probability of x <= 150
ppois(q = 150, lambda = .3 * 500, lower.tail = TRUE)
[1] 0.52
# probability of x = 150
dpois(x = 150, lambda = .300 * 500)
[1] 0.033
# probability of x > 150
ppois(q = 150, lambda = .3 * 500, lower.tail = FALSE)
[1] 0.48
library(ggplot2)
library(dplyr)
options(scipen = 999, digits = 2) # sig digits
hits <- 0:300
density <- dpois(x = hits, lambda = .300 * 500)
prob <- ppois(q = hits, lambda = .300 * 500, lower.tail = TRUE)
df <- data.frame(hits, density, prob)
ggplot(df, aes(x = hits, y = density)) +
geom_col() +
labs(title = "Poisson(150)",
subtitle = "PMF and CDF of Poisson(150) distribution.",
x = "Hits (x)",
y = "Density") +
geom_line(data = df, aes(x = hits, y = prob))
NA
NA
Example 3 If electricity power failures occur according to a Poisson distribution with an average of 3 failures every twenty weeks, calculate the probability that there will not be more than one failure during a particular week.
#"Not more than one failure" means we need to include the probabilities
#for "0 failures" plus "1 failure".
lamda <- 3/20 #per week
ppois(5,lamda , lower.tail = TRUE)
[1] 1
lamda <- 3/20 #per week
hits <- 0:10
density <- dpois(x = hits, lambda =lamda)
prob <- ppois(q = hits, lambda = lamda, lower.tail = TRUE)
df <- data.frame(hits, density, prob)
ggplot(df, aes(x = hits, y = density)) +
geom_col() +
labs(title = "Poisson(3/20)",
subtitle = "PMF and CDF of Poisson(3/20) distribution.",
x = "Hits (x)",
y = "Density") +
geom_text(
aes(label = round(density,2), y = density + 0.01),
position = position_dodge(0.9),
size = 3,
vjust = 0
) +
geom_line(data = df, aes(x = hits, y = prob))
Example 4 Suppose a bank knows that on average 60 customers arrive between 10 A.M. and 11 A.M. daily. Thus 1 customer arrives per minute. Find the probability that exactly two customers arrive in a given one minute time interval between 10 and 11 A.M.
dpois(2 , lambda = 1 )
[1] 0.18
lamda <- 1 #per minute
hits <- 0:5
density <- dpois(x = hits, lambda =lamda)
prob <- ppois(q = hits, lambda = lamda, lower.tail = TRUE)
df <- data.frame(hits, density, prob)
ggplot(df, aes(x = hits, y = density)) +
geom_col(aes(fill = density)) +
labs(title = "Poisson(1)",
subtitle = "PMF and CDF of Poisson(1) distribution.",
x = "Hits (x)",
y = "Density") +
geom_text(
aes(label = round(density,2), y = density + 0.01),
position = position_dodge(0.9),
size = 3,
vjust = 0
) +
geom_line(data = df, aes(x = hits, y = prob ) )
The Method of moments
Population moments \(M_r = E(x^r)\)
Sample Moment
\(m_r = \frac{1}{n} \sum_{i = 1}^{n}{x_i^r}\)
Let \(X \sim P(\lambda)\) , Find MME of \(\lambda\)
Population > Equation 1: $M_1 = E(x^1) = E(x) = = $
Sample > Equation 2 : \(m_1 = \frac{1}{n} \sum_{i = 1}^{n}{x_i} = \bar{x}\)
from equation(1) and equation(2)
\(M_1=m_1\) then \(\lambda = \bar{x}\)
Thus, the MME equal:
\(\lambda = \bar{x}\)
Find MLE Poisson Distribution
Step 1: Write the PDF. \(f(x,\lambda) = \frac {(e^{-\lambda} \lambda^x)} {x!}\)
Step 2: Write the likelihood function. \(L(\lambda , x_1,x_2,...,x_n) = \prod_{x = 1}^{n} f(x,\lambda)\)
Step 3: Write the natural log likelihood function. \(L(\lambda , x_1,x_2,...,x_n) = ln(\prod_{x = 1}^{n} f(x,\lambda))\)
\(L(\lambda, x_1,x_2,...,x_n) = \sum_{x = 1}^{n} ln(f(x,\lambda))\)
\(L(\lambda , x_1,x_2,...,x_n) = ln(f(\lambda,x_1)) + ln(f(\lambda,x_2)) + ..... + ln(f(\lambda,x_n))\)
\(L(\lambda, x_1,x_2,...,x_n) = ln(\frac {(e^{-\lambda} \lambda^{x_1})} {x_1!}) + ln(\frac {(e^{-\lambda} \lambda^{x_2})} {x_2!}) + ..... + ln(\frac {(e^{\lambda} \lambda^{x_n})} {x_n!})\)
\(L(\lambda , x_1,x_2,...,x_n) = (-\lambda + x_1ln(\lambda) -log(x_1!)) + (-\lambda + x_2ln(\lambda) -log(x_2!)) + .... + (-\lambda + x_nln(\lambda) -log(x_n!))\)
\(L(\lambda , x_1,x_2,...,x_n) = -n\lambda + \sum_{i = 1}^{n}{x_i ln(\lambda)} - \sum_{i = 1}^{n}{x_i!}\)
Step 4: Calculate the derivative of the natural log likelihood function with respect to λ.
\(\frac {dL}{d\lambda} = -n + \sum_{i = 1}^{n}{x_i} * \frac{1}{\lambda} - 0\)
Step 5: Set the derivative equal to zero and solve for λ. \(\frac {dL}{d\lambda} = 0\)
\(-n + \sum_{i = 1}^{n}{x_i} * \frac{1}{\lambda} = 0\)
\(n = \frac{ \sum_{i = 1}^{n}{x_i}}{\lambda}\)
Thus, the MLE equal: \(\lambda = \frac{ \sum_{i = 1}^{n}{x_i}}{n}\)
\(\lambda = \bar{x}\)
An estimator \(\hat\theta\) is an UE of the unknown parameter \(\theta\) ,if
\(E[\hat\theta]=\theta\) for all \(\theta \in \Omega\)
Otherwise, it is a Biased Estimator of \(\theta\).
Check if \(\bar{x}\) is Unbiased Estimator
\(E(\lambda) = \lambda\)
\(E(\bar{x}) = E(\frac{ \sum_{i = 1}^{n}{x_i}}{n})\)
\(= \frac{1}{n} E (\sum_{i = 1}^{n}{x_i})\)
\(=\frac{1}{n} \lambda n\)
\(=\lambda\)
\(E(\bar{x}) = E(\lambda) = \lambda\)
then \(\bar{x}\) is Unbiased Estimator
if estimator \(\hat\theta\) is an Unbiased Estimator of the unknown parameter \(\theta\) and \(\lim_{x \to \infty} var(\hat\theta) = 0\) then \(\hat\theta\) is Consistent Estimator of \(\theta\)
Check if \(\bar{x}\) is Consistent Estimator
\(\lim_{x \to \infty} var(\bar{x})\)
\(lim_{x \to \infty} var(\frac{\sum_{i = 1}^{n}{x_i}}{n})\)
\(lim_{x \to \infty} \frac{1}{n^2} var(\sum_{i = 1}^{n}{x_i})\)
\(lim_{x \to \infty} \frac{1}{n^2} n\lambda\)
\(lim_{x \to \infty} \frac{\lambda}{n}\)
\(= 0\)
then \(\bar{x}\) is Consistent Estimator
For Poisson distribution, there are many different ways for calculating the confidence interval.
The most commonly used method is
The normal approximation (for large sample size)
The exact method (for small sample size)
For Poisson, the mean and the variance are both lambda (\(\lambda\)).
The standard error is calculated as:\(sqrt(\lambda /n)\) where \(\lambda\) is Poisson mean and \(n\) is sample size
The confidence interval can be calculated as: \(\lambda - z(\alpha/2)*sqrt(\lambda /n) < \hat{\lambda} < \lambda + z(\alpha/2)*sqrt(\lambda /n)\)
The 95-percent confidence interval is calculated as: $- 1.96sqrt(/n) < < + 1.96sqrt(/n) $
The 99-percent confidence interval is calculated as: \(\lambda - 2.58*sqrt(\lambda /n) < \hat{\lambda} < \lambda + 2.58*sqrt(\lambda /n)\)
The confidence interval for event X is calculated as:
\((qchisq(\alpha/2, 2*x)/2, qchisq(1-\alpha/2, 2*(x+1))/2 )\)
Where x is the number of events occurred under Poisson distribution.
In order to calculate the exact confidence interval for Poisson mean, the obtained confidence interval for the number of events need to be converted to the confidence interval for Poisson mean.
Find upper and lower confidence levels for a Poisson distribution Observations (\(n\)) = 88 Sample mean (\(\lambda\)) = 47.18182 what would the 95% confidence look like for this?
With Normal Approximation, the 95% confidence interval is calculated as:
lamda <- 47.18182
n <- 88
lower <- lamda - 1.96 * sqrt(lamda/n)
upper <- lamda + 1.96 * sqrt(lamda/n)
lower
[1] 46
upper
[1] 49
With Exact method, we first need to calculate x (# of events):
lamda <- 47.18182
n <- 88
x = lamda * n
alph <- 0.05
lower <- qchisq(alph/2,2*x)/2
upper <- qchisq(1-alph/2,2*(x+1))/2
lower
[1] 4027
upper
[1] 4280
The 95% confidence interval for mean (\(\lambda\)) is therefore:
lower_bound <- lower/n
upper_bound <- upper/n
lower_bound
[1] 46
upper_bound
[1] 49
population_size <- 1000
sample_size_n <- 40
population_lamda <- 50
#step 1
population <- rpois(population_size,population_lamda)
#step 2
sample <- sample(population , sample_size_n ,FALSE )
#step 3
x_bar <- mean(sample)
#step 4
# confidence interval using Normal Approximation
lamda_hat <- x_bar
lower_95 <- lamda_hat - 1.96 * sqrt(lamda_hat/sample_size_n)
upper_95 <- lamda_hat + 1.96 * sqrt(lamda_hat/sample_size_n)
x_bar
[1] 50
lower_95
[1] 48
upper_95
[1] 52
by using previous example but with 99% confidence interval
sample_size_n <- 40
#step 3
# confidence interval using Normal Approximation
lamda_hat <- x_bar
lower_99 <- lamda_hat - 2.58 * sqrt(lamda_hat/sample_size_n)
upper_99 <- lamda_hat + 2.58 * sqrt(lamda_hat/sample_size_n)
x_bar
[1] 50
lower_99
[1] 47
upper_99
[1] 53
by using previous example but with large sample size n=600
sample_size_n <- 600
#step 2
sample <- sample(population , sample_size_n ,FALSE )
#step 3
x_bar <- mean(sample)
#step 3
# confidence interval using Normal Approximation
lamda_hat <- x_bar
lower <- lamda_hat - 1.96 * sqrt(lamda_hat/sample_size_n)
upper <- lamda_hat + 1.96 * sqrt(lamda_hat/sample_size_n)
x_bar
[1] 50
lower
[1] 49
upper
[1] 50
\[\bar{x}\]
Calculate x bar for 1000 different sample from same population and check if these x bar belong to interval [lower,upper]
1. With Confidence 95%
2. With Confidence 99%
results = NULL
for(i in 1:1000){
sample_size_n <- 40
sample <- sample(population , sample_size_n ,FALSE )
x_bar <- mean(sample)
results = rbind(results, data.frame(i,x_bar))
}
# plot sample distribution for X bar
library(dplyr)
results %>% ggplot(mapping = aes(x = x_bar)) +
geom_histogram(aes(y=..density..) , colour="black", fill="white")+
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(aes(xintercept = upper_95), colour="red") +
geom_vline(aes(xintercept = lower_95), colour="red") +
geom_vline(aes(xintercept = upper_99), colour="blue" , linetype="dashed", size=1) +
geom_vline(aes(xintercept = lower_99), colour="blue") +
labs(title = "X Bar Sample Distribution" , subtitle = "With 95% and 99% Confidance Intervals ") +
xlab("X Bar") + ylab("count")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
NA
NA
simulationResult_95 <- mean(lower_95<results$x_bar & results$x_bar<upper_95)
simulationResult_99 <- mean(lower_99<results$x_bar & results$x_bar<upper_99)
simulationResult_95
[1] 0.96
simulationResult_99
[1] 0.99
the result after this simulation show us that about 95% of samples has x bar between the lower and upper calculated values when confidence level equal 95% , also about 99% of samples has x bar between the lower and upper calculated values when confidence level equal 99%