\(X\sim \text{Poisson}(\lambda)\)
where \(\lambda\) is the mean (expected value) of the distribution; it is also the variance of the distribution.
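Both properties follow from the PMF \(P(X=k) = e^{-\lambda}\lambda^{k}/k!\) for \(k = 0, 1, 2, \dots\) As a quick numerical sanity check, here is a minimal sketch in base R. The value \(\lambda = 5\) and the truncation of the support at 200 are arbitrary choices made for illustration, not part of the analysis that follows.

```r
# Illustrative check: the mean and variance of Poisson(lambda) both equal lambda.
lambda <- 5          # arbitrary example value (an assumption for this sketch)
x <- 0:200           # support truncated where the remaining tail mass is negligible
p <- dpois(x, lambda)

# dpois() should agree with the closed-form PMF exp(-lambda) * lambda^k / k!
k <- 0:10
p_formula <- exp(-lambda) * lambda^k / factorial(k)
pmf_matches <- isTRUE(all.equal(dpois(k, lambda), p_formula))

# Mean and variance computed directly from the PMF
mean_x <- sum(x * p)
var_x  <- sum((x - mean_x)^2 * p)
```

Both `mean_x` and `var_x` should come out equal to `lambda` up to floating-point error.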
The Poisson distribution is widely used to model counts. As a result, it is popular in many contexts, such as sports, where it can model goals or yellow cards in football, or the number of hits in baseball.
Having briefly been introduced to the distribution in university, I decided that I wanted to explore the shapes of these distributions and how they relate to the mean of the distribution.
I also had the impression that the PMF (probability mass function) of a Poisson distribution would always peak at its mean, i.e. that \(P(X = \lambda)\) would always be the largest possible \(P(X=x)\).
Below is a simple analysis with a few examples to see how Poisson distributions behave.
x_vals <- 0:10
pmfs <- dpois(x_vals, lambda = 5)
plot(x_vals, pmfs, type = 'h', lwd = 2, col = 'blue')
# which.max() returns only the first position of the maximum, so ties are not reported
cat('The x value with the highest pmf is: ', x_vals[which.max(pmfs)])
## The x value with the highest pmf is: 4
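In this example, the plot shows two peaks, at x = 4 and x = 5, and we can’t tell by eye which is higher. In fact they are exactly equal: \(P(X=5)/P(X=4) = \lambda/5 = 1\) when \(\lambda = 5\), and which.max() simply reports the first of the tied positions. A single case might also be misleading, so we want to look at several values of \(\lambda\).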
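One way to check whether the two peaks really differ is to compare the PMF values with a small tolerance instead of relying on which.max(). The sketch below does this for \(\lambda = 5\); the tolerance of 1e-12 is an assumption chosen for illustration, far smaller than the gap to any neighbouring PMF value.

```r
lambda <- 5
x_vals <- 0:10
pmfs <- dpois(x_vals, lambda)

# Ratio of the two candidate peaks: lambda / 5, which is 1 in exact arithmetic
ratio <- dpois(5, lambda) / dpois(4, lambda)

# All x whose PMF is within a small tolerance of the maximum
# (the tolerance 1e-12 is an illustrative assumption)
tied <- x_vals[abs(pmfs - max(pmfs)) < 1e-12]
```

Here `tied` contains both 4 and 5, so the two bars in the plot have the same height.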
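# Load the packages used for the pipe, the summary and the plot
library(dplyr)
library(ggplot2)
# Define lambda values
lambdas <- seq(2, 20, 2)
# Create a data frame to store PMF values
x_upper_lim <- 30
pmf_data <- data.frame(x = rep(0:x_upper_lim, length(lambdas)),
                       lambda = rep(lambdas, each = length(0:x_upper_lim)))
pmf_data$pmf <- dpois(x = pmf_data$x, lambda = pmf_data$lambda)
# For each lambda, record where the PMF peaks.
# which.max() returns only the first position of the maximum, so ties are not shown.
max_pmf_x <- pmf_data %>%
  group_by(lambda) %>%
  summarise(x_max = x[which.max(pmf)],
            max_pmf = max(pmf))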
# Plot PMFs with geom_line; 'x' marks the arg max and 'm' marks the mean (lambda)
ggplot(pmf_data, aes(x, pmf, color = factor(lambda))) +
  geom_line(linewidth = 1) +  # linewidth replaces the deprecated size for lines (ggplot2 >= 3.4.0)
  geom_point(data = max_pmf_x, aes(x = x_max, y = max_pmf),
             shape = 'x', size = 5, col = "darkred") +
  geom_point(data = max_pmf_x, aes(x = lambda, y = max_pmf),
             shape = 'm', size = 5, col = "orange") +
  labs(x = "Number of Events (x)", y = "Probability",
       title = "Poisson PMF for Different λ Values with Markers") +
  theme_minimal() +
  scale_color_discrete(name = "λ")
In addition to this, I am also curious how the probability \(P(X > \lambda)\) varies as \(\lambda\) varies. Are the values all similar? Or do they gradually increase or decrease? We explore this below:
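Since the strict and non-strict tails are easy to mix up, here is a small sketch of how each can be obtained from ppois(). The value \(\lambda = 5\) is just an example; the variable names are illustrative.

```r
lambda <- 5  # arbitrary example value

# P(X > lambda): upper tail strictly above lambda
p_gt <- ppois(lambda, lambda, lower.tail = FALSE)

# P(X >= lambda): move the cut-off down by one so that lambda itself is included
p_ge <- ppois(lambda - 1, lambda, lower.tail = FALSE)

# The two tails differ by exactly P(X = lambda)
gap <- p_ge - p_gt
```

The analysis below uses the strict version, \(P(X > \lambda)\), written there as 1 - ppois(...).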
library(dplyr)
library(knitr)  # provides kable()
# Define lambda values and x values
lambdas <- 0:50
x_vals <- 0:100
# Generate all combinations of lambda and x, and compute PMF
pmf_data <- expand.grid(lambda = lambdas, x = x_vals) %>%
  mutate(pmf = dpois(x, lambda))
# Find the maximum PMF for each lambda and handle multiple x values.
# Note: pmf == max(pmf) is an exact floating-point comparison, so two values
# that are mathematically tied can differ in the last bits and be missed.
max_pmf_data <- pmf_data %>%
  group_by(lambda) %>%
  filter(pmf == max(pmf)) %>%
  summarise(x = paste(unique(x), collapse = ","), pmf = first(pmf), .groups = 'drop')
# Compute P(X > lambda)
prob_data <- data.frame(lambda = lambdas) %>%
  mutate(prob_larger_lambda = 1 - ppois(q = lambda, lambda = lambda, lower.tail = TRUE))
# Merge dataframes and create the comparison column
result_df <- max_pmf_data %>%
  left_join(prob_data, by = "lambda") %>%
  rowwise() %>% # This will ensure operations are performed row by row
  mutate(comparison = {
    xs_numeric <- as.numeric(unlist(strsplit(x, ","))) # Convert x values to numeric
    if (all(xs_numeric > lambda)) {
      "larger"
    } else if (all(xs_numeric < lambda)) {
      "smaller"
    } else if (all(xs_numeric == lambda)) {
      "equal"
    } else {
      "mixed"
    }
  }) %>%
  ungroup()
kable(result_df, col.names = c('lambda', 'arg Max x(PMF)', 'max PMF', 'P(X>lambda)', 'arg Max x VS lambda'))
lambda | arg Max x(PMF) | max PMF | P(X>lambda) | arg Max x VS lambda |
---|---|---|---|---|
0 | 0 | 1.0000000 | 0.0000000 | equal |
1 | 0,1 | 0.3678794 | 0.2642411 | mixed |
2 | 1,2 | 0.2706706 | 0.3233236 | mixed |
3 | 3 | 0.2240418 | 0.3527681 | equal |
4 | 3 | 0.1953668 | 0.3711631 | smaller |
5 | 4 | 0.1754674 | 0.3840393 | smaller |
6 | 5,6 | 0.1606231 | 0.3936972 | mixed |
7 | 6,7 | 0.1490028 | 0.4012862 | mixed |
8 | 7 | 0.1395865 | 0.4074527 | smaller |
9 | 8,9 | 0.1317556 | 0.4125918 | mixed |
10 | 9,10 | 0.1251100 | 0.4169602 | mixed |
11 | 10,11 | 0.1193781 | 0.4207332 | mixed |
12 | 12 | 0.1143679 | 0.4240348 | equal |
13 | 12,13 | 0.1099398 | 0.4269554 | mixed |
14 | 13 | 0.1059891 | 0.4295633 | smaller |
15 | 14,15 | 0.1024359 | 0.4319104 | mixed |
16 | 15 | 0.0992175 | 0.4340376 | smaller |
17 | 16,17 | 0.0962846 | 0.4359771 | mixed |
18 | 17 | 0.0935973 | 0.4377550 | smaller |
19 | 18,19 | 0.0911231 | 0.4393926 | mixed |
20 | 19,20 | 0.0888353 | 0.4409074 | mixed |
21 | 20,21 | 0.0867116 | 0.4423140 | mixed |
22 | 21,22 | 0.0847332 | 0.4436248 | mixed |
23 | 22,23 | 0.0828844 | 0.4448501 | mixed |
24 | 24 | 0.0811515 | 0.4459988 | equal |
25 | 24,25 | 0.0795230 | 0.4470786 | mixed |
26 | 26 | 0.0779887 | 0.4480961 | equal |
27 | 27 | 0.0765399 | 0.4490571 | equal |
28 | 27 | 0.0751690 | 0.4499666 | smaller |
29 | 29 | 0.0738692 | 0.4508291 | equal |
30 | 30 | 0.0726345 | 0.4516485 | equal |
31 | 30 | 0.0714598 | 0.4524282 | smaller |
32 | 31 | 0.0703403 | 0.4531714 | smaller |
33 | 32 | 0.0692718 | 0.4538808 | smaller |
34 | 34 | 0.0682506 | 0.4545589 | equal |
35 | 34,35 | 0.0672732 | 0.4552080 | mixed |
36 | 36 | 0.0663366 | 0.4558300 | equal |
37 | 36,37 | 0.0654382 | 0.4564268 | mixed |
38 | 37,38 | 0.0645752 | 0.4570001 | mixed |
39 | 38,39 | 0.0637455 | 0.4575513 | mixed |
40 | 39,40 | 0.0629470 | 0.4580818 | mixed |
41 | 40,41 | 0.0621778 | 0.4585930 | mixed |
42 | 42 | 0.0614361 | 0.4590858 | equal |
43 | 42,43 | 0.0607203 | 0.4595615 | mixed |
44 | 43,44 | 0.0600290 | 0.4600210 | mixed |
45 | 44,45 | 0.0593608 | 0.4604652 | mixed |
46 | 45 | 0.0587144 | 0.4608948 | smaller |
47 | 46,47 | 0.0580886 | 0.4613108 | mixed |
48 | 47,48 | 0.0574825 | 0.4617138 | mixed |
49 | 48 | 0.0568949 | 0.4621044 | smaller |
50 | 49 | 0.0563250 | 0.4624833 | smaller |
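Interestingly, the value of \(P(X>\lambda)\) increases with \(\lambda\), although it stays below 0.5 over this range. Also, only by doing this more thoroughly did we realise that multiple x values can share the largest PMF for a particular distribution. In many cases we get ‘mixed’ (i.e. there are multiple x values, and one of them is the lambda value), and in many cases ‘smaller’, but we never encounter an argmax that is larger than lambda. The split between ‘mixed’, ‘smaller’ and ‘equal’ should not be over-read, though: for an integer \(\lambda \geq 1\), \(P(X=\lambda)/P(X=\lambda-1) = \lambda/\lambda = 1\), so \(\lambda-1\) and \(\lambda\) are always exactly tied. The rows that list a single value come from the exact comparison pmf == max(pmf) in floating-point arithmetic, where two mathematically equal PMF values can differ in the last bits.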
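The absence of any ‘larger’ case can be checked against the PMF itself. The following short derivation uses only the standard formula \(P(X=k) = e^{-\lambda}\lambda^{k}/k!\) and compares each probability with the one before it:

\[
\frac{P(X=k)}{P(X=k-1)} = \frac{e^{-\lambda}\lambda^{k}/k!}{e^{-\lambda}\lambda^{k-1}/(k-1)!} = \frac{\lambda}{k}, \qquad k \geq 1.
\]

The ratio is greater than 1 while \(k < \lambda\), equal to 1 when \(k = \lambda\) (which requires \(\lambda\) to be an integer), and less than 1 once \(k > \lambda\). So the PMF rises up to \(\lfloor\lambda\rfloor\) and falls afterwards: the arg max is \(\lfloor\lambda\rfloor\) for non-integer \(\lambda\), and both \(\lambda-1\) and \(\lambda\) for integer \(\lambda \geq 1\). It is never larger than \(\lambda\).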
Hence, when given a lambda, e.g. if a goal distribution is Poisson(3), we know that the most likely outcomes are 2 and 3 goals. The two are equally likely, since \(P(X=2) = P(X=3) = 4.5e^{-3} \approx 0.224\).
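To confirm this for every integer lambda used above, the arg max can be recomputed with a tolerance instead of exact equality. This is a minimal sketch in base R, not a change to the analysis code: the relative tolerance of 1e-9 is an assumption, chosen to be far smaller than the gap between neighbouring PMF values (at least about 2% for \(\lambda \leq 50\)) and far larger than floating-point rounding error.

```r
lambdas <- 1:50
x_vals <- 0:100

# For each lambda, collect every x whose PMF is within a small relative
# tolerance of the maximum (the tolerance 1e-9 is an illustrative assumption)
modes <- lapply(lambdas, function(l) {
  p <- dpois(x_vals, l)
  x_vals[p >= max(p) * (1 - 1e-9)]
})

# Check that the arg max is exactly {lambda - 1, lambda} for every integer lambda
all_tied <- all(mapply(function(m, l) identical(m, c(l - 1L, l)), modes, lambdas))

# The Poisson(3) example from the text: modes at 2 and 3
modes_for_3 <- modes[[3]]
```

With the tolerance in place, `all_tied` is `TRUE`, so every row of the table for \(\lambda \geq 1\) would read ‘mixed’.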