Simple look into the shape of Poisson distributions by the value of the mean

\(X\sim \text{Poisson}(\lambda)\)

Here \(\lambda\) is the mean (expected value) of the distribution. It is also the variance.
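For reference, the PMF behind everything that follows is the standard textbook one (stated here from the usual definition, not from anything specific to this post), with \(x\) denoting a count:

\[ P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \dots \]

In R, this is the quantity that dpois(x, lambda) evaluates.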

The Poisson distribution is widely used to model counts. It is popular in sport, for example, for modelling goals or yellow cards in football, or the number of hits in baseball.

Having briefly been introduced to the distribution in university, I decided that I wanted to explore the shapes of these distributions and how they relate to the mean of the distribution.

I also had the impression that the PMF (probability mass function) of a Poisson distribution always peaks at its mean, i.e. that \(P(X = \lambda)\) is always the largest of the \(P(X=x)\).

Below is a simple analysis, with a few examples, of how Poisson distributions behave.

First example: \(X \sim \text{Poisson}(5)\)

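# Evaluate the Poisson(5) PMF on 0..10, which covers most of its mass
x_vals <- 0:10
pmfs <- dpois(x_vals, lambda = 5)
# Draw the PMF as vertical spikes
plot(x_vals, pmfs, type = 'h', lwd = 2, col = 'blue')

# which.max() returns only the first maximum, so a tie would not be reported
cat('The x value with the highest pmf is: ', x_vals[which.max(pmfs)])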

## The x value with the highest pmf is:  4

In this example, the plot shows two peaks, at x = 4 and x = 5, but the plot alone cannot tell us which is higher, and which.max() reports only the first maximum if the two tie. A single case might also be misleading, so we want to look at more values of \(\lambda\).
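A short derivation settles which peak is higher. It uses only the standard Poisson PMF \(P(X=x) = \lambda^{x} e^{-\lambda}/x!\). The ratio of consecutive probabilities is

\[ \frac{P(X = x)}{P(X = x-1)} = \frac{\lambda^{x} e^{-\lambda}/x!}{\lambda^{x-1} e^{-\lambda}/(x-1)!} = \frac{\lambda}{x}, \qquad x \ge 1. \]

So the PMF rises while \(x < \lambda\) and falls once \(x > \lambda\). When \(\lambda\) is an integer, the ratio is exactly 1 at \(x = \lambda\). For \(\lambda = 5\) this gives \(P(X=5)/P(X=4) = 5/5 = 1\): the two peaks are exactly equal, and which.max() simply reports the first of them.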

More generalised

library(dplyr)
library(ggplot2)

# Define lambda values
lambdas <- seq(2, 20, 2)

# Create a data frame holding the PMF of each distribution on 0..x_upper_lim
x_upper_lim <- 30
pmf_data <- data.frame(x = rep(0:x_upper_lim, length(lambdas)),
                       lambda = rep(lambdas, each = length(0:x_upper_lim)))
pmf_data$pmf <- dpois(x = pmf_data$x, lambda = pmf_data$lambda)

# For each lambda, record the first x at which the PMF peaks
# (which.max() returns only the first maximum, so ties are not shown)
max_pmf_x <- pmf_data %>%
  group_by(lambda) %>%
  summarise(x_max = x[which.max(pmf)],
            max_pmf = max(pmf))

# Plot the PMFs as lines; 'x' marks the argmax found above, 'm' marks the mean
ggplot(pmf_data, aes(x, pmf, color = factor(lambda))) +
  geom_line(linewidth = 1) +  # `size` for lines is deprecated since ggplot2 3.4.0
  geom_point(data = max_pmf_x, aes(x = x_max, y = max_pmf),
             shape = 'x', size = 5, col = "darkred") +
  geom_point(data = max_pmf_x, aes(x = lambda, y = max_pmf),
             shape = 'm', size = 5, col = "orange") +
  labs(x = "Number of Events (x)", y = "Probability",
       title = "Poisson PMF for Different λ Values with Markers") +
  theme_minimal() +
  scale_color_discrete(name = "λ")

In addition, I am curious how \(P(X > \lambda)\) changes as \(\lambda\) varies. Are the values all similar, or do they gradually increase or decrease? We explore this below:
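The strict and non-strict tails are easy to confuse, so here is a minimal sketch of how each can be computed with ppois(). The vector example_lambdas and the other variable names are illustrative choices for this sketch and are not part of the analysis itself.

```r
# Illustrative lambda values (chosen for this sketch only)
example_lambdas <- c(1, 5, 10, 50)

# Strict upper tail P(X > lambda): with lower.tail = FALSE, ppois(q, ...) gives P(X > q)
p_greater <- ppois(example_lambdas, lambda = example_lambdas, lower.tail = FALSE)

# Non-strict tail P(X >= lambda), which equals P(X > lambda - 1)
p_at_least <- ppois(example_lambdas - 1, lambda = example_lambdas, lower.tail = FALSE)

# The two tails differ by exactly the point mass P(X = lambda)
data.frame(lambda = example_lambdas,
           p_greater = p_greater,
           p_at_least = p_at_least,
           gap = p_at_least - p_greater,
           pmf_at_lambda = dpois(example_lambdas, example_lambdas))
```

The gap column should match pmf_at_lambda up to rounding, which is a quick way to check which of the two tails a given piece of code is computing.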

Table of values (Poisson)

library(dplyr)
library(knitr)  # provides kable(), used to print the table

# Define lambda values and x values
lambdas <- 0:50
x_vals <- 0:100

# Generate all combinations of lambda and x, and compute PMF
pmf_data <- expand.grid(lambda = lambdas, x = x_vals) %>%
  mutate(pmf = dpois(x, lambda))

# Find the maximum PMF for each lambda, keeping every x that attains it.
# Note: `==` on floating-point PMFs can miss ties that are exact mathematically.
max_pmf_data <- pmf_data %>%
  group_by(lambda) %>%
  filter(pmf == max(pmf)) %>%
  summarise(x = paste(unique(x), collapse = ","), pmf = first(pmf), .groups = 'drop')

# Compute P(X > lambda)
prob_data <- data.frame(lambda = lambdas) %>%
  mutate(prob_larger_lambda = 1 - ppois(q = lambda, lambda = lambda, lower.tail = TRUE))

# Merge dataframes and create the comparison column
result_df <- max_pmf_data %>%
  left_join(prob_data, by = "lambda") %>%
  rowwise() %>%  # This will ensure operations are performed row by row
  mutate(comparison = {
    xs_numeric <- as.numeric(unlist(strsplit(x, ",")))  # Convert x values to numeric
    if (all(xs_numeric > lambda)) {
      "larger"
    } else if (all(xs_numeric < lambda)) {
      "smaller"
    } else if (all(xs_numeric == lambda)) {
      "equal"
    } else {
      "mixed"
    }
  }) %>%
  ungroup()
kable(result_df, col.names = c('lambda', 'arg Max x(PMF)', 'max PMF', 'P(X>lambda)', 'arg Max x VS lambda'))

lambda	arg Max x(PMF)	max PMF	P(X>lambda)	arg Max x VS lambda
0	0	1.0000000	0.0000000	equal
1	0,1	0.3678794	0.2642411	mixed
2	1,2	0.2706706	0.3233236	mixed
3	3	0.2240418	0.3527681	equal
4	3	0.1953668	0.3711631	smaller
5	4	0.1754674	0.3840393	smaller
6	5,6	0.1606231	0.3936972	mixed
7	6,7	0.1490028	0.4012862	mixed
8	7	0.1395865	0.4074527	smaller
9	8,9	0.1317556	0.4125918	mixed
10	9,10	0.1251100	0.4169602	mixed
11	10,11	0.1193781	0.4207332	mixed
12	12	0.1143679	0.4240348	equal
13	12,13	0.1099398	0.4269554	mixed
14	13	0.1059891	0.4295633	smaller
15	14,15	0.1024359	0.4319104	mixed
16	15	0.0992175	0.4340376	smaller
17	16,17	0.0962846	0.4359771	mixed
18	17	0.0935973	0.4377550	smaller
19	18,19	0.0911231	0.4393926	mixed
20	19,20	0.0888353	0.4409074	mixed
21	20,21	0.0867116	0.4423140	mixed
22	21,22	0.0847332	0.4436248	mixed
23	22,23	0.0828844	0.4448501	mixed
24	24	0.0811515	0.4459988	equal
25	24,25	0.0795230	0.4470786	mixed
26	26	0.0779887	0.4480961	equal
27	27	0.0765399	0.4490571	equal
28	27	0.0751690	0.4499666	smaller
29	29	0.0738692	0.4508291	equal
30	30	0.0726345	0.4516485	equal
31	30	0.0714598	0.4524282	smaller
32	31	0.0703403	0.4531714	smaller
33	32	0.0692718	0.4538808	smaller
34	34	0.0682506	0.4545589	equal
35	34,35	0.0672732	0.4552080	mixed
36	36	0.0663366	0.4558300	equal
37	36,37	0.0654382	0.4564268	mixed
38	37,38	0.0645752	0.4570001	mixed
39	38,39	0.0637455	0.4575513	mixed
40	39,40	0.0629470	0.4580818	mixed
41	40,41	0.0621778	0.4585930	mixed
42	42	0.0614361	0.4590858	equal
43	42,43	0.0607203	0.4595615	mixed
44	43,44	0.0600290	0.4600210	mixed
45	44,45	0.0593608	0.4604652	mixed
46	45	0.0587144	0.4608948	smaller
47	46,47	0.0580886	0.4613108	mixed
48	47,48	0.0574825	0.4617138	mixed
49	48	0.0568949	0.4621044	smaller
50	49	0.0563250	0.4624833	smaller

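Interestingly, \(P(X>\lambda)\) increases with \(\lambda\) over this range, though it stays below 0.5. The table also shows something the single example hid: more than one x value can attain the largest PMF for a given distribution. In the ‘mixed’ rows the maximum is shared by \(\lambda - 1\) and \(\lambda\); the remaining rows are ‘equal’ or ‘smaller’, and in no case is the argmax larger than \(\lambda\). One caveat: for integer \(\lambda \ge 1\), \(P(X=\lambda-1)\) and \(P(X=\lambda)\) are exactly equal mathematically, so the ‘equal’ and ‘smaller’ rows most likely reflect floating-point rounding in the pmf == max(pmf) comparison rather than a genuine single peak.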
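An exact comparison such as pmf == max(pmf) can miss ties because of floating-point rounding. Below is a minimal sketch of a tolerance-based alternative. The helper poisson_modes and its arguments x_max and tol are hypothetical names and values chosen for illustration; they are not part of the original analysis.

```r
# Hypothetical helper: every x whose PMF is within a relative tolerance of the maximum
poisson_modes <- function(lambda, x_max = 100, tol = 1e-9) {
  x <- 0:x_max
  pmf <- dpois(x, lambda)
  # Keep x values whose PMF matches the maximum up to the relative tolerance
  x[pmf >= max(pmf) * (1 - tol)]
}

poisson_modes(3)    # 2 3
poisson_modes(4)    # 3 4
poisson_modes(2.5)  # 2 (non-integer lambda has a single mode at floor(lambda))
```

The tolerance is far smaller than the gap between neighbouring PMF values in the range used here, so it only merges values that differ by rounding error. It would need revisiting for very large \(\lambda\), where neighbouring probabilities become closer.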

Hence, given a lambda, e.g. if the number of goals follows Poisson(3), we know that the most likely number of goals is 3 (tied with 2, because \(\lambda\) is an integer here).

A look into the behaviour of Poisson distributions

Ian Petrus Tan

2024-06-11
