Overview

This tutorial introduces two important discrete probability distributions in R: the Binomial and the Poisson distributions. In each section, we will:

  1. simulate random values from the distribution,
  2. display the simulated outcomes graphically,
  3. calculate exact probabilities using R,
  4. compute sample summaries from simulated data, and
  5. compare simulated results to theoretical quantities.

Discrete distributions arise when a random variable can take only distinct countable values, such as 0, 1, 2, 3, and so on.

Common discrete distributions

This tutorial covers two of the most common discrete distributions, the Binomial and the Poisson, each in its own section below.

Tutorials

Binomial Distribution

Introduction

A Binomial random variable counts the number of successes in a fixed number of independent trials, where each trial has the same probability of success.

In R, we commonly use the following functions for the Binomial distribution:

  • rbinom() to simulate random values,
  • dbinom() to compute exact probabilities,
  • pbinom() to compute cumulative probabilities.

The function for simulation is:

rbinom(number_of_experiments, number_of_trials, probability_of_success)

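In R's own documentation these arguments are named n, size, and prob. Note that R's n is the number of values to generate, not the number of trials; this tutorial uses \(m\) for the number of values and \(n\) for the number of trials.

If the number of trials is 1, the Binomial distribution reduces to a Bernoulli distribution.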
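
As a minimal sketch of the Bernoulli special case, the snippet below draws values with a single trial and checks the two possible probabilities. The seed and the variable names are assumptions made for illustration only.

```r
# Sketch: a Binomial variable with one trial is a Bernoulli variable.
set.seed(1)   # arbitrary seed, assumed here only for reproducibility
p <- 0.60

# With size = 1, every simulated value is either 0 (failure) or 1 (success)
bernoulli_draws <- rbinom(10, size = 1, prob = p)

# Exact probabilities P(X = 0) and P(X = 1)
bernoulli_probs <- dbinom(0:1, size = 1, prob = p)
bernoulli_probs   # 0.4 0.6
```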

Example 1: Simulating Binomial random variables

Simulate 50 Binomial random values from a distribution with parameters:

  • number of trials: \(n = 5\)
  • probability of success: \(p = 0.60\)

# Set parameters for the simulation
m <- 50    # number of experiments
n <- 5     # number of trials
p <- 0.60  # probability of success

# Generate binomial random numbers
X_binom <- rbinom(m, n, p)

# Output
X_binom
##  [1] 2 4 4 3 3 2 3 4 2 2 3 3 2 4 4 4 3 3 2 2 1 2 4 4 4 2 2 3 4 3 1 4 3 2 3 3 5 4
## [39] 5 3 3 4 3 1 2 4 3 4 4 2
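
Because rbinom() draws random values, rerunning the code above will generally give different numbers from those shown. One common way to make a simulation repeatable is to call set.seed() before drawing. The sketch below assumes an arbitrary seed of 42; it will not reproduce the output printed above, which was generated without a known seed.

```r
# Sketch: fixing the seed makes a simulation repeatable.
set.seed(42)                       # arbitrary seed (an assumption)
first_run <- rbinom(50, 5, 0.60)

set.seed(42)                       # resetting the same seed ...
second_run <- rbinom(50, 5, 0.60)  # ... reproduces the same draws

identical(first_run, second_run)   # TRUE
```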

Example 2: Frequency distribution of the simulated values

A bar chart is appropriate here because the outcomes are discrete.

barplot(
  table(X_binom),
  xlab = "X",
  ylab = "Frequency",
  main = "Frequency Distribution of Simulated Binomial Values"
)

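The bar chart gives an empirical view of the probability distribution \(P(X=x)\): the bar heights are observed frequencies, and dividing them by the number of simulated values (50) gives relative frequencies that estimate \(P(X=x)\). Outcomes that never occurred in the sample (here, \(x = 0\)) do not appear in the chart.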
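
To put the simulated values on the same scale as probabilities, the frequencies can be converted to relative frequencies. The sketch below is one possible way to do this; it simulates its own sample (sim_values, a hypothetical stand-in for X_binom) with an arbitrary seed, so its numbers will differ from the sample in Example 1.

```r
# Sketch: relative frequencies for every possible outcome 0..n.
set.seed(123)                   # arbitrary seed (an assumption)
m <- 50; n <- 5; p <- 0.60
sim_values <- rbinom(m, n, p)   # hypothetical stand-in for X_binom

# factor() with explicit levels keeps outcomes that never occurred (count 0)
counts <- table(factor(sim_values, levels = 0:n))
relative_freq <- as.numeric(counts) / m

# Side-by-side comparison with the exact probabilities from dbinom()
comparison <- rbind(simulated = relative_freq, exact = dbinom(0:n, n, p))
colnames(comparison) <- 0:n
round(comparison, 3)
```

Passing relative_freq to barplot() with names.arg = 0:n would then draw all six outcomes, including any with zero frequency.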

Theoretical Binomial probability function

The Binomial probability mass function is:

\[ P(X=x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0,1,\ldots,n. \]

In R, exact probabilities are computed using dbinom(x, n, p).
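
As a quick check of the formula, the probability mass function can be evaluated term by term and compared with dbinom(). This is a sketch; pmf_by_hand is a name introduced here for illustration.

```r
# Sketch: evaluate the Binomial PMF directly from its formula.
n <- 5
p <- 0.60
x <- 0:n

# choose(n, x) is the binomial coefficient "n choose x"
pmf_by_hand <- choose(n, x) * p^x * (1 - p)^(n - x)

all.equal(pmf_by_hand, dbinom(x, n, p))   # TRUE
sum(pmf_by_hand)                          # 1: the probabilities cover all outcomes
```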

Example 3: Calculating exact Binomial probabilities

For the Binomial distribution above, calculate:

\[ P(X=0), P(X=1), \ldots, P(X=5). \]

# Set parameters
n <- 5
x <- 0:n
p <- 0.60

# Calculate exact probabilities
binom_probabilities <- dbinom(x, n, p)

# Output
binom_probabilities
## [1] 0.01024 0.07680 0.23040 0.34560 0.25920 0.07776
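
The introduction also lists pbinom(), which returns cumulative probabilities \(P(X \le x)\). The sketch below shows how these relate to the exact probabilities just computed; the variable names are illustrative assumptions.

```r
# Sketch: cumulative probabilities are running sums of exact probabilities.
n <- 5
p <- 0.60

cumulative_probs <- pbinom(0:n, n, p)        # P(X <= x) for x = 0, ..., 5
running_sums <- cumsum(dbinom(0:n, n, p))    # same values, built from dbinom()
all.equal(cumulative_probs, running_sums)    # TRUE

p_at_most_2 <- pbinom(2, n, p)                        # P(X <= 2) = 0.31744
p_at_least_3 <- pbinom(2, n, p, lower.tail = FALSE)   # P(X > 2) = P(X >= 3) = 0.68256
```

With lower.tail = FALSE, pbinom() returns the strict upper tail \(P(X > x)\), so \(P(X \ge 3)\) is obtained by passing 2 rather than 3.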

Example 4: Plotting the theoretical probabilities

barplot(
  binom_probabilities,
  names.arg = x,
  xlab = "x",
  ylab = "Probability",
  main = "Binomial Probabilities for n = 5 and p = 0.60"
)

Example 5: Sample mean, variance, and standard deviation

Using the simulated sample from Example 1, compute the sample mean, sample variance, and sample standard deviation.

sample_mean_binom <- mean(X_binom)
sample_variance_binom <- var(X_binom)
sample_sd_binom <- sd(X_binom)

sample_mean_binom
## [1] 3.02
sample_variance_binom
## [1] 0.9995918
sample_sd_binom
## [1] 0.9997959

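These values are sample-based estimates of the corresponding population quantities: the mean, variance, and standard deviation of the Binomial distribution with \(n = 5\) and \(p = 0.60\).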

Theoretical mean and variance

For a Binomial random variable:

\[ E(X) = np \]

\[ \mathrm{Var}(X) = np(1-p) \]

\[ SD(X) = \sqrt{np(1-p)} \]
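
These formulas can be checked numerically from the definitions \(E(X) = \sum_x x\,P(X=x)\) and \(\mathrm{Var}(X) = \sum_x (x - E(X))^2\,P(X=x)\). The following is a sketch for \(n = 5\) and \(p = 0.60\), with variable names chosen here for illustration.

```r
# Sketch: mean and variance computed directly from the PMF.
n <- 5
p <- 0.60
x <- 0:n
pmf <- dbinom(x, n, p)

mean_from_pmf <- sum(x * pmf)                        # 3, equal to n * p
var_from_pmf <- sum((x - mean_from_pmf)^2 * pmf)     # 1.2, equal to n * p * (1 - p)

all.equal(mean_from_pmf, n * p)                      # TRUE
all.equal(var_from_pmf, n * p * (1 - p))             # TRUE
```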

Example 6: Theoretical mean, variance, and standard deviation

# Parameters
n <- 5
p <- 0.60

# Theoretical quantities
mean_binomial <- n * p
variance_binomial <- n * p * (1 - p)
sd_binomial <- sqrt(variance_binomial)

mean_binomial
## [1] 3
variance_binomial
## [1] 1.2
sd_binomial
## [1] 1.095445

Example 7: Comparing theory to a large simulated sample

Now simulate 10,000 Binomial random values and compare the sample summaries to the theoretical values.

# Set parameters for a larger simulation
m <- 10000
n <- 5
p <- 0.60

# Generate large sample
X_binom_large <- rbinom(m, n, p)

# Compute sample summaries
sample_mean_large_binom <- mean(X_binom_large)
sample_variance_large_binom <- var(X_binom_large)
sample_sd_large_binom <- sd(X_binom_large)

sample_mean_large_binom
## [1] 3.0035
sample_variance_large_binom
## [1] 1.165804
sample_sd_large_binom
## [1] 1.079724

Summary comparison table

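Comparison of theoretical and simulated Binomial summaries

| Quantity           | Theoretical | Simulated (sample of 50) | Simulated (sample of 10,000) |
|--------------------|-------------|--------------------------|------------------------------|
| Mean               | 3.0000      | 3.0200                   | 3.0035                       |
| Variance           | 1.2000      | 0.9996                   | 1.1658                       |
| Standard Deviation | 1.0954      | 0.9998                   | 1.0797                       |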
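
A comparison table like this one could be assembled directly in R. The sketch below is one possible approach, not necessarily how the table above was produced: summarise_sample() is a hypothetical helper, the seed is arbitrary, and the simulated columns will therefore differ from the values printed above.

```r
# Sketch: build a theory-versus-simulation table as a data frame.
set.seed(2024)   # arbitrary seed (an assumption)
n <- 5
p <- 0.60

# Hypothetical helper: mean, variance, and standard deviation of a sample
summarise_sample <- function(values) {
  c(mean(values), var(values), sd(values))
}

comparison_binom <- data.frame(
  Quantity = c("Mean", "Variance", "Standard Deviation"),
  Theoretical = c(n * p, n * p * (1 - p), sqrt(n * p * (1 - p))),
  Simulated_Sample_50 = summarise_sample(rbinom(50, n, p)),
  Simulated_Sample_10000 = summarise_sample(rbinom(10000, n, p))
)

print(comparison_binom, digits = 4)
```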

Poisson Distribution

Introduction

A Poisson random variable models the number of events that occur in a fixed interval of time, space, area, or volume, when events occur independently and at a constant average rate.

In R, we commonly use:

  • rpois() to simulate random values,
  • dpois() to compute exact probabilities,
  • ppois() to compute cumulative probabilities.

The simulation function is:

rpois(number_of_random_values, lambda)

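where \(\lambda\) is the average number of events per interval. In R's own documentation these arguments are named n (the number of values to generate) and lambda.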

Example 1: Simulating Poisson random variables

Generate 10 Poisson random values with parameter \(\lambda = 3\).

# Set the parameter for the Poisson distribution
lambda <- 3

# Generate 10 Poisson random numbers
X_pois <- rpois(10, lambda)

# Output
X_pois
##  [1] 1 4 1 1 7 2 1 3 1 3

Example 2: Frequency distribution of the simulated values

frequency_table <- table(X_pois)

barplot(
  frequency_table,
  xlab = "Number of Events",
  ylab = "Frequency",
  main = "Frequency Distribution of Simulated Poisson Values"
)

Theoretical Poisson probability function

The Poisson probability mass function is:

\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0,1,2,\ldots \]

In R, exact probabilities are computed using dpois(x, lambda).
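
As with the Binomial case, the formula can be evaluated directly and compared with dpois(). This is a sketch for \(\lambda = 3\); pois_pmf_by_hand is a name introduced here for illustration.

```r
# Sketch: evaluate the Poisson PMF directly from its formula.
lambda <- 3
x <- 0:5

# exp(-lambda) * lambda^x / x!
pois_pmf_by_hand <- exp(-lambda) * lambda^x / factorial(x)

all.equal(pois_pmf_by_hand, dpois(x, lambda))   # TRUE
```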

Example 3: Calculating exact Poisson probabilities

For \(\lambda = 3\), calculate:

\[ P(X=0), P(X=1), \ldots, P(X=5). \]

# Set values from 0 to 5
x_pois <- 0:5
lambda <- 3

# Calculate probabilities
pois_probabilities <- dpois(x_pois, lambda)

# Display probabilities
pois_probabilities
## [1] 0.04978707 0.14936121 0.22404181 0.22404181 0.16803136 0.10081881
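
Unlike the Binomial case, these six probabilities do not sum to 1, because a Poisson variable can take any non-negative integer value. The sketch below uses ppois() to show how much probability lies at or below 5 and how much remains above it; the variable names are illustrative assumptions.

```r
# Sketch: cumulative Poisson probabilities with ppois().
lambda <- 3

p_at_most_5 <- ppois(5, lambda)                         # P(X <= 5), about 0.916
p_more_than_5 <- ppois(5, lambda, lower.tail = FALSE)   # P(X > 5), about 0.084

# P(X <= 5) is the sum of the six exact probabilities
all.equal(p_at_most_5, sum(dpois(0:5, lambda)))         # TRUE
```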

Example 4: Plotting the theoretical probabilities

barplot(
  pois_probabilities,
  names.arg = x_pois,
  xlab = "Number of Events",
  ylab = "Probability",
  main = "Poisson Probabilities for lambda = 3"
)

Example 5: Sample mean, variance, and standard deviation

Using the simulated sample from Example 1, compute the sample mean, variance, and standard deviation.

sample_mean_pois <- mean(X_pois)
sample_variance_pois <- var(X_pois)
sample_sd_pois <- sd(X_pois)

sample_mean_pois
## [1] 2.4
sample_variance_pois
## [1] 3.822222
sample_sd_pois
## [1] 1.95505

Theoretical mean and variance

For a Poisson random variable:

\[ E(X) = \lambda \]

\[ \mathrm{Var}(X) = \lambda \]

\[ SD(X) = \sqrt{\lambda} \]
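
The equality of the mean and the variance can be checked numerically from the probability mass function. Because the Poisson distribution has infinitely many possible values, the sketch below truncates the sums at 100, an assumption that is harmless for \(\lambda = 3\) since the probability beyond that point is negligible.

```r
# Sketch: Poisson mean and variance from a truncated sum over the PMF.
lambda <- 3
x <- 0:100                 # truncation point chosen for illustration
pmf <- dpois(x, lambda)

pois_mean_from_pmf <- sum(x * pmf)
pois_var_from_pmf <- sum((x - pois_mean_from_pmf)^2 * pmf)

all.equal(pois_mean_from_pmf, lambda)   # TRUE
all.equal(pois_var_from_pmf, lambda)    # TRUE
```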

Example 6: Theoretical mean, variance, and standard deviation

For \(\lambda = 3\):

lambda <- 3

mean_pois <- lambda
variance_pois <- lambda
sd_pois <- sqrt(lambda)

mean_pois
## [1] 3
variance_pois
## [1] 3
sd_pois
## [1] 1.732051

Example 7: Comparing theory to a large simulated sample

Simulate 10,000 Poisson random values and compare the sample summaries to the theoretical values.

m <- 10000   # number of simulated values (m, as in the Binomial section)
lambda <- 3

# Generate a large sample
X_pois_large <- rpois(m, lambda)

# Compute sample summaries
large_sample_mean_pois <- mean(X_pois_large)
large_sample_variance_pois <- var(X_pois_large)
large_sample_sd_pois <- sd(X_pois_large)

large_sample_mean_pois
## [1] 2.9699
large_sample_variance_pois
## [1] 3.013095
large_sample_sd_pois
## [1] 1.735827

Summary comparison table

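Comparison of theoretical and simulated Poisson summaries

| Quantity           | Theoretical | Simulated (sample of 10) | Simulated (sample of 10,000) |
|--------------------|-------------|--------------------------|------------------------------|
| Mean               | 3.0000      | 2.4000                   | 2.9699                       |
| Variance           | 3.0000      | 3.8222                   | 3.0131                       |
| Standard Deviation | 1.7321      | 1.9551                   | 1.7358                       |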

Closing remarks

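As a general principle, the larger the simulated sample, the closer the sample summaries tend to be to the theoretical values. Both comparison tables show this: the summaries from 10,000 values are closer to theory than those from the small samples of 50 and 10. This is the law of large numbers at work. It describes a tendency, so any single small sample can still land close to or far from the theoretical value by chance.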
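
One way to see this tendency is to track the sample mean as more values are added. The sketch below does so for Poisson values with \(\lambda = 3\); the seed and the checkpoint sample sizes are arbitrary assumptions, and a different seed would give different errors at each checkpoint.

```r
# Sketch: running mean of simulated Poisson values at growing sample sizes.
set.seed(7)   # arbitrary seed (an assumption)
lambda <- 3
draws <- rpois(10000, lambda)

# Mean of the first k values, for every k from 1 to 10000
running_mean <- cumsum(draws) / seq_along(draws)

checkpoints <- c(10, 100, 1000, 10000)
convergence <- data.frame(
  sample_size = checkpoints,
  running_mean = running_mean[checkpoints],
  abs_error = abs(running_mean[checkpoints] - lambda)
)
convergence
```

The absolute error need not shrink at every step, but it tends to decrease as the sample size grows.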