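This tutorial introduces two important discrete probability distributions in R: the Binomial and the Poisson distributions. In each section, we will simulate random values, compute exact probabilities, and compare sample summaries with their theoretical values.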
Discrete distributions arise when a random variable can take only distinct countable values, such as 0, 1, 2, 3, and so on.
Two of the most common discrete distributions, the Binomial and the Poisson, are covered below.
A Binomial random variable counts the number of successes in a fixed number of independent trials, where each trial has the same probability of success.
In R, we commonly use the following functions for the Binomial distribution:
- rbinom() to simulate random values,
- dbinom() to compute exact probabilities,
- pbinom() to compute cumulative probabilities.

The function for simulation is:
rbinom(number_of_experiments, number_of_trials, probability_of_success)
If the number of trials is 1, the Binomial distribution reduces to a Bernoulli distribution.
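As a minimal sketch of the Bernoulli special case, the code below sets the number of trials to 1, so every simulated value is either 0 or 1. The seed and the name bernoulli_draws are illustrative choices, not part of the original examples.

```r
# Illustrative seed so the draws are reproducible (an assumption, not from the text)
set.seed(123)

p <- 0.60

# With size = 1, each draw is a single Bernoulli trial: 1 = success, 0 = failure
bernoulli_draws <- rbinom(10, size = 1, prob = p)
bernoulli_draws

# With one trial the Binomial PMF gives P(X = 0) = 1 - p = 0.4 and P(X = 1) = p = 0.6
dbinom(0:1, size = 1, prob = p)
```

The proportion of ones in bernoulli_draws is a rough estimate of p and will vary with the seed.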
Simulate 50 Binomial random values from a distribution with parameters \(n = 5\) and \(p = 0.60\).
# Set parameters for the simulation
m <- 50 # number of experiments
n <- 5 # number of trials
p <- 0.60 # probability of success
# Generate binomial random numbers
X_binom <- rbinom(m, n, p)
# Output
X_binom
## [1] 2 4 4 3 3 2 3 4 2 2 3 3 2 4 4 4 3 3 2 2 1 2 4 4 4 2 2 3 4 3 1 4 3 2 3 3 5 4
## [39] 5 3 3 4 3 1 2 4 3 4 4 2
A bar chart is appropriate here because the outcomes are discrete.
barplot(
table(X_binom),
xlab = "X",
ylab = "Frequency",
main = "Frequency Distribution of Simulated Binomial Values"
)
The bar chart gives an empirical view of the probability distribution \(P(X=x)\): dividing each bar's frequency by the 50 simulated values estimates the corresponding probability.
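The link between the frequencies and \(P(X=x)\) can be made explicit by converting counts to relative frequencies and placing them next to the exact probabilities from dbinom(). This is a sketch under stated assumptions: it draws its own sample (X_sim, with an illustrative seed) rather than reusing X_binom, so the numbers will differ from the output shown earlier.

```r
set.seed(123)  # illustrative seed, not from the original example

m <- 50    # number of experiments
n <- 5     # number of trials
p <- 0.60  # probability of success
X_sim <- rbinom(m, n, p)

# factor() with all possible levels keeps outcomes that never occurred (count 0)
counts <- table(factor(X_sim, levels = 0:n))

# Relative frequencies estimate P(X = x)
empirical <- as.numeric(counts) / m
exact <- dbinom(0:n, n, p)

comparison <- data.frame(x = 0:n, empirical = empirical, exact = round(exact, 4))
comparison
```

With only 50 draws the two columns typically differ noticeably; the gap tends to shrink as m increases.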
The Binomial probability mass function is:
\[ P(X=x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0,1,\ldots,n. \]
In R, exact probabilities are computed using dbinom(x, n, p).
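To see that dbinom() evaluates the formula above, the following sketch computes the probability mass function term by term with choose() and compares the result with dbinom(). The name binom_pmf_by_hand is illustrative.

```r
n <- 5
p <- 0.60
x <- 0:n

# Direct translation of the PMF: choose(n, x) * p^x * (1 - p)^(n - x)
binom_pmf_by_hand <- choose(n, x) * p^x * (1 - p)^(n - x)

# Both approaches should agree up to floating-point error
all.equal(binom_pmf_by_hand, dbinom(x, n, p))

# A valid PMF sums to 1 over all possible outcomes
sum(binom_pmf_by_hand)
```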
For the Binomial distribution above, calculate:
\[ P(X=0), P(X=1), \ldots, P(X=5). \]
# Set parameters
n <- 5
x <- 0:n
p <- 0.60
# Calculate exact probabilities
binom_probabilities <- dbinom(x, n, p)
# Output
binom_probabilities
## [1] 0.01024 0.07680 0.23040 0.34560 0.25920 0.07776
barplot(
binom_probabilities,
names.arg = x,
xlab = "x",
ylab = "Probability",
main = "Binomial Probabilities for n = 5 and p = 0.60"
)
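Cumulative probabilities follow directly from the exact ones. As a minimal sketch, the code below uses pbinom() to compute \(P(X \le 2)\), checks it against the sum of the first three exact probabilities, and takes the complement to get \(P(X \ge 3)\). The cutoff of 2 and the variable names are illustrative choices.

```r
n <- 5
p <- 0.60

# P(X <= 2): cumulative probability from pbinom()
p_at_most_2 <- pbinom(2, n, p)

# The same value obtained by summing the exact probabilities for x = 0, 1, 2
p_at_most_2_sum <- sum(dbinom(0:2, n, p))

# P(X >= 3) is the complement of P(X <= 2)
p_at_least_3 <- 1 - p_at_most_2

c(p_at_most_2, p_at_most_2_sum, p_at_least_3)
```

Using the exact probabilities computed earlier, \(P(X \le 2) = 0.01024 + 0.07680 + 0.23040 = 0.31744\), so \(P(X \ge 3) = 0.68256\).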
Using the 50 simulated values stored in X_binom in Example 1, compute the sample mean, sample variance, and sample standard deviation.
sample_mean_binom <- mean(X_binom)
sample_variance_binom <- var(X_binom)
sample_sd_binom <- sd(X_binom)
sample_mean_binom
## [1] 3.02
sample_variance_binom
## [1] 0.9995918
sample_sd_binom
## [1] 0.9997959
These values are sample-based estimates of the population mean, variance, and standard deviation, so they vary from one simulation run to the next.
For a Binomial random variable:
\[ E(X) = np \]
\[ \mathrm{Var}(X) = np(1-p) \]
\[ SD(X) = \sqrt{np(1-p)} \]
# Parameters
n <- 5
p <- 0.60
# Theoretical quantities
mean_binomial <- n * p
variance_binomial <- n * p * (1 - p)
sd_binomial <- sqrt(variance_binomial)
mean_binomial
## [1] 3
variance_binomial
## [1] 1.2
sd_binomial
## [1] 1.095445
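The closed-form results can also be checked against the definitions of expectation and variance, \(E(X) = \sum_x x\,P(X=x)\) and \(\mathrm{Var}(X) = \sum_x (x - E(X))^2\,P(X=x)\). The sketch below does this with dbinom(); the variable names are illustrative.

```r
n <- 5
p <- 0.60
x <- 0:n
probs <- dbinom(x, n, p)

# E(X): probability-weighted sum of the outcomes
binom_mean_from_pmf <- sum(x * probs)

# Var(X): probability-weighted sum of squared deviations from the mean
binom_variance_from_pmf <- sum((x - binom_mean_from_pmf)^2 * probs)

# Should match n * p = 3 and n * p * (1 - p) = 1.2 up to floating-point error
c(binom_mean_from_pmf, binom_variance_from_pmf)
```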
Now simulate 10,000 Binomial random values and compare the sample summaries to the theoretical values.
# Set parameters for a larger simulation
m <- 10000
n <- 5
p <- 0.60
# Generate large sample
X_binom_large <- rbinom(m, n, p)
# Compute sample summaries
sample_mean_large_binom <- mean(X_binom_large)
sample_variance_large_binom <- var(X_binom_large)
sample_sd_large_binom <- sd(X_binom_large)
sample_mean_large_binom
## [1] 3.0035
sample_variance_large_binom
## [1] 1.165804
sample_sd_large_binom
## [1] 1.079724
| Quantity | Theoretical | Simulated sample (size 50) | Simulated sample (size 10,000) |
|---|---|---|---|
| Mean | 3.0000 | 3.0200 | 3.0035 |
| Variance | 1.2000 | 0.9996 | 1.1658 |
| Standard Deviation | 1.0954 | 0.9998 | 1.0797 |
A Poisson random variable models the number of events that occur in a fixed interval of time, space, area, or volume, when events occur independently and at a constant average rate.
In R, we commonly use:
- rpois() to simulate random values,
- dpois() to compute exact probabilities,
- ppois() to compute cumulative probabilities.

The simulation function is:
rpois(number_of_random_values, lambda)
where \(\lambda\) is the average number of events per interval.
Generate 10 Poisson random values with parameter \(\lambda = 3\).
# Set the parameter for the Poisson distribution
lambda <- 3
# Generate 10 Poisson random numbers
X_pois <- rpois(10, lambda)
# Output
X_pois
## [1] 1 4 1 1 7 2 1 3 1 3
frequency_table <- table(X_pois)
barplot(
frequency_table,
xlab = "Number of Events",
ylab = "Frequency",
main = "Frequency Distribution of Simulated Poisson Values"
)
The Poisson probability mass function is:
\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0,1,2,\ldots \]
In R, exact probabilities are computed using dpois(x, lambda).
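As with the Binomial case, the formula can be evaluated directly and compared with dpois(). This is a minimal sketch; pois_pmf_by_hand is an illustrative name.

```r
lambda <- 3
x <- 0:5

# Direct translation of the PMF: exp(-lambda) * lambda^x / x!
pois_pmf_by_hand <- exp(-lambda) * lambda^x / factorial(x)

# Both approaches should agree up to floating-point error
all.equal(pois_pmf_by_hand, dpois(x, lambda))
```

For large x, factorial(x) overflows, so dpois() is the safer choice in practice; the hand-written version is only meant to mirror the formula.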
For \(\lambda = 3\), calculate:
\[ P(X=0), P(X=1), \ldots, P(X=5). \]
# Set values from 0 to 5
x_pois <- 0:5
lambda <- 3
# Calculate probabilities
pois_probabilities <- dpois(x_pois, lambda)
# Display probabilities
pois_probabilities
## [1] 0.04978707 0.14936121 0.22404181 0.22404181 0.16803136 0.10081881
barplot(
pois_probabilities,
names.arg = x_pois,
xlab = "Number of Events",
ylab = "Probability",
main = "Poisson Probabilities for lambda = 3"
)
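Unlike the Binomial case, the six probabilities above do not sum to 1, because a Poisson variable can exceed 5. The sketch below uses ppois() to compute \(P(X \le 5)\) and the remaining tail probability \(P(X > 5)\); the variable names are illustrative.

```r
lambda <- 3

# P(X <= 5): cumulative probability from ppois()
p_at_most_5 <- ppois(5, lambda)

# Same value as the sum of the six exact probabilities for x = 0, ..., 5
p_at_most_5_sum <- sum(dpois(0:5, lambda))

# P(X > 5): the probability mass not shown in the bar chart
p_more_than_5 <- ppois(5, lambda, lower.tail = FALSE)

c(p_at_most_5, p_at_most_5_sum, p_more_than_5)
```

Summing the six values listed earlier gives roughly 0.916, so about 8% of the probability lies above 5.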
Using the 10 simulated Poisson values stored in X_pois in Example 1 of this section, compute the sample mean, variance, and standard deviation.
sample_mean_pois <- mean(X_pois)
sample_variance_pois <- var(X_pois)
sample_sd_pois <- sd(X_pois)
sample_mean_pois
## [1] 2.4
sample_variance_pois
## [1] 3.822222
sample_sd_pois
## [1] 1.95505
For a Poisson random variable:
\[ E(X) = \lambda \]
\[ \mathrm{Var}(X) = \lambda \]
\[ SD(X) = \sqrt{\lambda} \]
For \(\lambda = 3\):
lambda <- 3
mean_pois <- lambda
variance_pois <- lambda
sd_pois <- sqrt(lambda)
mean_pois
## [1] 3
variance_pois
## [1] 3
sd_pois
## [1] 1.732051
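The equality of mean and variance can be checked numerically from the probability mass function. Because the Poisson support is infinite, the sketch below truncates the sums at 100, an illustrative cutoff beyond which the probabilities are negligible for \(\lambda = 3\). The variable names are also illustrative.

```r
lambda <- 3

# Truncate the infinite support at a point where the remaining mass is negligible
x_grid <- 0:100
probs <- dpois(x_grid, lambda)

# E(X): probability-weighted sum of the outcomes
pois_mean_from_pmf <- sum(x_grid * probs)

# Var(X): probability-weighted sum of squared deviations from the mean
pois_variance_from_pmf <- sum((x_grid - pois_mean_from_pmf)^2 * probs)

# Both should be approximately equal to lambda = 3
c(pois_mean_from_pmf, pois_variance_from_pmf)
```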
Simulate 10,000 Poisson random values and compare the sample summaries to the theoretical values.
n <- 10000 # number of random values to generate (sample size, not a number of trials)
lambda <- 3 # average rate
# Generate a large sample
X_pois_large <- rpois(n, lambda)
# Compute sample summaries
large_sample_mean_pois <- mean(X_pois_large)
large_sample_variance_pois <- var(X_pois_large)
large_sample_sd_pois <- sd(X_pois_large)
large_sample_mean_pois
## [1] 2.9699
large_sample_variance_pois
## [1] 3.013095
large_sample_sd_pois
## [1] 1.735827
| Quantity | Theoretical | Simulated sample (size 10) | Simulated sample (size 10,000) |
|---|---|---|---|
| Mean | 3.0000 | 2.4000 | 2.9699 |
| Variance | 3.0000 | 3.8222 | 3.0131 |
| Standard Deviation | 1.7321 | 1.9551 | 1.7358 |
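As a general principle, the larger the simulated sample size, the closer the sample summaries tend to be to the theoretical values. This is the law of large numbers at work: in both tables, the summaries from 10,000 values are closer to the theoretical values than those from the small samples.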
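To watch this tendency across several sample sizes, the sketch below simulates Poisson samples of increasing size and records the sample mean and variance for each. The seed, the chosen sample sizes, and the variable names are illustrative assumptions. Because each row comes from a single random sample, the approach to the theoretical value of 3 is a tendency and need not be monotone.

```r
set.seed(123)  # illustrative seed, not from the original examples

lambda <- 3
sample_sizes <- c(10, 100, 1000, 10000, 100000)

# For each sample size, simulate once and record the sample mean and variance
summaries <- sapply(sample_sizes, function(size) {
  draws <- rpois(size, lambda)
  c(mean = mean(draws), variance = var(draws))
})

convergence <- data.frame(
  sample_size = sample_sizes,
  sample_mean = summaries["mean", ],
  sample_variance = summaries["variance", ]
)
convergence
```

Replacing rpois(size, lambda) with rbinom(size, 5, 0.60) gives the corresponding check for the Binomial example, with theoretical mean 3 and variance 1.2.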