Discrete Distributions in R

Discrete Distributions

Discrete distributions are the probability distributions of discrete random variables.

Unlike continuous random variables, which can take on any value within a given range, discrete random variables take distinct, separate values, typically whole numbers such as counts. Each possible value of the random variable is assigned a probability indicating its likelihood of occurrence, and these probabilities sum to 1.

Common discrete distributions

Bernoulli: Distribution representing a binary outcome (success or failure).

  • Example: Coin flip (success: heads, failure: tails).

Binomial: Counts the number of successes in a fixed number of independent Bernoulli trials.

  • Example: Number of heads in 10 coin flips.

Multinomial: Generalizes the binomial distribution to more than two categories.

  • Example: Number of 1s, 2s, 3s, 4s, 5s, and 6s in 10 rolls of a die.

Poisson: Models the number of events occurring in a fixed interval of time or space.

  • Example: Number of typos per page.

Geometric: Represents the number of trials needed for the first success in a sequence of independent Bernoulli trials.

  • Example: Number of coin flips until the first head.

Hypergeometric: Models the number of successes in a sample drawn without replacement from a finite population.

  • Example: Number of green balls drawn from a bag without replacement.

Negative Binomial: Generalizes the geometric distribution and represents the number of trials needed for a fixed number of successes.

  • Example: Number of coin flips until the third head.
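Each distribution in the list above has a random-number generator in base R's stats package. The following is a minimal sketch rather than a definitive recipe. The parameter values (a fair coin, a fair die, a bag with 7 green and 13 other balls, a rate of 2 typos per page) and the seed are illustrative assumptions, not values from the text. Note that rgeom() and rnbinom() return the number of failures before the target success, not the number of trials, so the sketch adds the number of successes to match the definitions above.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Bernoulli: 5 coin flips (1 = heads, 0 = tails)
bernoulli <- rbinom(5, size = 1, prob = 0.5)

# Binomial: number of heads in 10 flips, repeated 5 times
binomial <- rbinom(5, size = 10, prob = 0.5)

# Multinomial: counts of each face in 10 rolls of a fair die (6 x 1 matrix)
multinomial <- rmultinom(1, size = 10, prob = rep(1 / 6, 6))

# Poisson: typos per page, assuming a rate of 2 per page
poisson <- rpois(5, lambda = 2)

# Geometric: rgeom() counts failures, so add 1 to get flips until the first head
geometric <- rgeom(5, prob = 0.5) + 1

# Hypergeometric: green balls among 5 drawn from 7 green and 13 other balls
hypergeom <- rhyper(5, m = 7, n = 13, k = 5)

# Negative binomial: rnbinom() counts failures, so add 3 to get flips until the third head
neg_binom <- rnbinom(5, size = 3, prob = 0.5) + 3
```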

Simulating Binomial Random Variables

Some statistical applications require simulating (generating) random outcomes from a binomial distribution. In R, the rbinom() function does this (type ?rbinom in the R console to open its R Documentation page):

  • rbinom(number of experiments (m), number of trials (n), probability of success (p)). In the R documentation these three arguments are named n, size, and prob, respectively.

Note: If the number of trials is \(n=1\), we have a Bernoulli random variable (distribution).
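Because these notes write the arguments as m, n, and p while R names them n, size, and prob, it can help to see that the positional and named calls are the same thing. The sketch below assumes an arbitrary seed (123) purely for reproducibility. It also shows the Bernoulli special case with one trial per experiment.

```r
m <- 50    # Number of experiments (R's argument "n")
n <- 5     # Number of trials (R's argument "size")
p <- 0.60  # Probability of success (R's argument "prob")

# Same seed before each call, so both calls see the same random stream
set.seed(123)
by_position <- rbinom(m, n, p)

set.seed(123)
by_name <- rbinom(n = m, size = n, prob = p)

# TRUE: the two calls are equivalent
identical(by_position, by_name)

# Bernoulli case: one trial per experiment, so every value is 0 or 1
bernoulli_draws <- rbinom(m, size = 1, prob = p)
bernoulli_draws
```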

Example 1:

Simulate or generate 50 binomial random numbers from the Binomial distribution with parameters: \(n=5\), \(p=0.60\) using the rbinom() function.

# Set parameters for the simulation
m <- 50    # Number of experiments
n <- 5     # Number of trials
p <- 0.60  # Probability of success

# Generate binomial random numbers
X <- rbinom(m, n, p)  # Binomial random numbers

# Output
X

Plot a bar graph or chart of the simulated binomial random numbers (Frequency Distribution).

barplot(table(X), xlab = "X", ylab = "Frequency", main = "Frequency Distribution of X")

The shape of the bar graph estimates the probability distribution \(P(X = x)\); dividing each frequency by the number of experiments \(m\) gives the estimated probabilities. A bar chart is appropriate because the data are discrete.
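To turn the frequency distribution into an estimate of the probabilities, the counts can be divided by the number of experiments. The sketch below regenerates a sample so that it runs on its own. The seed is an arbitrary assumption, and factor() with explicit levels is one way (not the only one) to keep values that never occurred in the sample.

```r
set.seed(2024)  # arbitrary seed for reproducibility

m <- 50    # Number of experiments
n <- 5     # Number of trials
p <- 0.60  # Probability of success
X <- rbinom(m, n, p)

# Explicit levels keep values 0..n in the table even if their count is 0
counts <- table(factor(X, levels = 0:n))

# Relative frequencies estimate P(X = x); equivalent to prop.table(counts)
relative_freq <- counts / m
relative_freq

barplot(relative_freq, xlab = "X", ylab = "Relative frequency",
        main = "Estimated P(X = x) from 50 simulated values")
```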

The theoretical Binomial distribution is given as follows:

\[P(X=x) = \frac{n!}{x!(n-x)!}\cdot p^x \cdot (1-p)^{n-x} \text{ for } x=0, 1, \ldots,n.\]

\(P(X=x)\) can be calculated using the dbinom(x, n, p) function in R.
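As a check on the formula, the probabilities can be computed term by term and compared with dbinom(). This is a small sketch using the parameters of Example 1; the variable names by_formula and by_dbinom are illustrative.

```r
n <- 5     # Number of trials
p <- 0.60  # Probability of success
x <- 0:n   # All possible numbers of successes

# Direct translation of the formula: n! / (x! (n - x)!) * p^x * (1 - p)^(n - x)
by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)

# Built-in probability mass function
by_dbinom <- dbinom(x, size = n, prob = p)

# TRUE when the two agree up to floating-point tolerance
all.equal(by_formula, by_dbinom)

# The probabilities over all possible values sum to 1
sum(by_dbinom)
```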

Example 2:

From the binomial distribution defined in Example 1, calculate \(P(X=0), P(X=1),\cdots, P(X=5)\) using the dbinom() function.

# Set parameters
n <- 5          # Number of trials
x <- 0:n       # Possible number of successes
p <- 0.6       # Probability of success

# Calculate binomial probabilities
probabilities <- dbinom(x, n, p)

# Output
probabilities

Plot the probabilities using the barplot() function.

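barplot(probabilities, names.arg = x, xlab = "Number of Successes", ylab = "Probability",
        main = "Binomial Probabilities (n = 5, p = 0.6)")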

Example 3:

Calculate the sample mean, sample variance, and sample standard deviation of the generated or simulated binomial random numbers (sample) in Example 1.

# Calculate sample mean, variance, and standard deviation
sample_mean <- mean(X)
sample_variance <- var(X)
sample_sd <- sd(X)

sample_mean
sample_variance
sample_sd

The sample mean is an estimate of the population mean (or expected value). The expected value of a binomial random variable is given by:

\[E(X) = \mu = n \cdot p\]

The variance of a binomial random variable is calculated from the formula:

\[Var(X) = n\cdot p\cdot(1 - p).\]

The corresponding standard deviation is:

\[SD(X)=\sqrt{Var(X)}=\sqrt{n\cdot p\cdot(1 - p)}.\]
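These closed-form expressions agree with the general definitions \(E(X)=\sum_x x\,P(X=x)\) and \(Var(X)=\sum_x (x-\mu)^2\,P(X=x)\). The sketch below checks this numerically for \(n=5\), \(p=0.60\); the names pmf, mu, and sigma2 are illustrative.

```r
n <- 5     # Number of trials
p <- 0.60  # Probability of success
x <- 0:n   # All possible numbers of successes

pmf <- dbinom(x, n, p)  # P(X = x) for each possible value

# Mean and variance computed from the definitions
mu <- sum(x * pmf)
sigma2 <- sum((x - mu)^2 * pmf)

# Compare with the closed-form results n * p and n * p * (1 - p)
c(mean = mu, variance = sigma2, sd = sqrt(sigma2))
all.equal(mu, n * p)
all.equal(sigma2, n * p * (1 - p))
```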

Example 4:

For the Binomial distribution defined in Example 1 (Parameters: \(n = 5\) and \(p = 0.60\)), calculate these quantities: mean, variance, and standard deviation.

# Parameters
n <- 5          # Number of trials
p <- 0.60       # Probability of success

# Calculate mean, variance, and standard deviation
mean_binomial <- n * p
variance_binomial <- n * p * (1 - p)
sd_binomial <- sqrt(variance_binomial)

# Output
mean_binomial
variance_binomial
sd_binomial

Example 5:

Compare your results from Example 4 to what you would obtain from a simulated sample of 10000 binomial random variables.

# Set parameters for the larger sample
m <- 10000     # Number of experiments
n <- 5         # Number of trials
p <- 0.60      # Probability of success

# Generate a large sample of binomial random numbers
X <- rbinom(m, n, p)

# Calculate sample mean, variance, and standard deviation for the larger sample
sample_mean_large <- mean(X)
sample_variance_large <- var(X)
sample_sd_large <- sd(X)

# Output
sample_mean_large
sample_variance_large
sample_sd_large
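One way to make the comparison easier to read is to place the theoretical and simulated values side by side. The following sketch assumes an arbitrary seed (42) and a hypothetical data frame named comparison. The simulated values change with the seed, but with 10000 values they should sit close to the theoretical ones.

```r
set.seed(42)  # arbitrary seed for reproducibility

m <- 10000  # Number of experiments
n <- 5      # Number of trials
p <- 0.60   # Probability of success
X <- rbinom(m, n, p)

# Theoretical and simulated quantities in one table
comparison <- data.frame(
  quantity    = c("mean", "variance", "sd"),
  theoretical = c(n * p, n * p * (1 - p), sqrt(n * p * (1 - p))),
  simulated   = c(mean(X), var(X), sd(X))
)
comparison$abs_error <- abs(comparison$simulated - comparison$theoretical)

comparison
```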

Simulating Poisson Random Variables

  • The rpois() function in R simulates independent Poisson random variables: rpois(number of random values, lambda)

Example 1:

Generate 10 Poisson random numbers with parameter \(\lambda = 3\) as follows:

# Set the parameter for the Poisson distribution
lambda <- 3  # Rate parameter

# Generate 10 Poisson random numbers
X <- rpois(10, lambda)

# Output
X

Plot a bar graph or chart of the simulated Poisson random numbers (Frequency Distribution).

# Calculate frequencies
frequency_table <- table(X)

# Create a barplot of the frequencies
barplot(frequency_table, xlab = "Number of Events", ylab = "Frequency", 
        main = "Frequencies of Simulated Poisson Distribution with lambda = 3")

The theoretical Poisson distribution is given as follows:

\[P(X=x) = \frac{ \text{e}^{-\lambda} \cdot \lambda^x}{x!}, \text{ } x=0, 1, \ldots\]

\(P(X=x)\) can be calculated using the dpois(x, lambda, log = FALSE) function in R.
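The Poisson formula can be checked against dpois() in the same way. This is a small sketch for \(\lambda = 3\) and \(x = 0, 1, \ldots, 5\); the names by_formula and by_dpois are illustrative.

```r
lambda <- 3  # Rate parameter
x <- 0:5     # Values at which to evaluate P(X = x)

# Direct translation of the formula: exp(-lambda) * lambda^x / x!
by_formula <- exp(-lambda) * lambda^x / factorial(x)

# Built-in probability mass function
by_dpois <- dpois(x, lambda)

# TRUE when the two agree up to floating-point tolerance
all.equal(by_formula, by_dpois)
```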

Example 2:

From the Poisson distribution defined in Example 1, calculate \(P(X=0), P(X=1),\cdots, P(X=5)\) using the dpois() function.

# Set the parameter for the Poisson distribution
lambda <- 3  # Rate parameter
N <- 0:5     # Values from 0 to 5

# Calculate probabilities for values 0 to 5
probabilities <- dpois(N, lambda)

# Display the calculated probabilities
probabilities

Plot the probabilities using the barplot() function.

barplot(probabilities, names.arg = N, xlab = "Number of Events", ylab = "Probability",
        main = "Poisson Probabilities (lambda = 3)")
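Unlike the binomial case, a Poisson random variable has no upper bound, so the probabilities for 0 through 5 do not add up to 1. The sketch below measures how much probability those six values cover, using ppois(), the cumulative distribution function in base R. The variable names are illustrative.

```r
lambda <- 3  # Rate parameter

# Probability covered by the values 0, 1, ..., 5
covered <- sum(dpois(0:5, lambda))

# The same quantity from the cumulative distribution function: P(X <= 5)
cdf_at_5 <- ppois(5, lambda)

# Remaining probability, P(X >= 6), which the bar plot of 0..5 does not show
tail_mass <- 1 - covered

c(covered = covered, cdf_at_5 = cdf_at_5, tail_mass = tail_mass)
```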

Example 3:

Calculate the sample mean, sample variance, and sample standard deviation of the generated or simulated Poisson random numbers (sample) in Example 1.

# Calculate sample mean, variance, and standard deviation
sample_mean <- mean(X)
sample_variance <- var(X)
sample_sd <- sd(X)

# Output
sample_mean
sample_variance
sample_sd

The sample mean is an estimate of the population mean (or expected value). The expected value of a Poisson random variable is given by: \(E(X) = \lambda\).

The variance of a Poisson random variable is \(Var(X) = \lambda\). The corresponding standard deviation is: \(SD(X)=\sqrt{Var(X)}=\sqrt{\lambda}\).

Example 4:

For the Poisson distribution defined in Example 1 (Parameter: \(\lambda = 3\)), calculate these quantities: mean, variance, and standard deviation.

\[\text{mean}=\text{variance}=\lambda=3\]

\[\text{standard deviation}=\sqrt{\lambda}=\sqrt{3} \approx 1.7321\]
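The same quantities can be computed in R, mirroring Example 4 of the binomial section. The sketch also checks the mean numerically from the definition \(E(X)=\sum_x x\,P(X=x)\). Truncating the infinite sum at 100 is an assumption, chosen because the terms beyond that point are negligible for \(\lambda = 3\).

```r
lambda <- 3  # Rate parameter

# Theoretical mean, variance, and standard deviation
mean_poisson <- lambda
variance_poisson <- lambda
sd_poisson <- sqrt(variance_poisson)

# Output
mean_poisson
variance_poisson
sd_poisson

# Numerical check of the mean, truncating the infinite sum at x = 100
x <- 0:100
numeric_mean <- sum(x * dpois(x, lambda))
all.equal(numeric_mean, mean_poisson)
```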

Example 5:

Compare your results from Example 4 to what you would obtain from a simulated sample of 10000 Poisson random variables.

n <- 10000   # Number of random values
lambda <- 3  # Poisson parameter

# Generate a large sample of Poisson random numbers
large_sample <- rpois(n, lambda)

# Calculate statistics for the large sample
large_sample_mean <- mean(large_sample)
large_sample_variance <- var(large_sample)
large_sample_sd <- sd(large_sample)

# Output
large_sample_mean
large_sample_variance
large_sample_sd
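To see how the sample statistics settle toward the theoretical values as the sample grows, the simulation can be repeated for several sample sizes. This is a sketch under assumed choices: the seed (7) and the sizes 10, 100, 1000, and 10000 are arbitrary, and the exact numbers will differ with another seed.

```r
set.seed(7)  # arbitrary seed for reproducibility

lambda <- 3                       # Poisson parameter
sizes <- c(10, 100, 1000, 10000)  # Sample sizes to compare

# One row per sample size: size, sample mean, sample variance, sample sd
results <- t(sapply(sizes, function(k) {
  s <- rpois(k, lambda)
  c(n = k, mean = mean(s), variance = var(s), sd = sd(s))
}))

results

# Theoretical values for reference
c(mean = lambda, variance = lambda, sd = sqrt(lambda))
```

Small samples, such as the 10 values in Example 1, can give a mean and variance noticeably different from 3, while the larger samples typically land much closer.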