2023-04-16

Introduction

The binomial distribution is a probability distribution that describes the outcomes of a fixed number of independent trials with only two possible outcomes (success or failure) and a constant probability of success. It is characterized by two parameters:

  • n = the number of trials, and

  • p = the probability of success in each trial.

To calculate the probability of obtaining k successes in n trials, we use the probability mass function (PMF), given by the formula:

\[P(X = k) = {n \choose k} p^k (1-p)^{n-k}\]

where X represents the number of successes in n trials with probability of success p. The symbol \({n \choose k}\) represents the number of ways to choose k successes from n trials, and can be calculated using the formula \({n \choose k} = \frac{n!}{k!(n-k)!}\).
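As a quick check of the formula, the following R sketch evaluates the PMF by hand and compares it with R's built-in dbinom(). The helper name binom_pmf and the values n = 10, k = 3, p = 0.5 are illustrative assumptions, not part of the original slides.

```r
# Hand-written binomial PMF (binom_pmf is a hypothetical helper name)
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

# Illustrative values
n <- 10
k <- 3
p <- 0.5

manual  <- binom_pmf(k, n, p)             # formula written out
builtin <- dbinom(k, size = n, prob = p)  # R's built-in PMF

# The two results should agree up to floating-point error
all.equal(manual, builtin)
```

Here choose(n, k) computes the binomial coefficient, so the function body mirrors the formula term by term.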

This presentation provides an overview of the binomial distribution, its characteristics, applications, and importance in various fields of study.

Example: Rolling a Die

Rolling a fair die 5 times and counting how many times a 6 comes up is an example of a binomial experiment. It satisfies the criteria of having:

  • A fixed number of trials (5),

  • Two possible outcomes on each roll (success: a 6; failure: any other number),

  • A constant probability of success (1/6),

  • And independent trials.

This scenario helps us understand the properties of binomial distributions, including mean, variance, and shape.
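The die scenario can be worked through numerically. The sketch below, with illustrative variable names, tabulates the probability of each possible number of sixes in 5 rolls using dbinom().

```r
# Number of sixes in 5 rolls of a fair die: X ~ Binomial(5, 1/6)
n_rolls <- 5
p_six   <- 1 / 6
sixes   <- 0:n_rolls

# Probability of 0, 1, ..., 5 sixes
die_pmf   <- dbinom(sixes, size = n_rolls, prob = p_six)
die_table <- data.frame(sixes = sixes, probability = round(die_pmf, 4))
print(die_table)

# Probability of a six on every one of the 5 rolls, i.e. (1/6)^5
p_all_sixes <- dbinom(n_rolls, size = n_rolls, prob = p_six)

# Expected number of sixes, n * p
mean_sixes <- n_rolls * p_six
```

The probabilities in the table sum to 1, since the six counts 0 through 5 cover every possible result of the experiment.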

Properties of binomial distribution

The binomial distribution is a discrete probability distribution with two parameters: n (number of trials) and p (probability of success). Its properties include:

  • Discrete probability distribution: it assigns probabilities to the finite set of possible numbers of successes, 0, 1, ..., n.

  • PMF: The probability mass function gives the probability of obtaining exactly k successes in n trials.

  • CDF: The cumulative distribution function gives the probability of obtaining k or fewer successes in n trials.

Understanding these properties is important for using the binomial distribution in applications such as quality control, hypothesis testing, and risk analysis.

Example: Binomial Distribution

Suppose we have a binomial distribution with n = 200 and p = 0.5. We can generate a random sample of 200 draws from this distribution using the rbinom() function in R and plot it with ggplot2.

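library(ggplot2)
# Generate random sample: n_draws observations, each the number of
# successes in n trials with success probability p
set.seed(123)
n <- 200; p <- 0.5
n_draws <- 200
x <- rbinom(n_draws, size = n, prob = p)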
# Plot histogram
ggplot(data.frame(x), aes(x)) +
  geom_histogram(binwidth = 1, color = "black", fill = "blue") +
  labs(x = "Number of successes", y = "Frequency") + 
  ggtitle("Binomial Distribution") + 
  theme(plot.title = element_text(hjust = 0.5))

Probability Mass Function (pmf)

  • The PMF of a binomial distribution gives the probability of observing a specific number of successes (k) in n trials with probability of success p. It is defined as: \[P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}\]

  • The PMF maps each possible value to its probability, and can be displayed as a bar plot.

  • The PMF is useful for modeling and analyzing discrete random variables, and can be used to calculate statistical measures such as the mean, variance, and standard deviation.

  • The PMF has the following properties: the probability of any value must be between 0 and 1, and the sum of the probabilities over all possible values must equal 1.
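The two PMF properties listed above can be checked numerically. This is a minimal sketch using the same n = 30 and p = 0.5 as the bar plot slide; the variable names in_range and total are illustrative.

```r
# PMF of Binomial(30, 0.5) over all possible values 0, 1, ..., n
n <- 30
p <- 0.5
pmf <- dbinom(0:n, size = n, prob = p)

# Property 1: every probability lies between 0 and 1
in_range <- all(pmf >= 0 & pmf <= 1)

# Property 2: the probabilities sum to 1 (up to floating-point error)
total <- sum(pmf)

in_range
all.equal(total, 1)
```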

Bar Plot of pmf

library(ggplot2)
# Set the parameters
n <- 30; p <- 0.5; x <- 0:n
# Calculate the probability mass function
pmf <- dbinom(x, n, p)
pmf_df <- data.frame(x, pmf)
# Draw the PMF as a bar plot, one bar per number of successes
ggplot(pmf_df, aes(x = x, y = pmf)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  ggtitle("PMF of Binomial Distribution") + xlab("Number of Successes") +
  ylab("Probability")+ theme(plot.title = element_text(hjust = 0.5))

Cumulative Distribution Function (cdf)

  • The Cumulative Distribution Function (CDF) is a function that shows the probability of a random variable taking on a value less than or equal to a certain number. Mathematically, the CDF is defined as: \[F(x) = P(X \leq x)\]

  • It ranges from 0 to 1 and provides a complete description of the distribution of the random variable.

  • It can be used to calculate statistical measures and make predictions. For a discrete distribution such as the binomial, its graph is a step function; for a continuous distribution, it is a smooth curve.
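To make the link between the PMF and the CDF concrete, the sketch below rebuilds the binomial CDF as a running sum of the PMF and compares it with pbinom(). The interval from 3 to 6 is an illustrative choice.

```r
# Binomial(10, 0.5): CDF from pbinom() versus a cumulative sum of the PMF
n <- 10
p <- 0.5
x <- 0:n

cdf_builtin <- pbinom(x, size = n, prob = p)
cdf_manual  <- cumsum(dbinom(x, size = n, prob = p))

# Both constructions should agree up to floating-point error
all.equal(cdf_builtin, cdf_manual)

# Interval probability from the CDF: P(3 < X <= 6) = F(6) - F(3)
p_interval <- pbinom(6, n, p) - pbinom(3, n, p)
```

The last value of the CDF is 1, because X can never exceed n.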

Plot of cdf

library(ggplot2)
# Set the parameters
n <- 10; p <- 0.5; x <- 0:n
# Cumulative Distribution Function
cdf <- pbinom(x, n, p)
cdf_df <- data.frame(x, cdf)
# Draw the CDF as a step function, since the distribution is discrete
ggplot(cdf_df, aes(x = x, y = cdf)) +
  geom_step(color = "red") +
  ggtitle("CDF of Binomial Distribution") +
  xlab("Number of Successes") + ylab("Cumulative Probability")+ 
  theme(plot.title = element_text(hjust = 0.5))

Mean and Variance

  • The binomial distribution is given by its probability mass function (PMF): \[P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}\] where \(\binom{n}{k}\) is the binomial coefficient, which gives the number of ways to choose k successes from n trials.

  • The expected value and variance of a binomial distribution are: \[E(X) = np\] and \[Var(X) = np(1-p)\]

  • The mean is the expected number of successes, while the variance measures how spread out the number of successes is around the mean.

  • These two measures are important for understanding the shape and behavior of probability distributions, including the binomial distribution.
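The formulas for the mean and variance can be verified directly from the PMF. The following sketch, assuming n = 30 and p = 0.5 for illustration, computes both as probability-weighted sums and compares them with np and np(1-p).

```r
# Mean and variance of Binomial(30, 0.5) computed from the PMF
n <- 30
p <- 0.5
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)

# E(X) = sum over k of k * P(X = k)
mean_from_pmf <- sum(k * pmf)

# Var(X) = sum over k of (k - E(X))^2 * P(X = k)
var_from_pmf <- sum((k - mean_from_pmf)^2 * pmf)

# Compare with the closed-form results
all.equal(mean_from_pmf, n * p)
all.equal(var_from_pmf, n * p * (1 - p))
```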

Normal Approximation

The binomial distribution can be approximated by a normal distribution with the same mean and variance when n is large and p is not too close to 0 or 1. This is known as the normal approximation to the binomial distribution.

  • The mean and variance of the normal distribution are: \[\mu = np\] and \[\sigma^2 = np(1-p)\]
  • The normal approximation can be useful when calculating probabilities for large n and p that are not too extreme. For example, if n = 100 and p = 0.3, then \(\mu = np = 30\) and \(\sigma^2 = np(1-p) = 21\).

\[Z = \frac{X - \mu}{\sigma} \sim N(0,1)\] approximately, where X is the number of successes in n trials, and Z is the standardized normal variable.
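The quality of the approximation for n = 100 and p = 0.3 can be examined in R. The sketch below compares the exact probability P(X ≤ 35) from pbinom() with the normal approximation from pnorm(), with and without a continuity correction. The threshold 35 and the continuity correction are assumptions added for illustration; they do not appear in the original slides.

```r
# Normal approximation to Binomial(100, 0.3)
n <- 100
p <- 0.3
mu    <- n * p                    # mean, 30
sigma <- sqrt(n * p * (1 - p))    # standard deviation, sqrt(21)

k <- 35  # illustrative threshold

# Exact binomial probability P(X <= k)
exact <- pbinom(k, size = n, prob = p)

# Normal approximation using Z = (X - mu) / sigma
approx_plain <- pnorm((k - mu) / sigma)

# Same approximation with a continuity correction of 0.5
approx_cc <- pnorm((k + 0.5 - mu) / sigma)

c(exact = exact, plain = approx_plain, corrected = approx_cc)
```

The continuity correction accounts for approximating a discrete variable with a continuous one, and it usually brings the normal value closer to the exact one.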

Binomial Distribution in 3D

This slide shows a 3D scatter plot of the binomial distribution, created with Plotly in R. It visualizes the number of successes in a fixed number of trials with the same probability of success. The plot_ly() function is used to create the plot and the layout() function is used to add axis labels and a title.
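The code for this plot is not included in the text, so the following is only a rough sketch of how such a figure might be built, not the original chunk. The choice of axes (number of successes, probability of success, and PMF value), the value n = 20, the grid of p values, and the axis titles are all assumptions. The plot is built only if the plotly package is installed.

```r
# Grid of (successes, success probability) pairs for a fixed number of trials
n <- 20                                        # assumed number of trials
grid <- expand.grid(k = 0:n, p = seq(0.1, 0.9, by = 0.1))
grid$pmf <- dbinom(grid$k, size = n, prob = grid$p)

# Build the 3D scatter plot only if plotly is available
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(
    data = grid, x = ~k, y = ~p, z = ~pmf,
    type = "scatter3d", mode = "markers",
    marker = list(size = 3)
  )
  # Add a title and axis labels (3D axes live under `scene`)
  fig <- plotly::layout(
    fig,
    title = "Binomial Distribution in 3D",
    scene = list(
      xaxis = list(title = "Number of successes"),
      yaxis = list(title = "Probability of success"),
      zaxis = list(title = "Probability")
    )
  )
}
```

Typing fig at the console afterwards displays the interactive figure.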

Conclusion

  • The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.

  • In R, we have various functions, including rbinom(), dbinom(), pbinom(), and qbinom(), to generate random numbers, compute probabilities, and determine quantiles of the binomial distribution.

  • Additionally, R provides multiple visualization tools for the binomial distribution, including histogram, density plot, CDF plot, and 3D plot.
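Of the four functions listed above, qbinom() is the only one not demonstrated earlier. The short sketch below, assuming n = 10 and p = 0.5, shows how it inverts the CDF computed by pbinom(); the variable names are illustrative.

```r
# Quantiles of Binomial(10, 0.5)
n <- 10
p <- 0.5

# Median: the smallest k with P(X <= k) >= 0.5
median_successes <- qbinom(0.5, size = n, prob = p)

# 90th percentile: the smallest k with P(X <= k) >= 0.9
q90 <- qbinom(0.9, size = n, prob = p)

# Check the defining property of the quantile against the CDF
reaches_level  <- pbinom(q90, n, p) >= 0.9
smallest_such  <- pbinom(q90 - 1, n, p) < 0.9
```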