Binomial Distribution

2023-04-16

Introduction

The binomial distribution is a probability distribution that describes the outcomes of a fixed number of independent trials with only two possible outcomes (success or failure) and a constant probability of success. It is characterized by two parameters:

n = the number of trials, and
p = the probability of success in each trial.

To calculate the probability of obtaining k successes in n trials, we use the probability mass function (PMF), given by the formula:

\[P(X = k) = {n \choose k} p^k (1-p)^{n-k}\] where X represents the number of successes in n trials with probability of success p. The symbol \({n \choose k}\) is the binomial coefficient, the number of ways to choose k successes from n trials, and can be calculated using the formula \({n \choose k} = \frac{n!}{k!(n-k)!}\).
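As a quick check of the formula, the following R sketch computes P(X = 3) directly from the definition and compares it with R's built-in dbinom(). The values n = 10, p = 0.5, and k = 3 are chosen here for illustration only and do not come from the presentation.

```r
# Illustrative values (assumed for this example)
n <- 10
p <- 0.5
k <- 3

# PMF from the formula: choose(n, k) * p^k * (1 - p)^(n - k)
manual <- choose(n, k) * p^k * (1 - p)^(n - k)

# Same probability from the built-in binomial PMF
builtin <- dbinom(k, size = n, prob = p)

# Both values equal 120 / 1024 = 0.1171875
print(c(manual = manual, builtin = builtin))
```

Here choose(10, 3) = 120 and 0.5^10 = 1/1024, so the two approaches should agree up to floating-point rounding.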

This presentation provides an overview of the binomial distribution, its characteristics, applications, and importance in various fields of study.

Example: Scenario-Rolling a Die

Rolling a fair die 5 times and counting how many times a 6 appears is an example of a binomial experiment: the number of sixes follows a binomial distribution. The scenario satisfies the criteria of having:

A fixed number of trials (5),
Two possible outcomes (success or failure),
A constant probability of success (1/6),
And independent trials.

This scenario helps us understand the properties of binomial distributions, including mean, variance, and shape.
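For the die scenario, the probabilities of each possible number of sixes can be tabulated with dbinom(). This is a minimal sketch assuming a fair die; the variable names (die_pmf, die_table, and so on) are introduced here for illustration.

```r
n <- 5        # number of rolls
p <- 1 / 6    # probability of a six on a single roll (fair die assumed)
k <- 0:n      # possible numbers of sixes

# Probability of observing exactly k sixes in 5 rolls
die_pmf <- dbinom(k, size = n, prob = p)
die_table <- data.frame(sixes = k, probability = round(die_pmf, 4))
print(die_table)

# Probability of a six on every roll, which equals (1/6)^5
p_all_sixes <- dbinom(5, size = n, prob = p)

# Mean and variance of the number of sixes
die_mean <- n * p            # 5/6
die_var  <- n * p * (1 - p)  # 25/36
```

Most of the probability sits on 0 or 1 sixes, so the distribution is strongly right-skewed, which is typical when p is far from 0.5 and n is small.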

Properties of binomial distribution

The binomial distribution is a discrete probability distribution with two parameters: n (number of trials) and p (probability of success). Its properties include:

Discrete probability distribution: The binomial distribution is discrete because it assigns probabilities to a finite set of possible outcomes, the counts 0, 1, ..., n.
PMF: The probability mass function gives the probability of obtaining exactly k successes in n trials.
CDF: The cumulative distribution function gives the probability of obtaining k or fewer successes in n trials.

Understanding these properties is important for using the binomial distribution in applications such as quality control, hypothesis testing, and risk analysis.

Example: Binomial Distribution

Suppose we have a binomial distribution with n = 200 and p = 0.5. We can generate a random sample from this distribution using the rbinom() function in R.

# Load ggplot2 for plotting
library(ggplot2)
# Generate a random sample: n_draws values, each the number of successes in n trials
set.seed(123)
n <- 200; p <- 0.5; n_draws <- 200
x <- rbinom(n_draws, size = n, prob = p)
# Plot histogram of the simulated counts
ggplot(data.frame(x), aes(x)) +
  geom_histogram(binwidth = 1, color = "black", fill = "blue") +
  labs(x = "Number of successes", y = "Frequency") +
  ggtitle("Binomial Distribution") +
  theme(plot.title = element_text(hjust = 0.5))

Probability Mass Function (pmf)

The PMF of a binomial distribution gives the probability of observing a specific number of successes (k) in n trials with probability of success p. It is defined as: \[P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}\]
The PMF maps each possible value to its probability and is typically displayed as a bar plot.
The PMF is useful for modeling and analyzing discrete random variables, and can be used to calculate statistical measures such as the mean, variance, and standard deviation.
The PMF has the following properties: the probability of any value must be between 0 and 1, and the sum of the probabilities over all possible values must equal 1.
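The two PMF properties, and the use of the PMF to obtain summary measures, can be checked numerically. The sketch below assumes n = 30 and p = 0.5; the variable names are illustrative.

```r
n <- 30
p <- 0.5
x <- 0:n
pmf <- dbinom(x, size = n, prob = p)

# Property 1: every probability lies between 0 and 1
all_in_range <- all(pmf >= 0 & pmf <= 1)

# Property 2: the probabilities sum to 1 (up to floating-point rounding)
total <- sum(pmf)

# Summary measures computed directly from the PMF
pmf_mean <- sum(x * pmf)                  # weighted average of the values
pmf_var  <- sum((x - pmf_mean)^2 * pmf)   # weighted squared deviation
pmf_sd   <- sqrt(pmf_var)

print(c(total = total, mean = pmf_mean, variance = pmf_var, sd = pmf_sd))
```

With these parameters the mean is 15 and the variance is 7.5, matching np and np(1-p).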

Bar Plot of pmf

# Set the parameters
n <- 30; p <- 0.5; x <- 0:n
# Calculate the probability mass function
pmf <- dbinom(x, n, p)
pmf_df <- data.frame(x, pmf)
# Create a bar plot with pmf_df as the data source (requires ggplot2 to be loaded)
ggplot(pmf_df, aes(x = x, y = pmf)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  ggtitle("PMF of Binomial Distribution") +
  xlab("Number of Successes") + ylab("Probability") +
  theme(plot.title = element_text(hjust = 0.5))

Cumulative Distribution Function (cdf)

The Cumulative Distribution Function (CDF) is a function that gives the probability of a random variable taking on a value less than or equal to a certain number. Mathematically, the CDF is defined as: \[F(x) = P(X \leq x)\]
It ranges from 0 to 1 and provides a complete description of the distribution of the random variable.
It can be used to calculate statistical measures and make predictions. For a discrete distribution such as the binomial, its graph is a step function; for a continuous distribution it is a continuous curve.
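For the binomial distribution, the CDF is the running sum of the PMF, and differences of CDF values give interval probabilities. The sketch below illustrates both points with n = 10 and p = 0.5; the interval 3 < X <= 6 is an arbitrary example chosen here.

```r
n <- 10
p <- 0.5
x <- 0:n

# CDF from the built-in function and as a cumulative sum of the PMF
cdf_builtin <- pbinom(x, size = n, prob = p)
cdf_manual  <- cumsum(dbinom(x, size = n, prob = p))

# Interval probability: P(3 < X <= 6) = F(6) - F(3)
interval_prob  <- pbinom(6, size = n, prob = p) - pbinom(3, size = n, prob = p)

# The same probability by summing the PMF over k = 4, 5, 6
interval_check <- sum(dbinom(4:6, size = n, prob = p))

print(c(interval_prob = interval_prob, interval_check = interval_check))
```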

Step Plot of cdf

# Set the parameters
n <- 10; p <- 0.5; x <- 0:n
# Cumulative Distribution Function
cdf <- pbinom(x, n, p)
cdf_df <- data.frame(x, cdf)
# Create a step plot with cdf_df as the data source (requires ggplot2 to be loaded);
# geom_step() is used because the CDF of a discrete variable jumps at each integer
ggplot(cdf_df, aes(x = x, y = cdf)) +
  geom_step(color = "red") +
  ggtitle("CDF of Binomial Distribution") +
  xlab("Number of Successes") + ylab("Cumulative Probability") +
  theme(plot.title = element_text(hjust = 0.5))

Mean and Variance

The binomial distribution is given by the probability mass function (PMF): \[P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}\] where \(\binom{n}{k}\) is the binomial coefficient, which gives the number of ways to choose k successes from n trials.
The expected value and variance of a binomial distribution are: \[E(X) = np\] and \[Var(X) = np(1-p)\]
The mean tells us the average number of successes over many repetitions of the experiment, while the variance tells us how spread out the number of successes is around that mean.
These two measures are important for understanding the shape and behavior of probability distributions, including the binomial distribution.
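The formulas E(X) = np and Var(X) = np(1-p) can be compared against a simulation. The following is a rough Monte Carlo sketch; the parameters n = 20, p = 0.3, the seed, and the number of draws are all assumptions made for illustration, and the simulated values will only be close to the theoretical ones, not identical.

```r
set.seed(42)       # fixed seed so the simulation is reproducible
n <- 20            # number of trials (assumed)
p <- 0.3           # probability of success (assumed)
n_draws <- 100000  # number of simulated experiments (assumed)

# Each draw is the number of successes in n trials
draws <- rbinom(n_draws, size = n, prob = p)

# Theoretical values from the formulas versus sample estimates
theory    <- c(mean = n * p, variance = n * p * (1 - p))
simulated <- c(mean = mean(draws), variance = var(draws))

print(rbind(theory, simulated))
```

Increasing n_draws should bring the simulated row closer to the theoretical row.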

Normal Approximation

The binomial distribution can be approximated by a normal distribution with the same mean and variance when n is large and p is not too close to 0 or 1. This is known as the normal approximation to the binomial distribution.

The mean and variance of the normal distribution are: \[\mu = np\] and \[\sigma^2 = np(1-p)\]
The normal approximation can be useful when calculating probabilities for large n and p that are not too extreme. For example, if n = 100 and p = 0.3, then \(\mu = np = 30\) and \(\sigma^2 = np(1-p) = 21\).

\[Z = \frac{X - \mu}{\sigma} \sim N(0,1)\] where X is the number of successes in n trials and Z is the standardized variable. The relation holds only approximately, and the approximation improves as n grows.
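To see how well the approximation works for n = 100 and p = 0.3, the sketch below compares the exact binomial probability P(X <= 35) with two normal estimates. The cutoff k = 35 is an arbitrary choice made here, and the continuity correction (adding 0.5 to k) is a standard refinement that is not part of the presentation itself.

```r
n <- 100
p <- 0.3
mu    <- n * p                   # 30
sigma <- sqrt(n * p * (1 - p))   # sqrt(21)

k <- 35  # illustrative cutoff (assumed)

# Exact binomial probability P(X <= k)
exact <- pbinom(k, size = n, prob = p)

# Normal approximation using Z = (k - mu) / sigma
approx_plain <- pnorm((k - mu) / sigma)

# Normal approximation with a continuity correction of 0.5
approx_cc <- pnorm((k + 0.5 - mu) / sigma)

print(c(exact = exact, plain = approx_plain, corrected = approx_cc))
```

The continuity correction accounts for approximating a discrete count with a continuous distribution, so it is usually the closer of the two estimates.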

Binomial Distribution in 3D

This slide presents a 3D scatter plot of the binomial distribution, created with Plotly in R. It visualizes the number of successes in a fixed number of trials with the same probability of success. The plot_ly() function is used to create the plot, and the layout() function is used to add axis labels and a title.
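The code for the 3D plot is not included in the text, so the following is only one possible sketch of such a figure using plot_ly() and layout(). The choice of axes (number of successes, probability of success, and PMF value), the parameter values, and the variable names grid and fig are all assumptions made here, and the plotting step runs only if the plotly package is installed.

```r
n <- 20                        # number of trials (assumed)
p_values <- c(0.2, 0.5, 0.8)   # success probabilities to compare (assumed)

# One row per combination of k and p, with the PMF value as the third dimension
grid <- expand.grid(k = 0:n, p = p_values)
grid$prob <- dbinom(grid$k, size = n, prob = grid$p)

if (requireNamespace("plotly", quietly = TRUE)) {
  # 3D scatter plot: x = successes, y = success probability, z = PMF value
  fig <- plotly::plot_ly(
    data = grid, x = ~k, y = ~p, z = ~prob,
    type = "scatter3d", mode = "markers",
    marker = list(size = 3)
  )
  # Add a title and axis labels
  fig <- plotly::layout(
    fig,
    title = "Binomial Distribution in 3D",
    scene = list(
      xaxis = list(title = "Number of successes"),
      yaxis = list(title = "Probability of success"),
      zaxis = list(title = "Probability")
    )
  )
  # Display the figure only in an interactive session
  if (interactive()) print(fig)
}
```

Placing several values of p side by side along the y axis shows how the peak of the distribution shifts as the probability of success changes.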

Conclusion

The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.
In R, we have various functions, including rbinom(), dbinom(), pbinom(), and qbinom(), to generate random numbers, compute probabilities, and determine quantiles of the binomial distribution.
Additionally, R provides multiple visualization tools for the binomial distribution, including histogram, density plot, CDF plot, and 3D plot.
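The four functions named above can be tried side by side. This is a small sketch with assumed parameters n = 10 and p = 0.5; the variable names are illustrative.

```r
set.seed(123)  # fixed seed so the random draws are reproducible
n <- 10        # number of trials (assumed)
p <- 0.5       # probability of success (assumed)

# rbinom(): five random counts of successes
draws <- rbinom(5, size = n, prob = p)

# dbinom(): P(X = 4), equal to 210 / 1024
prob_4 <- dbinom(4, size = n, prob = p)

# pbinom(): P(X <= 4), equal to 386 / 1024
prob_le_4 <- pbinom(4, size = n, prob = p)

# qbinom(): smallest k with P(X <= k) >= 0.5, which is 5 here
median_x <- qbinom(0.5, size = n, prob = p)

print(draws)
print(c(prob_4 = prob_4, prob_le_4 = prob_le_4, median = median_x))
```

Note that qbinom() inverts pbinom(): it returns the smallest count whose cumulative probability reaches the requested level.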