2025-03-25

What is a Discrete Random Variable?

A discrete random variable (DRV) is a variable whose possible values form a finite or countably infinite set.

The DRV’s actual value is determined by the outcome of a random process; for example, it may count the number of successes across a set of random trials.

Each of the DRV’s possible values has a probability of occurring between 0 and 1, and the probabilities of all possible values sum to 1.

Plotting the possible values of the DRV against their corresponding probabilities gives a graph with a characteristic shape. The underlying assignment of probabilities to values is known as the DRV’s distribution.
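To make the idea concrete, here is a minimal sketch of a distribution written out as a table. The example (the number of heads in two fair coin flips) and the names pmf, all_valid, and total are illustrative assumptions, not part of the original slides.

```r
# Hypothetical example: X = number of heads in two fair coin flips
pmf <- data.frame(value = 0:2, probability = c(0.25, 0.50, 0.25))

# Every probability lies between 0 and 1 ...
all_valid <- all(pmf$probability >= 0 & pmf$probability <= 1)
# ... and together the probabilities sum to 1
total <- sum(pmf$probability)

print(pmf)
print(all_valid)
print(total)
```

Any table of values and probabilities that passes both checks describes a valid discrete distribution.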

This presentation will mainly focus on the binomial distribution, but it will also mention the Bernoulli distribution and the normal distribution.

Bernoulli Distribution - Explanation

Used for a single trial with only two possible outcomes: a given outcome either occurs or it does not. Ex: Flipping a coin weighted toward heads once.

Here’s the code for an example bar plot showing the Bernoulli distribution of flipping a coin with a 75% chance of landing on heads.

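library(ggplot2)
# Outcomes of a single flip and their probabilities
bernoulli <- data.frame(outcome = c("Heads", "Tails"), 
    probability = c(0.75, 0.25))
# fill is set outside aes() so the bars are actually drawn in red
g1 <- ggplot(data = bernoulli, 
    aes(x = outcome, y = probability)) + 
    geom_bar(stat = "identity", fill = "red") + theme(legend.position = "none")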

Bernoulli Distribution - Graph

Note how the probability for heads is 0.75, while the probability for tails is 1 - 0.75 = 0.25.
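As a rough check of these numbers, the sketch below computes the two probabilities with dbinom() (a Bernoulli trial is a binomial with a single trial) and compares them with a simulation. The seed, the 10,000 simulated flips, and the variable names are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible
p_heads <- 0.75

# Exact probabilities: a Bernoulli trial is a binomial with size = 1
exact <- dbinom(c(1, 0), size = 1, prob = p_heads)
names(exact) <- c("Heads", "Tails")

# Simulate 10,000 flips (1 = heads, 0 = tails) and take the share of heads
flips <- rbinom(10000, size = 1, prob = p_heads)
simulated_heads <- mean(flips)

print(exact)
print(simulated_heads)
```

The simulated share of heads should land close to 0.75, though it will rarely match it exactly.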

Binomial Distribution - Explanation

Used for a fixed number of independent trials, each with the same two possible outcomes (a specific outcome does or does not happen) and the same probability of success. It is equivalent to counting the successes across repeated Bernoulli trials. Ex: Flipping a fair coin 10 times in a row.

The probability of each possible number of successes is given by the following formula, where x is the number of successes, n is the number of trials, and p is the probability of a success:

\[P(X=x)=\frac{n!}{x!(n-x)!}p^x(1-p)^{(n-x)}\]
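The formula can be written out directly in R and compared with the built-in dbinom(). This is a minimal sketch; the helper name binom_pmf and the choice of 5 heads in 10 fair flips are assumptions made for illustration.

```r
# Binomial formula written out, using choose(n, x) for n! / (x! (n - x)!)
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

# Probability of exactly 5 heads in 10 flips of a fair coin
by_hand <- binom_pmf(5, 10, 0.5)
built_in <- dbinom(5, size = 10, prob = 0.5)

print(by_hand)
print(built_in)
```

Both values equal 252/1024, or about 0.246, since there are 252 ways to choose which 5 of the 10 flips land heads.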

Binomial Distribution - Graph

Here’s the code for an example bar plot showing the binomial distribution of flipping a coin 10 times. Included with the graph is an interactive slider, allowing you to change the probability of a success from p = 0.1 to p = 0.9.

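library(plotly)
# create data: one trace definition per probability p = 0.1, ..., 0.9
aval <- list()
for (step in 1:9) {
  aval[[step]] <- list(visible = FALSE, 
                       name = paste0("p = ", step / 10), 
                       x = 0:10, 
                       y = dbinom(0:10, 10, step / 10))
}
# show only the p = 0.5 trace at first
aval[[5]]$visible <- TRUE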

Binomial Distribution - Graph

# create steps and plot all bars
steps <- list()
fig <- plot_ly()
for (i in 1:9) {
  # add_bars() already sets the trace type; I() stops plotly from
  # treating "blue" as a data value to be mapped to a palette
  fig <- add_bars(fig, 
                  x = aval[[i]]$x, 
                  y = aval[[i]]$y, 
                  visible = aval[[i]]$visible, 
                  name = aval[[i]]$name,
                  hoverinfo = 'name', 
                  showlegend = FALSE,
                  color = I('blue'))
  # each slider step hides every trace except the i-th one
  step <- list(args = list('visible', rep(FALSE, length(aval))),
               method = 'restyle')
  step$args[[2]][i] <- TRUE
  steps[[i]] <- step
}

Binomial Distribution - Graph

#add slider to control plot and label axes
fig <- fig %>%
  layout(sliders=list(list(active=4,
                           currentvalue = list(
                             prefix = "Probability: "),
                           steps = steps)), 
         xaxis=list(title="Number of Heads"),
         yaxis=list(title="Probability"),
         title="Probability Mass Function of Ten Coin Flips")

The graph is on the next slide. Note how, for every value of p on the slider, the most likely result is never one of the extremes (all heads or all tails): the peak of the distribution sits near 10p and shifts as the probability changes.
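The behaviour of the slider plot can also be checked numerically. The sketch below finds the most likely number of heads for each slider value; the names p_values and most_likely are illustrative.

```r
p_values <- (1:9) / 10

# which.max() returns a 1-based index, so subtract 1 to get the count of heads
most_likely <- sapply(p_values, function(p) {
  which.max(dbinom(0:10, size = 10, prob = p)) - 1
})
names(most_likely) <- paste0("p = ", p_values)

print(most_likely)
```

For these nine values of p the most likely count runs from 1 head up to 9 heads, so the peak never lands on 0 or 10.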

Binomial Distribution - Graph

Normal Distribution - Explanation

A common mistake is confusing the binomial distribution with the normal distribution.

While they may seem similar at first glance, they have two main differences:

  • The binomial distribution describes discrete random variables, while the normal distribution describes continuous random variables.

  • The binomial distribution can be skewed left or right (whenever p is not 0.5), while the normal distribution is always symmetrical, with the mean, median, and mode aligned.
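The second difference can be seen with a quick check in R. This is a sketch under assumed values: 10 flips, a fair coin versus a hypothetical coin with a 20% chance of heads. The skewness helper uses the standard binomial skewness formula, (1 - 2p) / sqrt(np(1 - p)).

```r
fair <- dbinom(0:10, size = 10, prob = 0.5)
weighted <- dbinom(0:10, size = 10, prob = 0.2)

# A symmetric distribution reads the same forwards and backwards
fair_symmetric <- isTRUE(all.equal(fair, rev(fair)))
weighted_symmetric <- isTRUE(all.equal(weighted, rev(weighted)))

# Skewness of a binomial; positive means a longer right tail
skewness <- function(n, p) (1 - 2 * p) / sqrt(n * p * (1 - p))

print(fair_symmetric)
print(weighted_symmetric)
print(skewness(10, c(0.2, 0.5, 0.8)))
```

The fair coin gives a symmetric distribution with zero skewness, while p = 0.2 skews right and p = 0.8 skews left.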

Normal Distribution - Bell Curve

The graph of the normal distribution is also known as a bell curve.

The graph of this curve is given by the following formula, where mu is the mean and sigma is the standard deviation:

\[f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}\]

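While the normal distribution only applies to continuous random variables, a bell curve can still be used to approximate a binomial distribution: use a mean of np and a standard deviation of the square root of np(1-p). The approximation works best when n is large and p is not close to 0 or 1.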
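As an illustration of how a bell curve can stand in for a binomial distribution, the sketch below compares an exact binomial probability with a normal estimate. The chosen event (4 to 6 heads in 10 fair flips), the half-unit continuity correction, and the variable names are assumptions made for this example.

```r
n <- 10
p <- 0.5
mu <- n * p                     # mean of the binomial
sigma <- sqrt(n * p * (1 - p))  # standard deviation of the binomial

# Exact probability of getting 4, 5, or 6 heads
exact <- sum(dbinom(4:6, size = n, prob = p))

# Normal estimate: area under the bell curve from 3.5 to 6.5
# (widening by half a unit on each side is the usual continuity correction)
normal_estimate <- pnorm(6.5, mean = mu, sd = sigma) -
  pnorm(3.5, mean = mu, sd = sigma)

print(exact)
print(normal_estimate)
```

Even with only 10 trials the two numbers agree to within about 0.01 here; the match is generally worse when p is far from 0.5 or n is small.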

Binomial Distribution - Bell Curve

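Here is the code for a graph of the binomial distribution of a fair coin being tossed 10 times, overlaid with a normal curve that has the same mean (5) and standard deviation (the square root of 2.5). On the following slide is the graph. Note how the bell curve passes close to the top center of most bars.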

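# Binomial probabilities for 0 to 10 heads in 10 fair flips
binomialprob <- dbinom(x = 0:10, size = 10, prob = 0.5)
binomial <- data.frame(heads = 0:10, 
    probability = binomialprob)
# Overlay a normal curve with mean np = 5 and sd sqrt(np(1-p)) = sqrt(2.5)
g2 <- ggplot(data = binomial, 
    aes(x = heads, y = probability)) + 
    geom_bar(stat = "identity", fill = "red") + theme(legend.position = "none") + 
    stat_function(fun = dnorm, n = 101, args = list(mean = 5, sd = sqrt(2.5)))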

Binomial Distribution - Bell Curve

Thanks For Viewing!