Counting successes

There are many families of probability distributions in statistics, but in this course we focus on just a few important ones. One of the most useful is the binomial distribution, which describes the number of “successes” in a fixed number of independent yes/no trials.

A random variable \(X\) follows a binomial distribution if:

There is a fixed number of trials, \(n\).
Each trial has only two possible outcomes (often called “success” and “failure”).
The probability of success, \(p\), is the same on every trial.
The trials are independent of one another.

When these conditions are met we write \(X \sim \text{Binomial}(n,p)\), and the probability of getting exactly \(x\) successes is

\[ P(X = x) \;=\; \binom{n}{x}\, p^{\,x} (1-p)^{\,n-x}, \qquad x = 0, 1, 2, \ldots, n. \]

Here \(\binom{n}{x}\) is the binomial coefficient, read “n choose x”: the number of ways to pick which \(x\) of the \(n\) trials are the successes. Introductory courses often write it as \(C(n,x)\) or \(_nC_x\). You do not need to evaluate the formula by hand, because R will do it for us.

Common examples of binomial random variables include the number of heads in \(n\) coin flips, the number of free throws made in \(n\) attempts by a basketball player with a fixed shooting percentage, and the number of people in a random sample of \(n\) voters who support a particular candidate.

In this lab we will use R to compute binomial probabilities, visualize the distribution, and explore how its shape changes as \(n\) grows. By the end of the lab you should have a good sense of why statisticians say “for large \(n\), the binomial distribution looks normal.”

The normal distribution is the famous symmetric, bell-shaped curve you may have seen described in class. It is fully determined by just two numbers, a mean \(\mu\) and a standard deviation \(\sigma\), and the percentages in the empirical rule (68%–95%–99.7%) are rounded values of its probabilities within one, two, and three standard deviations of the mean. One of the most striking facts in statistics is that the binomial distribution, which is built from counting discrete yes/no outcomes, gradually takes on this smooth bell shape as the number of trials \(n\) grows large.

Computing binomial probabilities

R has four built-in functions for the binomial distribution. The two we will use today are:

dbinom(x, size, prob) returns \(P(X = x)\), the probability of exactly \(x\) successes.
pbinom(x, size, prob) returns \(P(X \leq x)\), the probability of at most \(x\) successes.

Here size is \(n\) (the number of trials) and prob is \(p\) (the probability of success on each trial).

Suppose we flip a fair coin 25 times and let \(X\) be the number of heads. Then \(X \sim \text{Binomial}(25, 0.5)\). The probability of getting exactly 12 heads is:

dbinom(12, size = 25, prob = 0.5)

The probability of getting at most 12 heads is:

pbinom(12, size = 25, prob = 0.5)
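As a quick sanity check, both numbers can be reproduced from the formula itself. The sketch below is only an illustration: the names by_hand and running_total are ours, and choose() is base R's function for the binomial coefficient.

```r
# Binomial(25, 0.5): reproduce dbinom and pbinom from the formula
n <- 25
p <- 0.5

# P(X = 12) written out as choose(n, x) * p^x * (1 - p)^(n - x)
by_hand <- choose(n, 12) * p^12 * (1 - p)^(n - 12)
by_hand
dbinom(12, size = n, prob = p)        # same value, about 0.155

# P(X <= 12) is the sum of P(X = 0), P(X = 1), ..., P(X = 12)
running_total <- sum(dbinom(0:12, size = n, prob = p))
running_total
pbinom(12, size = n, prob = p)        # same value, 0.5 here by symmetry
```

The value 0.5 is special to this example: with a fair coin and an odd number of flips, “12 or fewer heads” and “13 or more heads” are mirror images of each other.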

Notice that dbinom gives the probability of an exact count, while pbinom gives the probability of at most that many successes. We can build a full probability table by feeding dbinom a vector of \(x\) values from 0 to \(n\):

x <- 0:25
probs <- dbinom(x, size = 25, prob = 0.5)
data.frame(x = x, probability = round(probs, 4))

The shortcut 0:25 produces the vector \(0, 1, 2, \ldots, 25\). The function dbinom is vectorized, meaning it computes \(P(X=x)\) for every value of x at once and returns the answers as a vector.
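Because the table covers every possible value of \(X\), its probabilities must add up to 1, and that fact gives an easy route to “at least” probabilities. A short sketch with the same coin-flip setup (the variable name at_least_13 is ours, chosen for illustration):

```r
x <- 0:25
probs <- dbinom(x, size = 25, prob = 0.5)

# the probabilities of all possible outcomes add up to 1
sum(probs)

# P(X >= 13): add up the table entries for x = 13, ..., 25
at_least_13 <- sum(probs[x >= 13])
at_least_13

# the same probability through the complement: 1 - P(X <= 12)
1 - pbinom(12, size = 25, prob = 0.5)
```

The last two results should agree. The complement form is usually the more convenient one, since it needs only a single call to pbinom.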

Now do Exercise 1.

Visualizing the distribution (n = 25)

A probability table is informative, but a picture tells the story faster. Because \(X\) takes integer values, the natural plot is a probability histogram with one bar at each value of \(x\). We can build it with the barplot function in R:

x <- 0:25
probs <- dbinom(x, size = 25, prob = 0.5)
barplot(probs, names.arg = x,
        xlab = "x (number of successes)",
        ylab = "P(X = x)",
        main = "Binomial(n = 25, p = 0.5)")

The argument names.arg = x puts the value of \(x\) underneath each bar. Try running the chunk above. You should see a symmetric, mound-shaped distribution centered between 12 and 13, which are its two tallest (equal-height) bars.

Now we want to investigate how the shape of the distribution changes as we change \(p\). Rather than retyping the code three times, we can write a small helper function:

plot_binom <- function(n, p) {
  x <- 0:n
  probs <- dbinom(x, size = n, prob = p)
  barplot(probs, names.arg = x,
          xlab = "x", ylab = "P(X = x)",
          main = paste0("Binomial(n = ", n, ", p = ", p, ")"))
}

Once you’ve run that chunk, you can produce a histogram for any \(n\) and \(p\) with a single line, e.g. plot_binom(25, 0.2).

Now do Exercise 2.

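The shape you observed is not an accident. When \(n\) is small and \(p\) is far from \(0.5\), the distribution is squeezed against one end of the range and stretches a long tail toward the other end: a small \(p\) piles the probability up near 0 with a tail to the right, and a \(p\) near 1 does the mirror image. The closer \(p\) is to \(0.5\), the more symmetric the distribution becomes, and at \(p = 0.5\) it is exactly symmetric.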
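If you would like a number to go with the picture, one common measure of lopsidedness is the skewness, the probability-weighted average of the cubed z-scores. Skewness is not otherwise used in this lab, and the helper binom_skew below is a hypothetical name of ours, so treat this as an optional sketch:

```r
# skewness of Binomial(n, p), computed straight from the probability table
binom_skew <- function(n, p) {
  x <- 0:n
  probs <- dbinom(x, size = n, prob = p)
  mu <- sum(x * probs)                      # mean
  sigma <- sqrt(sum((x - mu)^2 * probs))    # standard deviation
  sum(((x - mu) / sigma)^3 * probs)         # average cubed z-score
}

binom_skew(25, 0.1)   # positive: long tail to the right
binom_skew(25, 0.5)   # essentially 0: symmetric
binom_skew(25, 0.9)   # negative: long tail to the left
binom_skew(400, 0.1)  # much closer to 0 than with n = 25
```

For a binomial distribution the skewness works out to \((1 - 2p)/\sqrt{np(1-p)}\), which shrinks toward 0 as \(n\) grows. That is why a skewed binomial looks more symmetric when the number of trials is large.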

Now do Exercise 3.

Mean and standard deviation

For a binomial distribution, the mean and standard deviation have simple formulas:

\[ \mu \;=\; n p, \qquad \sigma \;=\; \sqrt{n\, p\,(1-p)}. \]

We can compute these directly in R. For \(n = 25\) and \(p = 0.5\):

n <- 25
p <- 0.5
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
mu
sigma
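These formulas are shortcuts for the general definitions of the mean and standard deviation of a discrete random variable. As a check, the sketch below recomputes both directly from the probability table (the names mu_table and sigma_table are ours):

```r
n <- 25
p <- 0.5
x <- 0:n
probs <- dbinom(x, size = n, prob = p)

# mean: each value of x weighted by its probability
mu_table <- sum(x * probs)

# standard deviation: square root of the probability-weighted
# average squared distance from the mean
sigma_table <- sqrt(sum((x - mu_table)^2 * probs))

mu_table      # matches n * p = 12.5
sigma_table   # matches sqrt(n * p * (1 - p)) = 2.5
```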

The empirical rule

Recall the empirical rule for mound-shaped distributions: about 68% of the probability lies within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three.

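Let’s check how well the binomial distribution with \(n = 25\) and \(p = 0.5\) follows this rule. The mean is \(\mu = 12.5\) and the standard deviation is \(\sigma = 2.5\), so we want \(P(10 \leq X \leq 15)\) (within one \(\sigma\)), \(P(7.5 \leq X \leq 17.5)\) (within two \(\sigma\)), and \(P(5 \leq X \leq 20)\) (within three \(\sigma\)). Since \(X\) is always a whole number, the two-\(\sigma\) probability is the same as \(P(8 \leq X \leq 17)\).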

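Because \(X\) takes only integer values, we have to use pbinom carefully. Subtracting \(P(X \leq 9)\), rather than \(P(X \leq 10)\), keeps the value \(X = 10\) inside the interval: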

# P(10 <= X <= 15) = P(X <= 15) - P(X <= 9)
pbinom(15, size = 25, prob = 0.5) - pbinom(9, size = 25, prob = 0.5)

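That number will not match \(0.68\) exactly. It comes out near \(0.77\), partly because the endpoints 10 and 15 sit exactly one standard deviation from the mean and both are counted in full.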
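The same bookkeeping can be wrapped in a small helper so that the “subtract one below the lower endpoint” step is done for you. This is a sketch under our own naming (prob_between is not a built-in R function), and it assumes the endpoints are meant to be included in the interval:

```r
# P(lo <= X <= hi) for X ~ Binomial(n, p); lo and hi need not be integers
prob_between <- function(lo, hi, n, p) {
  upper <- floor(hi)         # largest integer inside the interval
  lower <- ceiling(lo)       # smallest integer inside the interval
  pbinom(upper, size = n, prob = p) - pbinom(lower - 1, size = n, prob = p)
}

# same answer as pbinom(15, ...) - pbinom(9, ...)
prob_between(10, 15, n = 25, p = 0.5)

# non-integer endpoints are handled too: this is P(8 <= X <= 17)
prob_between(7.5, 17.5, n = 25, p = 0.5)
```

One caution: if you compute the endpoints as mu - k * sigma and mu + k * sigma and they land exactly on an integer, floating-point rounding can occasionally push them to the wrong side, so it is worth checking the endpoints by eye.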

Now do Exercise 4.

Increasing n: the distribution gets smoother

So far we’ve worked with \(n = 25\). Let’s see what happens when \(n\) gets much larger. We’ll keep \(p = 0.5\) and try \(n = 100\) and then \(n = 900\).

plot_binom(100, 0.5)

plot_binom(900, 0.5)

Two things to notice. First, the distribution stays mound-shaped; in fact it looks even smoother than before. Second, when \(n = 25\) you can see each individual bar (a “staircase”), but at \(n = 900\) there are so many bars, each so thin, that the histogram looks like a smooth curve.
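With 901 bars, most of them too short to see, the \(n = 900\) plot can be hard to read. One option, offered here as a convenience sketch and not as part of the lab's required code, is to plot only the values within four standard deviations of the mean:

```r
n <- 900
p <- 0.5
mu <- n * p                      # 450
sigma <- sqrt(n * p * (1 - p))   # 15

# keep only the x values within 4 standard deviations of the mean
x <- ceiling(mu - 4 * sigma):floor(mu + 4 * sigma)
probs <- dbinom(x, size = n, prob = p)

# almost all of the probability lives in this window
sum(probs)

barplot(probs, names.arg = x,
        xlab = "x", ylab = "P(X = x)",
        main = paste0("Binomial(n = ", n, ", p = ", p, "), zoomed in"))
```

The choice of four standard deviations is arbitrary; any window wide enough to hold nearly all of the probability would show the same shape.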

Now do Exercise 5.

The normal approximation

What you’re observing is one of the most important results in statistics: when \(n\) is large, the binomial distribution is approximately normal, with the same mean \(\mu = np\) and the same standard deviation \(\sigma = \sqrt{np(1-p)}\). This is a special case of the Central Limit Theorem, which we will study formally in class soon.

We can see the approximation visually by drawing the binomial distribution and laying a normal curve on top. To do this, copy and paste the following code into the console:

n <- 100
p <- 0.5
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))

x <- 0:n
probs <- dbinom(x, size = n, prob = p)

# plot(..., type = "h") draws a vertical line ("histogram-like") at each x
plot(x, probs, type = "h", lwd = 2,
     xlab = "x", ylab = "P(X = x)",
     main = paste0("Binomial(", n, ", ", p, ") with Normal overlay"))

# add the normal curve with the same mean and sd
# (inside curve(), x is curve's own plotting variable, not the vector x above)
curve(dnorm(x, mean = mu, sd = sigma),
      add = TRUE, col = "red", lwd = 2)

The black vertical lines are the binomial probabilities \(P(X = x)\), and the red curve is the normal density with the same \(\mu\) and \(\sigma\). You should see that the curve tracks the tops of the lines almost perfectly.
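The agreement can also be checked numerically. Each binomial value takes up a slot of width 1 on the x-axis, so the height of the normal curve at \(x\) is directly comparable to \(P(X = x)\). A minimal sketch (the names binom_probs and normal_dens are ours):

```r
n <- 100
p <- 0.5
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))

x <- 0:n
binom_probs <- dbinom(x, size = n, prob = p)
normal_dens <- dnorm(x, mean = mu, sd = sigma)

# largest gap between a binomial probability and the curve's height
max(abs(binom_probs - normal_dens))

# side-by-side values near the center
data.frame(x = 45:55,
           binomial = round(binom_probs[x %in% 45:55], 4),
           normal   = round(normal_dens[x %in% 45:55], 4))
```

For these settings the largest gap should be tiny compared with the peak height of about 0.08.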

Now do Exercise 6.

When does the approximation work?

A common rule of thumb is that the normal approximation to the binomial is good when both \(np \geq 10\) and \(n(1-p) \geq 10\). This rule combines the two things you’ve now seen: you need \(n\) to be reasonably large, and \(p\) shouldn’t be too close to 0 or 1 (otherwise the distribution is skewed and the normal curve fits badly).
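The rule of thumb is easy to turn into a quick check. The function name normal_approx_ok below is hypothetical, and the cutoff of 10 is simply the one quoted above (other textbooks use slightly different cutoffs):

```r
# TRUE when both np >= 10 and n(1 - p) >= 10
normal_approx_ok <- function(n, p) {
  (n * p >= 10) & (n * (1 - p) >= 10)
}

normal_approx_ok(25, 0.5)    # TRUE:  np = 12.5, n(1 - p) = 12.5
normal_approx_ok(25, 0.2)    # FALSE: np = 5
normal_approx_ok(100, 0.05)  # FALSE: np = 5
normal_approx_ok(900, 0.05)  # TRUE:  np = 45, n(1 - p) = 855
```

The last two lines show the trade-off: a value of \(p\) close to 0 can still be handled, but only if \(n\) is large enough to compensate.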

Now do Exercise 7.

Putting it to use

Now do Exercise 8.

Acknowledgements

This lab was written for MATH 106 by Benjamin Jackson with input from Ross Magi to introduce the binomial distribution and motivate the normal approximation. It adapts ideas from an earlier Excel-based lab written by Jonathan Duncan on the same topic.

MATH 106: Introduction to Statistics

Lab 4: The Binomial Distribution