1 Definitions and background
2 Simulating random numbers from the binomial distribution
3 The binomial distribution
- 3.1 Varying p
- 3.2 Varying N
4 Calculating the likelihood
5 Maximum likelihood estimation

1 Definitions and background

A Bernoulli trial is an experiment with two potential outcomes, e.g. yes/no, success/failure, dead/alive, head/tail etc.

The Binomial distribution gives the probability for x successes in a sequence of N independent Bernoulli trials, when the probability of success in each trial equals p.

The parameters of a binomial distribution are:

N … the number of trials (size in R)
p … the probability of success in each trial (prob in R)

Links:

2 Simulating random numbers from the binomial distribution

For random number generation the rbinom() function is used. The following command generates 30 random numbers from a binomial distribution with p = 0.5 and N = 10. The simulated data set will differ for each simulation run due to stochasticity. The simulation could for example represent 30 students who each toss 10 coins (N = 10) and count the number of coins showing a head (p = 0.5) With the function table() you can count how often each number of successes (i.e. coin showing head) occurred.

sample_size <- 30
p <- 0.5
N <- 10
dat1 <- rbinom(sample_size, prob = p, size = N)
table(dat1)

## dat1
##  2  3  4  5  6  7  8  9 
##  1  2  3 11  7  4  1  1

Now the data are visualised using barplot. The conversion to a factor is just used for visualization.

dat1_fac <- factor(dat1, levels = 0:N)
par(las = 1, cex.lab = 1.2)
barplot(table(dat1_fac), xlab = "No. of successes", ylab = "Frequency")

3 The binomial distribution

In contrast to the function rbinom(), which generates random numbers, the function dbinom() provides the probability for x successes given the parameters p and N. The d in the function name is derived from density, since for continuous distributions function, i.e. the normal distributions, this function is called probability density function. Unfortunately for discrete distributions including the binomial distributions this function is called probability mass function. That means even if this terminology is confusing the function dbinom() provides the probability mass function of the binomial distribution.

First, we visualize this function for one specific parameter set. Note that the probabilities have to some up to one by definition. When you toss ten coins you know that there will be between zero and ten heads, i.e. the probability of a result between zero and ten equals one.

p <- 0.5
N <- 10
binom1 <- dbinom(x = 0:10, prob = p, size = N)
sum(binom1)

## [1] 1

par(las = 1, cex.lab = 1.2)
plot(x = 0:10, y = binom1, type = "h", col = "red", xlab = "No. of successes",
     ylab = "Probability")
points(x = 0:10, y = binom1, col = "red", pch = 19)

3.1 Varying p

In the next step, we assess how the function looks like for different values of p.

N <- 10
p_vec <- c(0.2, 0.5, 0.9)

x <- 0:N
par(las = 1, cex.lab = 1.2)
plot(x, y = dbinom(x, prob = p_vec[1], size = N), type = "n",
     xlab = "No. of successes", ylab = "Probability", ylim = c(0, 0.5))
for (i in 1:length(p_vec)){
   points(x, y = dbinom(x, prob = p_vec[i], size = N), col = i, type = "b",
          pch = 19, lty = 2)
}
legend("top", legend = p_vec, col = 1:3, pch = 19, title = "p")

3.2 Varying N

Now we check how the function looks like for different values of N.

N_vec <- c(5, 10, 20)
p <- 0.5

x <- 0:20
par(las = 1, cex.lab = 1.2)
plot(x, y = dbinom(x, prob = p, size = N_vec[1]), type = "n",
     xlab = "No. of successes", ylab = "Probability")
for (i in 1:length(N_vec)){
   points(x, y = dbinom(x, prob = p, size = N_vec[i]), col = i, type = "b",
          pch = 19, lty = 2)
}
legend("topright", legend = N_vec, col = 1:3, pch = 19, title = "N")

4 Calculating the likelihood

The likelihood is defined as the probability of the data given the model. With dbinom() we obtain the probability for each single sample, assuming a certain parameter value of p. In this case we know the true value of p, because we simulated the “data”. However, when we have real data we usually do not know the true value of p. The parameter N is usually defined by the design of the experiment or sampling.

dbinom(x = dat1, prob = 0.5, size = 10)

##  [1] 0.117187500 0.009765625 0.246093750 0.205078125 0.117187500
##  [6] 0.117187500 0.246093750 0.246093750 0.043945312 0.205078125
## [11] 0.205078125 0.246093750 0.205078125 0.246093750 0.043945312
## [16] 0.205078125 0.246093750 0.246093750 0.246093750 0.205078125
## [21] 0.246093750 0.117187500 0.205078125 0.205078125 0.246093750
## [26] 0.205078125 0.205078125 0.246093750 0.117187500 0.117187500

However, we want to know the likelihood of the entire data set and not of each individual data point. We assume that each sample in the data set is independent. For independent values we obtain the overall likelihood of the data set by multiplying the single values.

prod(dbinom(x = dat1, prob = 0.5, size = 10))

## [1] 1.288584e-24

Apparently, this is a very small number. In order to avoid numerical problems usually the log-likelihood is calculated, which provides values that are numerically more tractable (i.e. negative values that are not as close to zero.)

Remember the rule log(a*b) = log(a) + log(b)

This indicates that we have to sum up the single sample log-likelihoods instead of multiplying the untransformed likelihood values. Note the use of the log = T argument in the dbinom() function.

sum(dbinom(x = dat1, prob = 0.5, size = 10, log = T))

## [1] -55.0085

Of course this just gives the likelihood for a specific parameter value. To be able to calculate the likelihood for any parameter values, we define a function for the log-likelihood with the parameters and the data as arguments.

log_likelihood <- function(par_p, par_N, data)
{
   LL <- sum(dbinom(x = data, prob = par_p, size = par_N, log = T))
   return(LL)
}

log_likelihood(par_p = 0.5, par_N = 10, data = dat1)

## [1] -55.0085

5 Maximum likelihood estimation

Finally, we want to derive the Maximum likelihood estimate of p. This means we would like to find the model, or parameter set respectively, that most likely generated the data.

To understand this better, we first visualize the data with models of different parameter values. For this purpose the observed frequencies in the data are converted to proportions by dividing by the sample size. Note that the frequencies sum up to the sample size (30 in this example), while the proportions sum up to one. This means the conversion from frequencies to proportions standardizes the data and the theoretical probabilities to the same scale.

par(las = 1, cex.lab = 1.2)
prop_dat <- table(dat1_fac)/sample_size
xbars <- barplot(prop_dat, xlab = "No. of successes",
                 ylab = "Proportion", ylim = c(0, 0.4))

p_vec <- seq(0.2, 0.8, by = 0.1)
col_vec <- rainbow(length(p_vec))
N <- 10
x <- 0:10
for (i in 1:length(p_vec))
   points(xbars, dbinom(x, prob = p_vec[i], size = N), type = "b", col = col_vec[i], lty = 2,
          pch = 19)

This figure gives a visual impression which parameter values might be useful, or in other words, which parameter values most likely generated the data. To derive the maximum likelihood estimate of p, we evaluate the likelihood function for a sequence of parameter values using a fine step size. Note that the maximum of the log-likelihood is identical to the maximum of the likelihood, because the log-function is a monotonously increasing function.

p_vec <- seq(0, 1, by = 0.01)
logLik <- sapply(p_vec, FUN = log_likelihood, par_N = 10, data = dat1)

par(las = 1, cex.lab = 1.2)
plot(p_vec, logLik, type = "l", xlab = "p", ylab = "log-Likelihood")

imax <- which.max(logLik)
p_MLE <- p_vec[imax]
p_MLE

## [1] 0.54

points(p_MLE, max(logLik), pch = 19, col = "red")
abline(v = p_MLE, col = "red")

Obviously the maximum likelihood estimate is close to, but differs from the true value used for the data generation. However, the larger the sample size, the lower this difference will be (on average).

With the code provided in this script you can assess the convergence of the estimate to the true value with increasing sample size.

Finally, we plot the data with the theoretical underlying distribution as well as with the distribution based on the MLE estimate

par(las = 1, cex.lab = 1.2)
xbars <- barplot(prop_dat, xlab = "No. of successes",
                 ylab = "Proportion", ylim = c(0, 0.4))
points(xbars, dbinom(x, prob = 0.5, size = 10), type = "b", col = "blue",
       pch = 19, lty = 2)
points(xbars, dbinom(x, prob = p_MLE, size = 10), type = "b", col = "red",
       pch = 19, lty = 2)
legend("topright", legend = c("Data","True p","MLE p"),
       col = c("gray", "blue","red"), pch = c(15, 19, 19))

Maximum likelihood estimation of the binomial distribution parameter

Felix May

13 January 2016