The Bernoulli distribution
- The Bernoulli distribution arises as the result of a binary outcome
- Bernoulli random variables take (only) the values 1 and 0 with probabilities of (say) \( p \) and \( 1-p \) respectively
- The PMF for a Bernoulli random variable \( X \) is \[ P(X = x) = p^x (1 - p)^{1 - x} \]
- The mean of a Bernoulli random variable is \( p \) and the variance is \( p(1 - p) \)
- If we let \( X \) be a Bernoulli random variable, it is typical to call \( X=1 \) a “success” and \( X=0 \) a “failure”
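A Bernoulli draw in R is just a binomial with size 1; a minimal sketch (the choice \( p = 0.3 \) is arbitrary):
rbinom(5, size = 1, prob = 0.3)   # five 0/1 outcomes
dbinom(1, size = 1, prob = 0.3)   # P(X = 1) = p
dbinom(0, size = 1, prob = 0.3)   # P(X = 0) = 1 - p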
iid Bernoulli trials
- If several iid Bernoulli observations, say \( x_1,\ldots, x_n \), are observed, the
likelihood is
\[
\prod_{i=1}^n p^{x_i} (1 - p)^{1 - x_i} = p^{\sum x_i} (1 - p)^{n - \sum x_i}
\]
- Notice that the likelihood depends only on the sum of the \( x_i \)
- Because \( n \) is fixed and assumed known, this implies that the sample proportion \( \sum_i x_i / n \) contains all of the relevant information about \( p \)
- We can maximize the Bernoulli likelihood over \( p \) to obtain that \( \hat p = \sum_i x_i / n \) is the maximum likelihood estimator for \( p \)
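As a quick numerical check (a sketch with simulated data; the seed and the true \( p \) are arbitrary), maximizing the log-likelihood recovers the sample proportion:
# Numerically maximize the Bernoulli log-likelihood; the maximizer matches mean(x)
set.seed(1)
x <- rbinom(20, size = 1, prob = 0.3)
loglik <- function(p) sum(x) * log(p) + (length(x) - sum(x)) * log(1 - p)
optimize(loglik, interval = c(0.01, 0.99), maximum = TRUE)$maximum
mean(x)   # the closed-form MLE, sum(x) / n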
Plotting all possible likelihoods for a small n
n <- 5
pvals <- seq(0, 1, length = 1000)
# Empty frame; each likelihood is scaled so that its maximum value is 1
plot(c(0, 1), c(0, 1.2), type = "n", frame = FALSE, xlab = "p", ylab = "likelihood")
# Label each curve with its success count x, placed above its maximum at x / n
text((0 : n) / n, 1.1, as.character(0 : n))
# One curve per possible number of successes x = 0, ..., n, each divided by
# its maximum, attained at phat = x / n (endpoints handled to avoid dividing by 0)
for (x in 0 : n) {
    phat <- x / n
    if (x == 0) lines(pvals, ((1 - pvals) / (1 - phat))^(n - x), lwd = 3)
    else if (x == n) lines(pvals, (pvals / phat)^x, lwd = 3)
    else lines(pvals, (pvals / phat)^x * ((1 - pvals) / (1 - phat))^(n - x), lwd = 3)
}
title(paste("Likelihoods for n =", n))
Binomial trials
- A binomial random variable is obtained as the sum of iid Bernoulli trials
- Specifically, let \( X_1,\ldots,X_n \) be iid Bernoulli\( (p) \); then \( X = \sum_{i=1}^n X_i \) is a binomial random variable
- The binomial mass function is
\[
P(X = x) = \binom{n}{x} p^x(1 - p)^{n-x}
\]
for \( x=0,\ldots,n \)
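A direct check that dbinom implements this mass function (the values of n, x and p below are arbitrary):
# Binomial PMF by hand versus dbinom
n <- 10; x <- 6; p <- 0.4
choose(n, x) * p^x * (1 - p)^(n - x)
dbinom(x, size = n, prob = p)   # same value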
Choose
- Recall that the notation
\[
\binom{n}{x} = \frac{n!}{x!(n-x)!}
\]
(read “\( n \) choose \( x \)”) counts the number of ways of selecting \( x \) items out of \( n \) without replacement, disregarding the order of the items
- Note that
\[
\binom{n}{0} = \binom{n}{n} = 1
\]
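R's choose function computes this quantity directly, for example:
choose(5, 2)                                  # 10 ways to pick 2 items from 5
factorial(5) / (factorial(2) * factorial(3))  # same, via the factorial formula
choose(5, 0); choose(5, 5)                    # both equal 1, as noted above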
Example justification of the binomial likelihood
- Consider the probability of getting \( 6 \) heads out of \( 10 \) coin flips from a coin with success probability \( p \)
- The probability of getting \( 6 \) heads and \( 4 \) tails in any specific order is
\[
p^6(1-p)^4
\]
- There are
\[
\binom{10}{6}
\]
possible orders of \( 6 \) heads and \( 4 \) tails
- Hence the probability of exactly \( 6 \) heads is \( \binom{10}{6} p^6 (1-p)^4 \), which is the binomial mass function evaluated at \( x = 6 \)
Example
- Suppose a friend has \( 8 \) children (oh my!), \( 7 \) of which are girls and none are twins
- If each gender has an independent \( 50 \)% probability for each birth, what's the probability of getting \( 7 \) or more girls out of \( 8 \) births?
\[
\binom{8}{7} .5^{7}(1-.5)^{1} + \binom{8}{8} .5^{8}(1-.5)^{0} \approx 0.04
\]
# .5^7 * (1 - .5)^1 simplifies to .5^8 since p = .5
choose(8, 7) * .5 ^ 8 + choose(8, 8) * .5 ^ 8
[1] 0.03516
pbinom(6, size = 8, prob = .5, lower.tail = FALSE)
[1] 0.03516
# Likelihood for p given 7 girls out of 8 births, scaled to have maximum 1
plot(pvals, dbinom(7, 8, pvals) / dbinom(7, 8, 7/8),
     lwd = 3, frame = FALSE, type = "l", xlab = "p", ylab = "likelihood")
The normal distribution
- A random variable is said to follow a normal or Gaussian distribution with mean \( \mu \) and variance \( \sigma^2 \) if the associated density is
\[
(2\pi \sigma^2)^{-1/2}e^{-(x - \mu)^2/(2\sigma^2)}
\]
- If \( X \) is a RV with this density, then \( E[X] = \mu \) and \( Var(X) = \sigma^2 \)
- We write \( X\sim \mbox{N}(\mu, \sigma^2) \)
- When \( \mu = 0 \) and \( \sigma = 1 \) the resulting distribution is called the standard normal distribution
- The standard normal density function is labeled \( \phi \)
- Standard normal RVs are often labeled \( Z \)
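A quick check that dnorm matches the density formula above (mu, sigma and x are arbitrary illustrative values):
mu <- 2; sigma <- 3; x <- 1.5
(2 * pi * sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2 * sigma^2))
dnorm(x, mean = mu, sd = sigma)   # same value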
zvals <- seq(-3, 3, length = 1000)
plot(zvals, dnorm(zvals),
     type = "l", lwd = 3, frame = FALSE, xlab = "z", ylab = "Density")
# Mark 1, 2 and 3 standard deviations on either side of the mean;
# a for loop avoids the list of NULLs that sapply would print
for (k in -3 : 3) abline(v = k)
Facts about the normal density
- If \( X \sim \mbox{N}(\mu,\sigma^2) \) then \( Z = \frac{X -\mu}{\sigma} \) is standard normal
- If \( Z \) is standard normal \[ X = \mu + \sigma Z \sim \mbox{N}(\mu, \sigma^2) \]
- The non-standard normal density is \[ \phi\{(x - \mu) / \sigma\}/\sigma \]
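A numerical check of the standardization fact (mu, sigma and q are arbitrary illustrative values):
# P(X <= q) for X ~ N(mu, sigma^2) equals P(Z <= (q - mu) / sigma)
mu <- 10; sigma <- 2; q <- 13
pnorm(q, mean = mu, sd = sigma)
pnorm((q - mu) / sigma)   # same probability after standardizing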
More facts about the normal density
- Approximately \( 68\% \), \( 95\% \) and \( 99\% \) of the normal density lies within \( 1 \), \( 2 \) and \( 3 \) standard deviations from the mean, respectively
- \( -1.28 \), \( -1.645 \), \( -1.96 \) and \( -2.33 \) are the \( 10^{th} \), \( 5^{th} \), \( 2.5^{th} \) and \( 1^{st} \) percentiles of the standard normal distribution respectively
- By symmetry, \( 1.28 \), \( 1.645 \), \( 1.96 \) and \( 2.33 \) are the \( 90^{th} \), \( 95^{th} \), \( 97.5^{th} \) and \( 99^{th} \) percentiles of the standard normal distribution respectively
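These quantiles can be confirmed with qnorm:
qnorm(c(0.10, 0.05, 0.025, 0.01))   # approx -1.28, -1.645, -1.96, -2.33
qnorm(c(0.90, 0.95, 0.975, 0.99))   # the symmetric upper percentiles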
Question
- What is the \( 95^{th} \) percentile of a \( N(\mu, \sigma^2) \) distribution?
- Quick answer in R, with mu and sd set to the desired mean and standard deviation
qnorm(.95, mean = mu, sd = sd)
- We want the point \( x_0 \) so that \( P(X \leq x_0) = .95 \)
\[
\begin{eqnarray*}
P(X \leq x_0) & = & P\left(\frac{X - \mu}{\sigma} \leq \frac{x_0 - \mu}{\sigma}\right) \\ \\
& = & P\left(Z \leq \frac{x_0 - \mu}{\sigma}\right) = .95
\end{eqnarray*}
\]
- Therefore
\[ \frac{x_0 - \mu}{\sigma} = 1.645 \]
or \( x_0 = \mu + 1.645\sigma \)
- In general \( x_0 = \mu + \sigma z_0 \) where \( z_0 \) is the appropriate standard normal quantile
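Checking the two forms against each other (mu = 100 and sigma = 15 are arbitrary illustrative values):
mu <- 100; sigma <- 15
qnorm(0.95, mean = mu, sd = sigma)
mu + sigma * qnorm(0.95)   # mu + 1.645 * sigma, approximately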
Question
- What is the probability that a \( \mbox{N}(\mu,\sigma^2) \) RV is more than 2 standard deviations above the mean?
- We want to know
\[
\begin{eqnarray*}
P(X > \mu + 2\sigma) & = &
P\left(\frac{X -\mu}{\sigma} > \frac{\mu + 2\sigma - \mu}{\sigma}\right) \\ \\
& = & P(Z > 2) \\ \\
& \approx & 2.5\%
\end{eqnarray*}
\]
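The same answer in R:
pnorm(2, lower.tail = FALSE)   # approx 0.0228, about 2.5%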
Other properties
- The normal distribution is symmetric and peaked about its mean (therefore the mean, median and mode are all equal)
- A constant times a normally distributed random variable is also normally distributed (what are the mean and variance? a simulation check follows this list)
- Sums of jointly normally distributed random variables are again normally distributed, even if the variables are dependent (what are the mean and variance?)
- Sample means of normally distributed random variables are again normally distributed (with what mean and variance?)
- The square of a standard normal random variable follows what is called the chi-squared distribution
- The exponential of a normally distributed random variable follows what is called the log-normal distribution
- As we will see later, many random variables, properly normalized, limit to a normal distribution
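A simulation sketch of two of these properties (the seed, sample size and parameters are arbitrary):
set.seed(42)
x <- rnorm(1e5, mean = 1, sd = 2)
c(mean(3 * x), var(3 * x))          # approx 3 and 36: 3X ~ N(3 * 1, 3^2 * 2^2)
z <- rnorm(1e5)
mean(z^2 <= qchisq(0.95, df = 1))   # approx 0.95, consistent with Z^2 being chi-squared(1)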
Final thoughts on normal likelihoods
- The MLE for \( \mu \) is \( \bar X \).
- The MLE for \( \sigma^2 \) is
\[
\frac{\sum_{i=1}^n (X_i - \bar X)^2}{n}
\]
(Which is the biased version of the sample variance.)
- The MLE of \( \sigma \) is simply the square root of this
estimate
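Computing these MLEs from simulated data (the seed and parameters are arbitrary):
set.seed(7)
x <- rnorm(50, mean = 5, sd = 2)
mean(x)                       # MLE of mu
mean((x - mean(x))^2)         # MLE of sigma^2 (divides by n, not n - 1)
sqrt(mean((x - mean(x))^2))   # MLE of sigma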
The Poisson distribution
- Used to model counts
- The Poisson mass function is
\[
P(X = x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}
\]
for \( x=0,1,\ldots \)
- The mean of this distribution is \( \lambda \)
- The variance of this distribution is \( \lambda \)
- Notice that \( x \) ranges from \( 0 \) to \( \infty \)
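A numerical check that the mean and the variance both equal lambda (lambda = 5 is arbitrary; the tail beyond x = 100 is negligible here):
lambda <- 5; x <- 0 : 100
sum(x * dpois(x, lambda))                # approx 5 (the mean)
sum((x - lambda)^2 * dpois(x, lambda))   # approx 5 (the variance)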
Some uses for the Poisson distribution
- Modeling event/time data
- Modeling radioactive decay
- Modeling survival data
- Modeling unbounded count data
- Modeling contingency tables
- Approximating binomials when \( n \) is large and \( p \) is small
Poisson derivation
- \( \lambda \) is the mean number of events per unit time
- Let \( h \) be very small
- Suppose we assume that
  - the probability of an event in an interval of length \( h \) is \( \lambda h \), while the probability of more than one event is negligible
  - whether or not an event occurs in one small interval does not impact whether or not an event occurs in another small interval
- Then the number of events per unit time is Poisson with mean \( \lambda \)
Rates and Poisson random variables
- Poisson random variables are used to model rates
- \( X \sim \mbox{Poisson}(\lambda t) \) where
  - \( \lambda = E[X / t] \) is the expected count per unit of time
  - \( t \) is the total monitoring time
Poisson approximation to the binomial
- When \( n \) is large and \( p \) is small the Poisson distribution
is an accurate approximation to the binomial distribution
- Notation
  - \( X \sim \mbox{Binomial}(n, p) \) and \( \lambda = n p \)
  - \( n \) gets large
  - \( p \) gets small
  - \( \lambda \) stays constant
Example
The number of people who show up at a bus stop is Poisson with
a mean of \( 2.5 \) per hour.
If we watch the bus stop for 4 hours, what is the probability that \( 3 \)
or fewer people show up in that time?
# The rate 2.5 per hour over t = 4 hours gives lambda = 2.5 * 4 = 10
ppois(3, lambda = 2.5 * 4)
[1] 0.01034
Example, Poisson approximation to the binomial
We flip a coin with success probability \( 0.01 \) five hundred times.
What's the probability of 2 or fewer successes?
pbinom(2, size = 500, prob = .01)
[1] 0.1234
ppois(2, lambda=500 * .01)
[1] 0.1247