Bernoulli Distribution - binary sample space

The simplest and most important distribution is \(\boxed{\text{Ber}(p)}\) or \(\text{Bernoulli}(p)\). It has a binary outcome: yes/no, success/failure, heads/tails. It is modeled as a random variable \(X\sim\text{Ber}(p)\) with possible values \(0\) and \(1\), i.e. \(X\in\{0,1\}\), with probabilities \(P(X=1)=p\) and \(P(X=0)=1-p\).

A simple model for the Bernoulli distribution is to flip a coin with probability \(p\) of heads, with \(X = 1\) on heads and \(X = 0\) on tails. The general terminology is to say \(X\) is \(1\) on success and \(0\) on failure, with success and failure defined by the context. Many decisions can be modeled as a binary choice, such as votes for or against a proposal. If \(p\) is the proportion of the voting population that favors the proposal, then the vote of a random individual is modeled by a \(\text{Ber}(p)\) random variable.
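A minimal sketch in base R (illustrative \(p = 0.3\)): a \(\text{Ber}(p)\) draw is just a binomial draw with size = 1, so the sample mean of many draws should approach \(p\).

set.seed(1)
p <- 0.3
x <- rbinom(10000, size = 1, prob = p)  # 10,000 Bernoulli(0.3) samples via a size-1 binomial
mean(x)  # estimate of P(X = 1); should be close to p = 0.3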

Binomial Distribution - multiple independent Bernoulli trials

The binomial distribution \(\boxed{Bin(n,p)}\) or \(Binomial(n,p)\), models the number of successes in \(n\) independent \(Bernoulli(p)\) trials. A single Bernoulli trial is, say, one toss of a coin. A single binomial trial (typically \(X=\sum_iX_i\in\{0,1\dots n\}\)) consists of \(n\) Bernoulli trials \(\{X_1,X_2,X_3,\dots X_n\}\) where \(X_i\in\{0,1\}\).

  • dbinom(x, size, prob, log = FALSE) gives the density
  • pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE) gives the distribution function
  • qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE) gives the quantile function
  • rbinom(n, size, prob) generates random deviates

probability distribution: \[\boxed{P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}}\] For coin flips the sample space for a Bernoulli trial is \(\{H, T\}\). The sample space for a binomial trial is all sequences of heads and tails of length \(n\). Likewise a Bernoulli random variable takes values \(0\) and \(1\) and a binomial random variable takes values \(0, 1, 2, \dots, n\). \[ \begin{align} P(X = k) &= \binom{n}{k} p^k (1 - p)^{n-k}\\ \text{where}&\dots \\ p^k&\quad\text{is the probability of }k\text{ heads} \\ (1 - p)^{n-k}&\quad\text{is the probability of }n-k\text{ tails} \\ \binom{n}{k}&\quad\text{is the number of different arrangements of }k\text{ heads and }n-k\text{ tails} \end{align} \]
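As a quick sanity check (a sketch with illustrative values \(n = 10\), \(p = 0.3\)), the closed-form PMF above should match R's dbinom:

n <- 10; p <- 0.3; k <- 0:n
manual <- choose(n, k) * p^k * (1 - p)^(n - k)    # formula above
all.equal(manual, dbinom(k, size = n, prob = p))  # TRUE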
Expected Value: \[\boxed{E[X_{Bin}]=n\cdot p}\] Expectation is linear, i.e. if \(X=X_1+X_2+X_3\dots\) then \(E[X]=E[X_1]+E[X_2]+E[X_3]\dots\), and for each Bernoulli trial \(E[X_i]=p\). So for \(n\) trials \(E[X_{Bin}]=n\cdot p\).
Variance: \[\boxed{\text{Var}[X_{Bin}]=n\cdot p\cdot(1-p)}\] For a single Bernoulli trial, \(\text{Var}(X_i)\equiv E[X_i^2]-E[X_i]^2\), where \(E[X_i^2]=E[X_i]=p\) (since \(X_i\in\{0,1\}\) implies \(X_i^2=X_i\)) and \(E[X_i]^2=p^2\), therefore \(\text{Var}(X_{Bernoulli})=p-p^2=p(1-p)\). Variance is additive for independent trials, i.e. if \(X=X_1+X_2+X_3\dots\) then \(\text{Var}[X]=\text{Var}(X_1)+\text{Var}(X_2)\dots\), and for each Bernoulli trial \(\text{Var}(X_i)=p(1-p)\). So for \(n\) trials \(\text{Var}(X_{Bin})=n\cdot p\cdot(1-p)\).
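A short simulation sketch of both formulas (illustrative \(n = 10\), \(p = 0.3\)):

set.seed(1)
n <- 10; p <- 0.3
x <- rbinom(1e5, size = n, prob = p)
mean(x)  # ~ n * p = 3
var(x)   # ~ n * p * (1 - p) = 2.1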
Maximum Likelihood Estimate: \[\boxed{\hat{p}=\frac{k}{n}}\] \[ \begin{align} \ln(P(k|p))&=\ln\left(\binom{n}{k} p^k (1 - p)^{n-k}\right)\\ \\ &\text{the derivative of the log likelihood function is }0\text{ at maximum }P\\ 0&=\frac{d}{d\hat{p}}\:\ln\left(\binom{n}{k} \hat{p}^k (1 - \hat{p})^{n-k}\right)\\ 0&=\frac{d}{d\hat{p}}\left[k\:\ln(\hat{p}) + (n-k)\:\ln(1 - \hat{p})\right]\\ 0&= \frac{k}{\hat{p}} - \frac{n-k}{1 - \hat{p}}\\ \frac{k}{\hat{p}} &= \frac{n-k}{1 - \hat{p}}\\ k(1-\hat{p}) &= \hat{p}(n-k)\\ k-k\hat{p} &= n\hat{p}-k\hat{p}\\ k &= n\hat{p}\\ \hat{p}&=\frac{k}{n} \end{align} \]
Maximum Likelihood Estimate (Numerical Example 1):
library(data.table)

# Generate data
n_trials <- 10
n_observations <- 100
data <- data.table(y = rbinom(n_observations, size = n_trials, prob = 0.3), n = n_trials)

# define negative log likelihood function
negative_log_likelihood <- function(p, data) {
  if (p <= 0 || p >= 1) return(Inf)  # Ensure valid probability range
   -sum(dbinom(data$y, size = data$n, prob = p, log = TRUE))
}

# numerical fit using Brent's method (1-D bounded optimization)
mle_result <- optim(par = 0.5, fn = negative_log_likelihood, data = data, method = "Brent", lower = 0, upper = 1)
print(paste("MLE estimate of p:", round(mle_result$par, 3)))
## [1] "MLE estimate of p: 0.304"

Geometric Distribution - Bernoulli trial stopping probabilities

A geometric distribution \(\boxed{\text{geo}(p)}\) or \(\text{geometric}(p)\) models the number of failures (tails) before the first success (heads) in a sequence of Bernoulli trials (or coin flips). Here \(X\in\mathbb{N}\) i.e. \(0,1,2,3\dots\).

  • dgeom(x, prob, log = FALSE) gives the density
  • pgeom(q, prob, lower.tail = TRUE, log.p = FALSE) gives the distribution function
  • qgeom(p, prob, lower.tail = TRUE, log.p = FALSE) gives the quantile function
  • rgeom(n, prob) generates random deviates
Recursive quartering of a quarter

Exclusive ("failures before the first success") probability distribution: \[\boxed{P(X = k) = (1 - p)^{k}\cdot p}\] Inclusive ("trials up to and including the first success") probability distribution: \[\boxed{P(X = k) = (1 - p)^{k-1}\cdot p}\] A quick dgeom check of the exclusive form follows the list below.
  • Think of traversing down the branch of a probability tree for the relevant stopping criterion (here the exclusive form)…
    • \((1 - p)^{k}\) is the probability of \(k\) successive tails
    • \(p\) is the probability of the final heads.
    • The probability of all these independent events is the product of the two.
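A sketch checking the exclusive form against dgeom (R's geometric distribution counts failures before the first success), with an illustrative \(p = 0.3\):

p <- 0.3; k <- 0:5
manual <- (1 - p)^k * p                # exclusive PMF above
all.equal(manual, dgeom(k, prob = p))  # TRUE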
Exclusive Expected Value - Number of Failures before first Successful trial: \[\boxed{E[X]=\frac{1-p}{p}}\]
  1. to compute \(E[X]\) we have to sum the infinite series \(E[X]=\sum^\infty_{k=0} k(1−p)^kp\)
  2. we know the sum of the geometric series: \(\sum_{k=0}^\infty x^k = \frac{1}{1-x}\)
  3. Differentiate both sides: \(\sum_{k=0}^\infty kx^{k-1} = \frac{1}{\left(1-x\right)^2}\)
  4. Multiply by \(x\): \(\sum_{k=0}^\infty kx^{k} = \frac{x}{\left(1-x\right)^2}\)
  5. Replace \(x\) by \(1 − p\): \(\sum_{k=0}^\infty k\left(1 − p\right)^{k} = \frac{1 − p}{p^2}\)
  6. Multiply by p: \(\sum_{k=0}^\infty k\left(1 − p\right)^kp = \frac{1 − p}{p}\) QED
Inclusive Expected Value - Number of Trials up to & including the First Success: \[\boxed{E[X]=\frac{1}{p}}\] If \(Y\) counts the trials up to and including the first success and \(X\) counts the failures before it, then \(Y=X+1\), so \(E[Y]=E[X]+1=\frac{1-p}{p}+1=\frac{1}{p}\).
variance: \[\boxed{\text{Var}[X]=\frac{1 − p}{p^2}}\]
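A short simulation sketch of the exclusive mean and variance (illustrative \(p = 0.3\); rgeom counts failures before the first success):

set.seed(1)
p <- 0.3
x <- rgeom(1e5, prob = p)
mean(x)  # ~ (1 - p) / p   = 2.33...
var(x)   # ~ (1 - p) / p^2 = 7.77...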
Inclusive ("up to and including") Maximum Likelihood Estimate: \[\boxed{\hat{p}=\frac{n}{\sum_{i=1}^nk_i}}\]

\[ \begin{align} P(k_1\dots k_n|p)&=\prod_{i=1}^n(1-p)^{k_i-1}p\\ \ln(P(k_1\dots k_n|p))&=\ln\left(\prod_{i=1}^n(1-p)^{k_i-1}p\right)\\ &=\sum_{i=1}^n\ln\left((1-p)^{k_i-1}p\right)\\ &=\sum_{i=1}^n\left[(k_i-1)\:\ln(1-p)\right]+n\:\ln(p)\\ \\ &\text{the derivative of the log likelihood function is }0\text{ at maximum }P\\ 0&=\frac{d}{d\hat{p}}\left(\sum_{i=1}^n\left[(k_i-1)\:\ln(1-\hat{p})\right]+n\:\ln(\hat{p})\right)\\ 0&=-\sum_{i=1}^n\frac{k_i-1}{1-\hat{p}}+\frac{n}{\hat{p}}\\ \frac{n}{\hat{p}}&=\frac{1}{1-\hat{p}}\sum_{i=1}^n(k_i-1)\\ n(1-\hat{p})&=\hat{p}\left(\sum_{i=1}^nk_i-n\right)\\ n-n\hat{p}&=\hat{p}\sum_{i=1}^nk_i-n\hat{p}\\ n&=\hat{p}\sum_{i=1}^nk_i\\ \hat{p}&=\frac{n}{\sum_{i=1}^nk_i} \end{align} \]

Maximum Likelihood Estimate (Numerical Example)
# Generate data
n_observations <- 100
data <- data.table(y = rgeom(n_observations, prob = 0.3))

# define negative log likelihood function
negative_log_likelihood <- function(p, data) {
  if (p <= 0 || p >= 1) return(Inf)  # Ensure valid probability range
   -sum(dgeom(data$y, prob = p, log = TRUE))
}

# numerical fit using Brent's method (1-D bounded optimization)
mle_result <- optim(par = 0.5, fn = negative_log_likelihood, data = data, method = "Brent", lower = 0, upper = 1)
print(paste("MLE estimate of p:", round(mle_result$par, 3)))
## [1] "MLE estimate of p: 0.287"

Poisson Distribution - many trials, \(n\gg E[X]\) (i.e. very small \(p\))

The Poisson distribution \(\boxed{Pois(\lambda)}\) is the limit of the binomial distribution when the number of trials \(n\) becomes very large. The probability of success \(p\) in each trial becomes very small, but the expected number of successes \(\lambda\equiv\langle k\rangle=n\cdot p\) remains constant.

  • dpois(x, lambda) gives the density where x is a vector of (non-negative integer) quantiles
  • ppois(q, lambda) gives the distribution function q vector of quantiles.
  • qpois(p, lambda) gives the quantile function where p is a vector of probabilities.
  • rpois(n, lambda) generates random deviates where n is the number of random values to return.
Carrot farm in Minecraft
probability distribution: \[\boxed{P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}}\]
  1. Start with the Binomial Distribution: The probability of \(k\) successes in \(n\) trials of a binomial distribution is: \[P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}\]

  2. \(\lambda\equiv\langle k\rangle=n\cdot p\): Remember the binomial distribution has mean \(E[X]=n\cdot p\). Let the expected number of successes be \(\lambda\). Therefore, \(p=\frac{\lambda}{n}\).

  3. Substituting into the Binomial Formula: \[ \begin{align} P(X=k)&=\binom{n}{k}\left(\frac{\lambda}{n}\right)^k\left(1-\frac{\lambda}{n}\right)^{n-k}\\ &\;\text{where}\;\binom{n}{k}=\frac{n!}{k!(n-k)!}\\ P(X=k)&=\frac{n!}{k!(n-k)!}\left(\frac{\lambda}{n}\right)^k\left(1-\frac{\lambda}{n}\right)^{n-k}\\ \\&\text{consider terms as }n \to \infty:\\ &\qquad\lim_{n \to \infty} \frac{n!}{(n-k)!} \approx n^k\\ &\qquad\lim_{n \to \infty}\left(1-\frac{\lambda}{n}\right)^n\approx e^{-\lambda}\\ &\qquad\lim_{n \to \infty}\left(1-\frac{\lambda}{n}\right)^{-k}\approx 1\\\\ \therefore \qquad &\boxed{P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}}\\ \end{align} \]
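A sketch of this limit in practice (illustrative \(\lambda = 3\)): holding \(\lambda = np\) fixed while \(n\) grows, dbinom converges to dpois:

lambda <- 3; k <- 0:10
for (n in c(10, 100, 10000)) {
  p <- lambda / n
  # largest absolute difference between Bin(n, lambda/n) and Pois(lambda) over k = 0..10
  cat("n =", n, " max |Bin - Pois| =", max(abs(dbinom(k, size = n, prob = p) - dpois(k, lambda))), "\n")
}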

Expected Value: \[\boxed{E[X]\equiv\lambda}\]
variance: \[\boxed{\text{Var}[X]=\lambda}\] In the binomial limit \(\text{Var}[X]=np(1-p)=\lambda\left(1-\frac{\lambda}{n}\right)\to\lambda\) as \(n\to\infty\).
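A quick simulation sketch that the mean and variance both equal \(\lambda\) (illustrative \(\lambda = 3\)):

set.seed(1)
x <- rpois(1e5, lambda = 3)
mean(x)  # ~ lambda = 3
var(x)   # ~ lambda = 3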

Example: Thallium-207 \(\beta^-\) decay with half-life \(t_{1/2}\approx 4.77\:\text{minutes}\), \(^{207}\text{Tl}\:\xrightarrow{\beta^-}\:^{207}\text{Pb}\)

Hypergeometric Distribution - Binomial without replacement

The \(\boxed{Hypergeometric(N,K,n)}\) distribution describes the probability of \(k\) successes (random draws for which the object drawn has a specified feature) in \(n\) draws, without replacement, from a finite population of size \(N\) that contains exactly \(K\) objects with that feature.

e.g. You have 10 coins in total \(N=10\) where 6 of them are heads \(K=6\) and 4 of them are tails. You randomly select \(n=5\) coins (without replacement). The hypergeometric distribution can calculate the probability of selecting exactly \(k\) coins that show heads. Here \(X\in{\{0,1\dots n\}}\)

  • dhyper(x, m, n, k, log = FALSE) gives the density
  • phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE) gives the distribution function
  • qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE) gives the quantile function
  • rhyper(nn, m, n, k) generates random deviates

capture/recapture is used for estimating wild animal populations

probability distribution: \[\boxed{P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}}\]
  • \(\binom{K}{k}\) is the number of ways to choose \(k\) successes from \(K\) successes
  • \(\binom{N-K}{n-k}\) is the number of ways to choose \(n-k\) failures from \(N-K\) failures
  • \(\binom{N}{n}\) is the number of ways to choose \(n\) elements from \(N\)
  • Probability of \(k\) successes is: the ways to choose \(k\) successes and \(n-k\) failures divided by the ways to choose \(n\) elements
Expected Value: \[\boxed{E[X]=n\cdot\frac{K}{N}}\] Each of the \(n\) draws has marginal probability \(\frac{K}{N}\) of being a success, and expectation is linear even without independence.
Variance: \[\boxed{\text{Var}[X]=n\cdot\frac{K}{N}\cdot\left(1-\frac{K}{N}\right)\cdot\frac{N-n}{N-1}}\] This is the binomial variance multiplied by the finite-population correction factor \(\frac{N-n}{N-1}\).
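A sketch checking the PMF, mean, and variance against R, reusing the coin example above (\(N = 10\), \(K = 6\), \(n = 5\)); note dhyper's arguments are m = successes in the population, n = failures in the population, k = number drawn:

N <- 10; K <- 6; draws <- 5
x <- 0:draws
manual <- choose(K, x) * choose(N - K, draws - x) / choose(N, draws)  # formula above
all.equal(manual, dhyper(x, m = K, n = N - K, k = draws))             # TRUE
draws * K / N                                          # E[X] = 3
draws * (K / N) * (1 - K / N) * (N - draws) / (N - 1)  # Var[X] = 0.666...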
Maximum Likelihood Estimate (Numerical Example)

The capture/recapture method is a way to estimate the size of a population in the wild. The method assumes that each animal in the population is equally likely to be captured by a trap. Suppose \(K=10\) animals are captured, tagged and released. A few months later, \(n=20\) animals are captured, examined, and released. \(k=4\) of these 20 are found to be tagged. Estimate the size \(N=?\) of the wild population using the MLE for the probability that a wild animal is tagged.

\[ \begin{align} P(k,n,K|N)&=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\\ &=\frac{\binom{10}{4}\binom{N-10}{20-4}}{\binom{N}{20}}=\frac{\binom{10}{4}\binom{N-10}{16}}{\binom{N}{20}}\\ \end{align} \]

# define negative log likelihood function
negative_log_likelihood <- function(N) {
  if (N <= 20) return(Inf)  # Avoid invalid values
  return(-log(choose(10,4) * choose(N-10,16) / choose(N,20)))
}

# numerical fit using Brent's method (1-D bounded optimization)
mle_result <- optim(par = 100, fn = negative_log_likelihood, method = "Brent", lower = 10, upper = 1000)
print(paste("MLE estimate of wild animal population:", mle_result$par))
## [1] "MLE estimate of wild animal population: 49.4950014654967"

Multinomial distribution - generalisation of binomial dist. for non-binary trial

The multinomial distribution \(\text{mult}(n,p_1\dots p_k)\) or \(\mathcal{M}(n,\vec{p})\), where \(\sum_{i=1}^k p_i = 1\), models the probability of the counts for each side of a \(k\)-sided die rolled \(n\) times.

N <- 100  # Number of trials
p <- c(1/6, 3/6, 2/6)  # Prob. for each face

pEven <- dmultinom(c(33,33,34), size = N, prob = p); # even dist.
pExtreme1 <- dmultinom(c(100,0,0), size = N, prob = p); # all first
pNo1 <- dmultinom(c(0,50,50), size = N, prob = p); # all 2nd and 3rd

probability distribution: \[\boxed{P(X_1 = x_1,\dots,X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\;p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}}\]
  • \(\frac{n!}{x_1!\cdots x_k!}\) is the number of distinct orderings of the \(n\) trials that produce the counts \(x_1,\dots,x_k\)
  • \(p_i^{x_i}\) is the probability contribution of the \(x_i\) outcomes of type \(i\) in any one particular ordering
  • the counts must satisfy \(\sum_{i=1}^k x_i = n\)
  • for \(k=2\) this reduces to the binomial distribution
Expected Value: \[\boxed{E[X_i]=n\cdot p_i}\] Each count \(X_i\) is marginally \(\text{Bin}(n,p_i)\), so its expectation is \(n\cdot p_i\).
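A sketch checking the PMF and the marginal expectations against R, reusing the three-category example above:

N <- 100
p <- c(1/6, 3/6, 2/6)
x <- c(33, 33, 34)
manual <- factorial(N) / prod(factorial(x)) * prod(p^x)  # formula above
all.equal(manual, dmultinom(x, size = N, prob = p))      # TRUE
N * p  # marginal expected counts E[X_i] = n * p_i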

https://en.wikipedia.org/wiki/Hardy%e2%80%93Weinberg_principle

Consider a population of monoecious diploids, where each organism produces male and female gametes at equal frequency, and has two alleles at each gene locus.

  • Genotype AA: probability \(\theta^2\)
  • Genotype Aa (or aA): probability \(2\theta(1-\theta)\)
  • Genotype aa: probability \((1-\theta)^2\)

Suppose we test a random sample of people and find that \(k_1\) are AA, \(k_2\) are Aa, and \(k_3\) are aa. Find the MLE of \(\theta\).
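Differentiating the multinomial log-likelihood \((2k_1+k_2)\ln\theta+(k_2+2k_3)\ln(1-\theta)+\text{const}\) gives the closed form \(\hat{\theta}=\frac{2k_1+k_2}{2(k_1+k_2+k_3)}\). A minimal numerical sketch in the style of the earlier examples, assuming illustrative counts simulated with a true \(\theta=0.6\):

# Generate data: genotype counts (k1, k2, k3) for a sample with true theta = 0.6
set.seed(1)
theta_true <- 0.6
probs <- c(theta_true^2, 2 * theta_true * (1 - theta_true), (1 - theta_true)^2)
counts <- as.vector(rmultinom(1, size = 100, prob = probs))

# define negative log likelihood function
negative_log_likelihood <- function(theta, counts) {
  if (theta <= 0 || theta >= 1) return(Inf)  # Ensure valid probability range
  p <- c(theta^2, 2 * theta * (1 - theta), (1 - theta)^2)
  -dmultinom(counts, size = sum(counts), prob = p, log = TRUE)
}

# numerical fit using Brent's method (1-D bounded optimization)
mle_result <- optim(par = 0.5, fn = negative_log_likelihood, counts = counts, method = "Brent", lower = 0, upper = 1)

# compare with the closed form (2*k1 + k2) / (2*n)
(2 * counts[1] + counts[2]) / (2 * sum(counts))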

Boltzmann Distribution - TODO