Bernoulli Distribution - binary sample space

The simplest and most important distribution is \(\boxed{\text{Ber}(p)}\) or \(\text{Bernoulli}(p)\). It has a binary outcome: yes/no, success/failure, heads/tails. It is modeled as a random variable \(X\sim\text{Ber}(p)\) with possible values \(0\) and \(1\), i.e. \(X\in\{0,1\}\), with probabilities \(P(X=1)=p\) and \(P(X=0)=1-p\).

A simple model for the Bernoulli distribution is to flip a coin with probability \(p\) of heads, with \(X = 1\) on heads and \(X = 0\) on tails. The general terminology is to say \(X\) is \(1\) on success and \(0\) on failure, with success and failure defined by the context. Many decisions can be modeled as a binary choice, such as votes for or against a proposal. If \(p\) is the proportion of the voting population that favors the proposal, then the vote of a random individual is modeled by a \(\text{Ber}(p)\) random variable.
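A minimal sketch in base R (illustrative \(p = 0.3\)): a \(\text{Ber}(p)\) draw is just a binomial draw with size = 1, so the sample mean of many draws should approach \(p\).

set.seed(1)
p <- 0.3
x <- rbinom(10000, size = 1, prob = p)  # 10,000 Bernoulli(0.3) samples via a size-1 binomial
mean(x)  # estimate of P(X = 1); should be close to p = 0.3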

Binomial Distribution - multiple independent Bernoulli trials

The binomial distribution \(\boxed{Bin(n,p)}\) or \(Binomial(n,p)\), models the number of successes in \(n\) independent \(Bernoulli(p)\) trials. A single Bernoulli trial is, say, one toss of a coin. A single binomial trial (typically \(X=\sum_iX_i\in\{0,1\dots n\}\)) consists of \(n\) Bernoulli trials \(\{X_1,X_2,X_3,\dots X_n\}\) where \(X_i\in\{0,1\}\).

  • dbinom(x, size, prob, log = FALSE) gives the density
  • pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE) gives the distribution function
  • qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE) gives the quantile function
  • rbinom(n, size, prob) generates random deviates

probability distribution: \[\boxed{P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}}\] For coin flips the sample space for a Bernoulli trial is \(\{H, T\}\). The sample space for a binomial trial is all sequences of heads and tails of length \(n\). Likewise a Bernoulli random variable takes values \(0\) and \(1\) and a binomial random variable takes values \(0, 1, 2, \dots, n\). \[ \begin{align} P(X = k) &= \binom{n}{k} p^k (1 - p)^{n-k}\\ \text{where}&\dots \\ p^k&\quad\text{is the probability of }k\text{ heads} \\ (1 - p)^{n-k}&\quad\text{is the probability of }n-k\text{ tails} \\ \binom{n}{k}&\quad\text{is the number of different arrangements of }k\text{ heads and }n-k\text{ tails} \end{align} \]
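As a quick sanity check (a sketch with illustrative values \(n = 10\), \(p = 0.3\)), the closed-form PMF above should match R's dbinom:

n <- 10; p <- 0.3; k <- 0:n
manual <- choose(n, k) * p^k * (1 - p)^(n - k)    # formula above
all.equal(manual, dbinom(k, size = n, prob = p))  # TRUE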
Expected Value: \[\boxed{E[X_{Bin}]=n\cdot p}\] Expectation is linear, i.e. if \(X=X_1+X_2+X_3\dots\) then \(E[X]=E[X_1]+E[X_2]+E[X_3]\dots\), and for each Bernoulli trial \(E[X_i]=p\). So for \(n\) trials \(E[X_{Bin}]=n\cdot p\).
Variance: \[\boxed{\text{Var}[X_{Bin}]=n\cdot p\cdot(1-p)}\] For a single Bernoulli trial, \(\text{Var}(X_i)\equiv E[X_i^2]-E[X_i]^2\), where \(E[X_i^2]=E[X_i]=p\) (since \(X_i\in\{0,1\}\) implies \(X_i^2=X_i\)) and \(E[X_i]^2=p^2\), therefore \(\text{Var}(X_{Bernoulli})=p-p^2=p(1-p)\). Variance is additive for independent trials, i.e. if \(X=X_1+X_2+X_3\dots\) then \(\text{Var}[X]=\text{Var}(X_1)+\text{Var}(X_2)\dots\), and for each Bernoulli trial \(\text{Var}(X_i)=p(1-p)\). So for \(n\) trials \(\text{Var}(X_{Bin})=n\cdot p\cdot(1-p)\).
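A short simulation sketch of both formulas (illustrative \(n = 10\), \(p = 0.3\)):

set.seed(1)
n <- 10; p <- 0.3
x <- rbinom(1e5, size = n, prob = p)
mean(x)  # ~ n * p = 3
var(x)   # ~ n * p * (1 - p) = 2.1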
Maximum Likelihood Estimate: \[\boxed{\hat{p}=\frac{k}{n}}\] \[ \begin{align} \ln(P(k|p))&=\ln\left(\binom{n}{k} p^k (1 - p)^{n-k}\right)\\ \\ &\text{the derivative of the log likelihood function is }0\text{ at maximum }P\\ 0&=\frac{d}{d\hat{p}}\:\ln\left(\binom{n}{k} \hat{p}^k (1 - \hat{p})^{n-k}\right)\\ 0&=\frac{d}{d\hat{p}}\left[k\:\ln(\hat{p}) + (n-k)\:\ln(1 - \hat{p})\right]\\ 0&= \frac{k}{\hat{p}} - \frac{n-k}{1 - \hat{p}}\\ \frac{k}{\hat{p}} &= \frac{n-k}{1 - \hat{p}}\\ k(1-\hat{p}) &= \hat{p}(n-k)\\ k-k\hat{p} &= n\hat{p}-k\hat{p}\\ k &= n\hat{p}\\ \hat{p}&=\frac{k}{n} \end{align} \]
Maximum Likelihood Estimate (Numerical Example 1):
library(data.table)

# Generate data
n_trials <- 10
n_observations <- 100
data <- data.table(y = rbinom(n_observations, size = n_trials, prob = 0.3), n = n_trials)

# define negative log likelihood function
negative_log_likelihood <- function(p, data) {
  if (p <= 0 || p >= 1) return(Inf)  # Ensure valid probability range
   -sum(dbinom(data$y, size = data$n, prob = p, log = TRUE))
}

# numerical fit using Brent's method (1-D bounded optimization)
mle_result <- optim(par = 0.5, fn = negative_log_likelihood, data = data, method = "Brent", lower = 0, upper = 1)
print(paste("MLE estimate of p:", round(mle_result$par, 3)))
## [1] "MLE estimate of p: 0.304"

Geometric Distribution - Bernoulli trial stopping probabilities

A geometric distribution \(\boxed{\text{geo}(p)}\) or \(\text{geometric}(p)\) models the number of failures (tails) before the first success (heads) in a sequence of Bernoulli trials (or coin flips). Here \(X\in\mathbb{N}\) i.e. \(0,1,2,3\dots\).

  • dgeom(x, prob, log = FALSE) gives the density
  • pgeom(q, prob, lower.tail = TRUE, log.p = FALSE) gives the distribution function
  • qgeom(p, prob, lower.tail = TRUE, log.p = FALSE) gives the quantile function
  • rgeom(n, prob) generates random deviates
Recursive quartering of a quarter

Exclusive ("failures before the first success") probability distribution: \[\boxed{P(X = k) = (1 - p)^{k}\cdot p}\] Inclusive ("trials up to and including the first success") probability distribution: \[\boxed{P(X = k) = (1 - p)^{k-1}\cdot p}\] A quick dgeom check of the exclusive form follows the list below.
  • Think of traversing down the branch of a probability tree for the relevant stopping criterion (here the exclusive form)…
    • \((1 - p)^{k}\) is the probability of \(k\) successive tails
    • \(p\) is the probability of the final heads.
    • The probability of all these independent events is the product of the two.
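A sketch checking the exclusive form against dgeom (R's geometric distribution counts failures before the first success), with an illustrative \(p = 0.3\):

p <- 0.3; k <- 0:5
manual <- (1 - p)^k * p                # exclusive PMF above
all.equal(manual, dgeom(k, prob = p))  # TRUE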
Exclusive Expected Value - Number of Failures before first Successful trial: \[\boxed{E[X]=\frac{1-p}{p}}\]
  1. to compute \(E[X]\) we have to sum the infinite series \(E[X]=\sum^\infty_{k=0} k(1−p)^kp\)
  2. we know the sum of the geometric series: \(\sum_{k=0}^\infty x^k = \frac{1}{1-x}\)
  3. Differentiate both sides: \(\sum_{k=0}^\infty kx^{k-1} = \frac{1}{\left(1-x\right)^2}\)
  4. Multiply by \(x\): \(\sum_{k=0}^\infty kx^{k} = \frac{x}{\left(1-x\right)^2}\)
  5. Replace \(x\) by \(1 − p\): \(\sum_{k=0}^\infty k\left(1 − p\right)^{k} = \frac{1 − p}{p^2}\)
  6. Multiply by p: \(\sum_{k=0}^\infty k\left(1 − p\right)^kp = \frac{1 − p}{p}\) QED
Inclusive Expected Value - Number of Trials up to & including the First Success: \[\boxed{E[X]=\frac{1}{p}}\] If \(Y\) counts the trials up to and including the first success and \(X\) counts the failures before it, then \(Y=X+1\), so \(E[Y]=E[X]+1=\frac{1-p}{p}+1=\frac{1}{p}\).
variance: \[\boxed{\text{Var}[X]=\frac{1 − p}{p^2}}\]
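A short simulation sketch of the exclusive mean and variance (illustrative \(p = 0.3\); rgeom counts failures before the first success):

set.seed(1)
p <- 0.3
x <- rgeom(1e5, prob = p)
mean(x)  # ~ (1 - p) / p   = 2.33...
var(x)   # ~ (1 - p) / p^2 = 7.77...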
Inclusive ("up to and including") Maximum Likelihood Estimate: \[\boxed{\hat{p}=\frac{n}{\sum_{i=1}^nk_i}}\]

\[ \begin{align} P(k_1\dots k_n|p)&=\prod_{i=1}^n(1-p)^{k_i-1}p\\ \ln(P(k_1\dots k_n|p))&=\ln\left(\prod_{i=1}^n(1-p)^{k_i-1}p\right)\\ &=\sum_{i=1}^n\ln\left((1-p)^{k_i-1}p\right)\\ &=\sum_{i=1}^n\left[(k_i-1)\:\ln(1-p)\right]+n\:\ln(p)\\ \\ &\text{the derivative of the log likelihood function is }0\text{ at maximum }P\\ 0&=\frac{d}{d\hat{p}}\left(\sum_{i=1}^n\left[(k_i-1)\:\ln(1-\hat{p})\right]+n\:\ln(\hat{p})\right)\\ 0&=-\sum_{i=1}^n\frac{k_i-1}{1-\hat{p}}+\frac{n}{\hat{p}}\\ \frac{n}{\hat{p}}&=\frac{1}{1-\hat{p}}\sum_{i=1}^n(k_i-1)\\ n(1-\hat{p})&=\hat{p}\left(\sum_{i=1}^nk_i-n\right)\\ n-n\hat{p}&=\hat{p}\sum_{i=1}^nk_i-n\hat{p}\\ n&=\hat{p}\sum_{i=1}^nk_i\\ \hat{p}&=\frac{n}{\sum_{i=1}^nk_i} \end{align} \]

Maximum Likelihood Estimate (Numerical Example)
# Generate data
n_observations <- 100
data <- data.table(y = rgeom(n_observations, prob = 0.3))

# define negative log likelihood function
negative_log_likelihood <- function(p, data) {
  if (p <= 0 || p >= 1) return(Inf)  # Ensure valid probability range
   -sum(dgeom(data$y, prob = p, log = TRUE))
}

# numerical fit using Brent's method (1-D bounded optimization)
mle_result <- optim(par = 0.5, fn = negative_log_likelihood, data = data, method = "Brent", lower = 0, upper = 1)
print(paste("MLE estimate of p:", round(mle_result$par, 3)))
## [1] "MLE estimate of p: 0.287"

Poisson Distribution - many trials, \(n\gg E[X]\) (i.e. very small \(p\))

The Poisson distribution \(\boxed{Pois(\lambda)}\) is the limit of the binomial distribution when the number of trials \(n\) becomes very large. The probability of success \(p\) in each trial becomes very small, but the expected number of successes \(\lambda\equiv\langle k\rangle=n\cdot p\) remains constant.

  • dpois(x, lambda) gives the density where x is a vector of (non-negative integer) quantiles
  • ppois(q, lambda) gives the distribution function q vector of quantiles.
  • qpois(p, lambda) gives the quantile function where p is a vector of probabilities.
  • rpois(n, lambda) generates random deviates where n is the number of random values to return.
Carrot farm in Minecraft
probability distribution: \[\boxed{P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}}\]
  1. Start with the Binomial Distribution: The probability of \(k\) successes in \(n\) trials of a binomial distribution is: \[P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}\]

  2. \(\lambda\equiv\langle k\rangle=n\cdot p\): Remember the binomial distribution has mean \(E[X]=n\cdot p\). Let the expected number of successes be \(\lambda\). Therefore, \(p=\frac{\lambda}{n}\).

  3. Substituting into the Binomial Formula: \[ \begin{align} P(X=k)&=\binom{n}{k}\left(\frac{\lambda}{n}\right)^k\left(1-\frac{\lambda}{n}\right)^{n-k}\\ &\;\text{where}\;\binom{n}{k}=\frac{n!}{k!(n-k)!}\\ P(X=k)&=\frac{n!}{k!(n-k)!}\left(\frac{\lambda}{n}\right)^k\left(1-\frac{\lambda}{n}\right)^{n-k}\\ \\&\text{consider terms as }n \to \infty:\\ &\qquad\lim_{n \to \infty} \frac{n!}{(n-k)!} \approx n^k\\ &\qquad\lim_{n \to \infty}\left(1-\frac{\lambda}{n}\right)^n\approx e^{-\lambda}\\ &\qquad\lim_{n \to \infty}\left(1-\frac{\lambda}{n}\right)^{-k}\approx 1\\\\ \therefore \qquad &\boxed{P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}}\\ \end{align} \]
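A sketch of this limit in practice (illustrative \(\lambda = 3\)): holding \(\lambda = np\) fixed while \(n\) grows, dbinom converges to dpois:

lambda <- 3; k <- 0:10
for (n in c(10, 100, 10000)) {
  p <- lambda / n
  # largest absolute difference between Bin(n, lambda/n) and Pois(lambda) over k = 0..10
  cat("n =", n, " max |Bin - Pois| =", max(abs(dbinom(k, size = n, prob = p) - dpois(k, lambda))), "\n")
}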

Expected Value: \[\boxed{E[X]\equiv\lambda}\]
variance: \[\boxed{\text{Var}[X]=\lambda}\] In the binomial limit \(\text{Var}[X]=np(1-p)=\lambda\left(1-\frac{\lambda}{n}\right)\to\lambda\) as \(n\to\infty\).
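A quick simulation sketch that the mean and variance both equal \(\lambda\) (illustrative \(\lambda = 3\)):

set.seed(1)
x <- rpois(1e5, lambda = 3)
mean(x)  # ~ lambda = 3
var(x)   # ~ lambda = 3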

Example: Thallium-207 \(\beta^-\) decay with half-life \(t_{1/2}\approx 4.77\:\text{minutes}\), \(^{207}\text{Tl}\:\xrightarrow{\beta^-}\:^{207}\text{Pb}\)

Hypergeometric Distribution - Binomial without replacement

The \(\boxed{Hypergeometric(N,K,n)}\) distribution describes the probability of \(k\) successes (random draws for which the object drawn has a specified feature) in \(n\) draws, without replacement, from a finite population of size \(N\) that contains exactly \(K\) objects with that feature.

e.g. You have 10 coins in total \(N=10\) where 6 of them are heads \(K=6\) and 4 of them are tails. You randomly select \(n=5\) coins (without replacement). The hypergeometric distribution can calculate the probability of selecting exactly \(k\) coins that show heads. Here \(X\in{\{0,1\dots n\}}\)

  • dhyper(x, m, n, k, log = FALSE) gives the density
  • phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE) gives the distribution function
  • qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE) gives the quantile function
  • rhyper(nn, m, n, k) generates random deviates

capture/recapture is used for estimating wild animal populations

probability distribution: \[\boxed{P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}}\]
  • \(\binom{K}{k}\) is the number of ways to choose \(k\) successes from \(K\) successes
  • \(\binom{N-K}{n-k}\) is the number of ways to choose \(n-k\) failures from \(N-K\) failures
  • \(\binom{N}{n}\) is the number of ways to choose \(n\) elements from \(N\)
  • Probability of \(k\) successes is: the ways to choose \(k\) successes and \(n-k\) failures divided by the ways to choose \(n\) elements
Expected Value: \[\boxed{E[X]=n\cdot\frac{K}{N}}\] Each of the \(n\) draws has marginal probability \(\frac{K}{N}\) of being a success, and expectation is linear even without independence.
Variance: \[\boxed{\text{Var}[X]=n\cdot\frac{K}{N}\cdot\left(1-\frac{K}{N}\right)\cdot\frac{N-n}{N-1}}\] This is the binomial variance multiplied by the finite-population correction factor \(\frac{N-n}{N-1}\).
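A sketch checking the PMF, mean, and variance against R, reusing the coin example above (\(N = 10\), \(K = 6\), \(n = 5\)); note dhyper's arguments are m = successes in the population, n = failures in the population, k = number drawn:

N <- 10; K <- 6; draws <- 5
x <- 0:draws
manual <- choose(K, x) * choose(N - K, draws - x) / choose(N, draws)  # formula above
all.equal(manual, dhyper(x, m = K, n = N - K, k = draws))             # TRUE
draws * K / N                                          # E[X] = 3
draws * (K / N) * (1 - K / N) * (N - draws) / (N - 1)  # Var[X] = 0.666...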
Maximum Likelihood Estimate (Numerical Example)

The capture/recapture method is a way to estimate the size of a population in the wild. The method assumes that each animal in the population is equally likely to be captured by a trap. Suppose \(K=10\) animals are captured, tagged and released. A few months later, \(n=20\) animals are captured, examined, and released. \(k=4\) of these 20 are found to be tagged. Estimate the size \(N=?\) of the wild population using the MLE for the probability that a wild animal is tagged.

\[ \begin{align} P(k,n,K|N)&=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\\ &=\frac{\binom{10}{4}\binom{N-10}{20-4}}{\binom{N}{20}}=\frac{\binom{10}{4}\binom{N-10}{16}}{\binom{N}{20}}\\ \end{align} \]

# define negative log likelihood function
negative_log_likelihood <- function(N) {
  if (N <= 20) return(Inf)  # Avoid invalid values
  return(-log(choose(10,4) * choose(N-10,16) / choose(N,20)))
}

# numerical fit using Brent's method (1-D bounded optimization)
mle_result <- optim(par = 100, fn = negative_log_likelihood, method = "Brent", lower = 10, upper = 1000)
print(paste("MLE estimate of wild animal population:", mle_result$par))
## [1] "MLE estimate of wild animal population: 49.4950014654967"

Multinomial distribution - generalisation of binomial dist. for non-binary trial

The multinomial distribution \(\text{mult}(n,p_1\dots p_k)\) or \(\mathcal{M}(n,\vec{p})\), where \(\sum_{i=1}^k p_i = 1\), models the probability of the counts for each side of a \(k\)-sided die rolled \(n\) times.

N <- 100  # Number of trials
p <- c(1/6, 3/6, 2/6)  # Prob. for each face

pEven <- dmultinom(c(33,33,34), size = N, prob = p); # even dist.
pExtreme1 <- dmultinom(c(100,0,0), size = N, prob = p); # all first
pNo1 <- dmultinom(c(0,50,50), size = N, prob = p); # all 2nd and 3rd

probability distribution: \[\boxed{P(X_1 = x_1,\dots,X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\;p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}}\]
  • \(\frac{n!}{x_1!\cdots x_k!}\) is the number of distinct orderings of the \(n\) trials that produce the counts \(x_1,\dots,x_k\)
  • \(p_i^{x_i}\) is the probability contribution of the \(x_i\) outcomes of type \(i\) in any one particular ordering
  • the counts must satisfy \(\sum_{i=1}^k x_i = n\)
  • for \(k=2\) this reduces to the binomial distribution
Expected Value: \[\boxed{E[X_i]=n\cdot p_i}\] Each count \(X_i\) is marginally \(\text{Bin}(n,p_i)\), so its expectation is \(n\cdot p_i\).
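A sketch checking the PMF and the marginal expectations against R, reusing the three-category example above:

N <- 100
p <- c(1/6, 3/6, 2/6)
x <- c(33, 33, 34)
manual <- factorial(N) / prod(factorial(x)) * prod(p^x)  # formula above
all.equal(manual, dmultinom(x, size = N, prob = p))      # TRUE
N * p  # marginal expected counts E[X_i] = n * p_i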

https://en.wikipedia.org/wiki/Hardy%e2%80%93Weinberg_principle

Consider a population of monoecious diploids, where each organism produces male and female gametes at equal frequency, and has two alleles at each gene locus.

  • Genotype AA: probability \(\theta^2\)
  • Genotype Aa (or aA): probability \(2\theta(1-\theta)\)
  • Genotype aa: probability \((1-\theta)^2\)

Suppose we test a random sample of people and find that \(k_1\) are AA, \(k_2\) are Aa, and \(k_3\) are aa. Find the MLE of \(\theta\).
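Differentiating the multinomial log-likelihood \((2k_1+k_2)\ln\theta+(k_2+2k_3)\ln(1-\theta)+\text{const}\) gives the closed form \(\hat{\theta}=\frac{2k_1+k_2}{2(k_1+k_2+k_3)}\). A minimal numerical sketch in the style of the earlier examples, assuming illustrative counts simulated with a true \(\theta=0.6\):

# Generate data: genotype counts (k1, k2, k3) for a sample with true theta = 0.6
set.seed(1)
theta_true <- 0.6
probs <- c(theta_true^2, 2 * theta_true * (1 - theta_true), (1 - theta_true)^2)
counts <- as.vector(rmultinom(1, size = 100, prob = probs))

# define negative log likelihood function
negative_log_likelihood <- function(theta, counts) {
  if (theta <= 0 || theta >= 1) return(Inf)  # Ensure valid probability range
  p <- c(theta^2, 2 * theta * (1 - theta), (1 - theta)^2)
  -dmultinom(counts, size = sum(counts), prob = p, log = TRUE)
}

# numerical fit using Brent's method (1-D bounded optimization)
mle_result <- optim(par = 0.5, fn = negative_log_likelihood, counts = counts, method = "Brent", lower = 0, upper = 1)

# compare with the closed form (2*k1 + k2) / (2*n)
(2 * counts[1] + counts[2]) / (2 * sum(counts))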

Boltzmann Distribution - TODO