Catalogue of Discrete Probability Distributions supported in R
The simplest and most important distribution is \(\boxed{\text{Ber}(p)}\) or \(\text{Bernoulli}(p)\). It has a binary outcome: yes/no, success/failure, heads/tails. It is modeled as a random variable \(X\sim\text{Ber}(p)\) with possible values \(0\) and \(1\), \(X\in\{0,1\}\), with probabilities \(P(X=1)=p\) and \(P(X=0)=1-p\).
A simple model for the Bernoulli distribution is to flip a coin with probability \(p\) of heads, with \(X = 1\) on heads and \(X = 0\) on tails. The general terminology is to say \(X\) is \(1\) on success and \(0\) on failure, with success and failure defined by the context. Many decisions can be modeled as a binary choice, such as votes for or against a proposal. If \(p\) is the proportion of the voting population that favors the proposal, then the vote of a random individual is modeled by a \(\text{Ber}(p)\).
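As a minimal sketch (the sample size of 100 and \(p=0.3\) below are arbitrary choices for illustration), a Bernoulli sample can be drawn with rbinom() using size = 1, and the MLE of \(p\) is simply the sample mean:
# Simulate 100 Bernoulli(0.3) coin flips; a Bernoulli trial is a binomial with size = 1
x <- rbinom(100, size = 1, prob = 0.3)
# MLE of p is the observed proportion of successes (heads)
p_hat <- mean(x)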
The binomial distribution \(\boxed{\text{Bin}(n,p)}\) or \(\text{Binomial}(n,p)\) models the number of successes in \(n\) independent \(\text{Bernoulli}(p)\) trials. A single Bernoulli trial is, say, one toss of a coin. A single binomial trial consists of \(n\) Bernoulli trials \(\{X_1,X_2,\dots,X_n\}\) with \(X_i\in\{0,1\}\), and the binomial count is \(X=\sum_i X_i\in\{0,1,\dots,n\}\).
library(data.table) # container used to hold the simulated observations
# Generate data
n_trials <- 10
n_observations <- 100
data <- data.table(y = rbinom(n_observations, size = n_trials, prob = 0.3), n = n_trials)
# define negative log likelihood function
negative_log_likelihood <- function(p, data) {
  if (p <= 0 || p >= 1) return(Inf) # Ensure valid probability range
  -sum(dbinom(data$y, size = data$n, prob = p, log = TRUE))
}
# numerical fit using Brent's one-dimensional bounded optimization
mle_result <- optim(par = 0.5, fn = negative_log_likelihood, data = data, method = "Brent", lower = 0, upper = 1)
print(paste("MLE estimate of p:", round(mle_result$par, 3)))
## [1] "MLE estimate of p: 0.304"
A geometric distribution \(\boxed{\text{Geo}(p)}\) or \(\text{Geometric}(p)\) models the number of failures observed before the first success in a sequence of Bernoulli\((p)\) trials (coin flips); this is the parameterization used by R's dgeom/rgeom, so here \(X\in\{0,1,2,3,\dots\}\). The derivation below uses the equivalent formulation that counts the trials up to and including the first success, \(k_i\in\{1,2,3,\dots\}\).
\[\begin{align} P(k_1\dots k_n|p)&=\prod_{i=1}^n(1-p)^{k_i-1}p\\ \ln P(k_1\dots k_n|p)&=\ln\left(\prod_{i=1}^n(1-p)^{k_i-1}p\right)\\ &=\sum_{i=1}^n\ln\left((1-p)^{k_i-1}p\right)\\ &=\sum_{i=1}^n\left[(k_i-1)\ln(1-p)+\ln p\right]\\ &=\sum_{i=1}^n\left[(k_i-1)\ln(1-p)\right]+n\ln p\\ \\ &\text{the derivative of the log likelihood is }0\text{ at the maximum:}\\ 0&=\frac{d}{d\hat{p}}\left(\sum_{i=1}^n\left[(k_i-1)\ln(1-\hat{p})\right]+n\ln\hat{p}\right)\\ 0&=-\sum_{i=1}^n\frac{k_i-1}{1-\hat{p}}+\frac{n}{\hat{p}}\\ \frac{n}{\hat{p}}&=\frac{1}{1-\hat{p}}\sum_{i=1}^n(k_i-1)\\ n(1-\hat{p})&=\hat{p}\left(\sum_{i=1}^nk_i-n\right)\\ n-n\hat{p}&=\hat{p}\sum_{i=1}^nk_i-n\hat{p}\\ \hat{p}&=\frac{n}{\sum_{i=1}^nk_i}\\ \end{align}\]
# Generate data
n_observations <- 100
data <- data.table(y = rgeom(n_observations, prob = 0.3))
# define negative log likelihood function
negative_log_likelihood <- function(p, data) {
  if (p <= 0 || p >= 1) return(Inf) # Ensure valid probability range
  -sum(dgeom(data$y, prob = p, log = TRUE))
}
# numerical fit using Brent's one-dimensional bounded optimization
mle_result <- optim(par = 0.5, fn = negative_log_likelihood, data = data, method = "Brent", lower = 0, upper = 1)
print(paste("MLE estimate of p:", round(mle_result$par, 3)))
## [1] "MLE estimate of p: 0.287"
The Poisson distribution \(\boxed{\text{Pois}(\lambda)}\) is the limit of the binomial distribution when the number of trials \(n\) becomes very large and the probability of success \(p\) in each trial becomes very small, while the expected number of successes \(\lambda\equiv\langle k\rangle=n\cdot p\) remains constant.
Start with the Binomial Distribution: The probability of \(k\) successes in \(n\) trials of a binomial distribution is: \[P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}\]
\(\lambda\equiv\langle k\rangle=n\cdot p\): Recall that the binomial distribution has mean \(E[X]=n\cdot p\). Let the expected number of successes be \(\lambda\). Therefore, \(p=\frac{\lambda}{n}\).
Substituting into the Binomial Formula: \[ \begin{align} P(X=k)&=\binom{n}{k}\left(\frac{\lambda}{n}\right)^k\left(1-\frac{\lambda}{n}\right)^{n-k}\\ &\;\text{where}\;\binom{n}{k}=\frac{n!}{k!(n-k)!}\\ P(X=k)&=\frac{n!}{k!(n-k)!}\left(\frac{\lambda}{n}\right)^k\left(1-\frac{\lambda}{n}\right)^{n-k}\\ \\&\text{consider terms as }n \to \infty:\\ &\qquad\lim_{n \to \infty} \frac{n!}{(n-k)!} \approx n^k\\ &\qquad\lim_{n \to \infty}\left(1-\frac{\lambda}{n}\right)^n\approx e^{-\lambda}\\ &\qquad\lim_{n \to \infty}\left(1-\frac{\lambda}{n}\right)^{-k}\approx 1\\\\ \therefore \qquad &\boxed{P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}}\\ \end{align} \]
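Following the pattern of the earlier examples, a minimal sketch of a numerical MLE fit for \(\lambda\) (the true rate of 4 below is an arbitrary choice; the analytic MLE is simply the sample mean, \(\hat{\lambda}=\bar{k}\)):
# Generate data
n_observations <- 100
data <- data.table(y = rpois(n_observations, lambda = 4))
# define negative log likelihood function
negative_log_likelihood <- function(lambda, data) {
  if (lambda <= 0) return(Inf) # Ensure valid rate
  -sum(dpois(data$y, lambda = lambda, log = TRUE))
}
# numerical fit using Brent's one-dimensional bounded optimization
mle_result <- optim(par = 1, fn = negative_log_likelihood, data = data, method = "Brent", lower = 0, upper = 100)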
The \(\boxed{Hypergeometric(N,K,n)}\) distribution describes the probability of \(k\) successes (random draws for which the object drawn has a specified feature) in \(n\) draws, without replacement, from a finite population of size \(N\) that contains exactly \(K\) objects with that feature.
e.g. You have 10 coins in total \(N=10\) where 6 of them are heads \(K=6\) and 4 of them are tails. You randomly select \(n=5\) coins (without replacement). The hypergeometric distribution can calculate the probability of selecting exactly \(k\) coins that show heads. Here \(X\in\{0,1,\dots,n\}\).
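As a sketch of this coin example, R's dhyper() gives the probability directly; the value \(k=3\) below is an arbitrary choice for illustration:
# P(exactly 3 heads) when drawing n = 5 coins from N = 10 containing K = 6 heads
# dhyper(x, m, n, k): m = successes in population, n = failures, k = number drawn
p_three_heads <- dhyper(3, m = 6, n = 4, k = 5)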
The capture/recapture method is a way to estimate the size of a population in the wild. The method assumes that each animal in the population is equally likely to be captured by a trap. Suppose \(K=10\) animals are captured, tagged and released. A few months later, \(n=20\) animals are captured, examined, and released. \(k=4\) of these 20 are found to be tagged. Estimate the size \(N=?\) of the wild population using the MLE for the probability that a wild animal is tagged.
\[ \begin{align} P(k,n,K|N)&=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\\ &=\frac{\binom{10}{4}\binom{N-10}{20-4}}{\binom{N}{20}}=\frac{\binom{10}{4}\binom{N-10}{16}}{\binom{N}{20}}\\ \end{align} \]
# define negative log likelihood function
negative_log_likelihood <- function(N) {
  if (N <= 20) return(Inf) # population must be at least as large as the second sample
  return(-log(choose(10, 4) * choose(N - 10, 16) / choose(N, 20)))
}
# numerical fit using Brent's one-dimensional bounded optimization
mle_result <- optim(par = 100, fn = negative_log_likelihood, method = "Brent", lower = 10, upper = 1000)
print(paste("MLE estimate of wild animal population:", mle_result$par))
## [1] "MLE estimate of wild animal population: 49.4950014654967"
The multinomial distribution \(\boxed{\text{Mult}(n,p_1\dots p_k)}\) or \(\mathcal{M}(n,\vec{p})\), where \(\sum_{i=1}^k p_i = 1\), models the probability of the counts for each side of a \(k\)-sided die rolled \(n\) times.
N <- 100 # Number of trials
p <- c(1/6, 3/6, 2/6) # Probability of each face
pEven <- dmultinom(c(33, 33, 34), size = N, prob = p) # roughly even counts
pExtreme1 <- dmultinom(c(100, 0, 0), size = N, prob = p) # all on the first face
pNo1 <- dmultinom(c(0, 50, 50), size = N, prob = p) # none on the first face
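For observed counts \(x_1,\dots,x_k\) the MLE of each cell probability is simply \(\hat{p}_i=x_i/n\); a minimal sketch using one simulated draw (for illustration only):
# Draw one multinomial sample of N rolls and estimate the face probabilities
counts <- rmultinom(1, size = N, prob = p)
p_hat <- counts / N # MLE: observed proportion of each face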
See the Hardy–Weinberg principle: https://en.wikipedia.org/wiki/Hardy%e2%80%93Weinberg_principle
Consider a population of monoecious diploids, where each organism produces male and female gametes at equal frequency, and has two alleles at each gene locus.
| Genotype | AA | Aa (or aA) | aa |
|---|---|---|---|
| Probability | \(\theta^2\) | \(2\theta(1-\theta)\) | \((1-\theta)^2\) |
Suppose we test a random sample of people and find that \(k_1\) are AA, \(k_2\) are Aa, and \(k_3\) are aa. Find the MLE of \(\theta\).
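The counts \((k_1,k_2,k_3)\) are multinomial with the probabilities in the table, so the log likelihood (up to a constant) is \((2k_1+k_2)\ln\theta+(k_2+2k_3)\ln(1-\theta)\), and setting its derivative to zero gives \(\hat{\theta}=\frac{2k_1+k_2}{2(k_1+k_2+k_3)}\). A minimal numerical check in the same style as the earlier examples, using made-up counts \(k_1=50\), \(k_2=30\), \(k_3=20\):
# MLE of theta from genotype counts (the counts below are hypothetical examples)
k <- c(AA = 50, Aa = 30, aa = 20)
negative_log_likelihood <- function(theta, k) {
  if (theta <= 0 || theta >= 1) return(Inf) # Ensure valid probability range
  probs <- c(theta^2, 2 * theta * (1 - theta), (1 - theta)^2)
  -dmultinom(k, size = sum(k), prob = probs, log = TRUE)
}
mle_result <- optim(par = 0.5, fn = negative_log_likelihood, k = k, method = "Brent", lower = 0, upper = 1)
# closed-form check: theta_hat = (2*k1 + k2) / (2 * (k1 + k2 + k3))
theta_hat <- (2 * k[1] + k[2]) / (2 * sum(k))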