Catalogue of Continuous Probability Distributions supported in R
The normal or Gaussian distribution \(\boxed{N(\mu,\sigma)}\) is a fundamental distribution, central to the central limit theorem and to the method of least squares.
The standard normal distribution \(N(0,1)\) has mean \(\mu=0\) and variance \(\sigma^2=1\). The symbol \(Z\) is reserved for a standard normal random variable, with PDF \(f(z)=\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}\)
The cumulative distribution function (CDF) is obtained by integrating the PDF up to \(x\): \[F(x)=\int_{-\infty}^x \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{\left(t-\mu\right)^2}{2\sigma^2}}\:dt\] Since this integral does not have a closed-form solution, the CDF is commonly expressed using the error function \(\text{erf}(x)\), defined as: \[\text{erf}(x)\equiv \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\:dt\] Thus, the CDF of the normal distribution can be written as: \[F(x)=\frac{1}{2}\left[1+\text{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]\]
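As a quick numerical check (a minimal sketch; the values of \(\mu\), \(\sigma\) and \(x\) below are arbitrary), the CDF returned by `pnorm()` should match direct numerical integration of the PDF and the error-function form:

mu <- 1; sigma <- 2; x <- 1.5   # arbitrary example values
# integrate the normal PDF from -Inf to x
integrate(dnorm, lower = -Inf, upper = x, mean = mu, sd = sigma)$value
# built-in CDF
pnorm(x, mean = mu, sd = sigma)
# CDF via the error function, using the identity erf(z) = 2*pnorm(z*sqrt(2)) - 1
erf <- function(z) 2 * pnorm(z * sqrt(2)) - 1
0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))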
Maximum likelihood estimates of \(\mu\) and \(\sigma\) follow from the likelihood of an i.i.d. sample \(x_1\dots x_n\):

\[\begin{align} P(x_1\dots x_n|\mu,\sigma)&=\prod_{i=1}^n\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{\left(x_i-\mu\right)^2}{2\sigma^2}}\\ &=\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^ne^{-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\sigma^2}}\\ \ln(P(x_1\dots x_n|\mu,\sigma))&=\ln\left(\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^ne^{-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\sigma^2}}\right)\\ &=n\ln\left(\frac{1}{\sigma\sqrt{2\pi}}\right)-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\sigma^2}\\ &=n\ln\left(\frac{1}{\sqrt{2\pi}}\right)+n\ln\left(\frac{1}{\sigma}\right)-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\sigma^2}\\ \\ &\text{the derivative of the log-likelihood function is }0\text{ at the maximum of }P\\ \\ 0&=\frac{d}{d\sigma}\left(n\ln\left(\frac{1}{\sqrt{2\pi}}\right)+n\ln\left(\frac{1}{\sigma}\right)-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\sigma^2}\right)\\ &=-\frac{n}{\sigma}+\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{\sigma^3}\qquad\therefore\frac{n}{\sigma}=\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{\sigma^3}\qquad\therefore\quad\boxed{\sigma^2=\frac{\sum_{i=1}^n(x_i-\mu)^2}{n}}\\ \\ 0&=\frac{d}{d\mu}\left(n\ln\left(\frac{1}{\sqrt{2\pi}}\right)+n\ln\left(\frac{1}{\sigma}\right)-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\sigma^2}\right)\\ &=\sum_{i=1}^n\frac{x_i-\mu}{\sigma^2}\qquad\therefore\boxed{\mu=\frac{\sum_{i=1}^nx_i}{n}}\\ \end{align}\]
\[ \mu=\frac{\sum_{i=1}^nx_i}{n}=\bar{x}\quad\text{and}\quad\sigma^2=\frac{\sum_{i=1}^n(x_i-\mu)^2}{n} \]

Substituting \(\bar{x}\) for \(\mu\) gives the biased estimator of the population variance (it divides by \(n\) rather than \(n-1\)):

\[ \hat\sigma^2=\sum_{i=1}^n\frac{(x_i-\bar{x})^2}{n} \]

library(data.table)
# Generate data
data <- data.table(y = rnorm(1000, mean = 0, sd = 3))
# define negative log likelihood function
negative_log_likelihood <- function(params, data) {
  mu <- params[1]
  sigma <- params[2]
  if (sigma <= 0) return(Inf)  # sigma must be positive
  -sum(dnorm(data$y, mean = mu, sd = sigma, log = TRUE))
}
# numerical fit using conjugate gradient method
mle_result <- optim(par = c(1, 1), fn = negative_log_likelihood, data = data, method = "CG")
print(paste("MLE estimate of mean:", mle_result$par[1], "and MLE estimate of std.:", mle_result$par[2]))
## [1] "MLE estimate of mean: -0.167097319190886 and MLE estimate of std.: 2.8948269587307"
The uniform distribution \(\boxed{U(a,b)}\) or \(\boxed{\text{uniform}(a,b)}\). \(U(0,1)\) is a useful prior, equivalent to \(\text{Beta}(a=1,b=1)\), used for Bayesian inference when no other prior is known:
normalisation of a constant value over the \([a,b]\) interval: \[ \begin{align} 1&=\int_a^b A\:dx \\ &=\left[Ax\right]_a^b=A\left(b-a\right) \\\\ \therefore\quad &A=\frac{1}{b-a} \end{align} \]
\[ \begin{align} F(x)&=\int_a^x\frac{1}{b-a}\:dt \\ &=\left[\frac{t}{b-a}\right]_a^x=\frac{x-a}{b-a} \end{align} \]
\[ f(x) = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases} \]

The likelihood is maximised when the width of the normalised boxcar is minimised, while every sample witnessed must still have non-zero probability of occurring, so:

\[ \boxed{\hat{b}=\text{max}(x_1\dots x_n)}\qquad\boxed{\hat{a}=\text{min}(x_1\dots x_n)} \]
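A quick check of these estimators on simulated data (a minimal sketch; the true bounds \(a=2\), \(b=7\) are arbitrary):

# simulate from U(2, 7) and recover the bounds via the MLE
x <- runif(1000, min = 2, max = 7)
a_hat <- min(x)
b_hat <- max(x)
print(paste("MLE estimate of a:", round(a_hat, 3), "and b:", round(b_hat, 3)))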
The probability density function (PDF) of the beta distribution \(\boxed{\text{Beta}(a,b)}\), for \(0\le \theta\le 1\) and shape parameters \(a,b>0\), is a power function of the variable \(\theta\) and its reflection \(1-\theta\): \[f(\theta;a,b)=\frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}\] It is the conjugate prior for several likelihoods, including the Bernoulli, binomial, geometric and negative binomial.
The Gamma Function \(\Gamma(z)\) where \(z\in\mathbb{C}\) and \(\text{Re}(z)>0\) is defined as: \[\boxed{\Gamma(z)=\int_0^\infty t^{z-1}e^{-t}\:dt}\] \[ \begin{align} \Gamma(z+1)&=\int_0^\infty t^ze^{-t}\:dt\qquad\dots\text{from the definition with }z\to z+1\\ &=\left[-t^ze^{-t}\right]^\infty_0+\int_0^\infty zt^{z-1}e^{-t}\:dt\qquad\dots\text{from integration by parts}\\ &=\displaystyle \lim_{t \to \infty}\left[-t^ze^{-t}\right]+z\int_0^\infty t^{z-1}e^{-t}\:dt\\ &=0+z\Gamma(z)\\ &\boxed{\Gamma(z+1)=z\Gamma(z)} \end{align} \] \[ \begin{align} \Gamma(1)&=\int_0^\infty t^{1-1}e^{-t}\:dt\\ &=\int_0^\infty e^{-t}\:dt=1\qquad\boxed{\Gamma(1)=1} \end{align} \] Considering these two results (remember \(0!=1\)) we can see by induction: \[\boxed{\boxed{\Gamma(n)=(n-1)!}}\]
The Beta Function \(B(z_1,z_2)\) where \(z_1,z_2\in\mathbb{C}\) and \(\text{Re}(z_1),\text{Re}(z_2)>0\) is defined as: \[\boxed{B(z_1,z_2)=\int_0^1 t^{z_1-1}(1-t)^{z_2-1}\:dt}\] this is related to the \(\Gamma\) function as follows \[ \begin{align} \Gamma(z_1)\Gamma(z_2)&=\int_{u=0}^\infty u^{z_1-1}e^{-u}\:du\:\cdot\:\int_{v=0}^\infty v^{z_2-1}e^{-v}\:dv\\ &=\int_{v=0}^\infty\int_{u=0}^\infty u^{z_1-1}v^{z_2-1}e^{-u-v}\:du\:dv\\ \end{align} \] Introduce a change of variables:
With the following change of variables we can split the double integral into two independent single integrals:

- \(u=st\) so \(du=s\:dt+t\:ds\)
- \(v=s(1-t)\) so \(dv=(1-t)\:ds-s\:dt\)
The transformation matrix is: \[ J=\begin{bmatrix}\frac{\partial u}{\partial s} & \frac{\partial u}{\partial t} \\ \frac{\partial v}{\partial s} & \frac{\partial v}{\partial t}\end{bmatrix}=\begin{bmatrix} t & s \\ 1-t & -s\end{bmatrix} \] The determinant of this matrix is: \[ \text{det}(J)=-ts-s(1-t)=-s \] Since we are dealing with volume elements, we take the absolute value: \[|J|=s\] Therefore: \[ du\:dv=s\:ds\:dt \]
\[ \begin{align} \Gamma(z_1)\Gamma(z_2)&=\int_{v=0}^\infty\int_{u=0}^\infty u^{z_1-1}v^{z_2-1}e^{-u-v}\:du\:dv\\ &=\int_{s=0}^\infty\int_{t=0}^1 (st)^{z_1-1}(s(1-t))^{z_2-1}e^{-s}s\:dt\:ds\\ &=\int_{s=0}^\infty e^{-s}s^{z_1+z_2-1}\:ds \cdot\int_{t=0}^1 t^{z_1-1}(1-t)^{z_2-1}\:dt\\ &=\Gamma(z_1+z_2)\:\cdot\:B(z_1,z_2) \end{align} \] So finally, \[\Gamma(z_1)\Gamma(z_2)=\Gamma(z_1+z_2)\:\cdot\:B(z_1,z_2)\\ \therefore\quad\boxed{\boxed{B(z_1,z_2)=\frac{\Gamma(z_1)\Gamma(z_2)}{\Gamma(z_1+z_2)}}}\quad\text{where}\quad\boxed{B(z_1,z_2)=\int_0^1 t^{z_1-1}(1-t)^{z_2-1}\:dt}\]
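This identity is easy to verify numerically, since `beta()` and `gamma()` are built into base R (a minimal sketch; the values of \(z_1\) and \(z_2\) are arbitrary):

z1 <- 3.5; z2 <- 2.2                    # arbitrary example values
beta(z1, z2)                            # built-in Beta function
gamma(z1) * gamma(z2) / gamma(z1 + z2)  # Gamma-function identity
gamma(5)                                # Gamma(n) = (n-1)!
factorial(4)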
The normalisation factor of the Beta distribution is the Beta function: \[B(a,b)=\int_0^1\theta^{(a-1)}(1-\theta)^{(b-1)}\:d\theta\] We will relate this to the Gamma function: \[\Gamma(z)=\int_0^{\infty}t^{z-1}e^{-t}\:dt\] using the substitution: \[\theta=\frac{u}{1+u}\quad\text{and}\quad 1-\theta=\frac{1}{1+u}\quad\text{so that}\quad d\theta=\frac{du}{(1+u)^2}\] since \[\frac{d\theta}{du}=\frac{d}{du}\frac{u}{1+u}=\frac{1\cdot(1+u)-u\cdot 1}{(1+u)^2}=\frac{1}{(1+u)^2}\] As \(\theta\) runs from \(0\) to \(1\), \(u\) runs from \(0\) to \(\infty\). Substituting into \(B(a,b)\): \[ \begin{align} B(a,b)&=\int_0^\infty\left(\frac{u}{1+u}\right)^{(a-1)}\left(\frac{1}{1+u}\right)^{(b-1)}\:\frac{du}{(1+u)^2}\\ &=\int_0^\infty u^{a-1}(1+u)^{-a+1-b+1-2}\:du\\ &=\int_0^\infty u^{a-1}(1+u)^{-(a+b)}\:du \end{align} \]
This is similar to a \(\text{Beta}\) function but with an upper limit of \(\infty\) instead of \(1\). Change variables using the standard trick \(t=\frac{u}{1+u}\). \(\therefore\quad\frac{t}{1-t}=\frac{\frac{u}{1+u}}{\frac{1+u}{1+u}-\frac{u}{1+u}}=\frac{\frac{u}{1+u}}{\frac{1}{1+u}}=u\)
Now from the quotient rule \(dt=\frac{(1+u)\:du-u\:du}{(1+u)^2}=\frac{du}{(1+u)^2}\) so \(du=(1+u)^2dt=\frac{dt}{(1-t)^2}\)
So our integral becomes: \[ \begin{align} B(a,b)&=\int_0^\infty u^{a-1}(1+u)^{-(a+b)}\:du\\ &=\int_0^1 \left(\frac{t}{1-t}\right)^{a-1}(1-t)^{(a+b)}\:\frac{dt}{(1-t)^2}\quad\text{since}\:1+u=\frac{1-t}{1-t}+\frac{t}{1-t}=\frac{1}{1-t}\\ &=\int_0^1 t^{a-1}(1-t)^{-a+1+a+b-2}\:dt\\ &=\int_0^1 t^{a-1}(1-t)^{b-1}\:dt\\ \end{align} \] This is exactly the \(\text{Beta}\) function integral, so:
\[\boxed{\boxed{B(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}}}\]
The CDF of the Beta distribution is the regularised incomplete Beta function \(F(x;a,b)=I_x(a,b)\) \[ \begin{align} F(x;a,b)&=\int_0^x\frac{t^{a-1}(1-t)^{b-1}}{B(a,b)}\:dt\\&=\frac{B_x(a,b)}{B(a,b)}\\ &=I_x(a,b)\\ &\text{where}\quad I_x(a,b)=\frac{\int_0^xt^{a-1}(1-t)^{b-1}dt}{\int_0^1t^{a-1}(1-t)^{b-1}dt} \end{align} \]
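In R the Beta CDF is available directly as `pbeta()`. A minimal sketch (the values of \(a\), \(b\) and \(x\) are arbitrary) comparing it with the regularised incomplete Beta function computed by numerical integration:

a <- 2; b <- 5; x <- 0.3   # arbitrary example values
# numerator and denominator of the regularised incomplete Beta function
B_x <- integrate(function(t) t^(a - 1) * (1 - t)^(b - 1), lower = 0, upper = x)$value
B   <- integrate(function(t) t^(a - 1) * (1 - t)^(b - 1), lower = 0, upper = 1)$value
B_x / B          # I_x(a, b)
pbeta(x, a, b)   # built-in CDF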
The exponential distribution \(\boxed{\text{exp}(\lambda)}\) or \(\boxed{\text{exponential}(\lambda)}\) is the continuous counterpart to the discrete Geometric probability distribution.
Where the exponential distribution models the time (continuous RV) for a random event to occur, the Poisson models the number of events (discrete RV) that can occur in a given time window. (The hypergeometric distribution, by contrast, measures the number of successes (discrete RV) when sampling without replacement - a quite different nature of events.)
The exponential probability distribution takes the form \(Ae^{-\lambda x}\), where \(A\) is the normalisation constant. We can calculate the normalisation constant as follows: \[ \begin{align} 1&=\int_0^\infty Ae^{-\lambda x}\:dx \\ &=A\left[\frac{e^{-\lambda x}}{-\lambda}\right]_0^\infty\\ &=\frac{A}{\lambda}\qquad\qquad\therefore\;A=\lambda \end{align} \]
The cumulative distribution function (CDF) is obtained by integrating the PDF:
\[ \begin{align} P(X\leq x)&=\int_0^x\:\lambda e^{-\lambda t}\:dt \\ &=\lambda\left[\frac{e^{-\lambda t}}{-\lambda}\right]_0^x = -e^{-\lambda x}+1 \\ &=1-e^{-\lambda x} \end{align} \]
The cumulative right-tail (survival) distribution: \[P(X>x)=1-F(x)=e^{-\lambda x}\]
The exponential distribution has infinite support with range \([0,\infty)\): \[ \begin{align} E(x)&=\int_{-\infty}^\infty xf(x)\:dx\\ &=\int^\infty_0\lambda xe^{-\lambda x}\:dx\\ \\ &\boxed{(uv)' = u'v + u v'\quad\therefore\quad\int u\cdot v'=u\cdot v-\int u'\cdot v}\\ &\text{let } u=x\quad\text{and }v'=\lambda e^{-\lambda x} \\ &\therefore u'=1\quad\text{and }v= -e^{-\lambda x}\\ \\ E(x)&=\left[-xe^{-\lambda x}\right]_0^\infty -\int_0^\infty -e^{-\lambda x}\:dx\\ &=\left[0\right]+\left[\frac{e^{-\lambda x}}{-\lambda}\right]_0^\infty\\ &=\frac{1}{\lambda} \end{align} \]
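A quick simulation check of \(E(X)=1/\lambda\) (a minimal sketch; \(\lambda=0.5\) is arbitrary):

lambda <- 0.5                   # arbitrary rate
mean(rexp(1e5, rate = lambda))  # should be close to 1/lambda = 2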
\[ \begin{align} P(x_1\dots x_n|\lambda)&=\prod_{i=1}^n\lambda e^{-\lambda x_i}\\ &=\lambda^ne^{-\lambda\sum_{i=1}^n x_i}\\ \\ \ln(P(x_1\dots x_n|\lambda))&=\ln\left(\lambda^ne^{-\lambda\sum_{i=1}^n x_i}\right)\\ &=n\ln(\lambda)-\lambda\sum_{i=1}^n x_i\\ \\ &\text{the derivative of the log-likelihood function is }0\text{ at the maximum of }P\\ 0&=\frac{d}{d\hat{\lambda}}\left(n\ln(\hat{\lambda})-\hat{\lambda}\sum_{i=1}^n x_i\right)\\ &=\frac{n}{\hat{\lambda}}-\sum_{i=1}^n x_i\\ \\ \therefore\quad&\hat{\lambda}=\frac{n}{\sum_{i=1}^n x_i} \end{align} \]
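The sample below was drawn from an exponential distribution; the generating chunk is not shown, but it would look something like the following sketch (the rate 0.3 and the sample size are assumptions, chosen only to roughly match the fitted value further down):

data <- data.table(y = rexp(1000, rate = 0.3))  # rate = 0.3 is an assumption, not the original value
head(data)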
## y
## <num>
## 1: 2.8115242
## 2: 1.9220342
## 3: 4.4301829
## 4: 0.1052579
## 5: 0.1873699
## 6: 1.0550041
neg_log_likelihood <- function(lambda, data) {
  if (lambda <= 0) return(Inf)  # ensure a valid (positive) rate parameter
  # negative log-likelihood
  -sum(dexp(data$y, rate = lambda, log = TRUE))
}
mle_result <- optim(par = 0.5, fn = neg_log_likelihood, data = data,
                    method = "Brent", lower = 0, upper = 1)
mle_p <- mle_result$par
print(paste("MLE estimate of lambda:", round(mle_p, 3)))
## [1] "MLE estimate of lambda: 0.287"
The Pareto distribution \(\boxed{\text{Pareto}(m,\alpha)}\) models scale invariance, meaning if wealth (or any resource) grows proportionally, the relative distribution remains unchanged. E.g. Wealth accumulation often follows a preferential attachment mechanism, where individuals with more wealth have better opportunities to invest and grow their wealth at a faster rate, leading to a widening wealth gap.
These commands require the Vector Generalized Additive Models package

library(VGAM)

which uses the Type II Pareto distribution. This form introduces an extra scale parameter and shifts the distribution: \[P(X)=\frac{\alpha\theta^\alpha}{\left(x+\theta\right)^{\alpha+1}}\quad\text{where}\quad x\ge0\] so to recover the standard form of the Pareto distribution, \(\frac{\alpha m^\alpha}{x^{\alpha+1}}\) for \(x\geq m\), we substitute \(x\to x-\theta\) and identify \(m=\theta\),
where:

- \(\alpha>0\) is the shape parameter (also called the Pareto index), which determines the heaviness of the tail. A lower \(\alpha\) implies greater inequality.
- \(m>0\) is the scale parameter, representing the minimum value of \(x\)
- \(x\) is the random variable value
We can calculate the pdf from the left tail (derived below)…
\[ \begin{align} &P(X<x)=1-\left(\frac{m}{x}\right)^\alpha \\ \\ &\text{pdf}(x)=\frac{d\:}{dx}\:P(X<x)\\ &\quad=-m^\alpha (-\alpha) x^{-\alpha-1}\\ \\ \therefore&\quad\boxed{\text{pdf}(x)=\frac{\alpha m^\alpha}{x^{\alpha+1}}} \end{align} \]
The Pareto distribution is an inverse power law: the probability of the random variable \(X\) being greater than a value \(x\) is proportional to an inverse power of \(x\) \[ \begin{align} P(X\geq x)&\propto x^{-\alpha}\qquad\qquad\text{right tail inverse power law}\\ \\ P(X\geq x)&=Ax^{-\alpha}\qquad\qquad\text{normalisation factor, }A\\ \end{align} \] Normalise to find \(A\), using the fact that \(m\) is the minimum possible value, so \(P(X\geq m)=1\): \[ \begin{align} 1&=P(X\geq m)=Am^{-\alpha} \\ \therefore&\quad A=\frac{1}{m^{-\alpha}}=m^\alpha\\ \\ \therefore&\quad P(X\geq x)=\left(\frac{m}{x}\right)^\alpha \end{align} \]
Now the left tail…
\[ \begin{align} P(X<x)&=1-P(X\geq x)\\ &=1-\left(\frac{m}{x}\right)^\alpha \\ \end{align} \]
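A quick simulation check of the right-tail formula using inverse-transform sampling (a minimal sketch; \(m\) and \(\alpha\) are arbitrary, and base R is used here so no extra parameterisation from VGAM is involved):

m <- 1; alpha <- 2   # arbitrary scale and shape
# inverse-transform sampling: if U ~ U(0,1) then m / U^(1/alpha) ~ Pareto(m, alpha)
x <- m / runif(1e5)^(1 / alpha)
mean(x >= 3)      # empirical right tail P(X >= 3)
(m / 3)^alpha     # theoretical (m/x)^alpha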
A generalisation of the exponential distribution - the gamma distribution is used to model the time until an event occurs \(k\) times.
Not to be confused with the related gamma function, the Gamma distribution \(\boxed{\Gamma(\alpha,\theta)}\) has shape parameter \(\alpha\) and scale parameter \(\theta\) (or alternatively a rate parameter \(\lambda\)). It serves as a conjugate prior to the Poisson, exponential and normal (with unknown precision) distributions and it is also a generalisation of the following:
Distribution | \(X\) | Constraint |
---|---|---|
the Exponential distribution | \(X\sim \text{exp}(\lambda)\) | \(\alpha=1\) |
the Erlang distribution | \(X\sim\text{Erlang}(k,\lambda)\) | \(\alpha=k\) where \(k\in\mathbb{N}\) and \(\beta=\lambda\) where \(\beta>0\) |
the Chi-squared distribution | \(X\sim\chi^2_k\) | \(\alpha=\frac{k}{2}\) and \(\lambda=\frac{1}{2}\) |
Let \(S_\alpha\) be the sum of \(\alpha\) independent exponential random variables (the PDF of \(S_\alpha\) is the Gamma distribution): \[S_\alpha=X_1+X_2+\dots+X_\alpha\] The PDF of the sum of two independent random variables is the convolution of their individual PDFs: \[f_{S_2}(s)=\int_0^s f_X(x)f_X(s-x)\:dx\] For the sum of \(\alpha\) independent exponential random variables, we repeat this convolution of the exponential PDFs.
So for two exponentials \[ \begin{align} f_{S_2}(s)&=\int_0^s \lambda e^{-\lambda x}\cdot\lambda e^{-\lambda (s-x)}\:dx \\ &=\lambda^2e^{-\lambda s}\int_0^s\:dx \\ &=\lambda^2e^{-\lambda s}\cdot s \end{align}\] This is the PDF for the gamma distribution with shape parameter \(2\) and rate \(\lambda\).
For the general case of \(\alpha\) independent exponential random variables, the final result from repeatedly performing the convolution is:
\[f_{S_\alpha}(s)=\frac{\lambda^\alpha s^{\alpha-1}e^{-\lambda s}}{\left(\alpha-1\right)!}=\frac{\lambda^\alpha s^{\alpha-1}e^{-\lambda s}}{\Gamma(\alpha)}\] where writing \((\alpha-1)!=\Gamma(\alpha)\) extends the PDF to non-integer shape parameters.
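A simulation sketch of this result: summing \(\alpha\) independent exponentials and comparing the sample moments with those of the corresponding gamma distribution (the shape \(\alpha=3\) and rate \(\lambda=2\) are arbitrary):

alpha <- 3; lambda <- 2; n <- 1e5   # arbitrary shape, rate and sample size
# sum alpha independent exponential(lambda) variables, n times
s <- rowSums(matrix(rexp(n * alpha, rate = lambda), ncol = alpha))
c(mean(s), var(s))                  # empirical moments of the sum
c(alpha / lambda, alpha / lambda^2) # gamma(shape = alpha, rate = lambda) moments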
The Chi-squared distribution \(\boxed{\chi^2_k}\) is the distribution of the random variable equal to the sum of the squares of \(k\) independent random variables, each having a standard normal distribution \[\chi^2_k=\sum_{i=1}^k\:Z_i^2\quad\text{where}\quad Z_i\sim N(0,1)\:\text{are i.i.d.}\]
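A quick simulation check of this definition (a minimal sketch with \(k=4\) chosen arbitrarily):

k <- 4; n <- 1e5
# sum of squares of k independent standard normals
chisq_sample <- rowSums(matrix(rnorm(n * k), ncol = k)^2)
c(mean(chisq_sample), var(chisq_sample))  # empirical mean and variance
c(k, 2 * k)                               # theoretical mean k and variance 2k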
Test if data follows a specific distribution. The \(\chi^2\) test compares observed data against the expected data under a given hypothesis (the \(H_0\) distribution). Given \(n\) categories, each category has an observed value \(O_i\) and an expected value \(E_i\) (the rule of thumb is that each expected count should be at least \(5\) for the CLT to apply - the higher the number of samples the better). The error between each observed value and the corresponding expected value should be Gaussian. Therefore, the sum of the standardised squared differences (the chi-squared statistic) follows a \(\chi^2\) distribution with \(n-1-p\) d.o.f., where \(p\) is the number of estimated parameters (if any).
\[\chi^2=\sum^n_{i=1}\frac{(O_i-E_i)^2}{E_i}\] We can reject the null hypothesis that the distribution is a match if the calculated \(\chi^2\) statistic falls into the rejection region determined by the critical value from the chi-squared distribution at significance level \(\alpha\). If \(\chi^2\) is greater than the critical value, we reject \(H_0\), meaning the data does not follow the assumed distribution.
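A worked sketch of the goodness-of-fit test in R, checking whether simulated die rolls are consistent with a fair die (the counts here are simulated, not real data):

observed <- as.vector(table(sample(1:6, 600, replace = TRUE)))  # simulated die rolls
expected <- rep(600 / 6, 6)                                     # fair-die expectation
chi_sq <- sum((observed - expected)^2 / expected)               # chi-squared statistic
critical <- qchisq(0.95, df = 6 - 1)                            # critical value at alpha = 0.05
chi_sq > critical                       # TRUE would mean we reject H0
chisq.test(observed, p = rep(1/6, 6))   # equivalent built-in test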
The Student's t-distribution \(\boxed{t_\nu}\) is a single-parameter (d.o.f. \(\nu\)) continuous distribution. It is a function of the standard normal and chi-squared distributions \[t_\nu\equiv\frac{Z}{\sqrt{V/\nu}}\quad\text{where}\quad Z\sim N(0,1)\quad\text{and}\quad V\sim{\chi^2}_\nu\]
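A simulation sketch of this construction, comparing a sample generated as \(Z/\sqrt{V/\nu}\) with R's built-in t-distribution quantiles (\(\nu=5\) is arbitrary):

nu <- 5; n <- 1e5
z <- rnorm(n)                      # standard normal draws
v <- rchisq(n, df = nu)            # chi-squared draws
t_manual <- z / sqrt(v / nu)       # constructed t-distributed sample
quantile(t_manual, c(0.025, 0.5, 0.975))
qt(c(0.025, 0.5, 0.975), df = nu)  # theoretical t quantiles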