Normal (Gaussian) Distribution

The normal or Gaussian distribution \(\boxed{N(\mu,\sigma)}\) is a fundamental continuous distribution, central to the central limit theorem and the method of least squares.

The standard normal distribution \(N(0,1)\) has mean \(\mu=0\) and variance \(\sigma^2=1\). The symbol \(Z\) is reserved for a standard normal random variable, with density \(f(z)=\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}\).

  • dnorm(x, mean = 0, sd = 1, log = FALSE) gives the density (PDF)
  • pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) gives the distribution function (CDF)
  • qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) gives the quantile function (Inverse CDF)
  • rnorm(n, mean = 0, sd = 1) generates random deviates (pseudo-random generator)
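A minimal sketch of how these four functions relate (the value of x is chosen arbitrarily): the quantile function inverts the CDF, and random deviates reproduce the CDF empirically.

x <- 1.96
dnorm(x)               # density at x, approximately 0.0584
pnorm(x)               # P(Z <= 1.96), approximately 0.975
qnorm(pnorm(x))        # the quantile function inverts the CDF, returns 1.96
mean(rnorm(1e5) <= x)  # Monte Carlo estimate of pnorm(x)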

probability distribution: \[\boxed{f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{\left(x-\mu\right)^2}{2\sigma^2}}}\]
cumulative distribution function: \[F(x)=\frac{1}{2}\left[1+\text{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]\]

The cumulative distribution function (CDF) is obtained by integrating the PDF from \(-\infty\) to \(x\): \[F(x)=\int_{-\infty}^x \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{\left(t-\mu\right)^2}{2\sigma^2}}\:dt\] Since this integral has no closed-form solution, the CDF is commonly expressed using the error function \(\text{erf}(x)\), defined as: \[\text{erf}(x)\equiv \frac{2}{\sqrt{\pi}}\int_0^xe^{-t^2}\:dt\] Thus, the CDF of the normal distribution can be written as: \[F(x)=\frac{1}{2}\left[1+\text{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]\]
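As a quick numerical check of the erf form (a sketch; mu, sigma and x are chosen arbitrarily, and since base R has no erf it is computed here by direct integration):

erf <- function(x) 2 / sqrt(pi) * integrate(function(t) exp(-t^2), 0, x)$value

mu <- 1; sigma <- 2; x <- 2.5
F_erf  <- 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))  # CDF via the error function
F_norm <- pnorm(x, mean = mu, sd = sigma)                # CDF via pnorm
all.equal(F_erf, F_norm)                                 # TRUE up to numerical tolerance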

Maximum Likelihood Estimate: \[\boxed{\sigma^2=\sum_{i=1}^n\frac{(x_i-\bar{x})^2}{n}}\]

\[\begin{align} P(x_1\dots x_n|\mu,\sigma)&=\prod_{i=1}^n\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{\left(x_i-\mu\right)^2}{2\sigma^2}}\\ &=\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^ne^{-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\sigma^2}}\\ \ln(P(x_1\dots x_n|\mu,\sigma))&=\ln\left(\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^ne^{-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\sigma^2}}\right)\\ &=n\ln\left(\frac{1}{\sigma\sqrt{2\pi}}\right)-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\sigma^2}\\ &=n\ln\left(\frac{1}{\sqrt{2\pi}}\right)+n\ln\left(\frac{1}{\sigma}\right)-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\sigma^2}\\ \\ &\text{derivative of the log likelihood function is }0\text{ at maximum }P\\ \\ 0&=\frac{d}{d\hat{\sigma}}\left(n\ln\left(\frac{1}{\sqrt{2\pi}}\right)+n\ln\left(\frac{1}{\hat{\sigma}}\right)-\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{2\hat{\sigma}^2}\right)\\ &=-\frac{n}{\hat{\sigma}}+\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{\hat{\sigma}^3}\qquad\therefore\frac{n}{\hat{\sigma}}=\sum_{i=1}^n\frac{\left(x_i-\mu\right)^2}{\hat{\sigma}^3}\qquad\therefore\quad\boxed{\hat{\sigma}^2=\frac{\sum_{i=1}^n(x_i-\mu)^2}{n}}\\ \\ 0&=\frac{d}{d\hat{\mu}}\left(n\ln\left(\frac{1}{\sqrt{2\pi}}\right)+n\ln\left(\frac{1}{\sigma}\right)-\sum_{i=1}^n\frac{\left(x_i-\hat{\mu}\right)^2}{2\sigma^2}\right)\\ &=\sum_{i=1}^n\frac{x_i-\hat{\mu}}{\sigma^2}\qquad\therefore\boxed{\hat{\mu}=\frac{\sum_{i=1}^nx_i}{n}}\\ \end{align}\]

\[ \hat{\mu}=\frac{\sum_{i=1}^nx_i}{n}=\bar{x}\quad\text{and}\quad\hat{\sigma}^2=\frac{\sum_{i=1}^n(x_i-\mu)^2}{n}\\ \quad\\ \text{substituting }\bar{x}\text{ for }\mu\text{ gives the biased estimator of the population variance (divide by }n-1\text{ for the unbiased estimator)}\\ \hat{\sigma}^2=\sum_{i=1}^n\frac{(x_i-\bar{x})^2}{n} \]
Maximum Likelihood Estimate (Numerical Example)
# Generate data
library(data.table)
data <- data.table(y = rnorm(1000, mean = 0, sd = 3))

# define negative log likelihood function
negative_log_likelihood <- function(params, data) {
  mu <- params[1]
  sigma <- params[2]
  if (sigma <= 0) return(Inf)
  -sum(dnorm(data$y, mean = mu, sd = sigma, log = TRUE))
}

# numerical fit using conjugate gradient method
mle_result <- optim(par = c(1,1), fn = negative_log_likelihood, data = data, method = "CG")

print(paste("MLE estimate of mean:", mle_result$par[1], " and MLE estimate of std.:", mle_result$par[2]))
## [1] "MLE estimate of mean: -0.167097319190886  and MLE estimate of std.: 2.8948269587307"

Uniform Distribution

The uniform distribution \(\boxed{U(a,b)}\) or \(\boxed{\text{uniform}(a,b)}\). It is a useful prior, in the form \(\text{Beta}(a=1,b=1)\), for Bayesian inference when no other prior is known:

  • dunif(x, min = 0, max = 1, log = FALSE) gives the density (PDF)
  • punif(q, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE) gives the distribution function (CDF)
  • qunif(p, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE) gives the quantile function (Inverse CDF)
  • runif(n, min = 0, max = 1) generates random deviates (pseudo-random generator)
measure to the nearest marking - the rounding error is uniformly distributed over the interval bounded by the midpoints between adjacent markings

probability distribution: \[\boxed{f(x) = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases} }\]

normalisation of a constant value over the \([a,b]\) interval: \[ \begin{align} 1&=\int_a^b A\:dx \\ &=\left[Ax\right]_a^b=A\left(b-a\right) \\\\ \therefore\quad &A=\frac{1}{b-a} \end{align} \]

cumulative distribution function (and right tail distribution): \[ \boxed{P(X\leq x)=\begin{cases} \frac{x-a}{b-a}, & a \leq x \leq b \end{cases}}\\\\ P(X> x)=\begin{cases} \frac{b-x}{b-a}, & a \leq x \leq b \end{cases} \]

\[ \begin{align} F(x)&=\int_a^x\frac{1}{b-a}\:dt \\ &=\left[\frac{t}{b-a}\right]_a^x=\frac{x-a}{b-a} \end{align} \]

Maximum Likelihood Estimate: \[\boxed{\hat{b}=\text{max}(x_1\dots x_n)}\qquad\boxed{\hat{a}=\text{min}(x_1\dots x_n)}\]

\[ P(X) = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases}\\ \\ \text{the likelihood is maximised when the width of the normalised boxcar is minimised}\\ \text{while every observed sample must still have non-zero probability of occurring}\\ \boxed{\hat{b}=\text{max}(x_1\dots x_n)}\qquad\boxed{\hat{a}=\text{min}(x_1\dots x_n)} \]
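A minimal sketch of the uniform MLE (true bounds of 2 and 7 are chosen arbitrarily); note that both estimates are biased towards the inside of the interval:

x <- runif(1000, min = 2, max = 7)
a_hat <- min(x)   # MLE of the lower bound
b_hat <- max(x)   # MLE of the upper bound
c(a_hat, b_hat)   # close to (2, 7), always inside the true interval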

Beta Distribution

The probability density function (PDF) of the beta distribution \(\boxed{Beta(a,b)}\), for \(0\le x\leq1\) and shape parameters \(a,b>0\), is a power function of the variable \(\theta\) and its reflection \(1-\theta\). It is the conjugate prior for several likelihoods, including the Bernoulli, binomial, negative binomial and geometric.

  • dbeta(x, shape1, shape2, ncp = 0, log = FALSE) gives the density (PDF)
  • pbeta(q, shape1, shape2, ncp = 0, lower.tail = TRUE, log.p = FALSE) gives the distribution function (CDF)
  • qbeta(p, shape1, shape2, ncp = 0, lower.tail = TRUE, log.p = FALSE) gives the quantile function (Inverse CDF)
  • rbeta(n, shape1, shape2, ncp = 0) generates random deviates (pseudo-random generator)

probability distribution: \[f(\theta)=\boxed{\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{(a-1)}(1-\theta)^{(b-1)}}\quad\text{where}\quad\Gamma(x)=(x-1)!\quad\text{for integer }x\]

The Gamma Function \(\Gamma(z)\) where \(z\in\mathbb{C}\) and \(\text{Re}(z)>0\) is defined as: \[\boxed{\Gamma(z)=\int_0^\infty t^{z-1}e^{-t}\:dt}\] \[ \begin{align} \Gamma(z+1)&=\int_0^\infty t^ze^{-t}\:dt\qquad\dots\text{from the definition with }z\to z+1\\ &=\left[-t^ze^{-t}\right]^\infty_0+\int_0^\infty zt^{z-1}e^{-t}\:dt\qquad\dots\text{from integration by parts}\\ &=0+z\int_0^\infty t^{z-1}e^{-t}\:dt\qquad\dots\text{since }\lim_{t \to \infty}t^ze^{-t}=0\\ &\boxed{\Gamma(z+1)=z\Gamma(z)} \end{align} \] \[ \begin{align} \Gamma(1)&=\int_0^\infty t^{1-1}e^{-t}\:dt\\ &=\int_0^\infty e^{-t}\:dt=1\qquad&\boxed{\Gamma(1)=1} \end{align} \] Considering these two results (remember \(0!=1\)) we can see by induction: \[\boxed{\boxed{\Gamma(n)=(n-1)!}}\]
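Base R's gamma() implements the Gamma function, so both identities can be checked directly (z is chosen arbitrarily):

z <- 4.7
gamma(z + 1) / (z * gamma(z))  # recurrence Gamma(z+1) = z Gamma(z), returns 1
gamma(5)                       # Gamma(n) = (n-1)!, returns 24
factorial(4)                   # 24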

The Beta Function \(B(z_1,z_2)\) where \(z_1,z_2\in\mathbb{C}\) and \(\text{Re}(z_1),\text{Re}(z_2)>0\) is defined as: \[\boxed{B(z_1,z_2)=\int_0^1 t^{z_1-1}(1-t)^{z_2-1}\:dt}\] this is related to the \(\Gamma\) function as follows \[ \begin{align} \Gamma(z_1)\Gamma(z_2)&=\int_{u=0}^\infty u^{z_1-1}e^{-u}\:du\:\cdot\:\int_{v=0}^\infty v^{z_2-1}e^{-v}\:dv\\ &=\int_{v=0}^\infty\int_{u=0}^\infty u^{z_1-1}v^{z_2-1}e^{-u-v}\:du\:dv\\ \end{align} \] Introduce a change of variables:

  • \(s=u+v\) the sum of the two variables \(u\) and \(v\) (total size)
  • \(t=\frac{u}{u+v}\) the proportion of \(u\) in the sum \(s\) (scaling factor)

With these definitions, we can get two independent variables to split our integral into two separate steps:
  • \(u=st\) so \(du=s\:dt+t\:ds\)
  • \(v=s(1-t)\) so \(dv=(1-t)\:ds-s\:dt\)

The transformation matrix is: \[ J=\begin{bmatrix}\frac{\partial u}{\partial s} & \frac{\partial u}{\partial t} \\ \frac{\partial v}{\partial s} & \frac{\partial v}{\partial t}\end{bmatrix}=\begin{bmatrix} t & s \\ 1-t & -s\end{bmatrix} \] The determinant of this matrix is: \[ \text{det}(J)=-ts-s(1-t)=-s \] Since we are dealing with volume elements, we take the absolute value: \[|J|=s\] Therefore: \[ du\:dv=s\:ds\:dt \]

\[ \begin{align} \Gamma(z_1)\Gamma(z_2)&=\int_{v=0}^\infty\int_{u=0}^\infty u^{z_1-1}v^{z_2-1}e^{-u-v}\:du\:dv\\ &=\int_{s=0}^\infty\int_{t=0}^1 (st)^{z_1-1}(s(1-t))^{z_2-1}e^{-s}s\:dt\:ds\\ &=\int_{s=0}^\infty e^{-s}s^{z_1+z_2-1}\:ds \cdot\int_{t=0}^1 t^{z_1-1}(1-t)^{z_2-1}\:dt\\ &=\Gamma(z_1+z_2)\:\cdot\:B(z_1,z_2) \end{align} \] So finally, \[\Gamma(z_1)\Gamma(z_2)=\Gamma(z_1+z_2)\:\cdot\:B(z_1,z_2)\\ \therefore\quad\boxed{\boxed{B(z_1,z_2)=\frac{\Gamma(z_1)\Gamma(z_2)}{\Gamma(z_1+z_2)}}}\quad\text{where}\quad\boxed{B(z_1,z_2)=\int_0^1 t^{z_1-1}(1-t)^{z_2-1}\:dt}\]

The normalisation factor of the Beta distribution is given by: \[B(a,b)=\int_0^1\theta^{(a-1)}(1-\theta)^{(b-1)}\:d\theta\] An alternative form of this integral is obtained with the substitution: \[\theta=\frac{u}{1+u}\quad\text{and}\quad 1-\theta=\frac{1}{1+u}\quad\text{so that}\quad d\theta=\frac{du}{(1+u)^2}\] since, by the quotient rule: \[\frac{d\theta}{du}=\frac{d}{du}\frac{u}{1+u}=\frac{1\cdot(1+u)-u\cdot 1}{(1+u)^2}=\frac{1}{(1+u)^2}\] As \(\theta\) runs from \(0\) to \(1\), \(u\) runs from \(0\) to \(\infty\), so substituting into \(B(a,b)\): \[ \begin{align} B(a,b)&=\int_0^\infty\left(\frac{u}{1+u}\right)^{(a-1)}\left(\frac{1}{1+u}\right)^{(b-1)}\:\frac{du}{(1+u)^2}\\ &=\int_0^\infty u^{a-1}(1+u)^{-a+1-b+1-2}\:du\\ &=\int_0^\infty u^{a-1}(1+u)^{-(a+b)}\:du \end{align} \]

This is similar to a \(\text{Beta}\) function but with an upper limit of \(\infty\) instead of \(1\). Change variables back using the standard trick \(t=\frac{u}{1+u}\): \(\quad\therefore\quad\frac{t}{1-t}=\frac{\frac{u}{1+u}}{\frac{1+u}{1+u}-\frac{u}{1+u}}=\frac{\frac{u}{1+u}}{\frac{1}{1+u}}=u\)

Now from quotient rule \(dt=\frac{(1+u)\:du-u\:du}{(1+u)^2}=\frac{du}{(1+u)^2}\) so \(du=(1+u)^2dt=\frac{dt}{(1-t)^2}\)

So our integral becomes: \[ \begin{align} B(a,b)&=\int_0^\infty u^{a-1}(1+u)^{-(a+b)}\:du\\ &=\int_0^1 \left(\frac{t}{1-t}\right)^{a-1}(1-t)^{(a+b)}\:\frac{dt}{(1-t)^2}\quad\text{since}\:1+u=\frac{1-t}{1-t}+\frac{t}{1-t}=\frac{1}{1-t}\\ &=\int_0^1 t^{a-1}(1-t)^{-a+1+a+b-2}\:dt\\ &=\int_0^1 t^{a-1}(1-t)^{b-1}\:dt\\ \end{align} \] This is exactly the \(\text{Beta}\) function, so:

\[\boxed{\boxed{\boxed{B(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}}}}\]
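Base R's beta() implements the Beta function, so the identity can be checked numerically (a and b are chosen arbitrarily):

a <- 2.5; b <- 4
beta(a, b)                                                      # built-in Beta function
gamma(a) * gamma(b) / gamma(a + b)                              # same value via the Gamma function
integrate(function(t) t^(a - 1) * (1 - t)^(b - 1), 0, 1)$value  # same value by direct integration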

cumulative distribution function: \[F(x;a,b)=\frac{B_x(a,b)}{B(a,b)}\]

The CDF of the Beta distribution is the regularised incomplete Beta function \(F(x;a,b)=I_x(a,b)\) \[ \begin{align} F(x;a,b)&=\int_0^x\frac{t^{a-1}(1-t)^{b-1}}{B(a,b)}\:dt\\&=\frac{B_x(a,b)}{B(a,b)}\\ &=I_x(a,b)\\ &\text{where}\quad I_x(a,b)=\frac{\int_0^xt^{a-1}(1-t)^{b-1}dt}{\int_0^1t^{a-1}(1-t)^{b-1}dt} \end{align} \]
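pbeta() is exactly this regularised incomplete Beta function, which a direct integration confirms (a, b and x are chosen arbitrarily):

a <- 2; b <- 5; x <- 0.3
num <- integrate(function(t) t^(a - 1) * (1 - t)^(b - 1), 0, x)$value  # incomplete Beta B_x(a,b)
num / beta(a, b)                                                       # I_x(a,b) from the definition
pbeta(x, a, b)                                                         # same value from the CDF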

Maximum Likelihood Estimate: no closed-form solution exists for \(\hat{a}\) and \(\hat{b}\); they must be found numerically by maximising the log-likelihood.
Maximum Likelihood Estimate (Numerical Example)
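Since there is no closed form, a numerical fit mirroring the normal example above can be used. A sketch, with true shape parameters a = 2 and b = 5 chosen arbitrarily:

library(data.table)
beta_data <- data.table(y = rbeta(1000, shape1 = 2, shape2 = 5))

negative_log_likelihood <- function(params, data) {
  a <- params[1]
  b <- params[2]
  if (a <= 0 || b <= 0) return(Inf)  # shape parameters must be positive
  -sum(dbeta(data$y, shape1 = a, shape2 = b, log = TRUE))
}

mle_result <- optim(par = c(1, 1), fn = negative_log_likelihood, data = beta_data)
mle_result$par  # numerical estimates of (a, b)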

Exponential Distribution

The exponential distribution \(\boxed{exp(\lambda)}\) or \(\boxed{\text{exponential}(\lambda)}\) is the continuous counterpart to the discrete Geometric probability distribution.

Where the exponential distribution models the time (continuous RV) for a random event to occur, the Poisson models the number of events (discrete RV) that can occur in a given time window. (The related Geometric distribution counts the number of discrete trials before the first success - the discrete analogue of waiting for an event.)

It exhibits the memorylessness property: consider that \(X\) models the time for an event to occur. If a period \(\Delta t\) elapses and the event has NOT occurred, the probability that it takes at least a further time \(t\) is unchanged. In terms of the right tail: \[\boxed{P(X>\Delta t+t\:|\:X>\Delta t)=P(X>t)}\] \[\begin{align} \left(X>\Delta t + t\right)\cap\left(X>\Delta t\right)&=\left(X>\Delta t + t\right) \\ \\ P(X>\Delta t+t\:|\:X>\Delta t) &=\frac{P(X>\Delta t+t)}{P(X>\Delta t)} \\ &=\frac{e^{-\lambda\Delta t}e^{-\lambda t}}{e^{-\lambda \Delta t}} \\ &=e^{-\lambda t}=P(X>t) \end{align}\]
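A quick numerical check of memorylessness using the right tail of pexp (the rate and time periods are chosen arbitrarily):

lambda <- 0.5; delta <- 2; tau <- 3
p_cond <- pexp(delta + tau, rate = lambda, lower.tail = FALSE) /
          pexp(delta, rate = lambda, lower.tail = FALSE)  # P(X > delta + tau | X > delta)
p_tail <- pexp(tau, rate = lambda, lower.tail = FALSE)    # P(X > tau)
all.equal(p_cond, p_tail)                                 # TRUE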
  • dexp(x, rate = 1, log = FALSE) gives the density (PDF)
  • pexp(q, rate = 1, lower.tail = TRUE, log.p = FALSE) gives the distribution function (CDF)
  • qexp(p, rate = 1, lower.tail = TRUE, log.p = FALSE) gives the quantile function (Inverse CDF)
  • rexp(n, rate = 1) generates random deviates (pseudo-random generator)

probability distribution: \[\boxed{f(x) =\lambda e^{-\lambda x}}\]

The exponential probability distribution is the continuous counterpart of the discrete Geometric probability distribution and it takes the form \(Ae^{-\lambda x}\) where \(A\) is the normalisation constant. We can calculate the normalisation constant as follows: \[ \begin{align} 1&=\int_0^\infty Ae^{-\lambda x}\:dx \\ &=A\left[\frac{e^{-\lambda x}}{-\lambda}\right]_0^\infty\\ &=\frac{A}{\lambda}\qquad\qquad\therefore\;A=\lambda \end{align} \]

cumulative distribution function (and right tail distribution): \[ \boxed{P(X\leq x)=1-e^{-\lambda x}}\\ P(X>x)=e^{-\lambda x}\]

The cumulative distribution function (CDF) is obtained by integrating the PDF:

\[ \begin{align} P(X\leq x)&=\int_0^x\:\lambda e^{-\lambda t}\:dt \\ &=\lambda\left[\frac{e^{-\lambda t}}{-\lambda}\right]_0^x = -e^{-\lambda x}+1 \\ &=1-e^{-\lambda x} \end{align} \]

cumulative right tail distribution \[P(X>x)=1-F(x)=e^{-\lambda x}\]

Mean (Expected Value): \[\mathbb{E}[X] = \frac{1}{\lambda}\]

The exponential distribution has infinite support with range \([0,\infty)\): \[ \begin{align} E(X)&=\int_{-\infty}^\infty xf(x)\:dx\\ &=\int^\infty_0\lambda xe^{-\lambda x}\:dx\\ \\ &\boxed{(uv)' = u'v + u v'\quad\therefore\quad\int u\cdot v'=u\cdot v-\int u'\cdot v}\\ &\text{let } u=x\quad\text{and }v'=\lambda e^{-\lambda x} \\ &\therefore u'=1\quad\text{and }v= -e^{-\lambda x}\\ \\ E(X)&=\left[-xe^{-\lambda x}\right]_0^\infty -\int_0^\infty -e^{-\lambda x}\:dx\\ &=\left[0\right]+\left[\frac{e^{-\lambda x}}{-\lambda}\right]_0^\infty\\ &=\frac{1}{\lambda} \end{align} \]

Variance: \[\text{Var}(X) = \frac{1}{\lambda^2}\]
Maximum Likelihood Estimate: \[\boxed{\hat{\lambda}=\frac{n}{\sum_{i=1}^n x_i}}\]

\[ \begin{align} P(x_1\dots x_n|\lambda)&=\prod_{i=1}^n\lambda e^{-\lambda x_i}\\ &=\lambda^ne^{-\lambda\sum_{i=1}^n x_i}\\ \\ \ln(P(x_1\dots x_n|\lambda))&=\ln\left(\lambda^ne^{-\lambda\sum_{i=1}^n x_i}\right)\\ &=n\ln(\lambda)-\lambda\sum_{i=1}^n x_i\\ \\ &\text{derivative of the log likelihood function is }0\text{ at maximum }P\\ 0&=\frac{d}{d\hat{\lambda}}\left(n\ln(\hat{\lambda})-\hat{\lambda}\sum_{i=1}^n x_i\right)\\ &=\frac{n}{\hat{\lambda}}-\sum_{i=1}^n x_i\\ \\ \therefore\quad&\boxed{\hat{\lambda}=\frac{n}{\sum_{i=1}^n x_i}} \end{align} \]

##            y
##        <num>
## 1: 2.8115242
## 2: 1.9220342
## 3: 4.4301829
## 4: 0.1052579
## 5: 0.1873699
## 6: 1.0550041
neg_log_likelihood <- function(lambda, data) {
  if (lambda <= 0) return(Inf)  # the rate parameter must be positive
  # Negative log-likelihood
  -sum(dexp(data$y, rate = lambda, log = TRUE))
}

mle_result <- optim(par = 0.5, fn = neg_log_likelihood, data = data, 
                    method = "Brent", lower = 0, upper = 1)

mle_p <- mle_result$par

print(paste("MLE estimate of lambda:", round(mle_p, 3)))
## [1] "MLE estimate of lambda: 0.287"

Pareto Distribution - the 80/20 rule

The Pareto distribution \(\boxed{\text{Pareto}(m,\alpha)}\) models scale invariance, meaning if wealth (or any resource) grows proportionally, the relative distribution remains unchanged. E.g. Wealth accumulation often follows a preferential attachment mechanism, where individuals with more wealth have better opportunities to invest and grow their wealth at a faster rate, leading to a widening wealth gap.

  • dpareto(x, shape, scale, log = FALSE) gives the density (PDF)
  • ppareto(q, shape, scale, lower.tail = TRUE, log.p = FALSE) gives the distribution function (CDF)
  • qpareto(p, shape, scale, lower.tail = TRUE, log.p = FALSE) gives the quantile function (Inverse CDF)
  • rpareto(n, shape, scale) generates random deviates (pseudo-random generator)
small asteroids are abundant whilst larger ones are rare

these commands require the Vector Generalized Additive Models package library(VGAM) which uses the Type II Pareto distribution, which introduces an extra scale parameter and shifts the distribution \[P(X)=\frac{\alpha\theta^\alpha}{\left(x+\theta\right)^{\alpha+1}}\quad\text{where}\quad x\ge0\] so to use the standard form of the Pareto distribution we substitute \(x\to x-m\) and \(\theta=m\):

probability distribution: \[\boxed{P(X)=\begin{cases} \frac{\alpha m^\alpha}{x^{\alpha + 1}}, & x \geq m\\ 0, & x < m \end{cases}}\]

where:
  • \(\alpha>0\) is the shape parameter (also called the Pareto index), which determines the heaviness of the tail. A lower \(\alpha\) implies greater inequality.
  • \(m>0\) is the scale parameter, representing the minimum value of \(x\)
  • \(x\) is the random variable value


We can calculate the PDF from the left tail…

\[ \begin{align} &P(X<x)=1-\left(\frac{m}{x}\right)^\alpha \\ \\ &\text{pdf}(x)=\frac{d\:}{dx}\:P(X<x)\\ &\quad=-m^\alpha (-\alpha) x^{-\alpha-1}\\ \\ \therefore&\quad\boxed{\text{pdf}(x)=\frac{\alpha m^\alpha}{x^{\alpha+1}}} \end{align} \]

cumulative distribution function (and right tail distribution): \[\boxed{P(X<x)=1-\left(\frac{m}{x}\right)^\alpha}\\P(X\geq x)=\left(\frac{m}{x}\right)^\alpha\]

The Pareto distribution is an inverse power law: the probability of the random variable \(X\) being greater than a value \(x\) is proportional to an inverse power of \(x\) \[ \begin{align} P(X\geq x)&\propto x^{-\alpha}\qquad\qquad\text{right tail inverse power law}\\ \\ P(X\geq x)&=Ax^{-\alpha}\qquad\qquad\text{normalisation factor, }A\\ \end{align} \] Normalise to find \(A\) using \(P(X\geq m)=1\): \[ \begin{align} 1&=Am^{-\alpha} \\ \therefore&\quad A=\frac{1}{m^{-\alpha}}=m^\alpha\\ \\ \therefore&\quad P(X\geq x)=\left(\frac{m}{x}\right)^\alpha \end{align} \]

Now the left tail…

\[ \begin{align} P(X<x)&=1-P(X\geq x)\\ &=1-\left(\frac{m}{x}\right)^\alpha \\ \end{align} \]

Mean (Expected Value): \[\mathbb{E}[X] = \begin{cases} \frac{\alpha m}{\alpha - 1}, & \text{if } \alpha > 1 \\ \infty, & \text{if } \alpha \leq 1 \end{cases}\]
Variance: \[\text{Var}(X) = \begin{cases} \frac{m^2 \alpha}{(\alpha - 1)^2 (\alpha - 2)}, & \text{if } \alpha > 2 \\ \infty, & \text{if } \alpha \leq 2 \end{cases}\]
Median: \[\text{Median} = m \cdot 2^{1/\alpha}\]
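A sketch of sampling from \(\text{Pareto}(m,\alpha)\) by inverting the CDF, \(x = m(1-p)^{-1/\alpha}\), with m = 1 and alpha = 3 chosen arbitrarily; the empirical mean, median and tail agree with the formulas above:

m <- 1; alpha <- 3
u <- runif(1e5)
x <- m * (1 - u)^(-1 / alpha)  # inverse-transform sampling from the Pareto CDF

mean(x)       # compare with alpha * m / (alpha - 1) = 1.5
median(x)     # compare with m * 2^(1/alpha), approximately 1.26
mean(x >= 2)  # compare with the right tail (m / 2)^alpha = 0.125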

Gamma Distribution - sum of exponential distributions

A generalisation of the exponential distribution - the gamma distribution is used to model the time until an event occurs \(k\) times.

Not to be confused with the related gamma function, the Gamma distribution \(\boxed{\Gamma(\alpha,\theta)}\) has shape parameter \(\alpha\) and scale parameter \(\theta\) (or alternatively a rate parameter \(\lambda\)). It serves as a conjugate prior to the Poisson, exponential and normal (with unknown precision) distributions and it is also a generalisation of the following:

  • the Exponential distribution \(X\sim exp(\lambda)\): \(\alpha=1\)
  • the Erlang distribution \(X\sim\text{Erlang}(k,\lambda)\): \(\alpha=k\) where \(k\in\mathbb{N}\), and rate \(\beta=\lambda\) where \(\beta>0\)
  • the Chi-squared distribution \(X\sim\chi^2_k\): \(\alpha=\frac{k}{2}\) and \(\lambda=\frac{1}{2}\)
  • dgamma(x, shape, rate = 1, scale = 1/rate, log = FALSE) gives the density (PDF)
  • pgamma(q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE) gives the distribution function (CDF)
  • qgamma(p, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE) gives the quantile function (Inverse CDF)
  • rgamma(n, shape, rate = 1, scale = 1/rate) generates random deviates (pseudo-random generator)
time it takes for all alpha carrots to form on the minecraft carrot farm

probability distribution: \[\boxed{P(X)=\frac{1}{\Gamma(\alpha)\theta^\alpha}x^{\alpha-1}e^{-\frac{x}{\theta}}}\quad\text{or}\quad\boxed{P(X)=\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}}\quad\text{where}\quad\lambda=\frac{1}{\theta}\]
  • 1st “normalisation” constant \(\frac{1}{\Gamma(\alpha)\theta^\alpha}\)
  • 2nd “power” term \(x^{\alpha-1}\) accounts for more events requiring more time to accumulate
    • \(\alpha>1\): the function starts low and increases (rises at the beginning)
    • \(\alpha=1\): exponential distribution - the curve starts flat and just decays.
    • \(\alpha<1\): the curve is asymptotic at \(x=0\), i.e. \(\lim_{x\to 0} f(x) = \infty\), so it starts unbounded and immediately drops.
  • 3rd “decay” term \(e^{-\frac{x}{\theta}}\), where \(\theta\) is the average time between events - its effect is to stretch the exponential decay in time.

Let \(S_\alpha\) be the sum of \(\alpha\) independent exponential random variables (the PDF of \(S_\alpha\) is the Gamma distribution): \[S_\alpha=X_1+X_2+\dots+X_\alpha\] The PDF of the sum of two independent random variables is the convolution of their individual PDFs: \[f_{S_2}(s)=\int_0^s f_X(x)f_X(s-x)\:dx\] For the sum of \(\alpha\) independent exponential random variables, we need to compute the convolution of the exponential PDFs repeatedly.

So for two exponentials \[ \begin{align} f_{S_2}(s)&=\int_0^s \lambda e^{-\lambda x}\cdot\lambda e^{-\lambda (s-x)}\:dx \\ &=\lambda^2e^{-\lambda s}\int_0^s\:dx \\ &=\lambda^2e^{-\lambda s}\cdot s \end{align}\] This is the PDF of the gamma distribution with shape parameter \(2\) and rate \(\lambda\).

For the general case of \(\alpha\) independent exponential random variables, performing the convolution repeatedly gives:

\[f_{S_\alpha}(s)=\frac{\lambda^\alpha s^{\alpha-1}e^{-\lambda s}}{\left(\alpha-1\right)!}\]
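A simulation sketch of this result (shape and rate chosen arbitrarily): summing \(\alpha\) independent exponentials reproduces the Gamma moments \(\alpha/\lambda\) and \(\alpha/\lambda^2\).

alpha <- 4; lambda <- 2
s <- replicate(1e4, sum(rexp(alpha, rate = lambda)))  # sums of alpha exponential waiting times
mean(s)  # compare with alpha / lambda = 2
var(s)   # compare with alpha / lambda^2 = 1
ks.test(s, pgamma, shape = alpha, rate = lambda)  # no evidence against Gamma(alpha, rate = lambda)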

cumulative distribution function (and right tail distribution): \[\boxed{P(X<x)=\frac{\gamma\left(\alpha,\frac{x}{\theta}\right)}{\Gamma(\alpha)}}\qquad P(X\geq x)=1-\frac{\gamma\left(\alpha,\frac{x}{\theta}\right)}{\Gamma(\alpha)}\quad\text{where}\quad\gamma(\alpha,z)=\int_0^z t^{\alpha-1}e^{-t}\:dt\]
Mean (Expected Value): \[\mathbb{E}[X] = \alpha\theta=\frac{\alpha}{\lambda}\]
Variance: \[\text{Var}(X) = \alpha\theta^2=\frac{\alpha}{\lambda^2}\]
Median: no simple closed-form expression exists; it can be computed numerically, e.g. qgamma(0.5, shape, rate).

Chi-squared - square sum of standard normal distributions

The Chi-squared distribution \(\boxed{\chi^2_k}\) is the distribution of the random variable equal to the square sum of \(k\) independent random variables all having a standard normal distribution \[\chi^2_k=\sum_{i=1}^k\:Z_i^2\quad\text{where}\quad Z_i\sim N(0,1)\:\text{are i.i.d.}\]
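A simulation sketch of this definition (k chosen arbitrarily): squaring and summing k standard normals reproduces the \(\chi^2_k\) distribution, whose mean is \(k\) and variance \(2k\).

k <- 5
x <- replicate(1e4, sum(rnorm(k)^2))  # square-sum of k standard normal draws
mean(x)                     # compare with k = 5
var(x)                      # compare with 2k = 10
ks.test(x, pchisq, df = k)  # no evidence against the chi-squared_k distribution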

Pearson's Chi-Squared Goodness-of-Fit Test / Null Hypothesis Rejection

simple example

Test whether data follows a specific distribution. The \(\chi^2\) test compares observed data against the expected data under a given hypothesis (\(H_0\) distribution). Given \(n\) categories, each category has an observed value \(O_i\) and an expected value \(E_i\) (the rule of thumb is that each expected count should be at least \(5\) for the CLT to apply - the more samples the better). The error between each observed value and the corresponding expected value should be approximately Gaussian. Therefore, the sum of the standardised squared differences (the chi-squared statistic) follows a \(\chi^2\) distribution with \(n-1-p\) d.o.f., where \(p\) is the number of estimated parameters (if any).

\[\chi^2=\sum^n_{i=1}\frac{(O_i-E_i)^2}{E_i}\] We can reject the null hypothesis that the distribution is a match if the calculated \(\chi^2\) statistic falls into the rejection region determined by the critical value from the chi-squared distribution at significance level \(\alpha\). If \(\chi^2\) is greater than the critical value, we reject \(H_0\), meaning the data does not follow the assumed distribution.
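A minimal sketch with base R's chisq.test(), using hypothetical counts for a six-sided die tested against the fair-die hypothesis:

observed <- c(18, 22, 21, 15, 25, 19)      # hypothetical counts, n = 120, all expected counts >= 5
chisq.test(x = observed, p = rep(1/6, 6))  # reject H0 (fair die) if the p-value < alpha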

Pearson's Chi-Squared Test for Independence: check whether two categorical variables are associated (correlated).
  • dchisq(x, df, ncp = 0, log = FALSE) gives the density (PDF)
  • pchisq(q, df, ncp = 0, lower.tail = TRUE, log.p = FALSE) gives the distribution function (CDF)
  • qchisq(p, df, ncp = 0, lower.tail = TRUE, log.p = FALSE) gives the quantile function (Inverse CDF)
  • rchisq(n, df, ncp = 0) generates random deviates (pseudo-random generator)

Student’s t-distribution - function of standard normal and Chi-squared

The Student's t-distribution \(\boxed{t_\nu}\) is a single-parameter (d.o.f. \(\nu\)) continuous distribution. It is a function of the standard normal and chi-squared distributions \[t_\nu\equiv\frac{Z}{\sqrt{V/\nu}}\quad\text{where}\quad Z\sim N(0,1)\quad\text{and}\quad V\sim{\chi^2}_\nu\quad\text{are independent}\]
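A simulation sketch of this construction (nu chosen arbitrarily): combining independent standard normal and chi-squared draws reproduces the \(t_\nu\) distribution.

nu <- 8
t_sim <- rnorm(1e4) / sqrt(rchisq(1e4, df = nu) / nu)  # Z / sqrt(V / nu) with independent draws
ks.test(t_sim, pt, df = nu)                            # no evidence against the t_nu distribution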

Student's t-test null hypothesis rejection


  • dt(x, df, ncp, log = FALSE) gives the density (PDF)
  • pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE) gives the distribution function (CDF)
  • qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE) gives the quantile function (Inverse CDF)
  • rt(n, df, ncp) generates random deviates (pseudo-random generator)

Boltzmann distribution

Rayleigh distribution

Cauchy distribution

Dirichlet distribution