Derivation - The Central Limit Theorem
The central limit theorem allows us to approximate a sum or average of i.i.d. random variables by a normal random variable.
Given a random variable \(\mathcal{X}\), its cumulative distribution function is given by: \[P_\mathcal{X}(\mathcal{X}\leq x)=h(x)\]
We can transform this random variable into \(\mathcal{Y}\) using a monotonically increasing function \(t\): \[\mathcal{Y}=t(\mathcal{X})\]
Theorem: the cumulative probability \(P(\mathcal{Y}\leq y)\), where \(y=t(x)\), is equal to the cumulative probability \(P(\mathcal{X}\leq x)\):
\[ P(\mathcal{Y}\leq y)=P(t(\mathcal{X})\leq y)=P(\mathcal{X}\leq t^{-1}(y))=P(\mathcal{X}\leq x)\quad\therefore\quad\boxed{P(\mathcal{Y}\leq y)=P(\mathcal{X}\leq x)} \]
The probability density function of \(\mathcal{Y}\) is the derivative of its cumulative distribution function:
\[ \begin{align} \text{pdf}_\mathcal{Y}(y)&=\frac{d}{dy}P(\mathcal{Y}\leq y)\\ &=\frac{d}{dy}P(\mathcal{X}\leq x)\quad\text{since }P(\mathcal{Y}\leq y)=P(\mathcal{X}\leq x)\\ &=\frac{d}{dx}P(\mathcal{X}\leq x)\cdot\frac{1}{\frac{dy}{dx}}\\ &=\frac{\text{pdf}_\mathcal{X}(x)}{\frac{dy}{dx}}\\ \therefore\quad&\boxed{\text{pdf}_\mathcal{Y}(y)=\frac{\frac{d}{dx}P(\mathcal{X}\leq x)}{\frac{dy}{dx}}} \quad\text{and}\quad \boxed{\text{pdf}_\mathcal{Y}(y)=\frac{\text{pdf}_\mathcal{X}(x)}{\frac{dy}{dx}}} \end{align} \]
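As a quick numerical sanity check of this result, here is a minimal sketch (assuming NumPy and SciPy are available; the monotonic transform \(t(x)=e^x\) is an arbitrary illustrative choice) comparing the empirical density of the transformed samples against \(\text{pdf}_\mathcal{X}(x)/\frac{dy}{dx}\):

```python
import numpy as np
from scipy.stats import norm

# X ~ N(0, 1), transformed by the monotonic map t(x) = exp(x)
rng = np.random.default_rng(0)
x_samples = rng.standard_normal(1_000_000)
y_samples = np.exp(x_samples)                      # Y = t(X)

# Empirical density of Y from a histogram
hist, edges = np.histogram(y_samples, bins=200, range=(0.1, 5), density=True)
centres = (edges[:-1] + edges[1:]) / 2

# Change-of-variables prediction: pdf_Y(y) = pdf_X(x) / (dy/dx), with
# x = t^{-1}(y) = ln(y) and dy/dx = e^x = y
pred_at_centres = norm.pdf(np.log(centres)) / centres
print("max |empirical - predicted| =", np.abs(hist - pred_at_centres).max())
```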
A random variable with a non-standard normal population distribution \(\mathcal{X}\sim\mathcal{N}(\mu,\sigma^2)\) can be standardised (converted into standard form) \(\mathcal{Z}\sim\mathcal{N}(0,1)\) using the following change of variable: \[\boxed{\mathcal{Z}=\frac{\mathcal{X}-\mu}{\sigma}}\]
Similarly, the sampling distribution of the mean \(\bar{\mathcal{X}}\sim\mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right)\) may be standardised to \(\mathcal{Z}\sim\mathcal{N}(0,1)\) using the following change of variable: \[\boxed{\mathcal{Z}=\frac{\bar{\mathcal{X}}-\mu}{SE}=\frac{\bar{\mathcal{X}}-\mu}{\sigma/\sqrt{n}}}\qquad SE\text{ is the standard error}\]
\[\begin{align} \text{pdf}_\mathcal{Z}(z)&=\frac{\frac{d}{dx}P(\mathcal{X}\leq x)}{\frac{dz}{dx}}\\ &=\left(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right)\cdot\frac{dx}{dz}\\ &=\left(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right)\cdot\sigma\qquad\text{since }z=\frac{x-\mu}{\sigma}\implies\frac{dx}{dz}=\sigma\\ &=\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}\qquad\text{substituting }z=\frac{x-\mu}{\sigma}\\ &=\text{pdf of }\mathcal{N}(0,1)\qquad QED \end{align}\]
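A short simulation makes the standardisation concrete. This is an illustrative sketch (NumPy and SciPy assumed; the parameters \(\mu=5\), \(\sigma=2\) are arbitrary):

```python
import numpy as np
from scipy.stats import norm

# Draw from a non-standard normal and standardise with Z = (X - mu) / sigma
mu, sigma = 5.0, 2.0                       # arbitrary population parameters
rng = np.random.default_rng(1)
x = rng.normal(mu, sigma, size=1_000_000)
z = (x - mu) / sigma

# The standardised samples should match N(0, 1): mean ~ 0, std ~ 1
print("mean(z) =", z.mean(), " std(z) =", z.std())
# Compare an empirical cumulative probability with the standard normal CDF
print("P(Z <= 1.96): empirical =", (z <= 1.96).mean(),
      " theoretical =", norm.cdf(1.96))
```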
The strong law is a fundamental postulate whose statement feels immediately obvious, while the weak law, whose purpose is slightly perplexing at first, has its origins in the mathematical rigour needed to formally prove the former. One way to wrap your head around the latter is as follows: as the sample size increases enormously, there is still a chance of sampling a new, improbably unlikely outlier that fluctuates the mean; however, it becomes harder for these fluctuations to disturb the mean as the data grows, in accordance with the theorem:
Let \(X_1,\dots,X_n\) be i.i.d. with mean \(\mu\).
Then the Weak Law states that the sample mean converges to \(\mu\) in probability: for any \(\epsilon>0\),
\[\lim_{n\to\infty}\mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^n X_i-\mu\right|>\epsilon\right)=0\]
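The shrinking deviation probability can be estimated empirically. The sketch below (NumPy assumed; the uniform distribution and \(\epsilon=0.05\) are arbitrary choices) approximates \(\mathbb{P}(|\bar{X}_n-\mu|>\epsilon)\) at several sample sizes:

```python
import numpy as np

# Weak law: P(|mean_n - mu| > eps) -> 0 as n grows.
# Estimate that probability by repeated sampling at several n.
rng = np.random.default_rng(2)
mu, eps, trials = 0.5, 0.05, 1_000          # uniform(0, 1) has mean 0.5
for n in (10, 100, 1_000, 10_000):
    means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    p_deviate = (np.abs(means - mu) > eps).mean()
    print(f"n={n:>6}  P(|mean - mu| > {eps}) ~ {p_deviate:.3f}")
```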
Let \(X_1,\dots,X_n\) be i.i.d. with mean \(\mu\).
Then the Strong Law states that the sample mean converges to \(\mu\) almost surely, i.e. with probability one (certain): \[\mathbb{P}\left(\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n X_i=\mu\right)=1\]
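A single long run illustrates this almost-sure convergence. The sketch below (NumPy assumed; exponential samples with \(\mu=2\) chosen arbitrarily) tracks one running mean as the data grows:

```python
import numpy as np

# Strong law: a single running mean converges to mu almost surely.
rng = np.random.default_rng(3)
samples = rng.exponential(scale=2.0, size=100_000)   # mu = 2.0
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:>6}  running mean = {running_mean[n - 1]:.4f}")
```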
Suppose random variables \(\mathcal{X}_1,\dots,\mathcal{X}_n\) are independent and identically distributed (i.i.d.) with an unknown probability distribution, all having mean \(\mu\) and variance \(\sigma^2\). If we sum (or average) these variables, the random variable corresponding to the sum \(\mathcal{S}\) (or average \(\bar{\mathcal{X}}\)) approximates a normal distribution with increasing \(n\).

| | Sum \(\mathcal{S}\) | Average \(\bar{\mathcal{X}}\) | Notes |
|---|---|---|---|
| expression | \(\mathcal{S}=\mathcal{X}_1+\dots+\mathcal{X}_n=\sum^n_{i=1}\mathcal{X}_i\) | \(\bar{\mathcal{X}}=\frac{\mathcal{X}_1+\dots+\mathcal{X}_n}{n}=\frac{1}{n}\sum^n_{i=1}\mathcal{X}_i\) | |
| mean | \(E[\mathcal{S}]=n\mu\) | \(E[\bar{\mathcal{X}}]=\mu\) | |
| variance | \(Var(\mathcal{S})=n\sigma^2\) | \(Var(\bar{\mathcal{X}})=\frac{\sigma^2}{n}\) | see linearity and scaling |
| std. | \(\sigma_\mathcal{S}=\sqrt{n}\sigma\) | \(\sigma_{\bar{\mathcal{X}}}=\frac{\sigma}{\sqrt{n}}\) | |
| approximation | \(\mathcal{S}\approx\mathcal{N}(n\mu,n\sigma^2)\) | \(\bar{\mathcal{X}}\approx\mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right)\) | for large \(n\) |
| standardisation | \(Z=\frac{\mathcal{S}-n\mu}{\sqrt{n}\sigma}\) | \(Z=\frac{\bar{\mathcal{X}}-\mu}{\sigma/\sqrt{n}}\) | \(Z\sim\mathcal{N}(0,1)\) |
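To see the table in action, the following sketch (NumPy and SciPy assumed; exponential variables chosen because they are visibly non-normal) standardises sums of i.i.d. samples and compares their quantiles with \(\mathcal{N}(0,1)\):

```python
import numpy as np
from scipy.stats import norm

# CLT: the standardised sum of i.i.d. (here exponential, clearly non-normal)
# variables approaches N(0, 1) as n grows.
rng = np.random.default_rng(4)
mu, sigma = 1.0, 1.0                          # exponential(1): mean = std = 1
n, trials = 1_000, 10_000
sums = rng.exponential(scale=1.0, size=(trials, n)).sum(axis=1)
z = (sums - n * mu) / (np.sqrt(n) * sigma)    # Z = (S - n*mu) / (sqrt(n)*sigma)

# Compare empirical quantiles of Z with standard normal quantiles
for q in (0.025, 0.5, 0.975):
    print(f"q={q}: empirical {np.quantile(z, q):+.3f}  "
          f"normal {norm.ppf(q):+.3f}")
```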
Linearity of Expectation Values (Reminder)
\[ \boxed{ \begin{align} \text{Linearity}:\quad E[X+Y]&= \sum_i\left(x_i + y_i\right)P(\omega_i)\\ &= \sum_i x_iP(\omega_i) + \sum_i y_i P(\omega_i)\\ &= E[X]+E[Y]\\ \therefore\quad&\boxed{E\left[\sum_iX_i\right]=\sum_iE[X_i]} \end{align} } \]
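A noteworthy point is that linearity holds even when the variables are dependent. A minimal numerical check (NumPy assumed; \(Y=X^2\) is a deliberately dependent choice):

```python
import numpy as np

# Linearity of expectation holds even for dependent variables;
# here Y = X**2 depends on X, yet E[X + Y] = E[X] + E[Y].
rng = np.random.default_rng(5)
x = rng.normal(1.0, 2.0, size=1_000_000)
y = x ** 2
print("E[X + Y]    =", (x + y).mean())
print("E[X] + E[Y] =", x.mean() + y.mean())
```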
Population Variance Shortcut Formula
We start with the definition of population variance:
\[ \mathrm{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] \]
Now, expand the squared term:
\[ (X - \mathbb{E}[X])^2 = X^2 - 2X\mathbb{E}[X] + (\mathbb{E}[X])^2 \]
Taking the expectation of both sides:
\[ \mathrm{Var}(X) = \mathbb{E}[X^2 - 2X\mathbb{E}[X] + (\mathbb{E}[X])^2] \]
Using the linearity of expectation:
\[ \mathrm{Var}(X) = \mathbb{E}[X^2] - 2\mathbb{E}[X]\mathbb{E}[X] + (\mathbb{E}[X])^2 \]
Simplifying:
\[ \mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \]
This is known as the computational formula or shortcut formula for variance.
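A quick numerical check of the shortcut formula (NumPy assumed; the gamma distribution is an arbitrary choice):

```python
import numpy as np

# Shortcut formula: Var(X) = E[X^2] - (E[X])^2
rng = np.random.default_rng(6)
x = rng.gamma(shape=3.0, scale=2.0, size=1_000_000)
var_definition = ((x - x.mean()) ** 2).mean()      # E[(X - E[X])^2]
var_shortcut = (x ** 2).mean() - x.mean() ** 2     # E[X^2] - (E[X])^2
print("definition:", var_definition, " shortcut:", var_shortcut)
```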
Linearity & Scaling of Variance for Independent Random Variables (Reminder)

Let \(X\) and \(Y\) be two random variables. The variance of their sum is:
\[ \operatorname{Var}(X + Y) = \mathbb{E}[(X + Y)^2] - \left( \mathbb{E}[X + Y] \right)^2 \]
Expand the square:
\[ = \mathbb{E}[X^2 + 2XY + Y^2] - \left( \mathbb{E}[X] + \mathbb{E}[Y] \right)^2 \]
Apply linearity of expectation:
\[ = \mathbb{E}[X^2] + 2\mathbb{E}[XY] + \mathbb{E}[Y^2] - \left( \mathbb{E}[X]^2 + 2\mathbb{E}[X]\mathbb{E}[Y] + \mathbb{E}[Y]^2 \right) \]
Group terms:
\[ = \left( \mathbb{E}[X^2] - \mathbb{E}[X]^2 \right) + \left( \mathbb{E}[Y^2] - \mathbb{E}[Y]^2 \right) + 2\left( \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] \right) \]
Recognizing variance and covariance:
\[ = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y) \]
If \(X\) and \(Y\) are independent, then \(\operatorname{Cov}(X, Y) = 0\), so:
\[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) \]
\[ \boxed{ \begin{align} &\text{Variance Shifting \& Scaling Properties}\\ &\text{Let }\mu=E[X]\text{, so that }E[aX+b]=a\mu+b.\text{ Then:}\\ Var(aX+b) &= E\left[\left(aX+b-(a\mu+b)\right)^2\right]\\ &= E[(aX-a\mu)^2]\\ &= E[a^2(X-\mu)^2]\\ &= a^2E[(X-\mu)^2]\\ &= a^2Var(X)\\ \therefore\quad&\boxed{Var(aX+b) = a^2Var(X)} \end{align} } \]
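Both results are easy to verify by simulation. A minimal sketch (NumPy assumed; the distributions and constants are arbitrary):

```python
import numpy as np

# Check Var(X + Y) = Var(X) + Var(Y) for independent X, Y,
# and the shifting/scaling rule Var(aX + b) = a^2 Var(X).
rng = np.random.default_rng(7)
x = rng.normal(0.0, 3.0, size=1_000_000)    # Var(X) = 9
y = rng.uniform(0.0, 1.0, size=1_000_000)   # Var(Y) = 1/12, independent of X
a, b = 4.0, -2.0
print("Var(X+Y)  =", np.var(x + y), " Var(X)+Var(Y) =", np.var(x) + np.var(y))
print("Var(aX+b) =", np.var(a * x + b), " a^2 Var(X)  =", a**2 * np.var(x))
```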