Derivation - The Central Limit Theorem
The central limit theorem allows us to approximate a sum or average of i.i.d. random variables by a normal random variable.
Given a random variable \(\mathcal{X}\), its cumulative distribution function is given by: \[P_\mathcal{X}(\mathcal{X}\leq x)=h(x)\]
We can transform this random variable into \(\mathcal{Y}\) using a monotonically increasing function \(t\): \[\mathcal{Y}=t(\mathcal{X})\]
Theorem: the cumulative probability \(P(\mathcal{Y}\leq y)\), where \(y=t(x)\), is equal to the cumulative probability \(P(\mathcal{X}\leq x)\):
\[ P(\mathcal{Y}\leq y)=P(t(\mathcal{X})\leq y)=P(\mathcal{X}\leq t^{-1}(y))=P(\mathcal{X}\leq x)\quad\therefore\quad\boxed{P(\mathcal{Y}\leq y)=P(\mathcal{X}\leq x)} \]
The probability density function of \(\mathcal{Y}\) is the derivative of its cumulative distribution function:
\[ \begin{align} \text{pdf}_\mathcal{Y}(y)&=\frac{d}{dy}P(\mathcal{Y}\leq y)\\ &=\frac{d}{dy}P(\mathcal{X}\leq x)\quad\text{since }P(\mathcal{Y}\leq y)=P(\mathcal{X}\leq x)\\ &=\frac{d}{dx}P(\mathcal{X}\leq x)\cdot\frac{1}{\frac{dy}{dx}}\\ &=\frac{\text{pdf}_\mathcal{X}(x)}{\frac{dy}{dx}}\\ \therefore\quad&\boxed{\text{pdf}_\mathcal{Y}(y)=\frac{\frac{d}{dx}P(\mathcal{X}\leq x)}{\frac{dy}{dx}}} \quad\text{and}\quad \boxed{\text{pdf}_\mathcal{Y}(y)=\frac{\text{pdf}_\mathcal{X}(x)}{\frac{dy}{dx}}} \end{align} \]
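As a quick numerical sanity check of this result, here is a minimal sketch (assuming NumPy and SciPy are available; the monotonic transform \(t(x)=e^x\) is an arbitrary illustrative choice) comparing the empirical density of the transformed samples against \(\text{pdf}_\mathcal{X}(x)/\frac{dy}{dx}\):

```python
import numpy as np
from scipy.stats import norm

# X ~ N(0, 1), transformed by the monotonic map t(x) = exp(x)
rng = np.random.default_rng(0)
x_samples = rng.standard_normal(1_000_000)
y_samples = np.exp(x_samples)                      # Y = t(X)

# Empirical density of Y from a histogram
hist, edges = np.histogram(y_samples, bins=200, range=(0.1, 5), density=True)
centres = (edges[:-1] + edges[1:]) / 2

# Change-of-variables prediction: pdf_Y(y) = pdf_X(x) / (dy/dx), with
# x = t^{-1}(y) = ln(y) and dy/dx = e^x = y
pred_at_centres = norm.pdf(np.log(centres)) / centres
print("max |empirical - predicted| =", np.abs(hist - pred_at_centres).max())
```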
A random variable with a non-standard normal population distribution \(\mathcal{X}\sim\mathcal{N}(\mu,\sigma^2)\) can be standardised (converted into standard form) \(\mathcal{Z}\sim\mathcal{N}(0,1)\) using the following change of variable: \[\boxed{\mathcal{Z}=\frac{\mathcal{X}-\mu}{\sigma}}\]
Similarly, the sampling distribution of the mean \(\bar{\mathcal{X}}\sim\mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right)\) may be standardised to \(\mathcal{Z}\sim\mathcal{N}(0,1)\) using the following change of variable: \[\boxed{\mathcal{Z}=\frac{\bar{\mathcal{X}}-\mu}{SE}=\frac{\bar{\mathcal{X}}-\mu}{\sigma/\sqrt{n}}}\qquad SE\text{ is the standard error}\]
\[\begin{align} \text{pdf}_\mathcal{Z}(z)&=\frac{\frac{d}{dx}P(\mathcal{X}\leq x)}{\frac{dz}{dx}}\\ &=\left(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right)\cdot\frac{dx}{dz}\\ &=\left(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right)\cdot\sigma\qquad\text{since }z=\frac{x-\mu}{\sigma}\implies\frac{dx}{dz}=\sigma\\ &=\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}\qquad\text{substituting }z=\frac{x-\mu}{\sigma}\\ &=\text{pdf of }\mathcal{N}(0,1)\qquad QED \end{align}\]
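A short simulation makes the standardisation concrete. This is an illustrative sketch (NumPy and SciPy assumed; the parameters \(\mu=5\), \(\sigma=2\) are arbitrary):

```python
import numpy as np
from scipy.stats import norm

# Draw from a non-standard normal and standardise with Z = (X - mu) / sigma
mu, sigma = 5.0, 2.0                       # arbitrary population parameters
rng = np.random.default_rng(1)
x = rng.normal(mu, sigma, size=1_000_000)
z = (x - mu) / sigma

# The standardised samples should match N(0, 1): mean ~ 0, std ~ 1
print("mean(z) =", z.mean(), " std(z) =", z.std())
# Compare an empirical cumulative probability with the standard normal CDF
print("P(Z <= 1.96): empirical =", (z <= 1.96).mean(),
      " theoretical =", norm.cdf(1.96))
```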
The strong law is a fundamental postulate whose statement feels immediately obvious, while the weak law, whose purpose is slightly perplexing at first, has its origins in the mathematical rigour needed to formally prove the former. One way to wrap your head around the latter is as follows: as the sample size increases enormously, there is still a chance of sampling a new, improbably unlikely outlier that fluctuates the mean; however, it becomes harder for these fluctuations to disturb the mean as the data grows, in accordance with the theorem:
Let \(X_1,\dots,X_n\) be i.i.d. with mean \(\mu\).
Then the Weak Law states that the sample mean converges to \(\mu\) in probability: for any \(\epsilon>0\),
\[\lim_{n\to\infty}\mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^n X_i-\mu\right|>\epsilon\right)=0\]
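The shrinking deviation probability can be estimated empirically. The sketch below (NumPy assumed; the uniform distribution and \(\epsilon=0.05\) are arbitrary choices) approximates \(\mathbb{P}(|\bar{X}_n-\mu|>\epsilon)\) at several sample sizes:

```python
import numpy as np

# Weak law: P(|mean_n - mu| > eps) -> 0 as n grows.
# Estimate that probability by repeated sampling at several n.
rng = np.random.default_rng(2)
mu, eps, trials = 0.5, 0.05, 1_000          # uniform(0, 1) has mean 0.5
for n in (10, 100, 1_000, 10_000):
    means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    p_deviate = (np.abs(means - mu) > eps).mean()
    print(f"n={n:>6}  P(|mean - mu| > {eps}) ~ {p_deviate:.3f}")
```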
Let \(X_1,\dots,X_n\) be i.i.d. with mean \(\mu\).
Then the Strong Law states that the sample mean converges to \(\mu\) almost surely, i.e. with probability one (certain): \[\mathbb{P}\left(\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n X_i=\mu\right)=1\]
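A single long run illustrates this almost-sure convergence. The sketch below (NumPy assumed; exponential samples with \(\mu=2\) chosen arbitrarily) tracks one running mean as the data grows:

```python
import numpy as np

# Strong law: a single running mean converges to mu almost surely.
rng = np.random.default_rng(3)
samples = rng.exponential(scale=2.0, size=100_000)   # mu = 2.0
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:>6}  running mean = {running_mean[n - 1]:.4f}")
```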
Suppose random variables \(\mathcal{X}_1,\dots,\mathcal{X}_n\) are independent and identically distributed (i.i.d.) with an unknown probability distribution, all having mean \(\mu\) and variance \(\sigma^2\). If we sum (or average) these variables, the random variable corresponding to the sum \(\mathcal{S}\) (or average \(\bar{\mathcal{X}}\)) approximates a normal distribution with increasing \(n\).

| | Sum \(\mathcal{S}\) | Average \(\bar{\mathcal{X}}\) | Notes |
|---|---|---|---|
| expression | \(\mathcal{S}=\mathcal{X}_1+\dots+\mathcal{X}_n=\sum^n_{i=1}\mathcal{X}_i\) | \(\bar{\mathcal{X}}=\frac{\mathcal{X}_1+\dots+\mathcal{X}_n}{n}=\frac{1}{n}\sum^n_{i=1}\mathcal{X}_i\) | |
| mean | \(E[\mathcal{S}]=n\mu\) | \(E[\bar{\mathcal{X}}]=\mu\) | |
| variance | \(Var(\mathcal{S})=n\sigma^2\) | \(Var(\bar{\mathcal{X}})=\frac{\sigma^2}{n}\) | see linearity and scaling |
| std. | \(\sigma_\mathcal{S}=\sqrt{n}\sigma\) | \(\sigma_{\bar{\mathcal{X}}}=\frac{\sigma}{\sqrt{n}}\) | |
| approximation | \(\mathcal{S}\approx\mathcal{N}(n\mu,n\sigma^2)\) | \(\bar{\mathcal{X}}\approx\mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right)\) | for large \(n\) |
| standardisation | \(Z=\frac{\mathcal{S}-n\mu}{\sqrt{n}\sigma}\) | \(Z=\frac{\bar{\mathcal{X}}-\mu}{\sigma/\sqrt{n}}\) | \(Z\sim\mathcal{N}(0,1)\) |
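To see the table in action, the following sketch (NumPy and SciPy assumed; exponential variables chosen because they are visibly non-normal) standardises sums of i.i.d. samples and compares their quantiles with \(\mathcal{N}(0,1)\):

```python
import numpy as np
from scipy.stats import norm

# CLT: the standardised sum of i.i.d. (here exponential, clearly non-normal)
# variables approaches N(0, 1) as n grows.
rng = np.random.default_rng(4)
mu, sigma = 1.0, 1.0                          # exponential(1): mean = std = 1
n, trials = 1_000, 10_000
sums = rng.exponential(scale=1.0, size=(trials, n)).sum(axis=1)
z = (sums - n * mu) / (np.sqrt(n) * sigma)    # Z = (S - n*mu) / (sqrt(n)*sigma)

# Compare empirical quantiles of Z with standard normal quantiles
for q in (0.025, 0.5, 0.975):
    print(f"q={q}: empirical {np.quantile(z, q):+.3f}  "
          f"normal {norm.ppf(q):+.3f}")
```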
Linearity of Expectation Values (Reminder)
\[ \boxed{ \begin{align} \text{Linearity}:\quad E[X+Y]&= \sum_i\left(x_i + y_i\right)P(\omega_i)\\ &= \sum_i x_iP(\omega_i) + \sum_i y_i P(\omega_i)\\ &= E[X]+E[Y]\\ \therefore\quad&\boxed{E\left[\sum_iX_i\right]=\sum_iE[X_i]} \end{align} } \]
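A noteworthy point is that linearity holds even when the variables are dependent. A minimal numerical check (NumPy assumed; \(Y=X^2\) is a deliberately dependent choice):

```python
import numpy as np

# Linearity of expectation holds even for dependent variables;
# here Y = X**2 depends on X, yet E[X + Y] = E[X] + E[Y].
rng = np.random.default_rng(5)
x = rng.normal(1.0, 2.0, size=1_000_000)
y = x ** 2
print("E[X + Y]    =", (x + y).mean())
print("E[X] + E[Y] =", x.mean() + y.mean())
```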
Population Variance Shortcut Formula
We start with the definition of population variance:
\[ \mathrm{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] \]
Now, expand the squared term:
\[ (X - \mathbb{E}[X])^2 = X^2 - 2X\mathbb{E}[X] + (\mathbb{E}[X])^2 \]
Taking the expectation of both sides:
\[ \mathrm{Var}(X) = \mathbb{E}[X^2 - 2X\mathbb{E}[X] + (\mathbb{E}[X])^2] \]
Using the linearity of expectation:
\[ \mathrm{Var}(X) = \mathbb{E}[X^2] - 2\mathbb{E}[X]\mathbb{E}[X] + (\mathbb{E}[X])^2 \]
Simplifying:
\[ \mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \]
This is known as the computational formula or shortcut formula for variance.
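A quick numerical check of the shortcut formula (NumPy assumed; the gamma distribution is an arbitrary choice):

```python
import numpy as np

# Shortcut formula: Var(X) = E[X^2] - (E[X])^2
rng = np.random.default_rng(6)
x = rng.gamma(shape=3.0, scale=2.0, size=1_000_000)
var_definition = ((x - x.mean()) ** 2).mean()      # E[(X - E[X])^2]
var_shortcut = (x ** 2).mean() - x.mean() ** 2     # E[X^2] - (E[X])^2
print("definition:", var_definition, " shortcut:", var_shortcut)
```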
Linearity & Scaling of Variance for Independent Random Variables (Reminder)

Let \(X\) and \(Y\) be two random variables. The variance of their sum is:
\[ \operatorname{Var}(X + Y) = \mathbb{E}[(X + Y)^2] - \left( \mathbb{E}[X + Y] \right)^2 \]
Expand the square:
\[ = \mathbb{E}[X^2 + 2XY + Y^2] - \left( \mathbb{E}[X] + \mathbb{E}[Y] \right)^2 \]
Apply linearity of expectation:
\[ = \mathbb{E}[X^2] + 2\mathbb{E}[XY] + \mathbb{E}[Y^2] - \left( \mathbb{E}[X]^2 + 2\mathbb{E}[X]\mathbb{E}[Y] + \mathbb{E}[Y]^2 \right) \]
Group terms:
\[ = \left( \mathbb{E}[X^2] - \mathbb{E}[X]^2 \right) + \left( \mathbb{E}[Y^2] - \mathbb{E}[Y]^2 \right) + 2\left( \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] \right) \]
Recognizing variance and covariance:
\[ = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y) \]
If \(X\) and \(Y\) are independent, then \(\operatorname{Cov}(X, Y) = 0\), so:
\[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) \]
\[ \boxed{ \begin{align} &\text{Variance Shifting \& Scaling Properties}\\ &\text{Let }\mu=E[X]\text{, so that }E[aX+b]=a\mu+b.\text{ Then:}\\ Var(aX+b) &= E\left[\left(aX+b-(a\mu+b)\right)^2\right]\\ &= E[(aX-a\mu)^2]\\ &= E[a^2(X-\mu)^2]\\ &= a^2E[(X-\mu)^2]\\ &= a^2Var(X)\\ \therefore\quad&\boxed{Var(aX+b) = a^2Var(X)} \end{align} } \]
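Both results are easy to verify by simulation. A minimal sketch (NumPy assumed; the distributions and constants are arbitrary):

```python
import numpy as np

# Check Var(X + Y) = Var(X) + Var(Y) for independent X, Y,
# and the shifting/scaling rule Var(aX + b) = a^2 Var(X).
rng = np.random.default_rng(7)
x = rng.normal(0.0, 3.0, size=1_000_000)    # Var(X) = 9
y = rng.uniform(0.0, 1.0, size=1_000_000)   # Var(Y) = 1/12, independent of X
a, b = 4.0, -2.0
print("Var(X+Y)  =", np.var(x + y), " Var(X)+Var(Y) =", np.var(x) + np.var(y))
print("Var(aX+b) =", np.var(a * x + b), " a^2 Var(X)  =", a**2 * np.var(x))
```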