Suppose you and a thousand of your friends line up on the halfway line of a soccer field (football pitch). Each of you has a coin in your hand. At the sound of the whistle, you begin flipping the coins. Each time a coin comes up heads, that person moves one step toward the left-hand goal. Each time a coin comes up tails, that person moves one step toward the right-hand goal. Each person flips the coin 16 times, follows the implied moves, and then stands still. Now we measure the distance of each person from the halfway line. Can you predict what proportion of the thousand people will be standing on the halfway line? How about the proportion 5 yards to the left of the line?
It’s hard to say where any individual person will end up, but you can say with great confidence what the collection of positions will look like. The distances will be distributed in an approximately normal, or Gaussian, fashion. This is true even though the underlying distribution is binomial (each toss of the coin is a Bernoulli trial, and the 16 flips together follow a binomial distribution). It happens because there are many more possible sequences of flips that end up at zero than end up one step left or right of zero, and so on, with the number of possible sequences declining in the characteristic bell shape of the normal distribution.
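Before moving on, here is a minimal sketch of this coin-flip version of the walk (assuming heads moves one step left and tails one step right); the proportions it prints are one way to answer the questions above:

# Coin-flip version: 1000 people, 16 flips each; heads = one step left (-1),
# tails = one step right (+1). A person's position is the sum of their steps.
set.seed(1)
positions <- replicate(1000, sum(sample(c(-1, 1), size = 16, replace = TRUE)))
table(positions) / 1000   # proportion of people at each distance from the line
mean(positions == 0)      # proportion still standing on the halfway line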
Let’s see this result by simulating the experiment in R. To show that there’s nothing special about the underlying coin flip, assume instead that each step is different from all the others: a random distance of up to one meter, to the left or to the right (that is, uniform between -1 and 1). These are the individual steps. Then we add these steps together to get the position after 16 steps. Finally we replicate this procedure 1000 times.

It will be useful to invoke the central limit theorem to obtain the theoretical Gaussian. Let \(S_n = X_1 + X_2 + \dots + X_n\) be the sum of \(n\) i.i.d. random variables with finite variance \(\sigma^2\). Denote by \(\mu\) the expectation of each variable \(X_i\), that is, \[E(X_1) = E(X_2) = \dots = E(X_n) = \mu,\] then, for large \(n\), approximately: \[\frac{S_n - n\mu}{\sqrt{n}} \sim N(0, \sigma^2) \quad \rightarrow \quad S_n \sim N(n\mu, n\sigma^2).\] We will also need the expectation and variance of the uniform distribution on \([a, b]\): \(E(X) = \frac{a + b}{2}\) and \(Var(X) = \frac{(b-a)^2}{12}\). Since we use \(X_i \sim Uniform(-1,1)\), we get \(E(X_i) = 0\) and \(Var(X_i) = \frac{(1-(-1))^2}{12} = \frac{4}{12} = \frac{1}{3}\).
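A sketch of that procedure, step by step, looks like this:

# One person's walk: 16 steps, each a uniform random distance between -1 and 1
steps <- runif(16, min = -1, max = 1)   # the individual steps
position <- sum(steps)                  # position after 16 steps
# Replicate the whole procedure for 1000 people
positions <- replicate(1000, sum(runif(16, -1, 1)))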
Let’s make clearer how to calculate the expected value and variance of a sum of random variables. The expected value operator is linear, in the sense that \[E[X + Y] = E[X] + E[Y] \\ E[aX] = aE[X],\] so we have \[E[X_1 + X_2 + \dots + X_n] = E[X_1] + E[X_2] + \dots + E[X_n],\] and since \[E[X_1] = E[X_2] = \dots = E[X_n] = \mu,\] we get
\[E[X_1 + X_2 + \dots + X_n] = n\mu.\]
With respect to the variance:
\[Var\left(\sum_{i=1}^n X_i\right) = \sum_{i = 1}^n Var(X_i) + \sum_{i \neq j} Cov(X_i, X_j).\] The second term of this expression is zero because the random variables are independent (hence uncorrelated), so the variance of the sum reduces to the sum of the variances: \[Var\left(\sum_{i=1}^n X_i\right) = \sum_{i = 1}^n Var(X_i) = Var(X_1) + Var(X_2) + \dots + Var(X_n).\] Since \[Var(X_1) = Var(X_2) = \dots = Var(X_n) = \sigma^2,\] it follows that \[\sum_{i = 1}^n Var(X_i) = n\sigma^2.\]
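These two identities are easy to check numerically; here is a quick sketch (not part of the original derivation) using two independent uniform variables:

# Numerical check: for independent X, Y ~ Uniform(-1, 1),
# E[X + Y] should be near 0 and Var(X + Y) near 1/3 + 1/3 = 2/3
x <- runif(1e5, -1, 1)
y <- runif(1e5, -1, 1)
mean(x + y)        # approximately E[X] + E[Y] = 0
var(x + y)         # approximately Var(X) + Var(Y) = 2/3
var(x) + var(y)    # compare with the line above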
Returning to the problem, let’s simulate the experiment of tossing the coin 2, 4, 8, and 16 times (with uniform steps standing in for the coin, as explained above). As we now have 10,000 friends flipping the coin, let’s repeat each experiment 10,000 times. We’ll plot the histogram, the estimated density function (blue), and the theoretical normal (green). The next chunk of code is a function that plots and compares the histogram, the density function, and the theoretical normal distribution:
plotsLCT <- function(n, max_d, x_lim, title){
  # Simulate 10,000 walks of n uniform steps and keep the final positions
  pos <- replicate(10000, sum(runif(n, -1, 1)))
  # Histogram of the simulated positions
  hist(pos, breaks = 20, col = "red", freq = FALSE, main = title,
       xlab = "distances from the halfway line", ylab = "proportion",
       xlim = c(-x_lim, x_lim), ylim = c(0, max_d))
  # Kernel density estimate of the simulated positions
  lines(density(pos), col = "blue", lwd = 2)
  # Theoretical normal density N(0, n/3) predicted by the CLT
  x_grid <- seq(from = -x_lim, to = x_lim, length.out = 1000)
  lines(x_grid, dnorm(x_grid, mean = 0, sd = sqrt(n/3)),
        col = "green", lwd = 4)
}
So, as seen above, the expected value is always \(E[S_n] = 0\), and depending on the number of trials (\(n = 2, 4, 8, 16\)) we’ll have: \[n = 2 \quad \rightarrow \quad Var(S_n) = \frac{n}{3} =\frac{2}{3} \]
plotsLCT(n = 2, max_d = 0.6, x_lim = 3, title = "2 flips")
\[n = 4 \quad \rightarrow \quad Var(S_n) = \frac{n}{3} =\frac{4}{3} \]
plotsLCT(n = 4, max_d = 0.4, x_lim = 4, title = "4 flips")
\[n = 8 \quad \rightarrow \quad Var(S_n) = \frac{n}{3} =\frac{8}{3} \]
plotsLCT(n = 8, max_d = 0.3, x_lim = 5, title = "8 flips")
\[n = 16 \quad \rightarrow \quad Var(S_n) = \frac{n}{3} =\frac{16}{3} \]
plotsLCT(n = 16, max_d = 0.3, x_lim = 7, title = "16 flips")
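As an extra sanity check (not one of the original plots), we can compare the empirical variance of the simulated positions with the theoretical \(n/3\) for each \(n\):

# Empirical vs. theoretical variance of S_n for each number of steps
sapply(c(2, 4, 8, 16), function(n) {
  pos <- replicate(10000, sum(runif(n, -1, 1)))
  c(n = n, empirical = var(pos), theoretical = n / 3)
})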
The larger the number of trials, the closer the distribution of the simulated sums gets to the normal distribution.
Any process that adds together random values from the same distribution converges to a normal. But it’s not easy to grasp why addition should result in a bell curve of sums. Here’s a conceptual way to think of the process. Whatever the average value of the source distribution, each sample from it can be thought of as a fluctuation from that average value. When we begin to add these fluctuations together, they also begin to cancel one another out. A large positive fluctuation will cancel a large negative one. The more terms in the sum, the more chances for each fluctuation to be canceled by another, or by a series of smaller ones in the opposite direction. So eventually the most likely sum, in the sense that there are the most ways to realize it, will be a sum in which every fluctuation is canceled by another, a sum of zero (relative to the mean).
It doesn’t matter what shape the underlying distribution possesses. It could be uniform, as in our example above, or it could be anything else. Technically, the distribution of sums converges to normal only when the original distribution has finite variance. What this means practically is that the magnitude of any newly sampled value cannot be so big as to overwhelm all of the previous values. There are natural phenomena with infinite variance, but we won’t be working with any. Depending upon the underlying distribution, the convergence might be slow, but it will be inevitable. Often, as in this example, convergence is rapid.
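To illustrate that the shape of the source distribution doesn’t matter, here is a small variation on the simulation above (a hypothetical example, not part of the original walk) that replaces the uniform steps with heavily skewed exponential draws; the sum of just 16 of them is already close to the normal predicted by the CLT:

# Sums of 16 exponential(1) draws: mean = 16 * 1 = 16, variance = 16 * 1 = 16
pos_exp <- replicate(10000, sum(rexp(16, rate = 1)))
hist(pos_exp, breaks = 30, freq = FALSE,
     main = "16 exponential(1) steps", xlab = "sum", ylab = "proportion")
curve(dnorm(x, mean = 16, sd = sqrt(16)), add = TRUE, col = "green", lwd = 2)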