rm(list=ls()) # remove all variables from the workspace
library(latex2exp)
This script addresses the Central Limit Theorem (German: zentraler Grenzwertsatz), which is the reason the Gaussian distribution is so important. We demonstrate it by drawing from a binomial distribution, i.e. by calculating the sums of many trials of a Bernoulli experiment. According to the Central Limit Theorem, we expect those sums to be approximately normally distributed, although the individual trials follow a different distribution.
To do this, we simulate throwing a two-sided coin. The probability that the coin shows the number side is p for each trial; for a fair coin, p=0.5. The probability of showing the symbol side is then 1-p. The binomial distribution rests on the Bernoulli experiment, which has only the outcomes 0 and 1. Here we therefore count the number side facing up as success (=1) and the symbol side as failure (=0).
Throwing a coin n=10 times with p=0.5 (fair coin) can be simulated in R using a random number generator that draws from a binomial distribution, B(n=10, size=1, p=0.5).
(rbinom(n=10, size=1, prob = 1/2))
## [1] 0 0 0 1 1 0 1 1 0 1
This is one example of the outcome of n=10 experiments, in each of which the coin is thrown once (one trial; in R this corresponds to size=1).
We can also simulate the sum of 100 trials (size=100) and repeat this experiment n=100,000 times.
x <-(rbinom(n=100000, size=100, prob = 1/2))
x now contains 100,000 sums of 100 trials each. The histogram of x yields:
hist(x, nclass=50, xlab = TeX("$x_{0.5,100}$"), main="")
The outcome indeed resembles a normal distribution, and it peaks at 50 as expected. With p=0.5, we expect 50 of the 100 trials to yield an outcome of 1, so the sum over all 100 trials is ideally 50.
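The resemblance can also be checked numerically. The following is a minimal sketch, not part of the original script: it assumes a fixed seed for reproducibility and uses the hypothetical names `sums`, `mu` and `sigma`. It compares the simulated mean and standard deviation with the values expected for a binomial distribution, size*p and sqrt(size*p*(1-p)), and overlays the corresponding normal density on the histogram.

```r
set.seed(1)  # assumed seed, only so that the numbers are reproducible

p <- 0.5
size <- 100
sums <- rbinom(n = 100000, size = size, prob = p)

mu <- size * p                     # expected sum: 50
sigma <- sqrt(size * p * (1 - p))  # standard deviation of the sum: 5

# Simulated mean and standard deviation, to compare with mu and sigma
c(mean = mean(sums), sd = sd(sums))

# Histogram on the density scale, one bin per integer outcome
hist(sums, breaks = seq(min(sums) - 0.5, max(sums) + 0.5, by = 1),
     freq = FALSE, xlab = "sum of 100 trials", main = "")

# Normal density with the same mean and standard deviation
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, lwd = 2)
```

Using one bin per integer avoids the uneven bars that can appear when the bin width does not match the integer spacing of the sums.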
Let us see what happens if we change the shape of the underlying Bernoulli distribution, e.g. by setting p=0.25 (we now expect only 25% of the outcomes to be successful and yield 1).
The example with B(n=10, size=1, p=0.25), as above, now yields:
(rbinom(n=10, size=1, prob = 0.25))
## [1] 0 0 0 0 0 1 0 0 1 0
There are far fewer successful outcomes. For the large experiment with 100,000 experiments of 100 trials each, we find:
x25<-(rbinom(n=100000, size=100, prob = 0.25))
hist(x25, nclass=50, xlab = TeX("$x_{0.25,100}$"), main="")
The shape still resembles the normal distribution. The peak has shifted to 25, as expected for 100 trials with p=0.25.
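How close the shape is to a normal distribution can be quantified without simulation. The sketch below is an illustration under assumed names (`exact`, `normal_approx`, `max_diff`): it compares the exact binomial probabilities from `dbinom` with the normal density from `dnorm` at every possible outcome and reports the largest absolute difference and the most probable outcome.

```r
p <- 0.25
size <- 100
k <- 0:size  # all possible sums

# Exact binomial probabilities and the normal density at the same points
exact <- dbinom(k, size = size, prob = p)
normal_approx <- dnorm(k, mean = size * p, sd = sqrt(size * p * (1 - p)))

# Largest absolute deviation between the two curves
max_diff <- max(abs(exact - normal_approx))
max_diff

# Most probable outcome (location of the peak)
k[which.max(exact)]
```

The deviation is small compared with the peak height of roughly 0.09, which is consistent with the histogram looking Gaussian.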
There is one situation in which the shape of the distribution DOES change: when we calculate the sum of only a few trials, e.g. size=10, and choose p=0.25. This yields the following:
x25_10<-(rbinom(n=100000, size=10, prob = 0.25))
hist(x25_10, main = "", xlab = TeX("$x_{0.25,10}$"))
The distribution is skewed and not Gaussian, although I did almost the same as above. The Central Limit Theorem still applies, but it describes the limit of many trials, and 10 trials are too few for the normal approximation to be good at p=0.25. The possible outcomes are limited to the integers from 0 to 10, which leaves only few options between the minimum possible outcome (zero) and the expected value (2.5, which is not itself a possible outcome), whereas the maximum, but very rare, outcome is 10. Because the outcomes are squeezed on the left but can extend to large numbers on the right, the distribution is skewed with a right tail.
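The skew can be put into numbers. The following sketch is an addition, not part of the original script. It assumes a fixed seed and defines two hypothetical helper functions: `skewness()` computes the sample skewness from the third central moment, and `binom_skew()` evaluates the standard formula for the skewness of a binomial distribution, (1-2p)/sqrt(size*p*(1-p)). Both are compared for size=10 and size=100 at p=0.25.

```r
set.seed(1)  # assumed seed, only so that the numbers are reproducible

# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Theoretical skewness of a binomial distribution
binom_skew <- function(size, p) (1 - 2 * p) / sqrt(size * p * (1 - p))

p <- 0.25
x25_10  <- rbinom(n = 100000, size = 10,  prob = p)
x25_100 <- rbinom(n = 100000, size = 100, prob = p)

# Simulated versus theoretical skewness for few and for many trials
rbind(simulated = c(size10 = skewness(x25_10), size100 = skewness(x25_100)),
      theory    = c(binom_skew(10, p), binom_skew(100, p)))
```

The skewness shrinks with 1/sqrt(size), so the right tail fades as the number of trials grows, and it vanishes entirely for p=0.5.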