Pablo Rodriguez
July 10th 2015
The core of the art of statistical inference is estimating the probability distribution of a given population by sampling.
Usually this cannot be done with perfect accuracy. As a rule of thumb, the larger the sample, the closer its mean and standard deviation will be to those of the population.
Let's see this in action.
Let's create a population with the following parameters:
popMean <- 0 # The population's mean
popSd <- 1 # The population's standard deviation
So it looks like:
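A minimal sketch of how such a plot could be drawn, assuming the population is normally distributed (which the use of `rnorm` further down suggests). The names `x` and `popDensity` and the plotting range of four standard deviations are illustrative choices, not part of the original analysis.

```r
popMean <- 0 # The population's mean
popSd <- 1   # The population's standard deviation

# Evaluate the normal density on a grid spanning +/- 4 standard deviations
x <- seq(popMean - 4 * popSd, popMean + 4 * popSd, length.out = 201)
popDensity <- dnorm(x, mean = popMean, sd = popSd)

# Draw the density curve and mark the population mean
plot(x, popDensity, type = "l",
     xlab = "Value", ylab = "Density",
     main = "Population distribution")
abline(v = popMean, lty = 2)
```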
If we sample 10 points from this population and calculate the mean and standard deviation of the sample, we get:
N <- 10
sample <- rnorm(N, popMean, popSd)
sampleMean <- mean(sample) # The sample's mean
sampleSd <- sd(sample) # The sample's standard deviation
print(sampleMean)
[1] 0.1596238
print(sampleSd)
[1] 0.8221985
As expected, the sample's mean and standard deviation are close to the population's values (0 and 1) but not exactly equal to them.
Bigger samples tend to lead to better estimates.
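One way to check this is to repeat the experiment for several sample sizes and compare how far the estimates land from the true values. The sketch below is only an illustration: the seed, the sample sizes, the number of repetitions (`reps`), and the `results` table are all assumptions introduced here. Averaging the absolute error over many repetitions smooths out the luck of any single draw.

```r
set.seed(42) # Arbitrary seed, used only to make the run reproducible

popMean <- 0 # The population's mean
popSd <- 1   # The population's standard deviation

sampleSizes <- c(10, 100, 1000, 10000)
reps <- 200  # Number of repeated samples drawn for each sample size

# For each sample size, average the absolute error of the sample mean
# and of the sample standard deviation over all repetitions
results <- t(sapply(sampleSizes, function(n) {
  errs <- replicate(reps, {
    x <- rnorm(n, popMean, popSd)
    c(abs(mean(x) - popMean), abs(sd(x) - popSd))
  })
  c(N = n, meanError = mean(errs[1, ]), sdError = mean(errs[2, ]))
}))

print(results)
```

Both error columns should shrink as `N` grows, roughly in proportion to one over the square root of the sample size.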