Inference simulation

Pablo Rodriguez
July 10th 2015

Slide 2: Description

The core of the art of statistical inference is estimating the probability distribution of a given population by sampling.

Usually this cannot be done with perfect accuracy. As a rule of thumb: the larger the sample, the closer its mean and standard deviation will be to those of the population.
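The rule of thumb can be made concrete for the mean: for independent draws, the standard error of the sample mean is the population standard deviation divided by the square root of the sample size. The snippet below is only a sketch of that formula; the names `sigma`, `n` and `standardError`, and the chosen sample sizes, are illustrative assumptions and not part of the original slides.

```r
# Theoretical standard error of the sample mean: sigma / sqrt(n).
# sigma and the sample sizes are illustrative assumptions.
sigma <- 1                        # assumed population standard deviation
n <- c(10, 100, 1000, 10000)      # sample sizes to compare
standardError <- sigma / sqrt(n)  # shrinks as n grows

print(data.frame(n = n, standardError = standardError))
```

Multiplying the sample size by 100 only divides the expected error by 10, which is why very precise estimates need very large samples.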

Let's see this in action.

Slide 3: Simulating a population

Let's create a population with the following parameters:

popMean <- 0 # The population's mean
popSd <- 1 # The population's standard deviation

Its distribution looks like this:

[Plot: the population distribution]
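The code behind this plot is not shown on the slide. Since the next slide samples with `rnorm`, the population is presumably normal, so a plot like it could be drawn roughly as follows. The grid `x`, the variable `popDensity` and the plot labels are assumptions made for this sketch.

```r
# Sketch: draw the density of a normal population.
popMean <- 0   # the population's mean
popSd <- 1     # the population's standard deviation

# Evaluate the density on a grid spanning four standard deviations each way
x <- seq(popMean - 4 * popSd, popMean + 4 * popSd, length.out = 200)
popDensity <- dnorm(x, mean = popMean, sd = popSd)

plot(x, popDensity, type = "l",
     xlab = "x", ylab = "Density", main = "Population distribution")
```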

Slide 4: Guessing parameters through sampling

If we draw a sample of 10 points from this population and calculate the sample's mean and standard deviation, we get:

N <- 10
sample <- rnorm(N, popMean, popSd)
sampleMean <- mean(sample) # The sample's mean
sampleSd <- sd(sample) # The sample's standard deviation
print(sampleMean)
[1] 0.1596238
print(sampleSd)
[1] 0.8221985

Slide 5: Plotting population vs. inferred distributions

As expected, the two distributions are similar but not identical.

[Plot: population vs. inferred distributions]
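The plotting code is not shown on the slide either. One possible way to produce such a comparison is to overlay the normal density with the true parameters and the normal density with the parameters estimated from the sample. The seed, the name `smp` (used here to avoid shadowing `base::sample`), the grid `x` and the styling are assumptions of this sketch.

```r
# Sketch: overlay the population density and the density inferred from a sample.
set.seed(42)   # arbitrary seed, assumed here only for reproducibility

popMean <- 0
popSd <- 1
N <- 10

smp <- rnorm(N, popMean, popSd)
sampleMean <- mean(smp)
sampleSd <- sd(smp)

x <- seq(-4, 4, length.out = 200)
popDensity <- dnorm(x, popMean, popSd)
inferredDensity <- dnorm(x, sampleMean, sampleSd)

# Solid line: population; dashed line: distribution inferred from the sample
plot(x, popDensity, type = "l", lwd = 2,
     ylim = c(0, max(popDensity, inferredDensity)),
     xlab = "x", ylab = "Density")
lines(x, inferredDensity, lty = 2, col = "red")
legend("topright", legend = c("Population", "Inferred from sample"),
       lty = c(1, 2), lwd = c(2, 1), col = c("black", "red"))
```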

Bigger samples tend to lead to better results.
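This claim can be checked with a small simulation: draw many samples of each size and average how far the sample mean and standard deviation fall from the true values. The sketch below is one way to do that; the seed, the sample sizes, the number of replicates and the helper `avgError` are all assumptions, not part of the original slides.

```r
# Sketch: average estimation error for increasing sample sizes.
set.seed(1)    # arbitrary seed, assumed here only for reproducibility

popMean <- 0
popSd <- 1
sampleSizes <- c(10, 100, 1000, 10000)
replicates <- 200   # samples drawn per sample size

# Mean absolute error of the sample mean and sd for one sample size
avgError <- function(n) {
  errs <- replicate(replicates, {
    s <- rnorm(n, popMean, popSd)
    c(meanErr = abs(mean(s) - popMean), sdErr = abs(sd(s) - popSd))
  })
  rowMeans(errs)
}

# One row per sample size, one column per error type
errors <- t(sapply(sampleSizes, avgError))
rownames(errors) <- sampleSizes
print(errors)
```

Averaging over replicates matters here: a single small sample can land close to the true values by chance, so the trend only shows up reliably on average.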