Sampling from the posterior

Last Week

Grid approximation - a simple and not really efficient way to approximate the posterior distribution.

hint: remember the distribution is continuous and we want to avoid integrals


library(tidyverse)

# Store draws (1 = R, 0 = B)
draws <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)

# We need to define how coarse the grid is
grid_points <- 100

# Define Bayes theorem through grid approximation
grid_posterior <- tibble(

    # GRID OF PARAMETER VALUES
    grid = seq(from       = 0, 
               to         = 1, 
               length.out = grid_points),
    # UNINFORMATIVE PRIOR
    prior      = 1,
    # LIKELIHOOD
    likelihood = dbinom(sum(draws), size = length(draws), prob = grid),
    # POSTERIOR
    posterior  = (prior * likelihood) / sum(prior * likelihood)
)

This week: Sampling

Why sampling?

To summarise the data
To simulate predictions

Why sampling?

If we want to sample to summarise the data (the posterior), we can simply sample with replacement the parameter values and create a frequency distribution of observed values

:::: columns

Why sampling?

Instead, if we want to sample to make predictions (inference), we need an extra step: 1. Sample a parameter value from the posterior 2. Simulate the next N observations with that value 3. Sample one of the observations (weighting each observation by its plausibility)

Repeat this over and over

Sampling

Imagine we put each possible value in a bag, in a proportion that respects its posterior plausibility. For instance, our bag will contain more 0.30 values that everything else.

We shuffle the bag and we extract one value at random.

Now we sample again

And again…

…and again…

… and again thousands of times

How to sample?

If we have a dataset containing the posterior values, we can sample with replacement its rows. If we have a vector, we can sample its values.

# if we work with a tibble/dataframe we can use dplyr `slice_sample`
slice_sample(my_dataframe, n = 1000, weight_by = plausibility, raplace = TRUE)

# If we work with a single vector of values we use the base function 'sample'
sample(my_vector, size = 1000, replace = TRUE, prob = plausibility)

n/size: number of samples to take
weight/prob: vector of plausibilities associated with each parameter value
replace: we want to add each value back after having extracted it

Our data looks like this at the moment

# A tibble: 6 × 3
    grid distribution plausibility
   <dbl> <chr>               <dbl>
1 0      prior            1   e+ 0
2 0      likelihood       0       
3 0      posterior        0       
4 0.0101 prior            1   e+ 0
5 0.0101 likelihood       3.40e-12
6 0.0101 posterior        4.12e-13

The weights/probs are stored in:

Our data looks like this at the moment

# A tibble: 6 × 3
    grid distribution plausibility
   <dbl> <chr>               <dbl>
1 0      prior            1   e+ 0
2 0      likelihood       0       
3 0      posterior        0       
4 0.0101 prior            1   e+ 0
5 0.0101 likelihood       3.40e-12
6 0.0101 posterior        4.12e-13

The weights/probs are stored in: plausibility

Our data looks like this at the moment

# A tibble: 6 × 3
    grid distribution plausibility
   <dbl> <chr>               <dbl>
1 0      prior            1   e+ 0
2 0      likelihood       0       
3 0      posterior        0       
4 0.0101 prior            1   e+ 0
5 0.0101 likelihood       3.40e-12
6 0.0101 posterior        4.12e-13

The weights/probs are stored in: plausibility

Having defined this, we can use slice_sample to sample from the posterior distribution

# NOTE: we need to select only the rows that refer to the posterior distribution!
# To do so we use `filter` to select those rows where distribution == posterior
posterior_samples <- grid_posterior %>% 
    filter(distribution == "posterior") %>% 
    slice_sample(n = 10000, weight_by = plausibility, replace = TRUE)

Looking at the posterior samples

Note, let’s compare the posterior distribution with the posterior samples

They look similar right?

However…

Pay attention that the two distributions have different meaning. Specifically, the sampling distribution is a distribution of frequencies of parameter values and not a distribution of plausibilities.

Summarise data

For instance, we want to know how much of the posterior probability lies between 0.3 and 0.7.

Summarise data

In this case we can compute the proportion of samples where the parameter value is within 0.3 and 0.7:

\[\text{interval} = \frac{\text{number samples in the interval}}{\text{total number of sample}}\]

posterior_samples %>% 
    summarise(proportion = sum(grid >= .3 & grid <= .7)/n())

# A tibble: 1 × 1
  proportion
       <dbl>
1      0.727

Another way to do this is:

[1] 0.7266

Remember: \(\frac{\text{number samples in the interval}}{\text{total number of sample}}\)

Summarise data

We can also find the compatibility/credibility intervals - aka the confidence intervals of the posterior distribution. While before we defined the boundaries of the data, here we define the mass we want to capture (eg. 80% of the data) and then we find the boundaries that contain this mass.

# Find the upper 20% of the data
quantile(posterior_samples$grid, .80)

      80% 
0.7272727

Summarise data

Skewed posterior distribution?

Let’s say we want to find the 50% compatibility interval of this distribution

Skewed posterior distribution?

# Compute the 50% compatibility interval
quantile(skewed_distribution$grid, c(.25, .75))

How to sample for prediction

Don’t worry, we break this down together step-by-step

# The first part is as before

# Assuming we have our grid posterior in long format from last time
grid_posterior %>%
    # We select only the rows that contain information regarding the posterior distribution
    filter(distribution == "posterior") %>% 
    # We now sample with replacement the remaining rows 
    slice_sample(n = 1000, weight_by = plausibility, replace = TRUE) %>% 
    
    # Now for each row (sample) we want to create a binomial distribution using the parameter value as probability
    # Basically it's like we are assuming that parameter value is the correct one. Once we have the binomial for each
    # row, we will sample one value from it as discussed at the beginning. To do all of this we:
    # 1. tell R to work rowwise (row-by-row)
    rowwise() %>% 
    #2. we create the binomial distribution and sample from it in one line. We will store only the sampled value
    mutate(sampled_values = sample(0:9, size = 1, prob = dbinom(0:9, 9, prob = grid)))

# A tibble: 6 × 4
# Rowwise: 
   grid distribution plausibility sampled_values
  <dbl> <chr>               <dbl>          <int>
1 0.677 posterior          0.0284              7
2 0.455 posterior          0.0142              4
3 0.616 posterior          0.0293              6
4 0.677 posterior          0.0284              6
5 0.687 posterior          0.0277              7
6 0.556 posterior          0.0255              8

To be continued

Exercises

From 3E1 to 3E5