Bayesian Engines

Last week

2E4. The Bayesian statistician Bruno de Finetti (1906–1985) began his 1973 book on probability theory with the declaration: “PROBABILITY DOES NOT EXIST.” The capitals appeared in the original, so I imagine de Finetti wanted us to shout this statement. What he meant is that probability is a device for describing uncertainty from the perspective of an observer with limited knowledge; it has no objective reality. Discuss the globe tossing example from the chapter, in light of this statement. What does it mean to say “the probability of water is 0.7”?

Discussion

In contrast, Bayesian estimates are valid for any sample size. This does not mean that more data isn’t helpful—it certainly is. Rather, the estimates have a clear and valid interpretation, no matter the sample size. But the price for this power is dependency upon the initial plausibilities, the prior. If the prior is a bad one, then the resulting inference will be misleading.

Video

Click here

Post-video discussion

Why sampling?

Implement the engines

Draw cards
Define grid resolution
Initialise grid
Define prior
Define likelihood
Compute posterior
Plot!

Draw cards

# Store draws (1 = R, 0 = B)
draws <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)

Grid resolution

# Store draws (1 = R, 0 = B)
draws <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)
# We need to define how coarse the grid is
grid_points <- 100

Implementing Grid Approximation

# Load the tidyverse, which provides tibble(), %>% and ggplot()
library(tidyverse)

# Store draws (1 = R, 0 = B)
draws <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)

# We need to define how coarse the grid is
grid_points <- 100

# Define Bayes theorem through grid approximation
grid_posterior <- tibble(

    # GRID OF PARAMETER VALUES
    
    # UNINFORMATIVE PRIOR
    
    # LIKELIHOOD
    
    # POSTERIOR
)

Grid of parameter values

A grid is simply a finite selection of values that the parameter(s) of interest can take. It’s a way to discretize a continuous distribution.

# Store draws (1 = R, 0 = B)
draws <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)

# We need to define how coarse the grid is
grid_points <- 100

# Define Bayes theorem through grid approximation
grid_posterior <- tibble(

    # GRID OF PARAMETER VALUES
    grid = seq(from       = 0, 
               to         = 1, 
               length.out = grid_points),
    # UNINFORMATIVE PRIOR
    
    # LIKELIHOOD
    
    # POSTERIOR
)
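
To get a feel for what grid_points controls, here is a minimal base R sketch with a deliberately coarse grid. The names coarse_grid, fine_grid, coarse_spacing and fine_spacing are only for illustration; they are not part of the code on the slides.

```r
# A deliberately coarse grid: 5 candidate values between 0 and 1
coarse_grid <- seq(from = 0, to = 1, length.out = 5)
coarse_grid

# The resolution used on the slides: 100 candidate values
fine_grid <- seq(from = 0, to = 1, length.out = 100)

# The distance between neighbouring grid values is 1 / (length.out - 1)
coarse_spacing <- diff(coarse_grid)[1]
fine_spacing   <- diff(fine_grid)[1]
c(coarse = coarse_spacing, fine = fine_spacing)
```

More grid points give a finer picture of the distribution, at the cost of more values to evaluate.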

Uninformative prior

An uninformative prior is a uniform distribution: every parameter value has the same probability as every other.

# Store draws (1 = R, 0 = B)
draws <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)

# We need to define how coarse the grid is
grid_points <- 100

# Define Bayes theorem through grid approximation
grid_posterior <- tibble(

    # GRID OF PARAMETER VALUES
    grid = seq(from       = 0, 
               to         = 1, 
               length.out = grid_points),
    # UNINFORMATIVE PRIOR
    prior      = 1,
    # LIKELIHOOD
    
    # POSTERIOR
)

Likelihood

The likelihood is the probability of the data given a specific parameter value, P(D|p). Our data consist of red and black cards, so we are asking: what is the probability of observing k red cards in N draws, given that parameter value?

# Store draws (1 = R, 0 = B)
draws <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)

# We need to define how coarse the grid is
grid_points <- 100

# Define Bayes theorem through grid approximation
grid_posterior <- tibble(

    # GRID OF PARAMETER VALUES
    grid = seq(from       = 0, 
               to         = 1, 
               length.out = grid_points),
    # UNINFORMATIVE PRIOR
    prior      = 1,
    # LIKELIHOOD
    likelihood = dbinom(sum(draws), size = length(draws), prob = grid),
    # POSTERIOR
)
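
To see what dbinom() is doing here, the binomial probability can be written out by hand for a single candidate value and compared with the built-in function. This is only a sketch: p_candidate = 0.5 is an arbitrary value picked for illustration, and the helper names (n_draws, n_red, likelihood_by_hand, likelihood_dbinom) are not from the slides.

```r
# Store draws (1 = R, 0 = B), as on the slides
draws <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)

n_draws <- length(draws)  # N: total number of draws
n_red   <- sum(draws)     # k: number of red cards

# Arbitrary candidate parameter value, chosen only for illustration
p_candidate <- 0.5

# Binomial probability written out: choose(N, k) * p^k * (1 - p)^(N - k)
likelihood_by_hand <- choose(n_draws, n_red) *
    p_candidate^n_red * (1 - p_candidate)^(n_draws - n_red)

# The same quantity from the built-in binomial density
likelihood_dbinom <- dbinom(n_red, size = n_draws, prob = p_candidate)

# The two should agree up to floating point error
all.equal(likelihood_by_hand, likelihood_dbinom)
```

In the grid code, prob = grid simply repeats this calculation for every candidate value at once, because dbinom() is vectorised over prob.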

Posterior

We have the grid, the prior and the likelihood. Let’s apply Bayes’ theorem.

# Store draws (1 = R, 0 = B)
draws <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)

# We need to define how coarse the grid is
grid_points <- 100

# Define Bayes theorem through grid approximation
grid_posterior <- tibble(

    # GRID OF PARAMETER VALUES
    grid = seq(from       = 0, 
               to         = 1, 
               length.out = grid_points),
    # UNINFORMATIVE PRIOR
    prior      = 1,
    # LIKELIHOOD
    likelihood = dbinom(sum(draws), size = length(draws), prob = grid),
    # POSTERIOR
    posterior  = (prior * likelihood) / sum(prior * likelihood)
)
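
A few sanity checks can help confirm the posterior behaves as expected. The sketch below redoes the computation with plain base R vectors so that it runs without extra packages. The check names (posterior_total, posterior_mode, observed_proportion, posterior_rescaled) and the rescaled prior value of 5 are assumptions made for illustration.

```r
draws       <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)
grid_points <- 100

# Same ingredients as in the tibble, stored as plain vectors
grid       <- seq(from = 0, to = 1, length.out = grid_points)
prior      <- rep(1, grid_points)
likelihood <- dbinom(sum(draws), size = length(draws), prob = grid)
posterior  <- (prior * likelihood) / sum(prior * likelihood)

# Check 1: the posterior is normalised, so it should sum to 1
posterior_total <- sum(posterior)

# Check 2: with a flat prior, the most plausible grid value should sit
# close to the observed proportion of red cards
posterior_mode      <- grid[which.max(posterior)]
observed_proportion <- mean(draws)

# Check 3: the height of a constant prior does not matter, because the
# constant cancels when dividing by the sum
prior_rescaled     <- rep(5, grid_points)
posterior_rescaled <- (prior_rescaled * likelihood) /
    sum(prior_rescaled * likelihood)

c(total = posterior_total, mode = posterior_mode, observed = observed_proportion)
all.equal(posterior, posterior_rescaled)
```

Check 3 is why prior = 1 works even though it is not itself a normalised probability distribution over the grid.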

Plot!

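grid_posterior %>% 
    # Reshape to long format: one row per grid value and distribution,
    # which creates the columns used in aes() below
    pivot_longer(cols      = c(prior, likelihood, posterior),
                 names_to  = "distribution",
                 values_to = "plausibility") %>% 
    ggplot(aes(x = grid, y = plausibility, colour = distribution)) +
    geom_point() +
    theme_minimal() +
    facet_grid(distribution ~ ., scales = "free_y")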

Exercises

2M1. Recall the globe tossing model from the chapter. Compute and plot the grid approximate posterior distribution for each of the following sets of observations. In each case, assume a uniform prior for p. (1) W, W, W (2) W, W, W, L (3) L, W, W, L, W, W, W

Exercises

2M2. Now assume a prior for p that is equal to zero when p < 0.5 and is a positive constant when p ≥ 0.5. Again compute and plot the grid approximate posterior distribution for each of the sets of observations in the problem just above.

Discussion

If you don’t have a strong argument for any particular prior, then try different ones. Because the prior is an assumption, it should be interrogated like other assumptions: by altering it and checking how sensitive inference is to the assumption. No one is required to swear an oath to the assumptions of a model, and no set of assumptions deserves our obedience.
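
One way to interrogate the prior is to run the same grid approximation under two priors and compare the results. The sketch below does this for the card draws in base R. The helper grid_update() and the "peaked" Beta(2, 2) prior are assumptions chosen for illustration, not something taken from the slides or the book.

```r
draws       <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)
grid_points <- 100
grid        <- seq(from = 0, to = 1, length.out = grid_points)
likelihood  <- dbinom(sum(draws), size = length(draws), prob = grid)

# Hypothetical helper: Bayes' theorem on a grid for a given prior
grid_update <- function(prior, likelihood) {
    unnormalised <- prior * likelihood
    unnormalised / sum(unnormalised)
}

# Two candidate priors over the same grid
priors <- list(
    flat   = rep(1, grid_points),                 # as on the slides
    peaked = dbeta(grid, shape1 = 2, shape2 = 2)  # assumption: favours values near 0.5
)

# One posterior per prior, all using the same likelihood
posteriors <- lapply(priors, grid_update, likelihood = likelihood)

# Summarise each posterior by its mean to see how far the prior moves it
posterior_means <- sapply(posteriors, function(post) sum(grid * post))
posterior_means
```

If the summaries barely move, inference is not very sensitive to this choice of prior; if they move a lot, the prior deserves a closer look.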

Next time

Finish reading chapter 2 (from 2.3 to 2.5)
Exercises: 2E1, 2E2, 2E3, 2M3, 2M4, 2M5, 2M6, 2M7