We have a bag with four marbles (or a globe with four sides, if you
wish). We don’t know how many are of each color.
W: Water (⚪), L: Land (⚫)

| # | Composition | Proportion of water |
|---|-------------|---------------------|
| 0 | ⚫⚫⚫⚫ | 0% water |
| 1 | ⚪⚫⚫⚫ | 25% water |
| 2 | ⚪⚪⚫⚫ | 50% water |
| 3 | ⚪⚪⚪⚫ | 75% water |
| 4 | ⚪⚪⚪⚪ | 100% water |
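For later reference, the five candidate worlds can also be written down as a small data frame (just a sketch; worlds, n_water, and p_water are names introduced here):
worlds <- data.frame(
  n_water = 0:4,        # number of water (white) marbles in the bag
  p_water = (0:4) / 4   # corresponding proportion of water
)
worlds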
Observation: WLW
Let’s count the number of ways each \(p\) (proportion of water) could produce the observation.
# Count the ways a bag with proportion p of water marbles (4*p water, 4 - 4*p land)
# can produce numW water draws and numL land draws
get_ways <- function(p, numW, numL) {
  ways <- (4 * p)^numW * (4 - 4 * p)^numL
  return(ways)
}
Let’s test it: give me the number of ways a 75%-water world can produce 2 W and 1 L.
get_ways(p=0.75, numW=2, numL=1)
## [1] 9
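As a sanity check, we could enumerate all \(4^3 = 64\) ordered three-draw paths from the 75%-water bag (3 water marbles, 1 land marble) and count how many match W, L, W. A minimal sketch (bag and paths are names introduced here for illustration):
bag <- c("W", "W", "W", "L")  # the 75% water bag: 3 water marbles, 1 land marble
# all ordered sequences of three draws with replacement: 4^3 = 64 paths
paths <- expand.grid(draw1 = bag, draw2 = bag, draw3 = bag)
# count the paths that produce the observation W, L, W
sum(paths$draw1 == "W" & paths$draw2 == "L" & paths$draw3 == "W")  # 3 * 1 * 3 = 9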
Now let’s apply it to all possible proportions in our 4-sided world:
p <- c(0, 0.25, 0.50, 0.75, 1)
sapply(p, get_ways, numW=2, numL=1)
## [1] 0 3 8 9 0
And let’s create the table:
ways <- sapply(p, get_ways, numW=2, numL=1)
prob <- ways/sum(ways)
data.frame(p, ways, prob)
## p ways prob
## 1 0.00 0 0.00
## 2 0.25 3 0.15
## 3 0.50 8 0.40
## 4 0.75 9 0.45
## 5 1.00 0 0.00
Good. We have the values. Now let’s generalize this into a function that takes the data and the p values as input and returns the same table.
compute_posterior <- function(my_sample, p) {
  W <- sum(my_sample == "W")  # number of observed waters
  L <- sum(my_sample == "L")  # number of observed lands
  ways <- sapply(p, get_ways, numW = W, numL = L)
  post <- ways / sum(ways)    # normalize so the values sum to 1
  data.frame(p, ways, post)
}
compute_posterior(c("W", "L", "W"), p=p)
## p ways post
## 1 0.00 0 0.00
## 2 0.25 3 0.15
## 3 0.50 8 0.40
## 4 0.75 9 0.45
## 5 1.00 0 0.00
Let’s try it with a new observation
compute_posterior(c("W", "L", "W","W"), p=p)
## p ways post
## 1 0.00 0 0.00000000
## 2 0.25 3 0.06521739
## 3 0.50 16 0.34782609
## 4 0.75 27 0.58695652
## 5 1.00 0 0.00000000
library(ggplot2)

posterior <- compute_posterior(c("W", "L", "W", "W"), p = p)
ggplot(posterior, aes(y = post, x = as.factor(p))) +
  geom_col()
Now, instead of four sides, let’s move to a continuous world, where the possible values of p range from 0 to 1.
In grid approximation, we pick equally spaced p values. Normally the grid would be much finer, but for demonstration let’s use intervals of 0.1:
p_grid <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
Since we don’t have any specific assumptions about the world we are in, let’s set the prior to 1 at each grid point.
prior <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
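The same grid and flat prior could be built more compactly with seq() and rep(), as we will do later for a finer grid:
p_grid <- seq(0, 1, by = 0.1)    # equally spaced grid of p values
prior <- rep(1, length(p_grid))  # flat prior: 1 at each grid point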
Now let’s adapt our get_ways() function for the grid:
get_ways_grid <- function(p, numW, numL) {
  # more like a likelihood, but same spirit
  # refer to page 33, or you can use dbinom(numW, numW + numL, p)
  likelihood <- (factorial(numW + numL) / (factorial(numW) * factorial(numL))) * (p^numW * (1 - p)^numL)
  return(likelihood)
}
Now let’s test it:
get_ways_grid(p=0.5, 2,1)
## [1] 0.375
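As the comment in get_ways_grid() suggests, the same value can be obtained from R’s built-in binomial density dbinom():
dbinom(2, size = 3, prob = 0.5)  # same as get_ways_grid(p = 0.5, 2, 1), i.e. 0.375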
compute_posterior_grid <- function(my_sample, prior, p) {
  W <- sum(my_sample == "W")
  L <- sum(my_sample == "L")
  # likelihood for each grid value (note: use the p argument, not the global p_grid)
  likelihood <- sapply(p, get_ways_grid, numW = W, numL = L)
  unstandardized_posterior <- likelihood * prior
  # normalize the posterior so it sums to 1
  posterior <- unstandardized_posterior / sum(unstandardized_posterior)
  return(posterior)
}
observation <- c("W", "L", "W")
prior <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
p_grid <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
posterior_grid <- compute_posterior_grid(observation, prior = prior, p_grid)
print(posterior_grid)
## [1] 0.00000000 0.01090909 0.03878788 0.07636364 0.11636364 0.15151515
## [7] 0.17454545 0.17818182 0.15515152 0.09818182 0.00000000
plot(p_grid, posterior_grid, "b")
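As a quick sanity check, the normalization step inside compute_posterior_grid() guarantees that the returned posterior sums to 1:
sum(posterior_grid)  # should be 1 (up to floating-point rounding)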
Using a mathematical derivation, we can get the exact posterior instead of approximating it. Note that in real cases we often cannot do this.
We will use the beta distribution for that. (You can take a look at the derivation here: https://www.youtube.com/watch?v=46Ym07yKf4A)
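Concretely, with a flat prior and an observation containing \(W\) waters and \(L\) lands, the posterior of \(p\) is the \( \mathrm{Beta}(W+1, L+1) \) distribution, i.e. it is proportional to \( p^{W}(1-p)^{L} \).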
## Note that this is true for the "unassuming" prior.
compute_posterior_analytical <- function(my_sample, p_values) {
  W <- sum(my_sample == "W")
  L <- sum(my_sample == "L")
  posterior <- dbeta(p_values, W + 1, L + 1)
  return(posterior)
}
Let’s evaluate the analytical posterior on a fine grid of p values:
p_values <- seq(0, 1, 0.01)  # we can be generous here
posterior_analytical <- compute_posterior_analytical(c("W", "L", "W"), p_values)
plot(p_values, posterior_analytical, "l")
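As a sanity check, we could overlay the earlier 11-point grid posterior on this analytical curve. Dividing the grid point probabilities by the grid spacing (0.1) turns them into an approximate density; a minimal sketch, assuming p_grid and posterior_grid from above are still in the workspace:
plot(p_values, posterior_analytical, "l")
points(p_grid, posterior_grid / 0.1, col = "red", pch = 19)  # 0.1 = grid spacing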
For a new observation with 4 W and 11 L, we can directly get the analytical answer using the beta distribution:
curve(dbeta(x, 4+1, 11+1), from=0, to=1)
We can also get the answer with our grid approximation function. Let’s first create our observation:
observation <- c(rep("W", 4), rep("L", 11))
print(observation)
## [1] "W" "W" "W" "W" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L"
Now let’s use our grid approximation function:
p_grid <- seq(0, 1, 0.01)
prior <- rep(1, length(p_grid))  # flat prior: 1 for each p value
posterior_grid <- compute_posterior_grid(observation, prior, p_grid)
plot(p_grid, posterior_grid, "b")
Now let’s use the posterior to simulate new data: draw p values from the posterior and, for each drawn p, simulate 5 new tosses.
Analytical way:
# Get a sample of p values
p_samples <- rbeta(10000, 4+1, 11+1)
# For each p value, draw 5 samples
W_sim_an <- rbinom(10000, 5, prob = p_samples)
plot(table(W_sim_an))
Grid approximation:
# Sample from p values, given their probabilities (posterior)
p_samples <- sample(p_grid, 10000, prob=posterior_grid, replace=TRUE)
# For each p value, draw 5 samples
W_sim_grid <- rbinom(10000, 5, prob = p_samples)
plot(table(W_sim_grid))
Now we can use the samples we have drawn, for example to estimate the probability of seeing 3 or more W in 5 new tosses.
Analytical way:
sum(W_sim_an >=3) / length(W_sim_an)
## [1] 0.1804
Grid approximation:
sum(W_sim_grid >=3) / length(W_sim_grid)
## [1] 0.1865
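For comparison, the same predictive probability could be computed exactly by integrating the binomial probability of each outcome over the Beta(4+1, 11+1) posterior. A minimal sketch using base R’s integrate() (pred_prob is a name introduced here):
# Exact P(3 or more W in 5 new tosses), averaging the binomial probability
# over the Beta(4+1, 11+1) posterior
pred_prob <- sapply(3:5, function(k) {
  integrate(function(p) dbinom(k, 5, p) * dbeta(p, 4 + 1, 11 + 1), 0, 1)$value
})
sum(pred_prob)  # roughly 0.18, close to both simulation-based estimates above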