Small Worlds and Large Worlds

H W 3

Q U E S T I O N S

1. DEFINE

Prior Pr(p) is what you know, think you know, or your best guess at the plausibility of each possible value of the parameter before seeing the data. In the example in the book, our prior is a uniform distribution (which we can change), and p can take any value from 0 to 1.

Posterior Pr(p|your data) ∝ Pr(your data|p)*Pr(p), or likelihood * prior; this is the updated distribution of the parameter after seeing your data. You divide likelihood * prior by the average likelihood (the likelihood averaged over the prior) to normalize the posterior so it sums to 1.

Likelihood Pr(your data|p) is the distribution function assigned to your observed variable: the probability of getting your data for a given value of p. In the example in the book, Fig. 2.6 shows the likelihood in the middle graphs, as the distribution of the new data. These particular data follow a binomial distribution (because each toss can only be W or L).

2. TEACH

Bayes’ Theorem to a 10 year old

THE QUESTION

Given the data below, what is the probability of having a green plant (G) given that it is short (S), Pr(G|S)?

\[ \color{#00B9BE}{\Pr(G\mid S)} = \frac{ \color{#CA75FD}{\Pr(S\mid G)} \color{#78A900}{\Pr(G)} }{ \color{#F0726A}{\Pr(S)} } \]

THE DATA

Pea Plant Data (from Analysis of Biological Data, Whitlock & Schluter)

          Green (G)   Yellow (Y)   Row Total
Tall          700         300        1,000
Short         300         100          400
Total       1,000         400        1,400

Pr(S|G): 300/1000 = 0.3
Pr(G): 1000/1400 = 0.7143
Pr(S): 400/1400 = 0.2857

(0.3*0.7143)/0.2857
## [1] 0.7500525

Pr(G|S) = 0.75
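A quick R check (my addition): since the table gives raw counts, you can also read Pr(G|S) straight off the Short row and confirm that Bayes' theorem gives the same answer.

#Bayes' theorem, using the table proportions
(300/1000 * 1000/1400)/(400/1400)
## [1] 0.75

#direct check: of the 400 short plants, 300 are green
300/400
## [1] 0.75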

3. QUESTION

What is the difference between Bayesian plausibility and a non-Bayesian estimator?

4. INSTALL brms

Done! I had to uninstall rlang and then reinstall both rlang and brms, and voila!

5. GARDEN OF FORKING PATHS

Maybe this is another way of saying that probability, as we often imagine it, is a farce, not because the world doesn’t branch, but because those branches are not equally live. The future is shaped by unresolved pasts. At each decision point Yu Tsun reaches, he believes his conscious mind is choosing, but it isn’t. His nervous system is. His childhood is.

He even names this himself:

“I carried out my plan because I felt the Chief had some fear of those of my race, of those uncountable forebears whose culmination lies in me. I wished to prove to him that a yellow man could save his armies. Besides, I had to escape the Captain. His hands and voice could, at any moment, knock and beckon at my door.”

Yu Tsun is not selecting among equally probable futures. He is acting from a history of racialized erasure and the urgent need to be seen as worthy. The child who was never seen is still governing the decision. In that sense, equal likelihood is a silly idea. Not because choice doesn’t exist, but because choice is constrained by internal states that are not neutral in the animal world. And that is the maze. It was never external, it was always internal.

N O T E S

Notes on Chapter 2 of Statistical Rethinking, by Richard McElreath.

The model below refers to tossing a globe and seeing whether it lands on water (W) or land (L).


2.3 A Model is born

Below you can see the model math! We are using a binomial likelihood (the common coin-tossing distribution), because each observation is either W or L. You can easily compute the likelihood with this distribution in R. We have n=9, with 6 waters and 3 lands:

dbinom(6, 9, prob=0.5)
## [1] 0.1640625


Here we are assuming water and land have an equal chance of showing up on our globe toss…but we could change the probability and see how the likelihood changes.

dbinom(6, 9, prob=0.1)
## [1] 6.1236e-05
dbinom(6, 9, 0.9)
## [1] 0.04464104
dbinom(6, 9, 0.67) #using this because it is roughly the maximum-likelihood value, 6/9
## [1] 0.2730674
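As a quick sanity check (my own sketch, not from the book), we can scan a fine grid of p values and confirm that the likelihood of 6 W in 9 tosses peaks near 6/9 ≈ 0.67:

#my sketch: which p maximizes the likelihood of 6 W in 9 tosses?
p_seq <- seq(0, 1, by=0.001)
p_seq[which.max(dbinom(6, 9, p_seq))]
## [1] 0.667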


The unobserved parameter p similarly gets a distribution: a uniform, flat prior over its entire possible range, from 0 to 1. This is obviously not the best we can do, since we know the Earth has more water than land…but let’s assume ignorance.

\(W\) ~ \(Binomial(N,p)\)

Where N = W + L, and the unobserved parameter \(p\) gets the following prior:

\(p\) ~ \(Uniform(0,1)\)
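To make the model concrete, here is a minimal sketch (my addition, not from the chapter) of the generative story those two lines tell: draw p from its uniform prior, then draw W from the binomial. The counts you get back are the prior predictive distribution of the data.

#my sketch of the generative model above
N <- 9
p_sim <- runif(1e4, 0, 1)                  #p ~ Uniform(0,1)
W_sim <- rbinom(1e4, size=N, prob=p_sim)   #W ~ Binomial(N, p)
table(W_sim)/1e4                           #prior predictive distribution of W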



2.4 Making the model go

Bayes’ Theorem:

\[ \color{#00B9BE}{\Pr(p\mid W,L)} = \frac{ \color{#CA75FD}{\Pr(W,L\mid p)} \color{#78A900}{\Pr(p)} }{ \color{#F0726A}{\Pr(W,L)} } \]

POSTERIOR, LIKELIHOOD, PRIOR, EVIDENCE



The denominator \(Pr(W,L)\) is the EVIDENCE: the likelihood averaged over the PRIOR, which standardizes the POSTERIOR so it sums to 1. The denominator does not depend on the parameter, so it does not affect the shape of the POSTERIOR. For shape, comparison, intuition, and plotting you can drop it. For actual probabilities you should put it back via normalization.

The posterior is proportional to the prior times the likelihood; the denominator only normalizes.

The key lesson:

\[ \color{#00B9BE}{Posterior} \propto \color{#CA75FD}{Likelihood} * \color{#78A900}{Prior} \]
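A small numeric check (my own, using the 6-of-9 data): multiply likelihood by prior, and whether or not you divide by the evidence, the peak and shape of the posterior are the same; normalizing only rescales.

#my check: the denominator only rescales
p_grid <- seq(0, 1, length.out=100)
unstd <- dbinom(6, 9, p_grid) * dunif(p_grid)  #likelihood * prior
post <- unstd/sum(unstd)                       #normalized so it sums to 1
p_grid[which.max(unstd)]                       #same peak...
p_grid[which.max(post)]                        #...either way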



MOTORS

The 3 conditioning engines used in this book are:

(1) Grid approximation
#define grid
p_grid <- seq(0, 1, length.out=20)

#define prior 
prior <- rep(1,20)

#compute likelihood at each value in grid
likelihood <- dbinom(6, 9, p_grid)

#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior

#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)

#now you have a grid of 20 points. 
par(mfrow = c(1, 3))

#plot it
plot(p_grid, posterior, type = "b", xlab="probability of water", ylab="posterior probability 1")
mtext("20 points")

#do it again for different priors

prior2 <- ifelse(p_grid < 0.5, 0, 1)
unstandardized.posterior2 <- likelihood*prior2
posterior2 <- unstandardized.posterior2/sum(unstandardized.posterior2)
plot(p_grid, posterior2, type = "b", xlab="probability of water", ylab="posterior probability 2")

prior3 <- exp(-5*abs(p_grid-0.5))
unstandardized.posterior3 <- likelihood*prior3
posterior3 <- unstandardized.posterior3/sum(unstandardized.posterior3)
plot(p_grid, posterior3, type = "b", xlab="probability of water", ylab="posterior probability 3")

(2) Quadratic approximation
library(rethinking)

globe_quadratic <- quap(
  alist(
    W ~ dbinom(W+L,p), #binomial likelihood
    p~dunif(0,1) #uniform prior
    ), 
  data=list(W=6,L=3))

#display summary of quadratic approximation
precis(globe_quadratic)
##        mean        sd      5.5%     94.5%
## p 0.6666669 0.1571337 0.4155368 0.9177969


The interval reported is an 89% percentile interval. Assuming the posterior is Gaussian, it is maximized at 0.67 with a standard deviation of 0.16 for n=9.
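One way to see where those 5.5% and 94.5% columns come from (my own check, not in the book): they are just quantiles of the Gaussian that quap fit.

#my check: the interval bounds are Gaussian quantiles of the approximation
qnorm(c(0.055, 0.945), mean=0.6667, sd=0.1571)  #matches the precis columns above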

#Now updating with sample size of 18, 12 water, 6 land

globe_quadratic <- quap(
  alist(
    W ~ dbinom(W+L,p), #binomial likelihood
    p~dunif(0,1) #uniform prior
    ), 
  data=list(W=12,L=6))

precis(globe_quadratic)
##        mean        sd      5.5%     94.5%
## p 0.6666662 0.1111104 0.4890902 0.8442421


Again an 89% percentile interval. Assuming the posterior is Gaussian, it is maximized at 0.67 with a standard deviation of 0.11 for n=18: more data gives a narrower posterior.


par(mfrow=c(1,2))
#check it using the analytic calculation
W <- 6
L <- 3

curve(dbeta(x, W+1, L+1), from=0, to=1, main="red = quadratic approx.", xlab="proportion of water, n=9", ylab="density", col="steelblue")

#quadratic approximation
curve(dnorm(x, 0.67, 0.16), lty=2, add=T, col="coral")

#See what happens as you increase your sample size
W2 <- 12
L2 <- 6

curve(dbeta(x, W2+1, L2+1), from=0, to=1, main="blue = exact posterior dist.", xlab="proportion of water, n=18", ylab="density", col="steelblue")

#quadratic approximation
curve(dnorm(x, 0.67, 0.11), lty=2, add=T, col="coral")

(3) Markov chain Monte Carlo (MCMC)
n_samples <- 1000
p <- rep(NA, n_samples) #create an empty vector to iterate over

p[1]<-0.5 #place 0.5 into the first NA in the vector above
W <- 6
L <- 3

#Metropolis loop: propose a new p, then accept it with probability q1/q0
for (i in 2:n_samples) {
  p_new <- rnorm(1, p[i-1], 0.1)
  if (p_new < 0) p_new <- abs(p_new)
  if (p_new > 1) p_new <- 2-p_new
  q0 <- dbinom(W, W+L, p[i-1])
  q1 <- dbinom(W, W+L, p_new)
  p[i] <- ifelse(runif(1) < q1/q0, p_new, p[i-1])
}


The values in p are samples from the posterior distribution. Now, to compare against the analytical posterior:

dens(p, xlim=c(0,1))
curve(dbeta(x, W+1, L+1), lty=2, add=T)
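One extra diagnostic (my addition, not from the book): the fraction of iterations in which the chain moved gives a rough Metropolis acceptance rate.

#my addition: rough acceptance rate of the Metropolis proposals
mean(diff(p) != 0)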

P R O B Z

1| Write an expression that corresponds to the statement: the probability of rain on Monday?

Pr(rain|Monday) = Pr(rain, Monday)/Pr(Monday)

2| Write a sentence for: Pr(Monday|rain)

The probability it is Monday given that it’s raining.

3| Write an expression for this statement: the probability that it is Monday, given that it is raining?

Pr(Monday|rain) = Pr(rain|Monday)Pr(Monday)/Pr(rain)

5| Recall the globe tossing model from the chapter. Compute and plot the grid approximate posterior distribution for each of the following sets of observations. In each case, assume a uniform prior for p.

  1. W, W, W
  2. W, W, W, L
  3. L, W, W, L, W, W, W
#define grid
p_grid <- seq(0, 1, length.out=20)

#define prior 
prior <- rep(1,20)

#compute likelihood at each value in grid
likelihood <- dbinom(3, 3, p_grid)

#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior

#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)

#now you have a grid of 20 points. 
par(mfrow = c(1, 3))

#plot it
plot(p_grid, posterior, type = "c", xlab="probability of water", ylab="posterior probability 3/3 W")

#compute likelihood at each value in grid
likelihood <- dbinom(3, 4, p_grid)

#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior

#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)

#plot it
plot(p_grid, posterior, type = "b", xlab="probability of water", ylab="posterior probability 3/4 W")

#compute likelihood at each value in grid
likelihood <- dbinom(5, 7, p_grid)

#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior

#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)

#plot it
plot(p_grid, posterior, type = "l", xlab="probability of water", ylab="posterior probability 5/7 W")


6| Now assume a prior for p that is equal to zero when p < 0.5 and is a positive constant when p ≥ 0.5. Again compute and plot the grid approximate posterior distribution for each of the sets of observations in the problem just above.

p_grid <- seq(0, 1, length.out=20)
#define prior 
prior <- ifelse(p_grid < 0.5, 0, 1) #this is the only thing that changes
likelihood <- dbinom(3, 3, p_grid)
unstandardized.posterior <- likelihood*prior
posterior <- unstandardized.posterior/sum(unstandardized.posterior)
par(mfrow = c(1, 3))

#plot it
plot(p_grid, posterior, type = "c", xlab="probability of water", ylab="posterior probability 3/3 W", main="PRIOR CHANGE")

#compute likelihood at each value in grid
likelihood <- dbinom(3, 4, p_grid)

#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior

#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)

#plot it
plot(p_grid, posterior, type = "b", xlab="probability of water", ylab="posterior probability 3/4 W")

#compute likelihood at each value in grid
likelihood <- dbinom(5, 7, p_grid)

#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior

#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)

#plot it
plot(p_grid, posterior, type = "l", xlab="probability of water", ylab="posterior probability 5/7 W")


7| Suppose there are two globes, one for Earth and one for Mars. The Earth globe is 70% covered in water. The Mars globe is 100% land. Further suppose that one of these globes—you don’t know which—was tossed in the air and produced a “land” observation. Assume that each globe was equally likely to be tossed. Show that the posterior probability that the globe was the Earth, conditional on seeing “land” (Pr(Earth|land)), is 0.23.

KNOWNS: the Earth globe is 0.7 water; the Mars globe is 0 water.

Pr(Earth|land) = 0.23 (to show)
Pr(land|Earth) = 0.30
Pr(land|Mars) = 1.0

The priors are Pr(Earth) = Pr(Mars) = 0.5.

\(Pr(Earth|land)=\frac{Pr(land|Earth)Pr(Earth)}{Pr(land)}\)

The probability in the denominator expands to the average probability of land:

\(Pr(Earth|land)=\frac{Pr(land|Earth)Pr(Earth)}{Pr(land|Earth)Pr(Earth)+Pr(land|Mars)Pr(Mars)}\)

Now just plug the values in:

0.3*0.5/(0.3*0.5+1*0.5)
## [1] 0.2307692

The prior probability was 0.5 and the updated probability is now 0.23.

8| Suppose you have a deck with only three cards. Each card has two sides, and each side is either black or white. One card has two black sides. The second card has one black and one white side. The third card has two white sides. Now suppose all three cards are placed in a bag and shuffled. Someone reaches into the bag and pulls out a card and places it flat on a table. A black side is shown facing up, but you don’t know the color of the side facing down. Show that the probability that the other side is also black is 2/3. Use the counting method (Section 2 of the chapter) to approach this problem. This means counting up the ways that each card could produce the observed data (a black side facing up on the table).

The three cards:

black -> black
black -> white
white -> white

Counting the ways each card could produce the observed black side facing up:

black -> black: 2 ways (either black side could be the one we saw)
black -> white: 1 way
white -> white: 0 ways

Of the 3 ways to see a black side up, 2 come from the black/black card, so the probability the other side is also black is 2/3.
2/3 ways to see black on the other side.