QUESTIONS
1. DEFINE
Prior: Pr(p) is your estimate, before seeing the data, of how plausible each possible value of the parameter is. It captures what you know, think you know, or your best guess. In the example in the book, the prior is a uniform distribution (which we can change), and p can take any value from 0 to 1.
Posterior: Pr(your data|p)*Pr(p), or likelihood * prior. This is the updated distribution of the parameter after seeing the data. You divide the likelihood * prior product by the average likelihood (the likelihood averaged over the prior) so the posterior sums to 1.
Likelihood: Pr(your data|p) is the distribution function assigned to your observed variable: the probability of getting your data for a given value of p. In the example in the book, Fig. 2.6 shows the likelihood in the middle graphs, as the distribution of the new data. These particular data follow a binomial distribution (because each toss can only come up W or L).
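A tiny numeric sketch of how these three pieces combine (my own illustration, using the book's 6-waters-in-9-tosses data that shows up later): score two candidate values of p with likelihood times prior, then divide by the sum so the result adds to 1.
#toy normalization sketch: two candidate values of p, flat prior
likelihood <- dbinom(6, 9, c(0.5, 0.7)) #Pr(data|p) for each candidate
prior <- c(1, 1) #flat prior
posterior <- likelihood*prior/sum(likelihood*prior)
posterior #sums to 1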
2. TEACH
Bayes’ Theorem to a 10-year-old
THE QUESTION
Given the data below, what is the probability of having a green plant (G) given that it is short (S), Pr(G|S)?
\[ \color{#00B9BE}{\Pr(G\mid S)} = \frac{ \color{#CA75FD}{\Pr(S\mid G)} \color{#78A900}{\Pr(G)} }{ \color{#F0726A}{\Pr(S)} } \]
THE DATA
| Pea Plant Data | Green (G) | Yellow (Y) | Row Total |
|---|---|---|---|
| Tall | 700 | 300 | 1,000 |
| Short | 300 | 100 | 400 |
| Total | 1,000 | 400 | 1,400 |
Data from Analysis of Biological Data, Whitlock & Schluter.
Pr(S|G) = 300/1000 = 0.3
Pr(G) = 1000/1400 = 0.7143
Pr(S) = 400/1400 = 0.2857
(0.3*0.7143)/0.2857
## [1] 0.7500525
Pr(G|S) = 0.75
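A quick sanity check straight from the table (my addition, not part of the original write-up): of the 400 short plants, 300 are green, so Pr(G|S) should come out to 300/400.
#check Pr(G|S) directly from the table counts: green short plants / all short plants
300/400 #0.75, matching the Bayes' Theorem result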
3. QUESTION
What is the difference between Bayesian plausibility and a non-Bayesian estimator?
4. INSTALL brms
Done! I had to uninstall rlang and then reinstall both
rlang and brms, and voila!
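For reference, the reinstall amounted to something like the sketch below (package names from the note above; the exact steps may differ on your machine).
#remove the stale rlang, then reinstall rlang and brms fresh
remove.packages("rlang")
install.packages(c("rlang", "brms"))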
5. GARDEN OF FORKING PATHS
Maybe this is another way of saying that probability, as we often imagine it, is a farce, not because the world doesn’t branch, but because those branches are not equally live. The future is shaped by unresolved pasts. At each decision point Yu Tsun reaches, he believes his conscious mind is choosing, but it isn’t. His nervous system is. His childhood is.
He even names this himself:
“I carried out my plan because I felt the Chief had some fear of those of my race, of those uncountable forebears whose culmination lies in me. I wished to prove to him that a yellow man could save his armies. Besides, I had to escape the Captain. His hands and voice could, at any moment, knock and beckon at my door.”
Yu Tsun is not selecting among equally probable futures. He is acting from a history of racialized erasure and the urgent need to be seen as worthy. The child who was never seen is still governing the decision. In that sense, equal likelihood is a silly idea. Not because choice doesn’t exist, but because choice is constrained by internal states that are not neutral in the animal world. And that is the maze. It was never external, it was always internal.
The model below references spinning a globe and recording whether it lands on water (W) or land (L). These are notes from Chapter 2 of Statistical Rethinking, by Richard McElreath.
Below is the model math. We are using a binomial likelihood (the common coin-tossing distribution), because each observation in our data is either W or L. You can easily compute the likelihood with this distribution in R. We have n = 9 tosses, with 6 waters and 3 lands:
dbinom(6, 9, prob=0.5)
## [1] 0.1640625
Here we are assuming water and land have an equal chance of showing up on our globe toss…but we could change the probability and see how the likelihood changes.
dbinom(6, 9, prob=0.1)
## [1] 6.1236e-05
dbinom(6, 9, 0.9)
## [1] 0.04464104
dbinom(6, 9, 0.67) #using this because 6/9 = 0.67 is the maximum-likelihood value
## [1] 0.2730674
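To see that the likelihood really does peak near 6/9, here is a quick sketch (not from the book) that evaluates it across a grid of p values and marks the observed proportion.
#sketch: plot the likelihood of 6 W in 9 tosses across values of p
p_vals <- seq(0, 1, length.out=100)
plot(p_vals, dbinom(6, 9, p_vals), type="l", xlab="p (proportion of water)", ylab="likelihood")
abline(v=6/9, lty=2) #the curve peaks at the observed proportion, 6/9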
The unobserved parameter p similarly gets a uniform distribution. This means p has a uniform (flat) prior over its entire possible range, from 0 to 1. This is obviously not the best we can do, since we know the Earth has more water than land…but let’s assume ignorance.
\(W\) ~ \(Binomial(N,p)\)
Where N = W + L, and the unobserved parameter \(p\) gets the following prior:
\(p\) ~ \(Uniform(0,1)\)
\[ \color{#00B9BE}{\Pr(p\mid W,L)} = \frac{ \color{#CA75FD}{\Pr(W,L\mid p)} \color{#78A900}{\Pr(p)} }{ \color{#F0726A}{\Pr(W,L)} } \]
POSTERIOR, LIKELIHOOD, PRIOR, EVIDENCE
The denominator \(Pr(W,L)\) is the EVIDENCE: the likelihood averaged over the PRIOR. It standardizes the POSTERIOR so it sums to 1. The denominator does not depend on the parameter, so it does not affect the shape of the POSTERIOR. For shape, comparison, intuition, and plotting you can drop it. For actual probabilities you should put it back via normalization.
The posterior is proportional to the prior times the likelihood; the denominator only normalizes.
\[ \color{#00B9BE}{Posterior} \propto \color{#CA75FD}{Likelihood} * \color{#78A900}{Prior} \]
The 3 conditioning engines used in this book are grid approximation, quadratic approximation, and Markov chain Monte Carlo (MCMC). First, grid approximation:
#define grid
p_grid <- seq(0, 1, length.out=20)
#define prior
prior <- rep(1,20)
#compute likelihood at each value in grid
likelihood <- dbinom(6, 9, p_grid)
#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior
#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)
#now you have a grid of 20 points.
par(mfrow = c(1, 3))
#plot it
plot(p_grid, posterior, type = "b", xlab="probability of water", ylab="posterior probability 1")
mtext("20 points")
#do it again for different priors
prior2 <- ifelse(p_grid < 0.5, 0, 1)
unstandardized.posterior2 <- likelihood*prior2
posterior2 <- unstandardized.posterior2/sum(unstandardized.posterior2)
plot(p_grid, posterior2, type = "b", xlab="probability of water", ylab="posterior probability 2")
prior3 <- exp(-5*abs(p_grid-0.5))
unstandardized.posterior3 <- likelihood*prior3
posterior3 <- unstandardized.posterior3/sum(unstandardized.posterior3)
plot(p_grid, posterior3, type = "b", xlab="probability of water", ylab="posterior probability 3")
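The grid’s resolution also matters: with only 20 points the curve is coarse. As a quick sketch of my own (same data, same flat prior), re-running the approximation with 100 points gives a much smoother picture.
#sketch: the same grid approximation with a denser, 100-point grid
p_grid_fine <- seq(0, 1, length.out=100)
posterior_fine <- dbinom(6, 9, p_grid_fine) * rep(1, 100) #flat prior
posterior_fine <- posterior_fine/sum(posterior_fine)
plot(p_grid_fine, posterior_fine, type="l", xlab="probability of water", ylab="posterior probability")
mtext("100 points")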
#Quadratic approximation with quap() from the rethinking package
library(rethinking)
globe_quadratic <- quap(
alist(
W ~ dbinom(W+L,p), #binomial likelihood
p~dunif(0,1) #uniform prior
),
data=list(W=6,L=3))
#display summary of quadratic approximation
precis(globe_quadratic)
## mean sd 5.5% 94.5%
## p 0.6666669 0.1571337 0.4155368 0.9177969
The 5.5% and 94.5% columns give an 89% percentile interval. Assuming the posterior is Gaussian, it is maximized at 0.67 and its standard deviation is 0.16 for n = 9.
#Now updating with sample size of 18, 12 water, 6 land
globe_quadratic <- quap(
alist(
W ~ dbinom(W+L,p), #binomial likelihood
p~dunif(0,1) #uniform prior
),
data=list(W=12,L=6))
precis(globe_quadratic)
## mean sd 5.5% 94.5%
## p 0.6666662 0.1111104 0.4890902 0.8442421
Again the 5.5% and 94.5% columns give an 89% percentile interval. Assuming the posterior is Gaussian, it is still maximized at 0.67, and its standard deviation shrinks to 0.11 for n = 18.
par(mfrow=c(1,2))
#check it using the analytic calculation
W <- 6
L <- 3
curve(dbeta(x, W+1, L+1), from=0, to=1, main="red = quadratic approx.", xlab="proportion of water, n=9", ylab="density", col="steelblue")
#quadratic approximation
curve(dnorm(x, 0.67, 0.16), lty=2, add=T, col="coral")
#See what happens as you increase your sample size
W2 <- 12
L2 <- 6
curve(dbeta(x, W2+1, L2+1), from=0, to=1, main="blue = exact posterior dist.", xlab="proportion of water, n=18", ylab="density", col="steelblue")
#quadratic approximation
curve(dnorm(x, 0.67, 0.11), lty=2, add=T, col="coral")
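Why dbeta(x, W+1, L+1) is the exact posterior here (a short sketch, assuming the flat prior used above): a Uniform(0,1) prior is Beta(1,1), and multiplying it by the binomial likelihood leaves the kernel of a Beta distribution with the counts added to the shape parameters.
\[ \Pr(p \mid W, L) \propto p^{W}(1-p)^{L} \times 1 \quad\Rightarrow\quad p \mid W, L \sim \text{Beta}(W+1,\ L+1) \]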
#Markov chain Monte Carlo (MCMC): a simple Metropolis sampler for the globe tossing data
n_samples <- 1000
p <- rep(NA, n_samples) #create an empty vector to iterate over
p[1]<-0.5 #place 0.5 into the first NA in the vector above
W <- 6
L <- 3
#create a for loop: the Metropolis algorithm
for (i in 2:n_samples) {
  #propose a new value of p near the current one
  p_new <- rnorm(1, p[i-1], 0.1)
  #reflect proposals that fall outside [0, 1] back into range
  if (p_new < 0) p_new <- abs(p_new)
  if (p_new > 1) p_new <- 2-p_new
  #likelihood of the data at the current and proposed values
  q0 <- dbinom(W, W+L, p[i-1])
  q1 <- dbinom(W, W+L, p_new)
  #accept the proposal with probability q1/q0, otherwise keep the current value
  p[i] <- ifelse(runif(1) < q1/q0, p_new, p[i-1])
}
The values in p are samples from the posterior distribution. Now, compare to the analytic posterior:
dens(p, xlim=c(0,1))
curve(dbeta(x, W+1, L+1), lty=2, add=T)
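As an extra check (a sketch of mine, not in the original notes), you can summarize the Metropolis samples numerically and compare them with the quadratic approximation’s mean of 0.67 and 89% interval.
#posterior mean and 89% percentile interval from the MCMC samples
mean(p)
quantile(p, c(0.055, 0.945))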
1| Write an expression that corresponds to the statement: the probability of rain on Monday?
Pr(rain|Monday)
Pr(rain, Monday)/Pr(Monday)
2| Write a sentence for:
Pr(Monday|rain)
The probability it is Monday given that it’s raining.
3| Write an expression for this statement: the probability that it is Monday, given that it is raining?
Pr(Monday|rain)
Pr(rain|Monday)Pr(Monday)/Pr(rain)
5| Recall the globe tossing model from the chapter. Compute
and plot the grid approximate posterior distribution for each of the
following sets of observations. In each case, assume a uniform prior for
p.
#define grid
p_grid <- seq(0, 1, length.out=20)
#define prior
prior <- rep(1,20)
#compute likelihood at each value in grid
likelihood <- dbinom(3, 3, p_grid)
#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior
#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)
#now you have a grid of 20 points.
par(mfrow = c(1, 3))
#plot it
plot(p_grid, posterior, type = "c", xlab="probability of water", ylab="posterior probability 3/3 W")
#compute likelihood at each value in grid
likelihood <- dbinom(3, 4, p_grid)
#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior
#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)
#plot it
plot(p_grid, posterior, type = "b", xlab="probability of water", ylab="posterior probability 3/4 W")
#compute likelihood at each value in grid
likelihood <- dbinom(5, 7, p_grid)
#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior
#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)
#plot it
plot(p_grid, posterior, type = "l", xlab="probability of water", ylab="posterior probability 5/7 W")
6| Now assume a prior for p that is equal to zero when p <
0.5 and is a positive constant when p ≥ 0.5. Again compute and plot the
grid approximate posterior distribution for each of the sets of
observations in the problem just above.
p_grid <- seq(0, 1, length.out=20)
#define prior
prior <- ifelse(p_grid < 0.5, 0, 1) #this is the only thing that changes
likelihood <- dbinom(3, 3, p_grid)
unstandardized.posterior <- likelihood*prior
posterior <- unstandardized.posterior/sum(unstandardized.posterior)
par(mfrow = c(1, 3))
#plot it
plot(p_grid, posterior, type = "c", xlab="probability of water", ylab="posterior probability 3/3 W", main="PRIOR CHANGE")
#compute likelihood at each value in grid
likelihood <- dbinom(3, 4, p_grid)
#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior
#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)
#plot it
plot(p_grid, posterior, type = "b", xlab="probability of water", ylab="posterior probability 3/4 W")
#compute likelihood at each value in grid
likelihood <- dbinom(5, 7, p_grid)
#compute product of likelihood and prior
unstandardized.posterior <- likelihood*prior
#standardize the posterior, so it sums to one
posterior <- unstandardized.posterior/sum(unstandardized.posterior)
#plot it
plot(p_grid, posterior, type = "l", xlab="probability of water", ylab="posterior probability 5/7 W")
7| Suppose there are two globes, one for Earth and one for
Mars. The Earth globe is 70% covered in water. The Mars globe is 100%
land. Further suppose that one of these globes—you don’t know which—was
tossed in the air and produced a “land” observation. Assume that each
globe was equally likely to be tossed. Show that the posterior
probability that the globe was the Earth, conditional on seeing “land”
(Pr(Earth|land)), is 0.23.
KNOWNS
Earth is 0.7 water, so Pr(land|Earth) = 1 - 0.7 = 0.3
Mars is 0 water, so Pr(land|Mars) = 1.0
Each globe was equally likely to be tossed, so the priors are Pr(Earth) = Pr(Mars) = 0.5
Target: Pr(Earth|land) = 0.23
\(Pr(Earth|land)=\frac{Pr(land|Earth)Pr(Earth)}{Pr(land)}\)
The denominator expands to the average probability of land:
\(Pr(Earth|land)=\frac{Pr(land|Earth)Pr(Earth)}{Pr(land|Earth)Pr(Earth)+Pr(land|Mars)Pr(Mars)}\)
Now just plug the values in:
0.3*0.5/(0.3*0.5+1*0.5)
## [1] 0.2307692
The prior probability was 0.5 and the updated probability is now
0.23.
8| Suppose you have a deck with only three cards. Each card has two sides, and each side is either black or white. One card has two black sides. The second card has one black and one white side. The third card has two white sides. Now suppose all three cards are placed in a bag and shuffled. Someone reaches into the bag and pulls out a card and places it flat on a table. A black side is shown facing up, but you don’t know the color of the side facing down. Show that the probability that the other side is also black is 2/3. Use the counting method (Section 2 of the chapter) to approach this problem. This means counting up the ways that each card could produce the observed data (a black side facing up on the table).
The three cards: B/B, B/W, W/W.
Count the ways each card could produce the observed data (a black side facing up):
B/B card: 2 ways (either of its two black sides could be the one showing)
B/W card: 1 way
W/W card: 0 ways
That gives 3 ways total to see black facing up, and in 2 of them (both from the B/B card) the other side is also black, so the probability is 2/3.
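The same count written as a quick R sketch (my own check, not from the book): multiply the ways each card shows black face up by whether its hidden face is also black, then divide by the total ways of showing black.
#counting method: ways each card shows black up, and whether the hidden side is black
ways_black_up <- c(BB=2, BW=1, WW=0)
hidden_black <- c(BB=1, BW=0, WW=0)
sum(ways_black_up*hidden_black)/sum(ways_black_up) #2/3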