library(ggplot2)

1 DISCLAIMER

Below are my solutions to the exercises in Chapter 2 of Statistical Rethinking by Richard McElreath. Check out Richard’s page for details on the book and for the online lectures here.

The results might not be correct, and I would appreciate comments and thoughts below.

If you haven’t tried to solve the exercises yourself, then this might not be very useful to you. Please don’t copy these results and hand them in as your own.

2 Easy

2.1 Exercise 2E1

the probability of rain on Monday

I think the statement could be interpreted in two different ways. First, the statement might just mean “what is the probability of rain on any given day”. In this case, statement (1) is the only consistent one.

OR…

It could also mean “what is the probability of rain given that it is Monday”; in this case, statements (2) and (4) are correct.

I assume the author meant the latter.

2.2 Exercise 2E2

Pr(Monday|rain)

The statement reads: the probability that it is Monday given that it is raining, which is statement (3).

Statement (4) implies the joint probability of rain and Monday, which would need to be standardized by the probability of rain to match the above expression: \(Pr(Monday|rain) = Pr(rain, Monday)/Pr(rain)\).

2.3 Exercise 2E3

the probability that it is Monday, given that it is raining

In mathematical notation, this could be expressed as Pr(Monday|rain). By Bayes’ rule, this statement can also be expressed as Pr(rain|Monday)*Pr(Monday)/Pr(rain). So statements (1) and (4) are consistent with the assertion.

2.4 Exercise 2E4

To me the statement implies that if we had complete information, we could predict exactly where the finger would land on the globe with each toss. Our lack of information about how the globe is tossed, how it is caught, and how much water vs. land there is (and possibly all sorts of minor things that might have very small effects on the final outcome, such as wind, being indoors or outdoors, the strength of the person, the strength of gravity at that point, the occurrence of solar flares, etc.) means we are uncertain about the outcome. Thus, the probability of water being 0.7 just measures our degree of uncertainty about the outcome of tossing the globe due to our imperfect knowledge of the world.


3 Medium

3.1 Exercise 2M1

3.1.1 2M1.1

count_W <- 3 # observed number of water
total_count <- 3 # total number of tosses
possible_p <- seq(0, 1, length.out = 100) # grid of possible proportions of water
prior_p <- 1 # flat prior over p
likelihood_p <- function(p, count, size) {
  return(dbinom(x = count, size = size, prob = p))
} # likelihood of the data given p
# the posterior is:
# P(p|n,w) = P(w|n,p)P(p)/P(w)
post1 <- likelihood_p(possible_p, 
                      count = count_W,
                      size = total_count) * prior_p # unnormalized posterior of p
                                                    # given the data
post1_std <- post1/sum(post1) # standardize so the posterior sums to 1
plot(possible_p, post1_std, type = "l")
Figure: Posterior for 2M1.1

3.1.2 2M1.2

count_W <- 3
total_count <- 4
possible_p <- seq(0,1, length.out = 100)
prior_p <- 1
post2 <- likelihood_p(possible_p, 
                      count = count_W,
                      size = total_count) * prior_p
post2_std <- post2/sum(post2)
plot(possible_p, post2_std, type = "l")
Figure: Posterior for 2M1.2

3.1.3 2M1.3

count_W <- 5
total_count <- 7
possible_p <- seq(0,1, length.out = 100)
prior_p <- 1
post3 <- likelihood_p(possible_p, 
                      count = count_W,
                      size = total_count) * prior_p
post3_std <- post3/sum(post3)
plot(possible_p, post3_std, type = "l")
Figure: Posterior for 2M1.3

3.2 Exercise 2M2

3.2.1 2M2.1

count_W <- 3
total_count <- 3
possible_p <- seq(0,1, length.out = 100)
prior_p <- ifelse(possible_p < 0.5, 0, 10) # step prior: zero below 0.5;
                                           # the constant cancels after standardizing
post4 <- likelihood_p(possible_p, 
                      count = count_W,
                      size = total_count) * prior_p
post4_std <- post4/sum(post4)
plot(possible_p, post4_std, type = "l")
Figure: Posterior for 2M2.1

3.2.2 2M2.2

count_W <- 3
total_count <- 4
possible_p <- seq(0,1, length.out = 100)
prior_p <- ifelse(possible_p < 0.5, 0, 10)
post5 <- likelihood_p(possible_p, 
                      count = count_W,
                      size = total_count) * prior_p
post5_std <- post5/sum(post5)
plot(possible_p, post5_std, type = "l")
Figure: Posterior for 2M2.2

3.2.3 2M2.3

count_W <- 5
total_count <- 7
possible_p <- seq(0,1, length.out = 100)
prior_p <- ifelse(possible_p < 0.5, 0, 10)
post6 <- likelihood_p(possible_p, 
                      count = count_W,
                      size = total_count) * prior_p
post6_std <- post6/sum(post6)
plot(possible_p, post6_std, type = "l")
Figure: Posterior for 2M2.3

3.3 Exercise 2M3

p_E = 1/2 # probability that globe is Earth
p_M = 1/2 # probability that globe is Mars

p_land_E = 0.3 # probability land given Earth p(land|Earth)
p_land_M = 1.0 # probability land given Mars p(land|Mars)

p_land = p_land_E*p_E + p_land_M*p_M # probability of land 

# What is p(Earth|land) ?
## p(Earth|land) = (p(land|Earth) * p(Earth)) / p(land)
p_earth_L = (p_land_E * p_E)/p_land
p_earth_L
## [1] 0.2307692

3.4 Exercise 2M4

# using the counting approach

# possible outcomes | First side black | Second side black |
# b1 -> b2          |        1         |         1         |
# b2 -> b1          |        1         |         1         |
# b1 -> w1          |        1         |         0         |
# w1 -> b1          |        0         |         0         |
# w1 -> w2          |        0         |         0         |
# w2 -> w1          |        0         |         0         |

# The number of outcomes where the first side is black is 3.
# The number of outcomes where the first side is black and
# the second side is also black is 2.
# So, the probability that the second side is black given that
# the first side was black is 2/3.

In Bayes rule this can be solved as follows:

\(P(2SideBlack|1SideBlack) = \frac{P(1SideBlack|2SideBlack) * P(2SideBlack)}{P(1SideBlack)}\)

The probability that the first side is black given that the card is black on both sides is 1.

\(P(1SideBlack|2SideBlack) = 1\)

The probability that the card has black on both sides is 1/3.

\(P(2SideBlack) = 1/3\)

The probability that one side is black is 1/2 (there are 3 possible ways of having black on one side out of 6 possible ways).

\(P(1SideBlack) = 1/2\)

So, the result is: \(P(2SideBlack|1SideBlack) = \frac{1 * 1/3}{1/2}\)

\(P(2SideBlack|1SideBlack) = \frac{1*2}{3*1}\)

\(P(2SideBlack|1SideBlack) = 2/3\)
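As a sanity check, here is a quick simulation sketch (my own addition, not from the book). The helper function sim_second_black is hypothetical: it draws a card from a deck (optionally with draw weights), puts a random side up, and estimates the probability that the hidden side is black given that the visible side is black.

set.seed(1)
sim_second_black <- function(deck, weights = NULL, n = 1e5) {
  sides <- replicate(n, {
    card <- deck[[sample(length(deck), 1, prob = weights)]] # pick a card
    shuffled <- sample(card) # random side up
    c(first = shuffled[1], second = shuffled[2])
  })
  first <- sides["first", ]
  second <- sides["second", ]
  mean(second[first == "B"] == "B")
}
deck_2M4 <- list(c("B", "B"), c("B", "W"), c("W", "W"))
sim_second_black(deck_2M4) # should be close to 2/3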

3.5 Exercise 2M5

# using the counting approach

# now we have an extra card with two black sides

# possible outcomes | First side black | Second side black |
# b1 -> b2          |        1         |         1         |
# b2 -> b1          |        1         |         1         |
# b3 -> b4          |        1         |         1         |
# b4 -> b3          |        1         |         1         |
# b1 -> w1          |        1         |         0         |
# w1 -> b1          |        0         |         0         |
# w1 -> w2          |        0         |         0         |
# w2 -> w1          |        0         |         0         |
# (b3/b4 are the two sides of the extra black/black card)

# The number of outcomes where the first side is black is now 5.
# The number of outcomes where the first side is black and
# the second side is also black is 4.
# So, the probability that the second side is black given that
# the first side was black is 4/5.

In Bayes rule this can be solved as follows:

\(P(2SideBlack|1SideBlack) = \frac{P(1SideBlack|2SideBlack) * P(2SideBlack)}{P(1SideBlack)}\)

The probability that the first side is black given that the card is black on both sides is 1.

\(P(1SideBlack|2SideBlack) = 1\)

The probability that the card has black on both sides is 1/2 (2 out of 4 cards have black on both sides).

\(P(2SideBlack) = 1/2\)

The probability that one side is black is 5/8 (there are 5 possible ways of having black on one side out of 8 possible ways).

\(P(1SideBlack) = 5/8\)

So, the result is: \(P(2SideBlack|1SideBlack) = \frac{1 * 2/4}{5/8}\)

\(P(2SideBlack|1SideBlack) = \frac{2*8}{4*5}\)

\(P(2SideBlack|1SideBlack) = 4/5\)
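Reusing the hypothetical sim_second_black helper sketched in 2M4, the same check with the extra black/black card:

deck_2M5 <- list(c("B", "B"), c("B", "B"), c("B", "W"), c("W", "W"))
sim_second_black(deck_2M5) # should be close to 4/5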

3.6 Exercise 2M6

# using the counting approach

# now we have different weights for different cards, making
# it less likely to pick a card with two black sides

# possible outcomes | First side black | Second side black |
# b1 -> b2          |        1         |         1         |
# b2 -> b1          |        1         |         1         |
# b1 -> w1          |        1         |         0         |
# w1 -> b1          |        0         |         0         |
# b1 -> w1          |        1         |         0         |
# w1 -> b1          |        0         |         0         |
# w1 -> w2          |        0         |         0         |
# w2 -> w1          |        0         |         0         |
# w1 -> w2          |        0         |         0         |
# w2 -> w1          |        0         |         0         |
# w1 -> w2          |        0         |         0         |
# w2 -> w1          |        0         |         0         |

# The number of outcomes where the first side is black is 4.
# The number of outcomes where the first side is black and
# the second side is also black is 2.
# So, the probability that the second side is black given that
# the first side was black is 2/4 = 1/2.

In Bayes rule this can be solved as follows:

\(P(2SideBlack|1SideBlack) = \frac{P(1SideBlack|2SideBlack) * P(2SideBlack)}{P(1SideBlack)}\)

The probability that the first side is black given that the card is black on both sides is 1.

\(P(1SideBlack|2SideBlack) = 1\)

The probability that the card has black on both sides is 1/6.

\(P(2SideBlack) = 1/6\)

The probability that one side is black is 4/12 (there are 4 possible ways of having black on one side out of 12 possible ways).

\(P(1SideBlack) = 4/12\)

So, the result is: \(P(2SideBlack|1SideBlack) = \frac{1 * 1/6}{4/12}\)

\(P(2SideBlack|1SideBlack) = \frac{1*12}{6*4}\)

\(P(2SideBlack|1SideBlack) = 1/2\)
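And with the draw weights from 2M6 (one way to pick the B/B card, two for B/W, three for W/W), again using the hypothetical helper from 2M4:

deck_2M6 <- list(c("B", "B"), c("B", "W"), c("W", "W"))
sim_second_black(deck_2M6, weights = c(1, 2, 3)) # should be close to 1/2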

3.7 Exercise 2M7

# the trick here is to do the proper conditioning
# I had problems because I kept forgetting to count cards with
# 2 black sides twice

# Valid | Correct | Card 1 side up (hidden side) | Card 2 side up (hidden side)
#   0   |    0    | b1 (b2)                      | b1 (w1)
#   1   |    1    | b1 (b2)                      | w1 (b1)
#   1   |    1    | b1 (b2)                      | w1 (w2)
#   1   |    1    | b1 (b2)                      | w2 (w1)
#   0   |    0    | b2 (b1)                      | b1 (w1)
#   1   |    1    | b2 (b1)                      | w1 (b1)
#   1   |    1    | b2 (b1)                      | w1 (w2)
#   1   |    1    | b2 (b1)                      | w2 (w1)
#   0   |    0    | b1 (w1)                      | b1 (b2)
#   0   |    0    | b1 (w1)                      | b2 (b1)
#   1   |    0    | b1 (w1)                      | w1 (w2)
#   1   |    0    | b1 (w1)                      | w2 (w1)
# ===== | ======= |
#   8   |    6    |

# Out of the 8 possible outcomes where the first card has a
# black side up and the second card has a white side up, there
# are six for which the first card has black on the other
# side. Thus, the probability that the second side of the
# first card is black, given that the first side was black and
# the top side of the second card is white, is 6/8 = 3/4.

Using Bayes rule we have, \(P(B2 | 1B, 1W) = \frac{P(1W|1B,B2)P(1B|B2)P(B2)}{P(1B,1W)}\)

In English, the above statement is equivalent to saying:

The probability of having a card with two black sides (B2), given that I have observed a card with at least one black side (1B) and then another card with at least one white side (1W), is equal to: the probability of observing a card with at least one white side, given that I have observed a card with at least one black side and that card has two black sides (3/4); times the probability of observing a card with at least one black side given that it has two black sides (1); times the probability of observing a card with two black sides (1/3); divided by the joint probability of observing a card with at least one black side and then another card with at least one white side.

The joint probability P(1B, 1W) is computed as:

\(P(1B, 1W) = P(1W|1B,B2)P(1B|B2)P(B2) + P(1W|1B,B1)P(1B|B1)P(B1)\)

The first term on the RHS I have already described above; the second term (to the right of the + sign) is the probability of observing a card with at least one white side, given that I have observed a card with at least one black side and it has a single black side (2/4), times the probability of observing the black side of a card with only one black side (1/2), times the probability of a card with a single black side (1/3). \(P(1W|1B,B1) = 2/4\) can be derived from the table above by conditioning on the cases where the first card has a single black side and we see it first (four possible cases), and then counting in how many of those the second card shows a white side (2).

If we plug in the numbers:

\(P(B2 | 1B, 1W) = \frac{3/4 * 1 * 1/3}{(3/4 * 1 * 1/3) + (2/4 * 1/2 * 1/3)}\)

\(P(B2 | 1B, 1W) = \frac{3/12}{(3/12) + (1/12)}\)

\(P(B2 | 1B, 1W) = \frac{3/12}{4/12}\)

\(P(B2 | 1B, 1W) = (3 * 12)/(12 * 4)\)

\(P(B2 | 1B, 1W) = 3/4\)
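As with the previous exercises, a quick simulation sketch (my own addition, not from the book): draw two distinct cards, keep only the draws where the first card shows black and the second shows white, and count how often the first card’s hidden side is black.

set.seed(1)
cards <- list(c("B", "B"), c("B", "W"), c("W", "W"))
n <- 1e5
draws <- replicate(n, {
  ids <- sample(3, 2)               # two distinct cards, drawn without replacement
  c1 <- sample(cards[[ids[1]]])     # random side up for card 1
  c2 <- sample(cards[[ids[2]]])     # random side up for card 2
  c(show1 = c1[1], hide1 = c1[2], show2 = c2[1])
})
keep <- draws["show1", ] == "B" & draws["show2", ] == "W"
mean(draws["hide1", keep] == "B")   # should be close to 3/4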


4 Hard

4.1 Exercise 2H1

p_A = 1/2 # prior probability of species A
p_B = 1/2 # prior probability of species B

p_twins_A = 0.1 # P(Twins|Species = A)
p_twins_B = 0.2 # P(Twins|Species = B)

# Given that the first birth was twins, what is the
# probability that the second birth is also twins?


# Given a species, births should be independent.
# But, because we are unsure about the species, we need to
# average over the species to estimate the probability of
# the next birth being twins, given that the first birth
# was twins.

# In other words, if I knew the species, knowing that the current
# birth resulted in twins, would not change my expectation about 
# twins for the following birth.

# Thus, in principle:
# P(twins) = P(twins|Species = A)P(A) + P(twins|Species = B)P(B)

# However, knowing about a single twin birth does change
# the relative plausibilities of the two species. So, P(A) and
# P(B), are actually P(A|twins) and P(B|twins).

# P(A|twins) = P(twins|A)P(A)/P(twins)
# P(B|twins) = P(twins|B)P(B)/P(twins)
# P(twins) = P(twins|A)P(A) + P(twins|B)P(B)
p_twins = 0.1 * 0.5 + 0.2 * 0.5

p_A_twins = (0.1 * 0.5)/p_twins
p_B_twins = (0.2 * 0.5)/p_twins
p_A_twins
## [1] 0.3333333
p_B_twins
## [1] 0.6666667
p_twins_twins = 0.1 * p_A_twins + 0.2 * p_B_twins
p_twins_twins
## [1] 0.1666667

The result of 0.167 makes sense. If the species were species B, the probability of twins would be 0.2; if species A, it would be 0.1. As we don’t know the species, it makes sense that the probability should be somewhere between those two values. Where exactly depends on how plausible species A and B are. Without any additional information, the probability of twins is 0.15, exactly in the middle. After observing a single twin birth, we should update the relative plausibility of each species accordingly. In doing so, the probability of observing twins in the next birth moves a little towards 0.2, reflecting that species B is now more plausible than species A.
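As a final check on 2H1, a quick simulation sketch (my own addition, not from the book): draw a species, then two births, and look at the frequency of twins on the second birth among pandas whose first birth was twins.

set.seed(1)
n <- 1e5
species <- sample(c("A", "B"), n, replace = TRUE) # equal prior for each species
p_twins <- ifelse(species == "A", 0.1, 0.2)       # species-specific twin rate
birth1 <- rbinom(n, 1, p_twins)                   # 1 = twins, 0 = single
birth2 <- rbinom(n, 1, p_twins)
mean(birth2[birth1 == 1]) # should be close to 0.167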

4.2 Exercise 2H2

# P(Species = A|Twins)?
# P(Species = A |Twins) = P(Twins |Species = A)P(A)/P(Twins)

#P(Twins) = P(Twins|Species = A)P(A) + P(Twins|Species = B)P(B)

p_A_twins = ( 0.1 * 0.5 ) / (( 0.1 * 0.5 ) + ( 0.2 * 0.5 ))
p_A_twins
## [1] 0.3333333

This result already came up in 2H1. It shows how the priors for A and B are updated given that we have observed twins. So, we started with equal probabilities for each species, and our updated probabilities are now 2/3 for species B and 1/3 for species A.

4.3 Exercise 2H3

# First birth = Twins
# Second birth = Single
# P(Species = A| Twins, Single)?

# P(Species = A | Twins, Single) = P(Twins | Species = A) *
#                                   P(Single | Species = A) *
#                                      P(A) /
#                                         P(Twins,Single)
# P(Twins,Single) =  P(Twins | Species = A) *
#                       P(Single | Species = A) *
#                          P(A) +
#                    P(Twins | Species = B) *
#                       P(Single | Species = B) *
#                          P(B)

p_A_twins_single = ( 0.1 * 0.9 * 0.5) / 
                      (( 0.1 * 0.9 * 0.5) + 
                            ( 0.2 * 0.8 * 0.5 ))
p_A_twins_single
## [1] 0.36

In 2H2, we saw how the probability of species A decreased to 1/3 after observing one twin birth (since twins are more likely for species B). Now, after also observing a single (non-twin) birth, there is a modest increase in the updated probability of species A, as single births are more common in species A.

4.4 Exercise 2H4

4.4.1 2H4.1

# Probability of species A given a positive genetic test for A?

# P(test = A | Species = A) = 0.80
# P(test = B | Species = A) = 0.20
# P(test = B | Species = B) = 0.65
# P(test = A | Species = B) = 0.35

# P(Species = A | test = A) = P(test = A | A)P(A)/P(test = A)

#P(test = A) = P(test = A | A)P(A) + P(test = A | B)P(B)

p_testA = 0.8 * 0.5 + 0.35 * 0.5
p_testA
## [1] 0.575
p_A_testA = (0.8 * 0.5) / p_testA
p_A_testA
## [1] 0.6956522

Following the above theme, the result makes sense. The denominator, P(test = A), is the average of the probability of the test giving the correct answer (0.8, when the species is A) and of it giving an incorrect answer (0.35, when the species is B), weighted by the prior probability of each species.

4.4.2 2H4.2

# P(Species = A | test = A, twins, single)?

# P(Species = A | test = A, twins, single) =
#               P(test = A | Species = A) *
#               P(twins | Species = A) *
#               P(single | Species = A) *
#               P(Species = A) /
#               P(test = A, twins, single)

# P(test = A, twins, single) =
#               P(test = A | Species = A) *
#               P(twins | Species = A) *
#               P(single | Species = A) *
#               P(Species = A) +
#               P(test = A | Species = B) *
#               P(twins | Species = B) *
#               P(single | Species = B) *
#               P(Species = B)

# The individual probabilities are only conditional on
# species being equal to A because otherwise they are
# independent. A test = A does not depend on births, and
# births should be independent.

p_testA_twins_single = 0.8 * 0.1 * 0.9 * 0.5 +
                        0.35 * 0.2 * 0.8 * 0.5

p_A_testA_twins_single = (0.8 * 0.1 * 0.9 * 0.5) / p_testA_twins_single

p_A_testA_twins_single
## [1] 0.5625
# We can see that part of the above has already been resolved
# in 2H3, and continues the theme of updating our knowledge
# with new information. So, the above can be simplified as:

# P(A | test = A) = P(test = A | A)P(A)/P(test = A)
# where P(A) = 0.36 and P(B) = 0.64 are the updated priors from 2H3

p_A_testA = (0.8 * 0.36) / (( 0.8 * 0.36) + (0.35 * 0.64))
p_A_testA
## [1] 0.5625

So, the conjunction of the evidence suggests that species A is mildly more plausible than species B. I found it fascinating how the plausibilities changed with each additional line of evidence:

evidence = factor(rep(c("prior", 
                        "prior + twins", 
                        "prior + twins + single", 
                        "prior + twins + single + test = A"), 
               each = 2), 
               levels = c("prior", 
                        "prior + twins", 
                        "prior + twins + single", 
                        "prior + twins + single + test = A"))
species = rep(c("Species A", "Species B"), 4)
prob = c(0.5, 0.5, 0.33, 0.67, 0.36, 0.64, 0.56, 0.44)
change_df <- data.frame(species = species,
                        evidence = evidence,
                        prob = prob)
ggplot(change_df, aes(x = evidence, 
                      y = prob, 
                      colour = species, 
                      group = species)) +
  geom_line() +
  geom_point(size = 3) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) +
  xlab("Data") +
  ylab("Posterior Probability")