Bayesian Statistics the Fun Way (Kurt, 2019) is an engaging introduction to the Bayesian school of statistics. A slightly paraphrased excerpt from the book’s back-cover description lists the subjects the reader can expect to learn:

Calculation of distributions to see one’s range of beliefs. […] Comparison of different hypotheses to draw reliable conclusions. […] Theoretical and practical aspects of Bayes’ theorem. […] Calculation of the posterior and likelihood to work out the accuracy of conclusions.

All of this is done in the R programming language, and the book comes with statistics problems. What follows are my notes on what I find interesting about the book, along with my solutions to its problems.

Part I: Introduction to Probability

Chapter 1: Bayesian Thinking and Everyday Reasoning

Important notions

[TODO]

Problem sets

Question 1: Rewrite the following statements as equations using the mathematical notation you learned in this chapter:

  • “The probability of rain is low”
  • “The probability of rain given that it is cloudy is high”
  • “The probability of you having an umbrella given it is raining is much greater than the probability of you having an umbrella in general.” (quoted from Kurt, 2019, ch. 1)

Answer:

| Plain English | Mathematical Notation |
| --- | --- |
| The probability of rain is low | \(P(\text{rain}) \ll .5\) |
| The probability of rain given that it is cloudy is high | \(P(\text{rain} \mid \text{cloudy}) \gg .5\) |
| The probability of you having an umbrella given it is raining is much greater than the probability of you having an umbrella in general. | \(P(\text{umbrella} \mid \text{raining}) \gg P(\text{umbrella})\) |

Question 2: Organize the data you observe in the following scenario into a mathematical notation, using the techniques we’ve covered in this chapter. Then come up with a hypothesis to explain this data: “You come home from work and notice that your front door is open and the side window is broken. As you walk inside, you immediately notice that your laptop is missing.” (quoted from Kurt, 2019, ch. 1)

Answer: I contend that my laptop may have been stolen. Let \(s\) stand for the event that my laptop was stolen, \(d\) for the observation that the front door is open, and \(w\) for the broken side window. A formal treatment of the hypothesis is as follows:

\[P(s | d \land w) > .5\]


Question 3: The following scenario adds data to the previous one. Demonstrate how this new information changes your beliefs and come up with a second hypothesis to explain the data, using the notation you’ve learned in this chapter: “A neighborhood child runs up to you and apologizes profusely for accidentally throwing a rock through your window. They claim that they saw the laptop and didn’t want it stolen so they opened the front door to grab it, and your laptop is safe at their house.” (quoted from Kurt, 2019, ch. 1)

Answer: Inheriting the variable names from the second answer, I will let \(c\) be the child’s confession. A formal treatment of this comparison is as follows:

\[P(s | d \land w) \ll P(\neg s | c)\]

Chapter 2: Measuring Uncertainty

Important notions

[TODO]

Problem sets

Question 1: What is the probability of rolling two six-sided dice and getting a value greater than 7? (quoted from Kurt, 2019, ch. 2)

Answer: I put together the following script to work out the probability:

# Enumerate all 36 equally likely outcomes of two six-sided dice
# and count those whose sum is greater than 7.
target <- 0
total <- 0
for (k in 1:6) {
  for (w in 1:6) {
    if (k + w > 7)
      target <- target + 1
    total <- total + 1
  }
}
sprintf("Answer to the problem: %s", target / total)
## [1] "Answer to the problem: 0.416666666666667"

Question 2: What is the probability of rolling three six-sided dice and getting a value greater than 7? (quoted from Kurt, 2019, ch. 2)

Answer: I put together the following script (a modified version of the ch. 2, q. 1 solution) to work out the probability:

# Enumerate all 216 equally likely outcomes of three six-sided dice
# and count those whose sum is greater than 7.
target <- 0
total <- 0
for (x in 1:6) {
  for (y in 1:6) {
    for (z in 1:6) {
      if (x + y + z > 7)
        target <- target + 1
      total <- total + 1
    }
  }
}
sprintf("Answer to the problem: %s", target / total)
## [1] "Answer to the problem: 0.837962962962963"

Question 3: The Yankees are playing the Red Sox. You’re a diehard Sox fan and bet your friend they’ll win the game. You’ll pay your friend $30 if the Sox lose and your friend will have to pay you only $5 if the Sox win. What is the probability you have intuitively assigned to the belief that the Red Sox will win? (quoted from Kurt, 2019, ch. 2)

Answer: I used the formula:

\[ P(H) = \frac{O(H)}{1 + O(H)} \] to work out the subjective probability, where \(O(H)\) is the odds ratio of hypothesis \(H\). I will lose $30 if the Sox lose and gain $5 if the Sox win. I then have an \(O(H)\) of

\[ O(\text{sox}) = \frac{30}{5} = 6\]

Plugging this in, I get:

\[ P(\text{sox}) = \frac{6}{6 + 1} = \frac{6}{7} \]

… which represents a subjective probability of approximately \(86\%\).
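
This odds-to-probability conversion is simple enough to script. Below is a minimal sketch in R; the function name odds_to_prob is my own invention:

odds_to_prob <- function(odds) odds / (1 + odds)   # P(H) = O(H) / (1 + O(H))
odds_to_prob(30 / 5)
## [1] 0.8571429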

Chapter 3: The Logic of Uncertainty

Important notions

Given a probability \(P\) for some event \(X\):

  • The negation of an event is: \[P(\neg X) = 1 - P(X)\]
  • The product rule for independent events is: \[P(X_1 \land X_2 \land \ldots \land X_n) = P(X_1) \times P(X_2) \times \ldots \times P(X_n)\]
  • The addition rule for mutually exclusive events is: \[P(X_1 \lor X_2 \lor \ldots \lor X_n) = P(X_1) + P(X_2) + \ldots + P(X_n)\]
  • The addition rule for non-mutually exclusive events (verified by enumeration in the sketch after this list) is: \[P(X_1 \lor X_2) = P(X_1) + P(X_2) - P(X_1 \land X_2)\]
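
These rules are easy to check by direct enumeration. Below is a minimal sketch in R; the events \(X_1\) (“the roll is even”) and \(X_2\) (“the roll is greater than 3”) on a single six-sided die are my own choices for illustration:

rolls  <- 1:6
p_x1   <- mean(rolls %% 2 == 0)                # P(X1) = 1/2
p_x2   <- mean(rolls > 3)                      # P(X2) = 1/2
p_both <- mean(rolls %% 2 == 0 & rolls > 3)    # P(X1 AND X2) = 1/3
p_x1 + p_x2 - p_both                           # addition rule, non-mutually exclusive
## [1] 0.6666667
mean(rolls %% 2 == 0 | rolls > 3)              # direct enumeration agrees
## [1] 0.6666667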

Problem sets

Question 1: What is the probability of rolling a 20 three times in a row on a 20-sided die? (quoted from Kurt, 2019, ch. 3)

Answer: I shall apply the product rule for independent outcomes:

\[ P(20 \land 20 \land 20) = \Big(\frac{1}{20}\Big)^3 = 1.25 \times 10^{-4} \]


Question 2: The weather report says there’s a 10 percent chance of rain tomorrow, and you forget your umbrella half the time you go out. What is the probability that you’ll be caught in the rain without an umbrella tomorrow? (quoted from Kurt, 2019, ch. 3)

Answer: Again, I will apply the product rule for independent outcomes:

\[ P(\text{rain} \land \neg \text{umbrella}) = \frac{1}{10} \times \frac{1}{2} = .05 \]


Question 3: Raw eggs have a 1/20,000 probability of having salmonella. If you eat two raw eggs, what is the probability you ate a raw egg with salmonella? (quoted from Kurt, 2019, ch. 3)

Answer: I will make use of the addition rule for mutually exclusive events:

\[ P(\text{salmonella}) = \frac{1}{20000} + \frac{1}{20000} = 1 \times 10^{-4} \]

NOTE: this answer is wrong; I should have used the formula for non-mutually exclusive events, since both eggs could (however improbably) have salmonella. The correct answer is nearly identical to my mistaken one, though.
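
Comparing both computations in R shows just how small the mistake is:

p <- 1 / 20000
p + p            # my mistaken answer (rule for mutually exclusive events)
## [1] 1e-04
p + p - p * p    # the correct answer (rule for non-mutually exclusive events)
## [1] 9.999975e-05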


Question 4: What is the probability of either flipping two heads in two coin tosses or rolling three 6s in three six-sided dice rolls? (quoted from Kurt, 2019, ch. 3)

Answer: I will make use of the addition rule for non-mutually exclusive events:

\[ P(2 \ \text{Heads} \vee 3 \ \text{sixes}) = \frac{1}{4} + \frac{1}{216} - \Bigg[\frac{1}{4} \times \frac{1}{216}\Bigg] \approx .254\]
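
The arithmetic can be double-checked in R:

p_heads <- (1 / 2)^2                      # probability of two heads in two tosses
p_sixes <- (1 / 6)^3                      # probability of three 6s in three rolls
p_heads + p_sixes - p_heads * p_sixes     # addition rule, non-mutually exclusive
## [1] 0.2534722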

Chapter 4: Creating a Binomial Distribution

Important notions

A binomial distribution is used to work out the probability of a dichotomous variable over many trials; its probability mass function is defined as \[ \phi_{\text{Binomial}}(k; n, p) = {n \choose k} p^k (1 - p)^{n - k} \]

… where \(p\) is the probability of the event happening, \(n\) is the number of trials, and \(k\) is the number of successes of the event in question. Note that the author derives the binomial distribution’s probability mass function in full (pp. 34–41).
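
The probability mass function can be written out directly in R and checked against the built-in dbinom function; the example values \(n = 10\), \(k = 3\) and \(p = \frac{1}{4}\) are arbitrary choices of mine:

n <- 10; k <- 3; p <- 1 / 4
choose(n, k) * p^k * (1 - p)^(n - k)   # the PMF written out by hand
## [1] 0.2502823
dbinom(k, size = n, prob = p)          # R's built-in equivalent
## [1] 0.2502823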

In the R language, it is possible to work out cumulative binomial probabilities with the following function:

pbinom(k, n, p, lower.tail=FALSE)

… where the lower.tail argument controls which tail is summed: set to FALSE, pbinom works out \(P(X > k)\), the total probability of values greater than \(k\); set to TRUE (the default), it works out \(P(X \le k)\), the total probability of values less than or equal to \(k\).
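
For example, the probability of getting more than 6 heads in 10 fair coin flips (a usage sketch of my own):

pbinom(6, size = 10, prob = 0.5, lower.tail = FALSE)   # P(X > 6)
## [1] 0.171875
pbinom(6, size = 10, prob = 0.5, lower.tail = TRUE)    # P(X <= 6); the two sum to 1
## [1] 0.828125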

The binomial distribution’s generalization, the multinomial distribution, is used to work out the probabilities of an event with more than two outcomes over many trials.

Problem sets

[TODO]

Question 1: What are the parameters of a binomial distribution for the probability of rolling either a 1 or a 20 on a 20-sided die, if we roll 12 times?

Answer: The probability of the event is \(p = \frac{2}{20} = 0.1\) and the number of trials is \(n = 12\), with \(k\) left free as the number of successes being tested for; specifically, the probability mass function is: \[ \phi_{\text{event}}(k) = {12 \choose k} .1^k \times .9^{12 - k}\]
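
Plugging these parameters into R’s dbinom function evaluates the mass function for any given \(k\); evaluating \(k = 1\) is my own arbitrary example:

dbinom(1, size = 12, prob = 0.1)   # probability of rolling exactly one 1-or-20
## [1] 0.3765727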

Chapter 5: The Beta Distribution

Important notions

[TODO]

The beta distribution is used to estimate the probability of a probability: the unknown success rate of a dichotomous process being measured. Given parameters \(\alpha\), the count of observed successes, and \(\beta\), the count of observed failures, its probability density function is defined as:

\[ \phi_{\text{Beta}}(p; \alpha, \beta) = \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{ \int_0^1 q^{\alpha - 1} (1 - q)^{\beta - 1} \, dq } \]
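
The definition can be checked against R’s built-in dbeta function, using numerical integration for the normalizing denominator; the values \(\alpha = 3\), \(\beta = 7\) and \(p = 0.5\) are arbitrary choices of mine:

a <- 3; b <- 7; p <- 0.5
numerator   <- p^(a - 1) * (1 - p)^(b - 1)
denominator <- integrate(function(q) q^(a - 1) * (1 - q)^(b - 1), 0, 1)$value
numerator / denominator   # the PDF evaluated from the definition
## [1] 0.984375
dbeta(p, a, b)            # R's built-in equivalent
## [1] 0.984375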

Problem sets

[TODO]

Part II: Bayesian Probability and Prior Probabilities

Chapter 6: Conditional Probability

Important notions

[TODO]

Problem sets

[TODO]

Chapter 7: Bayes’ Theorem with LEGO

Important notions

[TODO]

Problem sets

[TODO]

Chapter 8: The Prior, Likelihood, and Posterior of Bayes’ Theorem

Important notions

[TODO]

Problem sets

[TODO]

Chapter 9: Bayesian Priors and Working with Probability Distributions

Important notions

[TODO]

Problem sets

[TODO]

Part III: Parameter Estimation

Chapter 10: Introduction to Averaging and Parameter Estimation

Important notions

[TODO]

Problem sets

[TODO]

Chapter 11: Measuring the Spread of Our Data

Important notions

[TODO]

The mean absolute deviation is a robust measure of dispersion in which each observation’s distance from the mean contributes linearly; it is formally defined as:

\[ \sigma_{\text{M.A.D.}}(x) = n^{-1} \sum_{i=1}^n |x_i - \mu| \]

The standard deviation is a measure of dispersion in which deviations contribute quadratically, so larger errors are weighted much more heavily; it is formally defined as:

\[ \sigma = \sqrt{n^{-1} \sum_{i=1}^n (x_i - \mu)^2} \]
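
Both measures are straightforward to compute by hand in R; the sample vector below is my own toy data. Note that R’s built-in sd function divides by \(n - 1\) rather than \(n\), so it differs slightly from the formula above:

x  <- c(2, 4, 4, 4, 5, 5, 7, 9)
mu <- mean(x)
mean(abs(x - mu))        # mean absolute deviation
## [1] 1.5
sqrt(mean((x - mu)^2))   # standard deviation with the n denominator above
## [1] 2
sd(x)                    # built-in sd() divides by n - 1 instead
## [1] 2.13809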

Problem sets

[TODO]

Chapter 12: The Normal Distribution

Important notions

[TODO]

Problem sets

[TODO]

Chapter 13: Tools of Parameter Estimation: The PDF, CDF, and Quantile Function

Important notions

[TODO]

Problem sets

[TODO]

Chapter 14: Parameter Estimation with Prior Probabilities

Important notions

[TODO]

Problem sets

[TODO]

Part IV: Hypothesis Testing: The Heart of Statistics

Chapter 15: From Parameter Estimation to Hypothesis Testing — Building a Bayesian A/B Test

Important notions

[TODO]

Problem sets

[TODO]

Chapter 16: Introduction to the Bayes Factor and Posterior Odds — The Competition of Ideas

Important notions

[TODO]

Problem sets

[TODO]

Chapter 17: Bayesian Reasoning in the Twilight Zone

Important notions

[TODO]

Problem sets

[TODO]

Chapter 18: When Data Doesn’t Convince You

Important notions

[TODO]

Problem sets

[TODO]

Chapter 19: From Hypothesis Testing to Parameter Estimation

Important notions

[TODO]

Problem sets

[TODO]

Works cited

Kurt, W. (2019). Bayesian Statistics the Fun Way: Understanding Statistics and Probability with Star Wars, LEGO, and Rubber Ducks. No Starch Press.


These notes were originally uploaded to rpubs.com in the summer of 2021.