Bayesian Statistics the Fun Way (Kurt, 2019) is an engaging introduction to the Bayesian school of statistics. A slightly paraphrased excerpt of the back description on the book discusses the subjects that the reader can expect to learn:
Calculation of distributions to see one’s range of beliefs. […] Comparison of different hypothesises to draw reliable conclusions. […] Theoretical and practical aspects of Bayes’ theorem. […] Calculatuon of the posterior and likelihood to work out accuracy of conclusions.
All of these things are to be done in the R programming language and the book comes with statistics problems. These are my notes regarding what I find interesting about the book; and my solutions to the book’s problems.
[TODO]
Question 1: Rewrite the following statements as equations using the mathematical notation you learned in this chapter:
Answer:
| Plain English | Mathematical Notation |
|---|---|
| The probability of rain is low | \[P(\text{rain}) << .5 \] |
| The probability of rain given that it is cloudy is high | \[P(\text{rain} \ | \ \text{cloudy}) >> .5 \] |
| The probability of you having an umbrella given it is raining is much greater than the probability of you having an umbrella in general. | \[ P(\text{umbrella} \ | \ \text{raining}) >> P(\text{umbrella}) \] |
Question 2: Organize the data you observe in the following scenario into a mathematical notation, using the techniques we’ve covered in this chapter. Then come up with a hypothesis to explain this data: “You come home from work and notice that your front door is open and the side window is broken. As you walk inside, you immediately notice that your laptop is missing.” (quoted from Kurt, 2019, ch. 1)
Answer: I contend that my laptop may have been stolen. Let \(P(s)\) stand for the probability that my laptop was stolen, \(d\) stand for broken door and \(w\) stand for broken window. A formal treatment of the hypothesis is as follows:
\[P(s | d \land w) > .5\]
Question 3: The following scenario adds data to the previous one. Demonstrate how this new information changes your beliefs and come up with a second hypothesis to explain the data, using the notation you’ve learned in this chapter: “A neighborhood child runs up to you and apologizes profusely for accidentally throwing a rock through your window. They claim that they saw the laptop and didn’t want it stolen so they opened the front door to grab it, and your laptop is safe at their house.” (quoted from Kurt, 2019, ch. 1)
Answer: Inheriting the variable names from the second answer, I will let \(c\) be the child’s confession. A formal treatment of this comparison is as follows:
\[P(s | d \land w) << P(\neg s | c)\]
[TODO]
Question 1: What is the probability of rolling two six-sided dice and getting a value greater than 7? (quoted from Kurt, 2019, ch. 2)
Answer: I put together the following script to work out the probability:
target <- 0.0
total <- 0.0
for (k in c(1:6)) {
for (w in c(1:6)) {
if (k + w > 7.0)
target = target + 1.0
total = total + 1.0
}
}
sprintf ("%s%s", "Answer to the problem: ", target / total)
## [1] "Answer to the problem: 0.416666666666667"
Question 2 What is the probability of rolling three six-sided dice and getting a value greater than 7? (quoted from Kurt, 2019, ch. 2)
Answer: I put together the following script (a modified version of the ch. 2, q. 1 solution) to work out the probability:
target <- 0.0
total <- 0.0
for (x in c(1:6)) {
for (y in c(1:6)) {
for (z in c(1:6)) {
if (x + y + z > 7.0)
target = target + 1.0
total = total + 1.0
}
}
}
sprintf ("%s%s", "Answer to the problem: ", target / total)
## [1] "Answer to the problem: 0.837962962962963"
Question 3: The Yankees are playing the Red Sox. You’re a diehard Sox fan and bet your friend they’ll win the game. You’ll pay your friend $30 if the Sox lose and your friend will have to pay you only $5 if the Sox win. What is the probability you have intuitively assigned to the belief that the Red Sox will win? (quoted from Kurt, 2019, ch. 2)
Answer: I used the formula:
\[ P(H) = \frac{O(H)}{1 + O(H)} \] to work out the subjective probability, where \(O(H)\) is the odds ratio of hypothesis \(H\). I will lose $30 if the Sox lose and gain $5 if the Sox win. I then have a \(O(H)\) of
\[ O(\text{sox}) = \frac{30}{5} = 6\]
Plugging this in, I get:
\[ P(O(\text{sox})) = \frac{6}{6 + 1} = \frac{6}{7} \]
Which represents a subjective probability of \(\sim 85\%\)
Given a probability \(P\) for some event \(x\):
Question 1: What is the probability of rolling a 20 three times in a row on a 20-sided die? (quoted from Kurt, 2019, ch. 3)
Answer: I shall apply the product rule for independent outcomes:
\[ P(20 \land 20 \land 20) = \Big(\frac{1}{20}\Big)^3 = 1.25 \times 10^{-4} \]
Question 2: The weather report says there’s a 10 percent chance of rain tomorrow, and you forget your umbrella half the time you go out. What is the probability that you’ll be caught in the rain without an umbrella tomorrow? (quoted from Kurt, 2019, ch. 3)
Answer: Again, I will apply the product rule for independent outcomes:
\[ P(\text{rain} \land \neg \text{umbrella}) = \frac{1}{10} \times \frac{1}{2} = .05 \]
Question 3: Raw eggs have a 1/20,000 probability of having salmonella. If you eat two raw eggs, what is the probability you ate a raw egg with salmonella? (quoted from Kurt, 2019, ch. 3)
Answer: I will make use of the adding rule for mutually-exclusive events:
\[ P(\text{salmonella}) = \frac{1}{20000} + \frac{1}{20000} = 1 \times 10^{-4} \] NOTE: that this answer is wrong, I should have used the formula for non-mutually exclusive events. The correct answer is near identical to my mistake though.
Question 4: What is the probability of either flipping two heads in two coin tosses or rolling three 6s in three six-sided dice rolls? (quoted from Kurt, 2019, ch. 3)
Answer: I will make use of the additive rule for non-mutually exclusive events:
\[ P(2 \ \text{Heads} \vee 3 \ \text{sixes}) = \frac{1}{4} + \frac{1}{216} - \Bigg[\frac{1}{4} \times \frac{1}{216}\Bigg] \approx .254\]
A binomial distribution is used to work out the probability of a dichotomous variable over many trials; its probability mass function is defined as \[ \phi_{\text{Binomial}}(k; n, p) = {n \choose k} p^k (1 - p)^{n - k} \]
… where \(p\) is the probability of the event happening, \(n\) is the number of trials and \(k\) is the number of successes of the event in question. Do note that the authors derive the binomial distribution’s probability mass function (pp. 34–41).
In the R language, it is possible to work out cumulative binomial probabilities with the following function:
pbinom(k, n, p, lower.tail=FALSE)
… where the lower.tail bit will work out the sum of values greater than \(k\) when set to FALSE and work out the sum of values less than \(k\) when set to TRUE.
The binomial distribution’s antagonist is a multinomial distribution is used to work out the probability of an event with more than two (2) outcomes over many trials.
[TODO]
Question 1: What are the parameters of a binomial distribution for the probability of rolling either a 1 or a 20 on a 20-sided die, if we roll 12 times?
Answer: The probability of the event \(p\) would be \(p = 0.1\) and the number of iterations \(n\) would be \(n = 12\) with \(k\) being the thing being tested; specifically, the probability mass function is: \[ \phi_{\text{event}}(k) = {12 \choose k} .1^k \times .9^{12 - k}\]
[TODO]
The beta distribution is used to estimate the metaprobability of a dichotomous thing being measured. Given parameters \(\alpha\) as count of successes and \(\beta\) for count of failures, its probability density function is defined as:
\[ \phi_{\text{Beta}}(p) = \frac{p^{\alpha - 1} \times (1 - p)^{\beta - 1}}{ \int_0^1 p^{\alpha - 1} \times (1 - p)^{\beta - 1}} \]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
The mean absolute deviation is a robust measure of dispersion that assumes data is spread out in a linear manner — it is formally defined as:
\[ \sigma_{\text{M.A.D}}(x) = n^{-1} \sum_{i=1}^n |x - \mu| \]
The standard deviation is a measure of dispersion that assumes that data is spread in a manner that errors decrease exponentially — it is formally defined as:
\[ \sigma = \sqrt{n^{-1} \sum_{i=1}^n (x_i - \mu)^2} \]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
[TODO]
Kurt, W. (2019). Bayesian Statistics the Fun Way: Understanding Statistics and Probability with Star Wars, LEGO, and Rubber Ducks. No Starch Press.
Note that these notes have originally been uploaded to rpubs.com circa the summer of 2021.