Note: start recording
- Overview of the module: schedule, assessment, required reading
- Introducing probability and relevant notation
- Two rules of probability
- Conditional, joint, and marginal probability
- (Brief introduction to Bayes’ Theorem)
Probability is a means to quantify uncertainty (like miles are a means to measure physical distance).
Examples:
Can’t say for sure but we can quantify uncertainty.
If a variable can take on more than one value, probability can be used to describe the uncertainty that it will take each one of its possible values.
Example: rolling a fair six-sided die (each of the six faces 1, 2, 3, 4, 5, 6 has a 1 in 6 probability [about 0.1667] of landing face up).
What day of the week was I born?
Consider: How many possible values does the variable of interest have and how many correct outcomes are possible?
A probability cannot be less than that of an impossible event (0) or more than that of a certain event (1).
Example: drawing a card from a standard deck. The probability of drawing each suit (hearts, diamonds, clubs, or spades) is 13/52, so 13/52 + 13/52 + 13/52 + 13/52 = 1.
If \(X\) is a variable with possible values \(\{x_1, x_2, \ldots, x_k, \ldots, x_K\}\), then
\[ P(X = x_k) \]
gives the probability that \(X\) takes the value \(x_k\).
This notation says “the probability that the random variable \(X\) takes on the value \(x_k\)”.
Notation: \(x_1\) is the first value in a set; \(x_k\) is the \(k\)-th value (out of \(K\) possible values); \(x_K\) is the last value in the list
If \(X\) = “number showing on a fair 6-sided die” then
\(P(X = x_k) = \dfrac{1}{6}\) for each \(x_k\)
because \(X \in \{1, 2, 3, 4, 5, 6\}\) and all six values are equally likely.
For every possible value \(x_k\) that \(X\) could take, the probability \(P(X=x_k)\) must be between 0 (impossible) and 1 (certain).
\[ 0 \le P(X = x_k) \le 1 \qquad \forall x_k \]
If you add up the probabilities of all possible values that \(X\) can take (from the first value \(k=1\) to the last value \(k=K\)), the total must be exactly 1 – meaning one of the possible outcomes has to happen.
\[ \sum_{k=1}^{K} P(X = x_k) = 1, \]
which is short for
\[ P(X = x_1) + P(X = x_2) + \dots + P(X = x_k) + \dots + P(X = x_K) = 1. \]
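The two rules can be checked numerically. The following is a minimal sketch in R; the helper `is_valid_distribution()` and its tolerance argument `tol` are hypothetical names introduced here for illustration, not part of the module materials.

```r
# Hypothetical helper: checks both rules for a vector of probabilities.
# Rule 1: every probability lies between 0 and 1.
# Rule 2: the probabilities sum to 1 (up to floating-point tolerance).
is_valid_distribution <- function(p, tol = 1e-9) {
  all(p >= 0 & p <= 1) && abs(sum(p) - 1) < tol
}

die <- rep(1/6, 6)                        # fair six-sided die
is_valid_distribution(die)                # TRUE
is_valid_distribution(c(0.2, 0.3, 0.5))   # TRUE
is_valid_distribution(c(0.5, 0.5, 0.5))   # FALSE: sums to 1.5
```

The tolerance is there because sums of fractions such as 1/6 are not always exactly 1 in floating-point arithmetic.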
On NOW, see “Week 1: Probability In-Class Questions”
Complete questions 1, 2.
If X is a random variable that takes three possible values, named A, B, and C, which of the following is a probability distribution over X?
Answer: 0.15, 0.15, 0.7
Why: The probabilities of all the possible values of X must add up to 1.
X is a random variable that takes three possible values, named A, B, and C. If the probability that X takes the value of A is 0.5 and the probability that X takes the value of B is 0.5, what is the probability that X takes the value of C?
Answer: 0
Why: The probabilities must sum to 1 and \(P(\text{A}) + P(\text{B}) = 0.5 + 0.5 = 1\), so \(P(\text{C})\) must be 0.
If two values are mutually exclusive, then the probability of either happening is the sum of their individual probabilities, i.e.
\[ P(X = x_k \ \text{or}\ X = x_j) = P(X = x_k) + P(X = x_j). \]
This rule applies when \(x_k \ne x_j\), because \(X\) cannot take two different values at the same time.
Example (fair 6-sided die): What is the probability of rolling either a 2 or a 5?
\[ P(X = 2 \ \text{or}\ X = 5) = P(X = 2) + P(X = 5) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6}. \]
You can’t get more than one response at the same time.
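As a small illustration of the addition rule, the die example can be written in R. This is a sketch only; the variable names `p_die` and `p_2_or_5` are chosen here for illustration.

```r
# Probability distribution of a fair six-sided die, named by face value
p_die <- setNames(rep(1/6, 6), 1:6)

# Addition rule: sum the probabilities of the mutually exclusive values 2 and 5
p_2_or_5 <- sum(p_die[c("2", "5")])
p_2_or_5   # 1/3, about 0.333
```

Indexing by name keeps the code close to the notation \(P(X = 2) + P(X = 5)\).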
Complete questions 3, 4, 5, 6, 7.
X is a random variable that takes three possible values, named A, B, and C. If the probability that X takes the value of A is 0.25, the probability that X takes the value of B is 0.25, and the probability that X takes the value of C is 0.5, what is the probability that X takes the value of B or C?
Answer: 0.75
Why? B and C are mutually exclusive outcomes, so the probability that X is either B or C is the sum of their individual probabilities: 0.25 + 0.5 = 0.75.
If X is a random variable that takes two possible values, 0 and 1, what is the probability that X takes the value of either 0 or 1?
Answer: 1
Why: It has to take one of these values (0 or 1) so probability that it takes one of them is 1.
If X is a random variable that takes two possible values, 0 and 1, what is the probability that X takes the value of 1?
Answer: Unknown
Why: We have insufficient information. Not clear if the two possible events are equally likely (e.g. will it rain tomorrow vs will the sun rise tomorrow).
We have a fair six-sided die (all possible outcomes are equally likely). What is the probability of obtaining either a six or a one or a five on a single throw?
Answer: \(1/2\)
Why? \(1/6 + 1/6 + 1/6 = 3/6 = 1/2\)
We have a fair six-sided die (all possible outcomes are equally likely). What is the probability of not obtaining a six on a single throw?
Answer: \(5/6\)
Why? \(1/6 + 1/6 + 1/6 + 1/6 + 1/6\)
or \(6/6 - 1/6\)
or \(5 \times 1/6\)
If two different variables are independent (outcome of one does not affect the other), the joint probability of their outcomes is the product of their individual probabilities. If \(X\) and \(Y\) are independent, then
\[ P(X = x_k, Y = y_l) = P(X = x_k)\times P(Y = y_l). \]
The comma in \(P(X = x_k, Y = y_l)\) means “and”, and the order of \(X\) and \(Y\) is arbitrary.
Example: What’s the probability that we roll 2 on die \(X\) and 5 on die \(Y\)? Both dice are fair.
\[ P(X = 2, Y = 5) = P(X = 2)\times P(Y = 5) = \frac{1}{6}\times \frac{1}{6} = \frac{1}{36}. \]
p_2_and_5 <- (1/6) * (1/6)
p_2_and_5
[1] 0.02777778
MASS::fractions(p_2_and_5)
[1] 1/36
Complete question 8.
We have a fair six-sided die and a fair coin. If we throw the die and flip the coin, what is the probability of getting a six and Heads, respectively?
Answer: 1/12
Why? \(P(\text{Six and Heads})\) is a joint probability – these events are independent and thus the joint probability is the product of the individual probabilities \(1/6 \times 1/2\).
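One way to convince yourself of the product rule is to list every combination of die and coin outcomes and count. The sketch below assumes, as in the question, that the die and coin are fair and independent, so all 12 combinations are equally likely; the object names are illustrative.

```r
# All 6 x 2 = 12 combinations of die and coin outcomes
outcomes <- expand.grid(die = 1:6, coin = c("Heads", "Tails"))

# With a fair die and a fair coin, each combination is equally likely,
# so the joint probability is the share of rows matching the event
p_six_and_heads <- mean(outcomes$die == 6 & outcomes$coin == "Heads")
p_six_and_heads   # 1/12, about 0.0833

# The product rule gives the same value
(1/6) * (1/2)
```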
\(\dots\) become especially important when events are not independent, but they are fundamental in all joint probability situations, independent or not.
\(\dots\) can be derived from the joint probability as follows:
\[ P(X = x_k) = \sum_{l=1}^{L} P(X = x_k,\; Y = y_l). \]
To get the probability of just one particular \(X = x_k\), you add up the probabilities across all possible values of \(Y\).
\[ P(X = x_k) = P(X = x_k,\; Y = y_1) + P(X = x_k,\; Y = y_2) + \cdots + P(X = x_k,\; Y = y_L). \]
| Test result | COVID = yes | COVID = no |
|---|---|---|
| Positive | 0.0612 | 0.0009 |
| Negative | 0.0068 | 0.9311 |
We will use a COVID testing example from January 2022 in England.
To get the marginal probability \(P(\text{COVID = yes})\), we add up the joint probabilities over all possible test outcomes:
\[ P(\text{COVID = yes}) = P(\text{COVID = yes},\, \text{test = pos}) + P(\text{COVID = yes},\, \text{test = neg}) \]
Equivalently,
\[ P(\text{COVID = yes}) = \sum_{y \in \{\text{pos},\,\text{neg}\}} P(\text{COVID = yes},\, \text{test}=y) \]
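The same marginalisation can be sketched in R by storing the joint probabilities from the table in a matrix and summing over rows or columns. The object name `covid_joint` and the dimension labels are illustrative choices, not part of the original materials.

```r
# Joint probabilities from the table: rows = test result, columns = COVID status
covid_joint <- matrix(
  c(0.0612, 0.0009,
    0.0068, 0.9311),
  nrow = 2, byrow = TRUE,
  dimnames = list(test = c("pos", "neg"), covid = c("yes", "no"))
)

sum(covid_joint)       # the joint probabilities sum to 1
colSums(covid_joint)   # marginal of COVID status: yes = 0.068, no = 0.932
rowSums(covid_joint)   # marginal of test result: pos = 0.0621, neg = 0.9379
```

Summing down a column adds over all test outcomes, which is exactly the sum over \(y\) in the formula above.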
Complete questions 9, 10, 11
Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:
| | Rainy | Dry |
|---|---|---|
| Cold | 300 | 400 |
| Warm | 100 | 200 |
What is the probability of a day being warm and dry?
Answer: 0.2
Why? \(P(\text{Warm and Dry}) = 200 / 1000\)
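This is also a joint probability, but here the events are not independent: \(P(\text{Warm}) \times P(\text{Dry}) = 0.3 \times 0.6 = 0.18\), which is not equal to 0.2.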
Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:
| | Rainy | Dry |
|---|---|---|
| Cold | 300 | 400 |
| Warm | 100 | 200 |
What is the probability of a day being warm?
Answer: 0.3
Why? \(P(\text{Warm})\) = all possible warm days (rainy or dry) out of all possible days, so \((100 + 200) / 1000 = 300 / 1000\)
Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:
| | Rainy | Dry |
|---|---|---|
| Cold | 300 | 400 |
| Warm | 100 | 200 |
What is the probability of a day being dry?
Answer: 0.6
Why? \(P(\text{Dry})\) = all possible dry days (warm or cold) out of all possible days, so \((400 + 200) / 1000 = 600 / 1000\)
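The answers to the weather questions can be reproduced in R from the table of counts. This is a sketch under the assumption that the counts are entered exactly as in the table; `weather_counts` and `weather_joint` are names introduced here.

```r
# Counts of days: rows = temperature, columns = rain
weather_counts <- matrix(
  c(300, 400,
    100, 200),
  nrow = 2, byrow = TRUE,
  dimnames = list(temp = c("Cold", "Warm"), rain = c("Rainy", "Dry"))
)

# Joint probabilities: each cell divided by the total number of days
weather_joint <- weather_counts / sum(weather_counts)

weather_joint["Warm", "Dry"]   # joint probability: 0.2
rowSums(weather_joint)         # marginals: Cold = 0.7, Warm = 0.3
colSums(weather_joint)         # marginals: Rainy = 0.4, Dry = 0.6
```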
\(\dots\) is the probability distribution of a variable when the value of another variable is known.
The probability of one event given another.
\[ P(X = x_k \mid Y = y_l) \]
is read as: “the probability that \(X\) is \(x_k\) given \(Y\) is \(y_l\).”
The vertical bar “\(\mid\)” means “given”, as in \(P(\text{t-value at least this extreme} \mid H_0 \text{ is true})\), which is what a p-value is.
\(\dots\) can be derived from the joint probability as follows:
\[ P(X = x_k \mid Y = y_l) = \frac{P(X = x_k,\; Y = y_l)}{P(Y = y_l)}. \]
\[ P(X = x_k \mid Y = y_l) = \frac{\text{probability that both }X = x_k\text{ and }Y = y_l\text{ happen together}} {\text{probability that }Y = y_l\text{ happens}} \]
Example: I draw a card from a standard deck. The card is a Heart. What’s the probability that it’s the King of Hearts?
\(P(\text{King} \mid \text{Heart})\)
Given the card is a Heart, what is the probability that it’s a King?
A standard deck has 52 cards with equal numbers of each suit, so 52/4 = 13 are Hearts. Among those 13 Hearts, only 1 is a king (the King of Hearts).
\[ P(\text{King}\mid \text{Heart})= \frac{P(\text{King}, \text{Heart})}{P(\text{Heart})}=\frac{1/52}{13/52}=\frac{1}{13}. \]
Numerator is 1/52 because this is the joint probability of drawing a card that is both King and Hearts; denominator is the probability of Hearts 13/52.
p_king_hearts <- (1/52) / (13/52)
p_king_hearts
[1] 0.07692308
MASS::fractions(p_king_hearts)
[1] 1/13
Complete questions 12, 13
Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:
| | Rainy | Dry |
|---|---|---|
| Cold | 300 | 400 |
| Warm | 100 | 200 |
Given that the day was dry, what is the probability that it was warm?
Answer: 0.33
Why? Number of warm days over the total number of dry days:
200/ (400 + 200) = 1/3 \(\approx\) 0.333
Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:
| | Rainy | Dry |
|---|---|---|
| Cold | 300 | 400 |
| Warm | 100 | 200 |
Given that the day was warm, what is the probability that it was dry?
Answer: 0.67
Why? Number of dry days over the total number of warm days:
200 / (100 + 200) = 2/3 \(\approx\) 0.67
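Both conditional probabilities can be computed from the same table of counts with base R's `prop.table()`, which divides each cell by its row total (`margin = 1`) or column total (`margin = 2`). The sketch below re-enters the counts so that it runs on its own; the object names are illustrative.

```r
# Counts of days: rows = temperature, columns = rain
weather_counts <- matrix(
  c(300, 400,
    100, 200),
  nrow = 2, byrow = TRUE,
  dimnames = list(temp = c("Cold", "Warm"), rain = c("Rainy", "Dry"))
)

# P(temp | rain): condition on the column, so divide by column totals
p_warm_given_dry <- prop.table(weather_counts, margin = 2)["Warm", "Dry"]
p_warm_given_dry   # 200 / 600 = 1/3

# P(rain | temp): condition on the row, so divide by row totals
p_dry_given_warm <- prop.table(weather_counts, margin = 1)["Warm", "Dry"]
p_dry_given_warm   # 200 / 300 = 2/3
```

The two results differ even though they use the same cell (200 warm and dry days), because the denominator changes with what is being conditioned on.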
\(\dots\) allows you to go from one conditional probability to another.
Example: COVID tests are not perfect. There is a (small) probability that a test comes back positive even if you do not have COVID (a false positive).
\[P(\text{test}=\text{pos}\mid \text{COVID} = \text{no})\]
Bayes’ rule allows us to invert this: what is the probability that you don’t have COVID given that the test is positive?
\[P(\text{COVID}=\text{no} \mid \text{test}=\text{pos})\]
\[ P(\text{COVID}=\text{no}\mid \text{test}=\text{pos}) = \frac{P(\text{test}=\text{pos}\mid \text{COVID}=\text{no}) \times P(\text{COVID}=\text{no})}{P(\text{test}=\text{pos})}. \]
To do this, we need the marginal probability that someone has COVID, regardless of what the test says.
During the Omicron wave in January 2022, COVID prevalence in England was very high: about 6.85% of people (roughly 1 in 15) were infected.
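We assume:
- a test sensitivity of \(P(\text{pos}\mid\text{yes}) = 0.90\)
- a false-positive rate of \(P(\text{pos}\mid\text{no}) = 0.001\)
- a prevalence of \(P(\text{yes}) = 0.0685\), so \(P(\text{no}) = 0.9315\)
The overall probability of a positive test, \(P(\text{test}=\text{pos})\), is then
\[ P(\text{pos}) = P(\text{pos}\mid\text{yes}) \times P(\text{yes}) + P(\text{pos}\mid\text{no}) \times P(\text{no}) \]
\[ = (0.90 \times 0.0685) + (0.001 \times 0.9315) \approx 0.0626 \]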
Bayes’ theorem
\[ P(\text{COVID}=\text{yes} \mid \text{test}=\text{pos}) = \frac{ P(\text{test}=\text{pos} \mid \text{COVID}=\text{yes}) \times P(\text{COVID}=\text{yes}) }{ P(\text{test}=\text{pos}) } \]
Probability of truly having COVID given a positive test:
\[ \frac{0.90 \times 0.0685}{0.0626} \approx 0.985 \]
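The calculation can be reproduced in R. This is a sketch using the numbers from the worked example (prevalence 0.0685, sensitivity 0.90, false-positive rate 0.001); the function `posterior_given_pos()` and its argument names are hypothetical, and the low-prevalence value of 0.001 in the last line is an assumed figure added purely for contrast.

```r
# Hypothetical helper: P(COVID = yes | test = pos) via Bayes' theorem
posterior_given_pos <- function(prevalence, sensitivity, false_pos) {
  # Overall probability of a positive test (sum over COVID = yes and COVID = no)
  p_pos <- sensitivity * prevalence + false_pos * (1 - prevalence)
  sensitivity * prevalence / p_pos
}

# Values from the worked example
p_pos <- 0.90 * 0.0685 + 0.001 * (1 - 0.0685)
round(p_pos, 4)                   # 0.0626

p_yes_given_pos <- posterior_given_pos(0.0685, 0.90, 0.001)
round(p_yes_given_pos, 3)         # 0.985
round(1 - p_yes_given_pos, 3)     # P(COVID = no | pos): 0.015

# Assumed low prevalence of 0.001 (not from the lecture), for contrast
posterior_given_pos(0.001, 0.90, 0.001)   # roughly 0.47
```

With the same test but a much lower assumed prevalence, a positive result is far less conclusive, which is the flip side of the key message.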
Key message:
When disease prevalence is high, positive tests are highly reliable – Bayes’ theorem makes this precise.