Session aims

  • Overview of the module: schedule, assessment, required reading
  • Introducing probability and relevant notation
  • Two rules of probability
  • Conditional, joint, and marginal probability
  • (Brief introduction to Bayes’ Theorem)

What is probability?

Probability is a means to quantify uncertainty (like miles are a means to measure physical distance).

Examples:

  • Will it rain tomorrow? Forecast may say 70% chance of rain tomorrow.
  • What’s the probability that the sun will rise tomorrow?
  • Will my dog run off and chase a squirrel on our next walk?
  • Will people vote Green during the next General Election?

We can’t say for sure, but we can quantify the uncertainty.

What is probability?

If a variable can take on more than one value, probability can be used to describe the uncertainty that it will take each one of its possible values.

Example: rolling a fair six-sided die. Each of the six faces (1, 2, 3, 4, 5, or 6) has a 1 in 6 probability (about 0.1667) of landing face up.

What is probability?

What day of the week was I born?

  • What’s the probability you were born on a Monday, Tuesday, etc?
  • What if you knew that it was a weekend?

Consider: How many possible values does the variable of interest have and how many correct outcomes are possible?

What is probability?

  • The probability of any outcome must be between 0 (impossible) and 1 (certain).

An outcome cannot be less likely than impossible or more likely than certain.

  • The probabilities of all possible values must add up to 1 (something must happen).

Example (drawing a card from a standard deck): the probability of drawing a heart, a diamond, a club, or a spade is 13/52 each, so 13/52 + 13/52 + 13/52 + 13/52 = 1.

What is probability? Notation

If \(X\) is a variable with possible values \(\{x_1, x_2, \ldots, x_k, \ldots, x_K\}\), then

\[ P(X = x_k) \]

gives the probability that \(X\) takes the value \(x_k\).

This notation says “the probability that the random variable \(X\) takes on the value \(x_k\)”.

Notation: \(x_1\) is the first value in the set; \(x_k\) is the \(k\)-th value (out of \(K\) possible values); \(x_K\) is the last value in the set.

What is probability? Notation

If \(X\) is a variable with possible values \(\{x_1, x_2, \ldots, x_k, \ldots, x_K\}\), then

\[ P(X = x_k) \]

If \(X\) = “number showing on a fair 6-sided die” then

\(P(X = x_k) = \dfrac{1}{6}\) for each \(x_k\)

because \(X \in \{1, 2, 3, 4, 5, 6\}\).

Notation of two rules

For every possible value \(x_k\) that \(X\) could take, the probability \(P(X=x_k)\) must be between 0 (impossible) and 1 (certain).

\[ 0 \le P(X = x_k) \le 1 \qquad \forall x_k \]

If you add up the probabilities of all possible values that \(X\) can take (from the first value \(k=1\) to the last value \(k=K\)), the total must be exactly 1 – meaning one of the possible outcomes has to happen.

\[ \sum_{k=1}^{K} P(X = x_k) = 1, \]

which is shorthand for:

\[ P(X = x_1) + P(X = x_2) + \dots + P(X = x_k) + \dots + P(X = x_K) = 1. \]
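
The two rules can also be checked numerically. The following R sketch is illustrative only: the helper name `is_valid_distribution` and the tolerance `tol` are assumptions made for this example, not part of the module materials.

```r
# Hypothetical helper: checks the two rules for a vector of probabilities.
is_valid_distribution <- function(p, tol = 1e-8) {
  rule_range <- all(p >= 0 & p <= 1)     # rule 1: each probability lies in [0, 1]
  rule_sum   <- abs(sum(p) - 1) < tol    # rule 2: the probabilities add up to 1
  rule_range && rule_sum
}

p_die <- rep(1/6, 6)                     # fair six-sided die
is_valid_distribution(p_die)             # TRUE
is_valid_distribution(c(0.5, 0.6))       # FALSE: the values sum to 1.1
```

The tolerance is there because sums of decimal fractions are not always exactly 1 in floating-point arithmetic.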

In-class questions

On NOW, see “Week 1: Probability In-Class Questions”

Complete questions 1, 2.

Problem 1

If X is a random variable that takes three possible values, named A, B, and C, which of the following is a valid probability distribution over X?

  • 0, 0, 0
  • 0.1, 0.1, 0.5
  • 0.5, 0.75, -0.25
  • 0.15, 0.15, 0.7
  • 1, -1, 1

Problem 1

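If X is a random variable that takes three possible values, named A, B, and C, which of the following is a valid probability distribution over X?

Answer: 0.15, 0.15, 0.7

Why: Each probability must lie between 0 and 1, and the probabilities of all possible values must add up to 1. This is the only option that satisfies both rules (the options containing negative values fail the first rule even though they sum to 1).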

Problem 2

X is a random variable that takes three possible values, named A, B, and C. If the probability that X takes the value of A is 0.5 and the probability that X takes the value of B is 0.5, what is the probability that X takes the value of C?

  • 0
  • 1
  • 0.5
  • 0.333
  • Unknown

Problem 2

X is a random variable that takes three possible values, named A, B, and C. If the probability that X takes the value of A is 0.5 and the probability that X takes the value of B is 0.5, what is the probability that X takes the value of C?

Answer: 0

Why: The probabilities of all possible values must add up to 1, and \(P(X = \text{A}) + P(X = \text{B}) = 0.5 + 0.5 = 1\), so \(P(X = \text{C})\) must be 0.

Probability rule 1: sum rule

If two values are mutually exclusive, then the probability of either happening is the sum of their individual probabilities, i.e.

\[ P(X = x_k \ \text{or}\ X = x_j) = P(X = x_k) + P(X = x_j). \]

It applies when \(x_k \ne x_j\) — so they can’t happen at the same time.

Example (fair 6-sided die): What is the probability of rolling either a 2 or a 5?

\[ P(X = 2 \ \text{or}\ X = 5) = P(X = 2) + P(X = 5) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6}. \]

A single roll cannot produce more than one outcome at the same time.
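
As a quick check of the sum rule, here is a minimal R sketch. The variable names `p` and `p_2_or_5` are chosen for this illustration only.

```r
# Probabilities for a fair six-sided die, named by face value.
p <- setNames(rep(1/6, 6), 1:6)

# Sum rule: faces 2 and 5 are mutually exclusive, so their probabilities add.
p_2_or_5 <- sum(p[c("2", "5")])
p_2_or_5                          # 2/6, about 0.333
```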

In-class questions (see NOW)

Complete questions 3, 4, 5, 6, 7.

Problem 3

X is a random variable that takes three possible values, named A, B, and C. If the probability that X takes the value of A is 0.25, the probability that X takes the value of B is 0.25, and the probability that X takes the value of C is 0.5, what is the probability that X takes the value of B or C?

  • 0.333
  • 0.5
  • 0.75
  • 1
  • Unknown

Problem 3

X is a random variable that takes three possible values, named A, B, and C. If the probability that X takes the value of A is 0.25, the probability that X takes the value of B is 0.25, and the probability that X takes the value of C is 0.5, what is the probability that X takes the value of B or C?

Answer: 0.75

Why? B and C are mutually exclusive outcomes, so the probability that X is either B or C is the sum of their individual probabilities: 0.25 + 0.5 = 0.75.

Problem 4

If X is a random variable that takes two possible values, 0 and 1, what is the probability that X takes the value of either 0 or 1?

  • 0.333
  • 0.5
  • 0.75
  • 1
  • Unknown

Problem 4

If X is a random variable that takes two possible values, 0 and 1, what is the probability that X takes the value of either 0 or 1?

Answer: 1

Why: X has to take one of these values (0 or 1), so the probability that it takes one of them is 1.

Problem 5

If X is a random variable that takes two possible values, 0 and 1, what is the probability that X takes the value of 1?

  • 0.333
  • 0.5
  • 0.75
  • 1
  • Unknown

Problem 5

If X is a random variable that takes two possible values, 0 and 1, what is the probability that X takes the value of 1?

Answer: Unknown

Why: We have insufficient information. It is not clear whether the two possible values are equally likely (compare “will it rain tomorrow?” with “will the sun rise tomorrow?”: both have two possible outcomes, but very different probabilities).

Problem 6

We have a fair six-sided die (all possible outcomes are equally likely). What is the probability of obtaining either a six or a one or a five on a single throw?

  • 0
  • 1/6
  • 1/3
  • 1/2
  • 1

Problem 6

We have a fair six-sided die (all possible outcomes are equally likely). What is the probability of obtaining either a six or a one or a five on a single throw?

Answer: \(1/2\)

Why? \(1/6 + 1/6 + 1/6 = 3/6 = 1/2\)

Problem 7

We have a fair six-sided die (all possible outcomes are equally likely). What is the probability of not obtaining a six on a single throw?

  • 0
  • 1/6
  • 1/2
  • 5/6
  • 1

Problem 7

We have a fair six-sided die (all possible outcomes are equally likely). What is the probability of not obtaining a six on a single throw?

Answer: \(5/6\)

Why? \(1/6 + 1/6 + 1/6 + 1/6 + 1/6\)

or \(6/6 - 1/6\)

or \(5 \times 1/6\)

Probability rule 2: Product rule

If two different variables are independent (outcome of one does not affect the other), the joint probability of their outcomes is the product of their individual probabilities. If \(X\) and \(Y\) are independent, then

\[ P(X = x_k, Y = y_l) = P(X = x_k)\times P(Y = y_l). \]

The comma in \(P(X = x_k, Y = y_l)\) means “and”, and the order of \(X\) and \(Y\) is arbitrary.

Example: What’s the probability that we roll 2 on die \(X\) and 5 on die \(Y\)? Both dice are fair.

\[ P(X = 2, Y = 5) = P(X = 2)\times P(Y = 5) = \frac{1}{6}\times \frac{1}{6} = \frac{1}{36}. \]

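# Product rule for two independent fair dice
p_2_and_5 <- (1/6) * (1/6)
p_2_and_5
#> [1] 0.02777778
# Show the result as a fraction (requires the MASS package)
MASS::fractions(p_2_and_5)
#> [1] 1/36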
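
The same rule gives the whole joint distribution of the two dice at once. The sketch below uses base R's `outer()` and assumes the dice are independent. The names `p_x`, `p_y`, and `joint` are illustrative.

```r
p_x <- setNames(rep(1/6, 6), 1:6)   # die X
p_y <- setNames(rep(1/6, 6), 1:6)   # die Y

# Product rule for independent variables: each cell is P(X = x) * P(Y = y).
joint <- outer(p_x, p_y)

joint["2", "5"]                     # 1/36, about 0.0278
sum(joint)                          # the 36 joint probabilities add up to 1
```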

In-class questions (see NOW)

Complete question 8

Problem 8

We have a fair six-sided die and a fair coin. If we throw the die and flip the coin, what is the probability of getting a six and Heads, respectively?

  • 0
  • 1/12
  • 1/2
  • 4/6
  • 5/6

Problem 8

We have a fair six-sided die and a fair coin. If we throw the die and flip the coin, what is the probability of getting a six and Heads, respectively?

Answer: 1/12

Why? \(P(\text{Six and Heads})\) is a joint probability – these events are independent and thus the joint probability is the product of the individual probabilities \(1/6 \times 1/2\).

Marginal probability

\(\dots\) becomes especially important when events are not independent, but it is fundamental in all joint probability situations, independent or not.

\(\dots\) can be derived from the joint probability as follows:

\[ P(X = x_k) = \sum_{l=1}^{L} P(X = x_k,\; Y = y_l). \]

To get the probability of just one particular \(X = x_k\), you add up the joint probabilities across all possible values of \(Y\).

\[ P(X = x_k) = P(X = x_k,\; Y = y_1) + P(X = x_k,\; Y = y_2) + \cdots + P(X = x_k,\; Y = y_L). \]

Marginal probability

Test result   COVID = yes   COVID = no
Positive      0.0612        0.0009
Negative      0.0068        0.9311


We will use a COVID testing example from January 2022 in England.

  • Suppose we do not yet know the test result.
  • We only care about the overall probability that a randomly chosen person has COVID.

Marginal probability

To get the marginal probability \(P(\text{COVID = yes})\), we add up the joint probabilities over all possible test outcomes:

\[ P(\text{COVID = yes}) = P(\text{COVID = yes},\, \text{test = pos}) + P(\text{COVID = yes},\, \text{test = neg}) \]

Equivalently,

\[ P(\text{COVID = yes}) = \sum_{y \in \{\text{pos},\,\text{neg}\}} P(\text{COVID = yes},\, \text{test} = y) \]

Marginal probability

Test result   COVID = yes   COVID = no
Positive      0.0612        0.0009
Negative      0.0068        0.9311


  • The marginal probability \(P(\text{COVID = yes})\) is the sum of the “COVID = yes” column, so 0.0612 + 0.0068 = 0.068.
  • This is the base rate of COVID infection, before seeing the test result.
  • (Rows sum to the marginal probabilities of each test result.)
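
The column and row sums can be reproduced in R. This is a minimal sketch using the joint probabilities from the table. The object names (`joint`, `p_covid`, `p_test`) are illustrative.

```r
# Joint probabilities copied from the table above.
joint <- matrix(
  c(0.0612, 0.0009,
    0.0068, 0.9311),
  nrow = 2, byrow = TRUE,
  dimnames = list(test = c("pos", "neg"), covid = c("yes", "no"))
)

p_covid <- colSums(joint)   # marginal of COVID status: sum over test results
p_test  <- rowSums(joint)   # marginal of test result: sum over COVID status

p_covid["yes"]              # 0.068, the base rate of infection
p_test["pos"]               # 0.0621, the overall probability of a positive test
```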

In-class questions (see NOW)

Complete questions 9, 10, 11

Problem 9

Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:

        Rainy   Dry
Cold    300     400
Warm    100     200

What is the probability of a day being warm and dry?

  • 0.2
  • 0.25
  • 0.5
  • 1.0
  • 20

Problem 9

Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:

        Rainy   Dry
Cold    300     400
Warm    100     200

What is the probability of a day being warm and dry?

Answer: 0.2

Why? \(P(\text{Warm and Dry}) = 200 / 1000\)

This is also a joint probability, but here the events are not independent: \(P(\text{Warm}) \times P(\text{Dry}) = 0.3 \times 0.6 = 0.18\), which differs from 0.2.

Problem 10

Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:

        Rainy   Dry
Cold    300     400
Warm    100     200

What is the probability of a day being warm?

  • 0.2
  • 0.3
  • 0.5
  • 1.0
  • 20

Problem 10

Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:

        Rainy   Dry
Cold    300     400
Warm    100     200

What is the probability of a day being warm?

Answer: 0.3

Why? \(P(\text{Warm})\) = all possible warm days (rainy or dry) out of all possible days, so \((100 + 200) / 1000 = 300 / 1000\)

Problem 11

Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:

        Rainy   Dry
Cold    300     400
Warm    100     200

What is the probability of a day being dry?

  • 0.2
  • 0.25
  • 0.5
  • 0.6
  • 1.0

Problem 11

Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:

        Rainy   Dry
Cold    300     400
Warm    100     200

What is the probability of a day being dry?

Answer: 0.6

Why? \(P(\text{Dry})\) = all possible dry days (warm or cold) out of all possible days, so \((400 + 200) / 1000 = 600 / 1000\)
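
The answers to Problems 9 to 11 can be reproduced from the table of counts. The R sketch below is illustrative, with assumed object names. It also compares the joint probability of a warm, dry day with the product of the two marginals, which would be equal if temperature and weather were independent.

```r
# Day counts from the table (1000 days in total).
counts <- matrix(
  c(300, 400,
    100, 200),
  nrow = 2, byrow = TRUE,
  dimnames = list(temp = c("Cold", "Warm"), weather = c("Rainy", "Dry"))
)

joint     <- counts / sum(counts)   # joint probabilities, e.g. P(Warm, Dry)
p_temp    <- rowSums(joint)         # marginals: P(Cold), P(Warm)
p_weather <- colSums(joint)         # marginals: P(Rainy), P(Dry)

joint["Warm", "Dry"]                # 0.2  (Problem 9)
p_temp["Warm"]                      # 0.3  (Problem 10)
p_weather["Dry"]                    # 0.6  (Problem 11)

# Under independence these two numbers would match. Here they do not.
p_temp["Warm"] * p_weather["Dry"]   # 0.18, not 0.2
```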

Conditional probability

\(\dots\) is the probability distribution of a variable when the value of another variable is known.

The probability of one event given another.

\[ P(X = x_k \mid Y = y_l) \]

is read as: “the probability that \(X\) is \(x_k\) given \(Y\) is \(y_l\).”

The vertical bar “\(\mid\)” means “given”, as in \(P(\text{t-value at least this extreme} \mid H_0 \text{ is true})\), which is what a p-value is.

Conditional probability

\(\dots\) can be derived from the joint probability as follows:

\[ P(X = x_k \mid Y = y_l) = \frac{P(X = x_k,\; Y = y_l)}{P(Y = y_l)}. \]

  • numerator: probability that X and Y both happen (the comma indicates joint probability: X happens AND Y happens)
  • denominator: probability that Y happens

\[ P(X = x_k \mid Y = y_l) = \frac{\text{probability that BOTH }X\text{ and }Y\text{ happen together}} {\text{probability that }Y\text{ happens}} \]

Conditional probability

Example: I draw a card from a standard deck. The card I drew is a Heart. What’s the probability that it’s the King of Hearts?

\(P(\text{King} \mid \text{Heart})\)

Given the card is a Heart, what is the probability that it’s a King?

Conditional probability

A standard deck has 52 cards with equal numbers of each suit, so 52/4 = 13 are Hearts. Among those 13 Hearts, only 1 is a king (the King of Hearts).

\[ P(\text{King}\mid \text{Heart})= \frac{P(\text{King}, \text{Heart})}{P(\text{Heart})}=\frac{1/52}{13/52}=\frac{1}{13}. \]

The numerator is 1/52 because this is the joint probability of drawing a card that is both a King and a Heart; the denominator, 13/52, is the probability of drawing a Heart.

# Conditional probability: P(King, Heart) divided by P(Heart)
p_king_hearts <- (1/52) / (13/52)
p_king_hearts
#> [1] 0.07692308
MASS::fractions(p_king_hearts)
#> [1] 1/13
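
The same number can be obtained by listing the deck and counting. This is a sketch in base R. The data frame `deck` and the rank and suit labels are assumptions made for the illustration.

```r
# Build the 52-card deck: every combination of 13 ranks and 4 suits.
deck <- expand.grid(
  rank = c("A", 2:10, "J", "Q", "K"),
  suit = c("Hearts", "Diamonds", "Clubs", "Spades")
)

p_king_and_heart <- mean(deck$rank == "K" & deck$suit == "Hearts")  # 1/52
p_heart          <- mean(deck$suit == "Hearts")                     # 13/52

p_king_and_heart / p_heart                                          # 1/13
```

Taking the `mean()` of a logical vector gives the proportion of cards that satisfy the condition, which is the probability when every card is equally likely to be drawn.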

In-class questions (see NOW)

Complete questions 12, 13

Problem 12

Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:

        Rainy   Dry
Cold    300     400
Warm    100     200

Given that the day was dry, what is the probability that it was warm?

  • 0.2
  • 0.25
  • 0.33
  • 0.5
  • 1.0

Problem 12

Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:

        Rainy   Dry
Cold    300     400
Warm    100     200

Given that the day was dry, what is the probability that it was warm?

Answer: 0.33

Why? Number of warm, dry days over the total number of dry days:

200 / (400 + 200) = 1/3 \(\approx\) 0.333

Problem 13

Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:

        Rainy   Dry
Cold    300     400
Warm    100     200

Given that the day was warm, what is the probability that it was dry?

  • 0.1
  • 0.2
  • 0.33
  • 0.5
  • 0.67

Problem 13

Over 1000 days we record the number of days that were Cold or Warm and also Rainy or Dry:

        Rainy   Dry
Cold    300     400
Warm    100     200

Given that the day was warm, what is the probability that it was dry?

Answer: 0.67

Why? Number of warm, dry days over the total number of warm days:

200 / (100 + 200) = 2/3 \(\approx\) 0.67
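
Problems 12 and 13 condition on different variables, which is why their answers differ. A minimal R sketch using base R's `prop.table()` makes the two directions explicit. The object names are illustrative.

```r
# Day counts from the table (1000 days in total).
counts <- matrix(
  c(300, 400,
    100, 200),
  nrow = 2, byrow = TRUE,
  dimnames = list(temp = c("Cold", "Warm"), weather = c("Rainy", "Dry"))
)

# P(temp | weather): divide each column by its column total.
p_temp_given_weather <- prop.table(counts, margin = 2)
p_temp_given_weather["Warm", "Dry"]    # 200 / 600, about 0.33 (Problem 12)

# P(weather | temp): divide each row by its row total.
p_weather_given_temp <- prop.table(counts, margin = 1)
p_weather_given_temp["Warm", "Dry"]    # 200 / 300, about 0.67 (Problem 13)
```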

Bayes’ Theorem

\(\dots\) allows you to go from one conditional probability to another.

Example: COVID tests are not perfect. There is a (small) probability that a test comes back positive even if you do not have COVID (a false positive).

\[P(\text{test}=\text{pos}\mid \text{COVID} = \text{no})\]

Bayes’ rule allows us to invert this: what is the probability that you don’t have COVID given that the test is positive?

\[P(\text{COVID}=\text{no} \mid \text{test}=\text{pos})\]

Bayes’ Theorem

\[ P(\text{COVID}=\text{no}\mid \text{test}=\text{pos}) = \frac{P(\text{test}=\text{pos}\mid \text{COVID}=\text{no}) \times P(\text{COVID}=\text{no})}{P(\text{test}=\text{pos})}. \]

Bayes’ Theorem: COVID testing

To do this, we need the marginal probability that someone has COVID, regardless of what the test says.

During the Omicron wave in January 2022, COVID prevalence in England was very high: about 6.85% of people (roughly 1 in 15) were infected.

We assume:

  • \(P(\text{COVID}=\text{yes}) = 0.0685\)
  • \(P(\text{COVID}=\text{no}) = 0.9315\)
  • PCR sensitivity: \(P(\text{test}=\text{pos} \mid \text{COVID}=\text{yes}) = 0.90\)
  • False positive rate: \(P(\text{test}=\text{pos} \mid \text{COVID}=\text{no}) = 0.001\)

The overall probability of a positive test, \(P(\text{test}=\text{pos})\), is

\[ P(\text{test}=\text{pos}) = P(\text{pos}\mid\text{yes}) \times P(\text{yes}) + P(\text{pos}\mid\text{no}) \times P(\text{no}) \]

\[ = (0.90 \times 0.0685) + (0.001 \times 0.9315) \approx 0.0626 \]

Plugging in the numbers

Bayes’ theorem

\[ P(\text{COVID}=\text{yes} \mid \text{test}=\text{pos}) = \frac{ P(\text{test}=\text{pos} \mid \text{COVID}=\text{yes}) \times P(\text{COVID}=\text{yes}) }{ P(\text{test}=\text{pos}) } \]

Probability of truly having COVID given a positive test:

\[ \frac{0.90 \times 0.0685}{0.0626} \approx 0.985 \]
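
The calculation can be wrapped in a small R function. This is a sketch under the assumptions listed above (prevalence 0.0685, sensitivity 0.90, false positive rate 0.001). The function name `posterior_given_positive` and its argument names are hypothetical.

```r
# Hypothetical helper: P(COVID = yes | test = pos) via Bayes' theorem.
posterior_given_positive <- function(prevalence, sensitivity, false_positive_rate) {
  # Overall probability of a positive test: infected and uninfected cases combined.
  p_pos <- sensitivity * prevalence + false_positive_rate * (1 - prevalence)
  sensitivity * prevalence / p_pos
}

p_yes_given_pos <- posterior_given_positive(
  prevalence = 0.0685, sensitivity = 0.90, false_positive_rate = 0.001
)
p_yes_given_pos        # about 0.985
1 - p_yes_given_pos    # about 0.015, i.e. P(COVID = no | test = pos)
```

Calling the function with a different `prevalence` shows how strongly the result depends on the base rate.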

Interpretation

  • During a period of high prevalence (January 2022), a positive PCR test means there is about a 98.5% chance that the person truly has COVID.
  • Only about 1.5% of positive tests are false positives in this scenario.

Key message:
When disease prevalence is high, positive tests are highly reliable – Bayes’ theorem makes this precise.