MT5762 Lecture 5

C. Donovan

Probability and probability functions

Here we define probability, discuss the philosophy of representing it mathematically, present some axioms and basic results, and work through some example probability calculations.

Link with statistics

The polling organisation Populus Limited randomly sampled 1,509 adults, aged 18 and older, by telephone between January 6th and 8th, 2006, and asked each adult their voting intention (Labour, Conservative, Liberal Democrat, or Other). The resulting percentages were:

Party Percentage
Labour 39%
Conservative 36%
Liberal Democrat 16%
Other 9%

How close are these sample statistics to the population parameters?

Do we know the answer then?

Remember: Take a different sample, get a different “answer”, so there must be uncertainty.

Sampled value = Parameter + Chance Error, i.e. signal + noise

  • So what is the magnitude of the error?
  • Ideas from probability will help with this.

Random phenomena and uncertain outcomes

Lots of processes present us with uncertainty - consider processes that are random and repeatable.

For example

  • Process: Toss a coin
    • Outcomes: Head, Tail, Side
  • Process: A person who does not have HIV is tested for HIV
    • Outcomes: Negative test result, Positive test result (a false positive)
  • Process: Book a flight from Edinburgh to London on EasyJet
    • Outcomes: the flight departs on time, 1 minute late, 2 minutes late, etc.

Probability is a branch of mathematics that deals with the quantification of uncertainty.

Sample space and Events

There are a lot of terms related to the possible outcomes of a random process: sample space, elementary events, compound events, mutually exclusive events, independent events \( \ldots \)

  • The collection of all possible outcomes of an 'experiment' is the Sample Space, and is denoted \( \mathcal{S} \) (or sometimes \( \Omega \)).

Examples

  • Toss Coin: \( \mathcal{S} = \{\mbox{H}, \mbox{T}\} \)
  • HIV Test: \( \mathcal{S} = \{\mbox{Negative}, \mbox{Positive}\} \)
  • Airplane actual departure time minus scheduled departure time: \( \mathcal{S} = \) 0 to 360 minutes

Sample space and Events

  • A subset of outcomes in \( \mathcal{S} \) is called an Event and it's often labelled by a capital letter, e.g. \( A \) or \( B \).

Example: the process is to roll 2 dice and count the total number of dots face up.

Define the two events \( A \) and \( B \) as follows:

  • \( A \) = get an even number
  • \( B \) = get a number that is divisible by 3

Sample space and Events

Let \( A \) and \( B \) be any two events defined on a particular sample space.

  • The set of all outcomes occurring in \( A \) alone, \( B \) alone, or in both \( A \) and \( B \) is the:

    • union of \( A \) and \( B \), written \( A \cup B \) or \( A ~\mbox{or}~ B \).
  • The set of all outcomes occurring in both \( A \) and \( B \) is the:

    • intersection of \( A \) and \( B \), written \( A \cap B \) or \( A ~\mbox{and}~ B \).
  • The set of all outcomes in \( \mathcal{S} \) that are not in \( A \) is the:

    • complement of \( A \), written \( \overline{A} \) or \( A^c \).
  • When \( A \) and \( B \) have no outcomes in common, they are:

    • mutually exclusive or disjoint, i.e. \( A \cap B = \emptyset \).

Sample space and Events

Example: the process is to roll 2 dice and count the total number of dots face up.

Define the two events \( A \) and \( B \) as follows:

  • \( A \) = get an even number
  • \( B \) = get a number that is divisible by 3

Thus \( A = \{2, 4, 6, 8, 10, 12\} \) and \( B = \{3, 6, 9, 12\} \). Then

  • \( A \cup B \) = {2,3,4,6,8,9,10,12}
  • \( A \cap B \) = {6,12}
  • \( A^c \) = {3,5,7,9,11}
  • Are \( A \) and \( B \) disjoint? No. \( A \cap B \ne \emptyset \).

Probability

  • These examples contain stochasticity - a particular realisation of the process isn't completely predictable.
  • The usual way of treating this is probabilistically.
  • I would argue we have a reasonable grasp of probability, through intuition or simple repeated exposure.

Probability

Here are some examples, some with fundamental differences (group discussion points). What do they mean, and how are they determined?

  • What is the probability I win on a European roulette wheel choosing 1 number?
  • What is the probability I win any money on a spin of a digital one-armed bandit?
  • What is the probability that the All Blacks win the 2019 World Cup?
  • The probability of rain tomorrow in St Andrews, 10-11 am, is 0.07 (a 7% chance)?

Definition of Probability and 3 Axioms

Probability

Informally: consider a process, with multiple uncertain outcomes, that could be repeated infinitely often in an identical and independent fashion; then the probability of an event \( A \), \( \Pr(A) \), is the long-run relative frequency of \( A \).

NB - from what we discussed previously, reality can stray far from this concept.

Definition of Probability and 3 Axioms

Formally:

  • (Axiom 1) The probability of an event \( A \subseteq \mathcal{S} \), denoted \( \Pr(A) \), is a number between 0 and 1, inclusive.

  • (Axiom 2) \( \Pr(\mathcal{S}) = 1 \)

  • (Axiom 3a) If \( A_1 \), \( A_2 \), \( \ldots \), \( A_k \) are a finite collection of mutually exclusive events, then

\[ \Pr(A_1 \cup A_2 \cup \ldots \cup A_k) = \sum_{i=1}^k \Pr(A_i) \]

Informally, this is called the Addition Rule. It also applies if the collection is countably infinite.

Definition of Probability and 3 Axioms

These mean practically:

  • Things that never happen get probability value 0.
  • Things that are certain get probability value 1.
  • Uncertain things get quantified between these.
  • We need to know all possible outcomes to assign probabilities.
  • You can add probabilities of mutually exclusive things to get the probability that one of them happens.

Definition of Probability and 3 Axioms - examples

Process: roll a single die once. Outcomes: 1, 2, 3, 4, 5, 6. What is the probability of getting an even number?

\[ \begin{align*} \Pr(2 \cup 4 \cup 6) &= \Pr(2) + \Pr(4) + \Pr(6)\\ &= \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2} = 0.5 \end{align*} \]

This uses Axiom 3, since the events \( \{2\} \), \( \{4\} \), and \( \{6\} \) are mutually exclusive.
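As a sanity check, here is a minimal simulation sketch in Python (standard library only; purely illustrative) showing the long-run relative frequency of an even roll settling near 0.5:

    import random

    random.seed(42)   # for reproducibility
    n = 100_000       # number of simulated rolls

    # Count rolls landing on 2, 4 or 6
    even = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)

    # The long-run relative frequency should approach Pr(even) = 0.5
    print(f"Relative frequency of an even roll: {even / n:.4f}")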

3 results of the Axioms

  • Complement rule: \( \Pr(A^c) = 1 - \Pr(A) \).

  • Intersection of 2 mutually exclusive events:

    • If \( A \) and \( B \) are mutually exclusive, then \( \Pr(A \cap B) = 0 \).
  • General addition rule.

    • \( \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) \).

Think of \( \cap \) as AND and \( \cup \) as OR
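A worked instance of the general addition rule, using the two-dice events \( A \) (even total) and \( B \) (total divisible by 3) from earlier. Counting the 36 equally likely ordered outcomes gives \( \Pr(A) = 18/36 \), \( \Pr(B) = 12/36 \) and \( \Pr(A \cap B) = 6/36 \), so

\[ \Pr(A \cup B) = \frac{18}{36} + \frac{12}{36} - \frac{6}{36} = \frac{24}{36} = \frac{2}{3} \]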

Independence and the Multiplication Rule

Formally, if two events \( A \) and \( B \) are independent, then

\[ \Pr(A ~and~ B) = \Pr(A) \times \Pr(B) \]

Thus when two events are independent, the probability of both happening is the product of the two individual probabilities.

In plain language this means the occurrence of one event does not affect the probability of the other occurring.

This is hugely important: independence makes probability calculations easy, so it is often assumed (but is often not true).

Examples

Example:

A fair coin will be tossed twice. Let \( A \) be the event of a Head on the first toss and \( B \) be the event of a Tail on the second toss.

The outcomes for two different flips are independent (since the coin has no memory).

Thus, \( \Pr(A ~and~ B) \) = \( \Pr(A) \times \Pr(B) \) = 0.5 \( \times \) 0.5 = 0.25.

Examples

  • In an ordinary card deck there are 52 cards: 4 suits (diamonds, hearts, clubs, spades) of 13 cards (Ace,2,3,\( \ldots \),10,J,Q,K). A deck is shuffled and a single card is drawn.

  • Suppose one card is drawn and removed, and a second card is drawn. Let \( A \) = the first card is the 2 of spades and \( B \) = the second card is the 3 of spades. What is \( \Pr(A ~and~ B) \)?

Note, independence doesn't hold - knowing the first result is informative about the next.
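Anticipating the conditional probability notation defined below, a worked version: the first draw is the 2 of spades with probability 1/52, and given that, 51 cards remain, so

\[ \Pr(A ~and~ B) = \Pr(A) \times \Pr(B|A) = \frac{1}{52} \times \frac{1}{51} = \frac{1}{2652} \approx 0.00038 \]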

Conditional Probability

If there is partial information about the result of a random process, that information can be used to calculate conditional probabilities for particular events.

Explanation by example

From The Basic Practice of Statistics, 2nd Ed, D. Moore.

A two-way table of suicides classified by victim and whether or not a firearm was used.

Method Male Female Total
Firearm 16,381 2,559 18,940
Other 9,034 3,536 12,570
Total 25,415 6,095 31,510

Explanation by example

Convert the table into a relative frequency table with 4 categories:

Method Male Female Total
Firearm 0.520 0.081 0.601
Other 0.287 0.112 0.399
Total 0.807 0.193 1.000

Let \( G \) be the event that a firearm was used and \( F \) be the event that a female was the victim. Then \( \Pr(G) \) = 0.601.

Explanation by example

  • If you know the victim was female (i.e. given Female), what is the probability a firearm was used?

  • The probability a firearm was used, given that the victim was a woman, is an example of a conditional probability.

Formal definition

The probability an event \( A \) occurs given that another event \( B \) occurred is denoted \( \Pr(A|B) \), and is called the conditional probability of \( A \) given \( B \).

It is formally defined by \[ \begin{eqnarray*} \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} \end{eqnarray*} \]

Use this definition to calculate \( \Pr(F|G) \)
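For instance, from the relative frequency table above,

\[ \Pr(F|G) = \frac{\Pr(F \cap G)}{\Pr(G)} = \frac{0.081}{0.601} \approx 0.135 \]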

Independence revisited

One definition of independence is that two events \( A \) and \( B \) are independent when \( \Pr(A|B) \)=\( \Pr(A) \), or equivalently \( \Pr(B|A) \)=\( \Pr(B) \).

In words, knowing that \( B \) occurred tells one nothing about the probability of \( A \). The general multiplication rule reduces to “the” multiplication rule:

\[ \Pr(A ~and~ B) = \Pr(A|B) \times \Pr(B) = \Pr(A) \times \Pr(B) \]
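For example, in the firearm table above, \( \Pr(G|F) = 0.081/0.193 \approx 0.420 \), which differs from \( \Pr(G) = 0.601 \): knowing the victim was female changes the probability that a firearm was used, so \( F \) and \( G \) are not independent.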

Tree Diagrams

A sometimes useful technique for calculating probabilities when there is a sequence of random processes is to draw a tree diagram.

A tree diagram is a device used to enumerate all possible outcomes of a sequence of procedures, where the number of possible outcomes for each procedure is finite.

(paraphrasing Lipshutz, 1965, Probability).

Tree Diagrams

Example (sort of from Lipshutz): “Dragos and Christopher play a tennis tournament. The first person to win 2 games in a row, or who wins a total of three games, wins the tournament.”

We can tackle something simple like this by complete enumeration of outcomes. Assume per-game win probabilities of 0.6 for Dragos and 0.4 for Christopher. The following concepts are needed: independence, mutual exclusivity, and the honesty condition.
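A minimal enumeration sketch in Python (assuming the per-game probabilities above and independence between games; the names are illustrative):

    # Enumerate the game tree: the tournament ends when a player wins
    # 2 games in a row or reaches 3 wins in total.
    P_DRAGOS = 0.6   # assumed per-game win probability for Dragos
    P_CHRIS = 0.4

    def pr_dragos_wins(d_total=0, c_total=0, last=None, streak=0):
        """Probability Dragos wins the tournament from the current state.
        Independence lets us multiply along a branch; mutual exclusivity
        lets us add across branches (and the two branch probabilities at
        each node sum to 1 - the honesty condition)."""
        if streak == 2:                  # two wins in a row ends it
            return 1.0 if last == "D" else 0.0
        if d_total == 3:                 # three wins in total ends it
            return 1.0
        if c_total == 3:
            return 0.0
        # Either Dragos wins the next game...
        p = P_DRAGOS * pr_dragos_wins(d_total + 1, c_total, "D",
                                      streak + 1 if last == "D" else 1)
        # ...or Christopher does.
        p += P_CHRIS * pr_dragos_wins(d_total, c_total + 1, "C",
                                      streak + 1 if last == "C" else 1)
        return p

    print(f"Pr(Dragos wins the tournament) = {pr_dragos_wins():.5f}")  # 0.65952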

Some lessons arising for later:

  • How would I establish who is the better player?
  • In reality, what confidence is associated with my assessment? What influences this?

Let's gamble

Online gambling:

Here be dragons

Let's gamble

Some concepts for the P2P gambling market

  • Laying and backing (you can be punter or bookie)
  • Decimal odds vs fractional odds
  • Stakes
  • Volume

Here be dragons

Let's gamble

We already know some useful stuff - assume 1/decimal odds = prob of win:

  • Mutual exclusivity
  • Independence
  • Honesty condition
  • Trees, e.g. the probability of 10 wins from 10, if laying at odds of 100.

Here be dragons

Marginal, joint probabilities

Passengers of the Titanic can be viewed in terms of two “random” processes: living or dying after the ship hit the iceberg, and what class of ticket they purchased.

  • There were 2,201 people on the Titanic and the numbers cross-classified by the two categories are:

Fate First Second Third Crew Total
Lived 203 118 178 212 711
Died 122 167 528 673 1,490
Total 325 285 706 885 2,201

Marginal, joint probabilities

Dividing the cell values, and the row and column totals, by the grand total yields a matrix of “probabilities”.

Fate First Second Third Crew Total
Lived 0.092 0.054 0.081 0.096 0.323
Died 0.056 0.076 0.240 0.306 0.677
Total 0.148 0.129 0.321 0.402 1.000

For example, \( \Pr(Live \cap Second) \)=0.054, a joint probability. And \( \Pr(Live) \)=0.323, a marginal probability.

Marginal, joint probabilities

  • Of interest are particular conditional probabilities, for example: did ticket class have an effect on the probability of surviving?

  • For example, given that one had a First class ticket, what was the probability of surviving?
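Using the definition of conditional probability with the table above,

\[ \Pr(\mbox{Lived} \,|\, \mbox{First}) = \frac{\Pr(\mbox{Lived} \cap \mbox{First})}{\Pr(\mbox{First})} = \frac{0.092}{0.148} \approx 0.62 \]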

Discrete Random Variables

We define:

  • discrete (and continuous) random variables;
  • probability mass functions;
  • cumulative distribution functions.

Examples

  • A coin is flipped twice.

\( S \) \( X \) = # Heads
HH 2
HT 1
TH 1
TT 0

  • Condition of randomly chosen hospital patients is Critical, Poor, Fair, or Satisfactory.

\( S \) \( X \) = numerical rating
Critical -1
Poor -1
Fair 0
Satisfactory +1

Examples

Two types of random variables (same as quantitative variables):

  • Discrete
  • Continuous

Probability mass functions

  • Simply a mathematical description of the probabilities of outcomes in your sample space.
  • These are broadly broken into discrete and continuous varieties, dictated by the nature of the RV they describe.

Probability mass functions

  • Rate randomly chosen hospital patients by Critical, Poor, Fair, Satisfactory. Assume the percentages in each category are 20%, 30%, 35%, 15%.

\[ \begin{align*} \Pr(X = -1) &= \Pr(Critical \cup Poor) \\ &= 0.2 + 0.3 = 0.5 \end{align*} \]

Probability mass functions

  • (Definition) Let \( D \) be the set of possible values of a discrete RV, \( X \).

    • The Probability Mass Function, (PMF), for \( X \) is the function that assigns a probability, \( p(x) \), to every value \( x \) in \( D \).
  • Tabular summary of a PMF. Two coin flips and \( X \)=# Heads

\( x \) 0 1 2
\( p(x) \) ¼ ½ ¼

Which is amenable to plotting.
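A minimal plotting sketch in Python (assuming matplotlib is available; purely illustrative):

    import matplotlib.pyplot as plt

    # PMF of X = number of heads in two fair coin flips
    x = [0, 1, 2]
    p = [0.25, 0.50, 0.25]

    plt.bar(x, p, width=0.1)
    plt.xticks(x)
    plt.xlabel("x (number of heads)")
    plt.ylabel("p(x)")
    plt.title("PMF of X = # heads in two coin flips")
    plt.show()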

Cumulative distribution functions

  • The Cumulative Distribution Function, (CDF), of a discrete RV \( X \) with PMF \( p(x) \) is \( \Pr(X \le x) \) \( \equiv \) \( F_X(x) \).

  • Roll 2 dice and \( X \) equals the sum of the dots.

Sample space of simple outcomes and the corresponding \( X \) values:

2nd die 1 2 3 4 5 6
1st die
1 1,1 1,2 1,3 1,4 1,5 1,6
2 2,1 2,2 2,3 2,4 2,5 2,6
3 3,1 3,2 3,3 3,4 3,5 3,6
4 4,1 4,2 4,3 4,4 4,5 4,6
5 5,1 5,2 5,3 5,4 5,5 5,6
6 6,1 6,2 6,3 6,4 6,5 6,6

Cumulative distribution functions

2nd die 1 2 3 4 5 6
1st die
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

Cumulative distribution functions

The PMF:

\( x \) 2 3 4 5 6 7 8 9 10 11 12
\( p(x) \) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

The CDF:

\( x \) 2 3 4 \( \ldots \) 11 12
\( F_X(x) \) 1/36 3/36 6/36 \( \ldots \) 35/36 36/36
 0.028 0.083 0.167 \( \ldots \) 0.972 1.00

i.e. summing over the PMF.

Cumulative distribution functions

The CDF is simply a representation of the PMF that is a useful tool for calculating probabilities over intervals of the RV (NB: we must be careful about the endpoints of the intervals - are they inclusive or exclusive?). For example:

  • Probability roll a 3 or less.
  • Probability of rolling at least a 7.
  • Probability of rolling between a 3 and a 7 inclusive.
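Worked from the CDF above:

\[ \begin{align*} \Pr(X \le 3) &= F_X(3) = \frac{3}{36} \\ \Pr(X \ge 7) &= 1 - F_X(6) = 1 - \frac{15}{36} = \frac{21}{36} \\ \Pr(3 \le X \le 7) &= F_X(7) - F_X(2) = \frac{21}{36} - \frac{1}{36} = \frac{20}{36} \end{align*} \]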

Discrete RV - example

An Auckland obstetrician [said] the chances of a successful pregnancy resulting from implanting a frozen embryo are about 1 in 10.

A couple would like to know how many times they might need to try the procedure to give themselves at least a 40% chance of success. Label the outcome of an individual attempt as \( Y \) for Yes/success! and \( N \) for No/failure.

Discrete RV - example

  • Define the sample space:

    • The sample space \( S = \{Y, NY, NNY, NNNY, \ldots\} \).
  • Define the RV \( X \) (i.e. what numeric value is attached to our sample-space elements)?

    • Define \( X \) to be the number of attempts until success, so \( X = 1, 2, 3, 4, \ldots \). Then \( \Pr(X=1) = \Pr(Y) = 0.1 \); \( \Pr(X=2) = \Pr(NY) = 0.9 \times 0.1 = 0.09 \); \( \Pr(X=3) = \Pr(NNY) = 0.9^2 \times 0.1 = 0.081 \).

NB - in general, \( \Pr(X=k) \) = 0.9\( ^{k-1} \) \( \times \) 0.1.

Discrete RV - example

Give the PMF and the CDF:

\( S \) \( Y \) \( NY \) \( NNY \) \( NNNY \) \( NNNNY \) \( NNNNNY \) 6 or more failures
\( X \) 1 2 3 4 5 6 7+
\( \Pr(X) \) 0.1 0.09 0.081 0.0729 0.06561 0.059049 0.53144
\( \Pr(X \le x) \) 0.1 0.19 0.271 0.3439 0.40951 0.46856 1.0

Thus the couple need to be willing to try up to 5 times, since \( \Pr(X \le 5) = 0.40951 \ge 0.4 \).
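A quick check sketch in Python (standard library only; names illustrative):

    # Each attempt succeeds with probability 0.1, independently,
    # so Pr(X <= n) = 1 - 0.9**n (summing the geometric PMF).
    p_success = 0.1

    n = 1
    while 1 - (1 - p_success) ** n < 0.4:
        n += 1

    print(f"Attempts needed for at least a 40% chance: {n}")  # 5
    print(f"Pr(X <= 5) = {1 - 0.9 ** 5:.5f}")                 # 0.40951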

Recap and look-forwards

We've covered:

  • Basic ideas about probability and initial probability functions

Next:

  • Discrete RVs, expectations, variances, Binomial and Poisson distributions

Reading:

  • Chapters 5 & 6 of Wild & Seber