C. Donovan
Here we define probability, discuss the philosophy of representing it mathematically, present some axioms and basic results, and work through some example probability calculations.
The polling organisation Populus Limited randomly sampled 1,509 adults, aged 18 and older, by telephone between January 6th and 8th, 2006, and asked each adult their voting intention (Labour, Conservative, Liberal Democrat, or Other). The resulting percentages were:
Party | Percentage |
---|---|
Labour | 39% |
Conservative | 36% |
Liberal Democrat | 16% |
Other | 9% |
How close are these sample statistics to the population parameters?
Remember: Take a different sample, get a different “answer”, so there must be uncertainty.
Sampled value = Parameter + Chance error, i.e. signal + noise.
Lots of processes present us with uncertainty - consider processes that are random and repeatable.
For example: tossing a coin, rolling dice, or drawing a random sample from a population.
Probability is a branch of mathematics that deals with the quantification of uncertainty.
There are a lot of terms related to the possible outcomes of a random process: sample space, elementary events, compound events, mutually exclusive events, independent events \( \ldots \)
Examples
Example: the process is to roll 2 dice and count the total number of dots face up.
Define the two events \( A \) and \( B \) as follows: \( A \) = the total is even, \( B \) = the total is divisible by 3.
Let \( A \) and \( B \) be any two events defined on a particular sample space.
The set of all outcomes occurring in \( A \) alone, \( B \) alone, or in both \( A \) and \( B \) is the union, \( A \cup B \).
The set of all outcomes occurring in both \( A \) and \( B \) is the intersection, \( A \cap B \).
The set of all outcomes in \( S \) that do not occur in \( A \) is the complement, \( A^c \).
When \( A \) and \( B \) have no outcomes in common, they are mutually exclusive (disjoint).
Example: the process is to roll 2 dice and count the total number of dots face up.
Define the two events \( A \) and \( B \) as before. Thus \( A = \{2, 4, 6, 8, 10, 12\} \) and \( B = \{3, 6, 9, 12\} \). Then \( A \cup B = \{2, 3, 4, 6, 8, 9, 10, 12\} \), \( A \cap B = \{6, 12\} \), and \( A^c = \{3, 5, 7, 9, 11\} \).
Here are some example probability statements, some with fundamental differences (group discussion points): what do they mean, and how are they determined?
Probability
Informally: consider a process, with multiple uncertain outcomes, that could be repeated infinitely often in an identical and independent fashion; then the probability of an event \( A \), \( \Pr(A) \), is the long-run relative frequency of \( A \).
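To make the long-run idea concrete, here is a minimal simulation sketch (Python; the fair coin and the event \( A \) = Heads are illustrative assumptions):

```python
import random

random.seed(42)  # reproducible illustration

# Repeat the process (one fair coin toss) many times and track the
# relative frequency of the event A = "Heads".
n_heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    n_heads += random.random() < 0.5  # True counts as 1
    if n in checkpoints:
        print(f"n = {n:6d}   relative frequency of A = {n_heads / n:.4f}")
```

The printed relative frequencies settle towards 0.5 as the number of repetitions grows.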
NB - from what we discussed previously, reality can stray far from this concept.
Formally:
(Axiom 1) The probability of an event \( A \subseteq S \), denoted \( \Pr(A) \), is a number between 0 and 1, inclusive.
(Axiom 2) \( \Pr(S) = 1 \).
(Axiom 3a) If \( A_1 \), \( A_2 \), \( \ldots \), \( A_k \) are a finite collection of mutually exclusive events, then
\[ \Pr(A_1 \cup A_2 \cup \ldots \cup A_k) = \sum_{i=1}^k \Pr(A_i) \]
Informally, this is called the Addition Rule. It applies if the collection is (countably) infinite too.
In practical terms these mean: a probability is a number between 0 and 1; an event covering the whole sample space is certain, with probability 1; and the probabilities of mutually exclusive events can simply be added.
Process: roll a single die once. Outcomes: 1, 2, 3, 4, 5, 6. What is the probability of getting an even number?
\[ \begin{align*} \Pr(2 \cup 4 \cup 6) &= \Pr(2) + \Pr(4) + \Pr(6)\\ &= \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2} = 0.5 \end{align*} \]
This uses Axiom 3a, since the outcomes \( 2 \), \( 4 \), and \( 6 \) are mutually exclusive.
Complement rule: \( \Pr(A^c) = 1 - \Pr(A) \).
Intersection of 2 mutually exclusive events: \( \Pr(A \cap B) = 0 \).
General addition rule: \( \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) \).
Think of \( \cap \) as AND and \( \cup \) as OR
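As a worked instance of the general addition rule, take the dice events from earlier: \( A \) = even total (18 of the 36 equally likely outcomes), \( B \) = total divisible by 3 (12 outcomes), and \( A \cap B \) = total of 6 or 12 (6 outcomes):
\[ \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) = \frac{18}{36} + \frac{12}{36} - \frac{6}{36} = \frac{24}{36} = \frac{2}{3} \]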
Formally, if two events \( A \) and \( B \) are independent, then
\[ \Pr(A ~and~ B) = \Pr(A) \times \Pr(B) \]
Thus when two events are independent, the probability of both happening is the product of the two individual probabilities.
In plain language this means the occurrence of one event does not affect the probability of the other occurring.
This is hugely important. Independence makes probability calculations easy, so it is often assumed (but is often not true).
Example:
A fair coin will be tossed twice. Let \( A \) be the event of a Head on the first toss and \( B \) be the event of a Tail on the second toss.
The outcomes for two different flips are independent (since the coin has no memory).
Thus, \( \Pr(A ~and~ B) = \Pr(A) \times \Pr(B) = 0.5 \times 0.5 = 0.25 \).
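A quick simulation sketch corroborating this (Python; the 100,000-trial figure is arbitrary):

```python
import random

random.seed(1)
trials = 100_000
count = 0
for _ in range(trials):
    first = random.choice("HT")   # first toss
    second = random.choice("HT")  # second toss, independent of the first
    count += (first == "H") and (second == "T")  # event A and B
print(count / trials)  # should be close to 0.25
```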
In an ordinary card deck there are 52 cards: 4 suits (diamonds, hearts, clubs, spades) of 13 cards (Ace,2,3,\( \ldots \),10,J,Q,K). A deck is shuffled and a single card is drawn.
Suppose one card is drawn and removed, and a second card is drawn. Let \( A \) = the first card is the 2 of Spades and \( B \) = the second card is the 3 of Spades. What is \( \Pr(A ~and~ B) \)?
Note, independence doesn't hold - knowing the first result is informative about the next.
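A worked calculation (anticipating the conditional probability machinery defined below): \( \Pr(A) = 1/52 \), and once the 2 of Spades is removed, 51 cards remain, so \( \Pr(B|A) = 1/51 \). Then
\[ \Pr(A ~and~ B) = \Pr(A) \times \Pr(B|A) = \frac{1}{52} \times \frac{1}{51} = \frac{1}{2652} \approx 0.00038 \]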
If there is partial information about the result of a random process, that information can be used to calculate conditional probabilities for particular events.
From The Basic Practice of Statistics, 2nd Ed, D. Moore.
A two-way table of suicides classified by the sex of the victim and whether or not a firearm was used.
Method | Male | Female | Total |
---|---|---|---|
Firearm | 16,381 | 2,559 | 18,940 |
Other | 9,034 | 3,536 | 12,570 |
Total | 25,415 | 6,095 | 31,510 |
Convert the table into a relative frequency table with 4 categories:
Method | Male | Female | Total |
---|---|---|---|
Firearm | 0.520 | 0.081 | 0.601 |
Other | 0.287 | 0.112 | 0.399 |
Total | 0.807 | 0.193 | 1.000 |
Let \( G \) be the event that a firearm was used and \( F \) be the event that a female was the victim. Then \( \Pr(G) \) = 0.601.
If you know the victim was Female (i.e. Given Female), what is the probability a firearm was used?
The probability of a firearm being used given the victim was a woman is an example of a conditional probability.
The probability an event \( A \) occurs given that another event \( B \) occurred is denoted \( \Pr(A|B) \), and is called the conditional probability of \( A \) given \( B \).
It is formally defined by \[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} \]
Use this definition to calculate \( \Pr(F|G) \)
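Applying the definition to the relative frequency table:
\[ \Pr(F|G) = \frac{\Pr(F \cap G)}{\Pr(G)} = \frac{0.081}{0.601} \approx 0.135 \]
and, answering the earlier question,
\[ \Pr(G|F) = \frac{\Pr(G \cap F)}{\Pr(F)} = \frac{0.081}{0.193} \approx 0.420 \]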
One definition of independence is that two events \( A \) and \( B \) are independent when \( \Pr(A|B) \)=\( \Pr(A) \), or equivalently \( \Pr(B|A) \)=\( \Pr(B) \).
In words, knowing that \( B \) occurred tells one nothing about the probability of \( A \). The general multiplication rule reduces to “the” multiplication rule:
\[ \Pr(A ~and~ B) = \Pr(A|B) \times \Pr(B) = \Pr(A) \times \Pr(B) \]
A sometimes useful technique for calculating probabilities when there is a sequence of random processes is to draw a tree diagram.
“A tree diagram is a device used to enumerate all possible outcomes of a sequence of procedures, where the number of possible outcomes for each procedure is finite”
(paraphrasing Lipschutz, 1965, Probability).
Example (adapted from Lipschutz): “Dragos and Christopher play a tennis tournament. The first person to win 2 games in a row or who wins a total of three games wins the tournament.”
We can tackle something simple like this by complete enumeration of outcomes. Assume Dragos wins any given game with probability 0.6 and Christopher with probability 0.4, independently from game to game. The following concepts are needed: independence, mutual exclusivity, and the honesty condition (the probabilities of all complete outcomes must sum to 1); see the sketch below.
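A minimal enumeration sketch in Python, under the assumptions above (the `winner` and `paths` helper names are illustrative, not from any particular library):

```python
# Enumerate the tennis-tournament tree: first to win 2 games in a row,
# or 3 games in total, wins. "D" = Dragos wins a game, "C" = Christopher.
P = {"D": 0.6, "C": 0.4}  # assumed per-game win probabilities, independent games

def winner(seq):
    """Return the tournament winner for a game sequence, or None if unfinished."""
    if len(seq) >= 2 and seq[-1] == seq[-2]:
        return seq[-1]            # two games in a row
    for p in "DC":
        if seq.count(p) == 3:
            return p              # three games in total
    return None

def paths(seq=""):
    """Yield (complete outcome, probability) for every leaf of the tree."""
    if winner(seq) is not None:
        prob = 1.0
        for g in seq:
            prob *= P[g]          # independence: multiply along the path
        yield seq, prob
    else:
        for g in "DC":
            yield from paths(seq + g)

total = 0.0
for seq, prob in paths():
    print(f"{seq:5s} wins for {winner(seq)}: {prob:.4f}")
    total += prob
print("Total probability:", round(total, 10))  # honesty check: sums to 1
```

The leaves are mutually exclusive, so their probabilities add; summing them gives 0.6 for Dragos, 0.4 for Christopher, and 1 in total.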
Some lessons arising for later:
Online gambling:
Some concepts for the P2P gambling market
We already know some useful stuff. Assume that 1/(decimal odds) = the probability of a win:
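For example, decimal odds of 2.50 (an illustrative figure) would correspond, under this assumption, to \( \Pr(\text{win}) = 1/2.50 = 0.40 \).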
Passengers of the Titanic can be viewed in terms of two “random” processes: living or dying after the ship hit the iceberg, and what class of ticket they purchased.
Fate | First | Second | Third | Crew | Total |
---|---|---|---|---|---|
Lived | 203 | 118 | 178 | 212 | 711 |
Died | 122 | 167 | 528 | 673 | 1490 |
Total | 325 | 285 | 706 | 885 | 2201 |
Dividing the cell values, and row and column totals, by the grand total yields a matrix of “probabilities”.
Fate | First | Second | Third | Crew | Total |
---|---|---|---|---|---|
Lived | 0.092 | 0.054 | 0.081 | 0.096 | 0.323 |
Died | 0.056 | 0.076 | 0.240 | 0.306 | 0.677 |
Total | 0.148 | 0.129 | 0.321 | 0.402 | 1.000 |
For example, \( \Pr(Live \cap Second) \)=0.054, a joint probability. And \( \Pr(Live) \)=0.323, a marginal probability.
Of interest are particular conditional probabilities: for example, did ticket class have an effect on the probability of living?
For example, given that one had a First class ticket, what was the probability of surviving?
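Applying the definition of conditional probability to the table:
\[ \Pr(Live \mid First) = \frac{\Pr(Live \cap First)}{\Pr(First)} = \frac{0.092}{0.148} \approx 0.62 \]
or, working with the raw counts, \( 203/325 \approx 0.625 \) (the small difference is rounding in the relative frequency table).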
We define a random variable (RV): a rule that attaches a numerical value to each outcome in the sample space \( S \). For example:
\( S \) | \( X \)=# Heads |
---|---|
HH | 2 |
HT | 1 |
TH | 1 |
TT | 0 |
\( S \) | \( X \)=numerical rating |
---|---|
Critical | -1 |
Poor | -1 |
Fair | 0 |
Satisfactory | +1 |
Two types of random variables (same as quantitative variables): discrete and continuous.
\[ \Pr(X = -1) = \Pr(Critical \cup Poor) = 0.2 + 0.3 = 0.5 \]
(Definition) Let \( D \) be the set of possible values of a discrete RV \( X \). The probability mass function (PMF) of \( X \) is \( p(x) = \Pr(X = x) \) for each \( x \in D \).
Tabular summary of a PMF, for two coin flips with \( X \) = # Heads:
\( x \) | 0 | 1 | 2 |
---|---|---|---|
\( p(x) \) | ¼ | ½ | ¼ |
This is amenable to plotting, as sketched below.
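A minimal plotting sketch (Python with matplotlib, an assumed tool choice):

```python
import matplotlib.pyplot as plt

# PMF of X = number of Heads in two coin flips
x = [0, 1, 2]
p = [0.25, 0.5, 0.25]

plt.bar(x, p, width=0.1)  # narrow, spike-style bars for a discrete PMF
plt.xticks(x)
plt.xlabel("x (number of Heads)")
plt.ylabel("p(x)")
plt.title("PMF of X = # Heads in two coin flips")
plt.show()
```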
The Cumulative Distribution Function (CDF) of a discrete RV \( X \) with PMF \( p(x) \) is \( F_X(x) \equiv \Pr(X \le x) = \sum_{y \le x} p(y) \).
Roll 2 dice and \( X \) equals the sum of the dots.
Sample space of simple outcomes and the corresponding \( X \) values:
1st die \ 2nd die | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | 1,1 | 1,2 | 1,3 | 1,4 | 1,5 | 1,6 |
2 | 2,1 | 2,2 | 2,3 | 2,4 | 2,5 | 2,6 |
3 | 3,1 | 3,2 | 3,3 | 3,4 | 3,5 | 3,6 |
4 | 4,1 | 4,2 | 4,3 | 4,4 | 4,5 | 4,6 |
5 | 5,1 | 5,2 | 5,3 | 5,4 | 5,5 | 5,6 |
6 | 6,1 | 6,2 | 6,3 | 6,4 | 6,5 | 6,6 |
The corresponding values of \( X \) (the sum of the dots):
1st die \ 2nd die | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | 6 | 7 |
2 | 3 | 4 | 5 | 6 | 7 | 8 |
3 | 4 | 5 | 6 | 7 | 8 | 9 |
4 | 5 | 6 | 7 | 8 | 9 | 10 |
5 | 6 | 7 | 8 | 9 | 10 | 11 |
6 | 7 | 8 | 9 | 10 | 11 | 12 |
The PMF of \( X \):
\( x \) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
\( p(x) \) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
The CDF of \( X \):
\( x \) | 2 | 3 | 4 | 5 | \( \ldots \) | 11 | 12 |
---|---|---|---|---|---|---|---|
\( F(x) \) | 1/36 | 3/36 | 6/36 | 10/36 | \( \ldots \) | 35/36 | 36/36 |
 | 0.028 | 0.083 | 0.167 | 0.278 | \( \ldots \) | 0.972 | 1.000 |
i.e. summing over the PMF.
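The same construction as a Python sketch, enumerating the 36 equally likely outcomes (using Fraction keeps the arithmetic exact):

```python
from fractions import Fraction
from itertools import product

# PMF of X = sum of two dice, by enumerating all 36 equally likely outcomes
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    x = d1 + d2
    pmf[x] = pmf.get(x, 0) + Fraction(1, 36)

# CDF: cumulative sums of the PMF
cdf, running = {}, Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running
    print(f"x = {x:2d}  p(x) = {pmf[x]}  F(x) = {cdf[x]} = {float(cdf[x]):.3f}")
```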
The CDF is simply a representation of the PMF that is a useful tool in calculating probabilities over intervals of the RV (NB: be careful about the endpoints of the intervals: are they inclusive or exclusive?). For example:
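Using the CDF table above, the probability that the sum is between 4 and 7 inclusive is
\[ \Pr(4 \le X \le 7) = \Pr(X \le 7) - \Pr(X \le 3) = \frac{21}{36} - \frac{3}{36} = \frac{18}{36} = 0.5 \]
Note that the inclusive lower endpoint means \( F(3) \), not \( F(4) \), is subtracted.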
An Auckland obstetrician [said] the chances of a successful pregnancy resulting from implanting a frozen embryo are about 1 in 10.
A couple would like to know how many times they might need to try the procedure to give themselves at least a 40% chance of success. Label the outcome of an individual attempt as \( Y \) for Yes/success! and \( N \) for No/failure.
Define the sample space: \( S = \{Y, NY, NNY, NNNY, \ldots\} \), i.e. zero or more failures followed by the first success.
Define the RV \( X \) (i.e. what numeric value is attached to our sample-space elements)? Let \( X \) be the number of attempts needed, so the outcome with \( k-1 \) failures followed by a success has \( X = k \).
NB: in general, \( \Pr(X=k) = 0.9^{k-1} \times 0.1 \).
Give the PMF and the CDF:
\( S \) | \( Y \) | \( NY \) | \( NNY \) | \( NNNY \) | \( NNNNY \) | \( NNNNNY \) | 6 or more failures |
---|---|---|---|---|---|---|---|
\( X \) | 1 | 2 | 3 | 4 | 5 | 6 | 7+ |
\( \Pr(X=x) \) | 0.1 | 0.09 | 0.081 | 0.0729 | 0.06561 | 0.059049 | 0.53144 |
\( \Pr(X \le x) \) | 0.1 | 0.19 | 0.271 | 0.3439 | 0.40951 | 0.46856 | 1.0 |
Thus the couple need to be willing to try up to 5 times, since \( \Pr(X \le 5) = 0.40951 \ge 0.40 \) but \( \Pr(X \le 4) = 0.3439 < 0.40 \).
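A sketch of this calculation in Python (assuming the 1-in-10 per-attempt success probability above):

```python
# Smallest number of attempts k with Pr(success within k attempts) >= 0.40,
# assuming each attempt succeeds independently with probability 0.1
p = 0.1
target = 0.40
cdf, k = 0.0, 0
while cdf < target:
    k += 1
    pk = (1 - p) ** (k - 1) * p  # Pr(X = k) = 0.9^(k-1) * 0.1
    cdf += pk
    print(f"k = {k}  Pr(X = {k}) = {pk:.5f}  Pr(X <= {k}) = {cdf:.5f}")
print(f"The couple need to be willing to try up to {k} times.")
```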
We've covered: the definition and axioms of probability; basic probability rules (complement, addition, multiplication); independence and conditional probability; and discrete random variables, with their PMFs and CDFs.
Next:
Reading: