Combinatorics Formulas

Permutations

The number of ways to pick a sequence of \(K\) elements from a set of \(N\): \[ \begin{align} _NP_K&=N\cdot(N-1)\cdot(N-2)\dots(N-K+1)\\\\ &=\frac{N\cdot(N-1)\cdot(N-2)\dots1}{(N-K)\cdot(N-K-1)\dots1}\\\\ &\boxed{_NP_K=\frac{N!}{(N-K)!}} \end{align} \]
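A quick numerical check of the permutation formula in R (the values \(N=5\), \(K=3\) are just illustrative):

N <- 5; K <- 3
prod(N:(N - K + 1))               # 5 * 4 * 3 = 60 ordered selections
factorial(N) / factorial(N - K)   # same count via the factorial formula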

Combinations

The number of ways to pick \(K\) elements from a set of \(N\): \[ \begin{align} &_NC_K=\frac{\text{total permutations}}{\text{repetitions}}\\\\ &_NC_K=\frac{_NP_K}{K!} \\\\ &\boxed{_NC_K=\frac{N!}{K!(N-K)!}} \end{align} \]
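And a matching check for combinations (again with illustrative \(N=5\), \(K=3\)); R's choose() computes \(_NC_K\) directly:

N <- 5; K <- 3
factorial(N) / (factorial(K) * factorial(N - K))   # 10 subsets
choose(N, K)                                        # R's built-in nCk gives the same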

Fundamental Discrete Probability Laws


Complement Probability

The probability of \(A\) NOT occurring is complementary to the probability of \(A\): \[ \begin{align} &\text{Events }A\text{ and }A^C\text{ are disjoint and together cover the sample space,}\\ &\text{so their probabilities sum to }1\:\dots\\ &1=P(A)+P(A^C)\\\\ \therefore\quad &\boxed{P(A^C)=1-P(A)} \end{align} \]

Inclusion-Exclusion Principle

The probability of \(A\) OR \(B\) occurring is the sum of the probabilities of \(A\) and \(B\) occurring individually minus the probability that both occur simultaneously: \[ \begin{align} &\boxed{P(A\cup B)=P(A)+P(B)-P(A\cap B)} \\\\ &\text{for mutually exclusive (disjoint) events} \\ &P(A\cup B)=P(A)+P(B) \end{align} \]
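A small sketch in R using a single fair die, with the illustrative events \(A=\{\text{roll is even}\}\) and \(B=\{\text{roll is greater than }3\}\):

omega <- 1:6
A <- omega %% 2 == 0                 # event A: {2, 4, 6}
B <- omega > 3                       # event B: {4, 5, 6}
mean(A) + mean(B) - mean(A & B)      # P(A) + P(B) - P(A and B) = 2/3
mean(A | B)                          # P(A or B) agrees: 2/3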


Posterior Probability

The conditional or posterior probability \(P(A|B)\) is the probability of \(A\) given \(B\). After the event \(B\) has occurred: i) the new sample space \(\Omega'\subset\Omega\) is, if you like, \(B\) itself with total probability \(P(B)\), and ii) we need only consider the subset \(A\cap B\), since within the new sample space \(P(B^C)=0\) \[ \begin{align} &\boxed{P(A|B)=\frac{P(A\cap B)}{P(B)}} \\\\ \dots&\text{and the }\textbf{multiplication rule}\dots\\ &\boxed{P(A\cap B)=P(A|B)P(B)} \end{align} \]
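Continuing with the same illustrative die events as above, a quick check of the conditional probability and the multiplication rule in R:

omega <- 1:6
A <- omega %% 2 == 0          # even
B <- omega > 3                # greater than 3
mean(A & B) / mean(B)         # P(A|B) = (1/3) / (1/2) = 2/3
(mean(A & B) / mean(B)) * mean(B)   # multiplication rule recovers P(A and B) = 1/3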

Law of Total Probability

If the sample space is divided into \(n\) disjoint events \(B_1, B_2,\dots,B_n\) then the probability of \(A\) is the sum of the conditional probabilities of \(A\) given each \(B_i\), weighted by \(P(B_i)\) \[ \begin{align} P(A)&=\sum_i P(A\cap B_i) \\\\ \therefore\quad &\boxed{P(A)=\sum_i P(A|B_i)P(B_i)} \end{align} \]


Bayes’ Theorem

Bayes’ rule tells us how to invert conditional probabilities: \[ \begin{align} P(A\cap B) &= P(B\cap A) \\ P(A|B)P(B) &= P(B|A)P(A) \end{align} \] \[\therefore\quad\boxed{P(A|B) = \frac{P(B|A)P(A)}{P(B)}}\]

Bayesian Update for multiple hypotheses

Bayesian updating can be extended to calculate the posterior probability for each hypothesis \(B_1,B_2,\dots B_n\) given the observed evidence. \[ \boxed{P(B_i|A) = \frac{P(A|B_i)P(B_i)}{\sum_jP(A|B_j)P(B_j)}} \]
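A minimal vectorised sketch of this update in R, with made-up priors and likelihoods for three hypotheses just to show the normalisation:

prior      <- c(0.5, 0.3, 0.2)       # P(B_i): illustrative prior beliefs
likelihood <- c(0.10, 0.40, 0.80)    # P(A|B_i): illustrative likelihoods of the evidence
posterior  <- likelihood * prior / sum(likelihood * prior)
posterior                            # posterior P(B_i|A); sums to 1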

De Morgan’s Laws (Set Theory)

The complement of a union is the intersection of the complements, and vice versa: \[ (A\cup B)^C=A^C\cap B^C \qquad\qquad (A\cap B)^C=A^C\cup B^C \]


Prototypical Discrete Probability Problems

Probability often defies intuition. Discrete probability is calculated by counting elements within (sometimes unimaginably large) sets (Bragg & Guests, 2016). Probability theory provides formulas to simplify this counting, using combinatorics to tackle real-world challenges by rephrasing them as the well-known prototypical problems presented below and stolen from (Orloff & Kamrin, 2022).

We start with toy models like coins and dice. By understanding these thoroughly we will develop a good feel for the simple essence inside many complex real-world problems. Sometimes a problem is so complicated that the best way to understand it is through computer simulation. Here we use software to run virtual experiments many times in order to estimate probabilities.

The list of applications is essentially endless: tests of one medical treatment against another (or a placebo), measures of genetic linkage, the search for elementary particles, machine learning for vision or speech, gambling probabilities and strategies, climate modeling, economic forecasting, epidemiology, marketing, googling…

Example - combined probability of independent events: A coin is fair if it comes up heads or tails with equal probability. You flip a fair coin three times. What is the probability that exactly one of the flips results in a head? \[ \begin{align} P(\text{exactly 1 head})&=\frac{\sum\text{permutations of 3 flips with 1 head}}{\sum\text{permutations}}\\ &=\frac{1}{8}\cdot\binom{3}{1}=\frac{1}{8}\cdot\frac{3!}{1!(3-1)!}=3/8 \end{align} \]

choose(n=3,k=1)/2^3
## [1] 0.375

Example - combined probability of independent events with different ways to achieve the final event: A deck of 52 cards has 13 ranks (2, 3, … , 9, 10, J, Q, K, A) and 4 suits (♡, ♠, ♢, ♣). A poker hand consists of 5 cards. A one-pair hand consists of two cards having one rank and three cards having three other distinct ranks, e.g., {2♡, 2♠, 5♡, 8♣, K♢}. Here the trick is to graph the successive possibilities as shown below however, doing this exhaustively is a real feckin ball ache. So by counting all the possibilities we find \(P(\text{no two cards share a rank})=\binom{13}{5}4^5/\binom{52}{5}\approx0.51\) and \(P(\text{at least one pair})\approx0.49\) (see the quick check below).
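A quick sanity check in R, assuming "no pair" means all five ranks are distinct (each of the \(\binom{13}{5}\) rank choices can take any of \(4^5\) suit patterns):

p_no_pair <- choose(13, 5) * 4^5 / choose(52, 5)   # all five ranks distinct
p_no_pair                                           # approximately 0.507
1 - p_no_pair                                       # P(at least one pair), approximately 0.493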

Example - discrete random variable multiplication: If you have 3 shirts and 4 trousers then… you can make: \(\boxed{3 ⋅ 4 = 12\:\text{different outfits}}\)


Alternatively, suppose you roll one die. Then the sample space and probability function are:

\(X\) 1 2 3 4 5 6
P 1/6 1/6 1/6 1/6 1/6 1/6

Now suppose you roll two dice…

# joint distribution of two independent fair dice: outer product of the marginal probabilities
x <- as.matrix(rep(1/6, 6))
x %*% t(x)
##            [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
## [1,] 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778
## [2,] 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778
## [3,] 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778
## [4,] 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778
## [5,] 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778
## [6,] 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778 0.02777778
\(X_1+X_2\) 2 3 4 5 6 7 8 9 10 11 12
P 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
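A minimal sketch of the same idea in R: enumerate all 36 equally likely outcomes and tabulate the sums (the resulting proportions match the table above).

sums <- outer(1:6, 1:6, "+")   # all 36 equally likely (die 1, die 2) outcomes
table(sums) / 36               # P(sum = 2), ..., P(sum = 12): 1/36, 2/36, ..., 6/36, ..., 1/36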

Example - permutations and combinations: There are 5 competitors in the 100m final at the Olympics. In how many ways can the gold, silver, and bronze medals be awarded?

  • can award first medal 5 ways

  • can award second medal 4 ways

  • can award third medal 3 ways

\[5\cdot 4 \cdot 3 = \frac{5\cdot 4 \cdot 3 \cdot 2 \cdot 1}{2 \cdot 1}=\frac{5!}{(5-3)!}=_5P_3\]

\(\therefore\qquad\)\(60\) ways to place contestants on the podium (we can pick \(3\) elements from a set of \(5\) in \(60\) different sequences or permutations).

factorial(5)/factorial(5-3)
## [1] 60

\[_5C_3=\frac{_5P_3}{3!}\]

\(\therefore\qquad\)There are \(10\) ways to pick 3 medalists (combinations of \(3\) elements from a set of \(5\))

choose(n=5,k=3)
## [1] 10

Example - binomial distribution:

i.) Count the number of ways to get 3 heads in a sequence of 10 flips of a coin:


  • total of \(2^{10}\) permutations

  • total of \(10*9*8\) ways to shuffle 3 heads amongst 7 tails

  • total of \(\frac{10*9*8}{3!}\) ways to choose a subset of 3 heads

ii.) If the coin is fair, what is the probability of exactly 3 heads in 10 flips?

  • The probability is just the elements in the event over the sample space: \[P(3\text{ heads})=\frac{10*9*8}{3!}/2^{10}=\frac{120}{1024}\approx 11.7\%\]

More simply we can recognise this as a Binomial process i.e. \(P(3\text{ heads}) = \text{bin}(n=10,\:p=0.5)(X=3)\)

dbinom(x=3,size=10,prob=0.5)
## [1] 0.1171875

iii.) Count the number of ways to get 3 ones in a sequence of 10 dice rolls:


  • Consider all possibilities:

    \[\text{total of }6^{10}\text{ outcome permutations}\]

  • Forget all the different orderings for a moment; just consider the ways of throwing three ones and 7 non-ones in one particular sequence

    • there is \(1\) way to throw a one, so there is \(1^3=1\) way to throw three ones

    • there are \(5\) ways to throw a non-one, so there are \(5^7=78125\) ways to throw 7 non-ones

    • there are \(1^3*5^7=78125\) ways to throw three ones and 7 non-ones

  • Now consider shuffling the 3 ones amongst the 7 non-ones

    • total of \(10*9*8\) ways to choose a sequence of 3 ones from non-ones

    • total of \(\binom{10}{3}=\frac{10*9*8}{3!}=120\) ways to choose (or shuffle) a subset of 3 ones from non-ones

  • put it all together…

\[\binom{10}{3}\cdot 1^3\cdot 5^7=9\,375\,000\:\text{ways to throw }3\:\text{ones with}\:10\:\text{dice}\]
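A quick check of this count in R; dividing by the \(6^{10}\) total outcomes previews the probability found in part iv:

choose(10, 3) * 1^3 * 5^7           # 9375000 ways to throw three ones with 10 dice
choose(10, 3) * 1^3 * 5^7 / 6^10    # dividing by all 6^10 outcomes: approximately 0.155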

iv.) If the die is fair, what is the probability of exactly 3 ones in 10 rolls?


  • Consider all possibilities:

    • total of \(6^{10}\) outcome permutations
  • Consider the binary scenario of simply ones and non-ones

    • total of \(10*9*8\) ways to choose a sequence of 3 ones from non-ones

    • total of \(\binom{10}{3}=\frac{10*9*8}{3!}=120\) ways to choose a subset of 3 ones from non-ones

  • Consider the probability of throwing a one compared to a non-one

    • the probability of a one is \(P(one)=\frac{1}{6}\) and the probability of three ones is \(P(\text{3 ones})=\frac{1}{6}\cdot\frac{1}{6}\cdot\frac{1}{6}\)

    • the probability of a non-one is \(P(\text{non-one})=\frac{5}{6}\) and the probability of seven non-ones is \(P(\text{7 non-ones})=\left(\frac{5}{6}\right)^7\)

    • the probability of throwing 3 ones and 7 non-ones is \(P(\text{3 ones & 7 non-ones})=\left(\frac{1}{6}\right)^3\cdot\left(\frac{5}{6}\right)^7\)

  • put it all together…

\[P(\text{3 ones})=\binom{10}{3}\times\left(\frac{1}{6}\right)^3\cdot\left(\frac{5}{6}\right)^7\approx 15.5\%\]

More simply we can recognise this as yet another Binomial process i.e. \(P(3\text{ ones}) = \text{bin}(n=10,\:p=1/6)(X=3)\)

dbinom(x=3,size=10,prob=1/6)
## [1] 0.1550454

Example - Poisson distribution: count the number of black cabs that pass the living-room window in an hour.


Here the sample space is \(\Omega = \{0, 1, 2, 3, 4, ...\}\). Imagine all black cabs randomly travelling around the city - periodically one will pass the window, a bit like individual atoms in a large radioactive sample randomly decaying, or a couple of carrots randomly growing on a large Minecraft farm.

The key here is to already have an estimate \(\lambda\) for the average number of passing black cabs in an hour and to model the count \(X\) as the limit of a binomial distribution with \(E(X)=\lambda\) as \(n\to\infty\), i.e. the Poisson probability distribution: \[ \begin{align} &\boxed{P(k)=e^{-\lambda}\cdot\frac{\lambda^k}{k!}}\qquad\text{where}\quad e^{\lambda}\equiv\sum_{n=0}^\infty\frac{\lambda^n}{n!}\quad\text{so the probabilities sum to }1\\\\ \therefore\qquad&\text{if }\lambda=5\text{ the chance of 7 cabs is...}\\ &P(7)=e^{-5}\cdot\frac{5^7}{7!}\approx 10.44\% \end{align} \]

dpois(7, lambda=5)
## [1] 0.1044449

Example - Geometric distribution: Experiment: Toss the coin until the first heads. Report the number of tosses.

This is an uncluttered version of a general class of problems called stopping rule problems modelled by the geometric probability distribution. A stopping rule is a rule that tells you when to end a certain process. A more practical example is a rule for ending a series of medical treatments. Such a rule could depend on how well the treatments are working, how the patient is tolerating them and the probability that the treatments would continue to be effective. One could ask about the probability of stopping within a certain number of treatments or the average number of treatments you should expect before stopping.

  • Think of traversing down the branch of a probability tree for the relevant stopping criteria…
    • \((1 - p)^{k}\) is the probability of \(k\) successive tails
    • \(p\) is the probability of the final heads.
    • The probability of all these independent events is the product of the two.

\[P(X = k) = (1 - p)^{k}\cdot p\qquad X\sim\text{geo}(p)\]
(here \(X\) counts the failures — tails — before the first heads, which matches R's dgeom parametrisation)

e.g. probability of 3 tails then heads is \(\approx 6.25\%\)

dgeom(x=3,prob=0.5)
## [1] 0.0625

e.g. probability of 3 non-six dice rolls followed by a 6 is \(\approx 9.6\%\)

dgeom(x=3,prob=1/6)
## [1] 0.09645062

Example - Conditional Probability:

Draw two cards from a deck. Define the events: \(S1 = \text{‘first card is a spade’}\) and \(S2 = \text{‘second card is a spade’}\). What is the conditional probability of \(P(S2|S1)\)?

We know there are \(12\) spades left in the deck of \(51\) cards so \(P(S2|S1)=\frac{12}{51}\).

Whenever we consider conditional probabilities it’s always important to really understand the sample space \(\Omega\). Here we might consider a sample space of \(52\) elements and the probability of drawing the first spade as \(P(S1)=\frac{13}{52}\) however, when we consider the second draw we must realise that it is not independent of the first. A better choice of sample space is all ordered pairs of cards; this sample space consists of \(_{52}P_2=52\cdot 51=\frac{52!}{(52-2)!}=2652\) elements. Now \(S1\) corresponds to the event that the first card of a two-card draw is a spade, which gives \(P(S1)=\frac{13\cdot51}{2652}=\frac{13}{52}\).

Having thought hard about the sample space we can check the inclusion-exclusion principle \(P(S1\cup S2)=P(S1)+P(S2)-P(S1\cap S2)\)\[ \begin{align} &\boxed{P(S1)=\frac{663}{2652}}=\frac{13\cdot 51}{2652}\\ &\boxed{P(S2)=\frac{663}{2652}}=\frac{13\cdot 51}{2652}\\ \\ &\;P(S1\cup S2)= P(S1\cap S2^C) + P(S1^C\cap S2) + P(S1\cap S2)\\ &\quad= \frac{13\cdot (51-12)}{2652} + \frac{(51-12)\cdot 13}{2652} + \frac{13\cdot12}{2652}\\ \therefore\quad&\boxed{P(S1\cup S2)=\frac{1170}{2652}}=\frac{663+663-156}{2652}=P(S1)+P(S2)-P(S1\cap S2)\quad\checkmark\\ \\ &\boxed{P(S1\cap S2)=\frac{156}{2652}}=\frac{13\cdot12}{2652} \end{align} \]

In more complicated problems it will be much harder to compute conditional probability by counting so we use the following alternative approach:

\[ \begin{align} P(S2|S1)&=\frac{P(S1\cap S2)}{P(S1)}\\ &=\frac{156}{663}=\frac{12}{51} \end{align} \]
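A quick numeric check in R using the counting above:

p_s1_and_s2 <- (13 * 12) / (52 * 51)   # P(S1 and S2): both cards are spades
p_s1 <- 13 / 52                         # P(S1): first card is a spade
p_s1_and_s2 / p_s1                      # P(S2|S1) = 12/51, approximately 0.235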

Example - Probability Urns / Law of Total Probability:

An urn contains 5 red balls and 2 green balls. A ball is drawn. If it’s green \(G1\) a red ball is added to the urn and if it’s red \(R1\) a green ball is added to the urn. (The original ball is not returned to the urn.) Then a second ball is drawn. What is the probability the second ball is red \(R2\)?

“In probability and statistics, an urn problem is an idealized mental exercise in which some objects of real interest (such as atoms, people, cars, etc.) are represented as colored balls in an urn or other container. One pretends to draw (remove) one or more balls from the urn; the goal is to determine the probability of drawing one color or another, or some other properties. A key parameter is whether each ball is returned to the urn after each draw.”

Events \(R1\) and \(G1\) are disjoint events whose union is the complete sample space. \[\begin{align} &\text{use the law of total probability...}\\ &P(A)=\sum_i P(A|B_i)P(B_i)\\\\ \therefore\quad&P(R2)=P(R2|R1)P(R1)\:+P(R2|G1)P(G1)\\ \\ &\text{here...}\\ &\:P(R1)=\frac{5}{7}\\ &\:P(G1)=\frac{2}{7}\\ &\:P(R2|R1)=\frac{4}{7}\\ &\:P(R2|G1)=\frac{6}{7}\\ \\ \therefore\quad&P(R2)=\frac{4}{7}\cdot\frac{5}{7}\:+\frac{6}{7}\cdot\frac{2}{7}\qquad \boxed{P(R2)=\frac{32}{49}} \end{align}\]

…the law of total probability is handy when calculating a second conditional event where we need to consider all the first events i.e. it allows us to calculate the probability of a second event that depends on a previous event without knowing the outcome of the previous event (see the quick check below)
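A minimal numeric check in R of the urn calculation above:

p_r1 <- 5 / 7; p_g1 <- 2 / 7             # first draw from 5 red and 2 green balls
p_r2 <- (4 / 7) * p_r1 + (6 / 7) * p_g1  # law of total probability over R1 and G1
p_r2                                      # 32/49, approximately 0.653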

Example - Bayes’ Theorem:

Toss a coin \(5\) times. Let \(H1 = \text{first toss is heads}\) and let \(HA = \text{all tosses are heads}\). Then \(P(H1|HA) = 1\) but \(P(HA|H1) = \frac{1}{16}\). \[ \begin{align} P(HA|H1) &= \frac{P(H1|HA)P(HA)}{P(H1)} \\ &= \frac{1\cdot (0.5)^5}{0.5} = \frac{1}{16} \end{align} \]

Example - Base rate fallacy: Consider a routine screening test for a disease. Suppose the frequency of the disease in the population (the base rate) is \(0.5\%\). The test is fairly accurate with a \(5\%\) false positive rate and a \(10\%\) false negative rate. You take the test and it comes back positive. What is the probability that you have the disease?

\[\begin{align} &\text{we are interested in the intersection}:\\ &Disease\cap Positive\\ \\ &\text{probability of disease...}\\ &P(Disease)=0.005\\ \\ &\text{probability of a positive test given disease (1 minus the false negative rate)...}\\ &P(Positive|Disease)=0.9\\ \\ &\text{total probability of a positive test...}\\ &P(Positive)=P(Positive|Disease)\cdot P(Disease) + P(Positive|Disease^C)\cdot P(Disease^C)\\ &\qquad\qquad\quad\:=0.9\cdot 0.005\:+\:0.05 \cdot 0.995\\ &\qquad\qquad\quad\:=0.05425 \\ \\ &\text{posterior...}\\ &P(Disease|Positive)=\frac{P(Positive|Disease)P(Disease)}{P(Positive)}\\ &\qquad\qquad\qquad\qquad\quad=\frac{0.9\cdot 0.005}{0.05425}\approx0.083 \end{align}\]

Joint probabilities (rows: disease status, columns: test result):

\(P^+\) \(P^-\)
\(D^+\) 0.00450 0.00050
\(D^-\) 0.04975 0.94525
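A quick check of the arithmetic in R:

p_d   <- 0.005                            # base rate of the disease
p_pos <- 0.9 * p_d + 0.05 * (1 - p_d)     # total probability of a positive test
0.9 * p_d / p_pos                          # P(Disease | Positive), approximately 0.083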

Bayesian Updating

A Bayesian update involves calculating the posterior probability using Bayes’ theorem. It is a way to update your beliefs about an event \(A\), given new evidence \(B\).

The formula for Bayes’ theorem is: \[ P(A∣B)=\frac{P(B∣A)\cdot P(A)}{P(B)} \]

Here’s what each term means:

  • \(P(A∣B)\): Posterior probability — the probability of \(A\) given \(B\) (updated belief after observing evidence).

  • \(P(A)\): Prior probability — the probability of \(A\) before observing evidence.

  • \(P(B∣A)\): Likelihood — the probability of observing \(B\) if \(A\) is true.

  • \(P(B)\): Evidence or Marginal Likelihood — the total probability of observing \(B\), across all possible causes.

Steps for a Bayesian Update:

  • Start with a prior: This is your initial belief about the probability of \(A\), denoted as \(P(A)\).

  • Gather evidence: Observe new data \(B\) and calculate the likelihood \(P(B∣A)\), which tells you how likely \(B\) is if \(A\) is true.

  • Normalize with the evidence: Compute \(P(B)\), the total probability of observing \(B\). This ensures the posterior is a valid probability distribution:

\[P(B)=P(B∣A)P(A)+P(B∣A^C)P(A^C)\]

  • Update to the posterior: Use Bayes’ theorem to calculate \(P(A∣B)\).

Example: Medical Diagnosis. Let’s say:

  • \(A\): Patient has a disease.
  • \(A^C\): Patient does not have the disease.
  • \(B\): Patient tests positive for the disease.

Known probabilities:

  • \(P(A)=0.01\): Prior — 1% of patients have the disease.
  • \(P(B∣A)=0.95\): Likelihood — 95% of diseased patients test positive.
  • \(P(B∣A^C)=0.05\): False positive rate — 5% of healthy patients test positive.

Step 1: Calculate \(P(B)\) (evidence): \[ \begin{align} P(B)&=P(B∣A)P(A)+P(B∣A^C)P(A^C) \\ &=0.95\cdot 0.01+0.05\cdot 0.99 \\ &=0.0095+0.0495=0.059 \end{align} \]

Step 2: Apply Bayes’ theorem to find \(P(A∣B)\): \[ \begin{align} P(A∣B)&=\frac{P(B∣A)\cdot P(A)}{P(B)}\\ &=\frac{0.0095}{0.059} \\ &\approx 0.161 \end{align} \]
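A one-line check of this update in R:

0.95 * 0.01 / (0.95 * 0.01 + 0.05 * 0.99)   # P(A|B), approximately 0.161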

Interpretation: The posterior probability \(P(A∣B)\) is approximately 16.1%, meaning even with a positive test result, the chance the patient actually has the disease is about 16%, not 95%. This is because the disease is rare (low prior probability).

Summary: Bayesian updates refine our prior beliefs \(P(A)\) into posterior beliefs \(P(A∣B)\) using evidence \(B\) and the likelihood \(P(B∣A)\). The art of statistical inference involves deciding how to proceed when one (or more) of the terms on the right side of Bayes’ rule is unknown.

Appendix

Bragg, M., & Guests. (2016). Probability. https://www.bbc.co.uk/programmes/b06r4whh
Orloff, J., & Kamrin, J. F. (2022). MIT 18.05 discrete mathematics, probability and statistics lecture notes. https://www.zotero.org/groups/5801297/mit_ocw