Probability Theory 1
In this lab, we (1) create sample spaces and subset event spaces, (2) addition probabilities for non-independent and independent events and (3) combine and permute observations with and without replacement.
Relevant functions: sample(),
seq(), rep(), combn(),
permn()
1. Sample Spaces and Events
1.1 Creating a sample space
A sample space (also depicted by the Greek letter omega) encompasses all events that…
1.2.1 Coin Toss
Let’s create a sample space of a coin toss. These are all the possible outcomes one could get from flipping a coin.
omega_coin <-c("Head","Tails") # Creating a vector called omega_coin
omega_coin # Displaying the elements within out omega_coin vector
## [1] "Head" "Tails"
1.1.2 Die Toss
Similarly, we create a sample space of a six-faced die. This encompasses all the possible outcomes from a die toss.
# Creating a vector called omega_die
# Method 1: Inputting values per hand
omega_die <-c(1,2,3,4,5,6) #
# Method 2: Using the seq() function
omega_die <- seq(1,6)
# Displaying the elements within out omega_die vector
omega_die
## [1] 1 2 3 4 5 6
1.2 Subsetting an event space
We now want to subset event spaces (depicted by capital Roman letters, such as A) from these sample spaces. These are the event(s) from the sample space that actually occured.
We can use the sample() function to sample from a
vector, while specifying the probability of each outcome occurring. For
these examples, we use sampling with replacement
because each trial is independent from each other. Concretely, getting a
Head on a first coin toss does not remove Head from the sample space for
the second coin toss.
1.1.1 Coin Toss
Let’s assume we are working with a fair coin. Here, \(P(H) = P(T) = 0.5\).
# Setting seed for replication purposes
set.seed(123)
# Sampling 1 observation from our omega_coin vector (with replacement)
A <- sample(omega_coin , size=1, replace=TRUE, prob=c(0.5,0.5))
# Printing out our results
A
## [1] "Tails"
# Sampling 5 observations from our omega_coin vector (with replacement)
B <- sample(omega_coin , size=5, replace=TRUE, prob=c(0.5,0.5))
# Printing out our results
B
## [1] "Head" "Tails" "Head" "Head" "Tails"
Let’s now assume that our coin is not fair, and that
\(P(H) = 0.9\) and \(P(T) = 0.1\). We will need to adjust the
prob argument in the sample() function
accordingly.
# Setting seed for replication purposes
set.seed(123)
# Printing out our omega_coin vector to see order of arguments
omega_coin
## [1] "Head" "Tails"
# Sampling 5 observations from our omega_coin vector (with replacement)
C <- sample(omega_coin , size=5, replace=TRUE, prob=c(0.9,0.1))
# Printing out our results
C
## [1] "Head" "Head" "Head" "Head" "Tails"
1.2.2 Die Toss
Let’s assume we are working with a fair die. Here, \(P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 0.1\bar{6}\).
# Setting seed for replication purposes
set.seed(123)
# The rep() function allows you to repeat a given element multiple times
rep(0.16,6)
## [1] 0.16 0.16 0.16 0.16 0.16 0.16
# Sampling 1 observation from our omega_die vector (with replacement)
A <- sample(omega_die , size=1, replace=TRUE, prob=rep(0.16,6))
# Printing out our results
A
## [1] 3
# Sampling 5 observations from our omega_die vector (with replacement)
B <- sample(omega_die , size=5, replace=TRUE, prob=rep(0.16,6))
# Printing out our results
B
## [1] 6 4 1 1 2
Exercise 1
Let’s now assume that our die is not fair, and that \(P(6) = 0.4\) and \(P(1,2,3,4,5) = 0.12\).
Sample 5 observations from omega_die using these probabilities.
# Don't forget to set your seed at 123!
set.seed(123)
# Answer
set.seed(123)
sample(omega_die, size=5, replace=TRUE, prob=c(rep(0.12,5),0.4))
## [1] 6 2 3 1 1
2. Combinations without Replacement
We might want to measure all the ways in which a set of things can be distributed, if the order doesn’t matter. Let’s say the elements we are interested in are letters from the alphabet.
We saw in class that the formula for combinations without replacement was the following:
\[ \left({{n}\atop k}\right) = \frac{n!}{k! * (n-k)!} \]
Where n is the number of objects we want to distribute and k is the size of the group.
If we concentrate only on alphabet letters a, b and c, in how may ways can we arrange dyads (i.e. pairs of two) from these elements if their order doesn’t matter?
\[ \left({{3}\atop 2}\right) = \frac{3!}{2! * (3-2)!} = \frac{3 * 2 * 1}{(2 * 1) * 1} = \frac{6}{2} = 3 \]
Here is how we would do this in R (and visualize all the combinations):
# Generating all possible combinations for the first three letters of
# the alphabet (pairs of 2 elements)
combn(letters[1:3], 2)
## [,1] [,2] [,3]
## [1,] "a" "a" "b"
## [2,] "b" "c" "c"
# Printing out the number of combinations (i.e. number of columns in the table)
ncol(combn(letters[1:3], 2))
## [1] 3
We can deduct from this that you would have \(P(a,b) = \frac{1}{3} = 0.\bar{3}\), since there are three possible ways of arranging these letters.
Exercise 2
How many combinations without replacement are there for 3 die tosses, using a 10-sided die?
ncol(combn(1:10, 3))
## [1] 120
3. Permutations
We might want to measure all the ways in which a set of things can be distributed, if the order matters. Let’s say the elements we are interested in are days of the year (n = 365).
3.1 With replacement
We saw in class that the formula for permutations with replacement was the following:
\[ _nP_r = n^r \]
Where n is the number of objects we want to distribute and r is the size of the group. Imagine I want to know all the possible permutations for birth dates that could exist between two people (with replacement, since both individuals can be born the same day).
\[ _{365}P_2 = 365^2 = 133,225 \]
Let’s say you want to calculate this in R, as well as see all the possible permutations:
# Loading the gtools library
library(gtools)
# Creating a vector with all days of the year
days <- format(seq(as.Date("2022-01-01"), as.Date("2022-12-31"),
by="+1 day"), format="%d-%m")
# Printing out the first 5 permutations
head(permutations(n=365,r=2,v=days,repeats.allowed=TRUE),5)
## [,1] [,2]
## [1,] "01-01" "01-01"
## [2,] "01-01" "01-02"
## [3,] "01-01" "01-03"
## [4,] "01-01" "01-04"
## [5,] "01-01" "01-05"
# Printing out the number of permutations (i.e. number of rows in the table)
nrow(permutations(n=365,r=2,v=days,repeats.allowed=TRUE))
## [1] 133225
3.2 Without replacement
Now let’s introduce a new concept: that of permutations without replacement. This is a case where the order matters, and you cannot assign the same element twice. Once one element is assigned within the group, it isn’t available to be assigned again.
\[ _nP_r = \frac{n!}{(n-r)!} \]
If we take the same example as earlier, with all possible permutations of birthdates for two individuals, but this time they cannot share the same birthday (permutation without replacement), this is what the formula looks like:
\[ _{365}P_2 = \frac{365!}{(365-2)!} = \frac{365!}{363!} = 365 * 364 = 132,860 \]
Here is how to do this in R:
# Instaling and loading the combinat library
# install.packages("combinat")
library(combinat)
# Creating a vector with all days of the year
days <- format(seq(as.Date("2022-01-01"), as.Date("2022-12-31"),
by="+1 day"), format="%d-%m")
# Printing out the number of permutations (i.e. number of rows in the table)
nrow(permutations(n=365,r=2,v=days,repeats.allowed=FALSE))
## [1] 132860
One last note: when the number of objects is equal to the size of the group, the formula can be simplified to:
\[ \text{if n = r, then } _nP_r = n! \]
For instance:
\[ _3P_3 = \frac{3!}{(3-3)!} = \frac{3!}{0!} = \frac{3!}{1} = 3!\]
Reminder: the factorial of 0 is 1.
Application: Birthday Problem
What are the odds that two people in given class have the same birthday? Or rather, since it’s easier to measure, what is the probability that nobody in a given class shares the same birthday?
\[ P(\text{nobody shares birthday}) = \frac{\text{Permutations of k birthdays without replacement}}{\text{Permutations of k birthdays with replacement}}\]
The formula looks like this:
\[ P(\text{nobody shares birthday}) = \frac{365!}{365^k * (365-k)!}\]
Proof
\[ P(\text{nobody shares birthday}) = \frac{\text{Permutations of k birthdays (n = 365) without replacement}}{\text{Permutations of k birthdays (n = 365) with replacement}}\]
\[ P(\text{nobody shares birthday}) = \frac{\frac{n!}{(n-k)!}}{n^k} \]
\[ P(\text{nobody shares birthday}) = \frac{1}{n^k} * \frac{n!}{(n-k)!}\]
\[ P(\text{nobody shares birthday}) = \frac{n!}{n^k * (n-k)!} \]
\[ P(\text{nobody shares birthday}) = \frac{n!}{n^k * (n-k)!}\]
As an example, the probability that nobody shares a birthday in a class of 3 people is:
\[ P(\text{nobody shares birthday}) = \frac{365!}{365^3 * (365-3)!}\]
\[ P(\text{nobody shares birthday}) = \frac{365 * 364 * 363 * 362 * 361 *\text{ ... }* 2 * 1}{365^3 * (362 * 361 * 360 *\text{ ... }* 2 * 1)!}\]
\[ P(\text{nobody shares birthday}) = \frac{365 * 364 * 363 * \cancel{362} * \cancel{361} *\text{ ... }* \cancel{2} * \cancel{1}}{365^3 * (\cancel{362} * \cancel{361} * \cancel{360} *\text{ ... }* \cancel{2} * \cancel{1})!}\]
\[ P(\text{nobody shares birthday}) = \frac{365 * 364 * 363}{365^3}\]
\[ P(\text{nobody shares birthday}) = 0.9917958\]
Consequently, the probability that at least two people share a birthday in a group of 3 people is:
\[ P(\text{at least two people share a birthday}) = 1 - 0.9917958 = 0.0082042 \]
Exercise 3
Using the birthday() function below from Imai (2018,
p. 249-250), determine how many individuals a group should have so that
P(at least two people share a birthday) \(>\) 0.5.
Hint: you can use the which()
function.
# Imai's birthday() function
birthday <-function(k) {logdenom <- k*log(365) + lfactorial(365 - k)
lognumer <-lfactorial(365)
pr <- 1 - exp(lognumer - logdenom)
return(pr)
}
# Testing the result for a 3-people group
birthday(3)
## [1] 0.008204166
# Hint: a general example on how to use the which() function:
which(1:10 > 3)
## [1] 4 5 6 7 8 9 10
# This code displays all observations within our 1:10 vector that are
# higher than 3
# Answer
which(birthday(1:100) > .5)[1]
## [1] 23