Label the possible configurations as \(1 = FF, 2 = FM, 3 = MF, 4 = MM\)
Draw \(N\) random integers
between 1 and 4 with uniform probability, allowing repetition
(replace = TRUE)
every drawn integer is a family configuration (two children, either male or female)
N=10
S=sample(x=1:4,size=N,replace=T)
print(S)
## [1] 2 4 2 1 2 3 1 1 3 2
We want to se how many times
print(S[S<4])
## [1] 2 2 1 2 3 1 1 3 2
S[S < 4]: This filters the elements in
S where the condition S < 4 is
TRUE. In other words, it keeps only those elements in
S that are less than 4, that in this example are 9 over the
initial 10.S, we need to
calculate the ratios of 2s and 3s.print(S[S<4]!=1)
## [1] TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE
print( mean(S[S<4]!=1) )
## [1] 0.6666667
(S[S < 4] != 1 is a logical vector;
TRUE is evaluated as 1 and FALSE
as 0mean on this logical vector returns the ratios of 2s
and 3s in the list, that is 6 over 9, i.e. 0.6666667.Putting all together and running the experiment many times,
N=100000
S=sample(x=1:4,size=N,replace=T)
prob=mean(S[S<4]!=1)
print(prob)
## [1] 0.6652858
Alternatively, mirroring more explicitly the formula for the conditional probability \[P(A | B) = \frac{P(A \text{ and } B)}{P(B)}\]
prob = length( S[S<4 & S > 1] ) / length( S[S<4] )
print(prob)
## [1] 0.6652858
# Number of experiments, that is family instances
N = 10
df = data.frame(
C1 = sample(x = c("F", "G"), size = N, replace = TRUE),
C2 = sample(x = c("F", "G"), size = N, replace = TRUE)
)
df with two columns,
C1 and C2, each containing N
random samples of either "F" (female) or "G"
(male).sample(x = c("F", "G"), size = N, replace = TRUE)
randomly samples "F" or "G" N
times, allowing repetitions (replace = TRUE), meaning each
child can independently be male or female.df represents a “family” with two children,
where C1 is the gender of the first child and
C2 is the gender of the second child.print(df)
## C1 C2
## 1 G F
## 2 G F
## 3 G F
## 4 F F
## 5 F F
## 6 F G
## 7 G F
## 8 F G
## 9 G F
## 10 G F
df = df[df$C1 == "F" | df$C2 == "F", ]
df to include only families where
at least one child is female.df$C1 == "F" | df$C2 == "F" creates a logical vector
that is TRUE for rows where C1
or C2 is "F".df[...] subsets df to include only the
rows that meet this condition.df contains only families with at
least one female child.print(df)
## C1 C2
## 1 G F
## 2 G F
## 3 G F
## 4 F F
## 5 F F
## 6 F G
## 7 G F
## 8 F G
## 9 G F
## 10 G F
prob = nrow(df[df$C1 == "G" | df$C2 == "G", ]) / nrow(df)
df[df$C1 == "G" | df$C2 == "G", ] filters
df again, but this time it keeps only rows where at least
one child is male (C1 == "G" | C2 == "G").nrow(df[df$C1 == "G" | df$C2 == "G", ]) gives the
number of families with at least one boy, among those with at least one
femalenrow(df) gives the total number of families with at
least one femalenrow(...) / nrow(...) calculates the
probability that a family with at least one female also has a male, that
isprob
## [1] 0.8
Putting all together and letting N big,
N=1000000
df=data.frame(C1=sample(x=c("F","G"),size=N,replace=T),
C2=sample(x=c("F","G"),size=N,replace=T));
df=df[df$C1=="F" | df$C2=="F",] # only families with at least a female
prob=nrow(df[df$C1=="G" | df$C2=="G",])/nrow(df) # how many families with a boy (among the ones with at least a female)
prob
## [1] 0.6660227
\(\newcommand{\winner}{i^{\ast}}\) \(\newcommand{\players}{K}\) Along the lines of the solution in https://rpubs.com/janselmi/TD2, the sample space is \(\Omega = \{0, 1\}^k\) endowed with the uniform probability (with \(k \geq 3\)). That is, let \(\omega = (\omega_i)_{i = 1, \dots, k} \in \Omega\) be the profile of coin tosses, where \(\omega_i \in {0, 1}\) is the outcome of player’s \(i\) toss, for all \(i \in \players := \{1, \dots, k\}\). We have \(|{\Omega}| = 2^k\), and \(p(\omega) = \frac{1}{2^k}\). The event “there is a winner” is \[W \subset \Omega = \{\omega \in \Omega: \text{there exists precisely one } \winner \in \players \text{ such that } \omega_\winner \neq \omega_i \forall i \in \players - \{\winner\} \} \] We can enumerate the points in \(W\):
Thus we have \(|W| = 2k\), and \[p(W) = \frac{2k}{2^k} = \frac{k}{2^{k-1}}\]
Let \(T_k\) be the RV denoting the number of games to play for observing a winner. What are the law and expectation of this RV?
\(\newcommand{pber}{p_k}\) Each play is a Bernoulli event with probability \(\pber\), thus by definition \(T_k\) follows a geometric distribution with parameter \(\pber)\), i.e. for all \(t = 1, 2, \dots\) we have
\[ P(T_k = t) = \pber(1-\pber)^{t-1} \]
with expected value \(\frac{1}{\pber}\).
Set the Number of Simulations and Players
N = 10
K = 5
N trials generate a game instance, that
is a binary vector of size KN rows and K
columns.S=replicate(K, sample(x=0:1,N,replace=T))
print(S)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 1 1 0 0
## [2,] 0 0 1 0 0
## [3,] 0 0 0 1 1
## [4,] 0 1 0 0 0
## [5,] 0 1 1 0 1
## [6,] 1 1 0 0 0
## [7,] 1 0 1 0 0
## [8,] 1 0 0 0 0
## [9,] 1 1 1 1 0
## [10,] 1 0 1 1 1
Next, calculate the probability of having exactly 1 or
(K - 1) ones in each row:
p_K = mean(rowSums(S)==1 | rowSums(S)==K-1)
rowSums(S) calculates the sum of each row in the matrix
S, which gives the count of 1s in each row.rowSums(S) == 1 | rowSums(S) == K - 1 is a logical
vector that is TRUE for rows where the sum is exactly 1 or K - 1 (in
this case, 5 out of 10).mean(...) calculates the average of this logical
vector, effectively giving the probability that a row has exactly 1 or K
- 1 ones.p_K
## [1] 0.5
Putting all together for N large,
N=1000000
K=5
S=replicate(K, sample(x=0:1,N,replace=T))
p_K=mean(rowSums(S)==1 | rowSums(S)==K-1)
p_K
## [1] 0.312423
Notice that the theoretical value is \(\frac{5 }{ 2^{4} } = 0.3125\).
Set the Number of Simulations N and Players
K
N = 10
K = 5
First, we represent the history of plays of one player as a data
frame df_parties with one column, called x1,
and N rows, containing random binary values (0
or 1).
# generates a sample of `0`s and `1`s of length `N`.
df_parties = data.frame(x1 = sample(x = c(0,1), size = N, replace = TRUE))
print(df_parties)
## x1
## 1 0
## 2 1
## 3 0
## 4 1
## 5 1
## 6 0
## 7 0
## 8 0
## 9 1
## 10 1
Then, we add the history of plays of the other players (that is, we
add K-1 columns of the dataframe):
for(i in 2:K) {
df_parties = cbind(df_parties, data.frame(x = sample(x = c(0,1), size = N, replace = TRUE)))
names(df_parties)[i] = paste0("x", i)
}
print(df_parties)
## x1 x2 x3 x4 x5
## 1 0 1 0 0 0
## 2 1 0 1 1 0
## 3 0 1 0 0 0
## 4 1 1 0 0 1
## 5 1 0 0 0 0
## 6 0 0 0 0 1
## 7 0 1 1 0 1
## 8 0 0 1 1 0
## 9 1 0 0 1 1
## 10 1 1 1 0 1
K-1 additional columns to
df_parties, each filled with random binary values
(0 or 1).df_parties
using cbind() and rename it as "x2",
"x3", etc., using paste0("x", i).Initialize a Sum Column
We add a column called sum to df_parties
and initialize it to 0. This column will store the row-wise
sum of the values in each x column.
df_parties$sum = 0
print(df_parties)
## x1 x2 x3 x4 x5 sum
## 1 0 1 0 0 0 0
## 2 1 0 1 1 0 0
## 3 0 1 0 0 0 0
## 4 1 1 0 0 1 0
## 5 1 0 0 0 0 0
## 6 0 0 0 0 1 0
## 7 0 1 1 0 1 0
## 8 0 0 1 1 0 0
## 9 1 0 0 1 1 0
## 10 1 1 1 0 1 0
Calculate the Sum of Each Row and store it in the sum column
for(i in 1:K) {
df_parties$sum = df_parties$sum + df_parties[, paste0("x", i)]
}
print(df_parties)
## x1 x2 x3 x4 x5 sum
## 1 0 1 0 0 0 1
## 2 1 0 1 1 0 3
## 3 0 1 0 0 0 1
## 4 1 1 0 0 1 3
## 5 1 0 0 0 0 1
## 6 0 0 0 0 1 1
## 7 0 1 1 0 1 3
## 8 0 0 1 1 0 2
## 9 1 0 0 1 1 3
## 10 1 1 1 0 1 4
x1 to
xK, adding each value to the sum column.df_parties$sum contains the total
number of 1s in each row.Calculate the Probability
1 or K-1 (in this case 4) onesmean(df_parties$sum == 1 | df_parties$sum == K - 1)
calculates the fraction of rows where sum is 1
or K-1.p_K = mean(df_parties$sum == 1 | df_parties$sum == K - 1)
p_K
## [1] 0.5
Putting all together for N large,
N=1000000
K=5
df_parties = data.frame(x1 = sample(x=c(0,1),size=N,replace=T));
for(i in 2:K) {
df_parties = cbind(df_parties, data.frame(x = sample(x=c(0,1),size=N,replace=T)));
names(df_parties)[i]=paste0("x",i);
}
df_parties$sum = 0
for(i in 1:K) {
df_parties$sum = df_parties$sum + df_parties[,paste0("x",i)];
}
p_K=mean(df_parties$sum==1 | df_parties$sum==K-1);
p_K
## [1] 0.312375
Notice that the theoretical value is \(\frac{5 }{ 2^{4} } = 0.3125\).
N = 10
N=100000
S=sample(x=1:6,size=N,replace=T)
prob = length( S[S %in% c(1, 2, 3) & S %in% c(1, 2, 6)] ) / length( S[S %in% c(1, 2, 6)] )
print(prob)
## [1] 0.6680951