Mr. and Mrs. Smith

Directly

  • Label the possible configurations as \(1 = FF, 2 = FM, 3 = MF, 4 = MM\)

  • Draw \(N\) random integers between 1 and 4 with uniform probability, allowing repetition (replace = TRUE)

  • every drawn integer is a family configuration (two children, either male or female)

N=10

S=sample(x=1:4,size=N,replace=T)
print(S)
##  [1] 2 4 2 1 2 3 1 1 3 2

We want to se how many times

  • given that one is a girl (that is, eliminating all the 4s from the list)
  • the other is a boy (that is, how many 2s and 3s are left)
print(S[S<4])
## [1] 2 2 1 2 3 1 1 3 2
  • S[S < 4]: This filters the elements in S where the condition S < 4 is TRUE. In other words, it keeps only those elements in S that are less than 4, that in this example are 9 over the initial 10.
  • Once there are only 1s, 2s and 3s in S, we need to calculate the ratios of 2s and 3s.
print(S[S<4]!=1)
## [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
print(  mean(S[S<4]!=1)  )
## [1] 0.6666667
  • (S[S < 4] != 1 is a logical vector; TRUE is evaluated as 1 and FALSE as 0
  • mean on this logical vector returns the ratios of 2s and 3s in the list, that is 6 over 9, i.e. 0.6666667.

Putting all together and running the experiment many times,

N=100000
S=sample(x=1:4,size=N,replace=T)
prob=mean(S[S<4]!=1)
print(prob)
## [1] 0.6652858

Alternatively, mirroring more explicitly the formula for the conditional probability \[P(A | B) = \frac{P(A \text{ and } B)}{P(B)}\]

prob = length( S[S<4 & S > 1] ) / length( S[S<4] )
print(prob)
## [1] 0.6652858

Some practice with dataframes

# Number of experiments, that is family instances
N = 10
  • Create a Data Frame where each family (row) is assigned a possible configuration (the column values)
df = data.frame(
    C1 = sample(x = c("F", "G"), size = N, replace = TRUE),
    C2 = sample(x = c("F", "G"), size = N, replace = TRUE)
)
  • This creates a data frame df with two columns, C1 and C2, each containing N random samples of either "F" (female) or "G" (male).
  • sample(x = c("F", "G"), size = N, replace = TRUE) randomly samples "F" or "G" N times, allowing repetitions (replace = TRUE), meaning each child can independently be male or female.
  • Each row in df represents a “family” with two children, where C1 is the gender of the first child and C2 is the gender of the second child.
print(df)
##    C1 C2
## 1   G  F
## 2   G  F
## 3   G  F
## 4   F  F
## 5   F  F
## 6   F  G
## 7   G  F
## 8   F  G
## 9   G  F
## 10  G  F
  • next, filter the families with at Least one female
df = df[df$C1 == "F" | df$C2 == "F", ]
  • This line filters df to include only families where at least one child is female.
  • df$C1 == "F" | df$C2 == "F" creates a logical vector that is TRUE for rows where C1 or C2 is "F".
  • df[...] subsets df to include only the rows that meet this condition.
  • After this line, df contains only families with at least one female child.
print(df)
##    C1 C2
## 1   G  F
## 2   G  F
## 3   G  F
## 4   F  F
## 5   F  F
## 6   F  G
## 7   G  F
## 8   F  G
## 9   G  F
## 10  G  F
  • next calculate the probability of having a boy in these families
prob = nrow(df[df$C1 == "G" | df$C2 == "G", ]) / nrow(df)
  • df[df$C1 == "G" | df$C2 == "G", ] filters df again, but this time it keeps only rows where at least one child is male (C1 == "G" | C2 == "G").
  • nrow(df[df$C1 == "G" | df$C2 == "G", ]) gives the number of families with at least one boy, among those with at least one female
  • nrow(df) gives the total number of families with at least one female
  • The division nrow(...) / nrow(...) calculates the probability that a family with at least one female also has a male, that is
prob
## [1] 0.8

Putting all together and letting N big,

N=1000000
df=data.frame(C1=sample(x=c("F","G"),size=N,replace=T),
              C2=sample(x=c("F","G"),size=N,replace=T));
df=df[df$C1=="F" | df$C2=="F",] # only families with at least a female
prob=nrow(df[df$C1=="G" | df$C2=="G",])/nrow(df) # how many families with a boy (among the ones with at least a female)
prob
## [1] 0.6660227

A winner among K

\(\newcommand{\winner}{i^{\ast}}\) \(\newcommand{\players}{K}\) Along the lines of the solution in https://rpubs.com/janselmi/TD2, the sample space is \(\Omega = \{0, 1\}^k\) endowed with the uniform probability (with \(k \geq 3\)). That is, let \(\omega = (\omega_i)_{i = 1, \dots, k} \in \Omega\) be the profile of coin tosses, where \(\omega_i \in {0, 1}\) is the outcome of player’s \(i\) toss, for all \(i \in \players := \{1, \dots, k\}\). We have \(|{\Omega}| = 2^k\), and \(p(\omega) = \frac{1}{2^k}\). The event “there is a winner” is \[W \subset \Omega = \{\omega \in \Omega: \text{there exists precisely one } \winner \in \players \text{ such that } \omega_\winner \neq \omega_i \forall i \in \players - \{\winner\} \} \] We can enumerate the points in \(W\):

Thus we have \(|W| = 2k\), and \[p(W) = \frac{2k}{2^k} = \frac{k}{2^{k-1}}\]

Let \(T_k\) be the RV denoting the number of games to play for observing a winner. What are the law and expectation of this RV?

\(\newcommand{pber}{p_k}\) Each play is a Bernoulli event with probability \(\pber\), thus by definition \(T_k\) follows a geometric distribution with parameter \(\pber)\), i.e. for all \(t = 1, 2, \dots\) we have

\[ P(T_k = t) = \pber(1-\pber)^{t-1} \]

with expected value \(\frac{1}{\pber}\).

R code

Direct solution

Set the Number of Simulations and Players

N = 10
K = 5
  • for each of the N trials generate a game instance, that is a binary vector of size K
  • represent it as a matrix with N rows and K columns.
S=replicate(K, sample(x=0:1,N,replace=T)) 
print(S)
##       [,1] [,2] [,3] [,4] [,5]
##  [1,]    1    1    1    0    0
##  [2,]    0    0    1    0    0
##  [3,]    0    0    0    1    1
##  [4,]    0    1    0    0    0
##  [5,]    0    1    1    0    1
##  [6,]    1    1    0    0    0
##  [7,]    1    0    1    0    0
##  [8,]    1    0    0    0    0
##  [9,]    1    1    1    1    0
## [10,]    1    0    1    1    1

Next, calculate the probability of having exactly 1 or (K - 1) ones in each row:

p_K = mean(rowSums(S)==1 | rowSums(S)==K-1)
  • rowSums(S) calculates the sum of each row in the matrix S, which gives the count of 1s in each row.
  • rowSums(S) == 1 | rowSums(S) == K - 1 is a logical vector that is TRUE for rows where the sum is exactly 1 or K - 1 (in this case, 5 out of 10).
  • mean(...) calculates the average of this logical vector, effectively giving the probability that a row has exactly 1 or K - 1 ones.
p_K
## [1] 0.5

Putting all together for N large,

N=1000000
K=5
S=replicate(K, sample(x=0:1,N,replace=T)) 
p_K=mean(rowSums(S)==1 | rowSums(S)==K-1)
p_K
## [1] 0.312423

Notice that the theoretical value is \(\frac{5 }{ 2^{4} } = 0.3125\).


Solution with dataframes

Set the Number of Simulations N and Players K

N = 10
K = 5

First, we represent the history of plays of one player as a data frame df_parties with one column, called x1, and N rows, containing random binary values (0 or 1).

# generates a sample of `0`s and `1`s of length `N`.
df_parties = data.frame(x1 = sample(x = c(0,1), size = N, replace = TRUE))
print(df_parties)
##    x1
## 1   0
## 2   1
## 3   0
## 4   1
## 5   1
## 6   0
## 7   0
## 8   0
## 9   1
## 10  1

Then, we add the history of plays of the other players (that is, we add K-1 columns of the dataframe):

for(i in 2:K) { 
    df_parties = cbind(df_parties, data.frame(x = sample(x = c(0,1), size = N, replace = TRUE)))
    names(df_parties)[i] = paste0("x", i)
}
print(df_parties)
##    x1 x2 x3 x4 x5
## 1   0  1  0  0  0
## 2   1  0  1  1  0
## 3   0  1  0  0  0
## 4   1  1  0  0  1
## 5   1  0  0  0  0
## 6   0  0  0  0  1
## 7   0  1  1  0  1
## 8   0  0  1  1  0
## 9   1  0  0  1  1
## 10  1  1  1  0  1
  • This loop adds K-1 additional columns to df_parties, each filled with random binary values (0 or 1).
  • In each iteration, we add a new column to df_parties using cbind() and rename it as "x2", "x3", etc., using paste0("x", i).

Initialize a Sum Column

We add a column called sum to df_parties and initialize it to 0. This column will store the row-wise sum of the values in each x column.

df_parties$sum = 0
print(df_parties)
##    x1 x2 x3 x4 x5 sum
## 1   0  1  0  0  0   0
## 2   1  0  1  1  0   0
## 3   0  1  0  0  0   0
## 4   1  1  0  0  1   0
## 5   1  0  0  0  0   0
## 6   0  0  0  0  1   0
## 7   0  1  1  0  1   0
## 8   0  0  1  1  0   0
## 9   1  0  0  1  1   0
## 10  1  1  1  0  1   0

Calculate the Sum of Each Row and store it in the sum column

for(i in 1:K) { 
    df_parties$sum = df_parties$sum + df_parties[, paste0("x", i)]
}
print(df_parties)
##    x1 x2 x3 x4 x5 sum
## 1   0  1  0  0  0   1
## 2   1  0  1  1  0   3
## 3   0  1  0  0  0   1
## 4   1  1  0  0  1   3
## 5   1  0  0  0  0   1
## 6   0  0  0  0  1   1
## 7   0  1  1  0  1   3
## 8   0  0  1  1  0   2
## 9   1  0  0  1  1   3
## 10  1  1  1  0  1   4
  • This loop iterates over each column from x1 to xK, adding each value to the sum column.
  • The result is that df_parties$sum contains the total number of 1s in each row.

Calculate the Probability

  • Finally, calculate the probability that each row has exactly 1 or K-1 (in this case 4) ones
  • mean(df_parties$sum == 1 | df_parties$sum == K - 1) calculates the fraction of rows where sum is 1 or K-1.
p_K = mean(df_parties$sum == 1 | df_parties$sum == K - 1)
p_K
## [1] 0.5

Putting all together for N large,

N=1000000
K=5
df_parties = data.frame(x1 = sample(x=c(0,1),size=N,replace=T));
for(i in 2:K) { 
    df_parties = cbind(df_parties, data.frame(x = sample(x=c(0,1),size=N,replace=T)));
    names(df_parties)[i]=paste0("x",i);
}
df_parties$sum = 0
for(i in 1:K) { 
    df_parties$sum = df_parties$sum + df_parties[,paste0("x",i)];
}
p_K=mean(df_parties$sum==1 | df_parties$sum==K-1);
p_K
## [1] 0.312375

Notice that the theoretical value is \(\frac{5 }{ 2^{4} } = 0.3125\).

Betting on cards (c.f. ex. 2.3)

N = 10

N=100000

S=sample(x=1:6,size=N,replace=T)

prob = length( S[S %in% c(1, 2, 3) & S %in% c(1, 2, 6)] ) / length( S[S %in% c(1, 2, 6)] )

print(prob)
## [1] 0.6680951