TD2: Probability and Simulation

Mr. and Mrs. Smith

Mr. and Mrs. Smith have two children, one of whom is a daughter. What is the probability that the other one is a boy?

Solution

We can model the set of outcomes by \(\Omega=\{FF,FM,MF,MM\}\), which we endow with the uniform probability measure. The interpretation of, e.g., \(FM\) is that the first child is female and the second child is male. The event "at least one child is a daughter" is \(\{FF,FM,MF\}\), and the event "at least one child is a boy" is \(\{FM,MF,MM\}\), so we want to calculate \(P(\{FM,MF,MM\}|\{FF,FM,MF\})\). Using the definition of conditional probability, we get \[P(\{FM,MF,MM\}|\{FF,FM,MF\})=\frac{P(\{FM,MF,MM\}\cap\{FF,FM,MF\})}{P(\{FF,FM,MF\})} = \frac{P(\{FM,MF\})}{P(\{FF,FM,MF\})}=\frac{2/4}{3/4}=\frac{2}{3}.\]

In R

We now use R to simulate the experiment many times and estimate the desired probability from the observed frequencies.

Solution 1. For the set of outcomes, we assume the following correspondence: \(FF\leftrightarrow 1\), \(FM\leftrightarrow 2\), \(MF\leftrightarrow 3\) and \(MM\leftrightarrow 4\).

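N=1000000
S=sample(x=1:4,size=N,replace=TRUE) # one simulated family per draw, coded as above
prob=mean(S[S<4]!=1) # among families with at least one girl (S<4), fraction that are not FF
prob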

## [1] 0.6668444

Solution 2. The goal is to use data frames.

N=1000000
df=data.frame(C1=sample(x=c("F","M"),size=N,replace=TRUE),
              C2=sample(x=c("F","M"),size=N,replace=TRUE))
df=df[df$C1=="F" | df$C2=="F",] # keep only families with at least one girl
prob=nrow(df[df$C1=="M" | df$C2=="M",])/nrow(df) # fraction of those families that also have a boy
prob

## [1] 0.6666653
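
The simulated values can be compared with the exact answer by enumerating \(\Omega\) directly instead of sampling from it. The following is a minimal sketch, assuming the same uniform model as above; the names `families`, `has_girl`, `has_boy` and `exact_prob` are chosen here for illustration only.

```r
# Enumerate the four equally likely families (first child, second child).
families <- expand.grid(C1 = c("F", "M"), C2 = c("F", "M"),
                        stringsAsFactors = FALSE)

# Conditioning event: at least one daughter.
has_girl <- families$C1 == "F" | families$C2 == "F"
# Event of interest: at least one boy.
has_boy <- families$C1 == "M" | families$C2 == "M"

# Conditional probability as a ratio of counts,
# which is valid because the probability measure is uniform.
exact_prob <- sum(has_girl & has_boy) / sum(has_girl)
exact_prob
```

Since the enumeration involves no randomness, the result should agree with the value \(2/3\) obtained in the solution, up to floating-point representation.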

A winner among K

In this game, each of \(K>3\) players tosses a fair coin. A player wins a game if his or her outcome differs from the outcomes of all the other players.

Identify a sample space \(\Omega\) and endow it with a suitable probability measure.
Compute the probability of having a winner.
Write R code that simulates the game.
Let \(T_K\) be the random variable denoting the number of games that must be played to observe a winner. Find the law and the expected value of \(T_K\).

Solution

A natural choice is \(\Omega=\{0,1\}^K\) where \(0\) and \(1\) represent the outcomes “Head” and “Tail”, respectively. The set \(\Omega\) is endowed with the uniform probability.
Using \(\sigma\)-additivity, the desired probability is \(p_K=\sum_{i=1}^K \left[P(\{a_i\})+P(\{b_i\})\right]\), where \(a_i\) (respectively, \(b_i\)) is the vector of size \(K\) whose entries are all zeros (ones) except for coordinate \(i\), which is a one (zero). These \(2K\) outcomes are pairwise distinct, and since \(P(\{a_i\})=P(\{b_i\})=\frac{1}{2^K}\), we obtain \(p_K = \frac{2K}{2^K} = \frac{K}{2^{K-1}}\).
Solution 1

N=1000000
K=5
S=replicate(K, sample(x=0:1,N,replace=T)) 
p_K=mean(rowSums(S)==1 | rowSums(S)==K-1)
p_K

## [1] 0.312763

Solution 2 (with data frames)

N=1000000
K=5
df_parties = data.frame(x1 = sample(x=c(0,1),size=N,replace=T));
for(i in 2:K) { 
    df_parties = cbind(df_parties, data.frame(x = sample(x=c(0,1),size=N,replace=T)));
    names(df_parties)[i]=paste0("x",i);
}
df_parties$sum = 0
for(i in 1:K) { 
    df_parties$sum = df_parties$sum + df_parties[,paste0("x",i)];
}
p_K=mean(df_parties$sum==1 | df_parties$sum==K-1);
p_K

## [1] 0.313461

First, we notice that the sample space \(\Omega\) defined above does not carry enough information to answer the question, because the event \(\{T_K=n\}\) cannot be expressed as a subset of it. A natural choice here is \((\{0,1\}^K)^\mathbb{N}\), which describes an infinite sequence of games. Let \((X_i)_{i\ge 1}\) be a sequence of independent Bernoulli(\(p_K\)) random variables. The interpretation is that \(X_i=1\) if and only if there is a winner in game \(i\). Then \(T_K=\min\{n>0:X_n=1\}\) and \(\{T_K=n\}=\{X_1=0\}\cap \cdots \cap \{X_{n-1}=0\} \cap \{X_n=1\}\). Therefore, for every \(n\ge 1\), \[\begin{align} P(T_K=n) & = P(\{X_1=0\}\cap \cdots \cap \{X_{n-1}=0\} \cap \{X_n=1\})\\ & = P(X_1=0)P(X_2=0) \cdots P(X_{n-1}=0) P(X_n=1)\\ & = (1-p_K)^{n-1} p_K \end{align}\] where the second equality follows from the independence of the \(X_i\)’s. We recognize the geometric distribution with success parameter \(p_K\). Thus, \(\mathbb{E}[T_K]=1/p_K\).
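
The formula for \(p_K\) and the geometric law of \(T_K\) can both be checked numerically. The following is a minimal sketch rather than part of the original solution: the function `games_until_winner`, the variable names, the seed and the number of replications are all assumptions made for illustration. It first recomputes \(p_K\) by enumerating \(\{0,1\}^K\), then plays games repeatedly until a winner appears and compares the empirical law of \(T_K\) with the geometric one.

```r
set.seed(1)  # arbitrary seed, used only to make the run reproducible
K <- 5
p_K <- K / 2^(K - 1)  # probability of a winner in one game, as derived above

# Exact check of p_K: enumerate all 2^K outcomes and count those with
# exactly one coordinate different from all the others.
outcomes <- expand.grid(rep(list(0:1), K))
ones_per_outcome <- rowSums(outcomes)
p_enum <- mean(ones_per_outcome == 1 | ones_per_outcome == K - 1)
c(enumeration = p_enum, formula = p_K)

# Play games until there is a winner and return the number of games played.
games_until_winner <- function(K) {
  n <- 0
  repeat {
    n <- n + 1
    ones <- sum(sample(0:1, K, replace = TRUE))
    if (ones == 1 || ones == K - 1) return(n)
  }
}

N <- 10000  # number of independent realizations of T_K
T_sim <- replicate(N, games_until_winner(K))

# Empirical mean versus the theoretical expectation 1 / p_K.
c(empirical = mean(T_sim), theoretical = 1 / p_K)

# Empirical law versus the geometric law for the first few values of n.
# dgeom() counts failures before the first success, hence the shift n - 1.
n <- 1:5
rbind(empirical = sapply(n, function(k) mean(T_sim == k)),
      geometric = dgeom(n - 1, prob = p_K))
```

Simulating each game explicitly, rather than calling `rgeom` directly, keeps the check independent of the derivation: the geometric law only enters through the comparison in the last step.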

TD2: Probability and Simulation

Jonatha Anselmi

9/27/2019
