The coin minting machine is on the fritz, causing the coins to be weighted/unfair (i.e. there is not a 50% chance of either heads or tails outcomes) randomly.
Let the probability of observing a head be given by a continuous uniform distribution, \(Y\sim Unif(0,1)\).
Let \(X\) be the random outcome of a coin flip, where heads (1) and tails (0).
In class we showed the theorem for the Law of Total Variability which states that:
For random variables \(X\) and \(Y\), \[Var[X]=E[Var[X|Y]]+Var[E[X|Y]]\]
set.seed(266)
## STEP 1: MAKE A COIN
## Y = PROBABILITY OF HEADS
Y<-runif(n=1, min=0, max=1)
Y
## [1] 0.7896671
## STEP 2: TOSS THE COIN
## X|Y = SUCCESS OR FAILURES
# Let 1: Heads, 0: Tails
X_y<-sample(size=1,
x=c(1, 0),
prob=c(Y, 1-Y),
replace=TRUE)
## STEP 3: TOSS THE COIN 100 TIMES
X_y<-sample(size=1000,
x=c(1, 0),
prob=c(Y, 1-Y),
replace=TRUE)
library(tidyverse)
ggplot(as.data.frame(X_y), aes(x=X_y, y=(..count..)/sum(..count..)))+
geom_histogram()+
ylim(0, 1)+
geom_hline(yintercept =Y, color="red", lty=2)+ # TRUE PARAMETER VALUE
geom_hline(yintercept =mean(X_y), color="blue", lty=2) # SAMPLE PROPORTION
Is the sample probability close to the value of y?
# SAMPLE PROBABILITY
mean(X_y)
## [1] 0.767
## STEP 4: MAKE RANDOM COINS AND TOSS THEM .. many times
nsim=1000
Ys<-c()
props<-c()
vars<-c()
for(i in 1:nsim){
Y<-runif(n=1, min=0, max=1)
X_y<-sample(size=nsim,
x=c(1, 0),
prob=c(Y, 1-Y),
replace=TRUE)
Ys<-c(Ys,Y)
props<-c(props, mean(X_y))
vars<-c(vars, mean(X_y)*(1-mean(X_y)))
}
datXY<-data.frame(Ys, props)
ggplot(datXY, aes(Ys, props))+
geom_line()+
geom_abline(slope = 1, intercept = 0, color="red", lty=2)
var(props)
## [1] 0.07804333
hist(props)
mean(vars)
## [1] 0.1720028
hist(vars)
var(props)+mean(vars)
## [1] 0.2500461
# Check this against
mean(props)*(1-mean(props))
## [1] 0.2499681
# its very close!!!
varExp<-c()
expVar<-c()
for(j in 1:100){
props<-c()
vars<-c()
for(i in 1:nsim){
Y<-runif(n=1, min=0, max=1)
X_y<-sample(size=nsim,
x=c(1, 0),
prob=c(Y, 1-Y),
replace=TRUE)
props<-c(props, mean(X_y))
vars<-c(vars, mean(X_y)*(1-mean(X_y)))
}
varExp<-c(varExp, var(props))
expVar<-c(expVar, mean(vars))
}
dat_varExp<-data.frame(run=1:100,
varExp, expVar)%>%
gather(source, value, -run)
ggplot(dat_varExp, aes(x=run, y=value, fill=source))+
geom_col()
It looks like the solution is \(V[X]=0.25\).
The Bernoulli distribution describes binary responses; success (1) and failure (0). This distribution is parameterized by \(p\) the probability of observing a success.
Let \(X \sim Bern(p)\) then \[E[X]=p\] \[Var[X]=p(1-p)\]
In class we showed for every value in the range from \(a\) to \(b\), where \(a<b\), the continuous uniform distribution has the same height of \(\frac{1}{b-a}\).
Let \(Y \sim Uniform(a,b)\) then \[E[Y]=\frac{b+a}{2}\] \[Var[Y]=\frac{(b-a)^2}{12}\]
We will approach this problem from the inside out! Since \(X|Y=y \sim Bernoulli(y)\) then \[Var[X|Y]=y(1-y)\] \[E[X|Y]=y\]
Recall that \(Y\sim Unif(0,1)\):
First, \[E[Y(1-Y)]=E[Y-Y^2]=E[Y]-E[Y^2]\]
We know that \[E[Y]=\frac{0+1}{2}=\frac{1}{2}\]
Observe that \[E[Y^2]=\int_0^1y^2dy=\frac{y^3}{3}\rvert_0^1=\frac{1}{3}\] But \(Var[Y]=E[Y^2]-E[Y]^2\) so \(E[Y^2]=Var[Y]+E[Y]^2\), thus \[E[Y^2]=\frac{1}{12}+\frac{1}{2}^2=\frac{1}{12}+\frac{3}{12}=\frac{1}{3}\] Then, \[Var[Y]=\frac{1}{12}\]
Bringing everying together yields
\[Var[X]=E[Var[X|Y]]+Var[E[X|Y]]\]
\[=E[Y(1-Y)]+Var[Y]\]
\[=E[Y]-E[Y^2]+Var[Y]\]
\[=\frac{1}{2}-\frac{1}{3}+\frac{1}{12}=\frac{1}{4}\]