Design a simulation testing your ability to distinguish two data sets which are binomially distributed with parameters \(\pi_1 = .4\), \(\pi_2 = .6\). For each \(n \in \{5, 10, 15, ..., 95, 100\}\), generate \(n\) data points from these distributions. Approximate the probability that you will succeed in distinguishing these two distributions with \(n\) observations as the proportion of simulations in which the symmetric 95% Bayesian confidence intervals that you estimate from the simulated data do not overlap. For each \(n\), go through this procedure using a grid approximation, calculating the posterior probability of each parameter in \(\{0, .01, .02, ..., .99, 1\}\) using a uniform prior. (You have to do this 20 times, so you’ll also want to work out how to use a loop or sapply(), rather than typing similar code 20 times.)
posterior.grid <- function(n.pos, n){
  # likelihood of n.pos successes in n trials at each grid point 0, .01, ..., 1
  likelihoods <- dbinom(n.pos, n, prob=seq(0, 1, by=0.01))
  # since the prior is uniform, the posterior is just the normalized likelihood
  posteriors <- likelihoods / sum(likelihoods)
  return(posteriors)
}
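With a uniform prior over the grid \(\{0, .01, \ldots, 1\}\), Bayes' rule reduces to normalizing the likelihoods, which is exactly what posterior.grid() does:
\[
P(\pi_k \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid \pi_k)}{\sum_{j=0}^{100} P(\mathcal{D} \mid \pi_j)}, \qquad \pi_j = j/100.
\]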
CI.grid <- function(posteriors){
  # invert the posterior CDF on the grid: index i corresponds to parameter (i-1)/100
  low <- (min(which(cumsum(posteriors) >= 0.025)) - 1) / 100
  up <- (min(which(cumsum(posteriors) >= 0.975)) - 1) / 100
  return(c(low, up))
}
overlap <- function(interval1, interval2){
  # two intervals are disjoint iff one ends before the other begins
  if (interval1[2] < interval2[1] || interval2[2] < interval1[1])
    return(FALSE)
  else
    return(TRUE)
}
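For example, two quick checks of overlap():
overlap(c(0.1, 0.3), c(0.25, 0.5))  # TRUE: the intervals share [0.25, 0.3]
overlap(c(0.1, 0.3), c(0.4, 0.6))   # FALSE: disjoint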
sim.grid <- function(n){
  pi1 <- 0.4
  pi2 <- 0.6
  # simulate one data set of size n from each true proportion
  n.pos1 <- rbinom(1, size=n, prob=pi1)
  n.pos2 <- rbinom(1, size=n, prob=pi2)
  ci1 <- CI.grid(posterior.grid(n.pos1, n))
  ci2 <- CI.grid(posterior.grid(n.pos2, n))
  ovl <- overlap(ci1, ci2)
  # Bayes factor H0/H1; the two data sets are independent, so likelihoods multiply
  bf <- dbinom(n.pos1, n, prob=0.5) * dbinom(n.pos2, n, prob=0.5) /
        (dbinom(n.pos1, n, prob=0.4) * dbinom(n.pos2, n, prob=0.6))
  return(data.frame(n=n,
                    low1=ci1[1], up1=ci1[2],
                    low2=ci2[1], up2=ci2[2],
                    overlap=ovl,
                    BayesFactor=bf))
}
Present your answer in the form of a data frame with 7 columns: \(n\) in the first, columns 2-3 giving the lower and upper CI bounds for the data sampled from \(\pi_1\), columns 4-5 giving the lower and upper CI bounds for the data sampled from \(\pi_2\), and column 6 giving a Boolean stating whether or not the CIs overlap. In the 7th column, show the Bayes factor, the ratio of the likelihoods of the data \(P(\mathcal{D} \mid \pi_1, \pi_2)\) under the following two hypotheses: \(H_0\) = “The parameters are both .5”, and \(H_1\) = “The parameters are \(.4\) and \(.6\) respectively”.
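In terms of these hypotheses, and using the independence of the two data sets (Note 1 below), the Bayes factor computed in sim.grid() factors into a product of binomial likelihoods:
\[
\mathrm{BF} = \frac{P(\mathcal{D}_1, \mathcal{D}_2 \mid H_0)}{P(\mathcal{D}_1, \mathcal{D}_2 \mid H_1)} = \frac{P(\mathcal{D}_1 \mid \pi = .5)\, P(\mathcal{D}_2 \mid \pi = .5)}{P(\mathcal{D}_1 \mid \pi_1 = .4)\, P(\mathcal{D}_2 \mid \pi_2 = .6)}.
\]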
result.grid <- sim.grid(1)  # get the result template; will be overwritten
for (i in 1:20){
  result.grid[i, ] <- sim.grid(5*i)
}
result.grid
## n low1 up1 low2 up2 overlap BayesFactor
## 1 5 0.54 1.00 0.22 0.88 TRUE 2.759474295
## 2 10 0.31 0.83 0.39 0.89 TRUE 1.002758635
## 3 15 0.07 0.46 0.35 0.80 TRUE 0.161951118
## 4 20 0.05 0.36 0.53 0.89 FALSE 0.017437340
## 5 25 0.30 0.67 0.48 0.83 TRUE 0.365395234
## 6 30 0.14 0.45 0.30 0.64 TRUE 0.298755187
## 7 35 0.23 0.54 0.49 0.79 TRUE 0.072375946
## 8 40 0.45 0.74 0.49 0.78 TRUE 2.274930641
## 9 45 0.33 0.61 0.50 0.77 TRUE 0.244942666
## 10 50 0.22 0.48 0.50 0.76 FALSE 0.017582048
## 11 55 0.27 0.51 0.50 0.75 TRUE 0.032344806
## 12 60 0.32 0.56 0.49 0.73 TRUE 0.133881993
## 13 65 0.28 0.51 0.45 0.68 TRUE 0.109464864
## 14 70 0.37 0.60 0.45 0.68 TRUE 1.529206491
## 15 75 0.40 0.62 0.55 0.76 TRUE 0.164650257
## 16 80 0.27 0.47 0.52 0.72 FALSE 0.005252728
## 17 85 0.28 0.48 0.51 0.71 FALSE 0.009663179
## 18 90 0.35 0.55 0.54 0.74 TRUE 0.026665293
## 19 95 0.32 0.51 0.49 0.68 TRUE 0.049054794
## 20 100 0.29 0.48 0.40 0.60 TRUE 0.456858372
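Note that the table above is based on a single simulated pair of data sets per \(n\). To approximate the distinguishing probability that the prompt asks for (and for which Note 4 below requires at least 100 simulations), one could repeat each simulation and take the proportion of runs with non-overlapping CIs. A minimal sketch, assuming 100 repetitions per \(n\) (p.distinguish is an illustrative name):
# estimate P(distinguish) as the proportion of runs whose CIs do not overlap
p.distinguish <- sapply(seq(5, 100, by=5), function(n){
  mean(replicate(100, !sim.grid(n)$overlap))
})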
Next, plot the Bayes factors as a function of \(n\), and plot the widths of the confidence intervals as a function of \(n\). (You can calculate the latter by subtracting column 2 from 3, and subtracting 4 from 5.)
# The smaller the BayesFactor, the more support for rejecting the null hypothesis
plot(result.grid$n, result.grid$BayesFactor, type="l", xlab="n", ylab="Bayes Factor: H0/H1")
plot(result.grid$n, result.grid$up1-result.grid$low1, type="l", xlab="n", ylab="Length of CI1", ylim=c(0,1))
plot(result.grid$n, result.grid$up2-result.grid$low2, type="l", xlab="n", ylab="Length of CI2", ylim=c(0,1))
We can see that as \(n\) increases, the Bayes factor decreases, meaning the data provide more evidence against the null hypothesis. The lengths of the CIs also shrink as \(n\) increases, meaning the data increasingly narrow down the plausible ranges of \(\pi_1\) and \(\pi_2\).
Notes:
1. Since the two data sets were gathered independently, the probability of the following conjunction - getting data set 1 under proportion \(\pi_i\), and getting data set 2 under proportion \(\pi_j\) - is just the product of the individual probabilities of these events.
2. As in class notes, you’ll want to use dbinom() to find the probability of getting the relevant number of heads given some parameter.
3. You can find the symmetric 95% confidence interval using quantile(..., probs=c(.025, .975)). The first number will give you the lower bound, and the second the upper bound.
I am not sure how quantile() works for the grid approximation, since what we get here is a grid approximation of the posterior density function rather than samples from the posterior distribution; CI.grid() above therefore inverts the cumulative sum of the grid posterior directly. (Alternatively, one can draw samples from the grid posterior and then apply quantile(); see the sketch after these notes.)
4. Make sure the number of simulations you use for each \(n\) is at least 100.
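As referenced under Note 3 above, a sketch of how quantile() could still be used with the grid approximation (CI.grid.sampled is an illustrative name): draw samples from the discrete grid posterior and take empirical quantiles.
# turn the grid posterior into samples so that quantile() applies
CI.grid.sampled <- function(posteriors, n.samples=10000){
  draws <- sample(seq(0, 1, by=0.01), size=n.samples, replace=TRUE, prob=posteriors)
  return(quantile(draws, probs=c(0.025, 0.975)))
}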
**Re-do question 1 using rejection sampling instead of a grid approximation. Assume a uniform distribution over \([0,1]\) - i.e., draw proportions from runif(1,0,1). As in the class notes linked here, use a while-loop to ensure that you are getting enough samples from the conditional distribution - at least 500 for each inference that you want to make. Make sure that you answer all the same questions as in Q1.
Note: on this problem it makes a really big difference to computational efficiency whether your condition is stated in terms of the precise sequence of heads and tails generated, or in terms of the number of heads in a sequence of length \(n\). The latter will be much, much faster.**
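For contrast, a sketch of the slow variant this note warns about (query.rej.slow and obs.seq are illustrative names): a proposed parameter is accepted only if a freshly simulated sequence of \(n\) flips matches the observed sequence exactly, so the acceptance probability shrinks roughly like \(2^{-n}\).
# accept a proposal only on an exact sequence match; hopeless for large n
query.rej.slow <- function(obs.seq, gen.par){
  n <- length(obs.seq)
  repeat {
    p <- gen.par()
    if (all(rbinom(n, size=1, prob=p) == obs.seq)) return(p)
  }
}
The implementation below conditions on the number of heads instead, which is what makes it feasible up to \(n = 100\).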
query.rej <- function(n, n.pos, gen.par){
  n.samples <- 500
  par.samples <- c()
  # keep sampling until we have at least 500 accepted draws
  while (length(par.samples) < n.samples){
    par.binom <- gen.par()
    # accept the proposed parameter iff it regenerates the observed count
    if (n.pos == rbinom(1, size=n, prob=par.binom)){
      par.samples <- c(par.samples, par.binom)
    }
  }
  return(par.samples)
}
sim.rej <- function(n, gen.hyperprior){
  pi1 <- 0.4
  pi2 <- 0.6
  n.pos1 <- rbinom(1, size=n, prob=pi1)
  n.pos2 <- rbinom(1, size=n, prob=pi2)
  # empirical 95% CIs from the accepted posterior samples
  ci1 <- quantile(query.rej(n, n.pos1, gen.hyperprior), c(0.025, 0.975))
  ci2 <- quantile(query.rej(n, n.pos2, gen.hyperprior), c(0.025, 0.975))
  ovl <- overlap(ci1, ci2)
  bf <- dbinom(n.pos1, n, prob=0.5) * dbinom(n.pos2, n, prob=0.5) /
        (dbinom(n.pos1, n, prob=0.4) * dbinom(n.pos2, n, prob=0.6))
  return(data.frame(n=n,
                    low1=ci1[1], up1=ci1[2],
                    low2=ci2[1], up2=ci2[2],
                    overlap=ovl,
                    BayesFactor=bf))
}
gen.par.unif <- function() {
  return(runif(1))
}
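A quick sanity check (a sketch, not part of the assignment): under a uniform prior the exact posterior is Beta(n.pos + 1, n - n.pos + 1), so the mean of the accepted samples should be close to (n.pos + 1)/(n + 2).
# e.g., n = 20 trials, 8 successes: compare sample mean to exact posterior mean
samp <- query.rej(20, 8, gen.par.unif)
c(mean(samp), (8 + 1)/(20 + 2))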
result.rej <- sim.rej(5, gen.par.unif)  # get the result template; will be overwritten
for (i in 1:20){
  result.rej[i, ] <- sim.rej(5*i, gen.par.unif)
}
result.rej
## n low1 up1 low2 up2 overlap BayesFactor
## 2.5% 5 0.05543893 0.6254660 0.2077399 0.8771301 TRUE 0.5450813423
## 2 10 0.07279167 0.5048970 0.5032550 0.9359919 TRUE 0.1320505199
## 3 15 0.29773833 0.7831160 0.1652207 0.6055728 TRUE 6.2259450253
## 4 20 0.15133238 0.5257070 0.4787375 0.8638109 TRUE 0.0882765327
## 5 25 0.06247927 0.3528301 0.6110599 0.9080141 FALSE 0.0042243416
## 6 30 0.36270275 0.6818525 0.5531469 0.8619213 TRUE 0.2987551872
## 7 35 0.26866157 0.5577425 0.4107305 0.7305155 TRUE 0.3664032264
## 8 40 0.20107122 0.4714133 0.5737135 0.8414898 FALSE 0.0077927462
## 9 45 0.25938603 0.5388862 0.4485615 0.7187668 TRUE 0.2449426662
## 10 50 0.22041394 0.4799850 0.5745674 0.8322249 FALSE 0.0034729971
## 11 55 0.33285666 0.5885034 0.4159470 0.6628402 TRUE 1.2434430164
## 12 60 0.31922467 0.5705723 0.4135237 0.6555569 TRUE 1.5249995739
## 13 65 0.32636217 0.5569551 0.4234540 0.6696188 TRUE 0.8312488147
## 14 70 0.22903804 0.4414587 0.5062267 0.7235465 FALSE 0.0052382776
## 15 75 0.24294719 0.4663089 0.3642569 0.5902760 TRUE 0.3704630786
## 16 80 0.31815412 0.5374746 0.4618471 0.6707181 TRUE 0.1346216748
## 17 85 0.29999563 0.4995888 0.5626787 0.7405931 FALSE 0.0042947461
## 18 90 0.26401101 0.4658493 0.4759192 0.6852409 FALSE 0.0118512414
## 19 95 0.33152133 0.5166615 0.6197303 0.7977208 FALSE 0.0008506852
## 20 100 0.26785565 0.4568618 0.4420148 0.6199190 TRUE 0.0401082795
# The smaller the BayesFactor, the more support for rejecting the null hypothesis
plot(result.rej$n, result.rej$BayesFactor, type="l", xlab="n", ylab="Bayes Factor: H0/H1")
plot(result.rej$n, result.rej$up1-result.rej$low1, type="l", xlab="n", ylab="Length of CI1", ylim=c(0,1))
plot(result.rej$n, result.rej$up2-result.rej$low2, type="l", xlab="n", ylab="Length of CI2", ylim=c(0,1))
We can see that the qualitative patterns are the same as those in question 1. This is expected, since both methods use the same uniform prior and differ only in how the posterior is approximated.
Re-do question 2 using a prior of the following form: for both coins, \(P(\pi_i) = .95\) with probability \(.7\), \(P(\pi_i) = 1\) with probability \(.02\), \(P(\pi_i) = 0\) with probability \(.02\), and \(P(\pi_i) \sim \mathcal{U}(0,1)\) otherwise. (This kind of prior might be reasonable, for example, when making inferences about the weights of real coins, which are usually fair, and usually double-headed or -tailed when not fair.) Once you re-write the code for generating hypotheses from the prior, you should be able to recycle your other code from Q2. How do the results differ from the uniform prior, in terms of the number of data points needed in order to distinguish the hypotheses?
I think there are some typos here; I read it as \(\pi_i = 0.5\) with probability \(0.7\) (which matches “usually fair”), \(\pi_i = 1\) with probability \(0.02\), \(\pi_i = 0\) with probability \(0.02\), and \(\pi_i \sim \mathcal{U}(0,1)\) otherwise.
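Written out, the prior as interpreted is:
\[
\pi_i = \begin{cases} 0.5 & \text{with probability } 0.7 \\ 1 & \text{with probability } 0.02 \\ 0 & \text{with probability } 0.02 \\ u, \; u \sim \mathcal{U}(0,1) & \text{with probability } 0.26 \end{cases}
\]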
gen.par.v2 <- function() {
  # mixture prior: .7 fair, .02 double-headed, .02 double-tailed, .26 uniform
  coin.type <- runif(1)
  if (coin.type <= 0.7) {
    return(0.5)
  } else if (coin.type <= 0.72) {
    return(1)
  } else if (coin.type <= 0.74) {
    return(0)
  }
  return(runif(1))
}
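A quick empirical check of the generator (a sketch): the draw frequencies should roughly match the mixture weights.
# about .7 of draws should be 0.5, and about .02 each should be 1 and 0
draws <- replicate(1e5, gen.par.v2())
c(mean(draws == 0.5), mean(draws == 1), mean(draws == 0))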
result.rej2 <- sim.rej(5, gen.par.v2)  # get the result template; will be overwritten
for (i in 1:20){
  result.rej2[i, ] <- sim.rej(5*i, gen.par.v2)
}
result.rej2
## n low1 up1 low2 up2 overlap BayesFactor
## 2.5% 5 0.3022671 0.6047363 0.09496689 0.5602458 TRUE 1.839649530
## 2 10 0.4204786 0.6817997 0.42715910 0.6037536 TRUE 2.256206929
## 3 15 0.1403516 0.5000000 0.50000000 0.9232674 TRUE 0.071978275
## 4 20 0.4219010 0.5502505 0.50000000 0.7618627 TRUE 0.670349920
## 5 25 0.3437033 0.5048763 0.50000000 0.7910495 TRUE 0.162397882
## 6 30 0.2944230 0.5000000 0.50000000 0.6801603 TRUE 0.298755187
## 7 35 0.3367786 0.5000000 0.32694887 0.5000000 TRUE 4.173561751
## 8 40 0.4396800 0.5157069 0.47989338 0.5086021 TRUE 2.274930641
## 9 45 0.3149958 0.5000000 0.50000000 0.6727499 TRUE 0.108863407
## 10 50 0.2471492 0.5000000 0.49737379 0.6419491 TRUE 0.059339412
## 11 55 0.4608321 0.5000000 0.50000000 0.6752812 TRUE 0.245618374
## 12 60 0.3287796 0.5000000 0.40255705 0.5000000 TRUE 5.146873562
## 13 65 0.3989490 0.5000000 0.50000000 0.5378659 TRUE 0.831248815
## 14 70 0.1814014 0.3959935 0.50000000 0.5863851 FALSE 0.003492185
## 15 75 0.3847666 0.5000000 0.50000000 0.7711092 TRUE 0.009636595
## 16 80 0.2493704 0.5000000 0.50000000 0.6902796 TRUE 0.003501819
## 17 85 0.3527865 0.5000000 0.50000000 0.7419555 TRUE 0.006442119
## 18 90 0.2588748 0.5000000 0.50000000 0.5891497 TRUE 0.017776862
## 19 95 0.2383197 0.5000000 0.50000000 0.6614064 TRUE 0.001276028
## 20 100 0.3079627 0.5000000 0.50000000 0.6463688 TRUE 0.011883935
# The smaller the BayesFactor, the more support for rejecting the null hypothesis
plot(result.rej2$n, result.rej2$BayesFactor, type="l", xlab="n", ylab="Bayes Factor: H0/H1")
plot(result.rej2$n, result.rej2$up1-result.rej2$low1, type="l", xlab="n", ylab="Length of CI1", ylim=c(0,1))
plot(result.rej2$n, result.rej2$up2-result.rej2$low2, type="l", xlab="n", ylab="Length of CI2", ylim=c(0,1))
We can see that, since we now have a rather strong prior belief that the coins are fair, we need more evidence in order to reject the null hypothesis. What usually happens is that both CIs overlap at exactly one point: \(0.5\). When \(n\) is small, the CIs are shorter than those under the uniform prior, because the initial belief is concentrated on a reasonable estimate, \(0.5\), which reduces the variation. But this also means that the CI lengths shrink more slowly as \(n\) increases: we need more data to override the prior belief.