1. Influence of sample size on inferential strength

Design a simulation testing your ability to distinguish two data sets which are binomially distributed with parameters \(\pi_1 = .4\), \(\pi_2 = .6\). For each \(n \in \{5, 10, 15, ..., 95, 100\}\), generate \(n\) data points from these distributions. Approximate the probability that you will succeed in distinguishing these two distributions with \(n\) observations as the proportion of simulations in which the symmetric 95% Bayesian confidence intervals that you estimate from the simulated data do not overlap. For each \(n\), go through this procedure using a grid approximation, calculating the posterior probability of each parameter in \(\{0, .01, .02, ..., .99, 1\}\) using a uniform prior. (You have to do this 20 times, so you’ll also want to work out how to use a loop or sapply(), rather than typing similar code 20 times.)

# Grid approximation: posterior over pi on the grid {0, .01, ..., .99, 1},
# given n.pos successes out of n trials
posterior.grid <- function(n.pos, n){
  likelihoods <- dbinom(n.pos, n, prob=seq(0, 1, by=0.01))
  
  # since the prior is uniform, the posterior is just normalized likelihood
  posteriors <- likelihoods / sum(likelihoods)
  
  return(posteriors)
}

# Symmetric 95% credible interval from the grid posterior: invert the
# cumulative sum to find the 2.5% and 97.5% quantiles; "(index - 1) / 100"
# maps a grid index back to its parameter value
CI.grid <- function(posteriors){
  low <- (min(which(cumsum(posteriors) >= 0.025)) - 1) / 100
  up <- (min(which(cumsum(posteriors) >= 0.975)) - 1) / 100
  
  return(c(low, up))
}
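
As a quick sanity check (not part of the required answer; the counts 6 and 10 here are arbitrary), the two helpers can be exercised directly:

post <- posterior.grid(6, 10)
sum(post)     # should be 1, since the posterior is normalized
CI.grid(post) # symmetric 95% credible interval on the grid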

# Two intervals fail to overlap iff one ends before the other begins
overlap <- function(interval1, interval2){
  if (interval1[2] < interval2[1] || interval2[2] < interval1[1])
    return(FALSE)
  else
    return(TRUE)
}

# One simulated comparison at sample size n: draw a count from each coin,
# estimate both posteriors on the grid, and compare the credible intervals
sim.grid <- function(n){
  pi1 <- 0.4
  pi2 <- 0.6
  
  n.pos1 <- rbinom(1, size=n, prob=pi1)
  n.pos2 <- rbinom(1, size=n, prob=pi2)
  
  ci1 <- CI.grid(posterior.grid(n.pos1, n))
  ci2 <- CI.grid(posterior.grid(n.pos2, n))
  
  ovl <- overlap(ci1, ci2)
    
  # Bayes factor P(D|H0) / P(D|H1); the two data sets are independent,
  # so their likelihoods multiply (see note 1 below)
  bf <- dbinom(n.pos1, n, prob=0.5) * dbinom(n.pos2, n, prob=0.5) /
        (dbinom(n.pos1, n, prob=0.4) * dbinom(n.pos2, n, prob=0.6))
  
  return(data.frame(n=n, 
                    low1=ci1[1], up1=ci1[2],
                    low2=ci2[1], up2=ci2[2],
                    overlap=ovl,
                    BayesFactor=bf))
}

Present your answer in the form of a data frame with 7 columns: \(n\) in the first, columns 2-3 giving the lower and upper CI bounds for the data sampled from \(\pi_1\), columns 4-5 giving the lower and upper CI bounds for the data sampled from \(\pi_2\), and column 6 giving a Boolean stating whether or not the CIs overlap. In the 7th column, show the Bayes factor, i.e. the ratio of the likelihoods of the data \(P(\mathcal{D}|\pi_1, \pi_2)\) under the following two hypotheses: \(H_0\) = “The parameters are both .5”, and \(H_1\) = “The parameters are .4 and .6 respectively”.
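
Concretely, if the two simulated data sets contain \(k_1\) and \(k_2\) successes out of \(n\), the Bayes factor computed in sim.grid above is

\[
\mathrm{BF} = \frac{P(\mathcal{D}|H_0)}{P(\mathcal{D}|H_1)}
            = \frac{\mathrm{Binom}(k_1; n, .5)\,\mathrm{Binom}(k_2; n, .5)}{\mathrm{Binom}(k_1; n, .4)\,\mathrm{Binom}(k_2; n, .6)},
\]

where the likelihoods multiply because the two data sets are independent (note 1 below).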

result.grid <- sim.grid(1) # one-row template; every row is overwritten below

for (i in 1:20){
  result.grid[i, ] <- sim.grid(5*i) 
}

result.grid
##      n low1  up1 low2  up2 overlap BayesFactor
## 1    5 0.54 1.00 0.22 0.88    TRUE 2.759474295
## 2   10 0.31 0.83 0.39 0.89    TRUE 1.002758635
## 3   15 0.07 0.46 0.35 0.80    TRUE 0.161951118
## 4   20 0.05 0.36 0.53 0.89   FALSE 0.017437340
## 5   25 0.30 0.67 0.48 0.83    TRUE 0.365395234
## 6   30 0.14 0.45 0.30 0.64    TRUE 0.298755187
## 7   35 0.23 0.54 0.49 0.79    TRUE 0.072375946
## 8   40 0.45 0.74 0.49 0.78    TRUE 2.274930641
## 9   45 0.33 0.61 0.50 0.77    TRUE 0.244942666
## 10  50 0.22 0.48 0.50 0.76   FALSE 0.017582048
## 11  55 0.27 0.51 0.50 0.75    TRUE 0.032344806
## 12  60 0.32 0.56 0.49 0.73    TRUE 0.133881993
## 13  65 0.28 0.51 0.45 0.68    TRUE 0.109464864
## 14  70 0.37 0.60 0.45 0.68    TRUE 1.529206491
## 15  75 0.40 0.62 0.55 0.76    TRUE 0.164650257
## 16  80 0.27 0.47 0.52 0.72   FALSE 0.005252728
## 17  85 0.28 0.48 0.51 0.71   FALSE 0.009663179
## 18  90 0.35 0.55 0.54 0.74    TRUE 0.026665293
## 19  95 0.32 0.51 0.49 0.68    TRUE 0.049054794
## 20 100 0.29 0.48 0.40 0.60    TRUE 0.456858372

Next, plot the Bayes factors as a function of \(n\), and plot the widths of the confidence intervals as a function of \(n\). (You can calculate the latter by subtracting column 2 from 3, and subtracting 4 from 5.)

# The smaller the BayesFactor, the more support for rejecting the null hypothesis
plot(result.grid$n, result.grid$BayesFactor, type="l", xlab="n", ylab="Bayes Factor: H0/H1")

plot(result.grid$n, result.grid$up1-result.grid$low1, type="l", xlab="n", ylab="Length of CI1", ylim=c(0,1))

plot(result.grid$n, result.grid$up2-result.grid$low2, type="l", xlab="n", ylab="Length of CI2", ylim=c(0,1))

We can see that as \(n\) increases, the Bayes factor tends to decrease, meaning the data provide increasing evidence against the null hypothesis. The lengths of the CIs also shrink as \(n\) increases, meaning the data narrow down the plausible ranges of \(\pi_1\) and \(\pi_2\). (Since each row is based on a single simulated data set, neither trend is monotone.)

Notes:

1. Since the two data sets were gathered independently, the probability of the conjunction (getting data set 1 under proportion \(\pi_i\) and getting data set 2 under proportion \(\pi_j\)) is just the product of the individual probabilities of these events.

2. As in the class notes, you’ll want to use dbinom() to find the probability of getting the relevant number of heads given some parameter.

3. You can find the symmetric 95% confidence interval using quantile(..., probs=c(.025, .975)). The first number will give you the lower bound, and the second the upper bound.

I am not sure how this works for the grid approximation, since what we get here is the grid approximation of the posterior mass function rather than samples from the posterior distribution. CI.grid above therefore inverts the cumulative sum of the posterior masses directly; alternatively, one can first draw samples from the grid and then apply quantile(), as sketched below.
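
A minimal sketch of that sampling route (the counts 40 and 100 are illustrative): draw parameter values from the grid in proportion to their posterior mass, after which quantile() applies exactly as note 3 describes.

grid <- seq(0, 1, by=0.01)
post <- posterior.grid(40, 100)                     # e.g., 40 successes in 100 trials
draws <- sample(grid, size=10000, replace=TRUE, prob=post)
quantile(draws, probs=c(.025, .975))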

4. Make sure the number of simulations you use for each \(n\) is at least 100 (see the sketch below).
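
The tables in this write-up use a single simulated data set per \(n\). A hedged sketch of the full replication (n.reps and p.distinct are names introduced here, not from the problem): for each \(n\), approximate the probability of distinguishing the coins as the proportion of replications whose credible intervals do not overlap.

n.reps <- 100
p.distinct <- sapply(seq(5, 100, by=5), function(n){
  mean(replicate(n.reps, !sim.grid(n)$overlap))
})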

2. Bayesian inference with rejection sampling

**Re-do question 1 using rejection sampling instead of a grid approximation. Assume a uniform distribution over \([0,1]\), i.e., draw proportions from runif(1,0,1). As in the class notes linked here, use a while-loop to ensure that you are getting enough samples from the conditional distribution: at least 500 for each inference that you want to make. Make sure that you answer all the same questions as in Q1.

Note: on this problem it makes a really big difference to computational efficiency whether your condition is stated in terms of the precise sequence of heads and tails generated, or in terms of the number of heads in a sequence of length \(n\). The latter will be much, much faster.**
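
To see why (a reasoning sketch, not part of the required answer): a proposed proportion \(\pi\) is accepted under the count condition with probability \(\binom{n}{k}\pi^k(1-\pi)^{n-k}\), whereas matching one particular sequence of heads and tails succeeds only with probability \(\pi^k(1-\pi)^{n-k}\). The missing factor of \(\binom{n}{k}\) makes the sequence condition astronomically harder to satisfy as \(n\) grows.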

# Rejection sampling: repeatedly draw a proportion from the prior, simulate
# n coin flips with it, and keep the proportion only if the simulated number
# of successes matches the observed one
query.rej <- function(n, n.pos, gen.par){
  n.samples <- 500
  par.samples <- c()
  while (length(par.samples) < n.samples){
    par.binom <- gen.par()
    if (n.pos == rbinom(1, size=n, prob=par.binom)){
      par.samples <- c(par.samples, par.binom)
    }
  }
  return(par.samples)  
}
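
One further speed-up, as a minimal sketch (query.rej.batch and gen.par.vec are hypothetical names; gen.par.vec is assumed to return a whole vector of prior draws, e.g. function(k) runif(k)): propose candidates in batches and filter with vectorized rbinom(), instead of growing the vector one accepted sample at a time.

query.rej.batch <- function(n, n.pos, gen.par.vec, n.samples=500, batch=10000){
  par.samples <- c()
  while (length(par.samples) < n.samples){
    candidates <- gen.par.vec(batch)                   # a batch of prior draws
    counts <- rbinom(batch, size=n, prob=candidates)   # one simulated count per candidate
    par.samples <- c(par.samples, candidates[counts == n.pos])
  }
  return(par.samples[1:n.samples])
}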

# Same comparison as sim.grid, but the CIs come from rejection samples;
# gen.hyperprior draws a single proposal from the prior
sim.rej <- function(n, gen.hyperprior){
  pi1 <- 0.4
  pi2 <- 0.6
  
  n.pos1 <- rbinom(1, size=n, prob=pi1)
  n.pos2 <- rbinom(1, size=n, prob=pi2)
  
  ci1 <- quantile(query.rej(n, n.pos1, gen.hyperprior), c(0.025, 0.975))
  ci2 <- quantile(query.rej(n, n.pos2, gen.hyperprior), c(0.025, 0.975))
  
  ovl <- overlap(ci1, ci2)
    
  bf <- dbinom(n.pos1, n, prob=0.5) * dbinom(n.pos2, n, prob=0.5) /
        (dbinom(n.pos1, n, prob=0.4) * dbinom(n.pos2, n, prob=0.6))
  
  return(data.frame(n=n, 
                    low1=ci1[1], up1=ci1[2],
                    low2=ci2[1], up2=ci2[2],
                    overlap=ovl,
                    BayesFactor=bf))
}
# Draw one proposal from the uniform prior on [0, 1]
gen.par.unif <- function() {
  return(runif(1))
} 

result.rej <- sim.rej(5, gen.par.unif) # one-row template; every row is overwritten below

for (i in 1:20){
  result.rej[i, ] <- sim.rej(5*i, gen.par.unif) 
}

result.rej
##        n       low1       up1      low2       up2 overlap  BayesFactor
## 2.5%   5 0.05543893 0.6254660 0.2077399 0.8771301    TRUE 0.5450813423
## 2     10 0.07279167 0.5048970 0.5032550 0.9359919    TRUE 0.1320505199
## 3     15 0.29773833 0.7831160 0.1652207 0.6055728    TRUE 6.2259450253
## 4     20 0.15133238 0.5257070 0.4787375 0.8638109    TRUE 0.0882765327
## 5     25 0.06247927 0.3528301 0.6110599 0.9080141   FALSE 0.0042243416
## 6     30 0.36270275 0.6818525 0.5531469 0.8619213    TRUE 0.2987551872
## 7     35 0.26866157 0.5577425 0.4107305 0.7305155    TRUE 0.3664032264
## 8     40 0.20107122 0.4714133 0.5737135 0.8414898   FALSE 0.0077927462
## 9     45 0.25938603 0.5388862 0.4485615 0.7187668    TRUE 0.2449426662
## 10    50 0.22041394 0.4799850 0.5745674 0.8322249   FALSE 0.0034729971
## 11    55 0.33285666 0.5885034 0.4159470 0.6628402    TRUE 1.2434430164
## 12    60 0.31922467 0.5705723 0.4135237 0.6555569    TRUE 1.5249995739
## 13    65 0.32636217 0.5569551 0.4234540 0.6696188    TRUE 0.8312488147
## 14    70 0.22903804 0.4414587 0.5062267 0.7235465   FALSE 0.0052382776
## 15    75 0.24294719 0.4663089 0.3642569 0.5902760    TRUE 0.3704630786
## 16    80 0.31815412 0.5374746 0.4618471 0.6707181    TRUE 0.1346216748
## 17    85 0.29999563 0.4995888 0.5626787 0.7405931   FALSE 0.0042947461
## 18    90 0.26401101 0.4658493 0.4759192 0.6852409   FALSE 0.0118512414
## 19    95 0.33152133 0.5166615 0.6197303 0.7977208   FALSE 0.0008506852
## 20   100 0.26785565 0.4568618 0.4420148 0.6199190    TRUE 0.0401082795
# The smaller the BayesFactor, the more support for rejecting the null hypothesis
plot(result.rej$n, result.rej$BayesFactor, type="l", xlab="n", ylab="Bayes Factor: H0/H1")

plot(result.rej$n, result.rej$up1-result.rej$low1, type="l", xlab="n", ylab="Length of CI1", ylim=c(0,1))

plot(result.rej$n, result.rej$up2-result.rej$low2, type="l", xlab="n", ylab="Length of CI2", ylim=c(0,1))

We can see that the qualitative patterns are the same as those in question 1. This is expected, since both methods approximate the same posterior under the same uniform prior; only the approximation method differs.

3. Non-uniform priors

Re-do question 2 using a prior of the following form: for both coins, \(P(\pi_i) = .95\) with probability \(.7\), \(P(\pi_i) = 1\) with probability \(.02\), \(P(\pi_i) = 0\) with probability \(.02\), and \(P(\pi_i) \sim \mathcal{U}(0,1)\) otherwise. (This kind of prior might be reasonable, for example, when making inferences about the weights of real coins, which are usually fair, and usually double-headed or -tailed when not fair.) Once you re-write the code for generating hypotheses from the prior, you should be able to recycle your other code from Q2. How do the results differ from the uniform prior, in terms of the number of data points needed in order to distinguish the hypotheses?

I think there are some typos here; I read it as: \(\pi_i = 0.5\) with probability \(0.7\) (which matches “usually fair”), \(\pi_i = 1\) with probability \(0.02\), \(\pi_i = 0\) with probability \(0.02\), and \(\pi_i \sim \mathcal{U}(0,1)\) with the remaining probability \(0.26\).

# Mixture prior: fair with probability .7, double-headed with .02,
# double-tailed with .02, and uniform on [0, 1] otherwise
gen.par.v2 <- function() {
  coin.type <- runif(1)
  if (coin.type <= 0.7){
    return(0.5)      # fair coin
  }
  else if (coin.type <= 0.72){
    return(1)        # double-headed
  }
  else if (coin.type <= 0.74){
    return(0)        # double-tailed
  }
  
  return(runif(1))   # remaining 26%: uniform over [0, 1]
}
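
As an optional sanity check on this prior (the sample size 1e5 is arbitrary), the point masses should come out near their nominal probabilities:

prior.draws <- replicate(1e5, gen.par.v2())
mean(prior.draws == 0.5)  # should be close to 0.70
mean(prior.draws == 1)    # should be close to 0.02
mean(prior.draws == 0)    # should be close to 0.02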

result.rej2 <- sim.rej(5, gen.par.v2) # one-row template; every row is overwritten below

for (i in 1:20){
  result.rej2[i, ] <- sim.rej(5*i, gen.par.v2) 
}

result.rej2
##        n      low1       up1       low2       up2 overlap BayesFactor
## 2.5%   5 0.3022671 0.6047363 0.09496689 0.5602458    TRUE 1.839649530
## 2     10 0.4204786 0.6817997 0.42715910 0.6037536    TRUE 2.256206929
## 3     15 0.1403516 0.5000000 0.50000000 0.9232674    TRUE 0.071978275
## 4     20 0.4219010 0.5502505 0.50000000 0.7618627    TRUE 0.670349920
## 5     25 0.3437033 0.5048763 0.50000000 0.7910495    TRUE 0.162397882
## 6     30 0.2944230 0.5000000 0.50000000 0.6801603    TRUE 0.298755187
## 7     35 0.3367786 0.5000000 0.32694887 0.5000000    TRUE 4.173561751
## 8     40 0.4396800 0.5157069 0.47989338 0.5086021    TRUE 2.274930641
## 9     45 0.3149958 0.5000000 0.50000000 0.6727499    TRUE 0.108863407
## 10    50 0.2471492 0.5000000 0.49737379 0.6419491    TRUE 0.059339412
## 11    55 0.4608321 0.5000000 0.50000000 0.6752812    TRUE 0.245618374
## 12    60 0.3287796 0.5000000 0.40255705 0.5000000    TRUE 5.146873562
## 13    65 0.3989490 0.5000000 0.50000000 0.5378659    TRUE 0.831248815
## 14    70 0.1814014 0.3959935 0.50000000 0.5863851   FALSE 0.003492185
## 15    75 0.3847666 0.5000000 0.50000000 0.7711092    TRUE 0.009636595
## 16    80 0.2493704 0.5000000 0.50000000 0.6902796    TRUE 0.003501819
## 17    85 0.3527865 0.5000000 0.50000000 0.7419555    TRUE 0.006442119
## 18    90 0.2588748 0.5000000 0.50000000 0.5891497    TRUE 0.017776862
## 19    95 0.2383197 0.5000000 0.50000000 0.6614064    TRUE 0.001276028
## 20   100 0.3079627 0.5000000 0.50000000 0.6463688    TRUE 0.011883935
# The smaller the BayesFactor, the more support for rejecting the null hypothesis
plot(result.rej2$n, result.rej2$BayesFactor, type="l", xlab="n", ylab="Bayes Factor: H0/H1")

plot(result.rej2$n, result.rej2$up1-result.rej2$low1, type="l", xlab="n", ylab="Length of CI1", ylim=c(0,1))

plot(result.rej2$n, result.rej2$up2-result.rej2$low2, type="l", xlab="n", ylab="Length of CI2", ylim=c(0,1))

We can see that, since the prior places most of its mass on fair coins, we need more evidence in order to reject the null hypothesis. What usually happens is that the two CIs overlap at exactly one point, \(0.5\), where the prior concentrates. When \(n\) is small, the CIs are narrower than under the uniform prior, because the initial belief is concentrated on the reasonable estimate \(0.5\), which reduces the variation. But this also means the CI lengths shrink more slowly as \(n\) increases: we need more data to override the prior belief.