1. Influence of sample size on inferential strength

Design a simulation testing your ability to distinguish two data sets which are binomially distributed with parameters \(\pi_1 = .4\) and \(\pi_2 = .6\). For each \(n \in \{5, 10, 15, ..., 95, 100\}\), generate \(n\) data points from each of these distributions. Approximate the probability that you will succeed in distinguishing the two distributions with \(n\) observations as the proportion of simulations in which the symmetric 95% Bayesian confidence intervals estimated from the simulated data do not overlap. For each \(n\), go through this procedure using a grid approximation, calculating the posterior probability of each parameter value in \(\{0, .01, .02, ..., .99, 1\}\) using a uniform prior. (You have to do this 20 times, so you’ll also want to work out how to use a loop or sapply(), rather than typing similar code 20 times.)
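
A minimal sketch of the grid step for a single data set, assuming the flips are simulated with rbinom() and the interval is obtained by resampling from the grid posterior (the function name posterior_ci and the resampling size are illustrative, not part of the assignment):

```r
## Grid approximation of the posterior over a proportion, for one simulated data set.
## Assumes a uniform prior over the grid {0, .01, ..., 1}.
grid <- seq(0, 1, by = .01)

posterior_ci <- function(n, true_p) {
  heads <- rbinom(1, size = n, prob = true_p)          # simulate n flips, count heads
  likelihood <- dbinom(heads, size = n, prob = grid)   # likelihood at each grid value
  posterior <- likelihood * rep(1 / length(grid), length(grid))  # times uniform prior
  posterior <- posterior / sum(posterior)              # normalize
  samples <- sample(grid, 10000, replace = TRUE, prob = posterior)
  quantile(samples, probs = c(.025, .975))             # symmetric 95% interval
}

posterior_ci(20, .4)   # e.g., bounds for the pi_1 data at n = 20
```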

Present your answer in the form of a data frame with 20 rows (one per value of \(n\)) and 7 columns: \(n\) in the first; columns 2-3 giving the lower and upper CI bounds for the data sampled from \(\pi_1\); columns 4-5 giving the lower and upper CI bounds for the data sampled from \(\pi_2\); and column 6 giving a Boolean stating whether or not the CIs overlap. In the 7th column, show the Bayes factor: the ratio of the likelihoods of the data, \(P(\mathcal{D} \mid \pi_1, \pi_2)\), under the following two hypotheses: \(H_0\) = “The parameters are both .5”, and \(H_1\) = “The parameters are \(.4\) and \(.6\) respectively”.
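
For the 7th column, one way to compute the ratio is sketched below, assuming heads1 and heads2 are the observed head counts in the two simulated data sets (hypothetical names) and taking the ratio as \(H_1\) over \(H_0\); flip it if you prefer the opposite convention:

```r
## Bayes factor comparing H1 (pi_1 = .4, pi_2 = .6) against H0 (both parameters = .5).
## The two data sets are independent, so each likelihood is a product (see Note 1 below).
bayes_factor <- function(heads1, heads2, n) {
  lik_H1 <- dbinom(heads1, size = n, prob = .4) * dbinom(heads2, size = n, prob = .6)
  lik_H0 <- dbinom(heads1, size = n, prob = .5) * dbinom(heads2, size = n, prob = .5)
  lik_H1 / lik_H0
}
```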

Next, plot the Bayes factors as a function of \(n\), and plot the widths of the confidence intervals as a function of \(n\). (You can calculate the widths by subtracting column 2 from column 3, and column 4 from column 5.)
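
If it helps, here is a base-R sketch of the two plots, assuming your results live in a data frame called results laid out as described above (the name and the log scale are one choice among many):

```r
## Bayes factor vs. n (log scale makes the growth easier to see).
plot(results$n, results[[7]], type = "b", log = "y",
     xlab = "n", ylab = "Bayes factor")

## CI widths vs. n for both data sets.
plot(results$n, results[[3]] - results[[2]], type = "b", col = "blue",
     xlab = "n", ylab = "CI width", ylim = c(0, 1))
lines(results$n, results[[5]] - results[[4]], type = "b", col = "red")
legend("topright", legend = c("pi_1 data", "pi_2 data"),
       col = c("blue", "red"), lty = 1)
```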

Notes:

  1. Since the two data sets were gathered independently, the probability of the following conjunction - getting data set 1 under proportion \(\pi_i\), and getting data set 2 under proportion \(\pi_j\) - is just the product of the individual probabilities of these events.
  2. As in the class notes, you’ll want to use dbinom() to find the probability of getting the relevant number of heads given some parameter.
  3. You can find the symmetric 95% confidence interval using quantile(..., probs=c(.025, .975)). The first number will give you the lower bound, and the second the upper bound.

2. Bayesian inference with rejection sampling

Re-do question 1 using rejection sampling instead of a grid approximation. Assume a uniform prior over \([0,1]\) - i.e., draw candidate proportions from runif(1,0,1). As in the class notes linked here, use a while-loop to ensure that you are getting enough samples from the conditional distribution - at least 500 for each inference that you want to make. Make sure that you answer all of the same questions as in Q1.
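
A minimal sketch of the sampler for one data set, conditioning acceptance on the number of heads (see the note below); the function name and the default minimum-sample threshold are illustrative:

```r
## Rejection sampling from the posterior over a proportion, for one data set.
## Propose from the uniform prior and keep each proportion whose simulated data set
## has the same number of heads as the observed one, until we have >= 500 samples.
rejection_sample <- function(heads, n, min_samples = 500) {
  accepted <- numeric(0)
  while (length(accepted) < min_samples) {
    p <- runif(1, 0, 1)                          # propose a proportion from the prior
    if (rbinom(1, size = n, prob = p) == heads)  # accept if the head count matches
      accepted <- c(accepted, p)
  }
  accepted
}

samples <- rejection_sample(heads = 8, n = 20)
quantile(samples, probs = c(.025, .975))         # symmetric 95% interval, as in Q1
```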

Note: on this problem it makes a really big difference to computational efficiency whether your condition is stated in terms of the precise sequence of heads and tails generated, or in terms of the number of heads in a sequence of length \(n\). The latter will be much, much faster.

3. Non-uniform priors

Re-do question 2 using a prior of the following form: for both coins, \(P(\pi_i = .5) = .95\), \(P(\pi_i = 1) = .02\), \(P(\pi_i = 0) = .02\), and with the remaining probability (\(.01\)) \(\pi_i \sim \mathcal{U}(0,1)\). (This kind of prior might be reasonable, for example, when making inferences about the weights of real coins, which are usually fair, and usually double-headed or -tailed when not fair.) Once you re-write the code for generating hypotheses from the prior, you should be able to recycle your other code from Q2. How do the results differ from the uniform prior, in terms of the number of data points needed in order to distinguish the hypotheses?
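
Only the proposal step from Q2 needs to change; here is a sketch of drawing one proportion from this prior (the function name is illustrative):

```r
## Draw a single proportion from the spike-and-slab prior described above:
## .5 with probability .95, 1 with probability .02, 0 with probability .02,
## and uniform on [0, 1] with the remaining probability (.01).
sample_prior <- function() {
  u <- runif(1)
  if (u < .95) {
    .5
  } else if (u < .97) {
    1
  } else if (u < .99) {
    0
  } else {
    runif(1, 0, 1)
  }
}

## In the Q2 rejection sampler, replace runif(1, 0, 1) with sample_prior().
```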