A limitation of specifying a discrete prior for \(p\) is that any plausible value not included in the prior (e.g., \(p = 0.2\)) is assigned zero probability in the posterior distribution
Ideally, we want a distribution that allows \(p\) to be any value in \([0, 1]\)
Two possible distributions come to mind:
Continuous uniform distribution in the interval [0,1]
Beta distribution
The continuous uniform density on \([a, b]\) is
\[ f(p)= \begin{cases} \frac{1}{b-a},\; \text{if}\; a \leq p \leq b \\ 0,\; \text{elsewhere} \end{cases} \]
so on the interval \([0, 1]\) this reduces to
\[ f(p)= \begin{cases} 1, \; \text{if}\; 0 \leq p \leq 1 \\ 0,\; \text{elsewhere} \end{cases} \]
Combining this \(U(0,1)\) prior with the Binomial likelihood
\[ f(y|p)=\binom{n}{y}p^y(1-p)^{n-y} \]
gives the posterior
\[ f(p|y) \propto p^y(1-p)^{n-y} \]
which is the kernel of a \(Beta(y+1, n-y+1)\) density
The second option, the \(Beta(a,b)\) distribution, has density
\[ f(p)= \begin{cases} \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}p^{a-1}(1-p)^{b-1}, \; \text{if}\; 0 \leq p \leq 1 \\ 0,\; \text{elsewhere} \end{cases} \]
Hence, this distribution is suitable for modeling a random variable whose domain is between 0 and 1, say the proportion \(p\)
Note that the \(U(0,1)\) distribution is a special case of the Beta distribution with \(a=b=1\)
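To get a feel for the flexibility of the Beta family, a minimal sketch in R (the shape values below are illustrative, not taken from the example that follows):
# plot a few Beta densities; Beta(1,1) is the U(0,1) density
p <- seq(0, 1, length = 200)
plot(p, dbeta(p, 1, 1), type = "l", ylim = c(0, 3), ylab = "Density")
lines(p, dbeta(p, 2, 5), lty = 2)   # skewed toward small p
lines(p, dbeta(p, 5, 2), lty = 3)   # skewed toward large p
legend("top", c("Beta(1,1)", "Beta(2,5)", "Beta(5,2)"), lty = 1:3)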
Recall that if \(p \sim Beta(a,b)\), then
\[\begin{align} E(p) &= \frac{a}{a+b} \notag \\ Mode(p) &= \frac{a-1}{a+b-2}, \text{if}\; a,b > 1 \notag \\ Var(p) &= \frac{ab}{(a+b)^2(a+b+1)} \notag \end{align}\]
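These formulas are easy to verify by simulation; a minimal check with illustrative shape values \(a = 2\), \(b = 5\):
a <- 2; b <- 5
x <- rbeta(100000, a, b)
c(mean = mean(x), exact = a / (a + b))                       # E(p) = 2/7
c(var = var(x), exact = a * b / ((a + b)^2 * (a + b + 1)))   # Var(p) = 10/392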
Combining the Binomial likelihood, written in terms of the number of successes \(s\) and failures \(f = n - s\),
\[ f(s|p)=\binom{n}{s}p^s(1-p)^f \]
with the Beta prior
\[ f(p)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}p^{a-1}(1-p)^{b-1} \]
will give us the posterior density
\[ f(p|s) \propto f(p) \times f(s|p) \propto p^{a+s-1}(1-p)^{b+f-1} \]
Multiplying by the normalizing constant \[\frac{\Gamma(n+a+b)}{\Gamma(a+s)\Gamma(b+f)} \] shows that the resulting pdf is that of a Beta distribution with \(a' = a + s\) and \(b' = b + f\)
We have seen that if we use a Beta prior for the Binomial proportion, we will obtain a Beta posterior
That is, if \(p \sim Beta(a,b)\) and \(Y \sim Binomial(n,p)\), then \(p|Y \sim Beta(a+s, b+f)\)
This result shows an important concept in Bayesian statistics: conjugate distributions
A conjugate distribution or conjugate pair means a pair of a sampling distribution and a prior distribution for which the resulting posterior distribution belongs to the same parametric family of distributions as the prior distribution
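A quick numerical check of this conjugacy result (the values of \(a\), \(b\), \(n\), \(s\) below are illustrative): up to a normalizing constant, prior times likelihood should match the \(Beta(a+s, b+f)\) density at every \(p\), so their pointwise ratio should be constant
a <- 3; b <- 4; n <- 10; s <- 6; f <- n - s
p <- c(0.2, 0.4, 0.6, 0.8)
ratio <- dbeta(p, a, b) * dbinom(s, n, p) / dbeta(p, a + s, b + f)
ratio   # the same constant at every p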
The parameters of the \(Beta(a,b)\) prior are called hyperparameters
These are chosen to reflect the researcher’s prior beliefs about \(p\)
However, it is difficult to guess values of \(a\) and \(b\) in \(Beta(a,b)\) directly
A solution is to specify quantiles of the Beta distribution and solve for the matching \(a\) and \(b\)
Recall that the \(q^{th}\) quantile is the value \(y_q\) of the random variable \(Y\) such that
\[ P(Y\leq y_q)=q \]
For example, suppose the researcher believes the prior median of \(p\) is 0.3 and the prior \(90^{th}\) percentile is 0.5. The beta.select() function finds the matching shape parameters:
library(ProbBayes)
# each list specifies one quantile: P(p <= x) = p
beta.select(list(x = 0.3, p = 0.50),
            list(x = 0.5, p = 0.90))
## [1] 3.26 7.19
The prior distribution of \(p\) is Beta(3.26, 7.19). Suppose that in a sample of \(n = 27\) graduating students, \(s = 11\) said they slept at least 8 hours (so \(f = 16\)); the likelihood is Binomial(27, \(p\))
Thus, the posterior is Beta(3.26 + 11, 7.19 + 16) = Beta(14.26, 23.19)
p <- seq(0, 1, length = 1000)
a <- 3.26   # prior shape parameters
b <- 7.19
s <- 11     # observed successes
f <- 16     # observed failures
prior <- dbeta(p, a, b)
like <- dbeta(p, s + 1, f + 1)   # likelihood, scaled as a Beta(s+1, f+1) density
post <- dbeta(p, a + s, b + f)
plot(p, post, type = "l", ylab = "Density", lty = 2, lwd = 3)
lines(p, like, lty = 1, lwd = 3)
lines(p, prior, lty = 3, lwd = 3)
legend(0.7, 4, c("Prior", "Likelihood", "Posterior"),
       lty = c(3, 1, 2), lwd = c(3, 3, 3))
Bayesian inference (point and credible interval estimates, and tests of hypothesis) will be based on the Beta(14.26, 23.19) posterior
In fact, a Bayesian point estimate of \(p\) is the mean of Beta(14.26, 23.19), which is
\[\hat{p}=\frac{a{'}}{a{'}+b{'}} = \frac{14.26}{14.26 + 23.19} \approx 0.38\]
round(mean(rbeta(1000, 14.26, 23.19)), 2)   # Monte Carlo approximation of the posterior mean
## [1] 0.38
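Other point estimates also follow directly from the posterior; for instance, the posterior median and, using the mode formula recalled earlier, the posterior mode:
qbeta(0.5, 14.26, 23.19)            # posterior median
(14.26 - 1) / (14.26 + 23.19 - 2)   # posterior mode, (a'-1)/(a'+b'-2)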
Another type of inference is a Bayesian credible interval, an interval that one is confident contains \(p\)
Such an interval provides an uncertainty estimate for the parameter \(p\)
A 90% Bayesian credible interval is an interval that contains 90% of the posterior probability.
A \((1-\alpha) \times 100\%\) credible interval for \(p\) can be obtained using the beta_interval() function in the ProbBayes package, or the qbeta() function in the stats package.
ProbBayes::beta_interval(0.9, c(14.26, 23.19))
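Equivalently, an equal-tail 90% interval can be read off with qbeta() from the stats package:
qbeta(c(0.05, 0.95), 14.26, 23.19)   # 5th and 95th posterior percentiles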
Suppose it is claimed that at least 75% of graduating college students slept at least 8 hours every night. Is this a reasonable claim?
From a Bayesian viewpoint,
Find the posterior probability that \(p\geq 0.75\).
Make a decision based on the value of the posterior probability. If the probability is small, we can reject this claim.
In R, we can use the beta_area() function in the ProbBayes package
ProbBayes::beta_area(lo = 0.75, hi = 1.0,
                     shape_par = c(14.26, 23.19))
1 - pbeta(0.75, 14.26, 23.19)   # exact posterior probability P(p >= 0.75)
## [1] 9.529512e-07
Using the above code we have \(P(p \geq 0.75) \approx 0\)
Or simulate observations and calculate the proportion of the simulated values which are \(\geq0.75\)
rval <- rbeta(1000, 14.26, 23.19)   # simulates n = 1000 observations from the posterior
prop <- sum(rval >= 0.75) / 1000    # proportion of draws >= 0.75
print(prop)
## [1] 0
Prediction is a typical task of Bayesian inference and statistical inference, in general.
Once we are able to make inferences about the parameter in our statistical model, we may be interested in predicting future observations.
For example, if another survey is conducted using a sample of \(m\) graduating students, what is the expected number of students who would say that they have at least 8 hours of sleep per night?
Let \(\widetilde{Y}\) be a new observation. We want the probability function \[f(\widetilde{Y}=\widetilde{s}|Y=s) \] where \(\widetilde{s}\) is a value of \(\widetilde{Y}\)
But the conditional distribution of \(\widetilde{Y}\) given a value of the proportion \(p\) is Binomial(\(m\), \(p\)) and the current beliefs about \(p\) are described by the posterior density
So the joint density of \(\widetilde{Y}\) and \(p\) is \[ f(\widetilde{Y}= \widetilde{s}, p|Y=s) = f(\widetilde{Y}= \widetilde{s} | p)\, f(p| Y=s) \]
Thus, to get \(f(\widetilde{Y}=\widetilde{s}|Y=s)\), we integrate out \(p\)
That is, \[ f(\widetilde{Y}=\widetilde{s}|Y=s) = \int f(\widetilde{Y}= \widetilde{s} | p)\, f(p|Y=s)\, dp \]
After the substitution of densities and an integration step, we get \[ f(\widetilde{Y}=\widetilde{s}|Y=s) = \binom{m}{\widetilde{s}} \frac{B(a+s+\widetilde{s}, b+f+m-\widetilde{s})}{B(a+s, b+f)} \]
This is the Beta-Binomial distribution with parameters \(m,a+s, b+f\)
Recall that \(B(a,b)\) is the beta function defined as \[ B(a,b) = \int_{0}^1 p^{a-1}(1-p)^{b-1} dp = \frac{\Gamma(a) \Gamma(b)} {\Gamma(a+b)} \]
In summary, the Bayesian predictive distribution of a new observation is a Beta-Binomial distribution, where \(m\) is the number of observations in the new sample, \(a\) and \(b\) are the parameters of the Beta prior, and \(s\) and \(f\) are the observed numbers of successes and failures from the likelihood
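The Beta-Binomial pmf can also be coded directly from the formula above using R's beta() function; dbetabinom() below is a hypothetical helper (not a ProbBayes function), and it should agree with the pbetap() calculation that follows
# Beta-Binomial pmf; here a and b play the role of a+s and b+f
dbetabinom <- function(stilde, m, a, b) {
  choose(m, stilde) * beta(a + stilde, b + m - stilde) / beta(a, b)
}
dbetabinom(12, 20, 14.26, 23.19)   # predictive probability of 12 successes in m = 20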
Let us use the Beta-Binomial distribution to compute the predictive probability that there are \(\widetilde{s}\) successes in a new survey of size \(m\).
We shall use the pbetap() function in the ProbBayes package for the calculation of predictive probabilities
For example, the predictive probability of \(\widetilde{s} = 12\) successes in a new survey of size \(m = 20\):
pred.prob <- pbetap(c(14.26, 23.19), 20, 12)
print(pred.prob)
## [1] 0.03976734
It is desirable to construct an interval that will contain \(\widetilde{Y}\) with a high probability. Suppose the desired probability content is 0.90.
One constructs this prediction interval by adding the most likely values of \(\widetilde{Y}\), one at a time, until the probability content of the set exceeds 0.90.
pred.prob.dist <- pbetap(c(14.26, 23.19), 20, 0:20)   # predictive probabilities for 0, 1, ..., 20 successes
discint(cbind(0:20, pred.prob.dist), 0.90)
## $prob
## [1] 0.9088212
##
## $set
## [1] 4 5 6 7 8 9 10 11 12
In some situations it is difficult to derive the exact predictive distribution
To solve this issue, we simulate values from the predictive distribution
Step 1: Simulate draws of the parameter (in this case the proportion \(p\)) from its posterior distribution
Step 2: Simulate values of the future observation \(\widetilde{s}\) from the sampling density (here the Binomial distribution), using the simulated \(p\)'s from Step 1
For example, let us generate 1000 proportions (pred_p_sim) from the posterior distribution
Then input these simulated proportions into the Binomial distribution with \(m = 20\) to simulate future y-values
Finally, average the simulated y-values to estimate the expected value of \(\widetilde{Y}\)
pred_p_sim <- rbeta(1000, 14.26, 23.19)     # Step 1: draws of p from the posterior
pred_y_sim <- rbinom(1000, 20, pred_p_sim)  # Step 2: future counts for m = 20
print(mean(pred_y_sim))
## [1] 7.676
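As a check, the relative frequencies of the simulated \(\widetilde{Y}\) values should be close to the exact Beta-Binomial probabilities in pred.prob.dist computed earlier:
freq <- table(factor(pred_y_sim, levels = 0:20)) / 1000
round(cbind(simulated = as.numeric(freq), exact = pred.prob.dist), 3)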
Suppose another researcher (B) is more pessimistic about the likelihood of students having at least 8 hours of sleep per night
This researcher (B) believes that the prior median of the proportion p is 0.2 and the \(90^{th}\) percentile is 0.4
beta.select(list(x = 0.20, p = 0.5),
            list(x = 0.40, p = 0.90))
## [1] 2.07 7.32
Based on the above code chunk, we have \(a = 2.07\) and \(b = 7.32\); that is, the prior is \(Beta(2.07, 7.32)\)
With a Binomial(20, \(p\)) likelihood for observed \(Y = 12\) successes (so \(f = 8\) failures), the posterior is \(Beta(2.07 + 12, 7.32 + 8) = Beta(14.07, 15.32)\)
The mean of this posterior distribution is approximately 0.48
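This is just the Beta mean formula again:
14.07 / (14.07 + 15.32)   # = 0.4787, i.e., approximately 0.48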
In fact, if we do prediction with \(m = 20\) using this posterior distribution, we also get a different result
pred_p_sim1 <- rbeta(1000, 14.07, 15.32)      # posterior draws under researcher B's prior
pred_y_sim1 <- rbinom(1000, 20, pred_p_sim1)
mean(pred_y_sim1)
## [1] 9.513
These results show that the prior distribution makes a very important contribution to the posterior distribution
So, which of the two priors is credible?
We compute the so-called Bayes factor, which is the ratio of the predictive densities of the two models
The function binomial.beta.mix() is used to find the Bayes factor
One inputs the prior probabilities of the two models (priors), and the vectors of Beta shape parameters that define the two priors
probs <- c(0.5, 0.5)           # assumes that the 2 priors are equally plausible
beta_prior1 <- c(3.26, 7.19)   # researcher A's prior
beta_prior2 <- c(2.07, 7.32)   # researcher B's prior
beta_par <- rbind(beta_prior1, beta_prior2)
output <- binomial.beta.mix(probs, beta_par, c(12, 8))   # data: s = 12, f = 8
posterior_odds <- output$probs[1] / output$probs[2]
print(posterior_odds)
## beta_prior1
## 2.536484
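Since the two priors were given equal prior probabilities, the posterior odds equal the Bayes factor: a value of about 2.5 means the observed data favor researcher A's prior. As a sketch, the same quantity can be computed by hand as the ratio of the prior predictive (Beta-Binomial) densities of the data under the two priors; pred_dens() below is a hypothetical helper, not a ProbBayes function
# prior predictive density of s successes in n trials under a Beta(a, b) prior
pred_dens <- function(s, n, a, b) choose(n, s) * beta(a + s, b + n - s) / beta(a, b)
pred_dens(12, 20, 3.26, 7.19) / pred_dens(12, 20, 2.07, 7.32)   # should match the posterior odds above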
Although the prior predictive distribution is useful in model checking, it has some disadvantages
For example, the prior predictive distribution may not exist in the situation where the prior is not a proper probability distribution
A related issue is that the assigned prior may not accurately reflect one's prior beliefs about the parameter
An alternative method of checking the suitability of a Bayesian model is based on the posterior predictive distribution.
We compute the posterior predictive distribution of a replicated dataset, that is, a dataset of the same sample size as our observed sample
We then check if the observed value of \(s\) is in the middle of this predictive distribution
If so, the observed sample is consistent with predictions of replicated data; otherwise, some model misspecification may exist
The process is the same simulation used for prediction above: we simulate values of \(p\) from the posterior distribution and use these simulated proportions to simulate \(s\) values from the Binomial distribution
sim.p <- rbeta(1000, 14.26, 23.19)   # simulate p from the posterior
sim.y <- rbinom(1000, 20, sim.p)     # simulate replicated y values
hist(sim.y, xlab = "Simulated Y", main = " ")
abline(v = 7.668, col = "red", lwd = 3, lty = 2)