Frequentist vs non-informative Bayesian inference in the Binomial model

Let \(R\vert \theta \sim Binomial(n,\theta)\), where \(n\geq 1\) is the number of trials and \(\theta \in (0,1)\) is the probability of success.

Frequentist analysis

As already discussed in [1], the likelihood function for the Binomial model is:

\[\begin{eqnarray*} \Pr(R=r \vert \theta) = \binom{n}{r} \theta^r (1-\theta)^{n-r}. \end{eqnarray*}\]

It is easy to check that the maximum likelihood estimator (MLE) is \(\widehat{\theta}=\dfrac{r}{n}\), and that plugging \(\widehat{\theta}\) into the asymptotic variance of the MLE, \(\theta(1-\theta)/n\), gives the estimate \(\dfrac{r(n-r)}{n^3}\).
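As a quick sanity check, the following sketch maximises the Binomial log-likelihood numerically and compares the result with the closed-form MLE; the values n = 10 and r = 3 are purely illustrative.

# Numerical check of the MLE (illustrative values)
n <- 10; r <- 3
loglik <- function(theta) dbinom(r, size = n, prob = theta, log = TRUE)
# optimise() maximises the log-likelihood over (0, 1)
optimise(loglik, interval = c(0, 1), maximum = TRUE)$maximum  # ~ 0.3 = r/n
r*(n - r)/n^3  # plug-in estimate of the asymptotic variance theta*(1-theta)/n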

Bayesian analysis

Consider now the prior distribution \(\theta \sim Beta(a,b)\), with density \[\pi_\Theta(\theta\vert a, b) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}.\] Multiplying the likelihood by this prior, the posterior distribution is proportional to: \[\begin{eqnarray*} \pi_{\Theta}(\theta \vert R=r) \propto \theta^{r+a-1} (1-\theta)^{n-r+b-1}. \end{eqnarray*}\]

We can identify this as the kernel of a Beta distribution with updated parameters \(a' = r+a\) and \(b' = n-r+b\). That is, the posterior distribution is again a Beta distribution, so the Beta prior is a conjugate prior for \(\theta\) in the Binomial sampling model, as discussed in [1].
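A minimal numerical illustration of this conjugacy, assuming the arbitrary values a = 2, b = 3, n = 10 and r = 4: normalising the product of likelihood and prior on a grid recovers the \(Beta(r+a, n-r+b)\) density.

# Grid check of conjugacy (illustrative values)
a <- 2; b <- 3; n <- 10; r <- 4
theta <- seq(0.001, 0.999, by = 0.001)
unnorm <- dbinom(r, n, theta)*dbeta(theta, a, b)  # likelihood x prior
post <- unnorm/sum(unnorm*0.001)                  # normalise numerically
max(abs(post - dbeta(theta, r + a, n - r + b)))   # ~ 0, up to grid error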

The posterior mean is given by:

\[ \dfrac{a+r}{n+a+b}.\]

The posterior mode (defined for \(a+r>1\) and \(b+n-r>1\)) is given by:

\[ \dfrac{a+r-1}{n+a+b-2}.\]

The posterior variance is given by:

\[\dfrac{(a+r)(b+n-r)}{(a+b+n)^2(a+b+n+1)}.\]
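These closed-form summaries are easy to check by Monte Carlo; the following sketch uses the illustrative values a = b = 1, n = 10 and r = 3.

# Monte Carlo check of the posterior mean and variance (illustrative values)
a <- 1; b <- 1; n <- 10; r <- 3
draws <- rbeta(1e6, a + r, b + n - r)
c(mean(draws), (a + r)/(n + a + b))  # both ~ 0.333
c(var(draws), (a + r)*(b + n - r)/((a + b + n)^2*(a + b + n + 1)))  # both ~ 0.0171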

Non-informative priors

When there is little prior information about the probability of success, \(\theta\), two non-informative priors have been considered (see [3]):

  1. The beta prior with \(a=b=1\). That is, a uniform prior on \(\theta\).

  2. The beta prior with \(a=b=0.5\). This is a \(U\)-shaped prior that arises naturally as the Jeffreys prior (see [2]; a short derivation is given after this list).
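Recall that the Jeffreys prior is proportional to the square root of the Fisher information, which for the Binomial model is \(I(\theta) = n/\{\theta(1-\theta)\}\). Hence

\[ \pi_J(\theta) \propto \sqrt{I(\theta)} = \sqrt{\frac{n}{\theta(1-\theta)}} \propto \theta^{-1/2}(1-\theta)^{-1/2}, \]

which is the kernel of the \(Beta(0.5,0.5)\) density.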

The uniform prior (1) produces a posterior distribution whose maximum a posteriori estimate coincides with the MLE: with \(a=b=1\), the posterior is \(Beta(r+1,n-r+1)\), whose mode is \(r/n\). Moreover, it is often interpreted as a non-informative prior since it assigns equal probability to any pair of intervals in \((0,1)\) of the same length. In contrast, the Jeffreys prior (2) assigns more mass to regions near the end points \(0\) and \(1\), but it is justified as a non-informative prior because it is invariant under reparameterisation and (in this case) coincides with the reference prior. Thus, although both priors are used as non-informative priors, their interpretations and justifications are different.

# a = 1, b = 1 (uniform prior); dbeta is already vectorised, so Vectorize is not needed
pi1 <- function(theta) dbeta(theta, 1, 1)
# a = 0.5, b = 0.5 (Jeffreys prior)
pi2 <- function(theta) dbeta(theta, 0.5, 0.5)

# Fix the vertical range so the U-shaped Jeffreys density is not clipped
curve(pi1, xlab = ~theta, ylab = "Density", main = "Non-informative Beta priors",
      lwd = 2, col = "blue", ylim = c(0, 3))
curve(pi2, lwd = 2, col = "red", add = TRUE)
legend(0.35, 1.3, c("Uniform", "Jeffreys"), col = c("blue", "red"),
       text.col = "black", lty = c(1, 1), lwd = c(2, 2),
       merge = TRUE, bg = "gray90", cex = 1)

The following R code compares the posterior means and variances obtained with the two non-informative priors (uniform and Jeffreys) against the MLE and its estimated asymptotic variance, and reports 95% HPD credible intervals computed by Monte Carlo, for increasing sample sizes with a fixed observed proportion \(r/n = 0.1\). Overall, the inference produced with the Jeffreys prior is slightly closer to the frequentist inference.

References

  1. The Beta-Binomial model.

  2. Jeffreys prior.

  3. The Counter-intuitive Non-informative Prior for the Bernoulli Family.

# Required packages
library(knitr)          # kable() for formatted tables
library(TeachingDemos)  # emp.hpd() for empirical HPD intervals

#----------------------------------------------------------------------------
# Mean, Mode, and Variance of the Beta distribution
#----------------------------------------------------------------------------
# a, b : shape parameters
MeanBeta <- function(a, b) a/(a + b)
# The mode of a Beta(a, b) is only defined in the interior when a > 1 and b > 1
ModeBeta <- function(a, b){
  m <- ifelse(a > 1 & b > 1, (a - 1)/(a + b - 2), NA)
  return(m)
}
VarianceBeta <- function(a, b) (a*b)/((a + b)^2*(a + b + 1))
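As a quick consistency check of these helpers against the posterior-summary formulas above: under the uniform prior, the posterior mode returned by ModeBeta coincides with the MLE (here with the illustrative values n = 10, r = 1).

# MAP under the uniform prior equals the MLE r/n
ModeBeta(1 + 1, 10 - 1 + 1)  # 0.1 = r/n for n = 10, r = 1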

################################################################################
# n = 10, r = 1
################################################################################
r <- 1
n <- 10

Estimators <- c(r/n, MeanBeta(1+r,n-r+1), MeanBeta(0.5+r,n-r+0.5))
Variances <- c(r*(n-r)/(n^3),VarianceBeta(1+r,n-r+1),VarianceBeta(0.5+r,n-r+0.5))
tab <- rbind(Estimators,Variances)
colnames(tab) <- c("Frequentist","Uniform", "Jeffreys")
kable(tab)
|           | Frequentist|   Uniform|  Jeffreys|
|:----------|-----------:|---------:|---------:|
|Estimators |       0.100| 0.1666667| 0.1363636|
|Variances  |       0.009| 0.0106838| 0.0098140|
# 95% HPD interval under the uniform prior (Monte Carlo, emp.hpd from TeachingDemos)
emp.hpd(rbeta(1000000, r + 1, n - r + 1))
## [1] 0.006562546 0.367800338
# 95% HPD interval under the Jeffreys prior
emp.hpd(rbeta(1000000, r + 0.5, n - r + 0.5))
## [1] 0.0001673427 0.3301377564
################################################################################
# n = 100, r = 10
################################################################################
r <- 10
n <- 100

Estimators <- c(r/n, MeanBeta(1+r,n-r+1), MeanBeta(0.5+r,n-r+0.5))
Variances <- c(r*(n-r)/(n^3),VarianceBeta(1+r,n-r+1),VarianceBeta(0.5+r,n-r+0.5))
tab <- rbind(Estimators,Variances)
colnames(tab) <- c("Frequentist","Uniform", "Jeffreys")
kable(tab)
|           | Frequentist|   Uniform|  Jeffreys|
|:----------|-----------:|---------:|---------:|
|Estimators |       1e-01| 0.1078431| 0.1039604|
|Variances  |       9e-04| 0.0009341| 0.0009133|
# 95% HPD interval under the uniform prior
emp.hpd(rbeta(1000000, r + 1, n - r + 1))
## [1] 0.0514756 0.1684563
# 95% HPD interval under the Jeffreys prior
emp.hpd(rbeta(1000000, r + 0.5, n - r + 0.5))
## [1] 0.04802647 0.16397925
################################################################################
# n = 1000, r = 100
################################################################################
r <- 100
n <- 1000

Estimators <- c(r/n, MeanBeta(1+r,n-r+1), MeanBeta(0.5+r,n-r+0.5))
Variances <- c(r*(n-r)/(n^3),VarianceBeta(1+r,n-r+1),VarianceBeta(0.5+r,n-r+0.5))
tab <- rbind(Estimators,Variances)
colnames(tab) <- c("Frequentist","Uniform", "Jeffreys")
kable(tab)
|           | Frequentist|   Uniform|  Jeffreys|
|:----------|-----------:|---------:|---------:|
|Estimators |       1e-01| 0.1007984| 0.1003996|
|Variances  |       9e-05| 0.0000904| 0.0000901|
# 95% HPD interval under the uniform prior
emp.hpd(rbeta(1000000, r + 1, n - r + 1))
## [1] 0.08219253 0.11931255
# 95% HPD interval under the Jeffreys prior
emp.hpd(rbeta(1000000, r + 0.5, n - r + 0.5))
## [1] 0.08211941 0.11923611