Let \(R\vert \theta \sim \text{Binomial}(n,\theta)\), where \(n\geq 1\) is the number of trials and \(\theta \in (0,1)\) is the probability of success.
As already discussed in [1], the likelihood function of the Binomial model is:
\[ \Pr(R=r \vert \theta) = \binom{n}{r} \theta^r (1-\theta)^{n-r}. \]
It is easy to check that the maximum likelihood estimator (MLE) is \(\widehat{\theta}=\dfrac{r}{n}\), and that the estimated asymptotic variance of the MLE is \(\dfrac{\widehat{\theta}(1-\widehat{\theta})}{n}=\dfrac{r(n-r)}{n^3}\).
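As a quick numerical sanity check (a minimal sketch; the values of \(n\) and \(r\) below are arbitrary illustrative choices), we can maximise the log-likelihood with optimize() and recover \(r/n\):
# Numerical check of the MLE: maximise the Binomial log-likelihood
# (n and r are arbitrary illustrative values)
n <- 10; r <- 3
loglik <- function(theta) dbinom(r, size = n, prob = theta, log = TRUE)
optimize(loglik, interval = c(0, 1), maximum = TRUE)$maximum  # ~ 0.3
r/n                                                           # 0.3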
Assume now a Beta prior for \(\theta\), that is, \(\theta \sim \text{Beta}(a,b)\) with \(a,b>0\). By Bayes' theorem, the posterior density satisfies
\[ \pi(\theta \vert r) \propto \theta^r (1-\theta)^{n-r} \times \theta^{a-1}(1-\theta)^{b-1} = \theta^{a+r-1}(1-\theta)^{b+n-r-1}. \]
Thus, we can identify the posterior as a Beta distribution with updated parameters \(a' = a+r\) and \(b' = b+n-r\). That is, the posterior distribution belongs to the same family as the prior, which means that the Beta prior is a conjugate prior for \(\theta\) in the Binomial sampling model.
The posterior mean is given by:
\[ \dfrac{a+r}{a+b+n}. \]
The posterior mode (valid for \(a+r>1\) and \(b+n-r>1\)) is given by:
\[ \dfrac{a+r-1}{a+b+n-2}. \]
The posterior variance is given by:
\[ \dfrac{(a+r)(b+n-r)}{(a+b+n)^2(a+b+n+1)}. \]
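These closed-form summaries are easy to verify by simulation. The sketch below, with arbitrary illustrative values of \(a\), \(b\), \(n\) and \(r\), compares them with Monte Carlo estimates based on draws from the posterior:
# Monte Carlo check of the posterior mean and variance (illustrative values)
a <- 2; b <- 2; n <- 20; r <- 7
draws <- rbeta(1e6, a + r, b + n - r)
c(closed_form = (a + r)/(a + b + n), monte_carlo = mean(draws))
c(closed_form = (a + r)*(b + n - r)/((a + b + n)^2*(a + b + n + 1)),
  monte_carlo = var(draws))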
When there is little prior information about the probability of success, \(\theta\), two non-informative priors have been considered (see [3]):
(i) The Beta prior with \(a=b=1\), that is, a uniform prior on \(\theta\).
(ii) The Beta prior with \(a=b=0.5\). This is a \(U\)-shaped prior that arises naturally as the Jeffreys prior (see [2] and the short derivation below).
The uniform prior (i) produces a posterior distribution whose maximum a posteriori estimate coincides with the MLE. Moreover, it is often interpreted as a non-informative prior since it assigns equal probability to any pair of intervals in \((0,1)\) of the same length. In contrast, the Jeffreys prior (ii) assigns more mass to regions near the endpoints \(0\) and \(1\), but it is justified as a non-informative prior because it is invariant under reparameterisations and (in this case) it coincides with the reference prior. Thus, although both priors are used as non-informative priors, their interpretation and justification differ.
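For completeness, here is a brief sketch of why the Jeffreys prior takes this form in the Binomial model. The Fisher information is \(I(\theta)=\dfrac{n}{\theta(1-\theta)}\), and the Jeffreys prior is proportional to its square root:
\[ \pi_J(\theta) \propto \sqrt{I(\theta)} = \sqrt{\dfrac{n}{\theta(1-\theta)}} \propto \theta^{-1/2}(1-\theta)^{-1/2}, \]
which is the kernel of a Beta(0.5, 0.5) density.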
# a = 1, b = 1 (uniform prior)
pi1 <- Vectorize(function(theta) dbeta(theta, 1, 1))
# a = 0.5, b = 0.5 (Jeffreys prior)
pi2 <- Vectorize(function(theta) dbeta(theta, 0.5, 0.5))
# Plot both prior densities on (0,1)
curve(pi1, xlab = ~theta, ylab = "Density", main = "Non-informative Beta priors",
      lwd = 2, col = "blue")
curve(pi2, lwd = 2, col = "red", add = TRUE)
legend(0.35, 1.3, c("Uniform", "Jeffreys"), col = c("blue", "red"),
       text.col = "black", lty = c(1, 1), lwd = c(2, 2),
       merge = TRUE, bg = "gray90", cex = 1)
The following R code compares the posterior means and variances obtained with the two non-informative priors (uniform and Jeffreys) with the MLE and its asymptotic variance, for increasing sample sizes. Overall, the inference produced with the Jeffreys prior is slightly closer to the frequentist inference.
# Required packages
library(knitr)
library(TeachingDemos)
#----------------------------------------------------------------------------
# Mean, Mode, and Variance of the Beta distribution
#----------------------------------------------------------------------------
# a, b : shape parameters
MeanBeta <- function(a,b) a/(a+b)
ModeBeta <- function(a, b){
  # The mode (a-1)/(a+b-2) is an interior point only when a > 1 and b > 1;
  # otherwise return NA
  m <- ifelse(a > 1 & b > 1, (a - 1)/(a + b - 2), NA)
  return(m)
}
VarianceBeta <- function(a,b) (a*b)/((a+b)^2*(a+b+1))
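As a quick check of ModeBeta (with arbitrary illustrative shape parameters), the closed-form mode can be compared with a numerical maximisation of dbeta:
# Check: ModeBeta agrees with a numerical maximisation of the Beta density
a <- 3; b <- 5
c(ModeBeta(a, b),
  optimize(function(x) dbeta(x, a, b), interval = c(0, 1), maximum = TRUE)$maximum)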
################################################################################
# n = 10, r = 1
################################################################################
r <- 1
n <- 10
Estimators <- c(r/n, MeanBeta(1+r,n-r+1), MeanBeta(0.5+r,n-r+0.5))
Variances <- c(r*(n-r)/(n^3),VarianceBeta(1+r,n-r+1),VarianceBeta(0.5+r,n-r+0.5))
tab <- rbind(Estimators,Variances)
colnames(tab) <- c("Frequentist","Uniform", "Jeffreys")
kable(tab)
|            | Frequentist | Uniform   | Jeffreys  |
|------------|-------------|-----------|-----------|
| Estimators | 0.100       | 0.1666667 | 0.1363636 |
| Variances  | 0.009       | 0.0106838 | 0.0098140 |
# HPD interval for Uniform prior
emp.hpd(rbeta(1000000,r+1,n-r+1))
## [1] 0.006562546 0.367800338
# HPD interval for Jeffreys prior
emp.hpd(rbeta(1000000,r+0.5,n-r+0.5))
## [1] 0.0001673427 0.3301377564
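Since the posterior quantile function is available in closed form, the simulation step can be avoided. As a sketch, assuming the installed version of TeachingDemos exports it, the hpd() function computes the interval directly from qbeta and should agree with the simulation-based values above up to Monte Carlo error:
# HPD intervals computed from the posterior quantile function
hpd(qbeta, shape1 = r + 1, shape2 = n - r + 1)      # Uniform prior
hpd(qbeta, shape1 = r + 0.5, shape2 = n - r + 0.5)  # Jeffreys prior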
################################################################################
# n = 100, r = 10
################################################################################
r <- 10
n <- 100
Estimators <- c(r/n, MeanBeta(1+r,n-r+1), MeanBeta(0.5+r,n-r+0.5))
Variances <- c(r*(n-r)/(n^3),VarianceBeta(1+r,n-r+1),VarianceBeta(0.5+r,n-r+0.5))
tab <- rbind(Estimators,Variances)
colnames(tab) <- c("Frequentist","Uniform", "Jeffreys")
kable(tab)
|            | Frequentist | Uniform   | Jeffreys  |
|------------|-------------|-----------|-----------|
| Estimators | 1e-01       | 0.1078431 | 0.1039604 |
| Variances  | 9e-04       | 0.0009341 | 0.0009133 |
# HPD interval for Uniform prior
emp.hpd(rbeta(1000000,r+1,n-r+1))
## [1] 0.0514756 0.1684563
# HPD interval for Jeffreys prior
emp.hpd(rbeta(1000000,r+0.5,n-r+0.5))
## [1] 0.04802647 0.16397925
################################################################################
# n = 1000, r = 100
################################################################################
r <- 100
n <- 1000
Estimators <- c(r/n, MeanBeta(1+r,n-r+1), MeanBeta(0.5+r,n-r+0.5))
Variances <- c(r*(n-r)/(n^3),VarianceBeta(1+r,n-r+1),VarianceBeta(0.5+r,n-r+0.5))
tab <- rbind(Estimators,Variances)
colnames(tab) <- c("Frequentist","Uniform", "Jeffreys")
kable(tab)
|            | Frequentist | Uniform   | Jeffreys  |
|------------|-------------|-----------|-----------|
| Estimators | 1e-01       | 0.1007984 | 0.1003996 |
| Variances  | 9e-05       | 0.0000904 | 0.0000901 |
# HPD interval for Uniform prior
emp.hpd(rbeta(1000000,r+1,n-r+1))
## [1] 0.08219253 0.11931255
# HPD interval for Jeffreys prior
emp.hpd(rbeta(1000000,r+0.5,n-r+0.5))
## [1] 0.08211941 0.11923611
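The three blocks above repeat the same computation. As a closing sketch, a small helper (the name compare_priors is ours, not from the original analysis) collects the whole comparison for any pair \((n, r)\):
# Hypothetical helper: posterior summaries under both non-informative priors
compare_priors <- function(n, r){
  Estimators <- c(r/n, MeanBeta(1 + r, n - r + 1), MeanBeta(0.5 + r, n - r + 0.5))
  Variances  <- c(r*(n - r)/n^3, VarianceBeta(1 + r, n - r + 1),
                  VarianceBeta(0.5 + r, n - r + 0.5))
  tab <- rbind(Estimators, Variances)
  colnames(tab) <- c("Frequentist", "Uniform", "Jeffreys")
  tab
}
kable(compare_priors(1000, 100))  # reproduces the last table above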