The Beta distribution

For \(\theta \in (0,1)\), the Beta-distribution, \(\mbox{Beta}(a,b)\), \(a,b>0\), is defined as \[\pi_\Theta(\theta\vert a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \theta^{a-1}(1-\theta)^{b-1} = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)},\] where \[B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.\] Some properties of the beta distribution:

Mean: \(\dfrac{a}{a+b}\)

Mode: \(\dfrac{a-1}{a+b-2}\) for \(a,b>1\)

Variance: \(\dfrac{ab}{(a+b)^2(a+b+1)}\)

# a = 1, b = 1
pi <- Vectorize(function(theta)  dbeta(theta,1,1))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=1, b=1",lwd=2)

# a = 0.3, b = 1
pi <- Vectorize(function(theta)  dbeta(theta,0.3,1))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=1", lwd=2)

# a = 0.3, b = 0.3
pi <- Vectorize(function(theta)  dbeta(theta,0.3,0.3))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=0.3", lwd=2)

# a = 2, b = 2
pi <- Vectorize(function(theta)  dbeta(theta,2,2))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=2", lwd=2)

# a = 8, b = 2
pi <- Vectorize(function(theta)  dbeta(theta,8,2))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=8, b=2", lwd=2)

# a = 2, b = 8
pi <- Vectorize(function(theta)  dbeta(theta,2,8))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=8", lwd=2)

# a = 0.5, b = 0.5
pi <- Vectorize(function(theta)  dbeta(theta,0.5,0.5))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=0.5, b=0.5", lwd=2)

# a = 0.005, b = 0.005
pi <- Vectorize(function(theta)  dbeta(theta,0.005,0.005))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=0.005, b=0.005", lwd=2)

# Mean, Mode, and Variance of the Beta distribution
MeanBeta <- function(a,b) a/(a+b)
ModeBeta <- function(a,b){
m <- ifelse(a>1 & b>1, (a-1)/(a+b-2), NA)
return(m)
}
VarianceBeta <- function(a,b) (a*b)/((a+b)^2*(a+b+1))

library(knitr)
as <- c(1,0.3,0.3,2,8,2,0.5,0.005)
bs <- c(1,1,0.3,2,2,8,0.5,0.005)
PriorProp <- cbind(1:8,MeanBeta(as,bs),ModeBeta(as,bs),qbeta(0.5,as,bs),VarianceBeta(as,bs))
colnames(PriorProp) <- c("Prior","Mean","Mode","Median","Variance")  

print(kable(PriorProp,digits=4))

## 
## 
##  Prior     Mean    Mode   Median   Variance
## ------  -------  ------  -------  ---------
##      1   0.5000      NA   0.5000     0.0833
##      2   0.2308      NA   0.0992     0.0772
##      3   0.5000      NA   0.5000     0.1562
##      4   0.5000   0.500   0.5000     0.0500
##      5   0.8000   0.875   0.8204     0.0145
##      6   0.2000   0.125   0.1796     0.0145
##      7   0.5000      NA   0.5000     0.1250
##      8   0.5000      NA   0.5000     0.2475

The Gamma function

The Gamma function \(\Gamma(a)\) for \(a>0\) is defined as:

\[\Gamma(a) = \int_0^\infty x^{a-1} e^{-x} \, dx.\] When \(a\) takes integer values, \(\Gamma(a)\) is the factorial of \((a-1)\), \((a-1)!\).

# Evaluating the Gamma function
gamma(1)

## [1] 1

gamma(1.5)

## [1] 0.8862269

gamma(2)

## [1] 1

gamma(3)

## [1] 2

The Binomial distribution

The binomial distribution is the discrete probability distribution with parameters \(n\) and \(\theta\). \(n=1,2,...\), \(0\leq \theta \leq 1\). This distribution is used to model the number of successes \(r\) in a sequence of \(n\) independent trials with probability of success \(\theta\).

The probability mass function, for \(n\) fixed, is given by: \[\begin{eqnarray*} \Pr(r\vert\theta) = \begin{pmatrix} n\\r \end{pmatrix}\theta^r(1-\theta)^{n-r}, \end{eqnarray*}\]

where \(\begin{pmatrix} n\\r \end{pmatrix} = \dfrac{n!}{r!(n-r)!}\).

For instance, we can model the probability of observing \(r\) tails after tossing \(n\) times a fair coin.

Posterior distribution for a Binomial model with Beta prior

Suppose that \(R\mid \theta \sim Binomial(n,\theta)\), with \(n\) fixed. What is the posterior probability of \(\theta\)?

The likelihood function is (Binomial model):

\[\begin{eqnarray*} \Pr(R=r \vert \theta) = \begin{pmatrix} n\\r \end{pmatrix} \theta^r (1-\theta)^{n-r}. \end{eqnarray*}\] Consider now the prior distribution \(\theta \sim Beta(a,b)\). This is \[\pi_\Theta(\theta\vert a, b) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}.\] Then, the posterior distribution is proportional to: \[\begin{eqnarray*} \pi_{\Theta}(\theta \vert R=r) \propto \theta^{r+a-1} (1-\theta)^{n-r+b-1} \end{eqnarray*}\]

Thus, we can identify this as the Beta distribution with new parameters \(a' = r+a\) and \(b' = n-r+b\). That is, the posterior distribution is again a Beta distribution (same as the prior). This means that that Beta prior is a Conjugate Prior for \(\theta\) in the Binomial sampling model.

Example 1: small sample

Suppose that you toss a coin \(n=10\) times, and that you observe \(r=4\) tails. What is the posterior probability of \(\theta\)?

As we found in the previous section, the posterior distribution is a Beta distribution with parameters \(a' = 4+a\) and \(b' = 6+b\). The following R code illustrates the shapes of the posterior distribution for the Beta priors presented in the previous sections. This example illustrates the influence of the prior in small samples.

# a = 1, b = 1
pi <- Vectorize(function(theta)  dbeta(theta,1,1))
pi1 <- Vectorize(function(theta)  dbeta(theta,4+1,6+1))
curve(pi1, xlab=~theta, ylab="Density", main="Beta prior: a=1, b=1",lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 2.55, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 0.3, b = 1
pi <- Vectorize(function(theta)  dbeta(theta,0.3,1))
pi2 <- Vectorize(function(theta)  dbeta(theta,4+0.3,6+1))
curve(pi2, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=1", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 2.55, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 0.3, b = 0.3
pi <- Vectorize(function(theta)  dbeta(theta,0.3,0.3))
pi3 <- Vectorize(function(theta)  dbeta(theta,4+0.3,6+0.3))
curve(pi3, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=0.3", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.65, 2.2, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 2, b = 2
pi <- Vectorize(function(theta)  dbeta(theta,2,2))
pi4 <- Vectorize(function(theta)  dbeta(theta,4+2,6+2))
curve(pi4, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=2", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 2.55, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 8, b = 2
pi <- Vectorize(function(theta)  dbeta(theta,8,2))
pi5 <- Vectorize(function(theta)  dbeta(theta,4+8,6+2))
curve(pi5, xlab=~theta, ylab="Density", main="Beta prior: a=8, b=2", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.05, 3, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 2, b = 8
pi <- Vectorize(function(theta)  dbeta(theta,2,8))
pi6 <- Vectorize(function(theta)  dbeta(theta,4+2,6+8))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=8", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 3.5, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 0.5, b = 0.5
pi <- Vectorize(function(theta)  dbeta(theta,0.5,0.5))
pi7 <- Vectorize(function(theta)  dbeta(theta,4+0.5,6+0.5))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=0.5, b=0.5", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 3.5, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 0.005, b = 0.005
pi <- Vectorize(function(theta)  dbeta(theta,0.005,0.005))
pi8 <- Vectorize(function(theta)  dbeta(theta,4+0.005,6+0.005))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=0.005, b=0.005", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 3.5, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# All the posteriors together
curve(pi1, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, ylim=c(0,4.5),col=1,n=10000)
curve(pi2, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=2,n=10000)
curve(pi3, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=3,n=10000)
curve(pi4, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=4,n=10000)
curve(pi5, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=5,n=10000)
curve(pi6, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=6,n=10000)
curve(pi7, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=7,n=10000)
curve(pi8, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=8,n=10000)

# Properties of the posteriors
as <- 4 + c(1,0.3,0.3,2,8,2,0.5,0.005)
bs <- 6 + c(1,1,0.3,2,2,8,0.5,0.005)
PostProp1 <- cbind(1:8,MeanBeta(as,bs),ModeBeta(as,bs),qbeta(0.5,as,bs),VarianceBeta(as,bs))
colnames(PostProp1) <- c("Prior","Mean","Mode","Median","Variance")  

print(kable(PostProp1,digits=4))

## 
## 
##  Prior     Mean     Mode   Median   Variance
## ------  -------  -------  -------  ---------
##      1   0.4167   0.4000   0.4119     0.0187
##      2   0.3805   0.3548   0.3733     0.0192
##      3   0.4057   0.3837   0.3995     0.0208
##      4   0.4286   0.4167   0.4251     0.0163
##      5   0.6000   0.6111   0.6034     0.0114
##      6   0.3000   0.2778   0.2932     0.0100
##      7   0.4091   0.3889   0.4034     0.0201
##      8   0.4001   0.3752   0.3932     0.0218

Example 2: large sample

Suppose that you toss a coin \(n=1000\) times, and that you observe \(r=400\) tails. What is the posterior probability of \(\theta\)?

As we found in the previous section, the posterior distribution is a Beta distribution with parameters \(a' = 400 +a\) and \(b' = 600 +b\). The following R code illustrates the shapes of the posterior distribution for the Beta priors presented in the previous sections. This example illustrates the little influence of the prior in large samples.

# a = 1, b = 1
pi <- Vectorize(function(theta)  dbeta(theta,1,1))
pi1 <- Vectorize(function(theta)  dbeta(theta,400+1,600+1))
curve(pi1, xlab=~theta, ylab="Density", main="Beta prior: a=1, b=1",lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 0.3, b = 1
pi <- Vectorize(function(theta)  dbeta(theta,0.3,1))
pi2 <- Vectorize(function(theta)  dbeta(theta,400+0.3,600+1))
curve(pi2, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=1", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 0.3, b = 0.3
pi <- Vectorize(function(theta)  dbeta(theta,0.3,0.3))
pi3 <- Vectorize(function(theta)  dbeta(theta,400+0.3,600+0.3))
curve(pi3, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=0.3", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 2, b = 2
pi <- Vectorize(function(theta)  dbeta(theta,2,2))
pi4 <- Vectorize(function(theta)  dbeta(theta,400+2,600+2))
curve(pi4, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=2", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 8, b = 2
pi <- Vectorize(function(theta)  dbeta(theta,8,2))
pi5 <- Vectorize(function(theta)  dbeta(theta,400+8,600+2))
curve(pi5, xlab=~theta, ylab="Density", main="Beta prior: a=8, b=2", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 2, b = 8
pi <- Vectorize(function(theta)  dbeta(theta,2,8))
pi6 <- Vectorize(function(theta)  dbeta(theta,400+2,600+8))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=8", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 0.5, b = 0.5
pi <- Vectorize(function(theta)  dbeta(theta,0.5,0.5))
pi7 <- Vectorize(function(theta)  dbeta(theta,400+0.5,600+0.5))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=0.5, b=0.5", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# a = 0.005, b = 0.005
pi <- Vectorize(function(theta)  dbeta(theta,0.005,0.005))
pi8 <- Vectorize(function(theta)  dbeta(theta,400+0.005,600+0.005))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=0.005, b=0.005", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
       text.col = "black", lty = c(1, 2), lwd = c(2,1),
       merge = TRUE, bg = "gray90",cex=1)

# All the posteriors together
curve(pi1, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, ylim=c(0,26),col=1,n=10000)
curve(pi2, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=2,n=10000)
curve(pi3, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=3,n=10000)
curve(pi4, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=4,n=10000)
curve(pi5, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=5,n=10000)
curve(pi6, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=6,n=10000)
curve(pi7, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=7,n=10000)
curve(pi8, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=8,n=10000)

# Properties of the Posteriors
as <- 400 + c(1,0.3,0.3,2,8,2,0.5,0.005)
bs <- 600 + c(1,1,0.3,2,2,8,0.5,0.005)
PostProp1 <- cbind(1:8,MeanBeta(as,bs),ModeBeta(as,bs),qbeta(0.5,as,bs),VarianceBeta(as,bs))
colnames(PostProp1) <- c("Prior","Mean","Mode","Median","Variance")  

print(kable(PostProp1,digits=4))

## 
## 
##  Prior     Mean     Mode   Median   Variance
## ------  -------  -------  -------  ---------
##      1   0.4002   0.4000   0.4001      2e-04
##      2   0.3998   0.3996   0.3997      2e-04
##      3   0.4001   0.3999   0.4000      2e-04
##      4   0.4004   0.4002   0.4003      2e-04
##      5   0.4040   0.4038   0.4039      2e-04
##      6   0.3980   0.3978   0.3980      2e-04
##      7   0.4001   0.3999   0.4000      2e-04
##      8   0.4000   0.3998   0.3999      2e-04

Credible Intervals

Quantile credible intervals can be easily obtained by using the R command qbeta(). For instance, if we want to construct a credible interval of \(95\%\), we need to calculate the \(2.5\%\) and \(97.5\%\) quantiles of the posterior distribution.

Suppose that you toss a coin \(n=5\) times, and that you observe \(r=3\) tails. Then, the posterior probability of \(\theta\) is a \(\mbox{Beta}(3.5,2.5)\). The following code shows how to calculate the \(95\%\) credible intervals in this case, as well as the posterior mean, posterior variance, and posterior standard deviation.

# Credible intervals

L <- qbeta(0.025,3.5,2.5) # lower limit
U <- qbeta(0.975,3.5,2.5) # upper limit

# 95% Credible Interval
print(c(L,U))

## [1] 0.2094167 0.9056097

# Posterior distribution
post <- Vectorize(function(theta) dbeta(theta,3.5,2.5))

# Illustration
x <- seq(0,1,length=10000)
y <- post(x)
# Plot x vs. y as a line graph
plot(x, y, type="l",xlab=~theta,ylab="Density",main="95% Credible Interval",
     lwd=2, cex.axis=1.5, cex.lab=1.5)
polygon(c( L,x[x>=L & x <=U] , U ),  c(0,y[x>=L & x <=U],0 ), col="green")

# Mean, Variance and SD
MeanBeta(3.5,2.5) # Posterior mean

## [1] 0.5833333

VarianceBeta(3.5,2.5) # Posterior variance

## [1] 0.03472222

sqrt(VarianceBeta(3.5,2.5)) # Posterior SD

## [1] 0.186339

The Beta-Binomial model

F. Javier Rubio