For \(\theta \in (0,1)\), the Beta-distribution, \(\mbox{Beta}(a,b)\), \(a,b>0\), is defined as \[\pi_\Theta(\theta\vert a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \theta^{a-1}(1-\theta)^{b-1} = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)},\] where \[B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.\] Some properties of the beta distribution:
Mean: \(\dfrac{a}{a+b}\)
Mode: \(\dfrac{a-1}{a+b-2}\) for \(a,b>1\)
Variance: \(\dfrac{ab}{(a+b)^2(a+b+1)}\)
# a = 1, b = 1
pi <- Vectorize(function(theta) dbeta(theta,1,1))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=1, b=1",lwd=2)
# a = 0.3, b = 1
pi <- Vectorize(function(theta) dbeta(theta,0.3,1))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=1", lwd=2)
# a = 0.3, b = 0.3
pi <- Vectorize(function(theta) dbeta(theta,0.3,0.3))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=0.3", lwd=2)
# a = 2, b = 2
pi <- Vectorize(function(theta) dbeta(theta,2,2))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=2", lwd=2)
# a = 8, b = 2
pi <- Vectorize(function(theta) dbeta(theta,8,2))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=8, b=2", lwd=2)
# a = 2, b = 8
pi <- Vectorize(function(theta) dbeta(theta,2,8))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=8", lwd=2)
# a = 0.5, b = 0.5
pi <- Vectorize(function(theta) dbeta(theta,0.5,0.5))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=0.5, b=0.5", lwd=2)
# a = 0.005, b = 0.005
pi <- Vectorize(function(theta) dbeta(theta,0.005,0.005))
curve(pi, xlab=~theta, ylab="Density", main="Beta prior: a=0.005, b=0.005", lwd=2)
# Mean, Mode, and Variance of the Beta distribution
MeanBeta <- function(a,b) a/(a+b)
ModeBeta <- function(a,b){
m <- ifelse(a>1 & b>1, (a-1)/(a+b-2), NA)
return(m)
}
VarianceBeta <- function(a,b) (a*b)/((a+b)^2*(a+b+1))
library(knitr)
as <- c(1,0.3,0.3,2,8,2,0.5,0.005)
bs <- c(1,1,0.3,2,2,8,0.5,0.005)
PriorProp <- cbind(1:8,MeanBeta(as,bs),ModeBeta(as,bs),qbeta(0.5,as,bs),VarianceBeta(as,bs))
colnames(PriorProp) <- c("Prior","Mean","Mode","Median","Variance")
print(kable(PriorProp,digits=4))
##
##
## Prior Mean Mode Median Variance
## ------ ------- ------ ------- ---------
## 1 0.5000 NA 0.5000 0.0833
## 2 0.2308 NA 0.0992 0.0772
## 3 0.5000 NA 0.5000 0.1562
## 4 0.5000 0.500 0.5000 0.0500
## 5 0.8000 0.875 0.8204 0.0145
## 6 0.2000 0.125 0.1796 0.0145
## 7 0.5000 NA 0.5000 0.1250
## 8 0.5000 NA 0.5000 0.2475
The Gamma function \(\Gamma(a)\) for \(a>0\) is defined as:
\[\Gamma(a) = \int_0^\infty x^{a-1} e^{-x} \, dx.\] When \(a\) takes integer values, \(\Gamma(a)\) is the factorial of \((a-1)\), \((a-1)!\).
# Evaluating the Gamma function
gamma(1)
## [1] 1
gamma(1.5)
## [1] 0.8862269
gamma(2)
## [1] 1
gamma(3)
## [1] 2
The binomial distribution is the discrete probability distribution with parameters \(n\) and \(\theta\). \(n=1,2,...\), \(0\leq \theta \leq 1\). This distribution is used to model the number of successes \(r\) in a sequence of \(n\) independent trials with probability of success \(\theta\).
The probability mass function, for \(n\) fixed, is given by: \[\begin{eqnarray*} \Pr(r\vert\theta) = \begin{pmatrix} n\\r \end{pmatrix}\theta^r(1-\theta)^{n-r}, \end{eqnarray*}\]where \(\begin{pmatrix} n\\r \end{pmatrix} = \dfrac{n!}{r!(n-r)!}\).
For instance, we can model the probability of observing \(r\) tails after tossing \(n\) times a fair coin.
Suppose that \(R\mid \theta \sim Binomial(n,\theta)\), with \(n\) fixed. What is the posterior probability of \(\theta\)?
The likelihood function is (Binomial model):
\[\begin{eqnarray*} \Pr(R=r \vert \theta) = \begin{pmatrix} n\\r \end{pmatrix} \theta^r (1-\theta)^{n-r}. \end{eqnarray*}\] Consider now the prior distribution \(\theta \sim Beta(a,b)\). This is \[\pi_\Theta(\theta\vert a, b) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}.\] Then, the posterior distribution is proportional to: \[\begin{eqnarray*} \pi_{\Theta}(\theta \vert R=r) \propto \theta^{r+a-1} (1-\theta)^{n-r+b-1} \end{eqnarray*}\]Thus, we can identify this as the Beta distribution with new parameters \(a' = r+a\) and \(b' = n-r+b\). That is, the posterior distribution is again a Beta distribution (same as the prior). This means that that Beta prior is a Conjugate Prior for \(\theta\) in the Binomial sampling model.
Suppose that you toss a coin \(n=10\) times, and that you observe \(r=4\) tails. What is the posterior probability of \(\theta\)?
As we found in the previous section, the posterior distribution is a Beta distribution with parameters \(a' = 4+a\) and \(b' = 6+b\). The following R code illustrates the shapes of the posterior distribution for the Beta priors presented in the previous sections. This example illustrates the influence of the prior in small samples.
# a = 1, b = 1
pi <- Vectorize(function(theta) dbeta(theta,1,1))
pi1 <- Vectorize(function(theta) dbeta(theta,4+1,6+1))
curve(pi1, xlab=~theta, ylab="Density", main="Beta prior: a=1, b=1",lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 2.55, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 0.3, b = 1
pi <- Vectorize(function(theta) dbeta(theta,0.3,1))
pi2 <- Vectorize(function(theta) dbeta(theta,4+0.3,6+1))
curve(pi2, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=1", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 2.55, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 0.3, b = 0.3
pi <- Vectorize(function(theta) dbeta(theta,0.3,0.3))
pi3 <- Vectorize(function(theta) dbeta(theta,4+0.3,6+0.3))
curve(pi3, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=0.3", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.65, 2.2, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 2, b = 2
pi <- Vectorize(function(theta) dbeta(theta,2,2))
pi4 <- Vectorize(function(theta) dbeta(theta,4+2,6+2))
curve(pi4, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=2", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 2.55, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 8, b = 2
pi <- Vectorize(function(theta) dbeta(theta,8,2))
pi5 <- Vectorize(function(theta) dbeta(theta,4+8,6+2))
curve(pi5, xlab=~theta, ylab="Density", main="Beta prior: a=8, b=2", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.05, 3, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 2, b = 8
pi <- Vectorize(function(theta) dbeta(theta,2,8))
pi6 <- Vectorize(function(theta) dbeta(theta,4+2,6+8))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=8", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 3.5, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 0.5, b = 0.5
pi <- Vectorize(function(theta) dbeta(theta,0.5,0.5))
pi7 <- Vectorize(function(theta) dbeta(theta,4+0.5,6+0.5))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=0.5, b=0.5", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 3.5, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 0.005, b = 0.005
pi <- Vectorize(function(theta) dbeta(theta,0.005,0.005))
pi8 <- Vectorize(function(theta) dbeta(theta,4+0.005,6+0.005))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=0.005, b=0.005", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 3.5, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# All the posteriors together
curve(pi1, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, ylim=c(0,4.5),col=1,n=10000)
curve(pi2, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=2,n=10000)
curve(pi3, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=3,n=10000)
curve(pi4, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=4,n=10000)
curve(pi5, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=5,n=10000)
curve(pi6, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=6,n=10000)
curve(pi7, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=7,n=10000)
curve(pi8, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=8,n=10000)
# Properties of the posteriors
as <- 4 + c(1,0.3,0.3,2,8,2,0.5,0.005)
bs <- 6 + c(1,1,0.3,2,2,8,0.5,0.005)
PostProp1 <- cbind(1:8,MeanBeta(as,bs),ModeBeta(as,bs),qbeta(0.5,as,bs),VarianceBeta(as,bs))
colnames(PostProp1) <- c("Prior","Mean","Mode","Median","Variance")
print(kable(PostProp1,digits=4))
##
##
## Prior Mean Mode Median Variance
## ------ ------- ------- ------- ---------
## 1 0.4167 0.4000 0.4119 0.0187
## 2 0.3805 0.3548 0.3733 0.0192
## 3 0.4057 0.3837 0.3995 0.0208
## 4 0.4286 0.4167 0.4251 0.0163
## 5 0.6000 0.6111 0.6034 0.0114
## 6 0.3000 0.2778 0.2932 0.0100
## 7 0.4091 0.3889 0.4034 0.0201
## 8 0.4001 0.3752 0.3932 0.0218
Suppose that you toss a coin \(n=1000\) times, and that you observe \(r=400\) tails. What is the posterior probability of \(\theta\)?
As we found in the previous section, the posterior distribution is a Beta distribution with parameters \(a' = 400 +a\) and \(b' = 600 +b\). The following R code illustrates the shapes of the posterior distribution for the Beta priors presented in the previous sections. This example illustrates the little influence of the prior in large samples.
# a = 1, b = 1
pi <- Vectorize(function(theta) dbeta(theta,1,1))
pi1 <- Vectorize(function(theta) dbeta(theta,400+1,600+1))
curve(pi1, xlab=~theta, ylab="Density", main="Beta prior: a=1, b=1",lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 0.3, b = 1
pi <- Vectorize(function(theta) dbeta(theta,0.3,1))
pi2 <- Vectorize(function(theta) dbeta(theta,400+0.3,600+1))
curve(pi2, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=1", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 0.3, b = 0.3
pi <- Vectorize(function(theta) dbeta(theta,0.3,0.3))
pi3 <- Vectorize(function(theta) dbeta(theta,400+0.3,600+0.3))
curve(pi3, xlab=~theta, ylab="Density", main="Beta prior: a=0.3, b=0.3", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 2, b = 2
pi <- Vectorize(function(theta) dbeta(theta,2,2))
pi4 <- Vectorize(function(theta) dbeta(theta,400+2,600+2))
curve(pi4, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=2", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 8, b = 2
pi <- Vectorize(function(theta) dbeta(theta,8,2))
pi5 <- Vectorize(function(theta) dbeta(theta,400+8,600+2))
curve(pi5, xlab=~theta, ylab="Density", main="Beta prior: a=8, b=2", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 2, b = 8
pi <- Vectorize(function(theta) dbeta(theta,2,8))
pi6 <- Vectorize(function(theta) dbeta(theta,400+2,600+8))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=2, b=8", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 0.5, b = 0.5
pi <- Vectorize(function(theta) dbeta(theta,0.5,0.5))
pi7 <- Vectorize(function(theta) dbeta(theta,400+0.5,600+0.5))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=0.5, b=0.5", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# a = 0.005, b = 0.005
pi <- Vectorize(function(theta) dbeta(theta,0.005,0.005))
pi8 <- Vectorize(function(theta) dbeta(theta,400+0.005,600+0.005))
curve(pi6, xlab=~theta, ylab="Density", main="Beta prior: a=0.005, b=0.005", lwd=2,n=10000)
curve(pi, xlab=~theta,lwd=1,lty = 2, add=T,n=10000)
legend(0.7, 25, c("Posterior","Prior"), col=c("black","black"),
text.col = "black", lty = c(1, 2), lwd = c(2,1),
merge = TRUE, bg = "gray90",cex=1)
# All the posteriors together
curve(pi1, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, ylim=c(0,26),col=1,n=10000)
curve(pi2, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=2,n=10000)
curve(pi3, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=3,n=10000)
curve(pi4, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=4,n=10000)
curve(pi5, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=5,n=10000)
curve(pi6, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=6,n=10000)
curve(pi7, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=7,n=10000)
curve(pi8, xlab=~theta, ylab="Density", main="Posterior distribution",lwd=2, add=T,col=8,n=10000)
# Properties of the Posteriors
as <- 400 + c(1,0.3,0.3,2,8,2,0.5,0.005)
bs <- 600 + c(1,1,0.3,2,2,8,0.5,0.005)
PostProp1 <- cbind(1:8,MeanBeta(as,bs),ModeBeta(as,bs),qbeta(0.5,as,bs),VarianceBeta(as,bs))
colnames(PostProp1) <- c("Prior","Mean","Mode","Median","Variance")
print(kable(PostProp1,digits=4))
##
##
## Prior Mean Mode Median Variance
## ------ ------- ------- ------- ---------
## 1 0.4002 0.4000 0.4001 2e-04
## 2 0.3998 0.3996 0.3997 2e-04
## 3 0.4001 0.3999 0.4000 2e-04
## 4 0.4004 0.4002 0.4003 2e-04
## 5 0.4040 0.4038 0.4039 2e-04
## 6 0.3980 0.3978 0.3980 2e-04
## 7 0.4001 0.3999 0.4000 2e-04
## 8 0.4000 0.3998 0.3999 2e-04
Quantile credible intervals can be easily obtained by using the R command qbeta()
. For instance, if we want to construct a credible interval of \(95\%\), we need to calculate the \(2.5\%\) and \(97.5\%\) quantiles of the posterior distribution.
Suppose that you toss a coin \(n=5\) times, and that you observe \(r=3\) tails. Then, the posterior probability of \(\theta\) is a \(\mbox{Beta}(3.5,2.5)\). The following code shows how to calculate the \(95\%\) credible intervals in this case, as well as the posterior mean, posterior variance, and posterior standard deviation.
# Credible intervals
L <- qbeta(0.025,3.5,2.5) # lower limit
U <- qbeta(0.975,3.5,2.5) # upper limit
# 95% Credible Interval
print(c(L,U))
## [1] 0.2094167 0.9056097
# Posterior distribution
post <- Vectorize(function(theta) dbeta(theta,3.5,2.5))
# Illustration
x <- seq(0,1,length=10000)
y <- post(x)
# Plot x vs. y as a line graph
plot(x, y, type="l",xlab=~theta,ylab="Density",main="95% Credible Interval",
lwd=2, cex.axis=1.5, cex.lab=1.5)
polygon(c( L,x[x>=L & x <=U] , U ), c(0,y[x>=L & x <=U],0 ), col="green")
# Mean, Variance and SD
MeanBeta(3.5,2.5) # Posterior mean
## [1] 0.5833333
VarianceBeta(3.5,2.5) # Posterior variance
## [1] 0.03472222
sqrt(VarianceBeta(3.5,2.5)) # Posterior SD
## [1] 0.186339