Conjugate Distributions 1. Bernoulli and Beta

Bernoulli distribution

Bernoulli distribution describes situation where only two possible outcomes 0/1 or failure/success.
At given lambda, x may take values shown as arrays berk (n=1000) and ber10 (n=10).

seek<-127
n<-1000
lambda<-0.3
berk<-rbinom(n,1,lambda)
head(berk,50)

##  [1] 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0
## [36] 1 0 0 0 1 1 0 0 0 0 0 1 0 0 0

barplot(table(berk)/n,col=3,main="Bernoulli distribution n=1000 prior p=0.3")

ber10<-rbinom(10,1,lambda)
barplot(table(ber10)/10,col=4,main="Bernoulli distribution n=10 prior p=0.3")

Its probability for data xi=1 is λ (in the upper case 0.3).
We can make function named as Bern(x, λ), where vector x is series of 1 or 0.
It will back the value of product some λ (if xi=1) and/or some 1-λ (if xi=0).

Bern<-function(x,lambda){
  b<-1
  for(xi in x) b<- b* lambda^xi*(1-lambda)^(1-xi)
  b}
#
Bern(x=1,0.3)

## [1] 0.3

Bern(x=0,0.3)

## [1] 0.7

Bern(x=berk,0.3)

## [1] 5.77019e-269

Bern(x=ber10,0.3)

## [1] 0.0009529569

Maximum likelifood (ML)

Maximum likelihood of world λ may be gained form data X as followed.

#n=1000
f<-function(l)-(Bern(berk,l))
optimize(f,c(0,1))$minimum

## [1] 0.3079996

#n=10
f<-function(l)-(Bern(ber10,l))
optimize(f,c(0,1))$minimum

## [1] 0.399998

Likelihood distribution

Distribution of λ in the world (not only the maximum) may be estimated and visualized as a graph from the data x.

Likelifood.Bern<-function(x){
  Lf<-NULL
  for(lambda in seq(0,1,0.01)){
    l<-Bern(x,lambda)
    Lf<-rbind(Lf,c(lambda,l)) }
  Lf<-data.frame(lambda=Lf[,1],Likelifood=Lf[,2])
  Lf}
#
plot(Likelifood.Bern(berk),type="l",col=3,main="Likelihood distribution n=1000")

If your data is not large, then the peak of the curve will be dull.

plot(Likelifood.Bern(ber10),type="l",col=4,main="Likelihood distribution n=10")

Beta Distribution

Beta function could represent distribution of the value lambda.

Beta<-function(a,b,renge=seq(0,1,0.01)){
  prob<-beta(a,b)*renge^(a-1)*(1-renge)^(b-1)
  data.frame(lambda=renge,prb=prob)}

plot(Beta(a=2,b=4),type="l",col=2,main="Distribution of the prior λ estimated from Beta")

Conjugate Distributions

As you see Beta is conjugate to Bernouilli.
It means both have the same form
λ^A(1-λ)^B.
So
Bern(x,λ)Beta(a,b,λ)=k(x,c,d)Beta(c,d).
where k(x,c,d) is constant.
Previously we called Bern(x,λ) as likelihood. This time we may call Beta(a,b,λ) as previous distribution of λ.

Bayes’ rule

According to Bayes’ rule
Posterior = Liklihood * Prior / Evidence.
In upper case
Liklihood = Bern(x,λ); Pr(x|λ)
Prior = Beta(a,b,λ); Pr(λ)
Evidence = ∫Bern(x,λ)dλ

Maximum a posteriori estimation (MAP)

Remarking that Evidence = constant, MAP may be estimated as
arg max(g(x) = Bern(x,λ) Beta(a,b,renge))
so
arg min(f(x) = - Bern(x,λ) Beta(a,b,renge))

#n=1000
f<-function(l)-(Bern(berk,l)*Beta(a=2,b=4,renge=l))$prb
optimize(f,c(0,1))$minimum

## [1] 0.3077688

#n=10
f<-function(l)-(Bern(ber10,l)*Beta(a=2,b=4,renge=l))$prb
optimize(f,c(0,1))$minimum

## [1] 0.3571228

Posterior distribution

Posterior distribution of λ in the world (not only the MAP) may be estimated and visualized as a graph from the data x.

Posterior.Bern<-function(x,a,b,renge){
  prob<-NULL
  for(l in renge){
    p<-Bern(x=x,lambda=l)*Beta(a=a,b=b,renge=l)
    prob<-c(prob,p$prb)}
  sp<-sum(prob)      #1/k(x,c,d) const
  prob<- prob/sp     #Beta(c,d) prob
  data.frame(lambda=renge,prob=prob)}
#
Post<-Posterior.Bern(x=berk,a=2,b=4,renge=seq(0,1,0.01))
plot(Post$lambda,Post$prob,type="l",col=3,main="Posterior distribution of λ n=1000")

If your data is not large, then the peak of the curve will be dull also in the case of posterior estimate as in the case of the likelifood.

Post<-Posterior.Bern(x=ber10,a=2,b=4,renge=seq(0,1,0.01))
plot(Post$lambda,Post$prob,type="l",col=4,main="Posterior distribution of λ n=10")

Summary of functions

#Estimate Bernulli probability at the given lambda
Bern<-function(x,lambda){
  b<-1
  for(xi in x) b<- b* lambda^xi*(1-lambda)^(1-xi)
  b}
#Bernulli Likelifood along lambda sequence
Likelifood.Bern<-function(x){
  Lf<-NULL
  for(lambda in seq(0,1,0.01)){
    l<-Bern(x,lambda)
    Lf<-rbind(Lf,c(lambda,l)) }
  Lf<-data.frame(lambda=Lf[,1],Likelifood=Lf[,2])
  Lf}
#Distribution of the posterior probability along Bernulli lambda   
Posterior.Bern<-function(x,a,b,renge){
  prob<-NULL
  for(l in renge){
    p<-Bern(x=x,lambda=l)*Beta(a=a,b=b,renge=l)
    prob<-c(prob,p$prb)}
  sp<-sum(prob)      #1/k(x,c,d) const
  prob<- prob/sp     #Beta(c,d) prob
  data.frame(lambda=renge,prob=prob)}