Who is Bayes?

  • Thomas Bayes (1701-1761) was an English statistician, philosopher and Presbyterian minister.

Thomas Bayes (_photo from https://en.wikipedia.org/wiki/Thomas_Bayes_)

  • 1763: “An Essay towards solving a Problem in the Doctrine of Chances” was presented to the Royal Society after Bayes’ death. The essay contains a statement of a special case of Bayes’ theorem.

Traditional vs. Modern Bayesian analysis

  • Bayesian probability is an interpretation of probability according to which probability represents partial knowledge or a degree of belief (a usage established since the 1950s), rather than a long-run frequency.

  • There are two views on Bayesian probability:

    • Objectivism: the rules of Bayesian statistics can be theoretically derived as an extension of logic
    • Subjectivism: Bayesian probability quantifies a “personal belief” or level of knowledge
  • After the 1950s, applications of Bayesian analysis were limited to a few simple models (those with conjugate distributions), and the subject was largely confined to philosophical debates between “frequentists” and “Bayesians”. Since the 1980s, however, there has been dramatic growth in research and applications of Bayesian methods, mostly attributed to advances in computing technology and to methods such as MCMC, Gibbs sampling, the EM algorithm, and Dirichlet process models \(\Rightarrow\) Modern Bayesian analysis

  • These advances removed the computational restrictions and allowed Bayesian methods to spread to many important application areas; the methods are now widely accepted and used in most fields.

1 Joint Distribution: 3-Dimensional Case

A joint probability distribution shows a probability distribution for two (or more) random variables. Instead of labeling the events \(A\) and \(B\), the convention is to use \(X\) and \(Y\) for the two random variables. The formal definitions are:

\(f(x,y) = P(X = x, Y = y)\) for discrete random variables (the joint probability mass function)

\(F(x,y) = P(X \le x, Y \le y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(s,t)\,dt\,ds\) for continuous random variables, where \(F\) is the joint cumulative distribution function and \(f\) is the joint density function

1.1 Creating a 3-way array

Simulating a 3-dimensional distribution table.

u <- c("u0", "u1")   # labels "u0" and "u1" are used below
v <- c("v1", "v2", "v3")
w <- c("w1", "w2", "w3")

matr.u0 <- paste("u0", outer(v, w, FUN = "paste", sep = ","), sep = ",")
dim(matr.u0) <- c(3,3)
matr.u0
##      [,1]       [,2]       [,3]      
## [1,] "u0,v1,w1" "u0,v1,w2" "u0,v1,w3"
## [2,] "u0,v2,w1" "u0,v2,w2" "u0,v2,w3"
## [3,] "u0,v3,w1" "u0,v3,w2" "u0,v3,w3"
matr.u1 <- paste("u1", outer(v, w, FUN = "paste", sep = ","), sep = ",")
dim(matr.u1) <- c(3,3)
matr.u1
##      [,1]       [,2]       [,3]      
## [1,] "u1,v1,w1" "u1,v1,w2" "u1,v1,w3"
## [2,] "u1,v2,w1" "u1,v2,w2" "u1,v2,w3"
## [3,] "u1,v3,w1" "u1,v3,w2" "u1,v3,w3"
df3w <- read.csv("df3w.csv")
head(df3w)
mat.u0 <- table(subset(df3w,u==0)[,1],subset(df3w,u==0)[,2]) # w, v table
mat.u1 <- table(subset(df3w,u==1)[,1],subset(df3w,u==1)[,2])
mat.u0
##    
##     1 2 3
##   1 9 7 1
##   2 8 2 1
##   3 4 2 0
mat.u1
##    
##      1  2  3
##   1 23  9  1
##   2 12  8  1
##   3  4  7  1

Check elements of mat.u1 and name the columns and rows of both matrices.

idx.v1<-df3w$v==1
idx.w1<-df3w$w==1
idx.u1<-df3w$u==1
sum(idx.v1*idx.w1*idx.u1) #element (1,1) of mat.u1
## [1] 23
idx.v2<-df3w$v==2
sum(idx.v2*idx.w1*idx.u1) #element (1,2) of mat.u1
## [1] 9
idx.w2<-df3w$w==2
sum(idx.v1*idx.w2*idx.u1) #element (2,1) of mat.u1
## [1] 12
colnames(mat.u1)<-colnames(mat.u0)<-c("v1","v2","v3")
rownames(mat.u1)<-rownames(mat.u0)<-c("w1","w2","w3")

df3w.array<-array(rep(NA,18),dim=c(3,3,2),dimnames=list(paste("w",1:3,sep=""),
                                                            paste("v",1:3,sep=""),
                                                            paste("u",0:1,sep="")))
df3w.array[,,1]<-mat.u0
df3w.array[,,2]<-mat.u1
df3w.array
## , , u0
## 
##    v1 v2 v3
## w1  9  7  1
## w2  8  2  1
## w3  4  2  0
## 
## , , u1
## 
##    v1 v2 v3
## w1 23  9  1
## w2 12  8  1
## w3  4  7  1

Create 3-dimensional joint distribution.

N<-sum(df3w.array)
(df3w.array.p<-df3w.array/N)
## , , u0
## 
##      v1   v2   v3
## w1 0.09 0.07 0.01
## w2 0.08 0.02 0.01
## w3 0.04 0.02 0.00
## 
## , , u1
## 
##      v1   v2   v3
## w1 0.23 0.09 0.01
## w2 0.12 0.08 0.01
## w3 0.04 0.07 0.01

1.2 Marginal Distributions

Create marginal distribution for \(u\) as vector uMarginal.

margin.u0 <- margin.table(df3w.array.p[,,1])
margin.u1 <- margin.table(df3w.array.p[,,2])
uMarginal <- c(margin.u0,margin.u1)
names(uMarginal)<-c("u0","u1")
uMarginal
##   u0   u1 
## 0.34 0.66

Create marginal distribution for \(v\) as vector vMarginal.

#(v.margin.u0 <- margin.table(df3w.array.p[,,1],2))
#apply(df3w.array.p[,,1], 1, FUN=sum)
#(v.margin.u1 <- margin.table(df3w.array.p[,,2],2))
(vMarginal <- margin.table(df3w.array.p[,,],2))
##   v1   v2   v3 
## 0.60 0.35 0.05
vMarginal['v1']
##  v1 
## 0.6

Create marginal distribution for \(w\) as vector wMarginal.

#(w.margin.u0 <- margin.table(df3w.array.p[,,1],1))
#apply(df3w.array.p[,,1], 2, FUN=sum)
#(w.margin.u1 <- margin.table(df3w.array.p[,,2],1))
(wMarginal <- margin.table(df3w.array.p[,,],1))
##   w1   w2   w3 
## 0.50 0.32 0.18
wMarginal['w1']
##  w1 
## 0.5

1.3 Conditional Distributions

Create conditional distribution \(p(w,v|u=1)\) as matrix cond.v.w.given.u1.

margin.u1 <- margin.table(df3w.array.p[,,2])
cond.v.w.given.u1 <- df3w.array.p[,,2]/margin.u1

Check:

cond.v.w.given.u1['w1',]
##         v1         v2         v3 
## 0.34848485 0.13636364 0.01515152
sum(cond.v.w.given.u1)
## [1] 1

Create conditional distribution \(p(v|u=1)\) as vector cond.v.given.u1

cond.v.given.u1 <- margin.table(df3w.array.p[,,2],2)/margin.u1

Check:

cond.v.given.u1['v1']
##        v1 
## 0.5909091
sum(cond.v.given.u1) #check
## [1] 1

Create conditional distribution \(p(w|v=2,u=1)\) as vector cond.w.given.u1.v2.

margin.v2.u1 <- sum(df3w.array.p[,2,2])
cond.w.given.u1.v2 <- df3w.array.p[,2,2]/margin.v2.u1

Check:

cond.w.given.u1.v2['w1']
##    w1 
## 0.375
sum(cond.w.given.u1.v2)   #check
## [1] 1

Compare the vectors \(p(w|v2,u1)p(v2|u1)p(u1)\) and \(p(w,v,u)[,v2,u1]\)

rbind(uMarginal["u1"]*cond.v.given.u1["v2"]*cond.w.given.u1.v2,df3w.array.p[,"v2","u1"])
##        w1   w2   w3
## [1,] 0.09 0.08 0.07
## [2,] 0.09 0.08 0.07

In general, \(p(w,v,u)=p(w|v,u)p(v|u)p(u)\).

From the definition of conditional probability: \[P(A|B)P(B)=P(A,B)\] it follows that \[p(v|u)p(u)=p(v,u),\] and \[p(w|v,u)p(v,u)=p(w,v,u).\] Substitute the first equation into the second to obtain \[p(w,v,u)=p(w|v,u)p(v,u)=p(w|v,u)p(v|u)p(u).\]
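
A minimal sketch verifying this factorization numerically on the joint distribution df3w.array.p constructed above (uMarginal is also assumed to be in the workspace; the helper objects p.u, p.v.u, p.v.g.u, p.w.g.vu are introduced here only for this check):

p.u      <- uMarginal                                  # p(u)
p.v.u    <- apply(df3w.array.p, c(2,3), sum)           # p(v,u): sum out w
p.v.g.u  <- sweep(p.v.u, 2, p.u, "/")                  # p(v|u)
p.w.g.vu <- sweep(df3w.array.p, c(2,3), p.v.u, "/")    # p(w|v,u)
# rebuild the joint distribution from the three factors
rebuilt  <- sweep(sweep(p.w.g.vu, c(2,3), p.v.g.u, "*"), 3, p.u, "*")
all.equal(rebuilt, df3w.array.p)                       # should be TRUE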

2 Simulation Using Conditional Distributions

Let the marginal distribution for random variable \(u\) be Bernoulli with \(p(u=0)=0.55\), \(p(u=1)=0.45\). Let conditional distributions for random variables \((v|u=0)\) and \((v|u=1)\), taking values \(1,2,3\) be

(pCond.v.given.u0<-c(.7,.2,.1))
## [1] 0.7 0.2 0.1
(pCond.v.given.u1<-c(.1,.2,.7))
## [1] 0.1 0.2 0.7

Let random variable \((w|v,u)\) take values \(1,2,3\) with probabilities \(p(w|v,u)\), given by the following:

p.given.u0.v1<-c(.3,.3,.4)
p.given.u0.v2<-c(.5,.3,.2)
p.given.u0.v3<-c(.6,.2,.2)
p.given.u1.v1<-c(.2,.3,.5)
p.given.u1.v2<-c(.2,.2,.6)
p.given.u1.v3<-c(.1,.7,.2)

Simulate joint sample \((w,v,u)\) of length \(n=500\).

Use set.seed(11). Start with the simulation of \(u\).

For each simulated value \(u\) generate \(v\) from the corresponding conditional distribution \(p(v|u)\).

Finally, for each pair \(v,u\) simulate \(w\) from \(p(w|v,u)\).

p.u0=0.55
p.u1=0.45
v1=pCond.v.given.u0[1]*p.u0+pCond.v.given.u1[1]*p.u1
v2=pCond.v.given.u0[2]*p.u0+pCond.v.given.u1[2]*p.u1
v3=pCond.v.given.u0[3]*p.u0+pCond.v.given.u1[3]*p.u1
probs.v=c(v1,v2,v3) # 0.43 0.20 0.37
sum(v1,v2,v3) # check v
## [1] 1
w1=p.given.u0.v1[1]*pCond.v.given.u0[1]*p.u0+
  p.given.u0.v2[1]*pCond.v.given.u0[2]*p.u0+
  p.given.u0.v3[1]*pCond.v.given.u0[3]*p.u0+
  p.given.u1.v1[1]*pCond.v.given.u1[1]*p.u1+
  p.given.u1.v2[1]*pCond.v.given.u1[2]*p.u1+
  p.given.u1.v3[1]*pCond.v.given.u1[3]*p.u1
w2=p.given.u0.v1[2]*pCond.v.given.u0[1]*p.u0+
  p.given.u0.v2[2]*pCond.v.given.u0[2]*p.u0+
  p.given.u0.v3[2]*pCond.v.given.u0[3]*p.u0+
  p.given.u1.v1[2]*pCond.v.given.u1[1]*p.u1+
  p.given.u1.v2[2]*pCond.v.given.u1[2]*p.u1+
  p.given.u1.v3[2]*pCond.v.given.u1[3]*p.u1
w3=p.given.u0.v1[3]*pCond.v.given.u0[1]*p.u0+
  p.given.u0.v2[3]*pCond.v.given.u0[2]*p.u0+
  p.given.u0.v3[3]*pCond.v.given.u0[3]*p.u0+
  p.given.u1.v1[3]*pCond.v.given.u1[1]*p.u1+
  p.given.u1.v2[3]*pCond.v.given.u1[2]*p.u1+
  p.given.u1.v3[3]*pCond.v.given.u1[3]*p.u1
probs.w=c(w1,w2,w3)
sum(w1,w2,w3) # check w
## [1] 1
library(Hmisc)   # provides rMultinom()
set.seed(11)
sim.u = rbinom(500,1,.45)
# note: sim.v and sim.w below are drawn from the marginal probabilities
# probs.v and probs.w computed above, not conditionally on sim.u and sim.v
sim.v = rMultinom(rbind(c(0.43,0.20,0.37)), 500)
#t(apply(sim.v, 1, table)/500)
sim.v = as.vector(sim.v)
sim.w = rMultinom(rbind(c(0.262,0.4115,0.3265)), 500)
sim.w = as.vector(sim.w)


head(cbind(sim.w,sim.v,sim.u),25)
##       sim.w sim.v sim.u
##  [1,]     3     1     0
##  [2,]     3     1     0
##  [3,]     1     2     0
##  [4,]     3     1     0
##  [5,]     2     3     0
##  [6,]     3     3     1
##  [7,]     3     3     0
##  [8,]     3     1     0
##  [9,]     1     2     1
## [10,]     3     1     0
## [11,]     3     1     0
## [12,]     3     3     0
## [13,]     1     3     1
## [14,]     3     3     1
## [15,]     2     1     1
## [16,]     3     3     1
## [17,]     2     1     0
## [18,]     3     1     0
## [19,]     1     2     0
## [20,]     2     2     0
## [21,]     3     3     0
## [22,]     3     1     1
## [23,]     3     1     0
## [24,]     2     3     0
## [25,]     2     2     0
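
Note that the code above draws sim.v and sim.w from their marginal (mixture) probabilities probs.v and probs.w, so the simulated columns are independent of each other. A minimal sketch of the fully conditional scheme described above (simulate \(u\), then \(v|u\), then \(w|v,u\)), using the probability vectors already defined (this sketch overwrites the names sim.u, sim.v, sim.w):

set.seed(11)
n <- 500
sim.u <- rbinom(n, 1, p.u1)                               # u ~ Bernoulli(0.45)
p.v.given.u <- rbind(pCond.v.given.u0, pCond.v.given.u1)  # rows: u = 0, u = 1
sim.v <- sapply(seq_len(n), function(i)
  sample(1:3, 1, prob = p.v.given.u[sim.u[i] + 1, ]))     # v | u
p.w.given.uv <- array(c(p.given.u0.v1, p.given.u0.v2, p.given.u0.v3,
                        p.given.u1.v1, p.given.u1.v2, p.given.u1.v3),
                      dim = c(3, 3, 2))                   # index: [w, v, u+1]
sim.w <- sapply(seq_len(n), function(i)
  sample(1:3, 1, prob = p.w.given.uv[, sim.v[i], sim.u[i] + 1]))  # w | v, u
head(cbind(sim.w, sim.v, sim.u))

The conditional scheme preserves the dependence between \(w\), \(v\) and \(u\), whereas independent marginal draws only reproduce the three marginal distributions.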

3 Conditional Expected Values

Calculate unconditional expected value \(E[v]\). Random variable \(v\) can take values \(1,2,3\) with corresponding probabilities

vMarginal
##   v1   v2   v3 
## 0.60 0.35 0.05

Then the unconditional mean value is

c(1,2,3)%*%vMarginal
##      [,1]
## [1,] 1.45

Calculate conditional expected value \(E[v \mid u]\).

First, find conditional mean values \(E[v \mid u=u0]\) = \(E[v \mid u0]\) and \(E[v \mid u=u1]\) = \(E[v \mid u1]\).

The random variable \((v \mid u0)\) takes values 1,2,3 with corresponding probabilities \(p(v=1 \mid u0)\), \(p(v=2 \mid u0)\), \(p(v=3 \mid u0)\), given by the vector

(cond.v.given.u0<-apply(df3w.array.p[,,"u0"],2,sum)/uMarginal["u0"])
##         v1         v2         v3 
## 0.61764706 0.32352941 0.05882353

Taking conditional expected value with respect to this distribution \(E[v \mid u0]\) is:

(exp.v.given.u0<-c(1,2,3)%*%cond.v.given.u0)
##          [,1]
## [1,] 1.441176

The random variable \((v|u1)\) takes the same values 1,2,3, but with different probabilities \(p(v|u1)\):

cond.v.given.u1
##         v1         v2         v3 
## 0.59090909 0.36363636 0.04545455

Thus conditional expected value \(E[v|u1]\) is

(exp.v.given.u1<-c(1,2,3)%*%cond.v.given.u1)
##          [,1]
## [1,] 1.454545

Note that the conditional expected value \(E[v|u]\) takes two different values, \(E[v|u0]\) and \(E[v|u1]\), with probabilities \(p(u=u0)\) and \(p(u=u1)\), respectively.

This means that \(E[v|u]\) is a random variable and its (unconditional) expected value is \[E\left[E[v|u]\right] = E[v|u0]\,p(u=u0) + E[v|u1]\,p(u=u1).\]

Calculate this unconditional expected value:

(uncond.exp.v.given.u<-exp.v.given.u0*uMarginal["u0"]+exp.v.given.u1*uMarginal["u1"])
##      [,1]
## [1,] 1.45
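
As a quick check (a sketch using the objects computed above), this iterated expectation matches the unconditional mean \(E[v]=1.45\) obtained at the beginning of the section, illustrating the law of total expectation \(E\left[E[v|u]\right]=E[v]\):

all.equal(as.numeric(uncond.exp.v.given.u),
          as.numeric(c(1,2,3) %*% vMarginal))   # should be TRUE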

4 Case Study: Smokers Analysis

The following data set is in library(faraway).
In it participating women are categorized into:
- groups of smokers or not,
- 7 age groups and
- groups of dead or alive
after 20 years of study.

library(faraway)
data(femsmoke)
femsmoke

4.1 Create joint distribution of 3 count variables:

  • v - “smoker yes”=1 or “smoker no”=0
  • u - “dead yes”=1 or “dead no”=0
  • w - age category

\(w\) = 1: 18-24; \(w\) = 2: 25-34; \(w\) = 3: 35-44; \(w\) = 4: 45-54; \(w\) = 5: 55-64; \(w\) = 6: 65-74; \(w\) = 7: 75+

matr.dead.no=xtabs(y~smoker+age,data=femsmoke[which(femsmoke$dead=="no"),])
matr.dead.yes=xtabs(y~smoker+age,data=femsmoke[which(femsmoke$dead=="yes"),])
matr.dead.no
##       age
## smoker 18-24 25-34 35-44 45-54 55-64 65-74 75+
##    yes    53   121    95   103    64     7   0
##    no     61   152   114    66    81    28   0
matr.dead.yes
##       age
## smoker 18-24 25-34 35-44 45-54 55-64 65-74 75+
##    yes     2     3    14    27    51    29  13
##    no      1     5     7    12    40   101  64
femsmoke.array<-array(rep(NA,28),dim=c(2,7,2),
                      dimnames=list(paste("smoker",c("yes","no"),sep="."),
                                    paste("age",1:7,sep="."),
                                    paste("dead",c("yes","no"),sep=".")))
femsmoke.array[,,1]<-matr.dead.yes
femsmoke.array[,,2]<-matr.dead.no
femsmoke.array
## , , dead.yes
## 
##            age.1 age.2 age.3 age.4 age.5 age.6 age.7
## smoker.yes     2     3    14    27    51    29    13
## smoker.no      1     5     7    12    40   101    64
## 
## , , dead.no
## 
##            age.1 age.2 age.3 age.4 age.5 age.6 age.7
## smoker.yes    53   121    95   103    64     7     0
## smoker.no     61   152   114    66    81    28     0
N<-sum(femsmoke.array)
femsmoke.joint.p<-femsmoke.array/N
femsmoke.joint.p 
## , , dead.yes
## 
##                  age.1       age.2       age.3      age.4      age.5      age.6
## smoker.yes 0.001522070 0.002283105 0.010654490 0.02054795 0.03881279 0.02207002
## smoker.no  0.000761035 0.003805175 0.005327245 0.00913242 0.03044140 0.07686454
##                  age.7
## smoker.yes 0.009893455
## smoker.no  0.048706240
## 
## , , dead.no
## 
##                 age.1      age.2      age.3      age.4      age.5       age.6
## smoker.yes 0.04033486 0.09208524 0.07229833 0.07838661 0.04870624 0.005327245
## smoker.no  0.04642314 0.11567732 0.08675799 0.05022831 0.06164384 0.021308980
##            age.7
## smoker.yes     0
## smoker.no      0
sum(femsmoke.joint.p)
## [1] 1

4.2 Marginal distributions

Create marginal distribution for \(u\) (dead).

uDead  <- margin.table(femsmoke.joint.p[,,1]) #dead.yes
uAlive <- margin.table(femsmoke.joint.p[,,2]) #dead.no

uMarginal.dead <- c(uAlive,uDead)
names(uMarginal.dead)<-c("uAlive","uDead")
uMarginal.dead
##    uAlive     uDead 
## 0.7191781 0.2808219
sum(uMarginal.dead)
## [1] 1

Create marginal distribution for \(v\) (smoke).

vMarginal.smoke <- margin.table(femsmoke.joint.p[,,],1)
vMarginal.smoke
## smoker.yes  smoker.no 
##  0.4429224  0.5570776
sum(vMarginal.smoke)
## [1] 1

Create marginal distribution for \(w\).

wMarginal.age <- margin.table(femsmoke.joint.p[,,],2)
wMarginal.age
##     age.1     age.2     age.3     age.4     age.5     age.6     age.7 
## 0.0890411 0.2138508 0.1750381 0.1582953 0.1796043 0.1255708 0.0585997
sum(wMarginal.age)
## [1] 1

4.3 Conditional distributions

Create conditional distribution \(p(w,v|u="alive")=p(smoke,age|alive)\).

cond.v.w.given.uAlive <- femsmoke.joint.p[,,2]/uAlive
cond.v.w.given.uAlive
##                 age.1     age.2     age.3      age.4      age.5       age.6
## smoker.yes 0.05608466 0.1280423 0.1005291 0.10899471 0.06772487 0.007407407
## smoker.no  0.06455026 0.1608466 0.1206349 0.06984127 0.08571429 0.029629630
##            age.7
## smoker.yes     0
## smoker.no      0
sum(cond.v.w.given.uAlive)
## [1] 1

Create conditional distribution \(p(v|u="alive")=p(smoke|alive)\)

cond.v.given.uAlive <- margin.table(femsmoke.joint.p[,,2],1)/uAlive
cond.v.given.uAlive
## smoker.yes  smoker.no 
##  0.4687831  0.5312169
sum(cond.v.given.uAlive)
## [1] 1

Create conditional distribution \(p(w|u="alive",v="smoker")=p(age|alive,smoke)\)

uAlive.vSmoke <- sum(femsmoke.joint.p[1,,2])
cond.w.given.uAlive.vSmoke <- femsmoke.joint.p[1,,2]/uAlive.vSmoke
cond.w.given.uAlive.vSmoke
##      age.1      age.2      age.3      age.4      age.5      age.6      age.7 
## 0.11963883 0.27313770 0.21444695 0.23250564 0.14446953 0.01580135 0.00000000
sum(cond.w.given.uAlive.vSmoke)
## [1] 1

Compare the vectors \(p(w|v=\text{smoker.yes},u=\text{alive})\,p(v=\text{smoker.yes}|u=\text{alive})\,p(u=\text{alive})\) and \(p(w,v,u)[\text{smoker.yes},\,,\text{alive}]\)

rbind(uMarginal.dead["uAlive"]*cond.v.given.uAlive["smoker.yes"]*cond.w.given.uAlive.vSmoke,
      femsmoke.joint.p["smoker.yes",,"dead.no"])
##           age.1      age.2      age.3      age.4      age.5       age.6 age.7
## [1,] 0.04033486 0.09208524 0.07229833 0.07838661 0.04870624 0.005327245     0
## [2,] 0.04033486 0.09208524 0.07229833 0.07838661 0.04870624 0.005327245     0

4.4 Simulation

Let the marginal distribution for the age group, \(p(w)\), be the marginal distribution estimated from the sample:

wMarginal.age
##     age.1     age.2     age.3     age.4     age.5     age.6     age.7 
## 0.0890411 0.2138508 0.1750381 0.1582953 0.1796043 0.1255708 0.0585997

Given simulated age group, simulate variable \(v\) using conditional distribution \(p(v|w)\), i.e. using probabilities \(p(smoke.yes|age),~p(smoke.no|age)\).

Given simulated variables for age and for smoke, simulate mortality variable using distribution \(p(dead|v,w),~p(alive|v,w)\).

Using the described procedure simulate outcomes for 100 participants.
Use seed set.seed(8420) for comparison.

set.seed(8420)

# rMultinom() from the Hmisc package (loaded earlier)
sim.ages = rMultinom(rbind(c(0.0890411,0.2138508,0.1750381,0.1582953,0.1796043,0.1255708,0.0585997)), 100)
sim.ages = as.vector(sim.ages)
# note: smoker and dead are drawn here from their marginal probabilities,
# not conditionally on age and smoking status
sim.smoker = rbinom(100,1,0.4429224)
sim.dead = rbinom(100,1,0.2808219)
simulatedData = data.frame(sim.ages,sim.smoker,sim.dead)
head(simulatedData,25)
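
The code above draws sim.smoker and sim.dead from their marginal probabilities. A minimal sketch of the conditional scheme described above (age from \(p(w)\), smoking status from \(p(v|w)\), mortality from \(p(u|v,w)\)), assuming femsmoke.joint.p and wMarginal.age from the previous subsections are available (the helpers p.vw, p.v.given.w, p.dead.given.vw are introduced here only for illustration):

set.seed(8420)
n <- 100
p.vw            <- apply(femsmoke.joint.p, c(1, 2), sum)        # p(smoker, age)
p.v.given.w     <- sweep(p.vw, 2, wMarginal.age, "/")           # p(smoker | age)
p.dead.given.vw <- femsmoke.joint.p[, , "dead.yes"] / p.vw      # p(dead | smoker, age)
sim.age    <- sample(1:7, n, replace = TRUE, prob = wMarginal.age)
sim.smoker <- rbinom(n, 1, p.v.given.w["smoker.yes", sim.age])  # 1 = smoker
sim.dead   <- rbinom(n, 1, ifelse(sim.smoker == 1,
                                  p.dead.given.vw["smoker.yes", sim.age],
                                  p.dead.given.vw["smoker.no",  sim.age]))
head(data.frame(sim.age, sim.smoker, sim.dead))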

5 Cromwell’s Rule, Bayesian Convergence and Divergence

5.1 Cromwell’s Rule

Cromwell’s rule states that the use of prior probabilities of 1 (“the event will definitely occur”) or 0 (“the event will definitely not occur”) should be avoided, except when applied to statements that are logically true or false, such as 2 + 2 equaling 4 or 5.

This is a very “Bayesian” statement: no matter how implausible a hypothesis might seem, always assign it at least a tiny chance in the form of the prior probability.

Assigning a prior of zero leaves no chance for the posterior to be anything other than zero, no matter how strong the evidence is.
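
This follows directly from Bayes’ theorem (with \(B\) denoting the hypothesis and \(D\) the data, as in the examples below): a zero prior annihilates the numerator for any likelihood, \[P\left(B|D\right) = \frac{P\left(D|B\right)P\left(B\right)}{P\left(D|B\right)P\left(B\right)+P\left(D|\bar{B}\right)P\left(\bar{B}\right)} = \frac{P\left(D|B\right)\cdot 0}{P\left(D|B\right)\cdot 0 + P\left(D|\bar{B}\right)\cdot 1} = 0.\]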

5.2 Bayesian Convergence

This example reconstructs the graph in Figure 8.8 of “The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t”, Nate Silver, 2020.

Two investors want to understand if the market is in bull mode or not. They agreed on the following definitions of bull market and no-bull (bear) market:

  • Daily market price changes follow a binomial distribution, where success (1) means an uptick and failure (0) means a downtick
  • In a bull market the probability of success is 65%
  • In a bear market the probability of success is 35%

One of the investors is an optimist and believes with 80% confidence that the market is in bull mode.
The other investor is a pessimist and believes in a bull market with only 10% confidence.
Both of them are somewhat open-minded and let their beliefs change a little with each daily observation of the market: both become a little more optimistic on each uptick day and a little less optimistic on each downtick day.

Observe the evolution of their beliefs over a simulated market history.

Set prior beliefs of both investors as probabilities of bull market \(P\left(B \right)\)

p_optimist = .8
p_pessimist = .1

Formalize definitions of bull market \(B\) and bear market \(\bar{B}\) as conditional probabilities of uptick \(U\): \[P\left(U|B\right) = 0.65,\] \[P\left(U|\bar{B}\right) = 0.35.\]

pUp_bull = .65
pUp_no_bull = .35

Simulate a sequence of upticks and downticks from a binomial distribution with probability of success only slightly greater than 0.5, i.e. not quite a bull market.
Transform the binomial sequence into a market trajectory using a cumulative sum.

num_market_obs = 40
p_market = .53
set.seed(5)
binomial_sample <- rbinom(n=num_market_obs, size=1, prob=p_market)
market_obs = cumsum((binomial_sample - .5)/10)+.2
plot(market_obs,type = "l")

Formalize the learning process using Bayes’ theorem for probability of bull market, given the data evidence \(P \left(B|D \right)\): \[P \left(B|D \right) = \frac{P \left(D|B \right)P \left(B \right)}{P \left(D|B \right)P \left(B \right)+P \left(D|\bar{B} \right)P \left(\bar{B} \right)}.\]
Here \(D\) is either 1 (uptick) or 0 (downtick).

Define a function taking one market observation (a binomial sample), the prior probability of a bull market, and the definitions of the bull and bear markets. The function returns the posterior probability of a bull market.

p_post <- function(dat, p_prior, pUp_b, pUp_no_b){
  if (dat == 1){
    pos_prob = pUp_b*p_prior/(pUp_b*p_prior+pUp_no_b*(1-p_prior))
  } else {
    pos_prob = (1-pUp_b)*p_prior/((1-pUp_b)*p_prior+(1-pUp_no_b)*(1-p_prior))
  }
  return(pos_prob)
}


cat('Optimist prior for bull market: ',p_optimist,
      '\nOptimist posterior after uptick: ',p_post(1,p_optimist,pUp_bull,pUp_no_bull),
      '\nOptimist posterior after downtick: ',p_post(0,p_optimist,pUp_bull,pUp_no_bull))
## Optimist prior for bull market:  0.8 
## Optimist posterior after uptick:  0.8813559 
## Optimist posterior after downtick:  0.6829268

Calculate and plot updates of the learning process by both investors as market evidence is observed day by day.

learning_process <- function(price_changes, p_prior, pUp_b, pUp_no_b){
  investor_learning = p_prior
  for (ind in 1:length(price_changes)){
    post_prob <- p_post(price_changes[ind], tail(investor_learning,1), pUp_b, pUp_no_b)
    investor_learning <- append(investor_learning, post_prob)
  }
  return(investor_learning[-1])
}
optimist_learning <- learning_process(binomial_sample, p_optimist, pUp_bull, pUp_no_bull)
pessimist_learning <- learning_process(binomial_sample, p_pessimist, pUp_bull, pUp_no_bull)
plot(1:length(binomial_sample), optimist_learning, type="o", col="blue", pch="o", ylim = c(0,1),  lty=2, xlab = "", ylab = "")
points(1:length(binomial_sample), pessimist_learning, col="orange", pch="*")
lines(1:length(binomial_sample), pessimist_learning, col="orange", lty=2)
points(1:length(binomial_sample), market_obs, col="black", pch="+")
lines(1:length(binomial_sample), market_obs, col="black", lty=2)
legend(32,0.3,legend=c("optimist","pessimist","market"), col=c("blue","orange","black"),
                                   pch=c("o","*","+"), lty=c(1,2,3), ncol=1)

Run the same experiment for market conditions closer to the bull market definition, with the probability of an uptick equal to 60%.

p_market = .6
set.seed(5)
binomial_sample <- rbinom(n=num_market_obs, size=1, prob=p_market)
market_obs = cumsum((binomial_sample - .5)/10)+.2
plot(market_obs,type = "l")

optimist_learning <- learning_process(binomial_sample, p_optimist, pUp_bull, pUp_no_bull)
pessimist_learning <- learning_process(binomial_sample, p_pessimist, pUp_bull, pUp_no_bull)
plot(1:length(binomial_sample), optimist_learning, type="o", col="blue", pch="o", ylim = c(0,1),  lty=2, xlab = "", ylab = "")
points(1:length(binomial_sample), pessimist_learning, col="orange", pch="*")
lines(1:length(binomial_sample), pessimist_learning, col="orange", lty=2)
points(1:length(binomial_sample), market_obs, col="black", pch="+")
lines(1:length(binomial_sample), market_obs, col="black", lty=2)
legend(32,0.3,legend=c("optimist","pessimist","market"), col=c("blue","orange","black"),
                                   pch=c("o","*","+"), lty=c(1,2,3), ncol=1)

This example demonstrates the effect called Bayesian convergence: individuals with different prior beliefs who are presented with the same evidence and update coherently by Bayes’ theorem converge to the same conclusions.

5.3 Bayesian Divergence

This example is a manifestation of Cromwell’s rule.
Two individuals meet a stranger who has 3 coins: two of them are fair and one is unfair, having “heads” on both sides.

The stranger selects one coin and tosses it 3 times, showing all 3 times “heads”.

The first individual, Tim, believes that the stranger picked the coin randomly with prior probability \(\frac{1}{3}\) assigned to each of the coins.

Another individual, Susan, believes that the stranger selected one of the 2 fair coins, leaving prior probability of the unfair coin equal to 0.

Find Tim’s posterior probability that the unfair coin was selected, given the evidence.
Calculate Susan’s posterior probability of the unfair coin, given the evidence.

The stranger tossed the same coin 2 more times and observed 2 more heads. How did Tim’s and Susan’s posterior probabilities change?

The stranger tossed the same coin one more time showing “tails”. What happens with the posterior probabilities?
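
A minimal sketch of these calculations (the likelihood of \(k\) consecutive heads is 1 for the two-headed coin and \((1/2)^k\) for a fair coin; post.unfair is a helper introduced here only for illustration):

# posterior probability of the unfair coin after observing k heads in a row
post.unfair <- function(k, prior.unfair){
  lik.unfair <- 1             # the two-headed coin always shows heads
  lik.fair   <- (1/2)^k       # a fair coin shows k heads with probability (1/2)^k
  lik.unfair*prior.unfair/(lik.unfair*prior.unfair + lik.fair*(1 - prior.unfair))
}
post.unfair(3, 1/3)   # Tim after 3 heads: 0.8
post.unfair(5, 1/3)   # Tim after 5 heads: ~0.94
post.unfair(3, 0)     # Susan: 0, and it stays 0 regardless of the evidence

A single “tails” has zero likelihood under the two-headed coin, so after observing it the posterior probability of the unfair coin drops to 0 for both Tim and Susan.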

6 References

  • Adapted from lecture notes of the UC Bayesian Methods course.
  • Gelman, A., et al. (2013). Bayesian Data Analysis, 3rd Edition. CRC Press.
  • Kruschke, J. K. (2015). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, an imprint of Elsevier.