Some classical algorithms for parameters estimation of the Genetic Linkage Model.
Model: 197 animals are distributed into four categories: Y=(y1,y2,y3,y4),according to the genetic linkage model((2+θ)/4,(1-θ)/4,(1-θ)/4,θ/4),we have the likelihood function:
L(θ|Y) = C * ((2+θ)/4)^y1 * ((1-θ)/4)^y2 * ((1-θ)/4)^y3 * (θ/4)^y4
For genetic linkage model:
l(θ|Y)) = logL(θ|Y) = logC + 125*log((2+θ)/4)+38*log((1−θ)/4)+34*log(θ/4)
θ[i+1]=θ[i] - (y`|θ[i])/(y``|θ[i])
fun<-function(error=1e-7)
{
theta<- 0.2
k<-1
while(k<=10000)
{
k<-k+1
theta[k]<-theta[k-1]-((125/(2+theta[k-1])-38/(1-theta[k-1])+34/theta[k-1]))/
(-125*((2+theta[k-1])^(-2))-38*((1-theta[k-1])^(-2))-34*((theta[k-1])^(-2)))
if(abs(theta[k]-theta[k-1])<error)
break
}
cat("theta",theta)
list(theta[k],k)
}
fun()
## theta 0.2 0.3917428 0.6130034 0.6270999 0.6268216 0.6268215 0.6268215
## [[1]]
## [1] 0.6268215
##
## [[2]]
## [1] 7
Augmented data (x1,x2,x3,x4,x5), s.t. x1+x2=125, x3=y2, x4=y3, x5=y4
E step:
Augmented posterior(under a flat prior) is proportional to
θ^(x2+x5) * (1-θ)^(x3+x4)
then
Q(θ,θ[i]) = E[(X2+X5)logθ + (x3+x4)log(1-θ) | θ[i],Y]
M step:
θ[i+1] = (159*θ[i]+68) / (197*θ[i]+144)
fun<-function(error=1e-7)
{
theta<- runif(1,0,1)
k<-1
while(T)
{
k<-k+1
theta[k]<-(159*theta[k-1]+68)/(197*theta[k-1]+144)
if(abs(theta[k]-theta[k-1])<error) break
}
cat('theta',theta,'\n')
list(theta<-theta[k],iter<-k-1)
}
fun()
## theta 0.6944534 0.6353755 0.6279502 0.6269712 0.6268414 0.6268241 0.6268218 0.6268215 0.6268215
## [[1]]
## [1] 0.6268215
##
## [[2]]
## [1] 8
Given z(here , z
is presented of the latent data x2),sampling a value for θ (θ`) from augmented posterior(under a flat prior) , namely
p(θ|z`,Y) = Beta(z`+x5+1,x3+x4+1)
Then , sampling a value for z(z`) from the conditional predictive distribution:
p(z|θ`,Y) = Bi(125,θ` / (θ`+2))
We can also get x1=125-z
gibbs<-function (n, x3,x4,x5)
{
mat <- matrix(ncol = 3, nrow = n)
theta<-runif(1,0,1)
x1<-25
x2 <- 100
mat[1, ] <- c(theta, x1,x2)
for (i in 2:n) {
theta <- rbeta(1,x2+x5+1,x3+x4+1)
x2 <- rbinom(1,125,theta/(theta+2))
x1<-125-x2
mat[i, ] <- c(theta, x1,x2)
}
mat
}
result<-gibbs(10000,18,20,34)
summary(result)
## V1 V2 V3
## Min. :0.4229 Min. : 25.00 Min. : 12.00
## 1st Qu.:0.5894 1st Qu.: 92.00 1st Qu.: 26.00
## Median :0.6246 Median : 95.00 Median : 30.00
## Mean :0.6233 Mean : 95.32 Mean : 29.68
## 3rd Qu.:0.6591 3rd Qu.: 99.00 3rd Qu.: 33.00
## Max. :0.9777 Max. :113.00 Max. :100.00
hist(result[,1])
plot(result[,1],type="c")
x2<-mean(result[,3])
x2
## [1] 29.6834
x<-seq(0,1,length.out=10000)
x3<-18
x4<-20
x5<-34
plot(x,dbeta(x,x2+x5+1,x3+x4+1),type="l",
xlab="theta",ylab="density",main="posterior density of theta by gibbs")
pi(θ) = (2+θ)^125 * (1-θ)^38 * θ^34
is proportional to p(θ|Y),θ∈[0,1].
If the chain is currently at θ[n]=θ,then generate a candidate value y for next location θ[n+1] from U[0,1].
With probability a(θ,y)=min{pi(y)/pi(θ), 1},accept y and move the chain to θ[n+1]=y; Otherwise,reject and let θ[n+1]=θ.
N = 10000
theta = vector(length = N)
theta[1] = 0.1
# uniform variable: u
u = runif(N)
for (i in 2:N)
{
y = runif(1,0,1)
# print(y)
p_accept = dmultinom(x=c(125,18,20,34),prob=c((2+y)/4,(1-y)/4,
(1-y)/4,y/4))/ dmultinom(x=c(125,18,20,34),prob=c((2+theta[i-1])/4,
(1-theta[i-1])/4,(1-theta[i-1])/4,theta[i-1]/4))
if ((u[i] <= p_accept))
{
theta[i] = y
#print("accept")
}
else
{
theta[i] = theta[i-1]
#print("reject")
}
}
summary(theta)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1000 0.5877 0.6226 0.6213 0.6541 0.8521
plot(theta,type ="l")
hist(theta)