Some classical algorithms for parameter estimation in the genetic linkage model.

Model: 197 animals are distributed into four categories, Y = (y1, y2, y3, y4) = (125, 18, 20, 34), with cell probabilities ((2+θ)/4, (1−θ)/4, (1−θ)/4, θ/4) according to the genetic linkage model. The likelihood function is:

L(θ|Y) = C * ((2+θ)/4)^y1 * ((1-θ)/4)^y2 * ((1-θ)/4)^y3 * (θ/4)^y4
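With the observed counts Y = (125, 18, 20, 34) used throughout the code below, the log-likelihood (up to the constant log C) can be written directly; a minimal sketch in R (the name loglik is ours, not from the original):

y <- c(125, 18, 20, 34)   # observed category counts
# log-likelihood of theta, up to the additive constant log C
loglik <- function(theta) {
  y[1]*log((2 + theta)/4) + (y[2] + y[3])*log((1 - theta)/4) + y[4]*log(theta/4)
}
loglik(0.6268215)   # value at the estimate found below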

Newton-Raphson Algorithm

For genetic linkage model:

l(θ|Y) = log L(θ|Y) = log C + 125*log((2+θ)/4) + 38*log((1−θ)/4) + 34*log(θ/4)

where 38 = y2 + y3 = 18 + 20 combines the two (1−θ)/4 cells.

θ[i+1] = θ[i] − l'(θ[i]) / l''(θ[i])
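Here the first two derivatives of l(θ|Y), as used in the code below, are

l'(θ) = 125/(2+θ) − 38/(1−θ) + 34/θ

l''(θ) = −125/(2+θ)^2 − 38/(1−θ)^2 − 34/θ^2
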
fun <- function(error = 1e-7)
{
  theta <- 0.2            # starting value
  k <- 1
  while (k <= 10000) {    # cap the number of iterations
    k <- k + 1
    # Newton-Raphson update: theta - l'(theta) / l''(theta)
    theta[k] <- theta[k-1] -
      (125/(2+theta[k-1]) - 38/(1-theta[k-1]) + 34/theta[k-1]) /
      (-125/(2+theta[k-1])^2 - 38/(1-theta[k-1])^2 - 34/theta[k-1]^2)
    if (abs(theta[k] - theta[k-1]) < error)   # stop when successive iterates agree
      break
  }
  cat("theta", theta, "\n")   # print the whole iterate path
  list(theta[k], k)           # estimate and number of iterations
}
fun()
## theta 0.2 0.3917428 0.6130034 0.6270999 0.6268216 0.6268215 0.6268215
## [[1]]
## [1] 0.6268215
## 
## [[2]]
## [1] 7
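As a cross-check (our addition), setting l'(θ) = 0 and clearing denominators gives the quadratic 197θ^2 − 15θ − 68 = 0, whose root in (0,1) can be found with base R's polyroot and agrees with the Newton-Raphson limit:

# roots of 68 + 15*theta - 197*theta^2 = 0, i.e. l'(theta) = 0 with denominators cleared
polyroot(c(68, 15, -197))   # the root in (0, 1) is about 0.6268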

The EM Algorithm

Augmented data (x1, x2, x3, x4, x5), s.t. x1 + x2 = y1 = 125 (the first cell is split into a part with probability 2/4 and a part with probability θ/4), x3 = y2, x4 = y3, x5 = y4.

E step:

The augmented posterior for θ (under a flat prior) is proportional to

θ^(x2+x5) * (1-θ)^(x3+x4)

so that

Q(θ, θ[i]) = E[(x2+x5)*log θ + (x3+x4)*log(1−θ) | θ[i], Y]

where, since x2 | θ[i], Y ~ Binomial(125, θ[i]/(2+θ[i])), the E step gives E[x2 | θ[i], Y] = 125*θ[i]/(2+θ[i]).

M step:

θ[i+1] = (E[x2|θ[i],Y] + x5) / (E[x2|θ[i],Y] + x3 + x4 + x5) = (159*θ[i]+68) / (197*θ[i]+144)
fun <- function(error = 1e-7)
{
  theta <- runif(1, 0, 1)   # random starting value in (0, 1)
  k <- 1
  while (TRUE) {
    k <- k + 1
    # EM update in closed form
    theta[k] <- (159*theta[k-1] + 68) / (197*theta[k-1] + 144)
    if (abs(theta[k] - theta[k-1]) < error) break
  }
  cat('theta', theta, '\n')   # print the whole iterate path
  list(theta[k], k - 1)       # estimate and number of iterations
}
fun()
## theta 0.6944534 0.6353755 0.6279502 0.6269712 0.6268414 0.6268241 0.6268218 0.6268215 0.6268215
## [[1]]
## [1] 0.6268215
## 
## [[2]]
## [1] 8
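Equivalently, the iteration can be written with the E and M steps made explicit; a minimal sketch (the helper name em_step is ours, not from the original) that reproduces the closed-form update above:

# one EM iteration with the E and M steps separated
em_step <- function(theta, y1 = 125, x3 = 18, x4 = 20, x5 = 34) {
  Ex2 <- y1 * theta / (2 + theta)     # E step: E[x2 | theta, Y]
  (Ex2 + x5) / (Ex2 + x3 + x4 + x5)   # M step: maximizer of Q(theta, theta[i])
}
em_step(0.5)   # equals (159*0.5 + 68) / (197*0.5 + 144)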

Gibbs Sampler

Given the current latent value z (here z stands for the latent count x2), sample a new value θ' for θ from the augmented posterior (under a flat prior), namely

p(θ | z, Y) = Beta(z + x5 + 1, x3 + x4 + 1)

Then sample a new value z' for z from the conditional predictive distribution

p(z | θ', Y) = Binomial(125, θ'/(θ'+2))

where the success probability θ'/(θ'+2) = (θ'/4)/((2+θ')/4) comes from the split of the first cell. We then recover x1 = 125 − z'.

gibbs <- function(n, x3, x4, x5)
{
  mat <- matrix(ncol = 3, nrow = n)   # columns: theta, x1, x2
  theta <- runif(1, 0, 1)             # random starting value for theta
  x1 <- 25                            # arbitrary initial split of y1 = 125
  x2 <- 100                           # (x1 + x2 = 125)
  mat[1, ] <- c(theta, x1, x2)
  for (i in 2:n) {
    # draw theta from the augmented posterior given the current x2
    theta <- rbeta(1, x2 + x5 + 1, x3 + x4 + 1)
    # draw x2 from its conditional predictive distribution given theta
    x2 <- rbinom(1, 125, theta/(theta + 2))
    x1 <- 125 - x2
    mat[i, ] <- c(theta, x1, x2)
  }
  mat
}
result <- gibbs(10000, 18, 20, 34)
summary(result)
##        V1               V2               V3        
##  Min.   :0.4229   Min.   : 25.00   Min.   : 12.00  
##  1st Qu.:0.5894   1st Qu.: 92.00   1st Qu.: 26.00  
##  Median :0.6246   Median : 95.00   Median : 30.00  
##  Mean   :0.6233   Mean   : 95.32   Mean   : 29.68  
##  3rd Qu.:0.6591   3rd Qu.: 99.00   3rd Qu.: 33.00  
##  Max.   :0.9777   Max.   :113.00   Max.   :100.00
hist(result[, 1])   # histogram of the theta draws

plot(result[, 1], type = "l")   # trace plot of the theta draws

x2 <- mean(result[, 3])   # posterior mean of the latent count x2
x2
## [1] 29.6834
x <- seq(0, 1, length.out = 10000)
x3 <- 18
x4 <- 20
x5 <- 34
# plug-in approximation: Beta conditional density at the mean of the x2 draws
plot(x, dbeta(x, x2 + x5 + 1, x3 + x4 + 1), type = "l",
     xlab = "theta", ylab = "density", main = "posterior density of theta by gibbs")

Metropolis Algorithm

pi(θ) = (2+θ)^125 * (1−θ)^38 * θ^34

is proportional to the posterior p(θ|Y) (under a flat prior) for θ ∈ [0,1].

  1. If the chain is currently at θ[n] = θ, generate a candidate value y for the next location θ[n+1] from U[0,1].

  2. With probability a(θ,y) = min{pi(y)/pi(θ), 1}, accept y and move the chain to θ[n+1] = y; otherwise, reject and set θ[n+1] = θ (a sketch of pi on the log scale follows this list).
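Because the U[0,1] proposal is independent of the current state and has constant density, the acceptance ratio reduces to pi(y)/pi(θ); the same constants cancel in the dmultinom ratio used in the code below. A minimal sketch of pi on the log scale (the name log_pi is ours, not from the original):

# unnormalized log-posterior; constants cancel in the Metropolis ratio
log_pi <- function(theta) 125*log(2 + theta) + 38*log(1 - theta) + 34*log(theta)
# acceptance probability for a candidate y given the current value theta:
# min(1, exp(log_pi(y) - log_pi(theta)))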

N <- 10000
theta <- vector(length = N)
theta[1] <- 0.1                  # starting value
u <- runif(N)                    # uniform draws for the accept/reject decisions
for (i in 2:N)
{
  y <- runif(1, 0, 1)            # candidate from the U[0,1] proposal
  # acceptance ratio pi(y)/pi(theta[i-1]); dmultinom's normalizing constants cancel
  p_accept <- dmultinom(x = c(125, 18, 20, 34),
                        prob = c((2+y)/4, (1-y)/4, (1-y)/4, y/4)) /
    dmultinom(x = c(125, 18, 20, 34),
              prob = c((2+theta[i-1])/4, (1-theta[i-1])/4,
                       (1-theta[i-1])/4, theta[i-1]/4))
  if (u[i] <= p_accept) {
    theta[i] <- y                # accept the candidate
  } else {
    theta[i] <- theta[i-1]       # reject: stay at the current value
  }
}

summary(theta)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1000  0.5877  0.6226  0.6213  0.6541  0.8521
plot(theta, type = "l")   # trace plot of the chain

hist(theta)
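
One quick diagnostic (our addition, not part of the original) is the empirical acceptance rate of the chain, i.e. the fraction of iterations in which the candidate was accepted:

# consecutive values differ exactly when the candidate was accepted
mean(theta[-1] != theta[-N])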