Soal
Pendugaan parameter pada data genetika (Rao 1973, hal 369), diasumsikan data phenotype y=(y1,y2,y3,y4)=(125,18,20,34) memiliki distribusi multinomial dengan peluang
\[\frac{1}{2}+\frac{\theta}{4},\frac{1-\theta}{4},\frac{1-\theta}{4},\frac{\theta}{4}\]
sehingga fungsi peluang bagi y adalah
\[g(y|\theta)=\frac{y_1+y_2+y_3+y_4}{y_1!y_2!y_3!y_4}\left( \frac{1}{2}+\frac{\theta}{4} \right)^{y_1}\left( \frac{1-\theta}{4} \right)^{y_2}\left( \frac{1-\theta}{4} \right)^{y_3}\left( \frac{\theta}{4} \right)^{y_4}\]
Fungsi log-likelihood bagi y adalah
\[logL(\theta,y)=y1log2+\theta+(y2+y3)log1???\theta+y4log \theta\]
Misalkan y data taklengkap dari x=(x1,.,x5) dengan peluang multinomial sebesar
\[\frac{1}{2},\frac{\theta}{4},\frac{1-\theta}{4},\frac{1-\theta}{4},\frac{\theta}{4}\]
di mana y1=x1+x2, y2=x3, y3=x4 dan y4=x5, sehingga diperoleh fungsi peluang x sebagai berikut
\[g(x|\theta)=\frac{x_1+x_2+x_3+x_4+x_5}{x_1!x_2!x_3!x_4!x_5}\left( \frac{1}{2} \right)^{x_1}\left( \frac{\theta}{4} \right)^{x_2}\left( \frac{1-\theta}{4} \right)^{x_3}\left( \frac{1-\theta}{4} \right)^{x_4}\left( \frac{\theta}{4} \right)^{x_5}\]
Fungsi log-likelihood dari x adalah
\[logL(\theta,x)=(x2+x5)log \theta+(x3+x4)log1- \theta\]
Dugalah data hilang x2 dengan EM algorithm!
Jawaban
a. Tahap E
\[Q(\theta)=E(x2+x5|y,\theta^{(0)}log\theta+E(x3+x4|y,\theta^{(0)})log1???\theta =[E(x2|y,\theta^{(0)})+x5]log\theta+(x3+x4)log1???\theta\]
Pendugaan x2 diperoleh sebagai berikut:
\[\hat{x^p_2}=E(x2|y,\theta^{(0)})\]
karena x1+x2=y1, distribusi bersyarat dari x2|y1 adalah binomial dengan parameter y1=125 dan peluang
\[p=\frac{\frac {1}{4}\theta{p}}{\frac{1}{2}+\frac {1}{4}\theta{p}}\]
jadi,
\[\hat{x^p_2}=E(x2|y,\theta^{(0)})={y_1}p\]
\[\hat{x^p_2}=125 \frac{\frac {1}{4}\theta{p}}{\frac{1}{2}+\frac {1}{4}\theta{p}}\]
b. Tahap M
\[Q(\theta)=[E(x2|y,\theta^{(0)})+x5]log \theta+(x3+x4)log1-{\theta}\] \[ = [++ x_5] + (x_3+ x_4)\]
Memaksimumkan \(Q(\theta)\) sehingga diperoleh hasil
\[\frac{dQ(\theta)}{d(\theta)}=0\] \[\frac{\hat{x_2}+{x_5}}{\theta}-\frac{{x_3}+{x_4}}{1-\theta}=0\] \[\hat {\theta}= \frac{\hat{x_2}+{x_5}}{\hat{x_2}+{x_3}+{x_4}+{x_5}}\]
expectation <- function(theta){
125*( ( (1/4)*theta ) / ( (1/2) + (1/4)*theta) )
}
maximization <- function(x2){
(x2 + x5) / (x2 + x3 + x4 + x5)
}
x2=0
x3<- 18
x4<- 20
x5<- 34
niter <- 100
theta0 <- 2
save_iter <- data.frame("iter"=0,"theta"=theta0,"x2"=x2)
for (i in 1:niter){
x2 <- expectation(theta0)
theta <- maximization(x2)
criteria <- abs(theta-theta0)
theta0 <- theta
save_iter <- rbind(save_iter,c(i,theta0,x2))
if (criteria<10^-7){
break
}
}
tail(save_iter)