The goal of this assignment is to implement the Expectation-Maximization (EM) Algorithm to fit a finite mixture distribution in R. The results are compared to the normalmixEM function in the R package mixtools.
Let’s start by doing a kernel density estimate on the first dataset.
# read in the first dataset and look at a kernel density estimate
setwd('/media/haojun/42F4-0A40/comp_homework2')
bar51 <- read.table('bar_51_1.txt')
x <- bar51$x
plot(density(x))
It is reasonable to say that this mixture distribution is composed of two Gaussian distributions. I use k-means clustering to get starting values for \(\mu_r\), \(\sigma_r\), and \(\pi_r\).
# k-means with two clusters provides starting values for the EM iterations
mem <- kmeans(x,2)$cluster
mu1 <- mean(x[mem==1])
mu2 <- mean(x[mem==2])
sigma1 <- sd(x[mem==1])
sigma2 <- sd(x[mem==2])
pi1 <- sum(mem==1)/length(mem)
pi2 <- sum(mem==2)/length(mem)
Now, the density of this mixture distribution can be expressed as
\[ f(x; \mu, \sigma, \pi) = \sum_{r=1}^{2}\pi_r f_r(x; \mu_r, \sigma_r) \]
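As a small sketch, this density can be coded directly with the k-means starting values from above (the helper name dmix2 is only for illustration):
# two-component Gaussian mixture density
dmix2 <- function(x, mu, sigma, prop) {
  prop[1] * dnorm(x, mu[1], sigma[1]) + prop[2] * dnorm(x, mu[2], sigma[2])
}
head(dmix2(x, c(mu1, mu2), c(sigma1, sigma2), c(pi1, pi2)))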
The complete data log likelihood can be expressed as
\[ \ell(x, u; \mu, \sigma, \pi) = \sum_{j=1}^{n}\sum_{r=1}^{2} I(u_j=r)\left[\log\pi_r+\log f_r(x_j; \mu_r, \sigma_r)\right]\]
where \(I(u_j=r)\) indicates whether observation \(x_j\) belongs to subpopulation \(r\).
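For intuition, if the memberships were actually observed, say by treating the k-means labels in mem as a stand-in for \(u_j\), this quantity could be evaluated directly; a small sketch (the vector names are only for illustration):
# complete data log likelihood with the k-means labels standing in for u_j
mu.vec <- c(mu1, mu2); sigma.vec <- c(sigma1, sigma2); pi.vec <- c(pi1, pi2)
sum(log(pi.vec[mem]) + dnorm(x, mu.vec[mem], sigma.vec[mem], log=TRUE))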
In the E step, the conditional probability that each observation \(x_j\) belongs to subpopulation \(r\) is computed first
\[ Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi) = \frac{\pi_r f_r(x_j; \mu_r, \sigma_r)}{\sum_{s=1}^{2} \pi_s f_s(x_j; \mu_s, \sigma_s)} \]
Then the expected value of the log likelihood based on a random sample can be expressed as
\[ Q = \sum_{j=1}^{n} \sum_{r=1}^{2} Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi) \left[\log\pi_r+\log f_r(x_j; \mu_r, \sigma_r)\right] \]
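As another small sketch, \(Q\) can be evaluated in R from the responsibilities, which are called p1 and p2 in the implementation further below (the name Q.check is only for illustration):
# Q: responsibility-weighted complete data log likelihood
Q.check <- sum(p1 * (log(pi1) + dnorm(x, mu1, sigma1, log=TRUE)) +
               p2 * (log(pi2) + dnorm(x, mu2, sigma2, log=TRUE)))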
In the M step, the maximizing values for \(\mu_r\), \(\sigma_r\), and \(\pi_r\) are calculated as
\[ \pi_r^{'} = \frac{\sum_{j=1}^{n}Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi)}{n} \]
\[ \mu_r^{'} = \frac{\sum_{j=1}^{n}x_jPr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi)}{\sum_{j=1}^{n}Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi)} \]
\[ \sigma_r^{'} = \sqrt{\frac{\sum_{j=1}^{n}(x_j- \mu_r^{'})^2Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi)}{\sum_{j=1}^{n}Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi)}} \]
In the next iteration, the log likelihood is recalculated with the updated parameters and compared to its value from the previous step. This process is repeated until the convergence criterion is met, which in this case is a change of less than \(10^{-6}\). Putting this all together, I have the EM Algorithm implemented as below.
# modified sum that only considers finite values
sum.finite <- function(x) {
  sum(x[is.finite(x)])
}
Q <- 0
# starting value for the log likelihood used in the convergence check
Q[2] <- sum.finite(log(pi1) + log(dnorm(x, mu1, sigma1))) +
  sum.finite(log(pi2) + log(dnorm(x, mu2, sigma2)))
k <- 2
while (abs(Q[k] - Q[k-1]) >= 1e-6) {
  # E step: responsibilities of each component for every observation
  comp1 <- pi1 * dnorm(x, mu1, sigma1)
  comp2 <- pi2 * dnorm(x, mu2, sigma2)
  comp.sum <- comp1 + comp2
  p1 <- comp1/comp.sum
  p2 <- comp2/comp.sum
  # M step: update mixing proportions, means, and standard deviations
  pi1 <- sum.finite(p1) / length(x)
  pi2 <- sum.finite(p2) / length(x)
  mu1 <- sum.finite(p1 * x) / sum.finite(p1)
  mu2 <- sum.finite(p2 * x) / sum.finite(p2)
  sigma1 <- sqrt(sum.finite(p1 * (x-mu1)^2) / sum.finite(p1))
  sigma2 <- sqrt(sum.finite(p2 * (x-mu2)^2) / sum.finite(p2))
  # track the observed data log likelihood for the convergence check
  k <- k + 1
  Q[k] <- sum(log(comp.sum))
}
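Once the loop exits, the iteration count and the fitted parameters can be read off the objects above; a small sketch (since k starts at 2, the number of EM iterations is length(Q) - 2):
# number of EM iterations and the estimated parameters
length(Q) - 2
c(mu1=mu1, mu2=mu2, sigma1=sigma1, sigma2=sigma2, pi1=pi1, pi2=pi2)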
It takes 120 iterations for the EM Algorithm to converge. The estimated parameters are \(\mu_1=\) 0.3312, \(\mu_2=\) 0.4464, \(\sigma_1=\) 0.0195, \(\sigma_2=\) 0.0548, \(\pi_1=\) 0.1611, and \(\pi_2=\) 0.8389.
Next, the function normalmixEM in the R package mixtools is used on the same dataset, and the results follow.
library(mixtools)
gm <- normalmixEM(x, k=2, lambda=c(0.9,0.1), mu=c(0.4,0.3), sigma=c(0.05,0.02))
## number of iterations= 84
gm$mu
## [1] 0.4463909 0.3312405
gm$sigma
## [1] 0.05483107 0.01948010
gm$lambda # mixing proportions
## [1] 0.8390532 0.1609468
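A quick side-by-side check of the two sets of estimates (a sketch assuming the objects from my implementation are still in the workspace; normalmixEM happens to report its components in the opposite order here, so they are reordered to match):
# compare my EM estimates with normalmixEM, components reordered to align
rbind(mine = c(mu1=mu1, mu2=mu2, sigma1=sigma1, sigma2=sigma2, pi1=pi1, pi2=pi2),
      mixtools = c(gm$mu[2:1], gm$sigma[2:1], gm$lambda[2:1]))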
The results agree with my EM Algorithm implementation.
To visualize how good the fit is, let’s look at the density plot.
hist(x, prob=TRUE, breaks=32, xlim=range(x), main='')
lines(density(x), col="green", lwd=2)
x1 <- seq(from=range(x)[1], to=range(x)[2], length.out=1000)
y <- pi1 * dnorm(x1, mean=mu1, sd=sigma1) + pi2 * dnorm(x1, mean=mu2, sd=sigma2)
lines(x1, y, col="red", lwd=2)
legend('topright', col=c("green", "red"), lwd=2, legend=c("kernel", "fitted"))
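The two weighted components can also be drawn separately on the same plot; a sketch reusing x1 and the fitted parameters (colors are arbitrary):
# add each weighted component density to the existing plot
lines(x1, pi1 * dnorm(x1, mu1, sigma1), col="blue", lty=2, lwd=2)
lines(x1, pi2 * dnorm(x1, mu2, sigma2), col="purple", lty=2, lwd=2)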
This two component mixture of normal densities fits the first dataset very well.
\newpage
The same process is repeated for the second dataset.
This mixture distribution also appears to be composed of two Gaussian distributions. Again, I use k-means clustering to get starting values for \(\mu_r\), \(\sigma_r\), and \(\pi_r\).
It takes 138 iterations for the EM Algorithm to converge. The estimated parameters are \(\mu_1=\) 0.2522, \(\mu_2=\) 0.406, \(\sigma_1=\) 0.0475, \(\sigma_2=\) 0.0786, \(\pi_1=\) 0.4919, and \(\pi_2=\) 0.5081. The results agree with the function normalmixEM in the R package mixtools.
library(mixtools)
gm <- normalmixEM(x,k=2,lambda=c(0.5,0.5),mu=c(0.3,0.4),sigma=c(0.05,0.06))
## number of iterations= 154
gm$mu
## [1] 0.2521802 0.4058755
gm$sigma
## [1] 0.04744476 0.07862689
gm$lambda # mixing proportions
## [1] 0.4914139 0.5085861
This two component mixture of normal densities fits the second dataset fairly well.
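For a similar visual check on the second dataset, mixtools also provides a plot method for the fitted object; if I read its interface correctly, the following sketch draws the fitted mixture density over a histogram of the data:
# density plot from the mixtools fit (whichplots=2 selects the density panel)
plot(gm, whichplots=2)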