The goal of this assignment is to implement the Expectation-Maximization (EM) Algorithm in R to fit a finite mixture distribution. The results are compared to those of the normalmixEM function in the R package mixtools.

Let’s start by doing a kernel density estimate on the first dataset.

setwd('/media/haojun/42F4-0A40/comp_homework2')
bar51 <- read.table('bar_51_1.txt')
x <- bar51$x
plot(density(x))

It is reasonable to say that this mixture distribution is composed of two Gaussian distributions. I use k-means clustering to obtain starting values for \(\mu_r\), \(\sigma_r\), and \(\pi_r\).

mem <- kmeans(x,2)$cluster
mu1 <- mean(x[mem==1])
mu2 <- mean(x[mem==2])
sigma1 <- sd(x[mem==1])
sigma2 <- sd(x[mem==2])
pi1 <- sum(mem==1)/length(mem)
pi2 <- sum(mem==2)/length(mem)

Now, the density of this mixture distribution can be expressed as

\[ f(x; \mu, \sigma, \pi) = \sum_{r=1}^{2}\pi_r f_r(x; \mu_r, \sigma_r) \]
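
For concreteness, this mixture density can be written as a small R helper. The function name dmix is mine, introduced only for illustration; it is not used in the fitting code below.

# two-component normal mixture density
# mu, sigma, pi are length-2 vectors of component means, SDs, and weights
dmix <- function(x, mu, sigma, pi) {
  pi[1] * dnorm(x, mu[1], sigma[1]) + pi[2] * dnorm(x, mu[2], sigma[2])
}

# example: evaluate the mixture density at a few points using the k-means starting values
# dmix(c(0.3, 0.4), mu = c(mu1, mu2), sigma = c(sigma1, sigma2), pi = c(pi1, pi2))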

The complete data log likelihood can be expressed as

\[ \ell(x, u; \mu, \sigma, \pi) = \sum_{j=1}^{n}\sum_{r=1}^{2} I(u_j=r)\left[\log\pi_r+\log f_r(x_j; \mu_r, \sigma_r)\right]\]

where \(I(u_j=r)\) indicates whether observation \(x_j\) belongs to subpopulation \(r\).

In the E step, the conditional probability that each observation \(x_j\) belongs to subpopulation \(r\), given the current parameter values, is computed first

\[ Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi) = \frac{\pi_r f_r(x_j; \mu_r, \sigma_r)}{\sum_{s=1}^{2} \pi_s f_s(x_j; \mu_s, \sigma_s)} \]
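
As a standalone sketch of this step (the helper name e_step and the restriction to two components are my own, for illustration):

# E step: posterior probability that each observation belongs to each component
e_step <- function(x, mu, sigma, pi) {
  comp1 <- pi[1] * dnorm(x, mu[1], sigma[1])
  comp2 <- pi[2] * dnorm(x, mu[2], sigma[2])
  total <- comp1 + comp2
  cbind(comp1 / total, comp2 / total)  # n x 2 matrix of responsibilities
}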

Then the expected value of the complete data log likelihood, taken over a random sample of size \(n\), can be expressed as

\[ Q = \sum_{j=1}^{n} \sum_{r=1}^{2} Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi) \left[\log\pi_r+\log f_r(x_j; \mu_r, \sigma_r)\right] \]

In the M step, the maximizing values of \(\mu_r\), \(\sigma_r\), and \(\pi_r\) are calculated as

\[ \pi_r^{'} = \frac{\sum_{j=1}^{n}Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi)}{n} \]

\[ \mu_r^{'} = \frac{\sum_{j=1}^{n}x_jPr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi)}{\sum_{j=1}^{n}Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi)} \]

\[ \sigma_r^{'} = \sqrt{\frac{\sum_{j=1}^{n}(x_j- \mu_r^{'})^2Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi)}{\sum_{j=1}^{n}Pr(u_j=r \left\vert x_j\right.; \mu, \sigma, \pi)}} \]
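
A matching sketch of the M step using these closed-form updates (again, the helper name m_step is mine and assumes the n x 2 responsibility matrix produced by e_step above):

# M step: update mixing proportions, means, and standard deviations
m_step <- function(x, p) {
  pi.new    <- colSums(p) / length(x)
  mu.new    <- colSums(p * x) / colSums(p)
  sigma.new <- sqrt(c(sum(p[,1] * (x - mu.new[1])^2) / sum(p[,1]),
                      sum(p[,2] * (x - mu.new[2])^2) / sum(p[,2])))
  list(mu = mu.new, sigma = sigma.new, pi = pi.new)
}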

In the next iteration, the expected value of the log likelihood is recalculated using the new parameters and compared to \(Q\) from the previous step. This process is repeated until the convergence criterion is met, which in this case is a change in \(Q\) of less than \(10^{-6}\). Putting this all together, the EM Algorithm is implemented as below.

# modified sum that only considers finite values (guards against -Inf from log(0))
sum.finite <- function(x) {
  sum(x[is.finite(x)])
}

# Q[1] is a dummy value so that the first convergence check always fails
Q <- 0
# crude starting value for the expected log likelihood, based on the k-means estimates
Q[2] <- sum.finite(log(pi1)+log(dnorm(x, mu1, sigma1))) + sum.finite(log(pi2)+log(dnorm(x, mu2, sigma2)))

k <- 2

while (abs(Q[k]-Q[k-1])>=1e-6) {
  # E step: responsibilities p1, p2 of the two components for each observation
  comp1 <- pi1 * dnorm(x, mu1, sigma1)
  comp2 <- pi2 * dnorm(x, mu2, sigma2)
  comp.sum <- comp1 + comp2
  
  p1 <- comp1/comp.sum
  p2 <- comp2/comp.sum
  
  # M step: update the mixing proportions, means, and standard deviations
  pi1 <- sum.finite(p1) / length(x)
  pi2 <- sum.finite(p2) / length(x)
  
  mu1 <- sum.finite(p1 * x) / sum.finite(p1)
  mu2 <- sum.finite(p2 * x) / sum.finite(p2)
  
  sigma1 <- sqrt(sum.finite(p1 * (x-mu1)^2) / sum.finite(p1))
  sigma2 <- sqrt(sum.finite(p2 * (x-mu2)^2) / sum.finite(p2))
  
  # log likelihood of the observed data, evaluated at the parameters used in the E step above
  k <- k + 1
  Q[k] <- sum(log(comp.sum))
}

It takes 120 iterations for the EM Algorithm to converge. The estimated parameters are \(\mu_1=\) 0.3312, \(\mu_2=\) 0.4464, \(\sigma_1=\) 0.0195, \(\sigma_2=\) 0.0548, \(\pi_1=\) 0.1611, and \(\pi_2=\) 0.8389.
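
For reference, these quantities can be read off after the loop, assuming the first two entries of Q are treated as seed values rather than iterations:

# number of EM iterations (Q[1] and Q[2] are seed values)
length(Q) - 2
# final parameter estimates
c(mu1 = mu1, mu2 = mu2, sigma1 = sigma1, sigma2 = sigma2, pi1 = pi1, pi2 = pi2)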

Next, the normalmixEM function in the R package mixtools is used on the same dataset, and the results follow

library(mixtools)
gm <- normalmixEM(x, k=2, lambda=c(0.9,0.1), mu=c(0.4,0.3), sigma=c(0.05,0.02))
## number of iterations= 84
gm$mu
## [1] 0.4463909 0.3312405
gm$sigma
## [1] 0.05483107 0.01948010
gm$lambda  # mixing proportions
## [1] 0.8390532 0.1609468

The results agree with my EM Algorithm implementation.
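
Beyond the point estimates, the posterior membership probabilities can also be compared. mixtools returns them in gm$posterior; the vector post1 below is a name I introduce, and the component order in gm$posterior may be reversed relative to my labelling.

# responsibilities from my fit, recomputed at the final parameter estimates
post1 <- pi1 * dnorm(x, mu1, sigma1) /
         (pi1 * dnorm(x, mu1, sigma1) + pi2 * dnorm(x, mu2, sigma2))
# compare with the corresponding column of gm$posterior
head(cbind(post1, gm$posterior))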

To visualize how good the fit is, let’s look at the density plot.

# histogram of the data with the kernel density estimate and fitted mixture density overlaid
hist(x, prob=T, breaks=32, xlim=c(range(x)[1], range(x)[2]), main='')
lines(density(x), col="green", lwd=2)
x1 <- seq(from=range(x)[1], to=range(x)[2], length.out=1000)
y <- pi1 * dnorm(x1, mean=mu1, sd=sigma1) + pi2 * dnorm(x1, mean=mu2, sd=sigma2)
lines(x1, y, col="red", lwd=2)
legend('topright', col=c("green", "red"), lwd=2, legend=c("kernel", "fitted"))

This two-component mixture of normal densities fits the first dataset very well.

\newpage

The same process is repeated for the second dataset.

It is reasonable to say that this mixture distribution is also composed of two Gaussian distributions. Again, I use k-means clustering to obtain starting values for \(\mu_r\), \(\sigma_r\), and \(\pi_r\).
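
The code is essentially identical to that for the first dataset and is not repeated in full; a minimal sketch is below. The file name bar_51_2.txt is a placeholder I introduce for illustration (the actual file name is not shown here), and the file is assumed to have the same single-column format as the first dataset.

# read the second dataset (placeholder file name, same format assumed)
bar52 <- read.table('bar_51_2.txt')
x <- bar52$x

# k-means starting values, exactly as before
mem <- kmeans(x, 2)$cluster
mu1 <- mean(x[mem==1]);  mu2 <- mean(x[mem==2])
sigma1 <- sd(x[mem==1]); sigma2 <- sd(x[mem==2])
pi1 <- sum(mem==1)/length(mem); pi2 <- sum(mem==2)/length(mem)

# then rerun the EM loop above unchanged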

It takes 138 iterations for the EM Algorithm to converge. The estimated parameters are \(\mu_1=\) 0.2522, \(\mu_2=\) 0.406, \(\sigma_1=\) 0.0475, \(\sigma_2=\) 0.0786, \(\pi_1=\) 0.4919, and \(\pi_2=\) 0.5081. The results agree with those from the normalmixEM function in the R package mixtools.

library(mixtools)
gm <- normalmixEM(x,k=2,lambda=c(0.5,0.5),mu=c(0.3,0.4),sigma=c(0.05,0.06))
## number of iterations= 154
gm$mu
## [1] 0.2521802 0.4058755
gm$sigma
## [1] 0.04744476 0.07862689
gm$lambda  # mixing proportions
## [1] 0.4914139 0.5085861

This two-component mixture of normal densities fits the second dataset fairly well.