Finding the parameters that maximize the likelihood function, as an algebraic function of the data, can be difficult or impossible. A common example is missing data; other examples involve 'hidden' or 'latent' variables.
Steps:
1 Expectation Step: Given the current estimates of the parameters θ̂, we compute the expected values of the latent variables (the unobserved labels) given the data.
2 Maximization Step: We plug these expected values back into the complete-data likelihood and maximize over the parameters to obtain updated estimates θ̂.
The EM algorithm is an iterative method for approximating the maximum of a likelihood function.
A typical example is a mixture distribution: we have a set of component distributions, and each observation comes from one of them with some probability, but we only observe the value of each observation, not which component it came from. The formulas below make the two steps concrete for this case.
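For a two-component Gaussian mixture with unit variances (matching the simulation below), one EM iteration has a simple closed form. This is a sketch of the updates that the hand-coded loop below implements, with γ_ik denoting the posterior ("membership") probabilities.

E-step, given the current τ and μ:

$$\gamma_{i1} = \frac{\tau_1\,\phi(x_i \mid \mu_1, 1)}{\tau_1\,\phi(x_i \mid \mu_1, 1) + \tau_2\,\phi(x_i \mid \mu_2, 1)}, \qquad \gamma_{i2} = 1 - \gamma_{i1}$$

M-step, updating the mixing proportions and means:

$$\tau_k = \frac{1}{n}\sum_{i=1}^{n} \gamma_{ik}, \qquad \mu_k = \frac{\sum_{i=1}^{n} \gamma_{ik}\, x_i}{\sum_{i=1}^{n} \gamma_{ik}}$$

where φ(· | μ, 1) is the normal density with mean μ and standard deviation 1 (dnorm in the code).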
Generate simulated data
# tau_1_true is the probability that we are drawing from distribution 1
# rather than distribution 2 - i.e., Bernoulli trials.
set.seed(123)
tau_1_true <- 0.25
x <- y <- rep(0, 1000)
for( i in 1:1000 ) {
  if( runif(1) < tau_1_true ) {
    x[i] <- rnorm(1, mean=1)   # component 1: N(1, 1)
    y[i] <- "heads"
  } else {
    x[i] <- rnorm(1, mean=7)   # component 2: N(7, 1)
    y[i] <- "tails"
  }
}
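A quick sanity check on the simulated labels; the proportion of "heads" should be close to tau_1_true = 0.25:
table(y)            # counts of observations drawn from each component
mean(y == "heads")  # empirical proportion from component 1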
library(lattice)
# density of the observed values; the two modes correspond to the two components
densityplot( ~x, par.settings = list( plot.symbol=list( col=as.factor(y) ) ) )
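If lattice is not available, a base-R histogram gives the same bimodal picture (a minimal sketch, not part of the original analysis):
hist(x, breaks = 50, freq = FALSE, main = "Simulated mixture", xlab = "x")
lines(density(x))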
EM by hand
############## iterations
# Now we simply repeat the following process until the difference between
# successive estimates of the means is very small.
# We'll run a fixed 10 iterations here; a tolerance-based variant is sketched after the output below.
## set the initial guesses for the distribution parameters
## (arbitrary starting values for the means mu_1, mu_2 and mixing proportions tau_1, tau_2)
mu_1 <- 0
mu_2 <- 1
## as well as the latent variable parameters
tau_1 <- 0.5
tau_2 <- 0.5
for( i in 1:10 ) {
  ## E-step: given the observed data and the current distribution parameters,
  ## what are the expected latent variables?
  T_1 <- tau_1 * dnorm( x, mu_1 )
  T_2 <- tau_2 * dnorm( x, mu_2 )
  # posterior membership probabilities
  P_1 <- T_1 / (T_1 + T_2)
  P_2 <- T_2 / (T_1 + T_2)
  tau_1 <- mean(P_1)
  tau_2 <- mean(P_2)
  ## M-step: given the observed data and the latent variables,
  ## what are the population parameters?
  mu_1 <- sum( P_1 * x ) / sum(P_1)
  mu_2 <- sum( P_2 * x ) / sum(P_2)
  ## print the current estimates
  print( c(mu_1, mu_2, mean(P_1), mean(P_2)) )
  # the estimates converge quickly, within a few iterations
}
## [1] 0.5045618 6.1011529 0.1002794 0.8997206
## [1] 0.8546336 6.9403680 0.2301181 0.7698819
## [1] 0.9732251 7.0006108 0.2423406 0.7576594
## [1] 0.9853947 7.0054109 0.2434347 0.7565653
## [1] 0.9864849 7.0058260 0.2435309 0.7564691
## [1] 0.9865811 7.0058624 0.2435394 0.7564606
## [1] 0.9865895 7.0058656 0.2435401 0.7564599
## [1] 0.9865903 7.0058659 0.2435402 0.7564598
## [1] 0.9865903 7.0058660 0.2435402 0.7564598
## [1] 0.9865904 7.0058660 0.2435402 0.7564598
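For reference, the fixed iteration count can be replaced by the stopping rule described above. This is a sketch only; the tolerance value and the repeat-loop structure are illustrative choices, not part of the original code:
mu_1 <- 0; mu_2 <- 1
tau_1 <- tau_2 <- 0.5
repeat {
  mu_1_old <- mu_1; mu_2_old <- mu_2
  T_1 <- tau_1 * dnorm( x, mu_1 )
  T_2 <- tau_2 * dnorm( x, mu_2 )
  P_1 <- T_1 / (T_1 + T_2)
  P_2 <- T_2 / (T_1 + T_2)
  tau_1 <- mean(P_1); tau_2 <- mean(P_2)
  mu_1 <- sum( P_1 * x ) / sum(P_1)
  mu_2 <- sum( P_2 * x ) / sum(P_2)
  # stop when successive estimates of the means barely change
  if( max(abs(c(mu_1 - mu_1_old, mu_2 - mu_2_old))) < 1e-8 ) break
}
print( c(mu_1, mu_2, tau_1, tau_2) )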
EM by package
########### using the R package mixtools to apply the EM algorithm
library("mixtools")
## mixtools package, version 2.0.0, Released 2022-12-04
## This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772 and the Chan Zuckerberg Initiative: Essential Open Source Software for Science (Grant No. 2020-255193).
myEM <- normalmixEM( x, mu = c(0,1), sigma=c(1,1), sd.constr=c(1,1) )
## number of iterations= 7
myEM$mu
## [1] 0.9865898 7.0058658
myEM$lambda
## [1] 0.2435402 0.7564598
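The fitted object also contains the per-observation posterior membership probabilities, and mixtools provides summary and plot methods. A sketch of how the fit can be inspected (see the package documentation for the exact plotting options):
summary(myEM)              # summary of the fitted mixture
head(myEM$posterior)       # posterior membership probabilities for the first observations
plot(myEM, density = TRUE) # log-likelihood trace and fitted density over the data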
# What if the initial guesses aren't so similar to each other?
# Starting from more widely separated means, the reported iteration count shows the algorithm still converges just as quickly, and it gives essentially the same estimates.
myEM <- normalmixEM( x, mu = c(0,10), sigma=c(1,1), sd.constr=c(1,1) )
## number of iterations= 7
myEM$mu
## [1] 0.9865904 7.0058660
myEM$lambda
## [1] 0.2435402 0.7564598
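As a final check, the posterior membership probabilities can be compared against the true labels stored in y during the simulation. This is a sketch; the column order of $posterior follows the fitted component order, so column 1 is assumed here to be the component with the smaller mean:
# classify each observation by its larger posterior probability
pred <- ifelse( myEM$posterior[, 1] > 0.5, "heads", "tails" )
table( true = y, predicted = pred )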