We write the mixture model as \[f(x) = \sum_{k=1}^{K}w_kg_k(x)\] We introduce an indicator \(C\), a discrete random variable taking values in \(\{1, 2, \ldots, K\}\).
Then \(X \mid C = k \sim g_k(x)\) and \(Pr(C=k) = w_k\), so marginalizing over \(C\) recovers the mixture density:
\[f(x)=\sum_{k=1}^{K}f(x \mid C=k)Pr(C=k)=\sum_{k=1}^{K}w_kg_k(x)\]
Setting up the hierarchical problem:
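\(X \mid C = k \sim g_k(x)\) and \(Pr(C=k) = w_k\)
For each observation, first sample the indicator \(C_i\) with probabilities \(w_1, \ldots, w_K\), then sample \(x_i\) from the component \(g_{C_i}\):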
# Generate n observations from a mixture of two Gaussian
# distributions
n = 50 # Size of the sample to be generated
w = c(0.6, 0.4) # Weights
mu = c(0, 5) # Means
sigma = c(1, 2) # Standard deviations
cc = sample(1:2, n, replace=TRUE, prob=w) # Component indicators
x = rnorm(n, mu[cc], sigma[cc])
# Plot f(x) along with the observations
# just sampled
xx = seq(-5, 12, length=200)
yy = w[1]*dnorm(xx, mu[1], sigma[1]) +
w[2]*dnorm(xx, mu[2], sigma[2])
par(mar=c(4,4,1,1)+0.1)
plot(xx, yy, type="l", ylab="Density", xlab="x", las=1, lwd=2)
points(x, y=rep(0,n), pch=1, col=cc)
\(x_1,x_2,...,x_n\) are observations that have been collected. We assume that the \(x_i\)s are independent and identically distributed and \(x_i \sim f\), where \(f(x) = \sum_{k=1}^{K}w_kg_k(x|\theta_k)\).
The likelihood function is \[L(w_1,\ldots,w_K,\theta_1,\ldots,\theta_K)=\prod_{i=1}^{n}\sum_{k=1}^{K}w_kg_k(x_i|\theta_k)\] This is a product of sums, which expands into \(K^n\) terms, so it is very difficult to work with directly.
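Although the product of sums is awkward analytically, it is straightforward to evaluate numerically on the log scale. The following is a minimal sketch for the Gaussian case only; the helper name `mix_loglik` and the sample values are made up for illustration and do not come from the notes.

```r
# Observed-data log-likelihood of a K-component Gaussian mixture.
# mix_loglik is a hypothetical helper name, not part of the original notes.
mix_loglik = function(x, w, mu, sigma) {
  # Column k holds w_k * g_k(x_i | mu_k, sigma_k) for every observation i
  dens = sapply(seq_along(w), function(k) w[k] * dnorm(x, mu[k], sigma[k]))
  dens = matrix(dens, nrow = length(x))  # keep an n x K matrix even when n = 1
  # Sum over components inside the log, then sum the logs over observations
  sum(log(rowSums(dens)))
}

x = c(-0.3, 0.8, 4.1, 6.2)  # made-up observations
ll = mix_loglik(x, w = c(0.6, 0.4), mu = c(0, 5), sigma = c(1, 2))
print(ll)
```

Working with the log of each inner sum avoids multiplying many small densities together, which can underflow for large \(n\).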
To simplify it, we augment the data with one indicator per observation: \[x_i \mid C_i \sim g_{C_i}(x_i \mid \theta_{C_i}), \quad Pr(C_i=k) = w_k\] where \(C_1,C_2,\ldots,C_n\) are iid.
We write the complete-data likelihood as \[L(w_1,\ldots,w_K,\theta_1,\ldots,\theta_K,C_1,\ldots,C_n)=\prod_{i=1}^{n}\prod_{k=1}^{K}\left[w_kg_k(x_i|\theta_k)\right]^{\mathbb{I}(C_i=k)}\]
where \(\mathbb{I}(C_i=k) = 1\) if \(C_i=k\), and \(\mathbb{I}(C_i=k) = 0\) otherwise. The product then separates into a factor that involves only the component parameters and a factor that involves only the weights:
\[L(w_1,\ldots,w_K,\theta_1,\ldots,\theta_K,C_1,\ldots,C_n)=\prod_{i=1}^{n}\prod_{k=1}^{K}\left[g_k(x_i|\theta_k)\right]^{\mathbb{I}(C_i=k)}\prod_{k=1}^{K}\prod_{i=1}^{n}w_k^{\mathbb{I}(C_i=k)}\]
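The factorization can be checked numerically. The sketch below assumes Gaussian components; `complete_loglik`, the observations, and the indicator vector are made up for illustration. It evaluates the complete-data log-likelihood and confirms that the weight factor depends on the indicators only through the component counts \(n_k\).

```r
# Complete-data log-likelihood for a Gaussian mixture, given indicators cc.
# complete_loglik is a hypothetical helper name used only for illustration.
complete_loglik = function(x, cc, w, mu, sigma) {
  # Indexing by cc keeps, for each i, the single term whose indicator equals 1
  sum(dnorm(x, mu[cc], sigma[cc], log = TRUE)) + sum(log(w[cc]))
}

# Made-up observations and indicators
x  = c(-0.3, 0.8, 4.1, 6.2, 0.1)
cc = c(1, 1, 2, 2, 1)
w = c(0.6, 0.4); mu = c(0, 5); sigma = c(1, 2)

# Weight factor: product over k of w_k^(n_k), with n_k the count in component k
nk = tabulate(cc, nbins = length(w))
weight_term = sum(nk * log(w))
# Density factor: each observation evaluated under its own component
density_term = sum(dnorm(x, mu[cc], sigma[cc], log = TRUE))

print(all.equal(complete_loglik(x, cc, w, mu, sigma), weight_term + density_term))
```

Because the two factors share no parameters, the weights and the component parameters can be handled separately once the indicators are known.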
\[f_1(x)=(0.7)\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\left (\frac{x-(0)}{(1)}\right)^2\right]+(0.3)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left(\frac{x-(1)}{(2)}\right)^2\right]\] where we have \(w_1=0.7,\space w_2=0.3,\space \mu_1=0,\space \mu_2=1,\space \sigma_1=1,\space \sigma_2=2\)
\[f_2(x)=(0.3)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left (\frac{x-(1)}{(2)}\right)^2\right]+(0.7)\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\left(\frac{x-(0)}{(1)}\right)^2\right]\] where we have \(w_1=0.3,\space w_2=0.7,\space \mu_1=1,\space \mu_2=0,\space \sigma_1=2,\space \sigma_2=1\)
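The two parameterizations above list the same two components in a different order. A quick numerical check, sketched here on an arbitrarily chosen grid, suggests that they define the same density, so the component labels cannot be recovered from the density alone.

```r
# f1 and f2 use the same components with the labels swapped
f1 = function(x) 0.7 * dnorm(x, 0, 1) + 0.3 * dnorm(x, 1, 2)
f2 = function(x) 0.3 * dnorm(x, 1, 2) + 0.7 * dnorm(x, 0, 1)

xx = seq(-6, 8, length.out = 200)  # evaluation grid (arbitrary choice)
same_density = isTRUE(all.equal(f1(xx), f2(xx)))
print(same_density)
```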
\[f_1(x)=(0.7)\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\left (\frac{x-(0)}{(1)}\right)^2\right]+(0.3)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left(\frac{x-(1)}{(2)}\right)^2\right]\]
\[f_2(x)=(0.7)\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\left (\frac{x-(0)}{(1)}\right)^2\right]+(0.2)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left(\frac{x-(1)}{(2)}\right)^2\right]+(0.1)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left(\frac{x-(1)}{(2)}\right)^2\right]\]
\[f_3(x)=(0.7)\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\left (\frac{x-(0)}{(1)}\right)^2\right]+(0.3)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left(\frac{x-(1)}{(2)}\right)^2\right]+(0)\frac{1}{\sqrt{2\pi}(3)}\exp\left[-\frac{1}{2}\left(\frac{x-(100)}{(3)}\right)^2\right]\]
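The three expressions above differ in the number of components: \(f_2\) splits the second component of \(f_1\) into two identical pieces, and \(f_3\) adds a component with weight zero. The sketch below uses a hypothetical helper `dmix` (not from the notes) to check on an arbitrary grid that all three give the same density values.

```r
# Density of a Gaussian mixture; dmix is a hypothetical helper name
dmix = function(x, w, mu, sigma) {
  total = numeric(length(x))
  # Accumulate the weighted component densities one at a time
  for (k in seq_along(w)) total = total + w[k] * dnorm(x, mu[k], sigma[k])
  total
}

xx = seq(-6, 8, length.out = 200)  # evaluation grid (arbitrary choice)
y1 = dmix(xx, w = c(0.7, 0.3),      mu = c(0, 1),      sigma = c(1, 2))
y2 = dmix(xx, w = c(0.7, 0.2, 0.1), mu = c(0, 1, 1),   sigma = c(1, 2, 2))
y3 = dmix(xx, w = c(0.7, 0.3, 0),   mu = c(0, 1, 100), sigma = c(1, 2, 3))

print(c(isTRUE(all.equal(y1, y2)), isTRUE(all.equal(y1, y3))))
```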
\[f(x)=w\lambda_1e^{-\lambda_1x}+(1-w)\lambda_2e^{-\lambda_2x}\]
What is the complete-data likelihood associated with the indicator vector \((1, 1, 2, 1, 2)\)?
Recall the definition of the complete-data likelihood: it is (proportional to) the joint distribution of the indicator variables and the observations. Since the indicator vector is \((1, 1, 2, 1, 2)\), there are 3 observations in group 1 and 2 observations in group 2. So, the first term must be \(w^3(1 - w)^2\).
Now, the distribution of the data given the indicators is a product over the two components: for the first component, we have 3 exponential densities (associated with the first, second and fourth observations, which add up to 19.6), and for the second component we have 2 exponential densities (associated with the third and fifth observations, which add up to 15.3).
Therefore, the complete-data likelihood is: \[w^3(1-w)^2\lambda_1^3e^{-19.6\lambda_1}\lambda_2^2e^{-15.3\lambda_2}\]
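As a rough illustration, the complete-data likelihood can be evaluated from the group counts (3 and 2) and the group sums (19.6 and 15.3) stated above. The function name `complete_lik` and the parameter values passed to it are assumptions made for this sketch.

```r
# Complete-data likelihood for the two-component exponential mixture,
# written in terms of the group counts (n1, n2) and group sums (s1, s2).
# complete_lik is a hypothetical helper name.
complete_lik = function(w, lambda1, lambda2,
                        n1 = 3, s1 = 19.6, n2 = 2, s2 = 15.3) {
  weight_part = w^n1 * (1 - w)^n2           # indicators given the weights
  comp1_part  = lambda1^n1 * exp(-s1 * lambda1)  # observations in group 1
  comp2_part  = lambda2^n2 * exp(-s2 * lambda2)  # observations in group 2
  weight_part * comp1_part * comp2_part
}

# Evaluate at an arbitrary parameter value
print(complete_lik(w = 0.5, lambda1 = 0.2, lambda2 = 0.1))

# Each factor can be maximized on its own: w = n1 / n, lambda_k = n_k / s_k
print(complete_lik(w = 3 / 5, lambda1 = 3 / 19.6, lambda2 = 2 / 15.3))
```

Because the expression is a product of three factors with no shared parameters, it is maximized at \(w = 3/5\), \(\lambda_1 = 3/19.6\) and \(\lambda_2 = 2/15.3\), given this particular indicator vector.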
The observed data likelihood \(L_O(w,\lambda_1,\lambda_2;x)\) for this sample is given by the product
\[\begin{aligned}
&\left\{w\lambda_1e^{-3.5\lambda_1}+(1-w)\lambda_2e^{-3.5\lambda_2}\right\}\\
&\times\left\{w\lambda_1e^{-9.7\lambda_1}+(1-w)\lambda_2e^{-9.7\lambda_2}\right\}\\
&\times\left\{w\lambda_1e^{-7.1\lambda_1}+(1-w)\lambda_2e^{-7.1\lambda_2}\right\}
\end{aligned}\]
The observed-data likelihood is proportional to the joint distribution of the data alone, with the indicators summed out. Because observations are independent (this is a random sample), that corresponds to \(f(3.5) \times f(9.7) \times f(7.1)\).
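A minimal sketch of this product in R, using the three observations 3.5, 9.7 and 7.1 from the text. The function name `obs_lik` and the parameter values are assumptions chosen only to show the evaluation.

```r
# Observed-data likelihood for the two-component exponential mixture.
# obs_lik is a hypothetical helper name.
obs_lik = function(w, lambda1, lambda2, x = c(3.5, 9.7, 7.1)) {
  # dexp(x, rate) is rate * exp(-rate * x), so each entry of fx is f(x_i)
  fx = w * dexp(x, rate = lambda1) + (1 - w) * dexp(x, rate = lambda2)
  prod(fx)
}

# Evaluate at an arbitrary parameter value
print(obs_lik(w = 0.5, lambda1 = 0.2, lambda2 = 0.1))
```

Unlike the complete-data likelihood, this expression does not separate into one factor per parameter, which is why the indicator-based formulation is convenient.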