We have the mixture model written as \[f(x) = \sum_{k=1}^{K}w_kg_k(x)\] We will introduce an indicator \(C\), which is a discrete random variable taking values in \(\{1, 2, \ldots, K\}\).
Then \(X \mid C \sim g_C(x)\) with \(Pr(C=k) = w_k\), and
\[f(x)=\sum_{k=1}^{K}f(x \mid C=k)Pr(C=k)=\sum_{k=1}^{K}w_kg_k(x)\]
Setting up the hierarchical problem:
\(X \mid C \sim g_C(x)\) and \(Pr(C=k) = w_k\)
For each observation, we first sample its component indicator and then draw the observation from the corresponding component:
# Generate n observations from a mixture of two Gaussian
# distributions
n = 50                # Size of the sample to be generated
w = c(0.6, 0.4)       # Weights
mu = c(0, 5)          # Means
sigma = c(1, 2)       # Standard deviations
cc = sample(1:2, n, replace=TRUE, prob=w)  # Component indicators
x = rnorm(n, mu[cc], sigma[cc])            # Draw from the indicated component
# Plot f(x) along with the observations just sampled
xx = seq(-5, 12, length=200)
yy = w[1]*dnorm(xx, mu[1], sigma[1]) +
     w[2]*dnorm(xx, mu[2], sigma[2])
par(mar=c(4,4,1,1)+0.1)
plot(xx, yy, type="l", ylab="Density", xlab="x", las=1, lwd=2)
points(x, y=rep(0,n), pch=1, col=cc)
Let \(x_1, x_2, \ldots, x_n\) be the observations we have collected. We assume the \(x_i\) are independent and identically distributed with \(x_i \sim f\), where \(f(x) = \sum_{k=1}^{K}w_kg_k(x \mid \theta_k)\).
The likelihood function is \[L(w_1,\ldots,w_K,\theta_1,\ldots,\theta_K)=\prod_{i=1}^{n}\sum_{k=1}^{K}w_kg_k(x_i \mid \theta_k)\] This expression is very difficult to work with, to say the least: expanding the product of sums produces \(K^n\) terms.
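To make this concrete, here is a minimal sketch that evaluates the observed-data log-likelihood of the two-component Gaussian mixture, reusing w, mu, sigma and the sample x simulated above:

# Observed-data log-likelihood: sum over observations of the
# log of the mixture density evaluated at each point
loglik.obs = sum(log(w[1]*dnorm(x, mu[1], sigma[1]) +
                     w[2]*dnorm(x, mu[2], sigma[2])))
loglik.obs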
\[x_i \mid C_i \sim g_{C_i}(x_i), \quad Pr(C_i=k) = w_k\] where \(C_1, C_2, \ldots, C_n\) are i.i.d.
We write the likelihood as \[L(w_1,\ldots,w_K,\theta_1,\ldots,\theta_K,C_1,\ldots,C_n)=\prod_{i=1}^{n}\prod_{k=1}^{K}\left[w_kg_k(x_i \mid \theta_k)\right]^{\mathbb{I}(C_i=k)}\]
where \(\mathbb{I}(C_i=k) = 1\) if \(C_i=k\), and \(\mathbb{I}(C_i=k) = 0\) otherwise.
Rearranging the factors separates the component densities from the weights: \[L(w_1,\ldots,w_K,\theta_1,\ldots,\theta_K,C_1,\ldots,C_n)=\prod_{i=1}^{n}\prod_{k=1}^{K}\left[g_k(x_i \mid \theta_k)\right]^{\mathbb{I}(C_i=k)}\prod_{k=1}^{K}\prod_{i=1}^{n}w_k^{\mathbb{I}(C_i=k)}\] Note that the second factor is simply \(\prod_{k=1}^{K}w_k^{n_k}\), where \(n_k\) is the number of observations assigned to component \(k\).
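Given the indicators, the complete-data likelihood factorizes over observations, so it is easy to evaluate. A minimal sketch in R, reusing the simulated sample x and its indicators cc from the code above:

# Complete-data log-likelihood: each observation contributes only
# the weight and density of its own component
loglik.comp = sum(log(w[cc]) + dnorm(x, mu[cc], sigma[cc], log=TRUE))
loglik.comp

As an example of the identifiability issues that arise with mixtures, consider the following two ways of writing the same two-component Gaussian mixture: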
\[f_1(x)=(0.7)\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\left (\frac{x-(0)}{(1)}\right)^2\right]+(0.3)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left(\frac{x-(1)}{(2)}\right)^2\right]\] where we have \(w_1=0.7,\space w_2=0.3,\space \mu_1=0,\space \mu_2=1,\space \sigma_1=1,\space \sigma_2=2\)
\[f_2(x)=(0.3)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left (\frac{x-(1)}{(2)}\right)^2\right]+(0.7)\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\left(\frac{x-(0)}{(1)}\right)^2\right]\] where we have \(w_1=0.3,\space w_2=0.7,\space \mu_1=1,\space \mu_2=0,\space \sigma_1=2,\space \sigma_2=1\)
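These two expressions define exactly the same density: the component labels, together with the weights, means, and standard deviations attached to them, have simply been permuted. This is the label-switching problem. A quick numerical check in R:

# Check that f1 and f2 agree on a grid: swapping the component
# labels leaves the mixture density unchanged
xx = seq(-6, 8, length=200)
f1 = 0.7*dnorm(xx, 0, 1) + 0.3*dnorm(xx, 1, 2)
f2 = 0.3*dnorm(xx, 1, 2) + 0.7*dnorm(xx, 0, 1)
max(abs(f1 - f2)) # exactly 0: identical densities

The number of components is not identifiable either; the following three expressions also define a single density: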
\[f_1(x)=(0.7)\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\left (\frac{x-(0)}{(1)}\right)^2\right]+(0.3)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left(\frac{x-(1)}{(2)}\right)^2\right]\]
\[f_2(x)=(0.7)\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\left (\frac{x-(0)}{(1)}\right)^2\right]+(0.2)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left(\frac{x-(1)}{(2)}\right)^2\right]+(0.1)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left(\frac{x-(1)}{(2)}\right)^2\right]\]
\[f_3(x)=(0.7)\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\left (\frac{x-(0)}{(1)}\right)^2\right]+(0.3)\frac{1}{\sqrt{2\pi}(2)}\exp\left[-\frac{1}{2}\left(\frac{x-(1)}{(2)}\right)^2\right]+(0)\frac{1}{\sqrt{2\pi}(3)}\exp\left[-\frac{1}{2}\left(\frac{x-(100)}{(3)}\right)^2\right]\]
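Here \(f_2\) splits the weight 0.3 across two copies of the same component, and \(f_3\) appends a component with weight zero, so all three formulas describe one and the same density. A quick numerical check:

# Check that f1, f2 and f3 agree on a grid: splitting a weight or
# adding a zero-weight component does not change the density
xx = seq(-6, 8, length=200)
f1 = 0.7*dnorm(xx, 0, 1) + 0.3*dnorm(xx, 1, 2)
f2 = 0.7*dnorm(xx, 0, 1) + 0.2*dnorm(xx, 1, 2) + 0.1*dnorm(xx, 1, 2)
f3 = 0.7*dnorm(xx, 0, 1) + 0.3*dnorm(xx, 1, 2) + 0*dnorm(xx, 100, 3)
max(abs(f1 - f2), abs(f1 - f3)) # 0 up to floating-point error

Mixtures are not restricted to Gaussian components. Consider a two-component mixture of exponential distributions: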
\[f(x)=w\lambda_1e^{-\lambda_1x}+(1-w)\lambda_2e^{-\lambda_2x}\]
Suppose we observe the sample \((3.5, 9.7, 8.2, 6.4, 7.1)\) from this mixture. What is the complete-data likelihood associated with the indicator vector \((1, 1, 2, 1, 2)\)?
Recall the definition of the complete-data likelihood: it is (proportional to) the joint distribution of the indicator variables and the observations. Since the indicator vector is \((1, 1, 2, 1, 2)\), there are 3 observations in group 1 and 2 observations in group 2, so the first term must be \(w^3(1 - w)^2\).
Now, the distribution of the data given the indicators involves a product over the two components: for the first component, we have 3 exponential densities (associated with the first, second and fourth observations, which add up to 19.6), and for the second component we have 2 exponential densities (associated with the third and fifth observations, which add up to 15.3).
Therefore, the complete-data likelihood is:
\[w^3(1-w)^2\lambda_1^3e^{-(3.5+9.7+6.4)\lambda_1}\lambda_2^2e^{-(8.2+7.1)\lambda_2}\] \[=w^3(1-w)^2\lambda_1^3e^{-19.6\lambda_1}\lambda_2^2e^{-15.3\lambda_2}\]
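As a sketch, the complete-data likelihood can be coded as an R function of the parameters, using the sample and indicator vector from this example; the parameter values passed in at the end are arbitrary, for illustration only:

# Complete-data likelihood as a function of (w, lambda1, lambda2);
# dexp(x, rate) is the exponential density rate*exp(-rate*x)
x  = c(3.5, 9.7, 8.2, 6.4, 7.1)
cc = c(1, 1, 2, 1, 2)
L.comp = function(w, lambda1, lambda2) {
  prod(c(w, 1-w)[cc] * dexp(x, c(lambda1, lambda2)[cc]))
}
L.comp(0.6, 0.2, 0.1) # evaluated at arbitrary parameter values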
The observed-data likelihood \(L_O(w,\lambda_1,\lambda_2;x)\) for the sample \((3.5, 9.7, 7.1)\) is given by the product
\[\left\{w\lambda_1e^{-3.5\lambda_1}+(1-w)\lambda_2e^{-3.5\lambda_2}\right\}\times\] \[\left\{w\lambda_1e^{-9.7\lambda_1}+(1-w)\lambda_2e^{-9.7\lambda_2}\right\}\times\] \[\left\{w\lambda_1e^{-7.1\lambda_1}+(1-w)\lambda_2e^{-7.1\lambda_2}\right\}\]
The observed-data likelihood is the joint distribution of the observed data alone, with the indicators summed out. Because observations are independent (this is a random sample), it corresponds to \(f(3.5) \times f(9.7) \times f(7.1)\).
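The same kind of sketch works for the observed-data likelihood, where the indicators are summed out inside each factor (again with arbitrary illustrative parameter values):

# Observed-data likelihood as a function of (w, lambda1, lambda2):
# the product over observations of the mixture density
x = c(3.5, 9.7, 7.1)
L.obs = function(w, lambda1, lambda2) {
  prod(w*dexp(x, lambda1) + (1-w)*dexp(x, lambda2))
}
L.obs(0.6, 0.2, 0.1) # evaluated at arbitrary parameter values

Now consider a mixture of three Gaussian components with unit variance and means 0, 2, and 4: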
\[f(x)=w_1\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{x^2}{2}\right]+w_2\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{(x-2)^2}{2}\right]+w_3\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{(x-4)^2}{2}\right]\]
\[f(x)=w_1\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\frac{(x-0)^2}{(1)}\right]+w_2\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\frac{(x-2)^2}{(1)}\right]+w_3\frac{1}{\sqrt{2\pi}(1)}\exp\left[-\frac{1}{2}\frac{(x-4)^2}{(1)}\right]\]
Suppose we observe the sample \((-0.3, 4.1, 3.6, 7.5, 1.9, 2.7)\) from this mixture. Write an expression proportional to the complete-data likelihood associated with the indicator vector \((1, 2, 2, 3, 1, 2)\).
Recall the definition of the complete-data likelihood: it is (proportional to) the joint distribution of the indicator variables and the observations. Since the indicator vector is \((1, 2, 2, 3, 1, 2)\), there are 2 observations in group 1, 3 observations in group 2 and 1 observation in group 3. So, the first term must be \(w_1^2w_2^3w_3\).
The expression becomes \[w_1^2w_2^3w_3 \exp\left\{-\frac{(-0.3)^2+(4.1-2)^2+ (3.6-2)^2+(7.5-4)^2+(1.9)^2+(2.7-2)^2}{2}\right\}\]
\[=w_1^2w_2^3w_3 \exp\left\{-11.705\right\}\]
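The exponent can be verified directly in R: it is half the sum of squared deviations of each observation from the mean of its assigned component:

# Verify the exponent 11.705 in the expression above
x  = c(-0.3, 4.1, 3.6, 7.5, 1.9, 2.7)
cc = c(1, 2, 2, 3, 1, 2)
mu = c(0, 2, 4)
sum((x - mu[cc])^2)/2 # 11.705

Finally, consider a general location mixture of Gaussians with a common standard deviation \(\sigma\):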
\[f(x) = \sum_{k=1}^{K}w_k\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(x-\mu_k)^2}{2\sigma^2}\right]\]
True or False: The following 3 constraints make all parameters fully identifiable: \(w_k > 0\) for all \(k\); \(\mu_k \neq \mu_l\) for all \(k \neq l\); and \(\mu_1 < \mu_2 < \cdots < \mu_K\).
This is TRUE. The three constraints are enough to ensure identifiability. The last one addresses label switching, while the first two address identifiability of the number of components in the mixture.