Mixture Models

Mixture models arise naturally when measurements on individuals within a population can be regarded as coming from different distributions
- For example, measurements on males and females
For $y = (y_1,...,y_n)$, the $M$ component mixture distribution is:
- $f_m(y_i|\theta_m)$ is the distribution of $y_i$ under the $m$th component
  - Each component distribution is often from the same parametric family
- $\lambda_m$ is the proportion of observations drawn from component $m$
  - $\lambda_m \geq 0$ and $\sum^M_{m=1}\lambda_m = 1$

\[ f(y_i|\theta,\lambda) = \sum^M_{m=1}\lambda_mf_m(y_i|\theta_m) \]
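As a minimal sketch of the mixture density above, assuming normal components (the notes do not fix a parametric family), the weighted sum can be evaluated directly. The names `normal_pdf` and `mixture_pdf` are illustrative only.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    u = (y - mean) / sd
    return np.exp(-0.5 * u**2) / (sd * np.sqrt(2.0 * np.pi))

def mixture_pdf(y, weights, means, sds):
    """f(y | theta, lambda) = sum_m lambda_m f_m(y | theta_m).

    Assumption: each f_m is a normal density with parameters (means[m], sds[m]).
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    # comp[i, m] = f_m(y_i | theta_m)
    comp = normal_pdf(y[:, None], means[None, :], sds[None, :])
    # Weighted sum over components for every y_i
    return comp @ weights
```

For example, `mixture_pdf([0.0, 1.5], [0.3, 0.7], [-2.0, 3.0], [1.0, 0.5])` returns one density value per input point.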

Fitting Mixture Models

Consider an indicator variable:

\[ z_{im} = \begin{cases}1,\quad\text{if } y_i \text{ is drawn from the } m\text{th component}\\0,\quad\text{otherwise}\end{cases}\]

Distribution of this indicator variable $z_i = (z_{i1},...,z_{iM})$:
- This distribution is such that $P(z_{im}=1) = \lambda_m$

\[ z_i|\lambda\sim Multinomial(1:\lambda_1,...,\lambda_M),\qquad \pi(z_i|\lambda) = \prod^M_{m=1}\lambda_m^{z_{im}}\]
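The indicator construction also describes how data are generated: draw $z_i$ from the multinomial, then draw $y_i$ from the selected component. A short sketch, again assuming normal components; `simulate_mixture` is a hypothetical helper name.

```python
import numpy as np

def simulate_mixture(n, weights, means, sds, rng):
    """Simulate n observations from a normal mixture via indicator variables.

    Returns y (shape (n,)) and the one-hot indicators z (shape (n, M)).
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    # Each row z_i is one Multinomial(1: lambda_1, ..., lambda_M) draw
    z = rng.multinomial(1, weights, size=n)
    labels = z.argmax(axis=1)  # index m for which z_im = 1
    # Draw y_i from the component selected by z_i
    y = rng.normal(means[labels], sds[labels])
    return y, z

rng = np.random.default_rng(0)
y, z = simulate_mixture(1000, [0.3, 0.7], [-2.0, 3.0], [1.0, 1.0], rng)
```

The column means of `z` estimate $\lambda_m$, since $P(z_{im}=1)=\lambda_m$.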

Mixture Joint Distribution

Distribution of $y_i$:

\[ f(y_i|z_i,\theta) = \prod^M_{m=1}f_m(y_i|\theta_m)^{z_{im}}\]

Therefore the likelihood:

\[ L(y|z,\theta) = \prod^n_{i=1} f(y_i|z_i,\theta) = \prod^n_{i=1}\left[\prod^M_{m=1}f_m(y_i|\theta_m)^{z_{im}}\right]\]

Distribution of $z|\lambda$:

\[ \pi(z|\lambda) = \prod^n_{i=1}\prod^M_{m=1}\lambda_m^{z_{im}}\]

Mixture Prior Distribution

Prior for $\theta,\lambda$ typically assumes independence:

\[ \pi(\theta,\lambda)=\pi(\theta)\pi(\lambda)\]

A natural prior for $\lambda$ is $Dirichlet(\alpha_1,...,\alpha_M)$
- Conjugate with multinomial
- Multivariate version of the beta: marginally $\lambda_m\sim Beta(\alpha_m,\sum_{k\neq m}\alpha_k)$

\[ \pi(\lambda)\propto\prod^M_{m=1}\lambda_m^{\alpha_m-1}\]
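The beta marginal of the Dirichlet can be checked numerically. The sketch below draws from a Dirichlet with arbitrary illustrative values of $\alpha$ (not taken from the notes) and compares the empirical moments of each $\lambda_m$ with those of $Beta(\alpha_m,\sum_{k\neq m}\alpha_k)$.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = np.array([2.0, 3.0, 5.0])  # illustrative hyperparameters (assumption)

# Each row is one draw of (lambda_1, ..., lambda_M) and sums to 1
draws = rng.dirichlet(alpha, size=50_000)

# Moments of the Beta(a, b) marginal with a = alpha_m, b = sum of the other alphas
a = alpha
b = alpha.sum() - alpha
beta_mean = a / (a + b)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1.0))

empirical_mean = draws.mean(axis=0)
empirical_var = draws.var(axis=0)
```

Comparing `empirical_mean` with `beta_mean` and `empirical_var` with `beta_var` should show agreement up to Monte Carlo error.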

Mixture Posterior Distribution

The mixture posterior distribution is then:
- Commonly $\pi(\theta)=\prod_m\pi_m(\theta_m)$

\[ \pi(\theta,z,\lambda|y)\propto L(y|z,\theta)\pi(z|\lambda)\pi(\lambda)\pi(\theta)\]

\[ \propto \prod^n_{i=1}\left[\prod^M_{m=1}[\lambda_mf_m(y_i|\theta_m)]^{z_{im}}\right]\left[\prod^M_{m=1}\lambda_m^{\alpha_m-1}\right]\pi(\theta)\]

MCMC Sampler

The indicator variables make the full conditional distributions easy to derive, so we can use a Gibbs sampler that cycles through them:
Update $\theta_m$:

\[ \pi(\theta_m|y,z,\lambda,\theta_{-m}) \propto \prod^n_{i=1}f_m(y_i|\theta_m)^{z_{im}}\pi_m(\theta_m|\theta_{-m})\]
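As a worked special case, for illustration only: assume (this is not stated in the notes) that each component is $N(\theta_m,\sigma^2)$ with known $\sigma^2$, and that the priors are independent with $\theta_m\sim N(\mu_0,\tau_0^2)$. Writing $n_m=\sum_i z_{im}$, only the observations allocated to component $m$ contribute, and the full conditional above is the usual conjugate normal update:

\[ \theta_m|y,z \sim N\left(\mu^*_m,\tau^{*2}_m\right)\]

\[ \frac{1}{\tau^{*2}_m} = \frac{1}{\tau_0^2}+\frac{n_m}{\sigma^2},\qquad \mu^*_m = \tau^{*2}_m\left(\frac{\mu_0}{\tau_0^2}+\frac{\sum^n_{i=1} z_{im}y_i}{\sigma^2}\right)\]

If no observations are allocated to component $m$ ($n_m=0$), this reduces to a draw from the prior.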

Update $\lambda$, which is $Dirichlet(\alpha_1+\sum_i z_{i1},...,\alpha_M+\sum_i z_{iM})$:

\[ \pi(\lambda|y,\theta,z) \propto \prod^M_{m=1}\lambda_m^{\sum_i z_{im}+\alpha_m - 1}\]
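The kernel above is that of a Dirichlet distribution with parameters $\alpha_m+\sum_i z_{im}$, so this step is a single Dirichlet draw. A minimal sketch, with `update_weights` as a hypothetical function name and `z` stored as an $n\times M$ one-hot array:

```python
import numpy as np

def update_weights(z, alpha, rng):
    """Gibbs step for lambda: draw from Dirichlet(alpha_m + sum_i z_im).

    z     : (n, M) array of one-hot indicators
    alpha : (M,) Dirichlet prior parameters
    """
    counts = np.asarray(z).sum(axis=0)  # number of observations per component
    return rng.dirichlet(np.asarray(alpha, dtype=float) + counts)
```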

Update $z_i$:

\[ \pi(z_i|y,\theta,\lambda,z_{-i}) \propto \prod^M_{m=1}[\lambda_mf_m(y_i|\theta_m)]^{z_{im}}\]

$P(z_{im}=1|y_i,\lambda,\theta) \propto \lambda_mf_m(y_i|\theta_m)=p_{im}$, with normalised probabilities $\tilde p_{im}=p_{im}/\sum^M_{k=1} p_{ik}$
Therefore, the full conditional is

\[ z_i\sim Multinomial(1:\tilde{p}_{i1},...,\tilde{p}_{iM})\]
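Putting the three updates together gives a complete Gibbs sweep. The sketch below is one possible implementation under assumptions not made in the notes: normal components with a known common standard deviation `sigma`, independent $N(\mu_0,\tau_0^2)$ priors on the means, and a $Dirichlet(1,...,1)$ default prior on the weights. The function name and all default values are illustrative.

```python
import numpy as np

def gibbs_normal_mixture(y, M, n_iter, rng, sigma=1.0, mu0=0.0, tau0=10.0, alpha=None):
    """Gibbs sampler for a normal mixture with known common sd (a sketch).

    Returns draws of theta (n_iter, M), lambda (n_iter, M) and integer
    labels (n_iter, n), where label m means z_im = 1.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    alpha = np.ones(M) if alpha is None else np.asarray(alpha, dtype=float)
    theta = rng.choice(y, size=M, replace=False)  # start means at data points
    lam = np.full(M, 1.0 / M)
    theta_draws = np.empty((n_iter, M))
    lam_draws = np.empty((n_iter, M))
    z_draws = np.empty((n_iter, n), dtype=int)
    for t in range(n_iter):
        # Update z_i: p_im is proportional to lambda_m f_m(y_i | theta_m).
        # The normal constant is common to all components here, so it cancels.
        log_p = np.log(lam)[None, :] - 0.5 * ((y[:, None] - theta[None, :]) / sigma) ** 2
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
        p = np.exp(log_p)
        p /= p.sum(axis=1, keepdims=True)  # normalised probabilities p-tilde
        # Inverse-CDF draw of one label per observation
        u = rng.random(n)
        labels = (u[:, None] > np.cumsum(p, axis=1)).sum(axis=1)
        labels = np.minimum(labels, M - 1)  # guard against rounding in cumsum
        # Update lambda: Dirichlet(alpha_m + n_m)
        counts = np.bincount(labels, minlength=M)
        lam = rng.dirichlet(alpha + counts)
        # Update theta_m: conjugate normal full conditional
        sums = np.bincount(labels, weights=y, minlength=M)
        prec = 1.0 / tau0**2 + counts / sigma**2
        mean = (mu0 / tau0**2 + sums / sigma**2) / prec
        theta = rng.normal(mean, 1.0 / np.sqrt(prec))
        theta_draws[t], lam_draws[t], z_draws[t] = theta, lam, labels
    return theta_draws, lam_draws, z_draws
```

Working on the log scale before normalising avoids underflow when $f_m(y_i|\theta_m)$ is tiny for every component. An initial portion of the draws would normally be discarded as burn-in.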

Inference - Predictive Distribution

The predictive distribution for a new observation $y$ given observed data $x$ is:

\[ f(y|x) = \int\sum^M_{m=1}\lambda_mf_m(y|\theta_m)\pi(\theta,z,\lambda|x)d\theta d\lambda dz\]

For classification:
- Can approximate through Monte Carlo, using $N$ posterior draws $(\lambda^{(j)},\theta^{(j)})$

\[ P(z_{(x)m}=1|x,y)\propto\int\lambda_m f_m(y|\theta_m)\pi(\theta,z,\lambda|x)d\theta d\lambda dz\approx \frac{1}{N}\sum^N_{j=1}\lambda^{(j)}_m f_m(y|\theta^{(j)}_m)\]

Or can consider the proportion of MCMC draws in which $y_i$ is allocated to component $m$

\[ \mathbb{E}_\pi[I(z_{im}=1)]\approx\frac{1}{N}\sum^N_{j=1}z_{im}^{(j)}\]
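Both classification estimates can be computed from stored MCMC output. The sketch below assumes draws are held as arrays `lam_draws` and `theta_draws` of shape $(N, M)$ and integer labels `z_draws` of shape $(N, n)$, and that the components are normal with a known common `sigma`. These storage conventions and function names are assumptions for illustration.

```python
import numpy as np

def allocation_frequencies(z_draws, M):
    """Proportion of draws in which each observation is allocated to each component.

    z_draws : (N, n) integer labels; returns an (n, M) array.
    """
    z_draws = np.asarray(z_draws)
    return np.stack([(z_draws == m).mean(axis=0) for m in range(M)], axis=1)

def classify_new(y_new, lam_draws, theta_draws, sigma=1.0):
    """Monte Carlo classification probabilities for one new observation.

    Averages lambda_m^(j) f_m(y | theta_m^(j)) over draws j, then normalises over m.
    """
    lam_draws = np.asarray(lam_draws, dtype=float)
    theta_draws = np.asarray(theta_draws, dtype=float)
    dens = np.exp(-0.5 * ((y_new - theta_draws) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    p = (lam_draws * dens).mean(axis=0)
    return p / p.sum()
```

Both estimates are only meaningful when the labels are consistent across draws (see the identifiability discussion that follows).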

Non-Identifiability of Mixture Models

Relabeling components by arbitrarily permuting the index set $1,...,M$ for $(\lambda_m,\theta_m,z_{im})$ does not change the value of the mixture pdf
- Caused by the assumed exchangeability of the components in the prior
- All $\theta_m$ have the same marginal posterior distribution due to label switching
This issue can be resolved through prior information
- Constrain the parameter space to a subregion containing exactly one permutation, so that the model is uniquely identified
Order weights:

\[ \lambda_1 > ... > \lambda_M\]

Or order parameters:

\[ \theta_1 > ... > \theta_M\]

Or use an informative prior identifying components with specific subpopulations
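One common way to apply the ordering constraint $\theta_1 > ... > \theta_M$ in practice is to relabel each stored draw after sampling, rather than constraining the sampler itself. The sketch below does this for scalar $\theta_m$, assuming the same hypothetical storage layout of $(N, M)$ arrays for `theta_draws` and `lam_draws` and $(N, n)$ integer labels for `z_draws`.

```python
import numpy as np

def relabel_by_theta(theta_draws, lam_draws, z_draws):
    """Permute labels within each draw so that theta_1 > ... > theta_M."""
    theta_draws = np.asarray(theta_draws, dtype=float)
    lam_draws = np.asarray(lam_draws, dtype=float)
    z_draws = np.asarray(z_draws)
    # order[j, k] = old index of the k-th largest theta in draw j
    order = np.argsort(-theta_draws, axis=1)
    theta_sorted = np.take_along_axis(theta_draws, order, axis=1)
    lam_sorted = np.take_along_axis(lam_draws, order, axis=1)
    # rank[j, old] = new index of old component `old` in draw j
    rank = np.argsort(order, axis=1)
    z_relabelled = np.take_along_axis(rank, z_draws, axis=1)
    return theta_sorted, lam_sorted, z_relabelled
```

Ordering on $\theta$ works best when the component parameters are well separated. If they overlap heavily, ordering on the weights or using an informative prior may be more suitable.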

Mixture_Models

Jake

29/11/2022