Mixture_Models

Jake

29/11/2022

Mixture Models

  • Mixture models arise naturally when measurements on individuals within a population come from different distributions
    • For example, measurements on males and females
  • For \(y = (y_1,...,y_n)\), the \(M\)-component mixture distribution is given below, where:
    • \(f_m(y_i|\theta_m)\) is the distribution of \(y_i\) under the \(m\)th component
      • The component distributions are often from the same parametric family
    • \(\lambda_m\) is the proportion of \(y_i\) from component \(m\)
      • \(\sum\lambda_m = 1\)

\[ f(y_i|\theta,\lambda) = \sum^M_{m=1}\lambda_mf_m(y_i|\theta_m) \]
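
The mixture density above can be evaluated directly. The following is a minimal sketch, assuming normal component distributions; the function names `normal_pdf` and `mixture_pdf` and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, weights, means, sds):
    """f(y | theta, lambda) = sum_m lambda_m * f_m(y | theta_m)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * normal_pdf(y, mu, sd)
               for w, mu, sd in zip(weights, means, sds))

# Hypothetical two-component example (illustrative values only).
weights = [0.4, 0.6]    # lambda_1, lambda_2
means = [162.0, 176.0]  # component means
sds = [6.0, 7.0]        # component standard deviations
density_at_170 = mixture_pdf(170.0, weights, means, sds)
```

Because the weights sum to one and each component is a proper density, the mixture also integrates to one.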

Fitting Mixture Models

  • Consider an indicator variable:

\[ z_{im} = \begin{cases}1,\quad\text{if } y_i \text{ is drawn from mth component}\\0,\quad\text{otherwise}\end{cases}\]

  • Distribution of this indicator variable \(z_i = (z_{i1},...,z_{iM})\):
    • This distribution is such that \(P(z_{im}=1) = \lambda_m\)

\[ z_i|\lambda\sim Multinomial(1:\lambda_1,...,\lambda_M),\qquad \pi(z_i|\lambda) = \prod^M_{m=1}\lambda_m^{z_{im}}\]
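
As a quick illustration of the indicator distribution, the sketch below draws one-hot vectors \(z_i\) and checks that the empirical allocation frequencies are close to the weights. The helper name `draw_indicator` and the weight values are assumptions made for this example.

```python
import random

def draw_indicator(weights, rng):
    """Draw z_i ~ Multinomial(1: weights), returned as a one-hot list."""
    m = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return [1 if k == m else 0 for k in range(len(weights))]

rng = random.Random(1)
weights = [0.2, 0.5, 0.3]  # hypothetical lambda
draws = [draw_indicator(weights, rng) for _ in range(20000)]

# Empirical frequency of z_im = 1, which should be close to lambda_m.
freq = [sum(z[m] for z in draws) / len(draws) for m in range(len(weights))]
```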

Mixture Joint Distribution

  • Distribution of \(y_i\):

\[ f(y_i|z_i,\theta) = \prod^M_{m=1}f_m(y_i|\theta_m,z_i)^{z_{im}}\]

  • Therefore the likelihood:

\[ L(y|z,\theta) = \prod^n_{i=1} f(y_i|z_i,\theta) = \prod^n_{i=1}\left[\prod^M_{m=1}f_m(y_i|\theta_m,z_i)^{z_{im}}\right]\]

  • Distribution of \(z|\lambda\):

\[ \pi(z|\lambda) = \prod^n_{i=1}\prod^M_{m=1}\lambda_m^{z_{im}}\]
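
On the log scale, both products become sums over the allocated components only. The sketch below assumes normal components; the function names and the toy data are hypothetical.

```python
import math

def normal_logpdf(y, mean, sd):
    """Log density of a Normal(mean, sd^2) distribution at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def complete_data_loglik(y, z, means, sds):
    """log L(y | z, theta) = sum_i sum_m z_im * log f_m(y_i | theta_m)."""
    return sum(z_im * normal_logpdf(y_i, mu, sd)
               for y_i, z_i in zip(y, z)
               for z_im, mu, sd in zip(z_i, means, sds)
               if z_im)

def indicator_logprob(z, weights):
    """log pi(z | lambda) = sum_i sum_m z_im * log lambda_m."""
    return sum(z_im * math.log(w)
               for z_i in z
               for z_im, w in zip(z_i, weights)
               if z_im)

# Toy example: two observations, one allocated to each component.
y = [1.0, 5.0]
z = [[1, 0], [0, 1]]
loglik = complete_data_loglik(y, z, means=[0.0, 4.0], sds=[1.0, 1.0])
logprior_z = indicator_logprob(z, weights=[0.3, 0.7])
```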

Mixture Prior Distribution

  • Prior for \(\theta,\lambda\) typically assumes independence:

\[ \pi(\theta,\lambda)=\pi(\theta)\pi(\lambda)\]

  • Natural prior for \(\lambda\) is \(Dirichlet(\alpha_1,...,\alpha_M)\)
    • Conjugate with multinomial
    • Multivariate generalisation of the beta: marginally, \(\lambda_m\sim Beta(\alpha_m,\sum_{j\neq m}\alpha_j)\)

\[ \pi(\lambda)\propto\prod^M_{m=1}\lambda_m^{\alpha_m-1}\]
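
One standard way to simulate from a Dirichlet distribution is to normalise independent gamma draws. The sketch below uses this construction with the standard library; the name `draw_dirichlet` and the \(\alpha\) values are assumptions for illustration. The sample means should be close to \(\alpha_m/\sum_j\alpha_j\).

```python
import random

def draw_dirichlet(alpha, rng):
    """Draw lambda ~ Dirichlet(alpha) by normalising Gamma(alpha_m, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

rng = random.Random(7)
alpha = [2.0, 3.0, 5.0]  # hypothetical prior parameters
samples = [draw_dirichlet(alpha, rng) for _ in range(20000)]

# Monte Carlo estimate of E[lambda_m], to compare with alpha_m / sum(alpha).
sample_means = [sum(s[m] for s in samples) / len(samples) for m in range(len(alpha))]
```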

Mixture Posterior Distribution

  • The mixture posterior distribution is then:
    • Commonly \(\pi(\theta)=\prod_m\pi_m(\theta_m)\)

\[ \pi(\theta,z,\lambda|y)\propto L(y|z,\theta)\pi(z|\lambda)\pi(\lambda)\pi(\theta)\]

\[ \propto \prod^n_{i=1}\left[\prod^M_{m=1}[\lambda_mf_m(y_i|\theta_m)]^{z_{im}}\right]\left[\prod^M_{m=1}\lambda_m^{\alpha_m-1}\right]\pi(\theta)\]
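
  • The second line follows by collecting, for each pair \((i,m)\), the likelihood term and the indicator term under the common exponent \(z_{im}\):

\[ L(y|z,\theta)\pi(z|\lambda) = \prod^n_{i=1}\prod^M_{m=1}f_m(y_i|\theta_m)^{z_{im}}\lambda_m^{z_{im}} = \prod^n_{i=1}\prod^M_{m=1}[\lambda_mf_m(y_i|\theta_m)]^{z_{im}}\]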

MCMC Sampler

  • The indicator variables make the full conditional distributions easy to derive, so we can use a Gibbs sampler that cycles through the following updates:

  • Update \(\theta_m\)

\[ \pi(\theta_m|y,z,\lambda,\theta_{-m}) \propto \prod^n_{i=1}f_m(y_i|\theta_m)^{z_{im}}\pi_m(\theta_m|\theta_{-m})\]

  • Update \(\lambda\)

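\[ \pi(\lambda|y,\theta,z) \propto \prod^M_{m=1}\lambda_m^{\sum_i z_{im}+\alpha_m - 1},\quad\text{i.e. } \lambda|z\sim Dirichlet\left(\alpha_1+\textstyle\sum_i z_{i1},...,\alpha_M+\textstyle\sum_i z_{iM}\right)\]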

  • Update \(z_i\):

\[ \pi(z_i|y,\theta,\lambda,z_{-i}) \propto \prod^M_{m=1}[\lambda_mf_m(y_i|\theta_m)]^{z_{im}}\]

  • \(P(z_{im}=1|y_i,\lambda,\theta) \propto \lambda_mf_m(y_i|\theta_m)=p_{im}\), with normalised probabilities \(\tilde p_{im}=p_{im}/\sum^M_{k=1} p_{ik}\)
  • Therefore, the full conditional is

\[ z_i\sim Multinomial(1:\tilde{p}_{i1},...,\tilde{p}_{iM})\]
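
The three updates can be put together as follows. This is a minimal sketch rather than a definitive implementation, and it rests on assumptions that are not in the notes: normal components with a known common standard deviation `sd`, independent \(Normal(\text{prior\_mean}, \text{prior\_sd}^2)\) priors on the component means (which gives a conjugate normal update for \(\theta_m\)), and a symmetric \(Dirichlet(\alpha,...,\alpha)\) prior on \(\lambda\). The function name `gibbs_normal_mixture` and the simulated data are hypothetical.

```python
import math
import random

def gibbs_normal_mixture(y, M, n_iter, sd=1.0, prior_mean=0.0, prior_sd=10.0,
                         alpha=1.0, seed=0):
    """Gibbs sampler for an M-component normal mixture with known common sd.

    Returns a list of (means, weights, labels) tuples, one per iteration,
    where labels[i] is the component index m with z_im = 1.
    """
    rng = random.Random(seed)
    n = len(y)
    # Initialise the allocations by splitting the sorted data into M blocks.
    order = sorted(range(n), key=lambda i: y[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * M // n
    means = [0.0] * M
    draws = []
    for _ in range(n_iter):
        counts = [0] * M
        sums = [0.0] * M
        for y_i, m in zip(y, labels):
            counts[m] += 1
            sums[m] += y_i
        # Update theta_m: conjugate normal full conditional given allocated y_i.
        for m in range(M):
            prec = 1.0 / prior_sd ** 2 + counts[m] / sd ** 2
            post_mean = (prior_mean / prior_sd ** 2 + sums[m] / sd ** 2) / prec
            means[m] = rng.gauss(post_mean, 1.0 / math.sqrt(prec))
        # Update lambda: Dirichlet(alpha + n_m) via normalised gamma draws.
        g = [rng.gammavariate(alpha + c, 1.0) for c in counts]
        total = sum(g)
        weights = [x / total for x in g]
        # Update z_i: P(z_im = 1 | ...) proportional to lambda_m * f_m(y_i | theta_m).
        # The normalising constant of the normal density is common to all
        # components here, so it cancels and is omitted.
        for i, y_i in enumerate(y):
            logp = [math.log(w) - 0.5 * ((y_i - mu) / sd) ** 2
                    for w, mu in zip(weights, means)]
            top = max(logp)  # subtract the maximum for numerical stability
            p = [math.exp(l - top) for l in logp]
            labels[i] = rng.choices(range(M), weights=p, k=1)[0]
        draws.append((list(means), list(weights), list(labels)))
    return draws

# Hypothetical simulated data: 30% from Normal(-3, 1), 70% from Normal(3, 1).
sim = random.Random(42)
y = [sim.gauss(-3.0, 1.0) if sim.random() < 0.3 else sim.gauss(3.0, 1.0)
     for _ in range(200)]
draws = gibbs_normal_mixture(y, M=2, n_iter=300)
kept = draws[100:]  # discard the first 100 iterations as burn-in
post_means = [sum(d[0][m] for d in kept) / len(kept) for m in range(2)]
post_weights = [sum(d[1][m] for d in kept) / len(kept) for m in range(2)]
```

The update order (\(\theta\), then \(\lambda\), then \(z\)) follows the list above; any fixed order gives a valid Gibbs cycle. The burn-in length of 100 is an arbitrary choice for this toy example.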

Inference - Predictive Distribution

  • With \(x\) denoting the observed data and \(y\) a new observation, the predictive distribution is:

\[ f(y|x) = \int\sum^M_{m=1}\lambda_mf_m(y|\theta_m)\pi(\theta,z,\lambda|x)d\theta d\lambda dz\]

  • For classification:
    • The membership probability of a new observation can be approximated by Monte Carlo over the \(N\) posterior draws

\[ P(z_{(x)m}=1|x,y)\propto\int\lambda_m f_m(y|\theta_m)\pi(\theta,z,\lambda|x)d\theta d\lambda dz\approx \frac{1}{N}\sum^N_{j=1}\lambda^{(j)}_m f_m(y|\theta^{(j)}_m)\]

  • Or consider the proportion of the \(N\) posterior draws in which \(y_i\) is allocated to component \(m\)

\[ \mathbb{E}_\pi[I(z_{im}=1)]\approx\frac{1}{N}\sum^N_{j=1}z_{im}^{(j)}\]
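
Both classification estimates can be computed from stored sampler output. The sketch below assumes normal components with a known common standard deviation and takes posterior draws as `(means, weights)` pairs; the function names and the hard-coded draws are hypothetical stand-ins for real MCMC output.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def membership_probs(y_new, draws, sd):
    """Average lambda_m^(j) * f_m(y_new | theta_m^(j)) over draws, then normalise over m."""
    M = len(draws[0][0])
    avg = [0.0] * M
    for means, weights in draws:
        for m in range(M):
            avg[m] += weights[m] * normal_pdf(y_new, means[m], sd) / len(draws)
    total = sum(avg)
    return [a / total for a in avg]

def allocation_freq(label_draws, i, M):
    """Proportion of draws in which observation i was allocated to each component."""
    N = len(label_draws)
    return [sum(1 for labels in label_draws if labels[i] == m) / N
            for m in range(M)]

# Hypothetical posterior draws of (means, weights) for a two-component model.
draws = [([-3.0, 3.0], [0.30, 0.70]),
         ([-2.9, 3.1], [0.32, 0.68]),
         ([-3.1, 2.9], [0.28, 0.72])]
probs_new = membership_probs(3.0, draws, sd=1.0)

# Hypothetical allocation draws: label_draws[j][i] is the component of y_i in draw j.
label_draws = [[0, 1], [0, 1], [1, 1], [0, 1]]
freq_first = allocation_freq(label_draws, i=0, M=2)
```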

Non-Identifiability of Mixture Models

  • Relabelling the components by arbitrarily permuting the index set \(1,...,M\) for \((\lambda_m,\theta_m,z_{im})\) does not change the value of the mixture pdf
    • Caused by the assumed exchangeability of the components
    • As a result, all \(\theta_m\) have the same marginal posterior distribution (label switching)
  • Can resolve this issue through prior information
    • Constrain the parameter space to a subregion in which only one permutation of the labels is admissible, so that the model is uniquely identified
  • Order weights:

\[ \lambda_1 > ... > \lambda_M\]

  • Or order parameters:

\[ \theta_1 > ... > \theta_M\]

  • Or use an informative prior identifying components with specific subpopulations
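
An ordering constraint is often imposed in practice by relabelling the stored draws after sampling, which is a different technique from constraining the prior directly and is used here only as an illustration. The sketch below assumes scalar \(\theta_m\) and sorts each draw so that \(\theta_1 > ... > \theta_M\), permuting the weights in the same way; the function name `relabel_by_mean` and the draws are hypothetical.

```python
def relabel_by_mean(draws):
    """Permute the labels within each draw so that the means are in decreasing order."""
    relabelled = []
    for means, weights in draws:
        order = sorted(range(len(means)), key=lambda m: means[m], reverse=True)
        relabelled.append(([means[m] for m in order],
                           [weights[m] for m in order]))
    return relabelled

# Hypothetical draws in which the labels switch between iterations.
draws = [([3.1, -2.9], [0.7, 0.3]),
         ([-3.0, 3.0], [0.3, 0.7])]
fixed = relabel_by_mean(draws)
```

After relabelling, the first component always refers to the larger mean, so per-component posterior summaries are meaningful.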