# Mixture Models¶

bnpy supports two kinds of mixture models: FiniteMixtureModel and DPMixtureModel.

## FiniteMixtureModel¶

The finite mixture has the following generative representation as an allocation model. There is a single top-level vector of cluster probabilities $$\pi_0$$. Each data atom’s assignment is drawn i.i.d. according to the probabilities in this vector.

$\begin{split}[\pi_{01}, \pi_{02}, \ldots \pi_{0K}] \sim \mbox{Dir}_K( \frac{\alpha_0}{K} ) \\ \mbox{for~} n \in 1, \ldots N: \\ \qquad z_n \sim \mbox{Cat}_K(\pi_{01}, \ldots \pi_{0K})\end{split}$

Here, $$\alpha_0 > 0$$ is the uniform concentration parameter. Smaller values of $$\alpha_0$$ favor sparse, uneven cluster probabilities, while larger values favor a near-uniform distribution over the K clusters.
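As an illustrative NumPy sketch (not bnpy code), the generative process above can be simulated directly; the values of K, N, and alpha0 below are arbitrary choices for demonstration:

```python
import numpy as np

K = 5            # number of clusters (arbitrary toy value)
N = 100          # number of data atoms (arbitrary toy value)
alpha0 = 1.0     # uniform concentration parameter

rng = np.random.default_rng(0)

# Draw the top-level cluster probabilities from a symmetric Dirichlet.
pi0 = rng.dirichlet(np.full(K, alpha0 / K))

# Draw each atom's assignment i.i.d. from Cat(pi0).
z = rng.choice(K, size=N, p=pi0)

assert np.allclose(pi0.sum(), 1.0)
assert z.min() >= 0 and z.max() < K
```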

## DPMixtureModel¶

The Dirichlet Process (DP) mixture has the following generative representation as an allocation model. It modifies the finite mixture by drawing the weight vector $$\pi_0$$ from a stick-breaking process with concentration parameter $$\gamma_0 > 0$$, which yields K active weights and one remainder weight.

$\begin{split}[\pi_{01}, \pi_{02}, \ldots \pi_{0K}, \pi_{0, >K}] \sim \mbox{StickBreaking}_K(\gamma_0) \\ \mbox{for~} n \in 1, \ldots N: \\ \qquad z_n \sim \mbox{Cat}_K(\pi_{01}, \ldots \pi_{0K})\end{split}$

In the limit as K grows to infinity, the two generative models become equivalent.
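The truncated stick-breaking construction can be sketched in NumPy (a hypothetical illustration, not bnpy internals; gamma0 denotes the assumed concentration parameter):

```python
import numpy as np

K = 5
gamma0 = 1.0
rng = np.random.default_rng(0)

# Draw K stick-breaking fractions from Beta(1, gamma0).
v = rng.beta(1.0, gamma0, size=K)

# Each active weight is its fraction times the length of stick remaining.
remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)])
pi0 = v * remaining[:K]          # K active weights pi_{01} ... pi_{0K}
pi_rest = remaining[K]           # remainder weight pi_{0,>K}

# Active weights plus the remainder account for the whole unit stick.
assert np.allclose(pi0.sum() + pi_rest, 1.0)
```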

## Using mixtures with other bnpy modules¶

As usual, to train a hierarchical model whose allocation model is a FiniteMixtureModel or DPMixtureModel:

>>> hmodel, Info = bnpy.Run(Data, 'FiniteMixtureModel', obsModelName, algName, **kwargs)
>>> # or
>>> hmodel, Info = bnpy.Run(Data, 'DPMixtureModel', obsModelName, algName, **kwargs)


### Supported DataObj Types¶

Mixture models can apply to almost all data formats available in bnpy. Any data suitable for topic models or sequence models can also be fit with a basic mixture model.

The only formats that do not apply are those based on GraphData, which require a specialized subclass of mixture models (TBD).

### Supported Learning Algorithms¶

Currently, the practical differences are:

• FiniteMixtureModel supports EM, VB, soVB, and moVB.
• DPMixtureModel supports VB, soVB, and moVB, with birth/merge/delete moves available for moVB.

MAP inference via EM for the DPMixtureModel is possible, but not yet implemented.

### Accessing learned cluster assignments¶

Given a dataset of interest Data (a DataObj) and an hmodel (an instance of HModel) properly initialized with K active clusters, we obtain cluster assignments by performing a local step:

>>> LP = hmodel.calc_local_params(Data)
>>> resp = LP['resp']


Here, resp is a 2D array of size N x K. Each entry resp[n, k] gives the probability that data atom n is assigned to cluster k under the posterior. Thus, each entry resp[n,k] must be a value within the interval [0,1]. The sum of every row must equal one.

>>> assert resp[n, k] >= 0.0
>>> assert resp[n, k] <= 1.0
>>> assert np.allclose(np.sum(resp[n,:]), 1.0)


To convert to hard assignments

>>> Z = resp.argmax(axis=1)


Here, Z is a 1D array of size N, where entry Z[n] is an integer in the set {0, 1, 2, …, K-1}.
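For example, cluster sizes under the hard assignment can be counted with NumPy (a sketch using a toy resp array, not output from a trained model):

```python
import numpy as np

# Toy responsibility matrix for N=4 atoms and K=3 clusters (illustrative only).
resp = np.asarray([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])

# Hard assignment: pick the most probable cluster for each atom.
Z = resp.argmax(axis=1)

# Count how many atoms land in each cluster.
counts = np.bincount(Z, minlength=resp.shape[1])

print(Z)       # [0 1 2 0]
print(counts)  # [2 1 1]
```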

### Accessing learned cluster probabilities¶

>>> pi0 = hmodel.allocModel.get_active_cluster_probs()
>>> assert pi0.ndim == 1
>>> assert pi0.size == hmodel.allocModel.K


### Global update summaries¶

For a global update, mixture models require only one sufficient statistic: an expected count value for each cluster k. This value gives the expected number of data atoms assigned to k throughout the dataset.

• Count N_k
Expected assignments to state k across all data items.
>>> LP = hmodel.calc_local_params(Data)
>>> SS = hmodel.get_global_suff_stats(Data, LP)
>>> Nvec = SS.N # or SS.getCountVec()
>>> assert Nvec.size == hmodel.allocModel.K
[ ... TODO ... ]
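Conceptually, this count vector is just a column sum of resp; a hypothetical NumPy sketch with a toy responsibility matrix:

```python
import numpy as np

# Toy responsibilities (illustrative only): N=4 atoms, K=3 clusters.
resp = np.asarray([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])

# Expected count for each cluster: N_k = sum_n resp[n, k].
Nvec = resp.sum(axis=0)

print(Nvec)  # [1.7 1.6 0.7]

# The counts always sum to the total number of data atoms N.
assert np.allclose(Nvec.sum(), resp.shape[0])
```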


### ELBO summaries¶

To compute the ELBO, mixture models require only one non-linear summary statistic: the entropy of the learned assignment parameters resp.

$\begin{split}\mathcal{L} = \mathcal{L}_{\mbox{data}} + \mathcal{L}_{\mbox{alloc}} - E[ \log q(z) ] \\ - E[ \log q(z) ] = \sum_{k=1}^K H_k \\ H_k = - \sum_{n=1}^N r_{nk} \log r_{nk}\end{split}$
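The per-cluster entropy term can be sketched in plain NumPy (an illustration with a toy resp array, not bnpy internals):

```python
import numpy as np

# Toy responsibilities (illustrative only): N=2 atoms, K=3 clusters.
resp = np.asarray([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
])

# H_k = -sum_n r_nk log r_nk, computed per cluster (column).
Hresp = -np.sum(resp * np.log(resp), axis=0)

# The total assignment entropy -E[log q(z)] is the sum over clusters.
H_total = Hresp.sum()

assert Hresp.ndim == 1 and Hresp.size == resp.shape[1]
assert H_total >= 0.0  # entropy is always non-negative
```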

You can compute this by enabling the correct keyword flag when calling the summary step function.

>>> LP = hmodel.calc_local_params(Data)
>>> SS = hmodel.get_global_suff_stats(Data, LP, doPrecompEntropy=1)
>>> Hresp = SS.getELBOTerm('Hresp')
>>> assert Hresp.ndim == 1
>>> assert Hresp.size == SS.K
[ ... TODO ... ]