Mixture Models

bnpy supports two kinds of mixture models: FiniteMixtureModel and DPMixtureModel.

FiniteMixtureModel

The finite mixture has the following generative representation as an allocation model. There is a single top-level vector of cluster probabilities \(\pi_0\). Each data atom’s assignment is drawn i.i.d. according to the probabilities in this vector.

\[\begin{split}[\pi_{01}, \pi_{02}, \ldots \pi_{0K}] \sim \mbox{Dir}_K( \frac{\alpha_0}{K} ) \\ \mbox{for~} n \in 1, \ldots N: \\ \qquad z_n \sim \mbox{Cat}_K(\pi_{01}, \ldots \pi_{0K})\end{split}\]

Here, \(\alpha_0 > 0\) is the uniform concentration parameter. Larger values of \(\alpha_0\) favor cluster probabilities that are close to uniform, while smaller values favor placing most of the probability mass on a few clusters.
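The generative process above can be sketched in a few lines of numpy. This is an illustration of the math only; the variable names are not part of the bnpy API.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5         # number of clusters
N = 100       # number of data atoms
alpha0 = 1.0  # uniform concentration parameter

# Draw the top-level cluster probabilities from a symmetric Dirichlet.
pi0 = rng.dirichlet(np.full(K, alpha0 / K))

# Draw each atom's assignment i.i.d. from Cat(pi0).
z = rng.choice(K, size=N, p=pi0)
```

With small \(\alpha_0 / K\), most draws of pi0 concentrate their mass on a few clusters, so many entries of z repeat.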

DPMixtureModel

The Dirichlet Process (DP) mixture has the following generative representation as an allocation model. It modifies the finite mixture by using a stick-breaking process to produce K active weights and a remainder weight, all collected in the vector \(\pi_0\).

\[\begin{split}[\pi_{01}, \pi_{02}, \ldots \pi_{0K}, \pi_{0, >K}] \sim \mbox{StickBreaking}_K(\alpha_0) \\ \mbox{for~} n \in 1, \ldots N: \\ \qquad z_n \sim \mbox{Cat}_K(\pi_{01}, \ldots \pi_{0K})\end{split}\]

If we take the limit as K grows to infinity, these two generative models are equivalent.
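The truncated stick-breaking construction can be sketched in numpy as follows. This is an illustration only; bnpy performs this construction internally.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5         # truncation level: number of active clusters
alpha0 = 2.0  # concentration parameter

# Draw K stick-breaking fractions v_k ~ Beta(1, alpha0).
v = rng.beta(1.0, alpha0, size=K)

# pi_{0k} = v_k * prod_{j<k} (1 - v_j); the leftover stick is the remainder.
remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)])
pi0_active = v * remaining[:K]
pi0_rem = remaining[K]

# K active weights plus the remainder weight sum to one.
pi0 = np.concatenate([pi0_active, [pi0_rem]])
```

Each fraction v_k breaks off a piece of the remaining stick, so the active weights telescope and the leftover mass \(\pi_{0, >K}\) shrinks as K grows.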

Using mixtures with other bnpy modules

As usual, to train a hierarchical model whose allocation is done by FiniteMixtureModel, pass its name as the allocation-model argument to bnpy.run. For example:

>>> hmodel, Info = bnpy.run(Data, 'FiniteMixtureModel', 'Gauss', 'VB')

Supported DataObj Types

Mixture models can apply to almost all data formats available in bnpy. Any data suitable for topic models or sequence models can also be fit with a basic mixture model.

The only formats that do not apply are those based on GraphData, which require a specialized subclass of mixture models (TBD).

Supported Learning Algorithms

Currently, the practical differences are:

  • FiniteMixtureModel supports EM, VB, soVB, and moVB.
  • DPMixtureModel supports VB, soVB, and moVB,
    • with birth/merge/delete moves for moVB.

EM (MAP) inference for the DPMixtureModel is possible, but not yet implemented.

Common tasks with mixtures

Accessing learned cluster assignments

Given a dataset of interest Data (a DataObj), and an hmodel (an instance of HModel) properly initialized with K active clusters, we simply perform a local step.

>>> LP = hmodel.calc_local_params(Data)
>>> resp = LP['resp']

Here, resp is a 2D array of size N x K. Each entry resp[n, k] gives the probability that data atom n is assigned to cluster k under the posterior. Thus, each entry resp[n,k] must be a value within the interval [0,1]. The sum of every row must equal one.

>>> import numpy as np
>>> assert np.all(resp >= 0.0)
>>> assert np.all(resp <= 1.0)
>>> assert np.allclose(resp.sum(axis=1), 1.0)

To convert to hard assignments:

>>> Z = resp.argmax(axis=1)

Here, Z is a 1D array of size N, where entry Z[n] is an integer in the set {0, 1, 2, ... K-1}.

Accessing learned cluster probabilities

>>> pi0 = hmodel.allocModel.get_active_cluster_probs()
>>> assert pi0.ndim == 1
>>> assert pi0.size == hmodel.allocModel.K

Global update summaries

For a global update, mixture models require only one sufficient statistic: an expected count value for each cluster k. This value gives the expected number of data atoms assigned to k throughout the dataset.

  • Count N_k

    Expected assignments to state k across all data items.

>>> LP = hmodel.calc_local_params(Data)
>>> SS = hmodel.get_global_suff_stats(Data, LP)
>>> Nvec = SS.N # or SS.getCountVec()
>>> assert Nvec.size == hmodel.allocModel.K
[ ... TODO ... ]
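The count vector is just the column sums of resp. Here is a numpy sketch using a toy responsibility matrix; the names are illustrative, not bnpy API.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 5

# Build a toy responsibility matrix whose rows sum to one.
resp = rng.dirichlet(np.ones(K), size=N)

# N_k: expected number of data atoms assigned to cluster k.
Nvec = resp.sum(axis=0)
```

Because every row of resp sums to one, the entries of Nvec always sum to N, the total number of data atoms.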

ELBO summaries

To compute the ELBO, mixture models require only one non-linear summary statistic: the entropy of the learned assignment parameters resp.

\[\begin{split}\mathcal{L} = \mathcal{L}_{\mbox{data}} + \mathcal{L}_{\mbox{alloc}} - E[ \log q(z) ] \\ - E[ \log q(z) ] = \sum_{k=1}^K H_k \\ H_k = - \sum_{n=1}^N r_{nk} \log r_{nk}\end{split}\]

You can compute this by enabling the correct keyword flag when calling the summary step function.

>>> LP = hmodel.calc_local_params(Data)
>>> SS = hmodel.get_global_suff_stats(Data, LP, doPrecompEntropy=1)
>>> Hresp = SS.getELBOTerm('Hresp')
>>> assert Hresp.ndim == 1
>>> assert Hresp.size == SS.K
[ ... TODO ... ]
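The per-cluster entropy term \(H_k\) defined above can be computed directly from resp. A numpy sketch with a toy responsibility matrix, guarding against zero entries since \(0 \log 0 = 0\):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 5
resp = rng.dirichlet(np.ones(K), size=N)  # toy responsibilities

# H_k = -sum_n r_nk log r_nk, using the convention 0 * log 0 = 0.
safe_resp = np.maximum(resp, 1e-100)
Hresp = -np.sum(resp * np.log(safe_resp), axis=0)
```

Each entry of Hresp is nonnegative, since -r log r >= 0 for any r in [0, 1].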