=======================
Mixture Models
=======================

**bnpy** supports two kinds of mixture models: `FiniteMixtureModel` and `DPMixtureModel`.

`FiniteMixtureModel`
--------------------

The finite mixture has the following generative representation as an allocation model. There is a single top-level vector of cluster probabilities :math:`\pi_0`. Each data atom's assignment is drawn i.i.d. according to the probabilities in this vector.

.. math::
    [\pi_{01}, \pi_{02}, \ldots \pi_{0K}] \sim \mbox{Dir}_K( \frac{\alpha_0}{K} )
    \\
    \mbox{for~} n \in 1, \ldots N:
    \\
    \qquad z_n \sim \mbox{Cat}_K(\pi_{01}, \ldots \pi_{0K})

Here, :math:`\alpha_0 > 0` is the uniform concentration parameter. Larger values of :math:`\alpha_0` favor near-uniform cluster probabilities, while smaller values concentrate mass on a few dominant clusters.

`DPMixtureModel`
--------------------

The Dirichlet Process (DP) mixture has the following generative representation as an allocation model. It modifies the finite mixture by drawing :math:`\pi_0` from a truncated stick-breaking process, which yields :math:`K` active weights plus one remainder weight :math:`\pi_{0,>K}` covering all clusters beyond the truncation level.

.. math::
    [\pi_{01}, \pi_{02}, \ldots \pi_{0K}, \pi_{0,>K}] \sim \mbox{StickBreaking}_K(\alpha_0)
    \\
    \mbox{for~} n \in 1, \ldots N:
    \\
    \qquad z_n \sim \mbox{Cat}_K(\pi_{01}, \ldots \pi_{0K})

In the limit as :math:`K \to \infty`, these two generative models become equivalent.
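To see how these two priors on :math:`\pi_0` differ in practice, here is a minimal NumPy sketch of both weight constructions. This is illustrative code following the notation above, not part of the **bnpy** API; the :math:`\mbox{Beta}(1, \alpha_0)` stick-breaking fractions follow the standard DP construction.

.. code::

    import numpy as np

    K = 5          # truncation level (number of active clusters)
    alpha0 = 1.0   # concentration parameter
    N = 100        # number of data atoms
    rng = np.random.default_rng(0)

    # FiniteMixtureModel: symmetric Dirichlet prior with parameter alpha0 / K.
    pi0_finite = rng.dirichlet(alpha0 / K * np.ones(K))

    # DPMixtureModel: truncated stick-breaking with K sticks.
    v = rng.beta(1.0, alpha0, size=K)               # stick-breaking fractions
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - v)))
    pi0_active = v * leftover[:K]                   # K active weights
    pi0_gtK = leftover[K]                           # remainder weight pi_{0,>K}
    assert np.allclose(pi0_active.sum() + pi0_gtK, 1.0)

    # Either way, assignments are drawn i.i.d. from the cluster probabilities.
    z = rng.choice(K, size=N, p=pi0_finite)

Note that the stick-breaking weights are size-biased: earlier clusters tend to receive larger weights, while the symmetric Dirichlet treats all :math:`K` clusters exchangeably.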
Using mixtures with other **bnpy** modules
------------------------------------------

As usual, to train a hierarchical model whose allocation model is a mixture:

.. code::

    >>> hmodel, Info = bnpy.Run(Data, 'FiniteMixtureModel', obsModelName, algName, **kwargs)
    >>> # or
    >>> hmodel, Info = bnpy.Run(Data, 'DPMixtureModel', obsModelName, algName, **kwargs)

Supported DataObj Types
+++++++++++++++++++++++++++

Mixture models apply to almost all data formats available in **bnpy**. Any data suitable for topic models or sequence models can also be fit with a basic mixture model. The only formats that do not apply are those based on GraphData, which require a specialized subclass of mixture models (TBD).

Supported Learning Algorithms
+++++++++++++++++++++++++++++

Currently, the practical differences are:

* `FiniteMixtureModel` supports EM, VB, soVB, and moVB.
* `DPMixtureModel` supports VB, soVB, and moVB, with birth/merge/delete moves for moVB.

EM (MAP) inference for the `DPMixtureModel` is possible, but not yet implemented.

Common tasks with mixtures
---------------------------

Accessing learned cluster assignments
+++++++++++++++++++++++++++++++++++++

Given a dataset of interest Data (a :class:`.DataObj`) and an hmodel (an instance of :class:`.HModel`) properly initialized with K active clusters, we simply perform a local step.

.. code::

    >>> LP = hmodel.calc_local_params(Data)
    >>> resp = LP['resp']

Here, resp is a 2D array of size N x K. Each entry resp[n, k] gives the posterior probability that data atom n is assigned to cluster k. Thus, each entry resp[n, k] must lie in the interval [0, 1], and every row must sum to one.

.. code::

    >>> assert resp[n, k] >= 0.0
    >>> assert resp[n, k] <= 1.0
    >>> assert np.allclose(np.sum(resp[n, :]), 1.0)

To convert to hard assignments:

.. code::

    >>> Z = resp.argmax(axis=1)

Here, Z is a 1D array of size N, where entry Z[n] is an integer in the set {0, 1, 2, ... K-1}.

Accessing learned cluster probabilities
+++++++++++++++++++++++++++++++++++++++

To obtain the learned vector of active cluster probabilities :math:`\pi_0`:

.. code::

    >>> pi0 = hmodel.allocModel.get_active_cluster_probs()
    >>> assert pi0.ndim == 1
    >>> assert pi0.size == hmodel.allocModel.K

Global update summaries
+++++++++++++++++++++++++++

For a global update, mixture models require only one sufficient statistic: an expected count value for each cluster k. This value gives the expected number of data atoms assigned to cluster k throughout the dataset.

* Count :math:`N_k`
    Expected number of assignments to cluster k across all data atoms.

.. code::

    >>> LP = hmodel.calc_local_params(Data)
    >>> SS = hmodel.get_global_suff_stats(Data, LP)
    >>> Nvec = SS.N  # or SS.getCountVec()
    >>> assert Nvec.size == hmodel.allocModel.K

[ ... TODO ... ]

ELBO summaries
++++++++++++++

To compute the ELBO, mixture models require only one non-linear summary statistic: the entropy of the learned assignment parameters `resp`.

.. math::
    \mathcal{L} = \mathcal{L}_{\mathrm{data}} + \mathcal{L}_{\mathrm{alloc}} - \mathbb{E}[ \log q(z) ]
    \\
    - \mathbb{E}[ \log q(z) ] = \sum_{k=1}^K H_k,
    \qquad
    H_k = - \sum_{n=1}^N r_{nk} \log r_{nk}

You can compute this by enabling the correct keyword flag when calling the summary step function.

.. code::

    >>> LP = hmodel.calc_local_params(Data)
    >>> SS = hmodel.get_global_suff_stats(Data, LP, doPrecompEntropy=1)
    >>> Hresp = SS.getELBOTerm('Hresp')
    >>> assert Hresp.ndim == 1
    >>> assert Hresp.size == SS.K

[ ... TODO ... ]

.. toctree::
    :maxdepth: 3
    :titlesonly:
    :hidden:

    FiniteMixtureModel.rst
    FiniteMixtureModel-Variational.rst
    DPMixtureModel.rst
    DPMixtureModel-Variational.rst
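Pulling the steps above together, the stored entropy term can be checked directly against `resp`, using only the calls shown earlier. This is a sketch assuming SS stores exactly the entropy :math:`H_k` defined above (the small constant guards against log(0)):

.. code::

    >>> import numpy as np
    >>> LP = hmodel.calc_local_params(Data)
    >>> SS = hmodel.get_global_suff_stats(Data, LP, doPrecompEntropy=1)
    >>> resp = LP['resp']
    >>> # Manual entropy: H_k = - sum_n resp[n, k] * log resp[n, k]
    >>> Hresp_manual = -1 * np.sum(resp * np.log(resp + 1e-100), axis=0)
    >>> assert np.allclose(SS.getELBOTerm('Hresp'), Hresp_manual)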