Within bnpy, every hierarchical model we support has two pieces: an allocation model and an observation model. We use the label “allocation model” to describe the generative process that allocates cluster assignments to individual data points.
In this document, we give a high-level overview of how we define an allocation model and how variational inference works. We also define the essential variational inference API functions that any concrete allocation model (an instance of the abstract AllocModel class) should support.
Here are some quick links to documentation for each of the possible allocation models supported by bnpy.
An allocation model defines a probabilistic generative process for assigning (aka allocating) clusters to data atoms. There are two types of variables involved: cluster probability vectors $\pi_j$, and discrete assignments $z_n$ at each data atom indexed by $n$. Each allocation model defines a joint distribution over these variables through the following generative steps.
First, we generate a set of global cluster probabilities $\pi_0$.
Depending on the model, we may next generate several more cluster probability vectors $\pi_j$.
Second, we draw cluster assignment variables $z_n$ at each data atom $n$.
For example, consider a simple finite mixture model with $K$ clusters. The complete allocation model would be:
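Assuming a symmetric Dirichlet prior with a single concentration parameter $\alpha$ (an assumption made here for illustration; the exact parameterization used by bnpy may differ), the finite mixture can be written as:

$$
\pi_0 \sim \mathrm{Dir}_K\!\left(\tfrac{\alpha}{K}, \ldots, \tfrac{\alpha}{K}\right),
\qquad
z_n \mid \pi_0 \sim \mathrm{Cat}(\pi_0), \quad n = 1, \ldots, N.
$$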
To extend this to a Dirichlet process mixture model, we simply use a stick-breaking distribution instead:
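In the usual stick-breaking construction, each stick fraction is drawn as $u_k \sim \mathrm{Beta}(1, \alpha)$ and the cluster probabilities are $\pi_{0k} = u_k \prod_{\ell < k} (1 - u_\ell)$. The sketch below samples from this process using only the Python standard library. The truncation at `K` sticks and the function names are assumptions made for illustration, not part of bnpy.

```python
import random


def sample_stick_breaking(alpha, K, rng):
    """Draw K cluster probabilities from a truncated stick-breaking prior.

    Truncation at K is an assumption so the example terminates: the last
    stick fraction is fixed to 1, which makes the weights sum to one.
    """
    remaining = 1.0  # length of stick not yet assigned to any cluster
    pi = []
    for k in range(K):
        u = 1.0 if k == K - 1 else rng.betavariate(1.0, alpha)
        pi.append(u * remaining)
        remaining *= 1.0 - u
    return pi


def sample_assignments(pi, N, rng):
    """Draw z_n ~ Cat(pi) independently for each of N data atoms."""
    return rng.choices(range(len(pi)), weights=pi, k=N)


rng = random.Random(0)
pi = sample_stick_breaking(alpha=2.0, K=10, rng=rng)
z = sample_assignments(pi, N=20, rng=rng)
```

Larger values of `alpha` spread probability mass over more clusters, while small values concentrate it on the first few sticks.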
Variational inference for allocation models tries to optimize an approximate posterior:
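For the finite mixture example, a mean-field factorization consistent with the free parameters used in this document (pseudo-counts $\theta$ and assignments $r$) would be, as a sketch:

$$
q(\pi_0, z) = \mathrm{Dir}(\pi_0 \mid \theta_1, \ldots, \theta_K) \prod_{n=1}^{N} \mathrm{Cat}(z_n \mid r_{n1}, \ldots, r_{nK}).
$$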
The optimization objective is to make this approximate posterior as close to the true posterior as possible. Remember that this objective incorporates terms from the observation model as well. The optimization finds values for the free parameters – pseudo-counts theta and assignments r – that make the objective function as large as possible.
Expanding the allocation model terms, we have
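For the finite mixture example, and as a generic sketch rather than bnpy's exact expression, the allocation terms of the objective collect every expectation that involves $\pi_0$ and $z$ but not the observed data:

$$
\mathcal{L}_{\text{alloc}}(\theta, r) =
\mathbb{E}_q[\log p(\pi_0)]
+ \mathbb{E}_q[\log p(z \mid \pi_0)]
- \mathbb{E}_q[\log q(\pi_0)]
- \mathbb{E}_q[\log q(z)].
$$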
Every variational algorithm proceeds by iteratively improving this objective function by cycling through four concrete steps:
Local step: optimize the local assignments r and any local theta values.
Summary step: compute summary statistics from the local parameters.
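Global step: update the global variational parameters given the summary statistics.
Objective function evaluation step: compute the scalar value of the objective for the current global parameters and summaries.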
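The four steps can be sketched for a toy finite mixture as follows. This is a minimal illustration, not bnpy code. The standalone function names, the symmetric `alpha / K` prior, and the fixed table `ev` of expected log-likelihoods (which the observation model would normally supply) are all assumptions. The digamma function is approximated by hand because the standard library does not provide one.

```python
import math


def digamma(x):
    """Approximate digamma(x) for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def expected_log_pi(theta):
    """E[log pi_k] under a Dirichlet with pseudo-counts theta."""
    total = digamma(sum(theta))
    return [digamma(t) - total for t in theta]


def log_beta(a):
    """Log normalizer of a Dirichlet with parameter vector a."""
    return sum(math.lgamma(ak) for ak in a) - math.lgamma(sum(a))


def local_step(ev, theta):
    """Soft assignments r[n][k] proportional to exp(ev[n][k] + E[log pi_k])."""
    e_log_pi = expected_log_pi(theta)
    resp = []
    for row in ev:
        logits = [e + w for e, w in zip(row, e_log_pi)]
        m = max(logits)  # subtract the max for numerical stability
        unnorm = [math.exp(v - m) for v in logits]
        s = sum(unnorm)
        resp.append([u / s for u in unnorm])
    return resp


def summary_step(resp):
    """Expected number of atoms assigned to each cluster."""
    return [sum(col) for col in zip(*resp)]


def global_step(counts, alpha):
    """Dirichlet pseudo-counts: prior mass alpha/K plus expected counts."""
    K = len(counts)
    return [alpha / K + n for n in counts]


def objective(ev, resp, counts, theta, alpha):
    """Scalar objective: data term + assignment entropy + allocation terms."""
    K = len(theta)
    prior = [alpha / K] * K
    e_log_pi = expected_log_pi(theta)
    l_data = sum(r * e for rr, er in zip(resp, ev) for r, e in zip(rr, er))
    l_entropy = -sum(r * math.log(r) for rr in resp for r in rr if r > 0.0)
    l_alloc = log_beta(theta) - log_beta(prior) + sum(
        (n + a - t) * w for n, a, t, w in zip(counts, prior, theta, e_log_pi))
    return l_data + l_entropy + l_alloc


# Toy input: fixed expected log-likelihoods for N=4 atoms and K=2 clusters.
ev = [[-0.1, -3.0], [-0.2, -2.5], [-2.8, -0.3], [-3.1, -0.1]]
alpha = 1.0
theta = [alpha / 2, alpha / 2]
trace = []
for _ in range(10):
    resp = local_step(ev, theta)           # local step
    counts = summary_step(resp)            # summary step
    theta = global_step(counts, alpha)     # global step
    trace.append(objective(ev, resp, counts, theta, alpha))  # objective step
```

Because each local and global update maximizes the objective with the other set of parameters held fixed, the values in `trace` should never decrease, which is a useful sanity check when implementing a new model.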
Within bnpy, each possible allocation model is a subclass of the general-purpose abstract base class: AllocModel.
Each AllocModel instance has both state and behaviors.
The state represents two key values: the hyperparameters that define the prior and the global variational parameters that define the approximate posterior. The behaviors are the four fundamental steps of inference, as well as some auxiliary functions.
For any generative model in our framework, the hyperparameters of an allocation model are just the set of concentration parameters $\alpha_j$ that parameterize the generative story for each $\pi_j$ probability vector. Thus, each allocation model will hold one or more alpha values as attributes.
Each AllocModel subclass will have model-specific global parameters, which are represented as instance attributes. For example, a FiniteMixtureModel has a vector of Dirichlet pseudo-counts called theta, while a DPMixtureModel instance has a vector of Beta pseudo-counts called eta.
Each of the four conceptual steps of variational inference – local step, summary step, global step, and objective step – is associated with a single instance-level method of an AllocModel object. The general abstract interface for these methods is documented below. Each subclass provides an actual implementation of these methods.
The local step, specified by calc_local_params, finds local parameters for the dataset.
Compute local parameters for each data item and component.
This is the E-step of the EM algorithm.
Returned LP contains optimal values of local parameters specific to the provided dataset. Updated values are computed using the current global parameter attributes.
Possible keyword arguments control model-specific computations.
Parameters:
Data (DataObj) – Dataset to compute local parameters for.
LP (dict) – Must contain conditional likelihoods in field ‘E_log_soft_ev’, an N x K 2D array provided by the observation model.
Returns:
LP (dict) – Contains updated fields for all K clusters in current model.
* ‘resp’ : N x K 2D array, soft assignments for each data atom.
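As a rough illustration of what a local step computes for a mixture model, the sketch below fills in the ‘resp’ field from ‘E_log_soft_ev’. The function name `calc_resp` and the `E_log_pi` argument (expected log cluster weights, which a real model would derive from its own global parameters) are assumptions for this example and are not bnpy's interface.

```python
import math


def calc_resp(LP, E_log_pi):
    """Add a 'resp' field to LP from 'E_log_soft_ev' and expected log weights.

    E_log_pi is a hypothetical argument standing in for the model's
    current global parameters.
    """
    resp = []
    for row in LP['E_log_soft_ev']:
        logits = [ev + w for ev, w in zip(row, E_log_pi)]
        # Subtract the row maximum before exponentiating so that very
        # negative log-likelihoods do not underflow to zero.
        m = max(logits)
        unnorm = [math.exp(v - m) for v in logits]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    LP['resp'] = resp
    return LP


LP = {'E_log_soft_ev': [[0.0, 0.0], [-1000.0, -1001.0]]}
LP = calc_resp(LP, E_log_pi=[math.log(0.25), math.log(0.75)])
```

The second row shows why the maximum is subtracted first: exponentiating -1000 directly would underflow, yet the normalized assignments remain well defined.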
The summary step, specified by get_global_suff_stats, summarizes a dataset Data and its associated local parameters LP. It produces a bag of sufficient statistics SS.
Compute low-dim summaries for provided local params.
Returned sufficient statistics are deterministic given Data, LP.
Possible keyword arguments control model-specific computations.
Parameters:
Data (DataObj) – Dataset to be summarized.
SS (SuffStatBag) – If present, all summaries will be added to this bag. If None, new bag will be created and returned.
LP (dict) – Holds valid local params for K’ clusters and all atoms in Data.
Returns:
SS (SuffStatBag) – Updated fields for each of K’ clusters represented in LP
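The additive behavior of the summary step can be sketched with a plain dict standing in for a SuffStatBag. The function name `get_counts` and the key `'N'` for expected cluster counts are assumptions made for this example.

```python
def get_counts(LP, SS=None):
    """Sum soft assignments over atoms; add into SS if one is provided.

    A plain dict with key 'N' stands in for a SuffStatBag here.
    """
    K = len(LP['resp'][0])
    counts = [sum(row[k] for row in LP['resp']) for k in range(K)]
    if SS is None:
        return {'N': counts}
    SS['N'] = [old + new for old, new in zip(SS['N'], counts)]
    return SS


batch1 = {'resp': [[0.9, 0.1], [0.2, 0.8]]}
batch2 = {'resp': [[0.5, 0.5]]}
SS = get_counts(batch1)        # new bag created from the first batch
SS = get_counts(batch2, SS)    # second batch accumulated into the same bag
```

Because the summaries are sums over data atoms, statistics from separate batches can be combined without revisiting the data.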
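The global step, performed by update_global_params, updates the global variational parameters (such as theta or eta) using the sufficient statistics SS.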
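For a finite mixture, a global update of the Dirichlet pseudo-counts theta might look like the sketch below. The function name `update_theta`, the dict key `'N'`, and the symmetric `alpha / K` prior are assumptions for illustration rather than bnpy's implementation.

```python
def update_theta(SS, alpha):
    """Return Dirichlet pseudo-counts: prior mass alpha/K plus expected counts.

    SS is a plain dict whose 'N' entry holds expected counts per cluster.
    """
    K = len(SS['N'])
    return [alpha / K + n for n in SS['N']]


theta = update_theta({'N': [1.6, 1.4]}, alpha=1.0)
# Posterior mean of the cluster probabilities under Dir(theta).
mean_pi = [t / sum(theta) for t in theta]
```

The update needs only the summaries in SS, not the raw data or the local parameters.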
During inference, we need to verify that each step is working as expected. Thus, we need to be able to compute the scalar value of the objective given any current set of global parameters (stored in self) and local parameters (summarized in SS).
Calculate ELBO objective function value for provided state.
Parameters:
Data (optional) – If not provided, relies exclusively on summaries in SS.
SS (SuffStatBag) – Contains valid summaries for desired dataset.
LP (optional, dict) – If not provided, relies exclusively on summaries in SS. If provided, used in place of summaries in SS when possible.
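todict (boolean) – If True, the individual terms of the objective are returned in a dict under named keys like ‘Ldata’ and ‘Lentropy’. If False, a single scalar is returned. This is the default.
Returns:
L (float) – Represents sum of all terms in optimization objective. Will be a dict if todict option is True.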
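The two return conventions can be illustrated with a small sketch that computes only the ‘Ldata’ and ‘Lentropy’ terms. The function name `calc_objective_terms` is hypothetical, and unlike the interface described above it works directly from soft assignments instead of a SuffStatBag. The allocation prior terms are left out to keep the example short.

```python
import math


def calc_objective_terms(E_log_soft_ev, resp, todict=False):
    """Return two objective terms, as a dict or as their scalar sum.

    Only 'Ldata' and 'Lentropy' are computed in this sketch.
    """
    Ldata = sum(r * ev
                for r_row, ev_row in zip(resp, E_log_soft_ev)
                for r, ev in zip(r_row, ev_row))
    # Entropy of the soft assignments; zeros are skipped since 0 * log 0 = 0.
    Lentropy = -sum(r * math.log(r) for row in resp for r in row if r > 0.0)
    terms = {'Ldata': Ldata, 'Lentropy': Lentropy}
    return terms if todict else sum(terms.values())


ev = [[-1.0, -2.0], [-3.0, -0.5]]
resp = [[0.5, 0.5], [0.0, 1.0]]
L = calc_objective_terms(ev, resp)
terms = calc_objective_terms(ev, resp, todict=True)
```

Returning the terms separately makes it easier to see which part of the objective changed when a step does not behave as expected.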