Allocation Model Guide

What is an allocation model?

Within bnpy, every hierarchical model we support has two pieces: an allocation model and an observation model. We use the label “allocation model” to describe the generative process that allocates cluster assignments to individual data points.


In this document, we give a high-level overview of how we define an allocation model and how variational inference works. We also define the essential variational inference API functions that any concrete allocation model (a subclass of the abstract AllocModel class) should support.

Generative model

An allocation model defines a probabilistic generative process for assigning (aka allocating) clusters to data atoms. There are two types of variables involved: cluster probability vectors $\pi_j$, and discrete assignments $z_n$ at each data atom indexed by $n$. Each allocation model defines a joint distribution:

\[\log p(\pi, z) = \log p(\pi) + \log p( z | \pi)\]

First, we generate a set of global cluster probabilities $\pi_0$:

\[\pi_0 \sim \mbox{Dir}_K(\frac{\alpha_0}{K})\]

Depending on the model, we may next generate several more cluster probability vectors $\pi_j$.

Second, we draw a cluster assignment $z_n$ at each data atom $n$ from the probability vector $\pi_j$ that the model associates with that atom:

\[z_n \sim \mbox{Cat}( \pi_{j1}, \ldots, \pi_{jK} )\]

Example: Mixture model

For example, consider a simple finite mixture model with $K$ clusters. The complete allocation model would be:

\[\pi_0 \sim \mbox{Dir}_K(\frac{\alpha_0}{K})\]
\[z_n \sim \mbox{Cat}( \pi_{01}, \ldots, \pi_{0K} )\]
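To make this generative story concrete, here is a minimal sketch that samples from the finite mixture allocation model using only Python's standard library. The function name sample_finite_mixture_alloc is hypothetical and not part of bnpy. The sketch relies on the standard fact that normalizing independent Gamma(α0/K, 1) draws yields a sample from Dir_K(α0/K).

```python
import random

def sample_finite_mixture_alloc(N, K, alpha0, seed=0):
    """Hypothetical helper: draw (pi0, z) from the finite mixture allocation model.

    pi0 ~ Dir_K(alpha0 / K) and z_n ~ Cat(pi0) for each of the N atoms.
    """
    rng = random.Random(seed)
    # A Dirichlet sample is a vector of independent Gamma(a_k, 1) draws divided by their sum.
    gammas = [rng.gammavariate(alpha0 / K, 1.0) for _ in range(K)]
    total = sum(gammas)
    pi0 = [g / total for g in gammas]
    # Each atom independently picks cluster k with probability pi0[k].
    z = rng.choices(range(K), weights=pi0, k=N)
    return pi0, z

pi0, z = sample_finite_mixture_alloc(N=10, K=3, alpha0=1.0)
print(pi0, z)
```

Smaller values of α0 concentrate the mass of π0 on fewer clusters, so most atoms share a handful of labels.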

To extend this to a Dirichlet process mixture model, we simply use a stick-breaking distribution instead:

\[\pi_0 \sim \mbox{Stick}(\alpha_0)\]
\[z_n \sim \mbox{Cat}( \pi_{01}, \ldots, \pi_{0K}, \ldots )\]
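The stick-breaking prior gives nonzero probability to infinitely many clusters, so any sample we can store must be truncated. The sketch below draws the first K weights using the standard construction with Beta(1, α0) break fractions (often called the GEM distribution). The truncation level K and the function name are illustrative assumptions, not bnpy's interface.

```python
import random

def sample_stick_breaking(alpha0, K, seed=0):
    """Hypothetical helper: first K weights of pi0 ~ Stick(alpha0).

    Each fraction v_k ~ Beta(1, alpha0) breaks off a piece of the remaining stick:
        pi0_k = v_k * prod_{l < k} (1 - v_l)
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet given to any cluster
    for _ in range(K):
        v = rng.betavariate(1.0, alpha0)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    # 'remaining' is the total mass reserved for all clusters beyond index K.
    return weights, remaining

weights, leftover = sample_stick_breaking(alpha0=2.0, K=20)
print(weights[:5], leftover)
```

Returning the leftover mass makes the truncation explicit: the K weights plus the leftover always sum to one.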

Variational Inference

Variational inference for allocation models optimizes an approximate posterior distribution that factorizes as:

\[\log q(\pi, z) = \log q(\pi | \theta) + \log q(z | r)\]

The goal is to make this approximate posterior as close to the true posterior as possible, as measured by KL divergence; equivalently, we maximize a lower bound on the marginal likelihood (the ELBO). This objective also includes terms from the observation model. The optimization finds values for the free parameters (the pseudo-counts $\theta$ and the assignments $r$) that make the objective as large as possible:

\[\mathcal{L} = \mathcal{L}_{\mathrm{alloc}}(r, \theta) + \mathcal{L}_{\mathrm{obs}}(r, \ldots)\]

Expanding the allocation model terms, we have

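\[\mathcal{L}_{\mathrm{alloc}}(r, \theta) = \mathcal{L}_{z} + \mathcal{L}_{\mathrm{entropy}}\]
\[\mathcal{L}_{z} = \mathbb{E}_q\left[ \log p(z \mid \pi) + \log p(\pi) - \log q(\pi) \right]\]
\[\mathcal{L}_{\mathrm{entropy}} = -\mathbb{E}_q\left[ \log q(z) \right]\]

By Jensen's inequality, $\mathcal{L}_{z}$ is a lower bound on the expected log marginal probability of the assignments:

\[\mathbb{E}_{q(z)}\left[ \log p(z \mid \alpha) \right] \geq \mathcal{L}_{z}\]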
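For the finite mixture model with $q(\pi) = \mbox{Dir}(\theta)$, every expectation above has a closed form through $\mathbb{E}_q[\log \pi_k] = \psi(\theta_k) - \psi(\sum_l \theta_l)$. The stdlib-only sketch below evaluates $\mathcal{L}_z$ and $\mathcal{L}_{\mathrm{entropy}}$ this way. All function names are hypothetical rather than bnpy functions, and because Python's math module has no digamma, we approximate it with the usual recurrence plus asymptotic series.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x upward using psi(x) = psi(x + 1) - 1/x, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f * (1/252 - f * (1/240 - f / 132))))

def c_dir(a):
    """Log normalizer of a Dirichlet: log Gamma(sum_k a_k) - sum_k log Gamma(a_k)."""
    return math.lgamma(sum(a)) - sum(math.lgamma(ak) for ak in a)

def L_alloc_finite_mixture(resp, theta, alpha0):
    """Return (L_z, L_entropy) for q(pi) = Dir(theta) and q(z_n) = Cat(resp[n])."""
    K = len(theta)
    alpha = [alpha0 / K] * K
    N = [sum(row[k] for row in resp) for k in range(K)]  # expected count per cluster
    psi_total = digamma(sum(theta))
    E_log_pi = [digamma(t) - psi_total for t in theta]
    # E[log p(z|pi)] + E[log p(pi)] - E[log q(pi)], with terms in E[log pi_k] collected
    L_z = c_dir(alpha) - c_dir(theta) + sum(
        (N[k] + alpha[k] - theta[k]) * E_log_pi[k] for k in range(K))
    # -E[log q(z)], treating 0 * log 0 as 0
    L_entropy = -sum(r * math.log(r) for row in resp for r in row if r > 0)
    return L_z, L_entropy

resp = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
theta = [2.1, 1.9]  # alpha0/K + N_k with alpha0 = 1, K = 2, N = [1.6, 1.4]
print(L_alloc_finite_mixture(resp, theta, alpha0=1.0))
```

When $\theta_k = \alpha_0/K + N_k$, the coefficient on each $\mathbb{E}_q[\log \pi_k]$ vanishes, so $\mathcal{L}_z$ reduces to the difference of Dirichlet log normalizers.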

Every variational algorithm improves this objective iteratively by cycling through four concrete steps:

  • Local step: optimize the local assignments $r$ and any local $\theta$ values.
  • Summary step: compute summary statistics from the local parameters.
  • Global step: update the global parameters $\theta$ given the summary statistics.
  • Objective function evaluation step: compute the current value of the objective to monitor progress.
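The sketch below runs the four steps in order for a finite mixture allocation model with a symmetric $\mbox{Dir}_K(\alpha_0/K)$ prior. To stay self-contained, it treats the observation model's log-likelihoods (E_log_soft_ev) as fixed inputs instead of updating them, and every name in it is hypothetical rather than bnpy's API. Because the local and global updates each maximize the objective exactly over their own variables, the recorded objective values should never decrease.

```python
import math

def digamma(x):
    """psi(x) for x > 0 via recurrence plus asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f * (1/252 - f * (1/240 - f / 132))))

def c_dir(a):
    """Log normalizer of a Dirichlet distribution."""
    return math.lgamma(sum(a)) - sum(math.lgamma(ak) for ak in a)

def run_vi(E_log_soft_ev, alpha0, n_iters=20):
    """Cycle local -> summary -> global -> objective for a finite mixture.

    E_log_soft_ev is an N x K nested list of log-likelihoods, held fixed here
    in place of a real observation model.
    """
    K = len(E_log_soft_ev[0])
    alpha = [alpha0 / K] * K
    theta = [alpha0 / K + len(E_log_soft_ev) / K] * K  # uniform initialization
    history = []
    for _ in range(n_iters):
        # Local step: r_nk proportional to exp(E_log_soft_ev[n][k] + E[log pi_k]).
        E_log_pi = [digamma(t) - digamma(sum(theta)) for t in theta]
        resp = []
        for row in E_log_soft_ev:
            scores = [s + e for s, e in zip(row, E_log_pi)]
            m = max(scores)
            w = [math.exp(s - m) for s in scores]
            resp.append([wk / sum(w) for wk in w])
        # Summary step: expected counts N_k = sum_n r_nk.
        N = [sum(r[k] for r in resp) for k in range(K)]
        # Global step: conjugate update theta_k = alpha0/K + N_k.
        theta = [alpha[k] + N[k] for k in range(K)]
        # Objective step: fixed data term + L_z + L_entropy.
        E_log_pi = [digamma(t) - digamma(sum(theta)) for t in theta]
        L_data = sum(rk * sk for r, row in zip(resp, E_log_soft_ev) for rk, sk in zip(r, row))
        L_z = c_dir(alpha) - c_dir(theta) + sum(
            (N[k] + alpha[k] - theta[k]) * E_log_pi[k] for k in range(K))
        L_entropy = -sum(rk * math.log(rk) for r in resp for rk in r if rk > 0)
        history.append(L_data + L_z + L_entropy)
    return theta, history

theta, history = run_vi([[0.0, -2.0], [-1.0, 0.0], [-3.0, -0.5]], alpha0=1.0)
print(theta, history[0], history[-1])
```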

Variational API

Within bnpy, each allocation model is a subclass of the general-purpose abstract base class AllocModel. Each AllocModel instance has both state and behaviors. The state consists of two key kinds of values: the hyperparameters that define the prior, and the global variational parameters that define the approximate posterior. The behaviors are the four fundamental steps of inference, plus some auxiliary functions.

Attributes

For any generative model in our framework, the hyperparameters of an allocation model are just the set of concentration parameters $\alpha_j$ that parameterize the generative story for each $\pi_j$ probability vector. Thus, each allocation model will hold one or more alpha values as attributes.

Each AllocModel subclass will have model-specific global parameters, which are represented as instance attributes. For example, a FiniteMixtureModel has a vector of Dirichlet pseudo-counts called theta, while a DPMixtureModel instance has a vector of Beta pseudo-counts called eta.

Each of the four conceptual steps of variational inference (the local, summary, global, and objective steps) is implemented by a single instance method of an AllocModel object. The general abstract interface for these methods is documented below; each subclass provides its own implementation.
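To illustrate the shape of this interface, the hypothetical abstract class below declares the four step methods with Python's abc module. Method names and arguments follow the signatures shown in this document; the update_global_params signature is not given here, so the one used is an assumption. This is a structural sketch, not bnpy's actual AllocModel source.

```python
from abc import ABC, abstractmethod

class AllocModelSketch(ABC):
    """Hypothetical outline of the AllocModel interface described in this guide.

    Concrete subclasses would keep hyperparameters (e.g. alpha0) and global
    variational parameters (e.g. theta or eta) as instance attributes.
    """

    def __init__(self, inferType):
        self.inferType = inferType

    @abstractmethod
    def calc_local_params(self, Data, LP, **kwargs):
        """Local step: return LP updated with fields such as 'resp'."""

    @abstractmethod
    def get_global_suff_stats(self, Data, SS, LP, **kwargs):
        """Summary step: return a bag of sufficient statistics for Data and LP."""

    @abstractmethod
    def update_global_params(self, SS, **kwargs):
        """Global step: update global parameter attributes from SS (signature assumed)."""

    @abstractmethod
    def calc_evidence(self, Data, SS, LP, todict=0, **kwargs):
        """Objective step: return the objective value, or a dict of terms if todict."""
```

Declaring the methods abstract means Python refuses to instantiate any subclass that forgets one of the four steps.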

Local step

The local step, specified by calc_local_params, finds local parameters for the dataset.

class bnpy.allocmodel.AllocModel(inferType)
calc_local_params(Data, LP)

Compute local parameters for each data item and component.

This is the E-step of the EM algorithm.

Returned LP contains optimal values of local parameters specific to the provided dataset. Updated values are computed using the current global parameter attributes.

Possible keyword arguments control model-specific computations.

Parameters:
  • Data (DataObj) – Dataset to compute local parameters for.
  • LP (dict) – Must contain conditional log-likelihoods in field ‘E_log_soft_ev’, an N x K 2D array provided by the observation model.
Returns:

LP (dict) – Contains updated fields for all K clusters in the current model:
  • ‘resp’ : N x K 2D array, soft assignments for each data atom.
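As a concrete instance of the local step, here is a sketch for a finite mixture with $q(\pi) = \mbox{Dir}(\theta)$: each row of resp is a softmax of the observation log-likelihoods plus $\mathbb{E}_q[\log \pi_k]$. The function name and the use of plain nested lists instead of arrays are illustrative assumptions, not bnpy's implementation; only the ‘E_log_soft_ev’ and ‘resp’ field names come from the documentation above.

```python
import math

def digamma(x):
    """psi(x) for x > 0 via recurrence plus asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f * (1/252 - f * (1/240 - f / 132))))

def calc_local_params_sketch(theta, LP):
    """Hypothetical local step for a finite mixture with q(pi) = Dir(theta).

    LP['E_log_soft_ev'] is an N x K nested list of log-likelihoods from the
    observation model. Adds LP['resp'], whose rows are proportional to
    exp(E_log_soft_ev[n][k] + E[log pi_k]).
    """
    psi_total = digamma(sum(theta))
    E_log_pi = [digamma(t) - psi_total for t in theta]
    resp = []
    for row in LP['E_log_soft_ev']:
        scores = [s + e for s, e in zip(row, E_log_pi)]
        m = max(scores)  # subtract the max before exponentiating, for numerical stability
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        resp.append([wk / total for wk in w])
    LP['resp'] = resp
    return LP

LP = calc_local_params_sketch([1.0, 1.0], {'E_log_soft_ev': [[0.0, math.log(3.0)]]})
print(LP['resp'])
```

With equal pseudo-counts the prior term is the same for every cluster, so the responsibilities follow the likelihood ratio alone (here 1 : 3).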

Summary step

The summary step, specified by get_global_suff_stats, summarizes a dataset Data and its associated local parameters LP. It produces a bag of sufficient statistics SS.

class bnpy.allocmodel.AllocModel(inferType)
get_global_suff_stats(Data, SS, LP, **kwargs)

Compute low-dim summaries for provided local params.

Returned sufficient statistics are deterministic given Data, LP.

Possible keyword arguments control model-specific computations.

Parameters:
  • Data (DataObj) – Dataset to be summarized.
  • SS (SuffStatBag) – If present, all summaries will be added to this bag. If None, a new bag will be created and returned.
  • LP (dict) – Holds valid local params for K’ clusters and all atoms in Data.
Returns:

SS (SuffStatBag) – Updated fields for each of K’ clusters represented in LP
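For a finite mixture, the key allocation summary is the vector of expected counts $N_k = \sum_n r_{nk}$. In the sketch below a plain dict stands in for bnpy's SuffStatBag, and the field name 'N' is an assumption made for illustration. Data is accepted only to mirror the documented signature; this particular summary needs just LP.

```python
def get_global_suff_stats_sketch(Data, SS, LP):
    """Hypothetical summary step for a finite mixture.

    Computes expected counts N_k = sum_n resp[n][k]. A plain dict stands in
    for a SuffStatBag; if SS is given, the summary is stored in it.
    """
    resp = LP['resp']
    K = len(resp[0])
    N = [sum(row[k] for row in resp) for k in range(K)]
    if SS is None:
        SS = {}
    SS['N'] = N
    return SS

SS = get_global_suff_stats_sketch(None, None, {'resp': [[0.25, 0.75], [0.5, 0.5]]})
print(SS)
```

The result depends only on Data and LP, which matches the deterministic behavior described above.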

Global step

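The global step, performed by update_global_params, updates the global variational parameters stored as attributes of the model (for example, the pseudo-counts theta of a FiniteMixtureModel) using the sufficient statistics SS produced by the summary step.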
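For the finite mixture model with a $\mbox{Dir}_K(\alpha_0/K)$ prior, the optimal global update is the conjugate one: prior pseudo-counts plus expected counts. The sketch returns the new theta instead of setting it on a model object, and its name and the 'N' field are hypothetical, since the exact update_global_params signature is not shown in this guide.

```python
def update_global_params_sketch(SS, alpha0):
    """Hypothetical global step for a finite mixture: theta_k = alpha0 / K + N_k."""
    N = SS['N']
    K = len(N)
    # Conjugate Dirichlet update: prior pseudo-count plus expected count per cluster.
    return [alpha0 / K + Nk for Nk in N]

theta = update_global_params_sketch({'N': [0.75, 1.25]}, alpha0=1.0)
print(theta)
```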

Objective evaluation step

During inference, we need to verify that each step is working as expected; for example, in batch coordinate ascent no step should ever decrease the objective. Thus, we need to be able to compute the scalar value of the objective given any current set of global parameters (stored in self) and local parameters (summarized in SS).

class bnpy.allocmodel.AllocModel(inferType)
calc_evidence(Data, SS, LP, todict=0, **kwargs)

Calculate ELBO objective function value for provided state.

Parameters:
  • Data (optional, DataObj) – If not provided, relies exclusively on summaries in SS.
  • SS (SuffStatBag) – Contains valid summaries for the desired dataset.
  • LP (optional, dict) – If not provided, relies exclusively on summaries in SS. If provided, used in place of summaries in SS when possible.
Keyword Arguments:
  • todict (boolean) – If True, return a dict with the individual ELBO terms under named keys like ‘Ldata’ and ‘Lentropy’. If False [default], return a scalar value equal to the sum of the terms.

Returns:

L (float) – The sum of all terms in the optimization objective. Will be a dict of the individual terms if the todict option is True.
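The todict behavior can be sketched as follows for the allocation terms of a finite mixture, computed from summaries alone. Storing the assignment entropy in SS under 'Hresp', and the 'Lz' key, are assumptions made for this illustration (the entropy cannot be recovered from the counts N_k); only the todict semantics come from the documentation above.

```python
import math

def digamma(x):
    """psi(x) for x > 0 via recurrence plus asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f * (1/252 - f * (1/240 - f / 132))))

def c_dir(a):
    """Log normalizer of a Dirichlet distribution."""
    return math.lgamma(sum(a)) - sum(math.lgamma(ak) for ak in a)

def calc_evidence_sketch(SS, theta, alpha0, todict=False):
    """Hypothetical allocation objective for a finite mixture, from summaries only.

    SS holds 'N' (expected counts) and 'Hresp' (total assignment entropy,
    -sum_n sum_k r_nk log r_nk); both field names are assumptions.
    """
    K = len(theta)
    alpha = [alpha0 / K] * K
    psi_total = digamma(sum(theta))
    E_log_pi = [digamma(t) - psi_total for t in theta]
    terms = {
        'Lz': c_dir(alpha) - c_dir(theta) + sum(
            (SS['N'][k] + alpha[k] - theta[k]) * E_log_pi[k] for k in range(K)),
        'Lentropy': SS['Hresp'],
    }
    if todict:
        return terms  # individual terms under named keys
    return sum(terms.values())  # scalar objective

SS = {'N': [1.6, 1.4], 'Hresp': 1.5}
print(calc_evidence_sketch(SS, [2.1, 1.9], alpha0=1.0, todict=True))
print(calc_evidence_sketch(SS, [2.1, 1.9], alpha0=1.0))
```

Returning the terms separately makes it easy to see which part of the objective changed after a given step.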