.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/04_bars_one_per_doc/plot-01-demo=vb_single_run-model=mix+mult.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_04_bars_one_per_doc_plot-01-demo=vb_single_run-model=mix+mult.py>` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_04_bars_one_per_doc_plot-01-demo=vb_single_run-model=mix+mult.py:

===================================================
01: Standard variational training for mixture model
===================================================

How to train a mixture of multinomials.

.. GENERATED FROM PYTHON SOURCE LINES 8-19

.. code-block:: default

    import bnpy
    import numpy as np
    import os

    from matplotlib import pylab
    import seaborn as sns

    FIG_SIZE = (3, 3)
    SMALL_FIG_SIZE = (1, 1)
    pylab.rcParams['figure.figsize'] = FIG_SIZE

.. GENERATED FROM PYTHON SOURCE LINES 20-21

Read the toy "bars" dataset from file.

.. GENERATED FROM PYTHON SOURCE LINES 21-26

.. code-block:: default

    dataset_path = os.path.join(bnpy.DATASET_PATH, 'bars_one_per_doc')
    dataset = bnpy.data.BagOfWordsData.read_npz(
        os.path.join(dataset_path, 'dataset.npz'))

.. GENERATED FROM PYTHON SOURCE LINES 27-28

Make a simple plot of the raw data.

.. GENERATED FROM PYTHON SOURCE LINES 29-36

.. code-block:: default

    X_csr_DV = dataset.getSparseDocTypeCountMatrix()
    bnpy.viz.BarsViz.show_square_images(
        X_csr_DV[:10].toarray(), vmin=0, vmax=5)
    # pylab.colorbar()
    # pylab.clabel('word count')
    pylab.tight_layout()

.. image-sg:: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_001.png
   :alt: plot 01 demo=vb single run model=mix+mult
   :srcset: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 37-40

Let's do one single run of the VB algorithm.
Using 10 clusters and the 'randomlikewang' initialization procedure.

.. GENERATED FROM PYTHON SOURCE LINES 41-49

.. code-block:: default

    trained_model, info_dict = bnpy.run(
        dataset, 'FiniteMixtureModel', 'Mult', 'VB',
        output_path='/tmp/bars_one_per_doc/helloworld-K=10/',
        nLap=1000, convergeThr=0.0001,
        K=10, initname='randomlikewang',
        gamma0=50.0, lam=0.1)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    WARNING: Found unrecognized keyword args. These are ignored.
    --gamma0
    Dataset Summary:
    BagOfWordsData
      size: 2000 units (documents)
      vocab size: 144
      min    5%   50%   95%   max
       38    42    46    51    57  nUniqueTokensPerDoc
      100   100   100   100   100  nTotalTokensPerDoc
    Hist of word_count across tokens
         1     2     3   <10  <100 >=100
      0.38  0.29  0.19  0.14     0     0
    Hist of unique docs per word type
        <1   <10  <100 <0.10 <0.20 <0.50 >=0.50
         0     0     0     0     0  >.99     0
    Allocation Model:  Finite mixture model. Dir prior param 1.00
    Obs. Data  Model:  Multinomial over finite vocabulary.
    Obs. Data  Prior:  Dirichlet over finite vocabulary
      lam = [0.1 0.1] ...
    Initialization:
      initname = randomlikewang
      K = 10 (number of clusters)
      seed = 1607680
      elapsed_time: 0.0 sec
    Learn Alg: VB | task  1/1 | alg. seed: 1607680 | data order seed: 8541952
    task_output_path: /tmp/bars_one_per_doc/helloworld-K=10/1
        1/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.711444785e+00 |
        2/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.244123622e+00 | Ndiff  270.283
        3/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242999256e+00 | Ndiff   17.646
        4/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242755538e+00 | Ndiff   11.500
        5/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242599251e+00 | Ndiff    6.980
        6/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242508901e+00 | Ndiff    4.291
        7/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242425981e+00 | Ndiff    3.722
        8/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242371327e+00 | Ndiff    3.025
        9/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242328843e+00 | Ndiff    2.275
       10/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242298419e+00 | Ndiff    0.979
       11/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242296009e+00 | Ndiff    0.423
       12/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242290695e+00 | Ndiff    0.588
       13/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242278821e+00 | Ndiff    0.812
       14/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242257185e+00 | Ndiff    0.771
       15/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242254419e+00 | Ndiff    0.214
       16/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242244922e+00 | Ndiff    0.468
       17/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242219919e+00 | Ndiff    0.534
       18/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242175380e+00 | Ndiff    0.753
       19/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242154460e+00 | Ndiff    0.628
       20/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242128894e+00 | Ndiff    0.395
       21/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242126836e+00 | Ndiff    0.218
       22/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242109815e+00 | Ndiff    0.529
       23/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242103791e+00 | Ndiff    0.091
       24/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242103790e+00 | Ndiff    0.001
       25/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242103790e+00 | Ndiff    0.000
    ... done. converged.

.. GENERATED FROM PYTHON SOURCE LINES 50-53

First, we can plot the loss function over time.
We'll skip the first few iterations, since performance there is still quite poor.

.. GENERATED FROM PYTHON SOURCE LINES 54-62

.. code-block:: default

    pylab.figure(figsize=FIG_SIZE)
    pylab.plot(
        info_dict['lap_history'][2:],
        info_dict['loss_history'][2:], 'k.-')
    pylab.xlabel('num. laps')
    pylab.ylabel('loss')
    pylab.tight_layout()

.. image-sg:: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_002.png
   :alt: plot 01 demo=vb single run model=mix+mult
   :srcset: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 63-64

Setup: a helper function to display the learned bar structure over time.

.. GENERATED FROM PYTHON SOURCE LINES 65-88

.. code-block:: default

    def show_bars_over_time(
            task_output_path=None,
            query_laps=[0, 1, 2, 5, None],
            ncols=10):
        ''' Show learned topics as square images, one row per queried lap.

        Loads the saved model snapshot at each lap in query_laps
        (None means the final lap) and visualizes its topic-word
        probabilities as a grid of square images.
        '''
        nrows = len(query_laps)
        fig_handle, ax_handles_RC = pylab.subplots(
            figsize=(SMALL_FIG_SIZE[0] * ncols, SMALL_FIG_SIZE[1] * nrows),
            nrows=nrows, ncols=ncols, sharex=True, sharey=True)
        for row_id, lap_val in enumerate(query_laps):
            cur_model, lap_val = bnpy.load_model_at_lap(task_output_path, lap_val)
            cur_topics_KV = cur_model.obsModel.getTopics()
            # Plot the current model
            cur_ax_list = ax_handles_RC[row_id].flatten().tolist()
            bnpy.viz.BarsViz.show_square_images(
                cur_topics_KV, vmin=0.0, vmax=0.06, ax_list=cur_ax_list)
            cur_ax_list[0].set_ylabel("lap: %d" % lap_val)
        pylab.tight_layout()

.. GENERATED FROM PYTHON SOURCE LINES 89-90

Show the clusters over time.

.. GENERATED FROM PYTHON SOURCE LINES 91-92

.. code-block:: default

    show_bars_over_time(info_dict['task_output_path'])

.. image-sg:: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_003.png
   :alt: plot 01 demo=vb single run model=mix+mult
   :srcset: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_003.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 0 minutes 2.729 seconds)

.. _sphx_glr_download_examples_04_bars_one_per_doc_plot-01-demo=vb_single_run-model=mix+mult.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot-01-demo=vb_single_run-model=mix+mult.py <plot-01-demo=vb_single_run-model=mix+mult.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot-01-demo=vb_single_run-model=mix+mult.ipynb <plot-01-demo=vb_single_run-model=mix+mult.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
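As a closing aside for readers curious about what each VB lap above actually computes: the core update is soft-assigning every document a responsibility over the K clusters. The snippet below is a minimal, self-contained numpy sketch of that responsibility update for a mixture of multinomials. It is illustrative only, not bnpy's internal implementation; all names here (``estep_resp`` and its arguments) are hypothetical, and the likelihood is computed only up to a count-dependent constant, which cancels in the normalization.

.. code-block:: python

    import numpy as np

    def estep_resp(X_DV, log_pi_K, log_phi_KV):
        '''Soft-assign each document to clusters (VB-style E-step sketch).

        X_DV : (D, V) array of per-document word counts
        log_pi_K : (K,) log mixture weights
        log_phi_KV : (K, V) log word probabilities for each cluster
        '''
        # Multinomial log-likelihood of each doc under each cluster,
        # up to an additive constant that depends only on the counts.
        log_lik_DK = X_DV @ log_phi_KV.T                  # (D, K)
        log_post_DK = log_pi_K[None, :] + log_lik_DK
        # Normalize each row via log-sum-exp for numerical stability.
        log_norm_D = np.logaddexp.reduce(log_post_DK, axis=1)
        return np.exp(log_post_DK - log_norm_D[:, None])

    rng = np.random.default_rng(0)
    X_DV = rng.poisson(2.0, size=(5, 8)).astype(float)    # 5 docs, 8 word types
    pi_K = np.full(3, 1.0 / 3)                            # 3 equally weighted clusters
    phi_KV = rng.dirichlet(np.ones(8), size=3)            # random word distributions
    resp_DK = estep_resp(X_DV, np.log(pi_K), np.log(phi_KV))
    print(resp_DK.sum(axis=1))  # each row sums to 1 (up to floating point)

In the full algorithm these responsibilities would then drive the M-step (re-estimating the mixture weights and word distributions); here we only show the assignment half, since that is what the bar plots above visualize indirectly.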