.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/04_bars_one_per_doc/plot-01-demo=vb_single_run-model=mix+mult.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_04_bars_one_per_doc_plot-01-demo=vb_single_run-model=mix+mult.py>` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_04_bars_one_per_doc_plot-01-demo=vb_single_run-model=mix+mult.py:

===================================================
01: Standard variational training for mixture model
===================================================

How to train a mixture of multinomials.

.. GENERATED FROM PYTHON SOURCE LINES 8-19

.. code-block:: default

    import bnpy
    import numpy as np
    import os

    from matplotlib import pylab
    import seaborn as sns

    FIG_SIZE = (3, 3)
    SMALL_FIG_SIZE = (1, 1)
    pylab.rcParams['figure.figsize'] = FIG_SIZE

.. GENERATED FROM PYTHON SOURCE LINES 20-21

Read the toy "bars" dataset from file.

.. GENERATED FROM PYTHON SOURCE LINES 21-26

.. code-block:: default

    dataset_path = os.path.join(bnpy.DATASET_PATH, 'bars_one_per_doc')
    dataset = bnpy.data.BagOfWordsData.read_npz(
        os.path.join(dataset_path, 'dataset.npz'))

.. GENERATED FROM PYTHON SOURCE LINES 27-28

Make a simple plot of the raw data.

.. GENERATED FROM PYTHON SOURCE LINES 29-36

.. code-block:: default

    X_csr_DV = dataset.getSparseDocTypeCountMatrix()
    bnpy.viz.BarsViz.show_square_images(
        X_csr_DV[:10].toarray(), vmin=0, vmax=5)
    # pylab.colorbar()
    # pylab.clabel('word count')
    pylab.tight_layout()

.. image-sg:: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_001.png
   :alt: plot 01 demo=vb single run model=mix+mult
   :srcset: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 37-40

Let's do one single run of the VB algorithm.
Using 10 clusters and the 'randomlikewang' initialization procedure.

.. GENERATED FROM PYTHON SOURCE LINES 41-49

.. code-block:: default

    trained_model, info_dict = bnpy.run(
        dataset, 'FiniteMixtureModel', 'Mult', 'VB',
        output_path='/tmp/bars_one_per_doc/helloworld-K=10/',
        nLap=1000, convergeThr=0.0001,
        K=10, initname='randomlikewang',
        gamma0=50.0, lam=0.1)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    WARNING: Found unrecognized keyword args. These are ignored.
    --gamma0
    Dataset Summary:
    BagOfWordsData
      size: 2000 units (documents)
      vocab size: 144
      min    5%   50%   95%   max
       38    42    46    51    57  nUniqueTokensPerDoc
      100   100   100   100   100  nTotalTokensPerDoc
    Hist of word_count across tokens
         1     2     3   <10  <100 >=100
      0.38  0.29  0.19  0.14     0     0
    Hist of unique docs per word type
        <1   <10  <100 <0.10 <0.20 <0.50 >=0.50
         0     0     0     0     0  >.99     0
    Allocation Model:  Finite mixture model. Dir prior param 1.00
    Obs. Data  Model:  Multinomial over finite vocabulary.
    Obs. Data  Prior:  Dirichlet over finite vocabulary
      lam = [0.1 0.1] ...
    Initialization:
      initname = randomlikewang
      K = 10 (number of clusters)
      seed = 1607680
      elapsed_time: 0.0 sec
    Learn Alg: VB | task  1/1 | alg. seed: 1607680 | data order seed: 8541952
    task_output_path: /tmp/bars_one_per_doc/helloworld-K=10/1
        1/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.711444785e+00 |
        2/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.244123622e+00 | Ndiff  270.283
        3/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242999256e+00 | Ndiff   17.646
        4/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242755538e+00 | Ndiff   11.500
        5/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242599251e+00 | Ndiff    6.980
        6/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242508901e+00 | Ndiff    4.291
        7/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242425981e+00 | Ndiff    3.722
        8/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242371327e+00 | Ndiff    3.025
        9/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242328843e+00 | Ndiff    2.275
       10/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242298419e+00 | Ndiff    0.979
       11/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242296009e+00 | Ndiff    0.423
       12/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242290695e+00 | Ndiff    0.588
       13/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242278821e+00 | Ndiff    0.812
       14/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242257185e+00 | Ndiff    0.771
       15/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242254419e+00 | Ndiff    0.214
       16/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242244922e+00 | Ndiff    0.468
       17/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242219919e+00 | Ndiff    0.534
       18/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242175380e+00 | Ndiff    0.753
       19/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242154460e+00 | Ndiff    0.628
       20/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242128894e+00 | Ndiff    0.395
       21/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242126836e+00 | Ndiff    0.218
       22/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242109815e+00 | Ndiff    0.529
       23/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242103791e+00 | Ndiff    0.091
       24/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242103790e+00 | Ndiff    0.001
       25/1000 after      0 sec. | 218.8 MiB | K   10 | loss  4.242103790e+00 | Ndiff    0.000
    ... done. converged.

.. GENERATED FROM PYTHON SOURCE LINES 50-53

First, we can plot the loss function over time.
We'll skip the first few iterations, since performance there is still quite poor.

.. GENERATED FROM PYTHON SOURCE LINES 54-62

.. code-block:: default

    pylab.figure(figsize=FIG_SIZE)
    pylab.plot(
        info_dict['lap_history'][2:],
        info_dict['loss_history'][2:], 'k.-')
    pylab.xlabel('num. laps')
    pylab.ylabel('loss')
    pylab.tight_layout()

.. image-sg:: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_002.png
   :alt: plot 01 demo=vb single run model=mix+mult
   :srcset: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 63-64

Setup: a helper function to display the learned bar structure over time.

.. GENERATED FROM PYTHON SOURCE LINES 65-88

.. code-block:: default

    def show_bars_over_time(
            task_output_path=None,
            query_laps=[0, 1, 2, 5, None],
            ncols=10):
        ''' Show learned topics as square images, one row per queried lap.

        Loads the saved model snapshot at each lap in query_laps
        (None means the final lap) and visualizes its topic-word
        probabilities as a grid of square images.
        '''
        nrows = len(query_laps)
        fig_handle, ax_handles_RC = pylab.subplots(
            figsize=(SMALL_FIG_SIZE[0] * ncols, SMALL_FIG_SIZE[1] * nrows),
            nrows=nrows, ncols=ncols, sharex=True, sharey=True)
        for row_id, lap_val in enumerate(query_laps):
            cur_model, lap_val = bnpy.load_model_at_lap(task_output_path, lap_val)
            cur_topics_KV = cur_model.obsModel.getTopics()
            # Plot the current model
            cur_ax_list = ax_handles_RC[row_id].flatten().tolist()
            bnpy.viz.BarsViz.show_square_images(
                cur_topics_KV, vmin=0.0, vmax=0.06, ax_list=cur_ax_list)
            cur_ax_list[0].set_ylabel("lap: %d" % lap_val)
        pylab.tight_layout()

.. GENERATED FROM PYTHON SOURCE LINES 89-90

Show the clusters over time.

.. GENERATED FROM PYTHON SOURCE LINES 91-92

.. code-block:: default

    show_bars_over_time(info_dict['task_output_path'])

.. image-sg:: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_003.png
   :alt: plot 01 demo=vb single run model=mix+mult
   :srcset: /examples/04_bars_one_per_doc/images/sphx_glr_plot-01-demo=vb_single_run-model=mix+mult_003.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 0 minutes 2.729 seconds)

.. _sphx_glr_download_examples_04_bars_one_per_doc_plot-01-demo=vb_single_run-model=mix+mult.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot-01-demo=vb_single_run-model=mix+mult.py <plot-01-demo=vb_single_run-model=mix+mult.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot-01-demo=vb_single_run-model=mix+mult.ipynb <plot-01-demo=vb_single_run-model=mix+mult.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
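As a closing aside for readers curious about what each VB lap above actually computes: the core update is soft-assigning every document a responsibility over the K clusters. The snippet below is a minimal, self-contained numpy sketch of that responsibility update for a mixture of multinomials. It is illustrative only, not bnpy's internal implementation; all names here (``estep_resp`` and its arguments) are hypothetical, and the likelihood is computed only up to a count-dependent constant, which cancels in the normalization.

.. code-block:: python

    import numpy as np

    def estep_resp(X_DV, log_pi_K, log_phi_KV):
        '''Soft-assign each document to clusters (VB-style E-step sketch).

        X_DV : (D, V) array of per-document word counts
        log_pi_K : (K,) log mixture weights
        log_phi_KV : (K, V) log word probabilities for each cluster
        '''
        # Multinomial log-likelihood of each doc under each cluster,
        # up to an additive constant that depends only on the counts.
        log_lik_DK = X_DV @ log_phi_KV.T                  # (D, K)
        log_post_DK = log_pi_K[None, :] + log_lik_DK
        # Normalize each row via log-sum-exp for numerical stability.
        log_norm_D = np.logaddexp.reduce(log_post_DK, axis=1)
        return np.exp(log_post_DK - log_norm_D[:, None])

    rng = np.random.default_rng(0)
    X_DV = rng.poisson(2.0, size=(5, 8)).astype(float)    # 5 docs, 8 word types
    pi_K = np.full(3, 1.0 / 3)                            # 3 equally weighted clusters
    phi_KV = rng.dirichlet(np.ones(8), size=3)            # random word distributions
    resp_DK = estep_resp(X_DV, np.log(pi_K), np.log(phi_KV))
    print(resp_DK.sum(axis=1))  # each row sums to 1 (up to floating point)

In the full algorithm these responsibilities would then drive the M-step (re-estimating the mixture weights and word distributions); here we only show the assignment half, since that is what the bar plots above visualize indirectly.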