.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/04_bars_one_per_doc/run-03-demo=topic_model_vb_single_run-model=hdp_topic+mult.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_04_bars_one_per_doc_run-03-demo=topic_model_vb_single_run-model=hdp_topic+mult.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_04_bars_one_per_doc_run-03-demo=topic_model_vb_single_run-model=hdp_topic+mult.py:

=================================================
03: Standard variational training for topic model
=================================================

.. GENERATED FROM PYTHON SOURCE LINES 8-19

.. code-block:: default

    import bnpy
    import numpy as np
    import os

    from matplotlib import pylab
    import seaborn as sns

    FIG_SIZE = (3, 3)
    SMALL_FIG_SIZE = (1, 1)
    pylab.rcParams['figure.figsize'] = FIG_SIZE

.. GENERATED FROM PYTHON SOURCE LINES 20-21

Read the dataset from file.

.. GENERATED FROM PYTHON SOURCE LINES 21-26

.. code-block:: default

    dataset_path = os.path.join(bnpy.DATASET_PATH, 'bars_one_per_doc')
    dataset = bnpy.data.BagOfWordsData.read_npz(
        os.path.join(dataset_path, 'dataset.npz'))

.. GENERATED FROM PYTHON SOURCE LINES 27-28

Make a simple plot of the raw data: the word counts of the first 10 documents.

.. GENERATED FROM PYTHON SOURCE LINES 29-34

.. code-block:: default

    X_csr_DV = dataset.getSparseDocTypeCountMatrix()
    bnpy.viz.BarsViz.show_square_images(
        X_csr_DV[:10].toarray(), vmin=0, vmax=5)
    pylab.tight_layout()

.. GENERATED FROM PYTHON SOURCE LINES 35-36

Setup: Function to show bars from start to end of a training run

.. GENERATED FROM PYTHON SOURCE LINES 37-60

.. code-block:: default

    def show_bars_over_time(
            task_output_path=None,
            query_laps=[0, 1, 2, 5, None],
            ncols=10):
        ''' Show the learned topics as square images at several laps.

        Each row of the figure corresponds to one entry of query_laps.
        '''
        nrows = len(query_laps)
        fig_handle, ax_handles_RC = pylab.subplots(
            figsize=(SMALL_FIG_SIZE[0] * ncols, SMALL_FIG_SIZE[1] * nrows),
            nrows=nrows, ncols=ncols, sharex=True, sharey=True)
        for row_id, lap_val in enumerate(query_laps):
            # Load the model saved at (or resolved from) the requested lap
            cur_model, lap_val = bnpy.load_model_at_lap(task_output_path, lap_val)
            cur_topics_KV = cur_model.obsModel.getTopics()
            # Plot the current model
            cur_ax_list = ax_handles_RC[row_id].flatten().tolist()
            bnpy.viz.BarsViz.show_square_images(
                cur_topics_KV,
                vmin=0.0, vmax=0.06,
                ax_list=cur_ax_list)
            cur_ax_list[0].set_ylabel("lap: %d" % lap_val)
        pylab.tight_layout()

.. GENERATED FROM PYTHON SOURCE LINES 61-65

Train LDA topic model
---------------------

Using 10 clusters and the 'randomlikewang' initialization procedure.

.. GENERATED FROM PYTHON SOURCE LINES 65-81

.. code-block:: default

    local_step_kwargs = dict(
        # perform at most this many iterations at each document
        nCoordAscentItersLP=100,
        # stop local iters early when max change in doc-topic counts < this thr
        convThrLP=0.001,
        )

    trained_model, info_dict = bnpy.run(
        dataset, 'FiniteTopicModel', 'Mult', 'VB',
        output_path='/tmp/bars_one_per_doc/helloworld-model=topic+mult-K=10/',
        nLap=100, convergeThr=0.01,
        K=10, initname='randomlikewang',
        alpha=0.5, lam=0.1,
        **local_step_kwargs)

.. GENERATED FROM PYTHON SOURCE LINES 82-85

First, we plot the loss function over time.

We skip the first recorded value, since the loss there is quite bad.

.. GENERATED FROM PYTHON SOURCE LINES 86-93

.. code-block:: default

    pylab.figure(figsize=FIG_SIZE)
    pylab.plot(info_dict['lap_history'][1:], info_dict['loss_history'][1:], 'k.-')
    pylab.xlabel('num. laps')
    pylab.ylabel('loss')
    pylab.tight_layout()

.. GENERATED FROM PYTHON SOURCE LINES 94-95

Show the clusters over time

.. GENERATED FROM PYTHON SOURCE LINES 96-99

.. code-block:: default

    show_bars_over_time(info_dict['task_output_path'])

.. GENERATED FROM PYTHON SOURCE LINES 100-104

Train LDA topic model with restarts
-----------------------------------

Using 10 clusters and the 'randomlikewang' initialization procedure, with restart proposals enabled in the local step at each document.

.. GENERATED FROM PYTHON SOURCE LINES 104-125

.. code-block:: default

    r_local_step_kwargs = dict(
        # perform at most this many iterations at each document
        nCoordAscentItersLP=100,
        # stop local iters early when max change in doc-topic counts < this thr
        convThrLP=0.001,
        # perform restart proposals at each document
        restartLP=1,
        restartNumItersLP=5,
        restartNumTrialsLP=5,
        )

    r_trained_model, r_info_dict = bnpy.run(
        dataset, 'FiniteTopicModel', 'Mult', 'VB',
        output_path='/tmp/bars_one_per_doc/helloworld-model=topic+mult-K=10-localstep=restarts/',
        nLap=100, convergeThr=0.01,
        K=10, initname='randomlikewang',
        alpha=0.5, lam=0.1,
        **r_local_step_kwargs)

    show_bars_over_time(r_info_dict['task_output_path'])

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.000 seconds)

.. _sphx_glr_download_examples_04_bars_one_per_doc_run-03-demo=topic_model_vb_single_run-model=hdp_topic+mult.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: run-03-demo=topic_model_vb_single_run-model=hdp_topic+mult.py <run-03-demo=topic_model_vb_single_run-model=hdp_topic+mult.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: run-03-demo=topic_model_vb_single_run-model=hdp_topic+mult.ipynb <run-03-demo=topic_model_vb_single_run-model=hdp_topic+mult.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_