Recent Releases of anvay: A Web-based Tool for Interpretive Topic Modelling in Bengali

anvay: A Web-based Tool for Interpretive Topic Modelling in Bengali - Improve topic interpretability, unify visual styling, and update documentation

Implemented multiple usability and interpretability improvements for v1.1.1:

  • Added top-word hover tooltips across all visualisations for clearer topic interpretation.
  • Standardised global topic colour scheme across all charts.
  • Updated documentation to explain that Heatmap and Bar Chart visualise the same topic–word weight matrix.
  • Reduced number of displayed terms in plots to prevent hidden tick labels; added hover-based x-axis details where needed.
  • Clarified Topic Evolution axis (document upload order) and added filenames to hover output.
  • Enhanced hierarchical clustering with BERTopic-style merged-cluster keyword tooltips.
  • Unified Plotly font styling using Roboto/Noto Bengali; reduced margins for a cleaner layout.
  • Added missing loading spinner to indicate processing during analysis.

These changes significantly improve clarity, consistency, and user experience in the visualisation interface.
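
As a rough illustration of the global colour scheme and top-word hover tooltips described above, the following minimal sketch assigns one fixed colour per topic and builds a hover label from a topic's top words. The palette values and the helper names `topic_colours` and `hover_label` are assumptions made for this example; the notes do not state how anvay implements either feature.

```python
from itertools import cycle, islice

# Hypothetical palette; the colours anvay actually uses are not given in the notes.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def topic_colours(n_topics):
    """Map each topic index to one fixed colour, cycling if topics outnumber colours."""
    return dict(enumerate(islice(cycle(PALETTE), n_topics)))


def hover_label(topic_id, top_words, filename=None, max_words=5):
    """Build a hover string from a topic's top words, optionally naming the source file."""
    label = f"Topic {topic_id}: " + ", ".join(top_words[:max_words])
    if filename is not None:
        label += f" ({filename})"
    return label


# Every chart looks up colours in the same mapping, so topic 0 is drawn
# in the same colour in the heatmap, the bar chart, and the evolution plot.
colours = topic_colours(7)
label = hover_label(2, ["নদী", "জল", "নৌকা"], filename="doc1.txt")
```

Sharing one mapping across all charts, rather than letting each chart choose its own default colours, is what keeps a topic visually identifiable from one view to the next.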

- HTML
Published by vinayakdasgupta 6 months ago

anvay: A Web-based Tool for Interpretive Topic Modelling in Bengali - v1.1.0 – Improved Bengali Topic Modelling with Dictionary-Based Stemming

This release improves topic coherence and interpretability by replacing the earlier rule-based Bengali stemmer with a dictionary-based stemmer using lemma mappings from the BNLP Project.

Enhancements

  • Integrated BNLP lemma dictionary for accurate and consistent token normalisation
  • Improved topic-word lists across all corpora (tested against Tagore, Bankim, and news)
  • Smoother topic prevalence distribution, reducing overfitting
  • Significantly reduced vocabulary size without loss of nuance
  • No impact on model speed or interface responsiveness

Internal Changes

  • The stemming logic is now handled in utils.py using a dictionary-based approach
  • stem_tokens() is retained for compatibility with previous versions
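
A dictionary-based stemmer of the kind described above can be sketched as a lookup table with a pass-through for unknown tokens. The three mappings below are illustrative only and are not taken from the BNLP lemma data; `lemmatize_tokens` is a hypothetical helper, and the `stem_tokens()` signature shown here is an assumption, since the notes give only the function name.

```python
# Illustrative mappings only; the real table would be loaded from the BNLP lemma data.
LEMMA_MAP = {
    "বইগুলো": "বই",
    "নদীর": "নদী",
    "নদীতে": "নদী",
}


def lemmatize_tokens(tokens, lemma_map=LEMMA_MAP):
    """Replace each token with its dictionary lemma; unknown tokens pass through unchanged."""
    return [lemma_map.get(tok, tok) for tok in tokens]


def stem_tokens(tokens):
    """Legacy entry point (assumed signature), kept so older call sites still work."""
    return lemmatize_tokens(tokens)


tokens = ["নদীর", "নদীতে", "বইগুলো", "আকাশ"]
stems = stem_tokens(tokens)
```

In this example the two inflected forms of নদী collapse to a single entry, which is the mechanism behind the smaller vocabulary, while a word missing from the dictionary (আকাশ) is left as it is instead of being truncated by a suffix rule.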

Acknowledgements

This release builds directly on the lemma data provided by the BNLP Project (https://github.com/bedanta79/bnlp). We gratefully acknowledge their work.

This is a backward-compatible release. The user interface and input/output structure remain unchanged from v1.0.0.

- HTML
Published by vinayakdasgupta about 1 year ago