Recent Releases of anvay: A Web-based Tool for Interpretive Topic Modelling in Bengali
anvay: A Web-based Tool for Interpretive Topic Modelling in Bengali - Improve topic interpretability, unify visual styling, and update documentation
Implemented multiple usability and interpretability improvements for v1.1.1:
- Added top-word hover tooltips across all visualisations for clearer topic interpretation.
- Standardised global topic colour scheme across all charts.
- Updated documentation to explain that Heatmap and Bar Chart visualise the same topic–word weight matrix.
- Reduced number of displayed terms in plots to prevent hidden tick labels; added hover-based x-axis details where needed.
- Clarified Topic Evolution axis (document upload order) and added filenames to hover output.
- Enhanced hierarchical clustering with BERTopic-style merged-cluster keyword tooltips.
- Unified Plotly font styling using Roboto/Noto Bengali; reduced margins for a cleaner layout.
- Added missing loading spinner to indicate processing during analysis.
These changes significantly improve clarity, consistency, and user experience in the visualisation interface.
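As a rough illustration of the shared colour scheme and the top-word tooltips listed above, the sketch below keeps one palette and one hover-text builder that every chart could call. The names `PALETTE`, `topic_colour`, and `top_words_hover`, along with the hex values, are assumptions made for this example and are not taken from the anvay codebase.

```python
# Hypothetical helpers: names and palette are illustrative, not anvay's own.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
           "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]


def topic_colour(topic_id):
    """Return the same colour for a topic in every chart (cycles after 10 topics)."""
    return PALETTE[topic_id % len(PALETTE)]


def top_words_hover(topic_id, word_weights, n=5):
    """Build hover text listing the n highest-weighted words of a topic."""
    ranked = sorted(word_weights.items(), key=lambda kv: kv[1], reverse=True)[:n]
    words = ", ".join(word for word, _ in ranked)
    return f"Topic {topic_id}: {words}"


# Example: one topic with three weighted words, showing the top two on hover.
weights = {"নদী": 0.12, "জল": 0.09, "নৌকা": 0.05}
colour = topic_colour(0)
hover = top_words_hover(0, weights, n=2)
```

Routing every chart through the same two functions is one simple way to keep a topic's colour and tooltip identical across the heatmap, bar chart, and other views.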
Published by vinayakdasgupta 6 months ago
anvay: A Web-based Tool for Interpretive Topic Modelling in Bengali - v1.1.0 – Improved Bengali Topic Modelling with Dictionary-Based Stemming
This release improves topic coherence and interpretability by replacing the earlier rule-based Bengali stemmer with a dictionary-based stemmer using lemma mappings from the BNLP Project.
Enhancements
- Integrated BNLP lemma dictionary for accurate and consistent token normalization
- Improved topic-word lists across all corpora (tested against Tagore, Bankim, and news)
- Smoother topic prevalence distribution, reducing overfitting
- Significantly reduced vocabulary size without loss of nuance
- No impact on model speed or interface responsiveness
Internal Changes
- The stemming logic is now handled in utils.py using a dictionary-based approach
- stem_tokens() is retained for compatibility with previous versions
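A minimal sketch of what dictionary-based stemming can look like is given below. It assumes a plain token-to-lemma mapping and a `stem_tokens()` that takes a list of tokens. The tiny `LEMMA_MAP` is made up for illustration and is not the BNLP lemma data, and the real signature in utils.py may differ.

```python
# Hypothetical sketch: the actual lemma dictionary and stem_tokens() in
# utils.py may be structured differently.
LEMMA_MAP = {
    "নদীর": "নদী",    # "of the river" -> "river"
    "নদীতে": "নদী",   # "in the river" -> "river"
}


def stem_tokens(tokens, lemma_map=LEMMA_MAP):
    """Replace each token with its dictionary lemma; unknown tokens pass through."""
    return [lemma_map.get(token, token) for token in tokens]


# Example: two inflected forms collapse to one lemma, shrinking the vocabulary.
tokens = ["নদীর", "জল", "নদীতে"]
stemmed = stem_tokens(tokens)
vocab_before = len(set(tokens))
vocab_after = len(set(stemmed))
```

Because unknown tokens are returned unchanged, a lookup of this kind never truncates a word it does not recognise, unlike suffix-stripping rules.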
Acknowledgements
This release builds directly on the lemma data provided by the BNLP Project (https://github.com/bedanta79/bnlp). We gratefully acknowledge their work.
This is a backward-compatible release. The user interface and input/output structure remain unchanged from v1.0.0.
Published by vinayakdasgupta about 1 year ago