https://github.com/sergeyklay/clusterium
Text Clustering Toolkit for Bayesian Nonparametric Analysis
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic links in README
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (2.1%) to scientific vocabulary
Keywords
bayesian-analysis
clustering
data-science
dirichlet-process
embeddings
machine-learning
natural-language-processing
nlp
pitman-yor-process
power-law
sentence-transformers
text-analysis
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: sergeyklay
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: https://clusterium.readthedocs.io/
- Size: 1.38 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Created 12 months ago · Last pushed 9 months ago
Metadata Files
Readme
License
Owner
- Name: Serghei Iakovlev
- Login: sergeyklay
- Kind: user
- Location: Wrocław, Poland
- Company: airSlate
- Website: https://serghei.blog
- Repositories: 23
- Profile: https://github.com/sergeyklay
GitHub Events
Total
- Release event: 7
- Watch event: 1
- Delete event: 43
- Issue comment event: 40
- Push event: 117
- Pull request review comment event: 1
- Pull request review event: 5
- Pull request event: 77
- Create event: 50
Last Year
- Release event: 7
- Watch event: 1
- Delete event: 43
- Issue comment event: 40
- Push event: 117
- Pull request review comment event: 1
- Pull request review event: 5
- Pull request event: 77
- Create event: 50
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 70
- Average time to close issues: N/A
- Average time to close pull requests: about 15 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.93
- Merged pull requests: 70
- Bot issues: 0
- Bot pull requests: 4
Past Year
- Issues: 0
- Pull requests: 70
- Average time to close issues: N/A
- Average time to close pull requests: about 15 hours
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.93
- Merged pull requests: 70
- Bot issues: 0
- Bot pull requests: 4
Top Authors
Issue Authors
Pull Request Authors
- sergeyklay (103)
- dependabot[bot] (9)
Top Labels
Issue Labels
Pull Request Labels
- dependencies (9)
- github_actions (9)
Packages
- Total packages: 1
- Total downloads: 29 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 6
- Total maintainers: 1
pypi.org: clusx
Bayesian nonparametric toolkit for text clustering, analysis, and benchmarking with advanced embedding models and statistical validation.
- Homepage: https://clusterium.readthedocs.io
- Documentation: https://clusterium.readthedocs.io/en/latest/index.html
- License: MIT
- Latest release: 0.6.0 (published 11 months ago)
Rankings
- Dependent packages count: 9.5%
- Average: 31.5%
- Dependent repos count: 53.5%
Maintainers (1)
Last synced: 6 months ago
Dependencies
poetry.lock
pypi
- annotated-types 0.7.0
- anyio 4.8.0
- async-timeout 4.0.3
- black 25.1.0
- certifi 2025.1.31
- cffi 1.17.1
- charset-normalizer 3.4.1
- click 8.1.8
- colorama 0.4.6
- debugpy 1.8.13
- distro 1.9.0
- exceptiongroup 1.2.2
- greenlet 3.1.1
- h11 0.14.0
- hdbscan 0.8.40
- httpcore 1.0.7
- httpx 0.28.1
- idna 3.10
- isort 6.0.1
- jiter 0.8.2
- joblib 1.4.2
- jsonpatch 1.33
- jsonpointer 3.0.0
- langchain 0.3.20
- langchain-core 0.3.43
- langchain-openai 0.3.8
- langchain-text-splitters 0.3.6
- langsmith 0.3.13
- mypy-extensions 1.0.0
- numpy 2.2.3
- openai 1.65.4
- orjson 3.10.15
- packaging 24.2
- pandas 2.2.3
- pathspec 0.12.1
- platformdirs 4.3.6
- pycparser 2.22
- pydantic 2.10.6
- pydantic-core 2.27.2
- python-dateutil 2.9.0.post0
- python-dotenv 1.0.1
- pytz 2025.1
- pyyaml 6.0.2
- regex 2024.11.6
- requests 2.32.3
- requests-toolbelt 1.0.0
- scikit-learn 1.6.1
- scipy 1.15.2
- six 1.17.0
- sniffio 1.3.1
- sqlalchemy 2.0.38
- tenacity 9.0.0
- threadpoolctl 3.5.0
- tiktoken 0.9.0
- tomli 2.2.1
- tqdm 4.67.1
- typing-extensions 4.12.2
- tzdata 2025.1
- urllib3 2.3.0
- zstandard 0.23.0
pyproject.toml
pypi
.github/workflows/ci.yml
actions
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5.4.0 composite
- codecov/codecov-action v5.4.0 composite
- codecov/test-results-action v1 composite
- snok/install-poetry v1 composite