https://github.com/sergeyklay/clusterium

Text Clustering Toolkit for Bayesian Nonparametric Analysis

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic links in README
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (2.1%) to scientific vocabulary

Keywords

bayesian-analysis clustering data-science dirichlet-process embeddings machine-learning natural-language-processing nlp pitman-yor-process power-law sentence-transformers text-analysis
Last synced: 6 months ago

Repository

Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Topics
bayesian-analysis clustering data-science dirichlet-process embeddings machine-learning natural-language-processing nlp pitman-yor-process power-law sentence-transformers text-analysis
Created 12 months ago · Last pushed 9 months ago
Metadata Files
  • Readme
  • License

Owner

  • Name: Serghei Iakovlev
  • Login: sergeyklay
  • Kind: user
  • Location: Wrocław, Poland
  • Company: airSlate

GitHub Events

Total
  • Release event: 7
  • Watch event: 1
  • Delete event: 43
  • Issue comment event: 40
  • Push event: 117
  • Pull request review comment event: 1
  • Pull request review event: 5
  • Pull request event: 77
  • Create event: 50
Last Year
  • Release event: 7
  • Watch event: 1
  • Delete event: 43
  • Issue comment event: 40
  • Push event: 117
  • Pull request review comment event: 1
  • Pull request review event: 5
  • Pull request event: 77
  • Create event: 50

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 70
  • Average time to close issues: N/A
  • Average time to close pull requests: about 15 hours
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.93
  • Merged pull requests: 70
  • Bot issues: 0
  • Bot pull requests: 4
Past Year
  • Issues: 0
  • Pull requests: 70
  • Average time to close issues: N/A
  • Average time to close pull requests: about 15 hours
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.93
  • Merged pull requests: 70
  • Bot issues: 0
  • Bot pull requests: 4
Top Authors
Issue Authors
Pull Request Authors
  • sergeyklay (103)
  • dependabot[bot] (9)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (9)
  • github_actions (9)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 29 last month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
pypi.org: clusx

Bayesian nonparametric toolkit for text clustering, analysis, and benchmarking with advanced embedding models and statistical validation.

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 29 last month
Rankings
  • Dependent packages count: 9.5%
  • Average: 31.5%
  • Dependent repos count: 53.5%
Last synced: 6 months ago

Dependencies

poetry.lock pypi
  • annotated-types 0.7.0
  • anyio 4.8.0
  • async-timeout 4.0.3
  • black 25.1.0
  • certifi 2025.1.31
  • cffi 1.17.1
  • charset-normalizer 3.4.1
  • click 8.1.8
  • colorama 0.4.6
  • debugpy 1.8.13
  • distro 1.9.0
  • exceptiongroup 1.2.2
  • greenlet 3.1.1
  • h11 0.14.0
  • hdbscan 0.8.40
  • httpcore 1.0.7
  • httpx 0.28.1
  • idna 3.10
  • isort 6.0.1
  • jiter 0.8.2
  • joblib 1.4.2
  • jsonpatch 1.33
  • jsonpointer 3.0.0
  • langchain 0.3.20
  • langchain-core 0.3.43
  • langchain-openai 0.3.8
  • langchain-text-splitters 0.3.6
  • langsmith 0.3.13
  • mypy-extensions 1.0.0
  • numpy 2.2.3
  • openai 1.65.4
  • orjson 3.10.15
  • packaging 24.2
  • pandas 2.2.3
  • pathspec 0.12.1
  • platformdirs 4.3.6
  • pycparser 2.22
  • pydantic 2.10.6
  • pydantic-core 2.27.2
  • python-dateutil 2.9.0.post0
  • python-dotenv 1.0.1
  • pytz 2025.1
  • pyyaml 6.0.2
  • regex 2024.11.6
  • requests 2.32.3
  • requests-toolbelt 1.0.0
  • scikit-learn 1.6.1
  • scipy 1.15.2
  • six 1.17.0
  • sniffio 1.3.1
  • sqlalchemy 2.0.38
  • tenacity 9.0.0
  • threadpoolctl 3.5.0
  • tiktoken 0.9.0
  • tomli 2.2.1
  • tqdm 4.67.1
  • typing-extensions 4.12.2
  • tzdata 2025.1
  • urllib3 2.3.0
  • zstandard 0.23.0
pyproject.toml pypi
.github/workflows/ci.yml actions
  • actions/cache v4 composite
  • actions/checkout v4 composite
  • actions/setup-python v5.4.0 composite
  • codecov/codecov-action v5.4.0 composite
  • codecov/test-results-action v1 composite
  • snok/install-poetry v1 composite