Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: leaeb
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 133 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 8 months ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

Noise-Aware Cluster Validity Indices (NACVI)

This repository contains a Python implementation of internal cluster validity indices designed specifically for noise-aware clusterings (e.g. DBSCAN). The indices explicitly account for unassigned data points (noise), which makes them particularly suitable for realistic, unsupervised settings.

The implementation is based on the following publication:
Lea Eileen Brauner, Frank Höppner, Frank Klawonn
Cluster Validity for Noise-Aware Clusterings, Intelligent Data Analysis Journal, IOS Press (2025)


Content

The package implements the following NACVIs:

  • sil+: noise-aware Silhouette Coefficient
  • dbi+: noise-aware Davies-Bouldin Index
  • gD33+: noise-aware Dunn index variant
  • sf+: noise-aware Score Function
  • grid+: grid-based noise-validity index
  • nr+: neighbourhood-based noise-validity index

Motivation

Conventional validity measures treat every data point as a member of some cluster, even when the clustering algorithm (DBSCAN, for example) has explicitly labelled points as noise. This leads to distorted evaluations.

This package:

  • takes noise into account correctly,
  • enables a separate evaluation of the cluster structure and the noise delimitation,
  • offers an integrated metric for both with the B+ score.
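
The distortion described in this section can be reproduced with scikit-learn alone. The sketch below is not part of this package and does not use its API; the synthetic data, the `eps` and `min_samples` values, and all variable names are assumptions chosen for illustration. It computes the conventional silhouette coefficient twice: once with the DBSCAN noise label `-1` treated as an ordinary cluster, and once with the noise points simply dropped.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Illustrative data (assumption): two compact blobs plus scattered background points.
rng = np.random.default_rng(0)
blobs, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [6, 6]], cluster_std=0.4, random_state=0
)
background = rng.uniform(low=-4, high=10, size=(40, 2))
X = np.vstack([blobs, background])

# DBSCAN marks points it cannot assign to any cluster with the label -1.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
noise_mask = labels == -1
n_clusters = np.unique(labels[~noise_mask]).size

# Variant 1: pass the labels unchanged, so -1 is scored as one more "cluster".
score_with_noise_as_cluster = silhouette_score(X, labels)

# Variant 2: drop the noise points and score only the assigned points.
score_without_noise = silhouette_score(X[~noise_mask], labels[~noise_mask])

print(f"clusters found:            {n_clusters}")
print(f"noise points:              {int(noise_mask.sum())}")
print(f"silhouette, noise as cluster: {score_with_noise_as_cluster:.3f}")
print(f"silhouette, noise dropped:    {score_without_noise:.3f}")
```

Neither variant evaluates how well the noise is separated from the clusters: the first mixes the scattered points into the cluster score, and the second ignores them entirely. That gap is what the noise-aware indices listed under Content are meant to address.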


Installation

    pip install nacvi

Usage

examples/usage_miniexample.py contains a minimal usage example that takes NumPy arrays as inputs.
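
As a rough illustration of what such NumPy inputs can look like, the sketch below builds a small data matrix and a matching label vector and summarises the labelling. It deliberately calls no function from this package, because the package's API is not documented on this page. The helper `summarize_labelling` is hypothetical, and the use of `-1` as the noise label is an assumption borrowed from scikit-learn's DBSCAN convention.

```python
import numpy as np

# Hypothetical toy data: six 2-D points, shape (n_samples, n_features).
X = np.array([
    [0.0, 0.0],
    [0.1, 0.2],
    [0.2, 0.1],
    [5.0, 5.0],
    [5.1, 4.9],
    [9.0, 0.0],
])

# One integer label per point. Assumption: -1 marks noise, as in scikit-learn's DBSCAN.
labels = np.array([0, 0, 0, 1, 1, -1])


def summarize_labelling(labels, noise_label=-1):
    """Return cluster count, cluster sizes, and noise ratio for a label vector."""
    labels = np.asarray(labels)
    noise_mask = labels == noise_label
    cluster_ids, sizes = np.unique(labels[~noise_mask], return_counts=True)
    return {
        "n_clusters": int(cluster_ids.size),
        "cluster_sizes": {int(c): int(s) for c, s in zip(cluster_ids, sizes)},
        "noise_ratio": float(noise_mask.mean()),
    }


# The data matrix and the label vector must describe the same points.
assert X.shape[0] == labels.shape[0]

summary = summarize_labelling(labels)
print(summary)
```

A check like this is useful before computing any validity index: a labelling with fewer than two clusters, or one that consists almost entirely of noise, usually signals a parameter problem rather than a result worth scoring.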

In examples/usage_example.ipynb you can find a comprehensive example with:

  • data generation,
  • execution of the DBSCAN clustering algorithm,
  • visualisation,
  • calculation of the NACVIs.

Owner

  • Name: Lea Eileen Brauner
  • Login: leaeb
  • Kind: user
  • Location: Braunschweig

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite the following paper."
authors:
  - family-names: Brauner
    given-names: Lea Eileen
    affiliation: Ostfalia University of Applied Sciences
title: "Noise-Aware Cluster Validity Measures"
version: "1.0.0"
doi: 10.3233/IDA-XXXXX  # <- DOI of your publication
date-released: 2025-07-14
url: https://github.com/username/noise-aware-cluster-validity

GitHub Events

Total
  • Push event: 1
  • Create event: 2
Last Year
  • Push event: 1
  • Create event: 2

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
pypi.org: nacvi

Cluster Validity Indices for Noise-Aware Clusterings (e.g. DBSCAN)

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 8.7%
Average: 28.9%
Dependent repos count: 49.1%
Maintainers (1)
Last synced: 7 months ago

Dependencies

Pipfile pypi
  • ipykernel * develop
  • matplotlib *
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *
Pipfile.lock pypi
  • appnope ==0.1.4 develop
  • asttokens ==3.0.0 develop
  • comm ==0.2.2 develop
  • debugpy ==1.8.14 develop
  • decorator ==5.2.1 develop
  • executing ==2.2.0 develop
  • ipykernel ==6.29.5 develop
  • ipython ==9.4.0 develop
  • ipython-pygments-lexers ==1.1.1 develop
  • jedi ==0.19.2 develop
  • jupyter-client ==8.6.3 develop
  • jupyter-core ==5.8.1 develop
  • matplotlib-inline ==0.1.7 develop
  • nest-asyncio ==1.6.0 develop
  • packaging ==25.0 develop
  • parso ==0.8.4 develop
  • pexpect ==4.9.0 develop
  • platformdirs ==4.3.8 develop
  • prompt-toolkit ==3.0.51 develop
  • psutil ==7.0.0 develop
  • ptyprocess ==0.7.0 develop
  • pure-eval ==0.2.3 develop
  • pygments ==2.19.2 develop
  • python-dateutil ==2.9.0.post0 develop
  • pyzmq ==27.0.0 develop
  • six ==1.17.0 develop
  • stack-data ==0.6.3 develop
  • tornado ==6.5.1 develop
  • traitlets ==5.14.3 develop
  • wcwidth ==0.2.13 develop
  • contourpy ==1.3.2
  • cycler ==0.12.1
  • fonttools ==4.58.5
  • joblib ==1.5.1
  • kiwisolver ==1.4.8
  • matplotlib ==3.10.3
  • numpy ==2.3.1
  • packaging ==25.0
  • pandas ==2.3.1
  • pillow ==11.3.0
  • pyparsing ==3.2.3
  • python-dateutil ==2.9.0.post0
  • pytz ==2025.2
  • scikit-learn ==1.7.0
  • scipy ==1.16.0
  • six ==1.17.0
  • threadpoolctl ==3.6.0
  • tzdata ==2025.2
nacvi.egg-info/requires.txt pypi
  • contourpy ==1.3.2
  • cycler ==0.12.1
  • fonttools ==4.58.5
  • joblib ==1.5.1
  • kiwisolver ==1.4.8
  • numpy ==2.3.1
  • packaging ==25.0
  • pandas ==2.3.1
  • pillow ==11.3.0
  • pyparsing ==3.2.3
  • python-dateutil ==2.9.0.post0
  • pytz ==2025.2
  • scikit-learn ==1.7.0
  • scipy ==1.16.0
  • six ==1.17.0
  • threadpoolctl ==3.6.0
  • tzdata ==2025.2
pyproject.toml pypi
requirements.txt pypi
  • contourpy ==1.3.2
  • cycler ==0.12.1
  • fonttools ==4.58.5
  • joblib ==1.5.1
  • kiwisolver ==1.4.8
  • numpy ==2.3.1
  • packaging ==25.0
  • pandas ==2.3.1
  • pillow ==11.3.0
  • pyparsing ==3.2.3
  • python-dateutil ==2.9.0.post0
  • pytz ==2025.2
  • scikit-learn ==1.7.0
  • scipy ==1.16.0
  • six ==1.17.0
  • threadpoolctl ==3.6.0
  • tzdata ==2025.2