pdistmap

PDistMap helps to find the overlap percentage of two probability distributions.

https://github.com/rehanguha/pdistmap

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary

Keywords

machine-learning probability statistics
Last synced: 7 months ago

Repository


Basic Info
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 3
Topics
machine-learning probability statistics
Created over 1 year ago · Last pushed 8 months ago
Metadata Files
Readme, Funding, License, Code of conduct, Citation

README.md

PdistMap: A Kernel Density Overlap Metric for Distributional Similarity and Cluster Matching


SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5358971

This package calculates the overlap percentage between two probability distributions, which is useful in both academic and industrial settings. For instance, across repeated runs of a machine learning clustering algorithm, the algorithm may renumber or rename the clusters, making it hard for the end user to map clusters between runs accurately.
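Because the paper's title describes a kernel density overlap metric, one natural reading of "overlap percentage" is the overlap coefficient: the area under the pointwise minimum of the two estimated densities. The formula below is an assumption about what `intersection_area` computes, not a statement of pdistmap's exact implementation. Here s_1, ..., s_{n_S} are the n_S observations in sample S, h_S is the bandwidth of its estimate, and K is a kernel such as the standard Gaussian.

```latex
\operatorname{OVL}(A, B) = \int_{-\infty}^{\infty} \min\bigl(\hat f_A(x), \hat f_B(x)\bigr)\,dx,
\qquad
\hat f_S(x) = \frac{1}{n_S h_S} \sum_{i=1}^{n_S} K\!\left(\frac{x - s_i}{h_S}\right)
```

The result lies between 0 (no shared probability mass) and 1 (identical densities); multiplying by 100 gives a percentage.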

Example Use Cases:

  • Machine Learning Clustering: When a clustering algorithm is run several times, the cluster identifiers may change between runs, making it difficult to track and compare clusters. This package helps map clusters across runs by calculating the overlap percentage between the distributions of the data assigned to each cluster. For example, if a data scientist runs k-means several times, the cluster labels may be permuted from one run to the next. By measuring the overlap between clusters from different runs, they can match corresponding clusters and keep their analysis consistent.

  • Anomaly Detection: The package can be used to compare the distribution of data points under normal and anomalous conditions, helping to identify anomalies and quantify their extent. For instance, in a network security application, the distribution of network traffic under normal conditions can be compared with the distribution during a suspected attack. A low overlap percentage indicates a large deviation from normal behavior and can flag a potential security breach.

  • Quality Control: In manufacturing and quality control processes, the package can be used to compare the distribution of measurements from different batches or production runs, checking consistency and identifying deviations. For example, a quality control engineer can compare the distribution of product dimensions from two production runs to confirm that they are consistent with each other and to identify any deviations that need to be addressed.

  • Market Research: The package can be applied to compare the distribution of survey responses or customer preferences across demographic groups or time periods, providing insight into market trends and changes in consumer behavior. For instance, a market researcher can compare the distribution of customer satisfaction scores from two regions to identify meaningful differences and tailor marketing strategies accordingly.

  • Healthcare Analytics: In healthcare, the package can be used to compare the distribution of patient outcomes or treatment responses across groups, helping to evaluate treatment effectiveness and identify potential disparities. For example, a healthcare analyst can compare the distribution of recovery times for patients receiving two different treatments; a low overlap shows that the treatments lead to clearly different recovery times, which can inform which treatment is more effective and reveal disparities in outcomes.
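For the clustering use case, label matching across two runs might be sketched as follows. This is a minimal NumPy-only illustration, assuming a single numeric feature and that every cluster contains at least two distinct values; the helpers `kde_overlap` and `match_clusters` are hypothetical and not part of pdistmap. Each pair of clusters is scored by the area under the minimum of their Gaussian KDEs (Scott's-rule bandwidth; the min-integral form is an assumption), and each run-1 cluster is mapped to its highest-overlap counterpart in run 2.

```python
import numpy as np


def kde_overlap(a, b, num_points=2000):
    """Area under min(f_a, f_b) for 1-D Gaussian KDEs with Scott's-rule bandwidth."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    def bandwidth(x):
        # Scott's rule in one dimension: sample std * n ** (-1/5)
        return np.std(x, ddof=1) * x.size ** (-1 / 5)

    def density(x, h, grid):
        # Sum of Gaussian kernels centred on each observation, evaluated on the grid
        z = (grid[:, None] - x[None, :]) / h
        return np.exp(-0.5 * z**2).sum(axis=1) / (x.size * h * np.sqrt(2 * np.pi))

    h_a, h_b = bandwidth(a), bandwidth(b)
    pad = 5 * max(h_a, h_b)  # include the kernel tails beyond the data range
    grid = np.linspace(min(a.min(), b.min()) - pad, max(a.max(), b.max()) + pad, num_points)
    overlap = np.minimum(density(a, h_a, grid), density(b, h_b, grid))
    return float(np.sum((overlap[1:] + overlap[:-1]) * np.diff(grid)) / 2)  # trapezoid rule


def match_clusters(values, labels_run1, labels_run2):
    """Map each run-1 cluster label to the run-2 label whose values overlap it most."""
    values = np.asarray(values, dtype=float)
    labels_run1 = np.asarray(labels_run1)
    labels_run2 = np.asarray(labels_run2)
    mapping = {}
    for c1 in np.unique(labels_run1):
        scores = {
            c2: kde_overlap(values[labels_run1 == c1], values[labels_run2 == c2])
            for c2 in np.unique(labels_run2)
        }
        # Greedy choice: the run-2 cluster with the largest overlap
        mapping[c1.item()] = max(scores, key=scores.get).item()
    return mapping


# Same data clustered twice; the second run swapped the labels
values = np.array([0.0, 0.5, 1.0, 1.5, 10.0, 10.5, 11.0, 11.5])
run1 = np.array([0, 0, 0, 0, 1, 1, 1, 1])
run2 = np.array([1, 1, 1, 1, 0, 0, 0, 0])
print(match_clusters(values, run1, run2))  # {0: 1, 1: 0}
```

The greedy argmax can map two clusters onto the same target. If a strict one-to-one mapping is needed, one option is to build the full overlap matrix and solve it with `scipy.optimize.linear_sum_assignment`. With pdistmap installed, `KDEIntersection(a, b).intersection_area()` from the usage example could be used in place of `kde_overlap`.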

Installation

```bash
pip install pdistmap
```

How to use it

Method 1

```python
from pdistmap.set import KDEIntersection
import numpy as np

A = np.array([25, 40, 70, 65, 69, 75, 80, 85])
B = np.array([25, 40, 70, 65, 69, 75, 80, 85, 81, 90])

# Overlap (intersection area) of the two kernel density estimates
area = KDEIntersection(A, B).intersection_area()
print(area)  # Expected output: 0.8752770150023454

# plot=True also draws a plot of the two distributions
KDEIntersection(A, B).intersection_area(plot=True)
```
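To make the computation concrete, here is a minimal NumPy-only sketch of a KDE intersection area: it fits a Gaussian KDE to each sample (Scott's-rule bandwidth, which is what `scipy.stats.gaussian_kde` uses by default in one dimension), evaluates both densities on a shared grid, and integrates their pointwise minimum with the trapezoid rule. The function names, grid size, and padding are assumptions for illustration. This is not pdistmap's internal code, so its result for `A` and `B` may differ slightly from the value printed above.

```python
import numpy as np


def scott_bandwidth(samples: np.ndarray) -> float:
    """Scott's rule for 1-D data: h = sample std * n ** (-1/5)."""
    return float(np.std(samples, ddof=1) * samples.size ** (-1 / 5))


def gaussian_kde_1d(samples: np.ndarray, h: float, grid: np.ndarray) -> np.ndarray:
    """Evaluate a Gaussian kernel density estimate with bandwidth h on grid."""
    # Standardized distance from every grid point (rows) to every sample (columns)
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (samples.size * h * np.sqrt(2 * np.pi))


def kde_intersection_area(a, b, num_points: int = 2000) -> float:
    """Approximate the area under min(f_a, f_b), a value between roughly 0 and 1."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    h_a, h_b = scott_bandwidth(a), scott_bandwidth(b)
    # Extend the grid 5 bandwidths past the data so the kernel tails are included
    pad = 5 * max(h_a, h_b)
    grid = np.linspace(min(a.min(), b.min()) - pad, max(a.max(), b.max()) + pad, num_points)
    overlap = np.minimum(gaussian_kde_1d(a, h_a, grid), gaussian_kde_1d(b, h_b, grid))
    # Trapezoid rule, written out so it works on both NumPy 1.x and 2.x
    return float(np.sum((overlap[1:] + overlap[:-1]) * np.diff(grid)) / 2)


A = np.array([25, 40, 70, 65, 69, 75, 80, 85])
B = np.array([25, 40, 70, 65, 69, 75, 80, 85, 81, 90])
print(kde_intersection_area(A, B))
```

Taking the pointwise minimum keeps the result on an interpretable scale: identical samples give an area close to 1, and well-separated samples give an area close to 0.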

Sample Image

Owner

  • Name: Rehan Guha
  • Login: rehanguha
  • Kind: user
  • Location: Kolkata
  • Company: @Vodafone

I am a Machine Learning Researcher and I love to work on Optimization, Explainable AI, etc.

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Guha
  given-names: Rehan
  orcid: https://orcid.org/0000-0001-5471-586X
title: pdistmap
version: v0.3.0
date-released: 2024-12-02
doi: 10.5281/zenodo.14257978
url: "https://github.com/rehanguha/pdistmap"

GitHub Events

Total
  • Release event: 4
  • Push event: 19
  • Create event: 3
Last Year
  • Release event: 4
  • Push event: 19
  • Create event: 3

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 28 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 8
  • Total maintainers: 1
pypi.org: pdistmap

This package helps to find the overlap percentage of two probability distributions.

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 28 Last month
Rankings
  • Dependent packages count: 10.3%
  • Average: 34.2%
  • Dependent repos count: 58.2%
Maintainers (1)
Last synced: 7 months ago

Dependencies

.github/workflows/python-app.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v3 composite
.github/workflows/python-package.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v3 composite
poetry.lock pypi
  • colorama 0.4.6
  • contourpy 1.3.0
  • cycler 0.12.1
  • exceptiongroup 1.2.2
  • fonttools 4.53.1
  • iniconfig 2.0.0
  • kiwisolver 1.4.7
  • matplotlib 3.9.2
  • numpy 2.1.1
  • packaging 24.1
  • pillow 10.4.0
  • pluggy 1.5.0
  • pyparsing 3.1.4
  • pytest 8.3.3
  • python-dateutil 2.9.0.post0
  • scipy 1.14.1
  • six 1.16.0
  • tomli 2.0.1
pyproject.toml pypi
  • matplotlib ^3.9.2
  • numpy ^2.1.1
  • python ^3.10
  • scipy ^1.14.1
  • pytest ^8.3.3 test
requirements.txt pypi
  • contourpy ==1.3.0
  • cycler ==0.12.1
  • fonttools ==4.53.1
  • kiwisolver ==1.4.7
  • matplotlib ==3.9.2
  • numpy ==2.1.1
  • packaging ==24.1
  • pillow ==10.4.0
  • pyparsing ==3.1.4
  • python-dateutil ==2.9.0.post0
  • scipy ==1.14.1
  • six ==1.16.0