pdistmap

PDistMap helps to find the overlap percentage of two probability distributions.

https://github.com/rehanguha/pdistmap

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 3 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (8.7%) to scientific vocabulary

Keywords

machine-learning probability statistics

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: rehanguha
License: apache-2.0
Language: Python
Default Branch: main
Homepage: https://pypi.org/project/pdistmap/
Size: 272 KB

Statistics

Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 3

Topics

machine-learning probability statistics

Created almost 2 years ago · Last pushed 11 months ago

Metadata Files

Readme Funding License Code of conduct Citation

PdistMap: A Kernel Density Overlap Metric for Distributional Similarity and Cluster Matching

SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5358971

This package calculates the overlap percentage between two probability distributions, which is useful in both academic and industrial settings. For instance, when a machine learning clustering algorithm is run several times, the cluster numbers or names may change between runs, making it hard for the end user to map the clusters from one run onto those of another.

Example Use Cases:

Machine Learning Clustering: In scenarios where multiple iterations of clustering algorithms are performed, the cluster identifiers may change, making it difficult to track and compare clusters across iterations. This package helps in mapping and comparing clusters by calculating the overlap percentage between the distributions of cluster assignments. For example, if a data scientist is running a k-means clustering algorithm multiple times, the cluster labels might change in each iteration. By using this package, they can measure the overlap between the clusters from different iterations and ensure consistency in their analysis.
Anomaly Detection: The package can be used to compare the distribution of data points in normal and anomalous conditions, helping in identifying and quantifying the extent of anomalies. For instance, in a network security application, the distribution of network traffic under normal conditions can be compared with the distribution during a suspected attack. The overlap percentage can help quantify the deviation and identify potential security breaches.
Quality Control: In manufacturing and quality control processes, the package can be used to compare the distribution of measurements from different batches or production runs, ensuring consistency and identifying deviations. For example, a quality control engineer can compare the distribution of product dimensions from two different production runs to ensure that they meet the required specifications and identify any deviations that need to be addressed.
Market Research: The package can be applied to compare the distribution of survey responses or customer preferences across different demographic groups or time periods, providing insights into market trends and changes in consumer behavior. For instance, a market researcher can compare the distribution of customer satisfaction scores from two different regions to identify any significant differences and tailor marketing strategies accordingly.
Healthcare Analytics: In healthcare, the package can be used to compare the distribution of patient outcomes or treatment responses across different groups, aiding in the evaluation of treatment effectiveness and identifying potential disparities. For example, a healthcare analyst can compare the distribution of recovery times for patients receiving two different treatments to determine which treatment is more effective and identify any disparities in treatment outcomes.
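
The cluster-mapping use case can be sketched as follows. This is only an illustration, not part of the package: `match_clusters` and `range_overlap` are hypothetical names, and `range_overlap` is a deliberately crude stand-in score (shared value range divided by combined value range) so that the example runs without any dependencies.

```python
from typing import Callable, Dict, Sequence

Sample = Sequence[float]


def range_overlap(a: Sample, b: Sample) -> float:
    """Hypothetical stand-in score in [0, 1]: length shared by the two
    [min, max] ranges divided by the length of their combined span."""
    shared = min(max(a), max(b)) - max(min(a), min(b))
    span = max(max(a), max(b)) - min(min(a), min(b))
    if span == 0:
        return 1.0  # both samples collapse to the same single value
    return max(0.0, shared) / span


def match_clusters(
    old: Dict[str, Sample],
    new: Dict[str, Sample],
    overlap: Callable[[Sample, Sample], float] = range_overlap,
) -> Dict[str, str]:
    """Map each new cluster label to the old label it overlaps with most."""
    mapping = {}
    for new_label, new_values in new.items():
        # Pick the old cluster with the highest overlap score.
        mapping[new_label] = max(
            old, key=lambda old_label: overlap(old[old_label], new_values)
        )
    return mapping


# Two clustering runs over the same feature; the labels changed between runs.
run_1 = {"0": [1.0, 2.0, 3.0], "1": [10.0, 11.0, 12.0]}
run_2 = {"a": [10.5, 11.0, 12.5], "b": [0.5, 2.0, 3.0]}

print(match_clusters(run_1, run_2))  # {'a': '1', 'b': '0'}
```

Any function that scores two samples can be passed as `overlap`. With the package installed, `lambda a, b: KDEIntersection(a, b).intersection_area()` would presumably be the natural choice.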

Installation

pip install pdistmap

How to use it

Method 1

```python

from pdistmap.set import KDEIntersection
import numpy as np

# Two samples whose distributions are to be compared
A = np.array([25, 40, 70, 65, 69, 75, 80, 85])
B = np.array([25, 40, 70, 65, 69, 75, 80, 85, 81, 90])

# Overlap between the two estimated densities
area = KDEIntersection(A, B).intersection_area()
print(area)  # Expected output: 0.8752770150023454

# Same call with plotting enabled
KDEIntersection(A, B).intersection_area(plot=True)

```
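
To give an idea of what a kernel density overlap measures, here is a minimal dependency-free sketch. It is not the package's implementation. It assumes Gaussian kernels with a Scott's-rule bandwidth and integrates the pointwise minimum of the two estimated densities with the trapezoidal rule. The names `gaussian_kde` and `kde_overlap`, and the `grid_points` and `padding` parameters, are hypothetical. Its result need not match the value printed above.

```python
import math
import statistics


def gaussian_kde(sample):
    """Return a 1-D Gaussian kernel density estimate as a function of x.

    Assumes at least two distinct values; bandwidth follows Scott's rule.
    """
    n = len(sample)
    h = statistics.stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density


def kde_overlap(a, b, grid_points=2000, padding=3.0):
    """Approximate the area under min(f_a, f_b), a value between 0 and 1."""
    f, g = gaussian_kde(a), gaussian_kde(b)
    # Extend the grid beyond the data so the kernel tails are included.
    spread = max(statistics.stdev(a), statistics.stdev(b))
    lo = min(min(a), min(b)) - padding * spread
    hi = max(max(a), max(b)) + padding * spread
    step = (hi - lo) / (grid_points - 1)
    ys = [min(f(lo + i * step), g(lo + i * step)) for i in range(grid_points)]
    # Trapezoidal rule on a uniform grid.
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


A = [25, 40, 70, 65, 69, 75, 80, 85]
B = [25, 40, 70, 65, 69, 75, 80, 85, 81, 90]
print(kde_overlap(A, B))
```

An overlap near 1 means the two estimated densities almost coincide, while a value near 0 means they share almost no probability mass.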

Sample Image

Owner

Name: Rehan Guha
Login: rehanguha
Kind: user
Location: Kolkata
Company: @Vodafone

Website: https://rehanguha.github.io
Twitter: rehan_guha
Repositories: 6
Profile: https://github.com/rehanguha

I am a Machine Learning Researcher and I love to work on Optimization, Explainable AI etc.

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Guha
  given-names: Rehan
  orcid: https://orcid.org/0000-0001-5471-586X
title: pdistmap
version: v0.3.0
date-released: 2024-12-02
doi: 10.5281/zenodo.14257978
url: "https://github.com/rehanguha/pdistmap"

GitHub Events

Total

Release event: 4
Push event: 19
Create event: 3

Last Year

Release event: 4
Push event: 19
Create event: 3

Packages

Total packages: 1
Total downloads:
- pypi 28 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 8
Total maintainers: 1

pypi.org: pdistmap

This package helps to find the overlap percentage of two probability distributions.

Homepage: https://github.com/rehanguha/pdistmap
Documentation: https://pdistmap.readthedocs.io/
License: Apache-2.0
Latest release: 0.5.0
published over 1 year ago

Versions: 8
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 28 Last month

Rankings

Dependent packages count: 10.3%

Average: 34.2%

Dependent repos count: 58.2%

Maintainers (1)

rehanguha

Last synced: 10 months ago

Dependencies

.github/workflows/python-app.yml actions

actions/checkout v4 composite
actions/setup-python v3 composite

.github/workflows/python-package.yml actions

actions/checkout v4 composite
actions/setup-python v3 composite

poetry.lock pypi

colorama 0.4.6
contourpy 1.3.0
cycler 0.12.1
exceptiongroup 1.2.2
fonttools 4.53.1
iniconfig 2.0.0
kiwisolver 1.4.7
matplotlib 3.9.2
numpy 2.1.1
packaging 24.1
pillow 10.4.0
pluggy 1.5.0
pyparsing 3.1.4
pytest 8.3.3
python-dateutil 2.9.0.post0
scipy 1.14.1
six 1.16.0
tomli 2.0.1

pyproject.toml pypi

matplotlib ^3.9.2
numpy ^2.1.1
python ^3.10
scipy ^1.14.1
pytest ^8.3.3 test

requirements.txt pypi

contourpy ==1.3.0
cycler ==0.12.1
fonttools ==4.53.1
kiwisolver ==1.4.7
matplotlib ==3.9.2
numpy ==2.1.1
packaging ==24.1
pillow ==10.4.0
pyparsing ==3.1.4
python-dateutil ==2.9.0.post0
scipy ==1.14.1
six ==1.16.0
