pdistmap
PDistMap helps to find the overlap percentage of two probability distributions.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: rehanguha
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://pypi.org/project/pdistmap/
- Size: 272 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
PdistMap: A Kernel Density Overlap Metric for Distributional Similarity and Cluster Matching
SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5358971
This package calculates the overlap percentage between two probability distributions, which is useful in both academic and industrial settings. For instance, when a clustering algorithm is run several times, the cluster numbers or names can change from one run to the next, making it hard for the end user to map clusters across runs accurately.
Example Use Cases:
- Machine Learning Clustering: In scenarios where multiple iterations of clustering algorithms are performed, the cluster identifiers may change, making it difficult to track and compare clusters across iterations. This package helps in mapping and comparing clusters by calculating the overlap percentage between the distributions of cluster assignments. For example, if a data scientist is running a k-means clustering algorithm multiple times, the cluster labels might change in each iteration. By using this package, they can measure the overlap between the clusters from different iterations and ensure consistency in their analysis.
- Anomaly Detection: The package can be used to compare the distribution of data points in normal and anomalous conditions, helping in identifying and quantifying the extent of anomalies. For instance, in a network security application, the distribution of network traffic under normal conditions can be compared with the distribution during a suspected attack. The overlap percentage can help quantify the deviation and identify potential security breaches.
- Quality Control: In manufacturing and quality control processes, the package can be used to compare the distribution of measurements from different batches or production runs, ensuring consistency and identifying deviations. For example, a quality control engineer can compare the distribution of product dimensions from two different production runs to ensure that they meet the required specifications and identify any deviations that need to be addressed.
- Market Research: The package can be applied to compare the distribution of survey responses or customer preferences across different demographic groups or time periods, providing insights into market trends and changes in consumer behavior. For instance, a market researcher can compare the distribution of customer satisfaction scores from two different regions to identify any significant differences and tailor marketing strategies accordingly.
- Healthcare Analytics: In healthcare, the package can be used to compare the distribution of patient outcomes or treatment responses across different groups, aiding in the evaluation of treatment effectiveness and identifying potential disparities. For example, a healthcare analyst can compare the distribution of recovery times for patients receiving two different treatments to determine which treatment is more effective and identify any disparities in treatment outcomes.
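To make the clustering use case concrete, here is a minimal sketch of label matching across two runs. It is an illustration under stated assumptions, not the package's own method: `histogram_overlap` and `match_clusters` are hypothetical helpers, the histogram score is a simple stand-in for a KDE-based overlap, and the data are synthetic.

```python
import numpy as np


def histogram_overlap(a, b, bins=20):
    """Stand-in overlap score in [0, 1]: shared mass of two normalized histograms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Use one set of bin edges for both samples so the bins are comparable.
    edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), bins + 1)
    pa = np.histogram(a, bins=edges)[0] / a.size
    pb = np.histogram(b, bins=edges)[0] / b.size
    return float(np.minimum(pa, pb).sum())


def match_clusters(run1, run2, overlap=histogram_overlap):
    """Map each cluster label in run1 to the run2 label with the highest overlap."""
    mapping = {}
    for label1, values1 in run1.items():
        scores = {label2: overlap(values1, values2) for label2, values2 in run2.items()}
        mapping[label1] = max(scores, key=scores.get)
    return mapping


rng = np.random.default_rng(0)
# Hypothetical 1-D feature values per cluster; run 2 holds the same groups under swapped labels.
run1 = {0: rng.normal(10, 1, 200), 1: rng.normal(50, 2, 200)}
run2 = {"a": rng.normal(50, 2, 200), "b": rng.normal(10, 1, 200)}
print(match_clusters(run1, run2))  # {0: 'b', 1: 'a'}
```

Any function that returns an overlap score for two samples can be passed as `overlap`, including a thin wrapper around the package's `KDEIntersection(a, b).intersection_area()` call shown in the usage section. This greedy matching does not guarantee a one-to-one mapping; if that is required, an assignment solver such as `scipy.optimize.linear_sum_assignment` on the overlap matrix is one option.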
Installation
```bash
pip install pdistmap
```
How to use it
Method 1
```python
from pdistmap.set import KDEIntersection
import numpy as np

# Two samples whose distributions are compared
A = np.array([25, 40, 70, 65, 69, 75, 80, 85])
B = np.array([25, 40, 70, 65, 69, 75, 80, 85, 81, 90])

area = KDEIntersection(A, B).intersection_area()
print(area)  # Expected output: 0.8752770150023454

# Same call with plotting enabled
KDEIntersection(A, B).intersection_area(plot=True)
```
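The package title describes a kernel density overlap metric, which suggests the score is the area shared by two kernel density estimates, that is, the integral of the pointwise minimum of the two densities. The sketch below illustrates that idea with NumPy only. It assumes Gaussian kernels, a Scott's-rule bandwidth, and trapezoidal integration on a shared grid; `gaussian_kde_1d` and `kde_overlap` are hypothetical helpers, not part of pdistmap, and their result need not match the library's output.

```python
import numpy as np


def gaussian_kde_1d(sample, x):
    """Evaluate a 1-D Gaussian KDE of `sample` at the points `x`.

    The bandwidth follows Scott's rule (h = std * n**(-1/5)); this is an
    assumption of the sketch, not necessarily what pdistmap uses.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    h = sample.std(ddof=1) * n ** (-1 / 5)
    z = (x[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


def kde_overlap(a, b, grid_points=2000, padding=4.0):
    """Approximate the area under min(f_a, f_b) for two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Extend the grid beyond the data so the density tails are mostly captured.
    spread = max(a.std(ddof=1), b.std(ddof=1))
    lo = min(a.min(), b.min()) - padding * spread
    hi = max(a.max(), b.max()) + padding * spread
    x = np.linspace(lo, hi, grid_points)
    y = np.minimum(gaussian_kde_1d(a, x), gaussian_kde_1d(b, x))
    # Trapezoidal rule on the shared grid.
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


A = np.array([25, 40, 70, 65, 69, 75, 80, 85])
B = np.array([25, 40, 70, 65, 69, 75, 80, 85, 81, 90])
print(kde_overlap(A, A))  # close to 1.0 for identical samples
print(kde_overlap(A, B))  # a value between 0 and 1
```

A score near 1 means the two estimated densities are almost identical, and a score near 0 means they barely share any mass. The result depends on the bandwidth choice, so different KDE settings can give noticeably different values for the same data.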

Owner
- Name: Rehan Guha
- Login: rehanguha
- Kind: user
- Location: Kolkata
- Company: @Vodafone
- Website: https://rehanguha.github.io
- Twitter: rehan_guha
- Repositories: 6
- Profile: https://github.com/rehanguha
I am a Machine Learning Researcher and I love to work on Optimization, Explainable AI etc.
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Guha
    given-names: Rehan
    orcid: https://orcid.org/0000-0001-5471-586X
title: pdistmap
version: v0.3.0
date-released: 2024-12-02
doi: 10.5281/zenodo.14257978
url: "https://github.com/rehanguha/pdistmap"
GitHub Events
Total
- Release event: 4
- Push event: 19
- Create event: 3
Last Year
- Release event: 4
- Push event: 19
- Create event: 3
Packages
- Total packages: 1
- Total downloads: 28 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 8
- Total maintainers: 1
pypi.org: pdistmap
This package helps to find the overlap percentage of two probability distributions.
- Homepage: https://github.com/rehanguha/pdistmap
- Documentation: https://pdistmap.readthedocs.io/
- License: Apache-2.0
- Latest release: 0.5.0 (published about 1 year ago)
Maintainers (1)
Dependencies
- actions/checkout v4 composite
- actions/setup-python v3 composite
- actions/checkout v4 composite
- actions/setup-python v3 composite
- colorama 0.4.6
- contourpy 1.3.0
- cycler 0.12.1
- exceptiongroup 1.2.2
- fonttools 4.53.1
- iniconfig 2.0.0
- kiwisolver 1.4.7
- matplotlib 3.9.2
- numpy 2.1.1
- packaging 24.1
- pillow 10.4.0
- pluggy 1.5.0
- pyparsing 3.1.4
- pytest 8.3.3
- python-dateutil 2.9.0.post0
- scipy 1.14.1
- six 1.16.0
- tomli 2.0.1
- matplotlib ^3.9.2
- numpy ^2.1.1
- python ^3.10
- scipy ^1.14.1
- pytest ^8.3.3 test
- contourpy ==1.3.0
- cycler ==0.12.1
- fonttools ==4.53.1
- kiwisolver ==1.4.7
- matplotlib ==3.9.2
- numpy ==2.1.1
- packaging ==24.1
- pillow ==10.4.0
- pyparsing ==3.1.4
- python-dateutil ==2.9.0.post0
- scipy ==1.14.1
- six ==1.16.0