pdll

Pairwise Difference Learning (PDL) is a meta-learning framework that leverages pairwise differences to transform multiclass problems into binary tasks. This repository includes the original PDL Classifier implementation, along with extended versions for regression and weighted learning scenarios.

https://github.com/karim-53/pdll

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary

Keywords

data-science machine-learning python supervised-learning
Last synced: 6 months ago

Repository

Pairwise Difference Learning (PDL) is a meta-learning framework that leverages pairwise differences to transform multiclass problems into binary tasks. This repository includes the original PDL Classifier implementation, along with extended versions for regression and weighted learning scenarios.

Basic Info
  • Host: GitHub
  • Owner: Karim-53
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 2.22 MB
Statistics
  • Stars: 21
  • Watchers: 1
  • Forks: 3
  • Open Issues: 2
  • Releases: 1
Topics
data-science machine-learning python supervised-learning
Created almost 2 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Pairwise difference learning library (pdll)


The Pairwise Difference Learning (PDL) library is a Python module. It contains a scikit-learn compatible implementation of the PDL Classifier, as described in Belaid et al. (2024).

The PDL Classifier, or PDC, is a meta-learner that reduces a multiclass classification problem to a binary classification problem (similar/different).
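
To make this reduction concrete, here is a minimal sketch of turning a multiclass dataset into a binary similar/different dataset over pairs. The pair representation used here (concatenating both points and their difference) is an illustrative assumption, not necessarily pdll's internal encoding:

```python
import numpy as np
from sklearn.datasets import make_blobs

# Hedged sketch: build a binary "same class?" dataset from a multiclass one.
X, y = make_blobs(n_samples=10, n_features=2, centers=3, random_state=0)

pairs, labels = [], []
for i in range(len(X)):
    for j in range(len(X)):  # all ordered pairs, including self-pairs
        pairs.append(np.concatenate([X[i], X[j], X[i] - X[j]]))
        labels.append(int(y[i] == y[j]))  # 1 = similar, 0 = different

X_pairs, y_pairs = np.array(pairs), np.array(labels)
print(X_pairs.shape, y_pairs.mean())  # (100, 6) and the share of "similar" pairs
```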

Installation

To install the package, run the following command:

```shell
pip install -U pdll
```


Usage

```python
from pdll import PairwiseDifferenceClassifier

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_blobs

# Generate random data with 2 features, 10 points, and 3 classes
X, y = make_blobs(n_samples=10, n_features=2, centers=3, random_state=0)

pdc = PairwiseDifferenceClassifier(estimator=RandomForestClassifier())
pdc.fit(X, y)
print('score:', pdc.score(X, y))

y_pred = pdc.predict(X)
proba_pred = pdc.predict_proba(X)
```

Please consult the `examples/` directory for more examples.
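
The repository description also mentions extended versions for regression. Assuming the regressor follows the same estimator-wrapping interface and is exported as `PairwiseDifferenceRegressor` (an unverified assumption; check the package's exports), usage would look like:

```python
# Assumption: pdll exposes a PairwiseDifferenceRegressor with the same
# wrap-an-estimator interface as the classifier. Verify against the package.
from pdll import PairwiseDifferenceRegressor  # hypothetical name, see note above

from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=20, n_features=2, random_state=0)

pdr = PairwiseDifferenceRegressor(estimator=RandomForestRegressor())
pdr.fit(X, y)
print('score:', pdr.score(X, y))
```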

How does it work?

The PDL algorithm works by transforming the multiclass classification problem into a binary classification problem, as follows:

Example 1: Graphical abstract

Example 2: PDC trained on the Iris dataset

We provide a minimalist classification example using the Iris dataset. The dataset is balanced, so the prior probabilities of each of the 3 classes are equal: p(Setosa) = p(Versicolour) = p(Virginica) = 1/3.

**Three Anchor Points**
- Flower 1: `y1 = Setosa`
- Flower 2: `y2 = Versicolour`
- Flower 3: `y3 = Virginica`

**One Query Point**
- Flower Q: `yq` (unknown target)

**Pairwise Predictions**

The model predicts the likelihood that both points have a similar class:
- g_sym(Flower Q, Flower 1) = 0.6
- g_sym(Flower Q, Flower 2) = 0.3
- g_sym(Flower Q, Flower 3) = 0.0

Given the above data, the first step is to update the priors.

**Posterior using Flower 1:**
- p_post,1(Setosa) = 0.6
- p_post,1(Versicolour) = (1/3 * (1 - 0.6)) / (1 - 1/3) = 0.2
- p_post,1(Virginica) = (1/3 * (1 - 0.6)) / (1 - 1/3) = 0.2

Similarly, we calculate for anchors 2 and 3:
- p_post,2(Setosa) = 0.35
- p_post,2(Versicolour) = 0.30
- p_post,2(Virginica) = 0.35
- p_post,3(Setosa) = 0.5
- p_post,3(Versicolour) = 0.5
- p_post,3(Virginica) = 0.0

**Averaging over the three predictions:**
- p_post(Setosa) = (0.6 + 0.35 + 0.5) / 3 ≈ 0.48
- p_post(Versicolour) = (0.2 + 0.30 + 0.5) / 3 ≈ 0.33
- p_post(Virginica) = (0.2 + 0.35 + 0.0) / 3 ≈ 0.18

Finally, the predicted class is the most likely prediction: ŷ_q = arg max_{y ∈ Y} p_post(y) = Setosa.
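
The arithmetic above can be checked in a few lines. This is a minimal sketch of the posterior update and averaging only (not the library's internal code); the priors and g_sym values are taken from the toy example:

```python
import numpy as np

classes = ['Setosa', 'Versicolour', 'Virginica']
prior = np.full(3, 1 / 3)     # balanced dataset
anchors = [0, 1, 2]           # class index of each anchor flower
g_sym = [0.6, 0.3, 0.0]       # P(query has same class as anchor)

posteriors = []
for anchor_class, g in zip(anchors, g_sym):
    p = prior * (1 - g) / (1 - prior[anchor_class])  # redistribute (1 - g)
    p[anchor_class] = g                              # anchor's own class gets g
    posteriors.append(p)

p_post = np.mean(posteriors, axis=0)
print(dict(zip(classes, p_post.round(3))))           # {'Setosa': 0.483, ...}
print('prediction:', classes[int(np.argmax(p_post))])  # Setosa
```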

Evaluation

To reproduce the experiments of the paper, run `run_benchmark.py` with a base learner and a dataset number between 0 and 99. Example:

```shell
python run_benchmark.py --model DecisionTreeClassifier --data 0
```

Scores will be stored in the `./results/tmp/` directory.

Experiment

We use 99 datasets from the OpenML repository and compare the performance of the PDC algorithm with 7 base learners, using the macro F1 score as the metric. The search space is inspired by TPOT, a state-of-the-art library for optimizing scikit-learn pipelines.
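
As a small-scale illustration of this comparison (a sketch; 5-fold cross-validation with macro F1 is an assumption here, and the full protocol lives in `run_benchmark.py`), one can score a base learner against its PDC-wrapped version on a single OpenML dataset, e.g. `data_id=61` (Iris) from the list below:

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from pdll import PairwiseDifferenceClassifier

# Fetch one of the benchmark datasets by its OpenML data_id.
X, y = fetch_openml(data_id=61, return_X_y=True, as_frame=False)

models = {
    'base': DecisionTreeClassifier(random_state=0),
    'PDC': PairwiseDifferenceClassifier(estimator=DecisionTreeClassifier(random_state=0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring='f1_macro')
    print(f'{name}: macro F1 = {scores.mean():.3f}')
```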

Description of the search space per estimator

| Estimator            | # parameters | # combinations |
|----------------------|--------------|----------------|
| DecisionTree         | 4            | 350            |
| RandomForest         | 7            | 1000           |
| ExtraTree            | 6            | 648            |
| HistGradientBoosting | 6            | 486            |
| Bagging              | 6            | 96             |
| ExtraTrees           | 7            | 1000           |
| GradientBoosting     | 5            | 900            |

Search space per estimator

| Estimator | Parameter | Values |
|---|---|---|
| **DecisionTreeClassifier** | criterion | gini, entropy |
| | max depth | None, 1, 2, 4, 6, 8, 11 |
| | min samples split | 2, 4, 8, 16, 21 |
| | min samples leaf | 1, 2, 4, 10, 21 |
| **RandomForestClassifier** | criterion | gini, entropy |
| | min samples split | 2, 4, 8, 16, 21 |
| | max features | sqrt, 0.05, 0.17, 0.29, 0.41, 0.52, 0.64, 0.76, 0.88, 1.0 |
| | min samples leaf | 1, 2, 4, 10, 21 |
| | bootstrap | True, False |
| **ExtraTreeClassifier** | criterion | gini, entropy |
| | min samples split | 2, 5, 10 |
| | min samples leaf | 1, 2, 4 |
| | max features | sqrt, log2, None |
| | max leaf nodes | None, 2, 12, 56 |
| | min impurity decrease | 0.0, 0.1, 0.5 |
| **HistGradientBoostingClassifier** | max iter | 100, 10 |
| | learning rate | 0.1, 0.01, 1 |
| | max leaf nodes | 31, 3, 256 |
| | min samples leaf | 20, 4, 64 |
| | l2 regularization | 0, 0.01, 0.1 |
| | max bins | 255, 2, 64 |
| **BaggingClassifier** | n estimators | 10, 5, 100, 256 |
| | max samples | 1.0, 0.5 |
| | max features | 0.5, 0.9, 1.0 |
| | bootstrap | True, False |
| | bootstrap features | False, True |
| **ExtraTreesClassifier** | criterion | gini, entropy |
| | max features | sqrt, 0.05, 0.17, 0.29, 0.41, 0.52, 0.64, 0.76, 0.88, 1.0 |
| | min samples split | 2, 4, 8, 16, 21 |
| | min samples leaf | 1, 2, 4, 10, 21 |
| | bootstrap | False, True |
| **GradientBoostingClassifier** | learning rate | 0.1, 0.01, 1 |
| | min samples split | 2, 4, 8, 16, 21 |
| | min samples leaf | 1, 2, 4, 10, 21 |
| | subsample | 1.0, 0.05, 0.37, 0.68 |
| | max features | None, 0.15, 0.68 |
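
The combination counts in the summary table follow from the per-parameter grids; for example, the DecisionTreeClassifier grid yields 2 × 7 × 5 × 5 = 350 combinations, which can be checked with scikit-learn's ParameterGrid (a sketch; the table's parameter names are mapped to their sklearn spellings):

```python
from sklearn.model_selection import ParameterGrid

# DecisionTreeClassifier search space from the table above.
grid = ParameterGrid({
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 1, 2, 4, 6, 8, 11],
    'min_samples_split': [2, 4, 8, 16, 21],
    'min_samples_leaf': [1, 2, 4, 10, 21],
})
print(len(grid))  # 350 = 2 * 7 * 5 * 5
```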

OpenML benchmark datasets

| data_id | NumberOfClasses | NumberOfInstances | NumberOfFeatures | NumberOfSymbolicFeatures | NumberOfFeatures_post_processing | MajorityClassSize | MinorityClassSize |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 43 | 2 | 306 | 4 | 2 | 3 | 225 | 81 |
| 48 | 3 | 151 | 6 | 3 | 5 | 52 | 49 |
| 59 | 2 | 351 | 35 | 1 | 34 | 225 | 126 |
| 61 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 164 | 2 | 106 | 58 | 58 | 57 | 53 | 53 |
| 333 | 2 | 556 | 7 | 7 | 6 | 278 | 278 |
| 377 | 6 | 600 | 61 | 1 | 60 | 100 | 100 |
| 444 | 2 | 132 | 4 | 4 | 3 | 71 | 61 |
| 464 | 2 | 250 | 3 | 1 | 2 | 125 | 125 |
| 475 | 4 | 400 | 6 | 5 | 5 | 100 | 100 |
| 714 | 2 | 125 | 5 | 3 | 4 | 76 | 49 |
| 717 | 2 | 508 | 11 | 1 | 10 | 286 | 222 |
| 721 | 2 | 200 | 11 | 1 | 10 | 103 | 97 |
| 733 | 2 | 209 | 7 | 1 | 6 | 153 | 56 |
| 736 | 2 | 111 | 4 | 1 | 3 | 58 | 53 |
| 744 | 2 | 250 | 6 | 1 | 5 | 141 | 109 |
| 750 | 2 | 500 | 8 | 1 | 7 | 254 | 246 |
| 756 | 2 | 159 | 16 | 1 | 15 | 105 | 54 |
| 766 | 2 | 500 | 51 | 1 | 50 | 262 | 238 |
| 767 | 2 | 475 | 4 | 3 | 3 | 414 | 61 |
| 768 | 2 | 100 | 26 | 1 | 25 | 55 | 45 |
| 773 | 2 | 250 | 26 | 1 | 25 | 126 | 124 |
| 779 | 2 | 500 | 26 | 1 | 25 | 267 | 233 |
| 782 | 2 | 120 | 3 | 1 | 2 | 63 | 57 |
| 784 | 2 | 140 | 4 | 2 | 3 | 70 | 70 |
| 788 | 2 | 186 | 61 | 1 | 60 | 109 | 77 |
| 792 | 2 | 500 | 6 | 1 | 5 | 298 | 202 |
| 793 | 2 | 250 | 11 | 1 | 10 | 135 | 115 |
| 811 | 2 | 264 | 3 | 2 | 2 | 163 | 101 |
| 812 | 2 | 100 | 26 | 1 | 25 | 53 | 47 |
| 814 | 2 | 468 | 3 | 1 | 2 | 256 | 212 |
| 824 | 2 | 500 | 11 | 1 | 10 | 274 | 226 |
| 850 | 2 | 100 | 51 | 1 | 50 | 51 | 49 |
| 853 | 2 | 506 | 14 | 2 | 13 | 297 | 209 |
| 860 | 2 | 380 | 3 | 1 | 2 | 195 | 185 |
| 863 | 2 | 250 | 11 | 1 | 10 | 133 | 117 |
| 870 | 2 | 500 | 6 | 1 | 5 | 267 | 233 |
| 873 | 2 | 250 | 51 | 1 | 50 | 142 | 108 |
| 877 | 2 | 250 | 51 | 1 | 50 | 137 | 113 |
| 879 | 2 | 500 | 26 | 1 | 25 | 304 | 196 |
| 880 | 2 | 284 | 11 | 1 | 10 | 142 | 142 |
| 889 | 2 | 100 | 26 | 1 | 25 | 50 | 50 |
| 895 | 2 | 222 | 3 | 1 | 2 | 134 | 88 |
| 896 | 2 | 500 | 26 | 1 | 25 | 280 | 220 |
| 902 | 2 | 147 | 7 | 5 | 6 | 78 | 69 |
| 906 | 2 | 400 | 8 | 1 | 7 | 207 | 193 |
| 909 | 2 | 400 | 8 | 1 | 7 | 203 | 197 |
| 911 | 2 | 250 | 6 | 1 | 5 | 140 | 110 |
| 915 | 2 | 315 | 14 | 4 | 13 | 182 | 133 |
| 918 | 2 | 250 | 51 | 1 | 50 | 135 | 115 |
| 925 | 2 | 323 | 5 | 1 | 4 | 175 | 148 |
| 932 | 2 | 100 | 51 | 1 | 50 | 56 | 44 |
| 933 | 2 | 250 | 26 | 1 | 25 | 136 | 114 |
| 935 | 2 | 250 | 11 | 1 | 10 | 140 | 110 |
| 936 | 2 | 500 | 11 | 1 | 10 | 272 | 228 |
| 937 | 2 | 500 | 51 | 1 | 50 | 282 | 218 |
| 969 | 2 | 150 | 5 | 1 | 4 | 100 | 50 |
| 973 | 2 | 178 | 14 | 1 | 13 | 107 | 71 |
| 974 | 2 | 132 | 5 | 1 | 4 | 81 | 51 |
| 1005 | 2 | 214 | 10 | 1 | 9 | 138 | 76 |
| 1011 | 2 | 336 | 8 | 1 | 7 | 193 | 143 |
| 1012 | 2 | 194 | 29 | 27 | 28 | 125 | 69 |
| 1054 | 2 | 161 | 40 | 1 | 39 | 109 | 52 |
| 1063 | 2 | 522 | 22 | 1 | 21 | 415 | 107 |
| 1065 | 2 | 458 | 40 | 1 | 39 | 415 | 43 |
| 1073 | 2 | 274 | 9 | 1 | 8 | 140 | 134 |
| 1100 | 3 | 478 | 11 | 5 | 10 | 247 | 90 |
| 1115 | 3 | 151 | 7 | 5 | 6 | 52 | 49 |
| 1413 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 1467 | 2 | 540 | 21 | 1 | 20 | 494 | 46 |
| 1480 | 2 | 583 | 11 | 2 | 10 | 416 | 167 |
| 1488 | 2 | 195 | 23 | 1 | 22 | 147 | 48 |
| 1490 | 2 | 182 | 13 | 1 | 12 | 130 | 52 |
| 1499 | 3 | 210 | 8 | 1 | 7 | 70 | 70 |
| 1510 | 2 | 569 | 31 | 1 | 30 | 357 | 212 |
| 1511 | 2 | 440 | 9 | 2 | 8 | 298 | 142 |
| 1523 | 3 | 310 | 7 | 1 | 6 | 150 | 60 |
| 1554 | 5 | 500 | 13 | 5 | 12 | 192 | 43 |
| 1556 | 2 | 120 | 7 | 6 | 6 | 61 | 59 |
| 1600 | 2 | 267 | 45 | 1 | 44 | 212 | 55 |
| 4329 | 2 | 470 | 17 | 14 | 16 | 400 | 70 |
| 40663 | 5 | 399 | 33 | 21 | 32 | 96 | 44 |
| 40681 | 2 | 128 | 7 | 7 | 6 | 64 | 64 |
| 41568 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 41977 | 2 | 156 | 91 | 1 | 90 | 98 | 58 |
| 41978 | 2 | 156 | 81 | 1 | 80 | 94 | 62 |
| 42011 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 42021 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 42026 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 42051 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 42066 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 42071 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 42186 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 42700 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 43859 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 44149 | 2 | 296 | 14 | 1 | 18 | 159 | 137 |
| 44151 | 3 | 149 | 5 | 0 | 4 | 50 | 49 |
| 44344 | 3 | 150 | 5 | 1 | 4 | 50 | 50 |
| 45711 | 2 | 530 | 14 | 3 | 13 | 354 | 176 |

Score comparison

2D datasets Examples

Here we see the difference in the learned patterns between PDL and the base learner. When PDL is compatible with the base learner (DecisionTree, RandomForest), the score improves. When the base learner is not compatible with PDL (SVC, AdaBoost, ...), the score decreases.
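
A comparison of this kind can be reproduced with scikit-learn's DecisionBoundaryDisplay. The following is a sketch on a synthetic 2D dataset (make_moons is an assumption here, not necessarily one of the datasets used in the figure):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.tree import DecisionTreeClassifier
from pdll import PairwiseDifferenceClassifier

X, y = make_moons(n_samples=100, noise=0.3, random_state=0)

models = {
    'DecisionTree': DecisionTreeClassifier(random_state=0),
    'PDC(DecisionTree)': PairwiseDifferenceClassifier(
        estimator=DecisionTreeClassifier(random_state=0)),
}

# Plot the learned decision regions of the base learner vs. its PDC wrapper.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (name, model) in zip(axes, models.items()):
    model.fit(X, y)
    DecisionBoundaryDisplay.from_estimator(model, X, ax=ax, alpha=0.5)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
    ax.set_title(f'{name} (acc={model.score(X, y):.2f})')
plt.show()
```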

Reference

Please cite us if you use this library in your research:

@article{belaid2024pairwise,
  title={Pairwise Difference Learning for Classification},
  author={Belaid, Mohamed Karim and Rabus, Maximilian and H{\"u}llermeier, Eyke},
  journal={Discovery Science},
  year={2024}
}

The first commit corresponds to the original implementation of the PDC algorithm.

Acknowledgments: We would like to thank Tim Wibiral, Dorra ElMekki, Viktor Bengs, Muhammad Zeeshan Anwer, Muhammad Hossein Shaker, Alireza Javanmardi, Patrick Kolpaczki, and Maximilian Muschalik for their early comments on this work. We also acknowledge LRZ and IDIADA for computational resources.

Owner

  • Login: Karim-53
  • Kind: user
  • Location: Germany
  • Company: Porsche AG

Hi, I am Karim. I am an external PhD student at LMU (Munich) working in collaboration with Porsche on Explainable AI.

Citation (CITATION.bib)

@article{belaid2024pairwise,
  author       = {Mohamed Karim Belaid and
                  Maximilian Rabus and
                  Eyke H{\"{u}}llermeier
                  },
  title        = {Pairwise Difference Learning for Classification},
  journal      = {Discovery Science},
  year         = {2024},
  url          = {https://arxiv.org/pdf/2406.20031}
}

GitHub Events

Total
  • Watch event: 5
  • Issue comment event: 3
  • Push event: 1
  • Pull request review comment event: 18
  • Pull request review event: 6
  • Pull request event: 2
  • Fork event: 2
Last Year
  • Watch event: 5
  • Issue comment event: 3
  • Push event: 1
  • Pull request review comment event: 18
  • Pull request review event: 6
  • Pull request event: 2
  • Fork event: 2

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 48 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
pypi.org: pdll

The pairwise difference learning library is a scikit-learn compatible library for learning from pairwise differences.

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 48 Last month
Rankings
Dependent packages count: 10.8%
Average: 35.7%
Dependent repos count: 60.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/test-pypi.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v3 composite
.github/workflows/test-repo.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v3 composite
pyproject.toml pypi
requirements.txt pypi
  • numpy >=1.24.3
  • pandas >=2.2.0
  • scikit-learn >=1.3.2
  • setuptools >=69.0.2
setup.py pypi
  • numpy *
  • pandas *
  • scikit-learn *
  • setuptools *