fair-trees

https://github.com/pereirabarataap/fair_tree_classifier

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to springer.com
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (7.7%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: pereirabarataap
License: gpl-3.0
Language: HTML
Default Branch: main
Size: 7.92 MB

Statistics

Stars: 8
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created about 5 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

Fair tree classifier using strong demographic parity

Implementation of the algorithm proposed in:

Pereira Barata, A. et al. Fair tree classifier using strong demographic parity. Machine Learning (2023). https://doi.org/10.1007/s10994-023-06376-z

This package learns fair decision tree classifiers which can then be bagged into fair random forests, following the scikit-learn API standards.

When incorporating FairDecisionTreeClassifier or FairRandomForestClassifier objects into scikit-learn pipelines, pass the sensitive attribute(s) z through the fit_params={"z": z} parameter.

Installation

A)
pip install fair-trees

B)
git clone https://github.com/pereirabarataap/fair_tree_classifier
cd fair_tree_classifier
pip install -r requirements.txt

Usage

```python
from fair_trees import FairRandomForestClassifier as FRFC, load_datasets, sdp_score

# Load the bundled datasets and pick one sensitive attribute
datasets = load_datasets()
X = datasets["adult"]["X"]
y = datasets["adult"]["y"]
z = datasets["adult"]["z"]["gender"]

# theta trades off predictive performance against fairness
clf = FRFC(theta=0.5).fit(X, y, z)
y_prob = clf.predict_proba(X)[:, 1]
print(sdp_score(z, y_prob))
```
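The README does not say what `sdp_score` computes. As rough intuition only, the sketch below scores how little the predicted probabilities reveal about group membership: it takes the AUC obtained when ranking samples by `y_prob` to "predict" `z`, then maps 0.5 (no separation) to 1 and 0 or 1 (perfect separation) to 0. The function names `auc` and `sdp_like_score` and the exact formula are assumptions of this sketch, not the package's implementation.

```python
from typing import Sequence


def auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("both groups must contain at least one sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sdp_like_score(z: Sequence[bool], y_prob: Sequence[float]) -> float:
    """Hypothetical stand-in for a fairness score (NOT fair_trees.sdp_score).

    Returns 1.0 when the scores carry no ranking information about z
    and 0.0 when they separate the two groups perfectly.
    """
    return 1.0 - abs(2.0 * auc(z, y_prob) - 1.0)


group = [True, False, True, False]
separated = sdp_like_score(group, [0.9, 0.1, 0.8, 0.2])    # groups fully separated
uninformative = sdp_like_score(group, [0.5, 0.5, 0.5, 0.5])  # identical scores
print(separated, uninformative)
```

Under this reading a higher value means a fairer model, which matches how the Example section takes the minimum score across groups as the worst case.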

Example

```python
import numpy as np
import pandas as pd
import seaborn as sb
from tqdm.notebook import tqdm
from matplotlib import pyplot as plt
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold as SKF
from fair_trees import FairRandomForestClassifier as FRFC, sdp_score, load_datasets

datasets = load_datasets()

results_data = []
for dataset in tqdm(datasets):
    X = datasets[dataset]["X"]
    y = datasets[dataset]["y"]
    z = datasets[dataset]["z"]

    fold = 0
    skf = SKF(n_splits=5, random_state=42, shuffle=True)
    # ensuring stratified kfold w.r.t. y and z
    splitter_y = pd.concat([y, z], axis=1).astype(str).apply(
        lambda row:
            row[y.name] + "".join([row[col] for col in z.columns]),
        axis=1
    ).values
    desc_i = f"dataset={dataset} | processing folds"
    for train_idx, test_idx in tqdm(skf.split(X, splitter_y), desc=desc_i, leave=False):

        X_train, X_test = X.loc[train_idx], X.loc[test_idx]
        y_train, y_test = y.loc[train_idx], y.loc[test_idx]
        z_train, z_test = z.loc[train_idx], z.loc[test_idx]

        desc_j = f"fold={fold} | fitting thetas"
        for theta in tqdm(np.linspace(0, 1, 11).round(1), desc=desc_j, leave=False):
            clf = FRFC(
                n_jobs=-1,
                n_bins=256,
                theta=theta,
                max_depth=None,
                bootstrap=True,
                random_state=42,
                n_estimators=500,
                min_samples_leaf=1,
                min_samples_split=2,
                max_features="sqrt",
                requires_data_processing=True
            ).fit(X_train, y_train, z_train)
            y_prob = clf.predict_proba(X_test)[:, 1]

            auc = roc_auc_score(y_test, y_prob)

            # worst-case (minimum) fairness score over all sensitive groups
            sdp_min = np.inf
            for sens_att in z.columns:
                if len(np.unique(z_test[sens_att])) == 2:
                    # binary attribute: one comparison is enough
                    sens_val = np.unique(z_test[sens_att])[0]
                    z_true = z_test[sens_att] == sens_val
                    sdp = sdp_score(z_true, y_prob)
                    if sdp < sdp_min:
                        sdp_min = sdp
                else:
                    # multi-valued attribute: compare each value against the rest
                    for sens_val in np.unique(z_test[sens_att]):
                        z_true = z_test[sens_att] == sens_val
                        sdp = sdp_score(z_true, y_prob)
                        if sdp < sdp_min:
                            sdp_min = sdp

            data_row = [dataset, fold, theta, auc, sdp_min]
            results_data.append(data_row)

        fold += 1

results_df = pd.DataFrame(
    data=results_data,
    columns=["dataset", "fold", "theta", "performance", "fairness"]
)

# average over folds, then plot the performance/fairness trade-off per dataset
fig, ax = plt.subplots(1, 1, dpi=100, figsize=(8, 4))
sb.lineplot(
    data=results_df.groupby(by=["dataset", "theta"]).mean(),
    x="fairness",
    y="performance",
    hue="dataset",
    ax=ax
)
plt.show()
```
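The example stratifies folds on the class label and the sensitive attributes together by fusing them into one string key per sample. A minimal, dependency-free sketch of that idea follows; the helper name `joint_strata`, the toy data, and the `sep` separator are assumptions made for illustration. The separator is not part of the example, which joins the values directly; it is used here so that multi-character values cannot produce the same key for different combinations.

```python
from collections import Counter
from typing import Hashable, List, Mapping, Sequence


def joint_strata(
    y: Sequence[Hashable],
    z: Mapping[str, Sequence[Hashable]],
    sep: str = "|",
) -> List[str]:
    """Return one string key per sample combining y with every column of z."""
    columns = [y, *z.values()]
    return [sep.join(str(v) for v in row) for row in zip(*columns)]


# Toy data (hypothetical): a binary label and two sensitive attributes
y = [0, 0, 1, 1, 0, 1]
z = {
    "gender": ["f", "m", "f", "m", "f", "m"],
    "race": ["a", "a", "b", "b", "a", "b"],
}

keys = joint_strata(y, z)
print(Counter(keys))  # how many samples fall into each joint stratum
```

Keys built this way can be passed as the second argument of `StratifiedKFold.split`, as the example does with `splitter_y`. Each joint stratum needs at least `n_splits` samples to be spread across all folds.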

3D Figures

https://htmlpreview.github.io/?https://github.com/pereirabarataap/fair_tree_classifier/main/3d/index.html

Owner

Name: Fideous
Login: pereirabarataap
Kind: user

Repositories: 3
Profile: https://github.com/pereirabarataap

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite the paper in which it was introduced."
authors:
- family-names: Pereira Barata
  given-names: Antonio
  orcid: "https://orcid.org/0000-0002-0540-7681"
title: "fair_tree_classifier"
version: 0.1
date-released: 2023
url: "https://github.com/pereirabarataap/fair_tree_classifier"
preferred-citation:
  type: journal-paper
  authors:
  - family-names: Pereira Barata
    given-names: Antonio
    orcid: "https://orcid.org/0000-0002-0540-7681"
  - family-names: Takes
    given-names: Frank W.
    orcid: "https://orcid.org/0000-0001-5468-1030"
  - family-names: Herik
    given-names: H. Jaap van den
    orcid: "https://orcid.org/0000-0001-9751-761X"
  - family-names: Veenman
    given-names: Cor
    orcid: "https://orcid.org/0000-0002-2645-1198"
  doi: 10.1007/s10994-023-06376-z
  journal: "Machine Learning"
  title: "Fair tree classifier using strong demographic parity"
  year: 2023

GitHub Events

Total

Watch event: 1
Push event: 1

Last Year

Watch event: 1
Push event: 1

Packages

Total packages: 1
Total downloads:
- pypi 101 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 28
Total maintainers: 1

pypi.org: fair-trees

This package learns fair decision tree classifiers which can then be bagged into fair random forests, following the scikit-learn API standards.

Homepage: https://github.com/pereirabarataap/fair_tree_classifier
Documentation: https://fair-trees.readthedocs.io/
License: MIT
Latest release: 2.6.6
published about 1 year ago

Versions: 28
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 101 Last month

Rankings

Dependent packages count: 9.6%

Average: 36.4%

Dependent repos count: 63.2%

Maintainers (1)

Fideous

Last synced: 10 months ago

Dependencies

fair_trees.egg-info/requires.txt pypi

joblib *
numpy *
pandas *
scikit-learn *
scipy *

requirements.txt pypi

joblib ==1.2.0
matplotlib ==3.8.2
numpy ==1.23.4
pandas ==2.2.0
scikit-learn ==1.3.0
scipy ==1.11.3
seaborn ==0.13.2
tqdm ==4.65.0

setup.py pypi

joblib *
numpy *
pandas *
scikit-learn *
scipy *
