Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: pereirabarataap
  • License: gpl-3.0
  • Language: HTML
  • Default Branch: main
  • Size: 7.92 MB
Statistics
  • Stars: 8
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 5 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

Fair tree classifier using strong demographic parity

Implementation of the algorithm proposed in:

Pereira Barata, A. et al. Fair tree classifier using strong demographic parity. Machine Learning (2023). https://doi.org/10.1007/s10994-023-06376-z

This package learns fair decision tree classifiers which can then be bagged into fair random forests, following the scikit-learn API standards.
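Bagging is a generic technique. The sketch below shows plain bootstrap aggregation, assuming each base model exposes fit(X, y, z) and predict_proba(X) like the classifiers in this package. MeanLabelModel, bagged_fit and bagged_predict_proba are hypothetical names, and this is not necessarily how FairRandomForestClassifier is implemented. It illustrates one practical point: the sensitive attribute z must be resampled with the same bootstrap indices as X and y so that rows stay aligned.

```python
import numpy as np


class MeanLabelModel:
    """Hypothetical stand-in for a fair tree: predicts the training mean of y.

    It accepts z in fit() only to mirror the fit(X, y, z) signature used in
    this README; it does not use z.
    """

    def fit(self, X, y, z):
        self.p_ = float(np.mean(y))
        return self

    def predict_proba(self, X):
        n = len(X)
        return np.column_stack([np.full(n, 1 - self.p_), np.full(n, self.p_)])


def bagged_fit(make_model, X, y, z, n_estimators=10, seed=0):
    """Fit n_estimators models, each on one bootstrap sample of (X, y, z)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        # X, y and z are resampled with the SAME indices so rows stay aligned
        models.append(make_model().fit(X[idx], y[idx], z[idx]))
    return models


def bagged_predict_proba(models, X):
    """Average the class-probability estimates of all bagged models."""
    return np.mean([m.predict_proba(X) for m in models], axis=0)


# usage on toy data
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1] * 5)
z = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
models = bagged_fit(MeanLabelModel, X, y, z, n_estimators=5)
proba = bagged_predict_proba(models, X)
```

Averaging probabilities (soft voting) rather than hard labels keeps the ensemble output usable as a score, which is what the SDP metric in the usage example operates on.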

When incorporating FairDecisionTreeClassifier or FairRandomForestClassifier objects into scikit-learn pipelines, use the fit_params={"z": z} parameter to pass the sensitive attribute(s) z.
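The fit_params={"z": z} dict form matches the fit_params argument of helpers such as cross_val_score in scikit-learn 1.3, the version pinned in requirements.txt. When calling Pipeline.fit directly, scikit-learn instead forwards fit-time keyword arguments to a step when they are prefixed with that step's name and a double underscore, so for a step named "clf" the argument becomes clf__z. The sketch below uses a hypothetical ZAwareClassifier that only mimics the fit(X, y, z) signature, so it runs without fair-trees installed. With the real package, the final step would be a FairRandomForestClassifier instead.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ZAwareClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical estimator whose fit() takes a sensitive attribute z,
    mimicking the fit(X, y, z) signature of the fair-trees classifiers."""

    def fit(self, X, y, z=None):
        if z is None:
            raise ValueError("z is required")
        self.classes_ = np.unique(y)
        self.n_z_seen_ = len(z)
        # predict the majority class; a real fair model would use z here
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


X = np.random.default_rng(0).normal(size=(8, 3))
y = np.array([0, 1, 1, 1, 0, 1, 0, 1])
z = np.array([0, 0, 1, 1, 0, 1, 1, 0])

pipe = Pipeline([("scale", StandardScaler()), ("clf", ZAwareClassifier())])
# "<step name>__<param>" keyword arguments are routed to that step's fit()
pipe.fit(X, y, clf__z=z)
```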

Installation

A) From PyPI:
pip install fair-trees

or

B) From source:
git clone https://github.com/pereirabarataap/fair_tree_classifier
cd fair_tree_classifier
pip install -r requirements.txt

Usage

```python
from fair_trees import FairRandomForestClassifier as FRFC, load_datasets, sdp_score

datasets = load_datasets()
X = datasets["adult"]["X"]
y = datasets["adult"]["y"]
z = datasets["adult"]["z"]["gender"]

clf = FRFC(theta=0.5).fit(X, y, z)
y_prob = clf.predict_proba(X)[:, 1]
print(sdp_score(z, y_prob))
```
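In practice, use the package's own sdp_score. As a rough guide to what the number means, the sketch below assumes strong demographic parity is measured through the ROC AUC of the sensitive attribute when ranked by y_prob. That AUC is rescaled as 1 − |2·AUC − 1|, so 1 means the scores carry no ranking information about z and 0 means they separate the groups perfectly. Both the formula and the names auc_pairwise and sdp_sketch are assumptions for illustration, not the package's code; see the paper for the exact definition.

```python
import numpy as np


def auc_pairwise(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count 1/2). Uses O(n_pos * n_neg) memory; fine for a sketch."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both groups must be present")
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def sdp_sketch(z_true, y_prob):
    """Hypothetical strong-demographic-parity score: 1 when y_prob carries no
    rank information about z (AUC = 0.5), 0 when it separates z perfectly."""
    auc = auc_pairwise(z_true, y_prob)
    return 1.0 - abs(2.0 * auc - 1.0)


z_toy = [0, 0, 1, 1]
print(sdp_sketch(z_toy, [0.1, 0.2, 0.8, 0.9]))  # groups fully separated -> 0.0
print(sdp_sketch(z_toy, [0.1, 0.9, 0.1, 0.9]))  # no rank information -> 1.0
```

Because 1 − |2·AUC − 1| does not change when the two groups of a binary z are swapped, scoring one value of a binary attribute is enough, which is what the loop in the Example section does.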

Example

```python
import numpy as np
import pandas as pd
import seaborn as sb
from tqdm.notebook import tqdm
from matplotlib import pyplot as plt
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold as SKF
from fair_trees import FairRandomForestClassifier as FRFC, sdp_score, load_datasets

datasets = load_datasets()

results_data = []
for dataset in tqdm(datasets):
    X = datasets[dataset]["X"]
    y = datasets[dataset]["y"]
    z = datasets[dataset]["z"]  # one column per sensitive attribute

    fold = 0
    skf = SKF(n_splits=5, random_state=42, shuffle=True)
    # stratify on a composite label (y followed by every z column) so that
    # each fold preserves the joint distribution of y and z
    splitter_y = pd.concat([y, z], axis=1).astype(str).apply(
        lambda row:
            row[y.name] + "".join([row[col] for col in z.columns]),
        axis=1
    ).values
    desc_i = f"dataset={dataset} | processing folds"
    for train_idx, test_idx in tqdm(skf.split(X, splitter_y), desc=desc_i, leave=False):

        # split() yields positional indices, so select rows with .iloc
        X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
        z_train, z_test = z.iloc[train_idx], z.iloc[test_idx]

        desc_j = f"fold={fold} | fitting thetas"
        for theta in tqdm(np.linspace(0, 1, 11).round(1), desc=desc_j, leave=False):
            clf = FRFC(
                n_jobs=-1,
                n_bins=256,
                theta=theta,
                max_depth=None,
                bootstrap=True,
                random_state=42,
                n_estimators=500,
                min_samples_leaf=1,
                min_samples_split=2,
                max_features="sqrt",
                requires_data_processing=True
            ).fit(X_train, y_train, z_train)
            y_prob = clf.predict_proba(X_test)[:, 1]

            auc = roc_auc_score(y_test, y_prob)

            # fairness = worst-case (minimum) SDP over all sensitive attributes:
            # binary attributes are scored once, others one value vs. the rest
            sdp_min = np.inf
            for sens_att in z.columns:
                if len(np.unique(z_test[sens_att])) == 2:
                    sens_val = np.unique(z_test[sens_att])[0]
                    z_true = z_test[sens_att] == sens_val
                    sdp = sdp_score(z_true, y_prob)
                    if sdp < sdp_min:
                        sdp_min = sdp
                else:
                    for sens_val in np.unique(z_test[sens_att]):
                        z_true = z_test[sens_att] == sens_val
                        sdp = sdp_score(z_true, y_prob)
                        if sdp < sdp_min:
                            sdp_min = sdp

            data_row = [dataset, fold, theta, auc, sdp_min]
            results_data.append(data_row)

        fold += 1

results_df = pd.DataFrame(
    data=results_data,
    columns=["dataset", "fold", "theta", "performance", "fairness"]
)

# average over folds for each (dataset, theta) pair, then plot the trade-off
fig, ax = plt.subplots(1, 1, dpi=100, figsize=(8, 4))
sb.lineplot(
    data=results_df.groupby(by=["dataset", "theta"]).mean().reset_index(),
    x="fairness",
    y="performance",
    hue="dataset",
    ax=ax
)
plt.show()
```

Output: line plot of mean performance against mean fairness, one line per dataset.
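The worst-case fairness loop in the example can be factored into a reusable helper. The sketch below uses the hypothetical name worst_case_fairness and takes the scoring function as an argument (pass sdp_score from fair_trees in practice). It scores binary attributes once, on the assumption that the score is symmetric under swapping the two groups, and scores multi-valued attributes one value against the rest. Unlike the original loop, it skips attributes that take a single value in the test fold, where an AUC-based score is undefined. The toy mean_gap_score in the usage lines is a stand-in for illustration only.

```python
import numpy as np


def worst_case_fairness(z_columns, y_prob, score_fn):
    """Minimum of score_fn over one-vs-rest indicators of the sensitive values.

    z_columns maps attribute name -> 1-D array of that attribute's values.
    Binary attributes are scored once (score_fn is assumed symmetric under
    swapping the two groups); multi-valued ones are scored value by value.
    """
    worst = np.inf
    for values in z_columns.values():
        values = np.asarray(values)
        levels = np.unique(values)
        if len(levels) < 2:
            continue  # only one group present: nothing to compare
        to_score = levels[:1] if len(levels) == 2 else levels
        for level in to_score:
            worst = min(worst, score_fn(values == level, y_prob))
    return worst


def mean_gap_score(mask, y_prob):
    # toy fairness score: 1 minus the gap in mean predicted probability
    y_prob = np.asarray(y_prob, dtype=float)
    return 1.0 - abs(y_prob[mask].mean() - y_prob[~mask].mean())


y_prob = np.array([0.2, 0.4, 0.6, 0.8])
z = {"sex": np.array(["f", "f", "m", "m"]), "group": np.array(["a", "b", "c", "a"])}
print(worst_case_fairness(z, y_prob, mean_gap_score))  # ≈ 0.6, driven by "sex"
```

Passing the scorer in keeps the aggregation logic independent of any particular fairness metric, so the same helper works with sdp_score or with any other per-group score where higher means fairer.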

3D Figures

https://htmlpreview.github.io/?https://github.com/pereirabarataap/fair_tree_classifier/main/3d/index.html

Owner

  • Name: Fideous
  • Login: pereirabarataap
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite the paper in which it was introduced."
authors:
- family-names: Pereira Barata
  given-names: Antonio
  orcid: "https://orcid.org/0000-0002-0540-7681"
title: "fair_tree_classifier"
version: 0.1
date-released: 2023
url: "https://github.com/pereirabarataap/fair_tree_classifier"
preferred-citation:
  type: journal-paper
  authors:
  - family-names: Pereira Barata
    given-names: Antonio
    orcid: "https://orcid.org/0000-0002-0540-7681"
  - family-names: Takes
    given-names: Frank W.
    orcid: "https://orcid.org/0000-0001-5468-1030"
  - family-names: Herik
    given-names: H. Jaap van den
    orcid: "https://orcid.org/0000-0001-9751-761X"
  - family-names: Veenman
    given-names: Cor
    orcid: "https://orcid.org/0000-0002-2645-1198"
  doi: 10.1007/s10994-023-06376-z
  journal: "Machine Learning"
  title: "Fair tree classifier using strong demographic parity"
  year: 2023

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
Last Year
  • Watch event: 1
  • Push event: 1

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 101 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 28
  • Total maintainers: 1
pypi.org: fair-trees


  • Versions: 28
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 101 Last month
Rankings
Dependent packages count: 9.6%
Average: 36.4%
Dependent repos count: 63.2%
Maintainers (1)
Last synced: 6 months ago

Dependencies

fair_trees.egg-info/requires.txt pypi
  • joblib *
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *
requirements.txt pypi
  • joblib ==1.2.0
  • matplotlib ==3.8.2
  • numpy ==1.23.4
  • pandas ==2.2.0
  • scikit-learn ==1.3.0
  • scipy ==1.11.3
  • seaborn ==0.13.2
  • tqdm ==4.65.0
setup.py pypi
  • joblib *
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *