Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to springer.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: pereirabarataap
- License: gpl-3.0
- Language: HTML
- Default Branch: main
- Size: 7.92 MB
Statistics
- Stars: 8
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Fair tree classifier using strong demographic parity
Implementation of the algorithm proposed in:
Pereira Barata, A. et al. Fair tree classifier using strong demographic parity. Machine Learning (2023). https://doi.org/10.1007/s10994-023-06376-z
This package learns fair decision tree classifiers which can then be bagged into fair random forests, following the scikit-learn API standards.
When incorporating FairDecisionTreeClassifier or FairRandomForestClassifier objects into scikit-learn pipelines, pass the sensitive attribute(s) z through the fit_params={"z": z} parameter.
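As a rough illustration of how an extra fit argument such as z can travel through a scikit-learn pipeline, the sketch below uses a hypothetical stand-in estimator (`ZAwareStandIn`, not part of fair-trees) whose `fit` accepts `z`. It relies on the standard `Pipeline.fit` convention of forwarding keyword arguments named `<step>__<param>` to the matching step. Whether the fair-trees estimators expect exactly this routing is an assumption; the step names and the toy data are invented for the example.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ZAwareStandIn(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for a classifier whose fit takes a sensitive attribute z."""

    def fit(self, X, y, z=None):
        # Store the majority class and remember which z was received,
        # so the routing can be inspected after fitting.
        classes, counts = np.unique(y, return_counts=True)
        self.classes_ = classes
        self.majority_ = classes[np.argmax(counts)]
        self.z_seen_ = None if z is None else np.asarray(z)
        return self

    def predict(self, X):
        # Always predict the majority class seen during fit.
        return np.full(len(X), self.majority_)


# Toy data, invented for the example.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)
z = rng.integers(0, 2, size=20)

pipe = Pipeline([("scale", StandardScaler()), ("clf", ZAwareStandIn())])

# "clf__z" means: pass z as the `z` argument of the step named "clf".
pipe.fit(X, y, clf__z=z)
```

With a real fair-trees estimator in place of the stand-in, the same prefixing would presumably apply to whatever name the pipeline step is given.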
Installation
A)
pip install fair-trees
or
B)
git clone https://github.com/pereirabarataap/fair_tree_classifier
pip install -r requirements.txt
Usage
```python
from fair_trees import FairRandomForestClassifier as FRFC, load_datasets, sdp_score

datasets = load_datasets()
X = datasets["adult"]["X"]
y = datasets["adult"]["y"]
z = datasets["adult"]["z"]["gender"]

clf = FRFC(theta=0.5).fit(X, y, z)
y_prob = clf.predict_proba(X)[:, 1]
print(sdp_score(z, y_prob))
```
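To give a feel for what a strong demographic parity score measures, here is a minimal sketch that does not use the package. It assumes the score is defined as 1 - |2 * AUC(z, y_prob) - 1|, where AUC is the ROC AUC obtained when the predicted probabilities are used to "predict" the sensitive attribute; that definition and the function names `auc_by_pairs` and `sdp_sketch` are assumptions for illustration, not the package's confirmed implementation.

```python
import numpy as np


def auc_by_pairs(labels, scores):
    """ROC AUC as P(score_pos > score_neg) + 0.5 * P(tie), via pairwise comparison."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive-group score with every negative-group score.
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def sdp_sketch(z, y_prob):
    """Assumed definition: 1 - |2 * AUC(z, y_prob) - 1|.

    1.0 means the scores carry no ranking information about z;
    0.0 means the scores separate the two groups perfectly.
    """
    return 1.0 - abs(2.0 * auc_by_pairs(z, y_prob) - 1.0)


# Scores that perfectly separate the groups versus scores that ignore them.
separated = sdp_sketch([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
constant = sdp_sketch([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5])
```

Under this assumed definition the score is threshold-independent, which is why the usage example above can pass probabilities rather than hard predictions.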
Example
```python
import numpy as np
import pandas as pd
import seaborn as sb
from tqdm.notebook import tqdm
from matplotlib import pyplot as plt
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold as SKF
from fair_trees import FairRandomForestClassifier as FRFC, sdp_score, load_datasets

datasets = load_datasets()

results_data = []
for dataset in tqdm(datasets):
    X = datasets[dataset]["X"]
    y = datasets[dataset]["y"]
    z = datasets[dataset]["z"]
    fold = 0
    skf = SKF(n_splits=5, random_state=42, shuffle=True)
    # ensuring stratified kfold w.r.t. y and z
    splitter_y = pd.concat([y, z], axis=1).astype(str).apply(
        lambda row:
            row[y.name] + "".join([row[col] for col in z.columns]),
        axis=1
    ).values
    desc_i = f"dataset={dataset} | processing folds"
    for train_idx, test_idx in tqdm(skf.split(X,splitter_y), desc=desc_i, leave=False):
        X_train, X_test = X.loc[train_idx], X.loc[test_idx]
        y_train, y_test = y.loc[train_idx], y.loc[test_idx]
        z_train, z_test = z.loc[train_idx], z.loc[test_idx]
        desc_j = f"fold={fold} | fitting thetas"
        for theta in tqdm(np.linspace(0,1,11).round(1), desc=desc_j, leave=False):
            clf = FRFC(
                n_jobs=-1,
                n_bins=256,
                theta=theta,
                max_depth=None,
                bootstrap=True,
                random_state=42,
                n_estimators=500,
                min_samples_leaf=1,
                min_samples_split=2,
                max_features="sqrt",
                requires_data_processing=True
            ).fit(X_train, y_train, z_train)
            y_prob = clf.predict_proba(X_test)[:,1]
            auc = roc_auc_score(y_test, y_prob)
            sdp_min = np.inf
            for sens_att in z.columns:
                if len(np.unique(z_test[sens_att]))==2:
                    sens_val = np.unique(z_test[sens_att])[0]
                    z_true = z_test[sens_att]==sens_val
                    sdp = sdp_score(z_true, y_prob)
                    if sdp < sdp_min:
                        sdp_min = sdp
                else:
                    for sens_val in np.unique(z_test[sens_att]):
                        z_true = z_test[sens_att]==sens_val
                        sdp = sdp_score(z_true, y_prob)
                        if sdp < sdp_min:
                            sdp_min = sdp
            data_row = [dataset, fold, theta, auc, sdp_min]
            results_data.append(data_row)
        fold += 1

results_df = pd.DataFrame(
    data=results_data,
    columns=["dataset", "fold", "theta", "performance", "fairness"]
)

fig, ax = plt.subplots(1,1,dpi=100, figsize=(8,4))
sb.lineplot(
data=results_df.groupby(by=["dataset", "theta"]).mean(),
x="fairness",
y="performance",
hue="dataset",
ax=ax
)
plt.show()
```
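The example stratifies its folds on the class label and the sensitive attributes jointly by concatenating them into one string per row. The sketch below isolates that step in a small helper; the name `joint_stratification_key` and the separator argument are hypothetical additions, and the separator is a variation on the README code (which joins without one) meant to avoid ambiguous keys such as "1" + "12" versus "11" + "2".

```python
import pandas as pd


def joint_stratification_key(y, z, sep="|"):
    """Concatenate the label and every sensitive column into one string per row."""
    combined = pd.concat([y, z], axis=1).astype(str)
    # Join the string values of each row; the result can be passed as the
    # `y` argument of StratifiedKFold.split to stratify on y and z together.
    return combined.apply(sep.join, axis=1).to_numpy()


# Toy data, invented for the example.
y = pd.Series([0, 1, 0, 1], name="label")
z = pd.DataFrame({
    "gender": ["f", "f", "m", "m"],
    "age": ["old", "young", "young", "old"],
})
keys = joint_stratification_key(y, z)
```

Each distinct key then acts as its own stratum, so very rare combinations of label and sensitive values may have fewer members than the number of folds.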
3D Figures
https://htmlpreview.github.io/?https://github.com/pereirabarataap/fair_tree_classifier/main/3d/index.html
Owner
- Name: Fideous
- Login: pereirabarataap
- Kind: user
- Repositories: 3
- Profile: https://github.com/pereirabarataap
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite the paper in which it was introduced."
authors:
  - family-names: Pereira Barata
    given-names: Antonio
    orcid: "https://orcid.org/0000-0002-0540-7681"
title: "fair_tree_classifier"
version: 0.1
date-released: 2023
url: "https://github.com/pereirabarataap/fair_tree_classifier"
preferred-citation:
  type: journal-paper
  authors:
    - family-names: Pereira Barata
      given-names: Antonio
      orcid: "https://orcid.org/0000-0002-0540-7681"
    - family-names: Takes
      given-names: Frank W.
      orcid: "https://orcid.org/0000-0001-5468-1030"
    - family-names: Herik
      given-names: H. Jaap van den
      orcid: "https://orcid.org/0000-0001-9751-761X"
    - family-names: Veenman
      given-names: Cor
      orcid: "https://orcid.org/0000-0002-2645-1198"
  doi: 10.1007/s10994-023-06376-z
  journal: "Machine Learning"
  title: "Fair tree classifier using strong demographic parity"
  year: 2023
GitHub Events
Total
- Watch event: 1
- Push event: 1
Last Year
- Watch event: 1
- Push event: 1
Packages
- Total packages: 1
- Total downloads: 101 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 28
- Total maintainers: 1
pypi.org: fair-trees
This package learns fair decision tree classifiers which can then be bagged into fair random forests, following the scikit-learn API standards.
- Homepage: https://github.com/pereirabarataap/fair_tree_classifier
- Documentation: https://fair-trees.readthedocs.io/
- License: MIT
- Latest release: 2.6.6 (published 10 months ago)
Dependencies
- joblib *
- numpy *
- pandas *
- scikit-learn *
- scipy *
- joblib ==1.2.0
- matplotlib ==3.8.2
- numpy ==1.23.4
- pandas ==2.2.0
- scikit-learn ==1.3.0
- scipy ==1.11.3
- seaborn ==0.13.2
- tqdm ==4.65.0