https://github.com/csinva/disentangled-attribution-curves
Using / reproducing DAC from the paper "Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees"
Science Score: 33.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.9%) to scientific vocabulary
Keywords
Basic Info
- Host: GitHub
- Owner: csinva
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://arxiv.org/abs/1905.07631
- Size: 4.63 MB
Statistics
- Stars: 27
- Watchers: 6
- Forks: 4
- Open Issues: 1
- Releases: 0
Topics
Metadata Files
readme.md
Disentangled attribution curves (DAC) 🔎
Official code for using / reproducing DAC from the paper Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees (arXiv 2019 pdf)
Note: this repo is actively maintained. For any questions please file an issue.

documentation
using DAC on new models
- quick install: `pip install git+https://github.com/csinva/disentangled-attribution-curves`
- the core of the method code lies in the `dac` folder and is compatible with scikit-learn
- the examples/xor_dac.ipynb notebook shows how to use DAC on new models with simple synthetic datasets (e.g. XOR)
- the basic api consists of two functions: `from dac import dac, dac_plot`

`dac(forest, input_space_x, outcome_space_y, assignment, S, continuous_y=True, class_id=1)`

- inputs:
    - `forest`: an sklearn ensemble of decision trees
    - `input_space_x`: the matrix of training data (feature values), a numpy 2D array
    - `outcome_space_y`: the array of training data (labels/regression targets), a numpy 1D array
    - `assignment`: a matrix of feature values that will have their DAC importance scores evaluated, a numpy 2D array
    - `S`: a binary indicator of whether to include each feature in the importance calculation, a numpy 1D array with values 0 and 1 only
    - `continuous_y`: a boolean indicating whether the y targets are regression (True) or classification (False); defaults to True
    - `class_id`: if classification, the class value to return proportions for; defaults to 1
- returns:
    - `dac_curve`
        - for regression: a numpy array whose length equals the number of samples in `assignment`; each entry is a DAC importance score, a float between min(outcome_space_y) and max(outcome_space_y)
        - for classification: a numpy array whose length equals the number of samples in `assignment`; each entry is a DAC importance score, a float between 0 and 1
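The `dac()` call described above can be sketched on a toy XOR regression task. Everything below (the dataset, forest size, and evaluation points) is illustrative, and the `dac` import is guarded since the package must first be installed from this repo:

```python
# Minimal sketch of scoring points with dac() on a toy XOR task.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 2)).astype(float)  # two binary features
y = np.logical_xor(X[:, 0], X[:, 1]).astype(float)   # XOR target (regression)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

S = np.array([1, 1])              # include both features in the importance score
assignment = np.array([[0., 0.],  # points whose DAC scores we want
                       [0., 1.],
                       [1., 1.]])

try:
    from dac import dac
    scores = dac(forest, X, y, assignment, S, continuous_y=True)
    print(scores)  # one importance score per row of `assignment`
except ImportError:
    pass  # dac not installed; the call above shows the intended usage
```

Since the target is regression (`continuous_y=True`), each returned score falls between min(y) and max(y), here 0 and 1.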
`dac_plot(forest, input_space_x, outcome_space_y, S, interval_x, interval_y, di_x, di_y, C, continuous_y, weights)`

- inputs:
    - `forest`: an sklearn ensemble of decision trees (random forest or adaboosted forest)
    - `input_space_x`: the matrix of training data (feature values), a numpy 2D array
    - `outcome_space_y`: the array of training data (labels/regression targets), a numpy 1D array
    - `S`: a binary indicator of whether to include each feature in the importance calculation, a numpy 1D array with values 0 and 1 only
    - `interval_x`: an interval for the x axis of the plot, defaults to None. If None, a reasonable interval is extrapolated from the range of the first feature specified in `S`.
    - `interval_y`: an interval for the y axis of the plot (only applicable to heat maps), defaults to None. If None, a reasonable interval is extrapolated from the range of the second feature specified in `S`.
    - `di_x`: a step length for the x axis of the plot, defaults to None. If None, a reasonable step length is extrapolated from the range of the first feature specified in `S`.
    - `di_y`: a step length for the y axis of the plot (only applicable to heat maps), defaults to None. If None, a reasonable step length is extrapolated from the range of the second feature specified in `S`.
    - `C`: a hyperparameter specifying how many standard deviations samples can be from the mean of a leaf and still be counted into the curve; smaller values yield a more sensitive curve, larger values a smoother curve
    - `continuous_y`: a boolean indicating whether the y targets are regression (True) or classification (False); defaults to True
    - `weights`: weights for the individual estimators' contributions to the curve, defaults to None. If None, weights are extrapolated from the forest type.
- returns:
    - `dac_curve`: a numpy array containing values for the DAC curve or heatmap describing the interaction of the variables specified in `S`
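A hedged sketch of computing a single-feature DAC curve with `dac_plot()`; the dataset, forest size, and `C=1` are illustrative choices, the arguments are passed positionally following the signature above, and the `dac` import is guarded since the package must be installed from this repo:

```python
# Minimal sketch of a one-feature DAC curve with dac_plot().
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 2))
y = X[:, 0] * X[:, 1]            # simple two-feature interaction target

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
S = np.array([1, 0])             # one feature selected -> a curve, not a heatmap

try:
    from dac import dac_plot
    # None values let intervals/step lengths be extrapolated from the data
    curve = dac_plot(forest, X, y, S, None, None, None, None, 1, True, None)
except ImportError:
    curve = None  # dac not installed; the call shows the intended usage
```

Selecting two features in `S` (e.g. `np.array([1, 1])`) would instead produce a heatmap over both axes.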
reproducing results from the paper

- the examples/bikesharingdac.ipynb notebook shows how to use DAC to reproduce the qualitative curves on the bike-sharing dataset from the paper
- the simulation script replicates the paper's simulation experiments
- the pmlb script replicates the automatic feature engineering experiments on PMLB datasets
dac animation
a gif demonstrating the calculation of a DAC curve for a simple tree

related work
- this work is part of an overarching project on interpretable machine learning, guided by the PDR framework for interpretable machine learning
- for related work, see the github repo for disentangled hierarchical dnn interpretations (ICLR 2019)
reference
feel free to use/share this code openly
citation for this work:
```bibtex
@article{devlin2019disentangled,
  title={Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees},
  author={Devlin, Summer and Singh, Chandan and Murdoch, W James and Yu, Bin},
  journal={arXiv preprint arXiv:1905.07631},
  year={2019}
}
```
Owner
- Name: Chandan Singh
- Login: csinva
- Kind: user
- Location: Microsoft research
- Company: Senior researcher
- Website: csinva.io
- Twitter: csinva_
- Repositories: 29
- Profile: https://github.com/csinva
Senior researcher @Microsoft interpreting ML models in science and medicine. PhD from UC Berkeley.
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Chandan Singh | c****h@b****u | 53 |
| Summer Devlin | s****r@S****t | 17 |
| Summer Devlin | s****r@S****l | 11 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 2
- Total pull requests: 7
- Average time to close issues: 39 minutes
- Average time to close pull requests: 17 days
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 1.5
- Average comments per pull request: 0.0
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- stephenpardy (1)
- kumar-hardik (1)
Pull Request Authors
- devlins96 (7)