lemon-explainer

The LEMON machine learning explanation technique 🍋

https://github.com/iamdecode/lemon

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: iamDecode
  • License: bsd-2-clause
  • Language: Python
  • Default Branch: main
  • Homepage: https://explaining.ml/lemon
  • Size: 69.3 KB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created almost 7 years ago · Last pushed almost 2 years ago

README.md

LEMON is a technique for explaining why a machine learning model made a particular prediction. It does so by providing feature contributions: a score for each feature that indicates how much it contributed to the final prediction. More precisely, it shows the sensitivity of the prediction to each feature: a small change in an important feature's value results in a relatively large change in the prediction. It is similar to the popular LIME explanation technique, but is more faithful to the reference model, especially for larger datasets.
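To make the notion of sensitivity concrete, the sketch below estimates how much a prediction changes when each feature is nudged slightly. It is a plain finite-difference illustration and not LEMON's algorithm; `toy_model` and `sensitivity` are hypothetical names introduced only for this example, and the toy model's weights are made up.

```python
import math


def toy_model(x):
    """Hypothetical stand-in for a trained model: maps two features to a probability."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] + 0.1 * x[1])))


def sensitivity(predict, instance, eps=1e-4):
    """Central finite-difference estimate of the prediction's slope per feature."""
    scores = []
    for i in range(len(instance)):
        up = list(instance)
        down = list(instance)
        up[i] += eps
        down[i] -= eps
        scores.append((predict(up) - predict(down)) / (2 * eps))
    return scores


instance = [0.2, -0.5]
scores = sensitivity(toy_model, instance)
for name, score in zip(["feature 0", "feature 1"], scores):
    print(f"{name}: {score:.4f}")
```

The first feature carries a much larger weight in the toy model, so the same small nudge moves the prediction far more than it does for the second feature. That difference is what a contribution score is meant to convey.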

Installation

To install use pip:

$ pip install lemon-explainer

Example

A minimal working example is shown below:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lemon import LemonExplainer

# Load dataset
data = load_iris(as_frame=True)
X = data.data
y = pd.Series(np.array(data.target_names)[data.target])

# Train complex model
clf = RandomForestClassifier()
clf.fit(X, y)

# Explain instance
explainer = LemonExplainer(X, radius_max=0.5)
instance = X.iloc[-1, :]
explanation = explainer.explain_instance(instance, clf.predict_proba)[0]
explanation.show_in_notebook()
```
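The example above passes a maximum radius to the explainer, which presumably bounds the neighbourhood around the instance from which synthetic samples are drawn. As a rough illustration of sampling directly from such a neighbourhood, rather than sampling broadly and reweighting afterwards, the sketch below draws points uniformly from a ball around an instance. It uses only the standard library and is not the library's implementation; `sample_in_ball` and the instance values are assumptions made for this example.

```python
import math
import random


def sample_in_ball(center, radius, n, rng):
    """Draw n points uniformly from the ball of the given radius around center."""
    d = len(center)
    points = []
    for _ in range(n):
        # A normalized Gaussian vector gives a uniformly random direction.
        direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in direction))
        # Scaling by u**(1/d) makes the points uniform in volume, not in radius.
        r = radius * rng.random() ** (1.0 / d)
        points.append([c + r * v / norm for c, v in zip(center, direction)])
    return points


rng = random.Random(0)
center = [5.9, 3.0, 5.1, 1.8]  # illustrative four-feature instance
samples = sample_in_ball(center, 0.5, 1000, rng)
print(max(math.dist(p, center) for p in samples))
```

Every sample lies inside the chosen radius, so no sample needs to be down-weighted or discarded after the fact.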

Development

For a development installation (requires npm or yarn):

$ git clone https://github.com/iamDecode/lemon.git
$ cd lemon

You may want to (create and) activate a virtual environment:

$ python3 -m venv venv
$ source venv/bin/activate

Install requirements:

$ pip install -r requirements.txt

And run the tests with:

$ pytest .

Approximate distance kernel LIME

If you prefer a Gaussian distance kernel as used in LIME, you can approximate this behavior with:

```python
import numpy as np
from lemon import LemonExplainer, gaussian_kernel
from scipy.special import gammainccinv

# X is the training DataFrame from the example above.
DIMENSIONS = X.shape[1]
KERNEL_SIZE = np.sqrt(DIMENSIONS) * .75  # kernel size as used in LIME

# Obtain a distance kernel very close to LIME's gaussian kernel, see the paper for details.
p = 0.999
radius = KERNEL_SIZE * np.sqrt(2 * gammainccinv(DIMENSIONS / 2, (1 - p)))
kernel = lambda x: gaussian_kernel(x, KERNEL_SIZE)

explainer = LemonExplainer(X, distance_kernel=kernel, radius_max=radius)
```

This behavior is as close as possible to LIME, but still yields more faithful explanations due to LEMON's improved sampling technique. Read the paper for more details about this approach.
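The radius formula in the snippet above can be read as follows: for a d-dimensional Gaussian with standard deviation equal to the kernel size, it picks the distance within which a fraction p of the probability mass lies. The sketch below checks that reading numerically with only the standard library. It assumes an even number of dimensions (four, as in the iris example), for which the regularized upper incomplete gamma function has a closed form; `upper_gamma_q` and `inverse_upper_gamma_q` are hypothetical helpers standing in for SciPy's `gammainccinv`, not part of LEMON.

```python
import math
import random


def upper_gamma_q(k, x):
    """Regularized upper incomplete gamma Q(k, x) for a positive integer k."""
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= x / j
        total += term
    return math.exp(-x) * total


def inverse_upper_gamma_q(k, q, hi=200.0, iters=200):
    """Solve Q(k, x) = q for x by bisection (Q is decreasing in x)."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if upper_gamma_q(k, mid) > q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


dimensions = 4  # assumed even so that dimensions / 2 is an integer
kernel_size = math.sqrt(dimensions) * 0.75
p = 0.999
x = inverse_upper_gamma_q(dimensions // 2, 1 - p)
radius = kernel_size * math.sqrt(2 * x)

# Monte Carlo check: fraction of Gaussian samples that fall within the radius.
rng = random.Random(0)
n = 20000
inside = sum(
    math.sqrt(sum(rng.gauss(0.0, kernel_size) ** 2 for _ in range(dimensions))) <= radius
    for _ in range(n)
)
fraction = inside / n
print(f"radius={radius:.3f}, fraction inside={fraction:.4f}")
```

The empirical fraction should land close to p, which suggests that sampling within this radius covers nearly all of the region a LIME-style Gaussian kernel gives meaningful weight to.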

Citation

If you want to refer to our explanation technique, please cite our paper using the following BibTeX entry:

@inproceedings{collaris2023lemon,
  title={{LEMON}: Alternative Sampling for More Faithful Explanation Through Local Surrogate Models},
  author={Collaris, Dennis and Gajane, Pratik and Jorritsma, Joost and van Wijk, Jarke J and Pechenizkiy, Mykola},
  booktitle={Advances in Intelligent Data Analysis XXI: 21st International Symposium on Intelligent Data Analysis (IDA 2023)},
  pages={77--90},
  year={2023},
  organization={Springer}
}

License

This project is licensed under the BSD 2-Clause License - see the LICENSE file for details.

Owner

  • Name: Dennis Collaris
  • Login: iamDecode
  • Kind: user
  • Location: Brainport, The Netherlands
  • Company: Eindhoven University of Technology

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it using these metadata.
title: LEMON: Alternative Sampling for More Faithful Explanation Through Local Surrogate Models
abstract: Local surrogate learning is a popular and successful method for machine learning explanation. It uses synthetic transfer data to approximate a complex reference model. The sampling technique used for this transfer data has a significant impact on the provided explanation, but remains relatively unexplored in literature. In this work, we explore alternative sampling techniques in pursuit of more faithful and robust explanations, and present LEMON: a sampling technique that samples directly from the desired distribution instead of reweighting samples as done in other explanation techniques (e.g., LIME). Next, we evaluate our technique in a synthetic and UCI dataset-based experiment, and show that our sampling technique yields more faithful explanations compared to current state-of-the-art explainers.
authors:
  - family-names: Collaris
    given-names: Dennis
    orcid: "https://orcid.org/0000-0001-7612-9319"
  - family-names: Gajane
    given-names: Pratik
    orcid: "http://orcid.org/0000-0002-8087-5661"
  - family-names: Jorritsma
    given-names: Joost
    orcid: "http://orcid.org/0000-0002-1669-9253"
  - family-names: van Wijk
    given-names: Jarke J.
    orcid: "https://orcid.org/0000-0002-5128-976X"
  - family-names: Pechenizkiy
    given-names: Mykola
    orcid: "http://orcid.org/0000-0003-4955-0743"
doi: 10.1007/978-3-031-30047-9_7
date-released: 2023-09-08
license: BSD-2-Clause

Issues and Pull Requests

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 15 hours
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • Smells-tech (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 20 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
pypi.org: lemon-explainer

Explaining the predictions of any machine learning model

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 20 Last month
Rankings
Dependent packages count: 7.4%
Average: 38.1%
Dependent repos count: 68.9%
Maintainers (1)

Dependencies

pyproject.toml pypi
  • matplotlib *
  • numpy *
  • pandas *
requirements.txt pypi
  • matplotlib *
  • numpy *
  • pandas *