https://github.com/birkhoffg/explainable-ml-papers

A list of research papers of explainable machine learning.


Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, sciencedirect.com, nature.com, acm.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.0%) to scientific vocabulary

Keywords

academic awesome counterfactual-explanations explainability explainable-ml explanations human-ai-interaction human-in-the-loop human-in-the-loop-machine-learning interpretability interpretable-ml interpretable-models machine-learning paper recourse research survey trustworthy-machine-learning xai
Last synced: 5 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: BirkhoffG
  • Default Branch: master
  • Homepage:
  • Size: 13.7 KB
Statistics
  • Stars: 41
  • Watchers: 3
  • Forks: 4
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed over 4 years ago
Metadata Files
Readme

README.md

Papers on Explainable Machine Learning

This repository includes a curated collection of research papers on Explainable Machine Learning (also referred to as Explainable AI/XAI or Interpretable Machine Learning). As the field is rapidly emerging, it can be frustrating to start researching it while buried under an enormous number of papers (and inconsistent terminology). I hope this repository helps new ML researchers and practitioners learn about the field with less pain and stress.

Unlike most repositories on GitHub that maintain a comprehensive list of Explainable ML resources, I try to keep this list short to make it less intimidating for beginners. It is admittedly a subjective selection, based on my own preferences and research tastes.

Papers marked in bold are highly recommended to read.

1. General Idea

Survey

  • The Mythos of Model Interpretability. Lipton, 2016 pdf

  • Open the Black Box Data-Driven Explanation of Black Box Decision Systems. Pedreschi et al. pdf

  • Techniques for Interpretable Machine Learning. Du et al. 2018 pdf

    notes

    • interpretable models (adding interpretable constraints, mimic learning)
    • post-hoc global explanation, and post-hoc local explanation
  • Explaining Explanations in AI. Mittelstadt et al., 2019 pdf

  • Explanation in artificial intelligence: Insights from the social sciences. *Miller, 2019* pdf

  • Explaining Explanations: An Overview of Interpretability of Machine Learning. *Gilpin et al. 2019* pdf

    notes

    • trade-off between interpretability and completeness:
    • interpretability: describe the internals of a system in a way that is understandable to humans.
    • completeness: describe the operation of a system in an accurate way.
  • Interpretable machine learning: definitions, methods, and applications. *Murdoch et al. 2019* pdf

  • Explaining Deep Neural Networks. Camburu, 2020 pdf

    2. Global Explanation

    Interpretable Models

    • Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Rudin, 2019 pdf

    Generalized Additive Model

    • Accurate intelligible models with pairwise interactions. Lou et al., 2013 pdf

    • Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission. Caruana et al., 2015 pdf | InterpretableML

    Rule-based Method

    • Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Letham et al., 2015 pdf

    • Interpretable Decision Sets: A Joint Framework for Description and Prediction. Lakkaraju et al., 2016 pdf

    Scoring System

    • Optimized Scoring Systems: Toward Trust in Machine Learning for Healthcare and Criminal Justice. Rudin, 2018 pdf

    Model Distillation

    Use interpretable models to approximate a black-box model's behavior; similar to imitation learning in RL.

    • Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation. Tan et. al., 2018 pdf

    • Faithful and Customizable Explanations of Black Box Models. Lakkaraju et. al. 2019 pdf
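A minimal sketch of the distillation idea, assuming a hypothetical black-box scoring function and a single-split rule (a decision stump) as the interpretable student; real work distills into richer students such as trees or GAMs:

```python
import random

random.seed(0)

# Hypothetical "black box": an opaque decision function we can only query.
def black_box(x):
    return 1 if (0.8 * x[0] - 0.5 * x[1] + 0.1) > 0 else 0

# Query the black box on sampled inputs to build a distillation dataset.
X = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
y = [black_box(x) for x in X]

# Interpretable student: a single-feature threshold rule ("decision stump"),
# chosen to best imitate the black box's labels (its fidelity).
def fit_stump(X, y):
    best = None
    for feat in range(2):
        for t in sorted({x[feat] for x in X}):
            acc = sum((x[feat] > t) == bool(lab) for x, lab in zip(X, y)) / len(X)
            if best is None or acc > best[0]:
                best = (acc, feat, t)
    return best  # (fidelity, feature index, threshold)

fidelity, feat, thresh = fit_stump(X, y)
print(f"surrogate rule: x[{feat}] > {thresh:.2f}, fidelity = {fidelity:.2f}")
```

The stump cannot match the black box exactly (its boundary is two-dimensional), which illustrates the fidelity/interpretability trade-off that distillation papers audit explicitly.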

    Representation-based Explanation

    • Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). Kim et al., 2018 pdf

    • This Looks Like That: Deep Learning for Interpretable Image Recognition. Chen et al., 2019 pdf

      Related papers

      • This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition. Nauta et al., 2020 pdf
      • Learning to Explain With Complemental Examples. Kanehira & Harada, 2019 pdf

    Self-Explaining Neural Network

    These models also offer example-based explanations.

    • Towards Robust Interpretability with Self-Explaining Neural Networks. Alvarez-Melis et al., 2018 pdf

    • Deep Weighted Averaging Classifiers. Card et al., 2019 pdf

    3. Local Explanation

    Note: accumulating multiple local explanations can be viewed as constructing a global explanation.
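As a minimal sketch of this idea (a toy linear model and made-up data, with per-feature contributions w_i * x_i standing in for a real local attribution method such as SHAP), local attributions can be averaged into a global feature-importance ranking:

```python
# Sketch: aggregating local attributions into a global explanation.
# Toy linear model; its per-feature contributions serve as local attributions.
weights = (2.0, -0.5, 0.1)
data = [(1.0, 3.0, 0.5), (0.2, -1.0, 2.0), (1.5, 0.5, -0.3)]

# Local explanation for one instance: per-feature contributions.
local = lambda x: [w * v for w, v in zip(weights, x)]

# Global explanation: mean absolute contribution across instances.
n = len(data)
global_importance = [sum(abs(local(x)[i]) for x in data) / n
                     for i in range(len(weights))]
print([round(g, 3) for g in global_importance])
```

The same averaging of absolute attributions is, for instance, how SHAP summary plots rank features globally.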

    Feature-based Explanation

    • Permutation importance: a corrected feature importance measure. Altmann et al., 2010 link | sklearn

    • "Why Should I Trust You?" Explaining the Predictions of Any Classifier. *Ribeiro et al., 2016* pdf | LIME

    • A Unified Approach to Interpreting Model Predictions. Lundberg & Lee, 2017 pdf | SHAP

    • Anchors: High-Precision Model-Agnostic Explanations. Ribeiro et al., 2018 pdf
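The idea behind permutation importance can be sketched in a few lines (toy data and a hand-built "model" that only uses feature 0 are illustrative assumptions; in practice one would use e.g. sklearn.inspection.permutation_importance):

```python
import random

random.seed(0)

# Toy data: the label depends on feature 0 only; feature 1 is noise.
X = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(300)]
y = [int(x[0] > 0) for x in X]
predict = lambda x: int(x[0] > 0)  # a "trained" model that uses only feature 0

def accuracy(data, labels):
    return sum(predict(x) == label for x, label in zip(data, labels)) / len(labels)

base = accuracy(X, y)

# Permutation importance: shuffle one feature's column, measure the accuracy drop.
def permutation_importance(feat):
    col = [x[feat] for x in X]
    random.shuffle(col)
    X_perm = [tuple(col[i] if j == feat else v for j, v in enumerate(x))
              for i, x in enumerate(X)]
    return base - accuracy(X_perm, y)

print("importance of feature 0:", permutation_importance(0))  # large drop
print("importance of feature 1:", permutation_importance(1))  # no drop
```

A large drop for feature 0 and none for feature 1 matches what the model actually uses, which is the point of the method.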

    Example-based Explanation

    • Examples are not enough, learn to criticize! Criticism for Interpretability. Kim et al., 2016 pdf

    Counterfactual Explanation

    Also referred to as algorithmic recourse or contrastive explanation.

    • Counterfactual Explanations for Machine Learning: A Review. Verma et al., 2020 pdf
    • A survey of algorithmic recourse: definitions, formulations, solutions, and prospects. Karimi et al., 2020 pdf

    Minimum-distance counterfactuals

    • Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR. Wachter et al., 2017 pdf

    • Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations. Mothilal et al., 2019 pdf
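A minimal sketch of the minimum-distance idea from Wachter et al. on a toy linear "loan" model (the model, features, and greedy search are all illustrative assumptions; the paper formulates this as a proper optimization problem):

```python
# Find the smallest change to a rejected input that flips the decision.

# Hypothetical model: approve if 2*income + 1*savings - 1.5 > 0
weights, bias = (2.0, 1.0), -1.5
predict = lambda x: 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0

x = (0.4, 0.3)           # rejected applicant: 0.8 + 0.3 - 1.5 < 0
assert predict(x) == 0

# Greedy search: repeatedly nudge the feature that gains the most score
# per unit of change, until the decision flips; track L1 distance from x.
step, cf = 0.01, list(x)
while predict(cf) == 0:
    i = max(range(len(weights)), key=lambda j: abs(weights[j]))
    cf[i] += step
distance = sum(abs(a - b) for a, b in zip(x, cf))
print(f"counterfactual: {tuple(round(v, 2) for v in cf)}, L1 distance = {distance:.2f}")
```

The counterfactual reads as recourse: "had your income been this much higher, the loan would have been approved."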

    Minimum-cost counterfactuals (algorithmic recourse)

    • Actionable Recourse in Linear Classification. Ustun et al., 2019 pdf

    • Algorithmic Recourse: from Counterfactual Explanations to Interventions. Karimi et al., 2021 pdf

    Causal constraints

    • Preserving Causal Constraints in Counterfactual Explanations for Machine Learning Classifiers. Mahajan et al., 2020 pdf

    4. Explainability in Human-in-the-loop ML

    The HCI perspective on Explainable ML

    • Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models. Krause et al., 2016 pdf
    • Human-centered Machine Learning: a Machine-in-the-loop Approach. Tan, 2018 blog
    • Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda. Abdul et al., 2018 pdf
    • Explaining models: an empirical study of how explanations impact fairness judgment. Dodge et al., 2019 pdf
    • Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making. Cai et al., 2019 pdf
    • Designing Theory-Driven User-Centric Explainable AI. *Wang et al., 2019* pdf
    • Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. Bansal et al., 2021 pdf

    5. Evaluate Explainable ML

    Evaluation of explainable ML can be loosely categorized into two classes:

    • faithfulness: how well the explanation reflects the true inner behavior of the black-box model.
    • interpretability: how understandable the explanation is to humans.

    • The Price of Interpretability. Bertsimas et al., 2019 pdf

    • Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. Ribeiro et al., 2020 pdf @ ACL 2020 Best Paper

    Evaluating Faithfulness

    Evaluate whether or not the explanation faithfully reflects how the model works (it turns out that 100% faithfulness is often not achieved by post-hoc explanations).

    • Sanity Checks for Saliency Maps. Adebayo et al., 2018 pdf
    • Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? Jacovi & Goldberg, 2020 ACL

    Robust Explanation

    • Interpretation of Neural Networks Is Fragile. Ghorbani et al., 2019 pdf

    • Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods. Slack et al., 2020 pdf

    • Robust and Stable Black Box Explanations. Lakkaraju et al., 2020 pdf

    Evaluating Interpretability

    Evaluate interpretability (do the explanations make sense to humans?).

    • Towards A Rigorous Science of Interpretable Machine Learning. Doshi-Velez & Kim, 2017 pdf

    • 'It's Reducing a Human Being to a Percentage': Perceptions of Justice in Algorithmic Decisions. Binns et al., 2018 pdf

    • Human Evaluation of Models Built for Interpretability. Lage et al., 2019 pdf

    • Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning. Kaur et al., 2019 pdf

    • Manipulating and Measuring Model Interpretability. Poursabzi-Sangdeh et al., 2021 pdf

    6. Useful Resources

    Courses & Talks

    • Tutorial on Explainable ML Website
    • Interpretability and Explainability in Machine Learning, Fall 2019 @ Harvard University by Hima Lakkaraju Course
    • Human-centered Machine Learning @ University of Colorado Boulder by Chenhao Tan course
    • Model Explainability Forum by TWIML AI Podcast YouTube | link

    Collections of Resources

    Toolbox

    Owner

    • Name: Hangzhi Guo
    • Login: BirkhoffG
    • Kind: user
    • Company: Penn State University

    Ph.D. Student at Penn State University

    GitHub Events

    Total
    • Watch event: 12
    • Fork event: 1
    Last Year
    • Watch event: 12
    • Fork event: 1

    Issues and Pull Requests

    Last synced: about 1 year ago

    All Time
    • Total issues: 0
    • Total pull requests: 0
    • Average time to close issues: N/A
    • Average time to close pull requests: N/A
    • Total issue authors: 0
    • Total pull request authors: 0
    • Average comments per issue: 0
    • Average comments per pull request: 0
    • Merged pull requests: 0
    • Bot issues: 0
    • Bot pull requests: 0
    Past Year
    • Issues: 0
    • Pull requests: 0
    • Average time to close issues: N/A
    • Average time to close pull requests: N/A
    • Issue authors: 0
    • Pull request authors: 0
    • Average comments per issue: 0
    • Average comments per pull request: 0
    • Merged pull requests: 0
    • Bot issues: 0
    • Bot pull requests: 0