mlinspect

Inspect ML Pipelines in Python in the form of a DAG

https://github.com/stefan-grafberger/mlinspect

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Inspect ML Pipelines in Python in the form of a DAG

Basic Info
  • Host: GitHub
  • Owner: stefan-grafberger
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 22.4 MB
Statistics
  • Stars: 70
  • Watchers: 5
  • Forks: 17
  • Open Issues: 19
  • Releases: 0
Created over 5 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

mlinspect

Inspect ML Pipelines in Python in the form of a DAG

Run mlinspect locally

Prerequisite: Python 3.10

  1. Clone this repository
  2. Set up the environment

    cd mlinspect
    python -m venv venv
    source venv/bin/activate

  3. If you want to use the visualisation functions we provide, install graphviz, which cannot be installed via pip

    Linux: apt-get install graphviz
    macOS: brew install graphviz

  4. Install pip dependencies

    SETUPTOOLS_USE_DISTUTILS=stdlib pip install -e .[dev]

  5. To ensure everything works, you can run the tests (without graphviz, the visualisation test will fail)

    python setup.py test
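Before running the tests, it can help to confirm the two prerequisites named above: the Python version and the graphviz system package. The following is a minimal sketch, not part of mlinspect. The helper name `check_prerequisites` is hypothetical, and it assumes that a graphviz installation puts the `dot` executable on the `PATH`.

```python
import shutil
import sys


def check_prerequisites(required=(3, 10), version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version[:2] != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} expected, "
            f"found {version[0]}.{version[1]}"
        )
    # Assumption: the graphviz system package provides the `dot` executable.
    if which("dot") is None:
        problems.append("graphviz not found on PATH (the visualisation test will fail)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The `version` and `which` parameters exist only so the check can be exercised without changing the local environment.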

How to use mlinspect

mlinspect makes it easy to analyze your pipeline and automatically check for common issues.

```python
from mlinspect import PipelineInspector
from mlinspect.inspections import MaterializeFirstOutputRows
from mlinspect.checks import NoBiasIntroducedFor

IPYNB_PATH = ...

inspector_result = PipelineInspector\
    .on_pipeline_from_ipynb_file(IPYNB_PATH)\
    .add_required_inspection(MaterializeFirstOutputRows(5))\
    .add_check(NoBiasIntroducedFor(['race']))\
    .execute()

extracted_dag = inspector_result.dag
dag_node_to_inspection_results = inspector_result.dag_node_to_inspection_results
check_to_check_results = inspector_result.check_to_check_results
```
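Once the inspector has run, a typical next step is to find out which checks did not pass. The sketch below assumes, as the attribute name suggests, that `check_to_check_results` behaves like a dict from each check to a result object carrying a `status`. The classes `Status` and `FakeCheckResult`, the function `failed_checks`, and the entry `SomeOtherCheck()` are hypothetical stand-ins so that the example runs without mlinspect installed. They are not the library's own types.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):  # hypothetical stand-in for a check status
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass(frozen=True)
class FakeCheckResult:  # hypothetical stand-in, not the mlinspect class
    status: Status
    description: str = ""


def failed_checks(check_to_check_results):
    """Return the keys whose result status is anything other than SUCCESS."""
    return [
        check
        for check, result in check_to_check_results.items()
        if result.status is not Status.SUCCESS
    ]


# Stand-in for inspector_result.check_to_check_results (assumed dict-like).
results = {
    "NoBiasIntroducedFor(['race'])": FakeCheckResult(Status.FAILURE, "example only"),
    "SomeOtherCheck()": FakeCheckResult(Status.SUCCESS),
}
print(failed_checks(results))
```

With a real result object, the same loop would run over the mapping returned by the inspector instead of the hand-built `results` dict, provided the assumption about its shape holds.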

Detailed Example

We prepared a demo notebook to showcase mlinspect and its features.

Supported libraries and API functions

mlinspect already supports a selection of API functions from pandas and scikit-learn. Extending mlinspect to support more and more API functions and libraries will be an ongoing effort. However, mlinspect won't just crash when it encounters functions it doesn't recognize yet. For more information, please see here.
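The README does not describe how unrecognized functions are handled internally. As a generic illustration of the idea only, and not mlinspect's implementation, the sketch below wraps functions so that recognized operations are recorded while everything else runs unchanged. The names `KNOWN`, `traced`, `recorded_ops`, and `exotic_transform` are hypothetical.

```python
import functools

# Hypothetical registry: names of operations the tracer knows how to record.
KNOWN = {"dropna", "merge"}
recorded_ops = []


def traced(func):
    """Wrap func so known operations are recorded; unknown ones pass through."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if func.__name__ in KNOWN:
            recorded_ops.append(func.__name__)
        # Always delegate to the original, so unrecognized calls behave normally.
        return func(*args, **kwargs)
    return wrapper


@traced
def dropna(rows):
    return [r for r in rows if r is not None]


@traced
def exotic_transform(rows):  # not in KNOWN: runs untouched and unrecorded
    return [r * 2 for r in rows]


out = exotic_transform(dropna([1, None, 3]))
print(out, recorded_ops)
```

The design point is that the wrapper never raises on an unknown name: it only loses visibility into that step, which matches the README's statement that unsupported functions do not cause a crash.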

Notes

  • For debugging in PyCharm, set the pytest flag --no-cov (Link)

Publications

License

This library is licensed under the Apache 2.0 License.

Owner

  • Name: Stefan Grafberger
  • Login: stefan-grafberger
  • Kind: user
  • Location: Amsterdam
  • Company: University of Amsterdam

I am a Ph.D. student at the University of Amsterdam, conducting research at the intersection of data management and machine learning.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Grafberger"
  given-names: "Stefan"
  orcid: "https://orcid.org/0000-0002-9884-9517"
- family-names: "Groth"
  given-names: "Paul"
  orcid: "https://orcid.org/0000-0003-0183-6910"
- family-names: "Stoyanovich"
  given-names: "Julia"
- family-names: "Schelter"
  given-names: "Sebastian"
title: "Data Distribution Debugging in Machine Learning Pipelines"
doi: 10.1007/s00778-021-00726-w
url: "https://github.com/stefan-grafberger/mlinspect"
preferred-citation:
    type: article
    authors:
    - family-names: "Grafberger"
      given-names: "Stefan"
      orcid: "https://orcid.org/0000-0002-9884-9517"
    - family-names: "Groth"
      given-names: "Paul"
      orcid: "https://orcid.org/0000-0003-0183-6910"
    - family-names: "Stoyanovich"
      given-names: "Julia"
    - family-names: "Schelter"
      given-names: "Sebastian"
    title: "Data Distribution Debugging in Machine Learning Pipelines"
    doi: 10.1007/s00778-021-00726-w
    date-released: 2022-01-31

GitHub Events

Total
  • Watch event: 3
  • Fork event: 1
Last Year
  • Watch event: 3
  • Fork event: 1

Issues and Pull Requests

All Time
  • Total issues: 46
  • Total pull requests: 54
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 24 days
  • Total issue authors: 3
  • Total pull request authors: 5
  • Average comments per issue: 0.3
  • Average comments per pull request: 1.19
  • Merged pull requests: 35
  • Bot issues: 0
  • Bot pull requests: 15
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • stefan-grafberger (38)
  • sscdotopen (7)
  • adrianlut (1)
Pull Request Authors
  • stefan-grafberger (33)
  • dependabot[bot] (15)
  • adrianlut (4)
  • PiStefania (1)
  • shubhaguha (1)
Top Labels
Issue Labels
enhancement (12) core (7) future work (4) help wanted (2) infrastructure (2) demo (2) bug (1) good first issue (1) experiments (1) wontfix (1)
Pull Request Labels
dependencies (16) core (5) demo (2)

Dependencies

requirements/requirements.dev.txt pypi
  • gensim ==3.8.3 development
  • importnb ==0.6.2 development
  • jupyter ==1.0.0 development
  • keras ==2.4.3 development
  • pylint ==2.6.0 development
  • pytest ==6.1.2 development
  • pytest-cov ==2.10.1 development
  • pytest-mock ==3.3.1 development
  • pytest-pycharm ==0.7.0 development
  • pytest-pylint ==0.17.0 development
  • pytest-runner ==5.2 development
  • seaborn ==0.11.0 development
  • tensorflow ==2.5.0 development
requirements/requirements.txt pypi
  • astmonkey ==0.3.6
  • astpretty ==2.0.0
  • astunparse ==1.6.3
  • gorilla ==0.4.0
  • ipython ==7.25.0
  • matplotlib ==3.4.2
  • more-itertools ==8.6.0
  • nbconvert ==6.4.5
  • nbformat ==5.0.8
  • networkx ==2.5
  • numpy ==1.19.5
  • pandas ==1.2.3
  • protobuf ==3.20.1
  • pygraphviz ==1.7
  • scikit-learn ==0.23.2
  • scipy ==1.7.0
  • setuptools ==57.0.0
  • six ==1.15.0
  • statsmodels ==0.12.2
  • testfixtures ==6.17.1
.github/workflows/build.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v1 composite
  • ts-graphviz/setup-graphviz v1 composite
setup.py pypi