mpcsl

A Modular Pipeline for Causal Structure Learning called MPCSL.

https://github.com/hpi-epic/mpcsl

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 15 committers (6.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.0%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: hpi-epic
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 4.32 MB
Statistics
  • Stars: 8
  • Watchers: 5
  • Forks: 1
  • Open Issues: 21
  • Releases: 0
Created over 7 years ago · Last pushed about 3 years ago
Metadata Files
Readme License Citation

README.md

MPCSL: A Modular Pipeline for Causal Structure Learning

This repository contains the backend of MPCSL, a Modular Pipeline for Causal Structure Learning, built at the chair for Enterprise Platform and Integration Concepts at the Hasso Plattner Institute. The pipeline currently includes the following features, all of which are accessible via a REST API:

  • Store datasets that are ready for causal structure learning in our backend
  • Set up causal structure learning experiments for different causal structure learning algorithms in R, Python and CUDA with different hyperparameter settings and dataset choices
  • Run the experiments as jobs directly in our backend
  • Manage all currently running jobs on the backend
  • Deliver the results and meta information of past experiments
  • Show distributions and perform interventions (currently limited to specific cases)
  • Compare the results of different experiments using quality metrics such as type I error, type II error, or graph edit distance
  • Extend the pipeline with new algorithms in their own execution environments
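
The quality metrics named in the feature list can be illustrated with a small, self-contained sketch. This is not MPCSL's implementation: the function names are hypothetical, edges are treated as undirected pairs, and the edit distance is simplified to counting edge insertions and deletions between two graphs over the same labelled nodes.

```python
from itertools import combinations


def skeleton(edges):
    """Return an edge list as a set of unordered node pairs."""
    return {frozenset(edge) for edge in edges}


def compare_skeletons(nodes, true_edges, learned_edges):
    """Compare a learned graph skeleton against a ground-truth skeleton.

    type_1_error: share of truly absent edges that were learned anyway.
    type_2_error: share of truly present edges that were missed.
    edge_edit_distance: edge insertions plus deletions needed to turn
    the learned skeleton into the true one (a simplified edit distance).
    """
    truth = skeleton(true_edges)
    learned = skeleton(learned_edges)
    all_pairs = {frozenset(pair) for pair in combinations(nodes, 2)}

    false_positives = learned - truth
    false_negatives = truth - learned
    absent = all_pairs - truth

    return {
        "type_1_error": len(false_positives) / len(absent) if absent else 0.0,
        "type_2_error": len(false_negatives) / len(truth) if truth else 0.0,
        "edge_edit_distance": len(false_positives) + len(false_negatives),
    }


if __name__ == "__main__":
    # Hypothetical four-node example, not taken from an MPCSL experiment.
    nodes = ["A", "B", "C", "D"]
    true_edges = [("A", "B"), ("B", "C"), ("C", "D")]
    learned_edges = [("B", "A"), ("B", "C"), ("A", "D")]
    print(compare_skeletons(nodes, true_edges, learned_edges))
```

In this example the learned graph adds one spurious edge (A, D) and misses one true edge (C, D), so both error rates are 1/3 and the simplified edit distance is 2.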

The following image shows the holistic architecture as an FMC (Fundamental Modeling Concepts) diagram:

Setup

Requirements

As the user interface files are stored in a separate, currently private repository, you have to clone this repository together with its submodules:

git clone --recurse-submodules git@github.com:hpi-epic/mpcsl.git

Getting Started

  1. minikube start
  2. garden deploy
  3. garden run task seed-db
  4. Open the IP address printed by minikube ip in a browser
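
The steps above could be scripted roughly as follows. This is a minimal sketch that assumes minikube and garden are installed and on the PATH; the helper names run_setup and cluster_address are hypothetical and not part of the repository. By default it only lists the commands instead of executing them.

```python
import shlex
import subprocess

# Commands taken from the Getting Started steps, in order.
SETUP_COMMANDS = [
    "minikube start",
    "garden deploy",
    "garden run task seed-db",
]


def run_setup(dry_run=True):
    """Run the setup commands in order, or only list them if dry_run is True."""
    executed = []
    for command in SETUP_COMMANDS:
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(shlex.split(command), check=True)
        executed.append(command)
    return executed


def cluster_address():
    """Return the address printed by `minikube ip` (requires a running cluster)."""
    result = subprocess.run(
        ["minikube", "ip"], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for command in run_setup(dry_run=True):
        print(command)
```

Calling run_setup(dry_run=False) executes the commands; cluster_address() then returns the address to open in the browser.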

Setup Algorithms

garden run task db-setup-algorithms loads the algorithms into the database.

Seeding Example Dataset/Experiment

Running garden run task seed-db loads an example dataset into the database. The example dataset is generated from the EARTHQUAKE Bayesian network on this page.
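
To give an idea of what generating a dataset from such a network means, here is a minimal ancestral-sampling sketch. It assumes the commonly used five-variable earthquake network (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls); the probabilities are illustrative placeholders, and how the seed task actually produces its data is not documented here.

```python
import random

# Illustrative probability of Alarm given (Burglary, Earthquake).
# These values are placeholders, not taken from the MPCSL seed data.
P_ALARM = {
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}


def sample_row(rng):
    """Sample one observation, drawing each variable after its parents."""
    burglary = rng.random() < 0.01
    earthquake = rng.random() < 0.02
    alarm = rng.random() < P_ALARM[(burglary, earthquake)]
    john_calls = rng.random() < (0.90 if alarm else 0.05)
    mary_calls = rng.random() < (0.70 if alarm else 0.01)
    return {
        "Burglary": burglary,
        "Earthquake": earthquake,
        "Alarm": alarm,
        "JohnCalls": john_calls,
        "MaryCalls": mary_calls,
    }


def sample_dataset(n_rows, seed=0):
    """Return n_rows sampled observations; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [sample_row(rng) for _ in range(n_rows)]


if __name__ == "__main__":
    for row in sample_dataset(3):
        print(row)
```

A structure learning algorithm run on enough such rows should recover edges like Burglary to Alarm, which is what makes a known network useful as example data.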

Endpoint Documentation

Swagger documentation of our REST endpoints is available at /swagger/index.html, given the default host and port settings.
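
As a small sketch, the documentation URL can be assembled and checked from Python. Only the path /swagger/index.html comes from the text above; the host, port, and http scheme in the example are placeholder assumptions (the default host and port are not stated here), and the helper names are hypothetical.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen

SWAGGER_PATH = "/swagger/index.html"  # path given in the README


def swagger_url(host, port=None, scheme="http"):
    """Build the Swagger UI URL for a given host and optional port."""
    netloc = host if port is None else f"{host}:{port}"
    return urlunsplit((scheme, netloc, SWAGGER_PATH, "", ""))


def swagger_is_reachable(host, port=None, timeout=5.0):
    """Return True if the Swagger page answers with HTTP 200."""
    try:
        with urlopen(swagger_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    # Placeholder host and port; substitute the values of your deployment.
    print(swagger_url("localhost", 5000))
```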

Maintainers

Contact: firstname.lastname@hpi.de

Contributors

Owner

  • Name: Enterprise Platform & Integration Concepts Research Group
  • Login: hpi-epic
  • Kind: organization
  • Location: Potsdam, Germany

Citation (CITATION.cff)

# YAML 1.2
---
abstract: |
    "The examination of causal structures is crucial for data scientists in a variety of machine learning application scenarios.
    In recent years, the corresponding interest in methods of causal structure learning has led to a wide spectrum of independent implementations, each having specific accuracy characteristics and introducing implementation-specific overhead in the runtime.
    Hence, considering a selection of algorithms or different implementations in different programming languages utilizing different hardware setups becomes a tedious manual task with high setup costs.
    Consequently, a tool that enables to plug in existing methods from different libraries into a single system to compare and evaluate the results is substantial support for data scientists in their research efforts.
    
    In this work, we propose an architectural blueprint of a pipeline for causal structure learning and outline our reference implementation MPCSL that addresses the requirements towards platform independence and modularity while ensuring the comparability and reproducibility of experiments.
    Moreover, we demonstrate the capabilities of MPCSL within a case study, where we evaluate existing implementations of the well-known PC-Algorithm concerning their runtime performance characteristics."
authors: 
  -
    affiliation: "Hasso Plattner Institute, University of Potsdam"
    family-names: Huegle
    given-names: Johannes
  -
    affiliation: "Hasso Plattner Institute, University of Potsdam"
    family-names: Hagedorn
    given-names: Christopher
  -
    affiliation: "Hasso Plattner Institute, University of Potsdam"
    family-names: Perscheid
    given-names: Michael
  -
    affiliation: "Hasso Plattner Institute, University of Potsdam"
    family-names: Plattner
    given-names: Hasso
cff-version: "1.1.0"
doi: "10.1145/3447548.3467082"
message: "If you use this software, please cite the paper."
title: "MPCSL"
references:
  - type: conference-paper
    authors: 
    -   affiliation: "Hasso Plattner Institute, University of Potsdam"
        family-names: Huegle
        given-names: Johannes
    -   affiliation: "Hasso Plattner Institute, University of Potsdam"
        family-names: Hagedorn
        given-names: Christopher
    -   affiliation: "Hasso Plattner Institute, University of Potsdam"
        family-names: Perscheid
        given-names: Michael
    -   affiliation: "Hasso Plattner Institute, University of Potsdam"
        family-names: Plattner
        given-names: Hasso
    title: "MPCSL - A Modular Pipeline for Causal Structure Learning"
    year: 2021
    collection-title: "Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21)"
    conference:
      name: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21)
    doi: "10.1145/3447548.3467082"  
...

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 219
  • Total Committers: 15
  • Avg Commits per committer: 14.6
  • Development Distribution Score (DDS): 0.817
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
danthe me@d****m 40
Jonathan Schneider j****r@s****e 35
Christopher Hagedorn c****9@g****m 31
Alexander Kastius a****s@q****e 29
Jonas Umland j****d@s****e 16
milanpro m****l@g****m 15
MariusDanner m****s@d****e 15
Tobias Nack t****3@g****m 12
boehmchen 4****n 6
mschroederi c****e@m****e 5
danthe d****6@g****e 5
Victor Künstler v****r@o****m 3
constantin-lange c****e@s****e 3
Johannes Huegle j****e@h****e 2
Theresa Zobel t****l@s****e 2
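
The all-time figures above can be reproduced from the per-committer commit counts. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is an assumption, but it matches the reported value of 0.817.

```python
# Commit counts from the Top Committers list, in the order shown.
COMMITS = [40, 35, 31, 29, 16, 15, 15, 12, 6, 5, 5, 3, 3, 2, 2]


def average_commits(commits):
    """Average number of commits per committer (0.0 if there are none)."""
    return sum(commits) / len(commits) if commits else 0.0


def development_distribution_score(commits):
    """Assumed definition: 1 - (commits of top committer / total commits)."""
    total = sum(commits)
    if total == 0:
        return 0.0
    return 1 - max(commits) / total


if __name__ == "__main__":
    print("Total commits:", sum(COMMITS))
    print("Committers:", len(COMMITS))
    print("Average per committer:", round(average_commits(COMMITS), 1))
    print("DDS:", round(development_distribution_score(COMMITS), 3))
```

With 40 of 219 commits coming from the most active committer, the score is 1 - 40/219, or about 0.817; a value close to 1 indicates that work is spread across many committers.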

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 46
  • Total pull requests: 154
  • Average time to close issues: 3 months
  • Average time to close pull requests: 13 days
  • Total issue authors: 7
  • Total pull request authors: 10
  • Average comments per issue: 0.35
  • Average comments per pull request: 0.45
  • Merged pull requests: 141
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mschroederi (17)
  • ChristopherSchmidt89 (14)
  • jonasumland (6)
  • boehmchen (3)
  • jonaschn (3)
  • constantin-lange (2)
  • Dencrash (1)
Pull Request Authors
  • danthe96 (31)
  • jonaschn (29)
  • Raandom (26)
  • ChristopherSchmidt89 (21)
  • jonasumland (19)
  • Dencrash (11)
  • mschroederi (7)
  • boehmchen (7)
  • constantin-lange (2)
  • theresazobel (1)
Top Labels
Issue Labels
enhancement (10) bug (4) backlog (4) backend (3) frontend (3) design (2) wontfix (1) question (1)
Pull Request Labels
help wanted (1) enhancement (1)

Dependencies

.github/workflows/mpci-execution-cupc.publish.yml actions
  • actions/checkout v2 composite
.github/workflows/mpci-execution-python.publish.yml actions
  • actions/checkout v2 composite
.github/workflows/mpci-execution-r.publish.yml actions
  • actions/checkout v2 composite
services/executionenvironments/cuda/Dockerfile docker
  • nvidia/cuda 10.1-devel-ubuntu18.04 build
services/executionenvironments/python-generators/Dockerfile docker
  • python 3.7 build
services/executionenvironments/r/generator/Dockerfile docker
  • chris89/mpci_r latest build
services/executionenvironments/python/requirements.txt pypi
  • ccmi *
  • cython >=0.26
  • networkx *
  • numpy *
  • pandas *
  • requests *
  • scikit-learn *
  • scipy *
  • tigramite *
services/executionenvironments/python-generators/upload_requirements.txt pypi
  • manm_cs ==0.1.2
  • requests ==2.25.1
services/python-images/requirements.txt pypi
  • Babel ==2.9.0
  • Click ==7.0
  • Faker ==1.0.0
  • Flask ==1.0.2
  • Flask-Migrate ==2.3.1
  • Flask-RESTful ==0.3.6
  • Flask-SQLAlchemy ==2.4.4
  • Flask-SocketIO ==4.2.1
  • Jinja2 ==2.10
  • Mako ==1.0.7
  • MarkupSafe ==1.1.0
  • Pillow ==8.1.0
  • PyYAML ==5.3.1
  • Pygments ==2.7.4
  • SQLAlchemy ==1.3.20
  • Sphinx ==2.0.1
  • Werkzeug ==0.14.1
  • absl-py ==0.11.0
  • aiohttp ==3.6.2
  • alabaster ==0.7.12
  • alembic ==1.0.6
  • aniso8601 ==4.0.1
  • appnope ==0.1.2
  • async-timeout ==3.0.1
  • atomicwrites ==1.2.1
  • attrs ==18.2.0
  • backcall ==0.2.0
  • bidict ==0.21.2
  • cachetools ==4.2.0
  • causaldag ==0.1a162
  • certifi ==2018.11.29
  • chardet ==3.0.4
  • codecov ==2.0.15
  • conditional-independence ==0.1a5
  • coverage ==5.3
  • cycler ==0.10.0
  • dataclasses ==0.6
  • decorator ==4.3.2
  • dnspython ==2.0.0
  • docutils ==0.16
  • eventlet ==0.25.1
  • factory-boy ==2.11.1
  • flake8 ==3.8.4
  • flask-restful-swagger-2 ==0.35
  • frozendict ==1.2
  • future ==0.18.2
  • google-auth ==1.23.0
  • graphical-model-learning ==0.1a7
  • graphical-models ==0.1a5
  • greenlet ==0.4.17
  • idna ==2.7
  • ijson ==2.3
  • imagesize ==1.2.0
  • importlib-metadata ==3.1.1
  • ipdb ==0.13.4
  • ipython ==7.20.0
  • ipython-genutils ==0.2.0
  • itsdangerous ==1.1.0
  • jedi ==0.18.0
  • joblib ==1.0.0
  • kiwisolver ==1.3.1
  • kubernetes ==10.0.1
  • marshmallow ==2.16.3
  • marshmallow-sqlalchemy ==0.15.0
  • matplotlib ==3.3.4
  • mccabe ==0.6.1
  • monotonic ==1.5
  • more-itertools ==4.3.0
  • multidict ==4.7.6
  • netrd ==0.2.2
  • networkx ==2.5
  • numexpr ==2.7.2
  • numpy ==1.16.0
  • numpydoc ==1.1.0
  • oauthlib ==3.1.0
  • ortools ==8.1.8487
  • packaging ==20.9
  • pandas ==1.1.4
  • parso ==0.8.1
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • pluggy ==0.8.0
  • progressbar2 ==3.53.1
  • prompt-toolkit ==3.0.16
  • protobuf ==3.14.0
  • psutil ==5.4.8
  • psycopg2 ==2.8.6
  • psycopg2-binary ==2.8.6
  • ptyprocess ==0.7.0
  • py ==1.7.0
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pycodestyle ==2.6.0
  • pyflakes ==2.2.0
  • pygam ==0.8.0
  • pyhdb ==0.3.4
  • pyparsing ==2.4.7
  • pytest ==3.10.1
  • pytest-cov ==2.6.1
  • pytest-ordering ==0.6
  • python-dateutil ==2.7.5
  • python-editor ==1.0.3
  • python-engineio ==4.0.0
  • python-socketio ==5.0.1
  • python-utils ==2.5.6
  • pytz ==2018.7
  • requests ==2.20.1
  • requests-oauthlib ==1.3.0
  • rsa ==4.6
  • scikit-learn ==0.24.1
  • scipy ==1.5.4
  • six ==1.11.0
  • snowballstemmer ==2.1.0
  • sphinx-rtd-theme ==0.5.1
  • sphinxcontrib-applehelp ==1.0.2
  • sphinxcontrib-devhelp ==1.0.2
  • sphinxcontrib-htmlhelp ==1.0.3
  • sphinxcontrib-jsmath ==1.0.1
  • sphinxcontrib-qthelp ==1.0.3
  • sphinxcontrib-serializinghtml ==1.1.4
  • sqlalchemy-hana ==0.3.0
  • text-unidecode ==1.2
  • threadpoolctl ==2.1.0
  • tqdm ==4.57.0
  • traitlets ==5.0.5
  • typing ==3.7.4.3
  • typing-extensions ==3.7.4.3
  • urllib3 ==1.24.2
  • wcwidth ==0.2.5
  • websocket-client ==0.55.0
  • yarl ==1.6.3
  • zipp ==3.4.0