https://github.com/py-why/dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    6 of 100 committers (6.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

bayesian-networks causal-inference causal-machine-learning causal-models causality data-science do-calculus graphical-models machine-learning python3 treatment-effects

Keywords from Contributors

mlops reinforcement-learning transformers large-language-models distributed agents parallel embedding autograd tensor
Last synced: 5 months ago

Repository

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

Basic Info
Statistics
  • Stars: 7,673
  • Watchers: 143
  • Forks: 982
  • Open Issues: 141
  • Releases: 17
Topics
bayesian-networks causal-inference causal-machine-learning causal-models causality data-science do-calculus graphical-models machine-learning python3 treatment-effects
Created over 7 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Governance

README.rst

|BuildStatus|_ |PyPiVersion|_ |PythonSupport|_ |Downloads|_ |discord|_ |gurubase|_

.. |PyPiVersion| image:: https://img.shields.io/pypi/v/dowhy.svg
.. _PyPiVersion: https://pypi.org/project/dowhy/

.. |PythonSupport| image:: https://img.shields.io/pypi/pyversions/dowhy.svg
.. _PythonSupport: https://pypi.org/project/dowhy/

.. |BuildStatus| image:: https://github.com/py-why/dowhy/actions/workflows/ci.yml/badge.svg
.. _BuildStatus: https://github.com/py-why/dowhy/actions

.. |Downloads| image:: https://pepy.tech/badge/dowhy
.. _Downloads: https://pepy.tech/project/dowhy

.. |discord| image:: https://img.shields.io/discord/818456847551168542
.. _discord: https://discord.gg/cSBGb3vsZb

.. |gurubase| image:: https://img.shields.io/badge/Gurubase-Ask%20DoWhy%20Guru-006BFF
.. _gurubase: https://gurubase.io/g/dowhy

.. image:: dowhy-logo-large.png
  :width: 50%
  :align: center


`Check out the documentation `_
===============================================================

- The documentation, user guide, sample notebooks and other information are available at
    `https://py-why.github.io/dowhy `_
- DoWhy is part of the `PyWhy Ecosystem `_. For more tools and libraries related to causality, check out the `PyWhy GitHub organization `_!
- For any questions, comments, or discussions about specific use cases, join our community on `Discord `_ (|discord|_)
- Jump right into some case studies:
    - Effect estimation: `Hotel booking cancellations `_ | `Effect of customer loyalty programs `_ | `Optimizing article headlines `_ | `Effect of home visits on infant health (IHDP) `_ | `Causes of customer churn/attrition `_
    - Root cause analysis and explanations: `Causal attribution and root-cause analysis of an online shop `_ | `Finding the Root Cause of Elevated Latencies in a Microservice Architecture `_ | `Finding Root Causes of Changes in a Supply Chain `_

For more example notebooks, see `here! `_

Introduction & Key Features
===========================
Decision-making involves understanding how different variables affect each other and predicting the outcome when some of them are changed to new values. For instance, given an outcome variable, one may be interested in determining how potential actions may affect it, understanding what led to its current value, or simulating what would happen if some variables were changed. Answering such questions requires causal reasoning. DoWhy is a Python library that guides you through the various steps of causal reasoning and provides a unified interface for answering causal questions.

DoWhy provides a wide variety of algorithms for effect estimation, prediction, quantification
of causal influences, diagnosis of causal structures, root cause analysis, interventions and
counterfactuals. A key feature of DoWhy is its refutation and falsification API that can test causal assumptions for any estimation method,
thus making inference more robust and accessible to non-experts.

**Graphical Causal Models and Potential Outcomes: Best of both worlds**

DoWhy builds on two of the most powerful frameworks for causal inference:
graphical causal models and potential outcomes. For identification of a causal effect, it uses graph-based criteria and
do-calculus to encode modeling assumptions and derive a non-parametric expression for the effect. For estimation, it
switches to methods based primarily on potential outcomes.

For causal questions beyond effect estimation, it uses the power of graphical causal models by modeling the data
generation process via explicit causal mechanisms at each node, which, for instance, unlocks capabilities to attribute
observed effects to particular variables or estimate point-wise counterfactuals.

For a quick introduction to causal inference, check out `amit-sharma/causal-inference-tutorial `_.
We also gave a more comprehensive tutorial at the ACM Knowledge Discovery and Data Mining (`KDD 2018 `_) conference: `causalinference.gitlab.io/kdd-tutorial `_.
For an introduction to the four steps of causal inference and their implications for machine learning, see this video tutorial from Microsoft Research: `DoWhy Webinar `_. For an introduction to the graphical causal model API, see the `PyCon presentation on Root Cause Analysis with DoWhy `_.

Key Features
~~~~~~~~~~~~

.. image:: https://raw.githubusercontent.com/py-why/dowhy/main/docs/images/dowhy-features.png

DoWhy supports the following causal tasks:

- Effect estimation (identification, average causal effect, conditional average causal effect, instrumental variables and more)
- Quantify causal influences (mediation analysis, direct arrow strength, intrinsic causal influence)
- What-if analysis (generate samples from interventional distribution, estimate counterfactuals)
- Root cause analysis and explanations (attribute anomalies to their causes, find causes for changes in distributions, estimate feature relevance and more)

For more details and how to use these methods in practice, check out the documentation at `https://py-why.github.io/dowhy `_.

Quick Start
===========
DoWhy supports Python 3.8+. To install, you can use pip, poetry, or conda.

**Latest Release**

Install the latest `release `__ using pip.

.. code:: shell

   pip install dowhy

Install the latest `release `__ using poetry.

.. code:: shell

   poetry add dowhy

Install the latest `release `__ using conda.

.. code:: shell

   conda install -c conda-forge dowhy

If you face "Solving environment" problems with conda, then try :code:`conda update --all` and then install dowhy. If that does not work, then use :code:`conda config --set channel_priority false` and try to install again. If the problem persists, please `add your issue here `_.

**Development Version**

If you prefer to use the latest dev version, your dependency management tool will need to point at our GitHub repository.

.. code:: shell

    pip install git+https://github.com/py-why/dowhy@main


**Requirements**

DoWhy requires a few dependencies. 
Details on specific versions can be found in `pyproject.toml <./pyproject.toml>`_, under the `tool.poetry.dependencies` section.

If you face any problems, try installing dependencies manually.

.. code:: shell

    pip install '<package-name>==<version>'

Optionally, if you wish to input graphs in the dot format, install pydot (or pygraphviz).

For better-looking graphs, you can optionally install pygraphviz. To proceed,
first install graphviz and then pygraphviz (the commands below are for Ubuntu and Ubuntu WSL).

.. note::
    Installing pygraphviz can cause problems on some platforms.
    One way that works for most Linux distributions is to
    first install graphviz and then pygraphviz as shown below.
    Otherwise, please consult the documentation of `pygraphviz `_.

.. code:: shell

    sudo apt install graphviz libgraphviz-dev graphviz-dev pkg-config
    pip install --global-option=build_ext \
    --global-option="-I/usr/local/include/graphviz/" \
    --global-option="-L/usr/local/lib/graphviz" pygraphviz

Example: Effect identification and estimation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Most causal tasks in DoWhy require only a few lines of code. As an example, we estimate the causal effect of
a treatment on an outcome variable:

.. code:: python

    from dowhy import CausalModel
    import dowhy.datasets

    # Load some sample data
    data = dowhy.datasets.linear_dataset(
        beta=10,
        num_common_causes=5,
        num_instruments=2,
        num_samples=10000,
        treatment_is_binary=True)
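
The returned dict bundles a simulated pandas DataFrame with the metadata that the next steps rely on. A quick look (key names taken from the snippets below):

.. code:: python

    print(data["df"].head())       # simulated observations
    print(data["treatment_name"])  # name(s) of the treatment variable(s)
    print(data["outcome_name"])    # name of the outcome variable
    print(data["gml_graph"])       # the assumed causal graph, as a GML string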

A causal graph can be defined in different ways; the most common is via `NetworkX `_.
After loading in the data, we use the four main operations for effect estimation in DoWhy: *model*, *identify*,
*estimate* and *refute*:

.. code:: python

    # I. Create a causal model from the data and given graph.
    model = CausalModel(
        data=data["df"],
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["gml_graph"])  # Or alternatively, as nx.DiGraph

    # II. Identify causal effect and return target estimands
    identified_estimand = model.identify_effect()

    # III. Estimate the target estimand using a statistical method.
    estimate = model.estimate_effect(identified_estimand,
                                     method_name="backdoor.propensity_score_matching")

    # IV. Refute the obtained estimate using multiple robustness checks.
    refute_results = model.refute_estimate(identified_estimand, estimate,
                                           method_name="random_common_cause")

DoWhy stresses the interpretability of its output. At any point in the analysis,
you can inspect the untested assumptions, identified estimands (if any), and the
estimate (if any). Here's a sample output of the linear regression estimator:

.. image:: https://raw.githubusercontent.com/py-why/dowhy/main/docs/images/regression_output.png
    :width: 80%
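
Continuing the snippet above, the same information can be inspected programmatically by printing the returned objects (plain ``print`` calls; the exact text depends on the estimator and refuter chosen):

.. code:: python

    print(identified_estimand)  # untested assumptions and identified estimand(s)
    print(estimate)             # the estimated effect and estimator details
    print(refute_results)       # outcome of the refutation check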

For a full code example, check out the `Getting Started with DoWhy `_ notebook.

You can also use Conditional Average Treatment Effect (CATE) estimation methods from `EconML `_, as shown in the `Conditional Treatment Effects `_ notebook. Here's a code snippet.

.. code:: python

    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LassoCV
    from sklearn.ensemble import GradientBoostingRegressor

    dml_estimate = model.estimate_effect(
        identified_estimand,
        method_name="backdoor.econml.dml.DML",
        control_value=0,
        treatment_value=1,
        target_units=lambda df: df["X0"] > 1,
        confidence_intervals=False,
        method_params={
            "init_params": {"model_y": GradientBoostingRegressor(),
                            "model_t": GradientBoostingRegressor(),
                            "model_final": LassoCV(),
                            "featurizer": PolynomialFeatures(degree=1, include_bias=True)},
            "fit_params": {}})


Example: Graphical causal model (GCM) based inference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DoWhy's graphical causal model framework offers powerful tools to address causal questions beyond effect estimation.
It is based on Pearl's graphical causal model framework and models the causal data generation process of each variable
explicitly via *causal mechanisms* to support a wide range of causal algorithms. For more details, see the book
`Elements of Causal Inference `_.

Complex causal queries, such as attributing observed anomalies to nodes in the system, can be performed with just a few
lines of code:

.. code:: python

    import networkx as nx, numpy as np, pandas as pd
    from dowhy import gcm

    # Let's generate some "normal" data we assume we're given from our problem domain:
    X = np.random.normal(loc=0, scale=1, size=1000)
    Y = 2 * X + np.random.normal(loc=0, scale=1, size=1000)
    Z = 3 * Y + np.random.normal(loc=0, scale=1, size=1000)
    data = pd.DataFrame(dict(X=X, Y=Y, Z=Z))

    # 1. Modeling cause-effect relationships as a structural causal model
    #    (causal graph + functional causal models):
    causal_model = gcm.StructuralCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')]))  # X -> Y -> Z
    gcm.auto.assign_causal_mechanisms(causal_model, data)

    # 2. Fitting the SCM to the data:
    gcm.fit(causal_model, data)

    # Optional: Evaluate causal model
    print(gcm.evaluate_causal_model(causal_model, data))

    # 3. Perform a causal analysis. The general pattern is:
    #    results = gcm.<causal_query>(causal_model, ...)
    # For instance, root cause analysis:
    anomalous_sample = pd.DataFrame(dict(X=[0.1], Y=[6.2], Z=[19]))  # Here, Y is the root cause.

    # "Which node is the root cause of the anomaly in Z?":
    anomaly_attribution = gcm.attribute_anomalies(causal_model, "Z", anomalous_sample)

    # Or sampling from an interventional distribution. Here, under the intervention do(Y := 2).
    samples = gcm.interventional_samples(causal_model, interventions={'Y': lambda y: 2}, num_samples_to_draw=100)

The GCM framework offers many more features beyond these examples. For a full code example, check out the `Online Shop example notebook `_.
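
As one example of the influence-quantification features mentioned earlier, here is a minimal sketch that reuses the fitted ``causal_model`` from the snippet above (``gcm.arrow_strength`` quantifies the contribution of each direct parent to a node):

.. code:: python

    # Quantify the strength of each arrow pointing into Z (by default, roughly
    # the change in the variance of Z when the arrow is removed).
    arrow_strengths = gcm.arrow_strength(causal_model, target_node='Z')
    print(arrow_strengths)  # e.g. {('Y', 'Z'): <strength of the edge Y -> Z>}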

For more functionalities, example applications of DoWhy and details about the outputs, see the `User Guide `_ or
check out the `Jupyter notebooks `_.

More Information & Resources
============================
`Microsoft Research Blog `_ | `Video Tutorial for Effect Estimation `_ | `Video Tutorial for Root Cause Analysis `_ | `Arxiv Paper `_ | `Arxiv Paper (Graphical Causal Model extension) `_ | `Slides `_


Citing this package
~~~~~~~~~~~~~~~~~~~
If you find DoWhy useful for your work, please cite **both** of the following references:

- Amit Sharma, Emre Kiciman. DoWhy: An End-to-End Library for Causal Inference. 2020. https://arxiv.org/abs/2011.04216
- Patrick Blöbaum, Peter Götz, Kailash Budhathoki, Atalanti A. Mastakouri, Dominik Janzing. DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models. Journal of Machine Learning Research (MLOSS), 25(147):1−7, 2024. https://jmlr.org/papers/v25/22-1258.html

Bibtex::

  @article{dowhy,
    title={DoWhy: An End-to-End Library for Causal Inference},
    author={Sharma, Amit and Kiciman, Emre},
    journal={arXiv preprint arXiv:2011.04216},
    year={2020}
  }

  @article{JMLR:v25:22-1258,
    author  = {Patrick Bl{\"o}baum and Peter G{\"o}tz and Kailash Budhathoki and Atalanti A. Mastakouri and Dominik Janzing},
    title   = {DoWhy-GCM: An Extension of DoWhy for Causal Inference in Graphical Causal Models},
    journal = {Journal of Machine Learning Research},
    year    = {2024},
    volume  = {25},
    number  = {147},
    pages   = {1--7},
    url     = {http://jmlr.org/papers/v25/22-1258.html}
  }

Issues
~~~~~~
If you encounter an issue or have a specific request for DoWhy, please `raise an issue `_.

Contributing
~~~~~~~~~~~~

This project welcomes contributions and suggestions. For a guide to contributing and a list of all contributors, check out `CONTRIBUTING.md `_ and our `docs for contributing code `_. Our `contributor code of conduct is available here `_.

Owner

  • Name: PyWhy
  • Login: py-why
  • Kind: organization

GitHub Events

Total
  • Create event: 31
  • Release event: 1
  • Issues event: 53
  • Watch event: 583
  • Delete event: 22
  • Issue comment event: 170
  • Push event: 120
  • Pull request review event: 75
  • Pull request review comment event: 37
  • Pull request event: 97
  • Fork event: 60
Last Year
  • Create event: 31
  • Release event: 1
  • Issues event: 53
  • Watch event: 583
  • Delete event: 22
  • Issue comment event: 170
  • Push event: 120
  • Pull request review event: 75
  • Pull request review comment event: 37
  • Pull request event: 97
  • Fork event: 60

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 992
  • Total Committers: 100
  • Avg Commits per committer: 9.92
  • Development Distribution Score (DDS): 0.729
Past Year
  • Commits: 69
  • Committers: 22
  • Avg Commits per committer: 3.136
  • Development Distribution Score (DDS): 0.696
Top Committers
Name Email Commits
Amit Sharma a****a@l****m 269
Patrick Bloebaum b****p@a****m 213
Peter Goetz p****o@a****m 85
Tanmay Kulkarni t****7@g****m 53
allcontributors[bot] 4****] 48
dependabot[bot] 4****] 45
Adam Kelleher a****h@g****m 34
Chris Trevino d****o 33
Andres Morales a****r@m****m 21
Arshiaarya a****2@g****m 11
Kailash Budhathoki 1****i 9
EgorKraevTransferwise 6****e 8
Michael Marien m****h@g****m 7
Siddhant Haldar s****4@g****m 7
anusha0409 4****9 6
Emre Kıcıman e****k@m****m 6
James Fiedler j****r@g****m 5
RaulPL r****5@g****m 5
eeulig c****t@e****m 5
Gaweł Kazimierczuk k****i@k****l 4
Rahul Shrestha r****1@g****m 4
drawlinson d****r@g****m 4
Priyadutt 6****t 4
AndreaChlebikova A****a 3
Lukas Heumos l****s@p****t 3
Nick Parente 5****1 3
cfreksen c****r@f****k 3
yogabonito y****o 3
Andrew Clark a****2@s****k 3
kmhj13 k****k@g****m 2
and 70 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 193
  • Total pull requests: 372
  • Average time to close issues: 2 months
  • Average time to close pull requests: 17 days
  • Total issue authors: 137
  • Total pull request authors: 52
  • Average comments per issue: 3.32
  • Average comments per pull request: 1.07
  • Merged pull requests: 314
  • Bot issues: 0
  • Bot pull requests: 97
Past Year
  • Issues: 34
  • Pull requests: 63
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 8 days
  • Issue authors: 28
  • Pull request authors: 20
  • Average comments per issue: 1.09
  • Average comments per pull request: 0.9
  • Merged pull requests: 47
  • Bot issues: 0
  • Bot pull requests: 19
Top Authors
Issue Authors
  • AlxndrMlk (5)
  • Zethson (5)
  • ankur-tutlani (5)
  • asha24choudhary (5)
  • Klesel (4)
  • jcreinhold (3)
  • xwbxxx (3)
  • VasundharaAcharya (3)
  • adam2392 (3)
  • kbattocchi (3)
  • PMK1991 (3)
  • elakhatibi (3)
  • Yangliu-SY (3)
  • priamai (3)
  • benTC74 (3)
Pull Request Authors
  • bloebp (144)
  • dependabot[bot] (69)
  • allcontributors[bot] (34)
  • amit-sharma (21)
  • rahulbshrestha (13)
  • andresmor-ms (9)
  • drawlinson (7)
  • petergtz (7)
  • nparent1 (7)
  • yogabonito (6)
  • darthtrevino (5)
  • Zethson (5)
  • bhatt-priyadutt (5)
  • kmhj13 (4)
  • emrekiciman (4)
Top Labels
Issue Labels
question (93) stale (62) bug (47) enhancement (30) good first issue (7)
Pull Request Labels
dependencies (69) python (40) stale (20) github_actions (14) javascript (4) enhancement (3)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 75,467 last-month
  • Total docker downloads: 78
  • Total dependent packages: 4
    (may contain duplicates)
  • Total dependent repositories: 43
    (may contain duplicates)
  • Total versions: 23
  • Total maintainers: 2
pypi.org: dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions

  • Versions: 17
  • Dependent Packages: 4
  • Dependent Repositories: 43
  • Downloads: 75,467 Last month
  • Docker Downloads: 78
Rankings
Stargazers count: 0.4%
Forks count: 1.4%
Downloads: 1.6%
Dependent packages count: 1.6%
Average: 1.8%
Dependent repos count: 2.2%
Docker downloads count: 3.6%
Maintainers (2)
Last synced: 6 months ago
proxy.golang.org: github.com/py-why/dowhy
  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago

Dependencies

.github/workflows/advanced-on-demand.yml actions
  • abatilo/actions-poetry v2.3.0 composite
  • actions/checkout v4 composite
  • actions/github-script v7 composite
  • actions/setup-python v4 composite
.github/workflows/build-docker-image-docs.yml actions
  • actions/checkout v4 composite
  • pmorelli92/github-container-registry-build-push 2.1.0 composite
.github/workflows/ci-install.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • th0th/notify-discord v0.4.1 composite
.github/workflows/ci.yml actions
  • abatilo/actions-poetry v2.3.0 composite
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • th0th/notify-discord v0.4.1 composite
.github/workflows/close-inactive-issues.yml actions
  • actions/stale v8 composite
.github/workflows/docs-ci.yml actions
  • actions/checkout v4 composite
.github/workflows/docs-release.yml actions
  • actions/checkout v4 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/docs.yml actions
  • actions/checkout v4 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/nightly-tests.yml actions
  • abatilo/actions-poetry v2.3.0 composite
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • th0th/notify-discord v0.4.1 composite
.github/workflows/python-publish.yml actions
  • abatilo/actions-poetry v2.3.0 composite
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
docs/Dockerfile docker
  • ghcr.io/py-why/dowhy-example-notebooks-deps latest build
docs/version_patcher/package-lock.json npm
  • @tootallnate/once 2.0.0
  • abab 2.0.6
  • acorn 8.8.1
  • acorn-globals 7.0.1
  • acorn-walk 8.2.0
  • agent-base 6.0.2
  • asynckit 0.4.0
  • combined-stream 1.0.8
  • cssom 0.5.0
  • cssom 0.3.8
  • cssstyle 2.3.0
  • data-urls 3.0.2
  • debug 4.3.4
  • decimal.js 10.4.2
  • deep-is 0.1.4
  • delayed-stream 1.0.0
  • domexception 4.0.0
  • entities 4.4.0
  • escodegen 2.0.0
  • esprima 4.0.1
  • estraverse 5.3.0
  • esutils 2.0.3
  • fast-levenshtein 2.0.6
  • form-data 4.0.0
  • html-encoding-sniffer 3.0.0
  • http-proxy-agent 5.0.0
  • https-proxy-agent 5.0.1
  • iconv-lite 0.6.3
  • is-potential-custom-element-name 1.0.1
  • jsdom 20.0.2
  • levn 0.3.0
  • mime-db 1.52.0
  • mime-types 2.1.35
  • ms 2.1.2
  • nwsapi 2.2.2
  • optionator 0.8.3
  • parse5 7.1.1
  • prelude-ls 1.1.2
  • psl 1.9.0
  • punycode 2.1.1
  • querystringify 2.2.0
  • requires-port 1.0.0
  • safer-buffer 2.1.2
  • saxes 6.0.0
  • source-map 0.6.1
  • symbol-tree 3.2.4
  • tough-cookie 4.1.3
  • tr46 3.0.0
  • type-check 0.3.2
  • universalify 0.2.0
  • url-parse 1.5.10
  • w3c-xmlserializer 3.0.0
  • webidl-conversions 7.0.0
  • whatwg-encoding 2.0.0
  • whatwg-mimetype 3.0.0
  • whatwg-url 11.0.0
  • word-wrap 1.2.4
  • ws 8.11.0
  • xml-name-validator 4.0.0
  • xmlchars 2.2.0
docs/version_patcher/package.json npm
  • jsdom ^20.0.2
poetry.lock pypi
  • 237 dependencies
pyproject.toml pypi
  • causal-learn >=0.1.3.0
  • cvxpy ^1.2.2
  • cython >=0.29.32
  • econml >=0.14.1
  • joblib >=1.1.0
  • matplotlib >=3.5.3
  • networkx >=2.8.5
  • numpy >=1.20
  • pandas >=1.4.3
  • pydot ^1.4.2
  • pygraphviz ^1.9
  • python >=3.8,<3.12
  • scikit-learn >1.0
  • scipy >=1.4.1
  • statsmodels >=0.13.5
  • sympy >=1.10.1
  • tqdm >=4.64.0
binder/environment.yml pypi
  • econml *
  • graphviz *