https://github.com/biomedsciai/causallib

A Python package for modular causal inference analysis and model evaluations

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
✓ Committers with academic emails: 1 of 12 committers (8.3%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.9%) to scientific vocabulary

Keywords

causal causal-inference causal-models causality data-science machine-learning ml

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: BiomedSciAI
License: apache-2.0
Language: Python
Default Branch: master
Homepage:
Size: 13.2 MB

Statistics

Stars: 787
Watchers: 22
Forks: 105
Open Issues: 5
Releases: 16

Created over 7 years ago · Last pushed over 1 year ago

Metadata Files

Readme Contributing License Code of conduct

Causal Inference 360

A Python package for inferring causal effects from observational data.

Description

Causal inference analysis enables estimating the causal effect of an intervention on some outcome from real-world non-experimental observational data.

This package provides a suite of causal methods under a unified scikit-learn-inspired API. It implements meta-algorithms that allow plugging in arbitrarily complex machine learning models. This modular approach supports highly flexible causal modelling. The fit-and-predict-like API makes it possible to train on one set of examples and estimate an effect on another (out-of-bag), which allows for a more "honest"¹ effect estimation.

The package also includes an evaluation suite. Since most causal models use machine learning models internally, we can diagnose poorly performing models by re-interpreting known ML evaluations from a causal perspective.
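For instance, one common causal reading of a familiar ML metric is to look at the ROC AUC of the treatment (propensity) model: an AUC near 0.5 means treated and control units look alike on their covariates, while an AUC near 1.0 means treatment is almost determined by the covariates, a warning sign for poor overlap (positivity). The stdlib-only sketch below computes AUC as the probability that a random treated unit scores above a random control. The function name and the score lists are hypothetical and are not causallib's evaluation API.

```python
def roc_auc(scores, labels):
    """Probability that a random treated unit scores above a random control (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need both treated and control units")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical propensity scores for three treated and three control units.
treatment = [1, 1, 1, 0, 0, 0]
well_overlapping = [0.6, 0.5, 0.4, 0.55, 0.45, 0.5]  # groups hard to tell apart
separated = [0.95, 0.9, 0.99, 0.05, 0.1, 0.02]       # groups almost perfectly separable

overlap_auc = roc_auc(well_overlapping, treatment)
separated_auc = roc_auc(separated, treatment)
```

From an ML standpoint the second model looks better; from a causal standpoint it suggests the two groups barely share common support.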

If you use the package, please consider citing Shimoni et al., 2019:

Reference

```bibtex
@article{causalevaluations,
  title={An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference},
  author={Shimoni, Yishai and Karavani, Ehud and Ravid, Sivan and Bak, Peter and Ng, Tan Hung and Alford, Sharon Hensley and Meade, Denise and Goldschmidt, Yaara},
  journal={arXiv preprint arXiv:1906.00442},
  year={2019}
}
```

¹ Borrowing Wager & Athey's terminology for avoiding overfitting.

Installation

```bash
pip install causallib
```

Usage

The package is imported using the name causallib. Each causal model requires an internal machine-learning model. causallib supports any model that has a sklearn-like fit-predict API (note that some models may require a predict_proba implementation). For example:

```Python
from sklearn.linear_model import LogisticRegression
from causallib.estimation import IPW
from causallib.datasets import load_nhefs

data = load_nhefs()
ipw = IPW(LogisticRegression())
ipw.fit(data.X, data.a)
potential_outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)
effect = ipw.estimate_effect(potential_outcomes[1], potential_outcomes[0])
```

Comprehensive Jupyter Notebook examples can be found in the examples directory.

Community support

We use the Slack workspace at causallib.slack.com for informal communication. We encourage you to ask questions regarding causal-inference modelling or usage of causallib that don't necessarily merit opening an issue on GitHub.

Use this invite link to join causallib on Slack.

Approach to causal-inference

Some key points on how we address causal-inference estimation:

1. Emphasis on potential outcome prediction

The causal effect may be the quantity of interest. However, every effect is defined by two potential (counterfactual) outcomes. We adopt this two-step approach by separating the effect-estimation step from the potential-outcome-prediction step. A beneficial consequence of this approach is that it better supports multi-treatment problems, where "effect" is not well-defined.
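A minimal sketch of the two-step split, assuming step 1 has already produced a population-average potential outcome per treatment arm. The helper `effect_from_outcomes`, its `effect_type` values, and the numbers are hypothetical illustrations, not causallib's API. It shows why a multi-treatment problem has no single "effect": each pair of arms defines its own contrast, all computed from the same set of potential outcomes.

```python
from itertools import combinations

def effect_from_outcomes(outcome_1, outcome_0, effect_type="diff"):
    """Step 2: turn two potential outcomes into an effect on a chosen scale."""
    if effect_type == "diff":
        return outcome_1 - outcome_0
    if effect_type == "ratio":
        return outcome_1 / outcome_0
    raise ValueError(f"unknown effect_type: {effect_type!r}")

# Step 1 output (hypothetical): average potential outcome under each of three arms.
potential_outcomes = {"A": 2.0, "B": 3.0, "C": 6.0}

# With three arms, every pair of arms defines its own effect.
pairwise = {
    (t1, t0): effect_from_outcomes(potential_outcomes[t1], potential_outcomes[t0])
    for t0, t1 in combinations(sorted(potential_outcomes), 2)
}
```

Keeping the potential outcomes as the primary output means the effect scale (difference, ratio) can be chosen afterwards without refitting anything.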

2. Stratified average treatment effect

The causal inference literature devotes special attention to the population on which the effect is estimated, for example the ATE (average treatment effect on the entire sample) or the ATT (average treatment effect on the treated). By allowing out-of-bag estimation, we leave this specification to the user. For example, the ATE is obtained with model.estimate_population_outcome(X, a), and the ATT by stratifying on the treated: model.estimate_population_outcome(X.loc[a==1], a.loc[a==1]).
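The same idea can be sketched without causallib, assuming a toy standardization model: the mean outcome in each (covariate, treatment) cell. The model is fitted once, and the target population is chosen only at prediction time, by averaging its predictions over all rows (ATE) or over the treated rows only (ATT). The function names and data are hypothetical, and the cell-means model assumes every (covariate, treatment) cell is populated.

```python
from collections import defaultdict

def fit_cell_means(X, a, y):
    """Toy outcome model: mean outcome within each (covariate, treatment) cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, ai, yi in zip(X, a, y):
        sums[(xi, ai)] += yi
        counts[(xi, ai)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def population_outcome(model, X, treatment):
    """Average predicted outcome if every row in X received `treatment`."""
    return sum(model[(xi, treatment)] for xi in X) / len(X)

# Hypothetical data: one binary confounder X, treatment a, outcome y.
X = [0, 0, 0, 0, 1, 1, 1, 1]
a = [1, 0, 0, 0, 1, 1, 1, 0]
y = [3, 1, 1, 1, 7, 7, 7, 4]

model = fit_cell_means(X, a, y)  # fit once

# ATE: predict on everyone.
ate = population_outcome(model, X, 1) - population_outcome(model, X, 0)

# ATT: same fitted model, predicted only on the treated rows.
X_treated = [xi for xi, ai in zip(X, a) if ai == 1]
att = population_outcome(model, X_treated, 1) - population_outcome(model, X_treated, 0)
```

Because the treated are over-represented in the stratum with the larger effect, the ATT here differs from the ATE even though the fitted model is identical.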

3. Families of causal inference models

We distinguish between two types of models:
* Weight models: weight the data to balance the treatment and control groups, and then estimate the potential outcome as a weighted average of the observed outcomes. Inverse Probability of Treatment Weighting (IPW or IPTW) is the best-known example of such models.
* Direct outcome models: use the covariates (features) and treatment assignment to build a model that predicts the outcome directly. The model can then be used to predict the outcome under any assignment of treatment values, specifically the potential outcome under assignment of all controls or all treated.
  These models are usually known as Standardization models, and, currently, they are the only ones able to generate individual effect estimates (otherwise known as CATE).
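A weight model can be sketched in a few lines, assuming the propensity P(A=1 | X) of each unit is already known. In causallib it would instead be estimated by the learner passed to IPW, such as the LogisticRegression in the usage example. Each unit is weighted by the inverse probability of the treatment it actually received, and each potential outcome is the normalized weighted average of observed outcomes in that arm. The function name and the toy values are hypothetical.

```python
def ipw_population_outcomes(a, y, propensity):
    """Normalized IPW: weighted mean outcome per arm, weight = 1 / P(A = a_i | X_i)."""
    outcomes = {}
    for treatment in (0, 1):
        num = den = 0.0
        for ai, yi, pi in zip(a, y, propensity):
            if ai != treatment:
                continue
            # Inverse probability of the treatment actually received.
            w = 1.0 / pi if ai == 1 else 1.0 / (1.0 - pi)
            num += w * yi
            den += w
        outcomes[treatment] = num / den
    return outcomes

# Hypothetical data: the first four units have propensity 0.25, the last four 0.75.
a = [1, 0, 0, 0, 1, 1, 1, 0]
y = [3, 1, 1, 1, 7, 7, 7, 4]
propensity = [0.25] * 4 + [0.75] * 4

outcomes = ipw_population_outcomes(a, y, propensity)
effect = outcomes[1] - outcomes[0]
```

Note that this only yields population-level potential outcomes; the weights say nothing about how any single unit would respond, which is why individual effects come from direct outcome models instead.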

4. Confounders and DAGs

One of the most important steps in causal inference analysis is proper selection along both dimensions of the data, to avoid introducing bias:
* On rows: thoughtfully choosing the right inclusion/exclusion criteria for individuals in the data.
* On columns: thoughtfully choosing which covariates (features) act as confounders and should be included in the analysis.

This is where domain-expert knowledge is required; it cannot be fully automated by algorithms. This package assumes that the data provided to the model already meet these criteria. However, filtering can be applied in real time using a scikit-learn pipeline estimator that chains preprocessing steps (which can filter rows and select columns) with a causal model at the end.
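The sketch below illustrates the chaining idea in plain Python, so that row and column choices become explicit, reviewable steps applied before any causal model sees the data. It does not use scikit-learn's Pipeline class; the helper names, the columns, and the "adults only" inclusion rule are all hypothetical.

```python
def select_rows(records, criterion):
    """Rows: keep only individuals who meet the inclusion criterion."""
    return [r for r in records if criterion(r)]

def select_columns(records, columns):
    """Columns: keep only the chosen confounders plus treatment and outcome."""
    return [{c: r[c] for c in columns} for r in records]

def run_steps(records, steps):
    """Apply preprocessing steps in order, as the front of a pipeline would."""
    for step in steps:
        records = step(records)
    return records

# Hypothetical cohort; "zip_code" is assumed not to be a confounder here.
records = [
    {"age": 34, "smokes": 1, "zip_code": "10001", "a": 1, "y": 2.0},
    {"age": 17, "smokes": 0, "zip_code": "10002", "a": 0, "y": 1.0},
    {"age": 52, "smokes": 0, "zip_code": "10003", "a": 0, "y": 1.5},
]
steps = [
    lambda rs: select_rows(rs, lambda r: r["age"] >= 18),  # hypothetical inclusion rule
    lambda rs: select_columns(rs, ["age", "smokes", "a", "y"]),
]
cohort = run_steps(records, steps)
# `cohort` would then be split into X, a, y and passed to a causal model.
```

Keeping these choices as named steps makes the domain decisions visible and easy to revise, rather than buried in ad hoc data wrangling.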

Owner

Name: BiomedSciAI
Login: BiomedSciAI
Kind: organization

Repositories: 6
Profile: https://github.com/BiomedSciAI

GitHub Events

Total

Create event: 1
Release event: 1
Issues event: 3
Watch event: 65
Issue comment event: 7
Push event: 3
Pull request event: 4
Fork event: 9

Last Year

Create event: 1
Release event: 1
Issues event: 3
Watch event: 65
Issue comment event: 7
Push event: 3
Pull request event: 4
Fork event: 9

Committers

Last synced: about 1 year ago

All Time

Total Commits: 81
Total Committers: 12
Avg Commits per committer: 6.75
Development Distribution Score (DDS): 0.346

Past Year

Commits: 6
Committers: 2
Avg Commits per committer: 3.0
Development Distribution Score (DDS): 0.167

Top Committers

Name	Email	Commits
Ehud Karavani	1****r	53
Yishai Shimoni	s**i@g**m	15
mmdanziger	m****r	3
yoavkt	y**t@g**m	2
liranszlak	7****k	1
dennislwei	d**i@g**m	1
d-vct	1****t	1
Steve Martinelli	s****t	1
Lior Ness	l**s@g**m	1
Itay Manes	i**s@g**m	1
Chirag Nagpal	c**4@h**m	1
Chirag Nagpal	c**n@c**u	1

Committer Domains (Top 20 + Academic)

cs.cmu.edu: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 22
Total pull requests: 55
Average time to close issues: 4 months
Average time to close pull requests: about 14 hours
Total issue authors: 20
Total pull request authors: 15
Average comments per issue: 3.0
Average comments per pull request: 0.35
Merged pull requests: 48
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 3
Pull requests: 5
Average time to close issues: 6 days
Average time to close pull requests: 2 days
Issue authors: 3
Pull request authors: 2
Average comments per issue: 1.67
Average comments per pull request: 0.2
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

woonjeung (2)
jgdpsingh (2)
R2Bb1T (1)
yishaishimoni (1)
JuliusKumar (1)
GiacomoPinardi (1)
GMGassner (1)
winston-zillow (1)
myoshimu (1)
Mawul4j (1)
glotglutton (1)
marcoBmota8 (1)
Giovannibriglia (1)
agnusdei13 (1)
ehudkr (1)

Pull Request Authors

ehudkr (36)
JulinaM (3)
mmdanziger (3)
d-vct (2)
yoavkt (2)
chiragnagpal (2)
yishaishimoni (2)
stevemar (1)
Itaymanes (1)
marcoBmota8 (1)
SagiPolaczek (1)
liranszlak (1)
liorness (1)
kgreenewald (1)
dennislwei (1)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 3
Total downloads:
- pypi 1,552 last-month

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 3
(may contain duplicates)
Total versions: 48
Total maintainers: 2

proxy.golang.org: github.com/biomedsciai/causallib

Documentation: https://pkg.go.dev/github.com/biomedsciai/causallib#section-documentation
License: apache-2.0
Latest release: v0.10.0
published over 1 year ago

Versions: 16
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.4%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 10 months ago

proxy.golang.org: github.com/BiomedSciAI/causallib

Documentation: https://pkg.go.dev/github.com/BiomedSciAI/causallib#section-documentation
License: apache-2.0
Latest release: v0.10.0
published over 1 year ago

Versions: 16
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.4%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 10 months ago

pypi.org: causallib

A Python package for flexible and modular causal inference modeling

Homepage: https://github.com/BiomedSciAI/causallib
Documentation: https://causallib.readthedocs.io/en/latest/
License: Apache License 2.0
Latest release: 0.10.0
published over 1 year ago

Versions: 16
Dependent Packages: 0
Dependent Repositories: 3
Downloads: 1,552 Last month
Docker Downloads: 0

Rankings

Stargazers count: 2.5%

Docker downloads count: 4.5%

Forks count: 4.7%

Downloads: 6.0%

Average: 6.1%

Dependent repos count: 9.0%

Dependent packages count: 10.0%

Maintainers (2)

ehudk yishais

Last synced: 10 months ago

Dependencies

causallib/contrib/requirements.txt pypi

faiss-gpu *
torch >=1.2.0

docs/requirements.txt pypi

m2r2 *
sphinx ==4.4.0
sphinx-rtd-theme *

requirements.txt pypi

matplotlib >=2.2,<4
networkx >=1.1,<3
numpy >=1.13,<2
pandas >=0.25.2,<2
scikit-learn >=0.20,<2
scipy >=0.19,<2
statsmodels >=0.9,<1

.github/workflows/build.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite
paambaati/codeclimate-action v3.2.0 composite
