pywhy-stats

Python package for (conditional) independence testing and statistical functions related to causality.

https://github.com/py-why/pywhy-stats

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (20.0%) to scientific vocabulary

Keywords

conditional-independence-test independence-testing python statistics
Last synced: 6 months ago

Repository

Python package for (conditional) independence testing and statistical functions related to causality.

Basic Info
Statistics
  • Stars: 28
  • Watchers: 2
  • Forks: 4
  • Open Issues: 9
  • Releases: 1
Topics
conditional-independence-test independence-testing python statistics
Created almost 3 years ago · Last pushed 9 months ago
Metadata Files
Readme Contributing License Citation

README.md

(Badges: Code style: black · CircleCI unit-tests · Checked with mypy · codecov · PyPI · Download count · Latest PyPI release)

PyWhy-Stats

Pywhy-stats serves as a Python library for implementations of various statistical methods, such as (un)conditional independence tests, which can be utilized in tasks like causal discovery. In the current version, PyWhy-Stats supports:

* Kernel-based independence and conditional k-sample tests
* FisherZ-based independence tests
* Power-divergence independence tests
* Bregman-divergence conditional k-sample tests

Documentation

See the development version documentation.

Or see the stable version documentation.

Installation

Installation is best done via pip or conda. Developers can also install from source using pip. See the installation page for full details.

Dependencies

Minimally, pywhy-stats requires:

* Python (>=3.8)
* numpy
* scipy
* scikit-learn

User Installation

If you already have a working installation of numpy and scipy, the easiest way to install pywhy-stats is using pip:

```
pip install -U pywhy-stats
```

To install the package from GitHub, clone the repository and then `cd` into the directory. You can then use poetry to install:

```
poetry install

# if you would like an editable install of pywhy-stats for dev purposes
pip install -e .
```

Quick Start

In the following sections, we will use artificial example data to demonstrate the API's functionality. More information about the methods and hyperparameters can be found in the documentation.

Note that most methods in PyWhy-Stats support multivariate inputs. For this, simply pass in a 2D numpy array where rows represent samples and columns represent the different dimensions.
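
For instance, a minimal sketch of a multivariate call, using the `independence_test` API introduced below (the 2-column shapes here are purely illustrative):

```Python
import numpy as np
from pywhy_stats import independence_test

rng = np.random.default_rng(0)
# 200 samples of a 2-dimensional X and a 2-dimensional Y
X = rng.standard_normal((200, 2))
Y = X + rng.standard_normal((200, 2))

result = independence_test(X, Y)  # rows = samples, columns = dimensions
print("p-value:", result.pvalue)
```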

Unconditional Independence Tests

Consider the following example data:

```Python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
Y = np.exp(X + rng.standard_normal(size=(200, 1)))
```

Here, $Y$ depends on $X$ in a non-linear way. We can use the simplified API of PyWhy-Stats to test the null hypothesis that the variables are independent:

```Python
from pywhy_stats import independence_test

result = independence_test(X, Y)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

The independence_test method returns an object containing a p-value, a test statistic, and possibly additional information about the test. By default, this method employs a heuristic to select the most appropriate test for the data. Currently, it defaults to a kernel-based independence test.

As we observed, the p-value is very small. Using, for example, a significance level of 0.05, we would reject the null hypothesis of independence and infer that these variables are dependent. However, a p-value exceeding the significance level does not conclusively indicate that the variables are independent; it only indicates insufficient evidence of dependence.
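
In code, that decision rule is a simple comparison against the chosen significance level; a minimal sketch reusing the `result` from above:

```Python
alpha = 0.05  # chosen significance level
if result.pvalue <= alpha:
    print("Reject H0: evidence that X and Y are dependent")
else:
    # failing to reject does not prove independence
    print("Insufficient evidence of dependence")
```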

We can also be more specific in the type of independence test we want to use. For instance, to use a FisherZ test, we can indicate this by:

```Python
from pywhy_stats import Methods

result = independence_test(X, Y, method=Methods.FISHERZ)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

Or for the kernel-based independence test:

```Python
from pywhy_stats import Methods

result = independence_test(X, Y, method=Methods.KCI)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```
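
Since every call returns the same kind of result object, the two methods can be compared side by side; a small sketch, assuming `Methods` is a standard Python enum and reusing the data from above:

```Python
# compare the FisherZ and kernel-based tests on the same data
for method in (Methods.FISHERZ, Methods.KCI):
    result = independence_test(X, Y, method=method)
    print(method, "p-value:", result.pvalue)
```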

For more information about the available methods, hyperparameters and other details, see the documentation.

Conditional independence test

Similar to the unconditional independence test, we can use the same API to condition on another variable or set of variables. First, let's generate a third variable $Z$ to condition on:

```Python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 1))
X = Z + rng.standard_normal(size=(200, 1))
Y = np.exp(Z + rng.standard_normal(size=(200, 1)))
```

Here, $X$ and $Y$ are dependent due to $Z$. Running an unconditional independence test yields:

```Python
from pywhy_stats import independence_test

result = independence_test(X, Y)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

Again, the p-value is very small, providing strong evidence that $X$ and $Y$ are dependent. Now, let's condition on $Z$, which should render the variables independent:

```Python
result = independence_test(X, Y, condition_on=Z)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

We observe that the p-value isn't small anymore. Indeed, if the variables were independent, we would expect the p-value to be uniformly distributed on $[0, 1]$.
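
As a rough illustration of that point (a sketch, not part of the library's API), one can rerun the conditional test on fresh simulated draws and check that the resulting p-values spread out over $[0, 1]$ rather than clustering near zero:

```Python
import numpy as np
from pywhy_stats import independence_test

rng = np.random.default_rng(1)
pvalues = []
for _ in range(20):  # a handful of repetitions keeps the kernel test fast
    Z = rng.standard_normal((200, 1))
    X = Z + rng.standard_normal(size=(200, 1))
    Y = np.exp(Z + rng.standard_normal(size=(200, 1)))
    pvalues.append(independence_test(X, Y, condition_on=Z).pvalue)

print("p-values:", np.round(pvalues, 3))  # should not cluster near 0
```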

(Conditional) k-sample test

In certain settings, you may be interested in testing the invariance of k (conditional) distributions. For example, say you have data collected over the same set of variables (X, Y) from humans ($P^1(X, Y)$) and bonobos ($P^2(X, Y)$). You can test whether the conditional distributions are equal, $P^1(Y | X) = P^2(Y | X)$, using a conditional two-sample test.

First, we create some simulated data that arise from two distinct distributions. However, the mechanism generating Y is invariant across the two settings once we condition on X.

```Python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 1))
X2 = rng.uniform(low=0.0, high=1.0, size=(200, 1))

Y1 = np.exp(X1 + rng.standard_normal(size=(200, 1)))
Y2 = np.exp(X2 + rng.standard_normal(size=(200, 1)))

groups = np.concatenate((np.zeros((200, 1)), np.ones((200, 1))))
X = np.concatenate((X1, X2))
Y = np.concatenate((Y1, Y2))
```

We can now test the hypothesis that $P^1(Y | X) = P^2(Y | X)$ with the following code.

```Python
from pywhy_stats import conditional_ksample

# test that P^1(Y | X) = P^2(Y | X)
result = conditional_ksample.kcd.condind(X, Y, groups)

print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

Contributing

We welcome contributions from the community. Please refer to our contributing document and developer document for information on developer workflows.

Owner

  • Name: PyWhy
  • Login: py-why
  • Kind: organization

GitHub Events

Total
  • Watch event: 6
  • Push event: 4
Last Year
  • Watch event: 6
  • Push event: 4

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 9
  • Total pull requests: 29
  • Average time to close issues: 3 days
  • Average time to close pull requests: 28 days
  • Total issue authors: 2
  • Total pull request authors: 3
  • Average comments per issue: 0.67
  • Average comments per pull request: 1.79
  • Merged pull requests: 22
  • Bot issues: 0
  • Bot pull requests: 13
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • adam2392 (6)
  • rafaela-amorim (1)
Pull Request Authors
  • dependabot[bot] (19)
  • adam2392 (9)
  • bloebp (7)
Top Labels
Issue Labels
bug (2)
Pull Request Labels
dependencies (19) github_actions (15) python (4) No Changelog Needed (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 11 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 2
pypi.org: pywhy-stats

Statistical methods for Python

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 11 Last month
Rankings
Dependent packages count: 7.4%
Average: 38.3%
Dependent repos count: 69.2%
Maintainers (2)
Last synced: 6 months ago

Dependencies

.github/workflows/circle_artifacts.yml actions
  • larsoner/circleci-artifacts-redirector-action master composite
.github/workflows/docs-release.yml actions
  • abatilo/actions-poetry v2.3.0 composite
  • actions/checkout v4 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/main.yml actions
  • abatilo/actions-poetry v2.3.0 composite
  • actions/checkout v4 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • codecov/codecov-action v3 composite
  • softprops/action-gh-release v1 composite
.github/workflows/pr_checks.yml actions
  • actions/checkout v4 composite
.github/workflows/python-publish.yml actions
  • abatilo/actions-poetry v2.3.0 composite
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
poetry.lock pypi
  • 189 dependencies
pyproject.toml pypi
  • numpy ^1.23.0
  • python >=3.8,<3.12
  • scikit-learn >= 1.0
  • scipy ^1.9.0
setup.py pypi