spacebench

SpaCE, the Spatial Confounding Environment, loads benchmark datasets for causal inference methods tackling spatial confounding

https://github.com/nsaph-projects/space

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.3%) to scientific vocabulary
Last synced: 6 months ago

Repository


Statistics
  • Stars: 17
  • Watchers: 5
  • Forks: 4
  • Open Issues: 7
  • Releases: 3
Created almost 3 years ago · Last pushed 8 months ago
Metadata Files
Readme Changelog License Code of conduct Citation

README.md


🚀 Description

Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, SpaCE provides realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. The datasets use real treatments and covariates from diverse domains, including climate, health, and social sciences. Realistic semi-synthetic outcomes and counterfactuals are generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. SpaCE supports an automated end-to-end machine learning pipeline, simplifying data loading, experimental setup, and model evaluation.
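To make the problem concrete, here is a toy simulation (not how SpaCE generates its data, which uses real treatments and covariates plus ML ensembles). A smooth spatial variable u drives both a binary treatment and the outcome. The naive difference in means is biased, while a regression that adjusts for u recovers the true effect. That "oracle" adjustment is exactly what a real analysis cannot do, because u is unobserved; spatial methods try to recover its influence from the graph or coordinates instead. All function names here are hypothetical.

```python
import math
import random
from statistics import fmean


def slope(x, y):
    """Ordinary least-squares slope of y on x (single regressor with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, u):
    """Residuals of v after a linear regression on u (with intercept)."""
    b = slope(u, v)
    mu, mv = fmean(u), fmean(v)
    return [vi - mv - b * (ui - mu) for vi, ui in zip(v, u)]


def simulate(n=2000, true_effect=1.0, seed=0):
    """Toy data: a smooth spatial confounder u drives both treatment and outcome."""
    rng = random.Random(seed)
    coords = [i / n for i in range(n)]               # regions along a 1-D transect
    u = [math.sin(2 * math.pi * c) for c in coords]  # varies smoothly in space
    t = [1.0 if ui + rng.gauss(0, 0.5) > 0 else 0.0 for ui in u]
    y = [true_effect * ti + 2.0 * ui + rng.gauss(0, 1.0) for ti, ui in zip(t, u)]
    return u, t, y


def naive_estimate(t, y):
    """Difference in mean outcomes between treated and control, ignoring u."""
    treated = [yi for ti, yi in zip(t, y) if ti == 1.0]
    control = [yi for ti, yi in zip(t, y) if ti == 0.0]
    return fmean(treated) - fmean(control)


def oracle_estimate(u, t, y):
    """Coefficient on t in a regression of y on (t, u), via Frisch-Waugh-Lovell."""
    return slope(residualize(t, u), residualize(y, u))


u, t, y = simulate()
print(f"naive:  {naive_estimate(t, y):.2f}")      # biased upward by the confounder
print(f"oracle: {oracle_estimate(u, t, y):.2f}")  # close to the true effect of 1.0
```

Treated regions sit where u is high, and high u also raises the outcome, so the naive contrast mixes the treatment effect with the confounder's effect.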

🐍 Installation

Install the PyPI version:

```sh
pip install "spacebench[all]"
```

The option [all] installs all dependencies necessary for the spatial confounding algorithms and the examples. If you only want to use the SpaceDatasets, use `pip install spacebench` instead.

You can also install the latest 🔥 features from the development version:

```sh
pip install "git+https://github.com/NSAPH-Projects/space@dev#egg=spacebench[all]"
```

Python 3.10 or higher is required. See the docs and requirements.txt for more information.
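If you install spacebench from a script or notebook, a small guard can report an unsupported interpreter early. This is a minimal sketch; the helper name python_is_supported is hypothetical and not part of spacebench.

```python
import sys


def python_is_supported(version_info=None, minimum=(3, 10)):
    """Return True if the (major, minor) version meets the stated minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK for spacebench")
    else:
        print("spacebench requires Python 3.10 or higher")
```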

🐢 Getting started

To obtain a benchmark dataset for spatial confounding, you need to 1) create a SpaceEnv, which contains real treatment and confounder data and a realistic semi-synthetic outcome, and 2) create a SpaceDataset, which masks a spatially varying confounder and facilitates the data-loading pipeline for causal inference.

```python
from spacebench import SpaceEnv

env = SpaceEnv('healthd_dmgrcs_mortality_disc')
dataset = env.make()
print(dataset)
```

```
SpaceDataset with a missing spatial confounder:
treatment: (3109,) (binary)
confounders: (3109, 30)
outcome: (3109,)
counterfactuals: (3109, 2)
confounding score of missing: 0.02
spatial smoothness score of missing: 0.11
graph edge list: (9237, 2)
graph node coordinates: (3109, 2)
parent SpaceEnv: healthd_dmgrcs_mortality_disc
WARNING ⚠️ : this dataset contains a (realistic) synthetic outcome!
By using it, you agree to understand its limitations.
The variable names have been masked to emphasize that no inferences can be made about the source data.
```
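Because each dataset ships with true counterfactuals, an estimate can be scored against ground truth. The sketch below assumes that, for a binary environment, each row of the (n, 2) counterfactuals array is [outcome under treatment 0, outcome under treatment 1]; that column convention is our reading of the printed shape, not something stated above. It works on plain lists (made-up values) so it runs without downloading data, and the helper names true_ate, ate_error and pehe are hypothetical, not spacebench functions.

```python
from statistics import fmean


def true_ate(counterfactuals):
    """Average treatment effect from per-unit potential outcomes.

    Assumes each row is [outcome under treatment 0, outcome under treatment 1].
    """
    return fmean(row[1] - row[0] for row in counterfactuals)


def ate_error(estimate, counterfactuals):
    """Absolute error of an estimated ATE against the ground-truth ATE."""
    return abs(estimate - true_ate(counterfactuals))


def pehe(estimated_effects, counterfactuals):
    """Root mean squared error of unit-level effect estimates (PEHE)."""
    if len(estimated_effects) != len(counterfactuals):
        raise ValueError("need one estimated effect per unit")
    squared = [
        (est - (row[1] - row[0])) ** 2
        for est, row in zip(estimated_effects, counterfactuals)
    ]
    return fmean(squared) ** 0.5


# Made-up potential outcomes for four units, only to exercise the functions.
cf = [[1.0, 1.5], [2.0, 2.4], [0.5, 1.3], [3.0, 3.2]]
print(true_ate(cf))        # ≈ 0.475
print(ate_error(0.6, cf))  # ≈ 0.125
```

The same functions would accept the rows of a NumPy array, since they only index each row by position.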

Available SpaceEnvs

The list of available environments can be found in the documentation or retrieved in an interactive session:

```python
from spacebench import DataMaster

dm = DataMaster()
dm.master.head()
```

| environments | treatment_type | collection |
|:-------------------------------|:-----------------|:---------------------------------|
| healthd_dmgrcs_mortality_disc | binary | Air Pollution and Mortality |
| cdcsvi_limteng_hburdic_cont | continuous | Social Vulnerability and Welfare |
| climate_relhum_wfsmoke_cont | continuous | Heat Exposure and Wildfires |
| climate_wfsmoke_minrty_disc | binary | Heat Exposure and Wildfires |
| healthd_hhinco_mortality_cont | continuous | Air Pollution and Mortality |
| healthd_pollutn_mortality_cont | continuous | Air Pollution and Mortality |
| county_educatn_election_cont | continuous | Welfare and Elections |
| county_phyactiv_lifexpcy_cont | continuous | Welfare and Elections |
| county_dmgrcs_election_disc | binary | Welfare and Elections |
| cdcsvi_nohsdp_poverty_cont | continuous | Social Vulnerability and Welfare |
| cdcsvi_nohsdp_poverty_disc | binary | Social Vulnerability and Welfare |
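In the table, every environment whose name ends in "disc" has a binary treatment and every one ending in "cont" has a continuous treatment. The sketch below uses that observed pattern to pick out binary environments offline. It is a heuristic inferred from this table only, not a documented naming guarantee, so prefer the treatment type column of dm.master when it is available; treatment_type_from_name is a hypothetical helper.

```python
def treatment_type_from_name(env_name):
    """Guess the treatment type from an environment name's suffix.

    Heuristic based on the listed environments: "_disc" -> binary,
    "_cont" -> continuous. Raises ValueError for any other suffix.
    """
    suffix = env_name.rsplit("_", 1)[-1]
    types = {"disc": "binary", "cont": "continuous"}
    if suffix not in types:
        raise ValueError(f"unrecognized environment suffix: {suffix!r}")
    return types[suffix]


envs = [
    "healthd_dmgrcs_mortality_disc",
    "cdcsvi_limteng_hburdic_cont",
    "climate_relhum_wfsmoke_cont",
    "climate_wfsmoke_minrty_disc",
    "healthd_hhinco_mortality_cont",
    "healthd_pollutn_mortality_cont",
    "county_educatn_election_cont",
    "county_phyactiv_lifexpcy_cont",
    "county_dmgrcs_election_disc",
    "cdcsvi_nohsdp_poverty_cont",
    "cdcsvi_nohsdp_poverty_disc",
]

binary_envs = [name for name in envs if treatment_type_from_name(name) == "binary"]
print(binary_envs)
```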

To learn more about the data collections and the environments, see the docs. The data collections and environments are hosted at the Harvard Dataverse. Data "nutrition labels" are also available for the collections. The environments are produced with the space-data repository from a data collection and a configuration file. Don't forget to read our paper.

🙉 Code of Conduct

Please note that the SpaCE project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

👽 Contact

We welcome contributions and feedback about spacebench. If you have any suggestions or ideas, please open an issue or submit a pull request.

Documentation

The documentation is hosted at https://nsaph-projects.github.io/space/.

Owner

  • Name: NSAPH Projects
  • Login: NSAPH-Projects
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: "SpaCE: The Spatial Confounding Environment"
identifiers:
  - description: "SpaCE Data GitHub repository."
    type: url
    value: "https://github.com/NSAPH-Projects/space-data"
  - description: "SpaCE GitHub repository."
    type: url
    value: "https://github.com/NSAPH-Projects/space"
  - description: "SpaCE Data Collection."
    type: doi
    value: 10.7910/DVN/SYNPBS
authors:
  - family-names: Tec
    given-names: Mauricio
  - family-names: Trisovic
    given-names: Ana
    orcid: https://orcid.org/0000-0003-1991-0533
  - family-names: Audirac
    given-names: Michelle
  - family-names: Woodward
    given-names: Sophie
  - family-names: Hu
    given-names: Kate
  - family-names: Khoshnevis
    given-names: Naeem
  - family-names: Dominici
    given-names: Francesca
year: 2023
license: MIT

GitHub Events

Total
  • Issues event: 3
  • Watch event: 2
  • Issue comment event: 5
  • Push event: 1
Last Year
  • Issues event: 3
  • Watch event: 2
  • Issue comment event: 5
  • Push event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 79
  • Total pull requests: 81
  • Average time to close issues: 23 days
  • Average time to close pull requests: 6 days
  • Total issue authors: 8
  • Total pull request authors: 7
  • Average comments per issue: 1.05
  • Average comments per pull request: 0.73
  • Merged pull requests: 70
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mauriciogtec (38)
  • audiracmichelle (12)
  • Naeemkh (8)
  • atrisovic (8)
  • sophi890 (4)
  • jckitch (3)
  • fresleven (1)
  • zcalhoun (1)
Pull Request Authors
  • mauriciogtec (24)
  • Naeemkh (18)
  • atrisovic (15)
  • audiracmichelle (8)
  • sophi890 (5)
  • jckitch (3)
  • zcalhoun (1)
Top Labels
Issue Labels
  • enhancement (6)
  • API (4)
  • data generation (3)
  • installation (2)
  • algorithms (1)
  • documentation (1)
  • examples (1)
  • refactoring (1)
  • sampling models (1)
  • bug (1)
  • reproducibility (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 11 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 7
  • Total maintainers: 1
pypi.org: spacebench

Spatial confounding poses a significant challenge in scientific studies where unobserved spatial variables influence both treatment and outcome, leading to spurious associations. SpaCE provides realistic benchmark datasets and tools for systematically evaluating causal inference methods for spatial confounding. Each dataset includes training data with spatial confounding, true counterfactuals, a spatial graph with coordinates, and realistic semi-synthetic outcomes.

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 11 Last month
Rankings
Dependent packages count: 7.1%
Average: 19.7%
Dependent repos count: 32.2%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/python-app-dist.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • s-weigand/setup-conda v1 composite
.github/workflows/python-app.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v3 composite
  • s-weigand/setup-conda v1 composite
optional-requirements.txt pypi
  • jsonlines >=3.1
  • matplotlib >=3.4.3
  • pysal >=2.5.0
  • pytorch-lightning >=2.0.2
  • scikit-learn >=1.2.2
  • seaborn >=0.11.2
  • torch_geometric >=2.3.1
  • torchaudio >=2.0.2
  • torchmetrics >=0.11.4
  • torchvision >=0.15.2
  • xgboost >=1.7.4
requirements.txt pypi
  • networkx >=3.0
  • numpy >=1.19.2
  • pandas >=1.5.3
  • pyDataverse >=0.3.1
  • pyproj ==3.4.1
  • pyyaml >=6.0
  • requests >=2.28.1
  • scipy >=1.10.1
  • setuptools >=58.0.4
  • tqdm >=4.62.3
  • urllib3 >=1.26.11
setup.py pypi