omop-meds

An ETL pipeline for transforming OMOP datasets into the MEDS format using the MEDS-Transforms library.

https://github.com/rvandewater/omop_meds

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.8%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: rvandewater
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.02 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 5
  • Releases: 10
Created about 1 year ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

MEDS OMOP ETL with MEDS-Transforms


An ETL pipeline for transforming OMOP datasets into the MEDS format using the MEDS-Transforms library. It was inspired by the first OMOP MEDS ETL, available at https://github.com/Medical-Event-Data-Standard/meds_etl; thanks to its developers. OMOP 5.3 and 5.4 datasets are currently supported.

```bash
pip install OMOP_MEDS
OMOP_MEDS root_output_dir=$ROOT_OUTPUT_DIR
```

To try with the MIMIC-IV OMOP demo dataset, you can run:

```bash
OMOP_MEDS root_output_dir=/path/to/your/output do_download=True ++do_demo=True
```

Example config for an OMOP dataset:

```yaml
dataset_name: MIMIC_IV_OMOP
raw_dataset_version: 1.0
omop_version: 5.3

urls:
  dataset:
    - https://physionet.org/content/mimic-iv-demo-omop/0.9/
    - url: EXAMPLE_CONTROLLED_URL
      username: ${oc.env:DATASET_DOWNLOAD_USERNAME}
      password: ${oc.env:DATASET_DOWNLOAD_PASSWORD}
  demo:
    - https://physionet.org/content/mimic-iv-demo-omop/0.9/
  common:
    - EXAMPLE_SHARED_URL # Often used for shared metadata files
```
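To illustrate how the two kinds of `urls` entries in the config above differ, here is a minimal Python sketch. It is not the package's actual download code. It assumes that plain strings are public URLs and that mapping entries carry credentials read from environment variables, which is roughly what the `${oc.env:...}` interpolation resolves to. `DownloadTarget`, `normalize_entry`, and the `username_env`/`password_env` keys are hypothetical names introduced only for this example.

```python
import os
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class DownloadTarget:
    """Hypothetical container for one resolved download entry."""
    url: str
    username: Optional[str] = None
    password: Optional[str] = None


def normalize_entry(entry: Union[str, dict], env=os.environ) -> DownloadTarget:
    """Turn one `urls` list item into a DownloadTarget.

    Plain strings are treated as public URLs. Mappings are assumed to hold
    `url` plus the *names* of the environment variables to read, standing in
    for the ${oc.env:VAR} interpolation used in the YAML config.
    """
    if isinstance(entry, str):
        return DownloadTarget(url=entry)
    return DownloadTarget(
        url=entry["url"],
        username=env.get(entry["username_env"]),
        password=env.get(entry["password_env"]),
    )


# A fake environment keeps the example independent of the real shell.
fake_env = {
    "DATASET_DOWNLOAD_USERNAME": "alice",
    "DATASET_DOWNLOAD_PASSWORD": "s3cret",
}
entries = [
    "https://physionet.org/content/mimic-iv-demo-omop/0.9/",
    {
        "url": "EXAMPLE_CONTROLLED_URL",
        "username_env": "DATASET_DOWNLOAD_USERNAME",
        "password_env": "DATASET_DOWNLOAD_PASSWORD",
    },
]
targets = [normalize_entry(e, env=fake_env) for e in entries]
```

In this sketch the first target has no credentials, while the second picks them up from the supplied environment mapping.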

Pre-MEDS settings

The following settings can be used to configure the pre-MEDS steps.

```bash
OMOP_MEDS \
  root_output_dir=/sc/arion/projects/hpims-hpi/projects/foundation_models_ehr/cohorts/meds_debug/small_demo \
  raw_input_dir=/sc/arion/projects/hpims-hpi/projects/foundation_models_ehr/cohorts/full_omop \
  do_download=False ++do_overwrite=True ++limit_subjects=50
```

  • root_output_dir: Set the root output directory.
  • raw_input_dir: Path to the raw input directory.
  • do_download: Set to False to skip downloading the dataset.
  • ++do_overwrite: Set to True to overwrite existing files.
  • ++limit_subjects: Limit the number of subjects to process.
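
If you launch the ETL from a script, the settings listed above can be assembled programmatically. The following is a minimal sketch that only builds the argument list from the options documented here. `build_omop_meds_command` is a hypothetical helper, not part of the package, and the paths are placeholders.

```python
import shlex


def build_omop_meds_command(root_output_dir, raw_input_dir=None,
                            do_download=False, do_overwrite=None,
                            limit_subjects=None):
    """Assemble the OMOP_MEDS command line as a list of arguments."""
    cmd = ["OMOP_MEDS", f"root_output_dir={root_output_dir}"]
    if raw_input_dir is not None:
        cmd.append(f"raw_input_dir={raw_input_dir}")
    cmd.append(f"do_download={do_download}")
    # The `++` prefix is Hydra override syntax: add the key, or override
    # it if it already exists in the config.
    if do_overwrite is not None:
        cmd.append(f"++do_overwrite={do_overwrite}")
    if limit_subjects is not None:
        cmd.append(f"++limit_subjects={limit_subjects}")
    return cmd


cmd = build_omop_meds_command(
    root_output_dir="/path/to/your/output",  # placeholder path
    raw_input_dir="/path/to/raw/omop",       # placeholder path
    do_download=False,
    do_overwrite=True,
    limit_subjects=50,
)
print(shlex.join(cmd))  # shell-safe rendering of the command
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the package is installed. Keeping it as a list avoids shell-quoting problems with unusual paths.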

MEDS-Transforms settings

To convert a large dataset, you can parallelize the MEDS-Transforms stage, which is the step of the pipeline that takes the longest.

Local parallelization uses the hydra-joblib-launcher package, which you can install with:

```bash
pip install hydra-joblib-launcher --upgrade
```

Then set the number of workers as an environment variable:

```bash
export N_WORKERS=16
```

Moreover, you can set the number of subjects per shard to balance the parallelization overhead based on how many subjects you have in your dataset:

```bash
export N_SUBJECTS_PER_SHARD=1000
```
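
As a rough guide for choosing these two values, the sketch below estimates how many shards a dataset produces and how many rounds of work the workers would need. It assumes shards are processed independently and spread evenly across workers. The actual scheduling in MEDS-Transforms may differ, and `plan_sharding` is a hypothetical helper rather than a library function.

```python
import math
import os


def plan_sharding(n_subjects: int, n_subjects_per_shard: int, n_workers: int) -> dict:
    """Estimate shard count and scheduling rounds for a given setup."""
    # Each shard holds at most n_subjects_per_shard subjects.
    n_shards = math.ceil(n_subjects / n_subjects_per_shard)
    # With n_workers running in parallel, shards are handled in rounds.
    rounds = math.ceil(n_shards / n_workers)
    return {"n_shards": n_shards, "rounds": rounds}


# Example: 50,000 subjects (an illustrative number), 1000 per shard, 16 workers.
plan = plan_sharding(n_subjects=50_000, n_subjects_per_shard=1000, n_workers=16)
print(plan)

# Environment for a child process, equivalent to the `export` lines above.
env = dict(os.environ, N_WORKERS="16", N_SUBJECTS_PER_SHARD="1000")
```

Under this model, very small shards create many tasks and more scheduling overhead. Very large shards can leave workers idle when there are fewer shards than workers.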

Citation

If you use this software, please cite it using the citation link on GitHub.

Owner

  • Name: Robin van de Water
  • Login: rvandewater
  • Kind: user
  • Location: Berlin
  • Company: Hasso Plattner Institute

PhD student in Medical Event Prediction at Hasso Plattner Institute in collaboration with the Charité hospital (Berlin)

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "OMOP_MEDS ETL"
doi: "10.5281/zenodo.15132444"
authors:
  - family-names: "van de Water"
    given-names: "Robin Philippus"
    orcid: "https://orcid.org/0000-0002-2895-4872"
date-released: "2025-02-19"
url: "https://github.com/rvandewater/OMOP_MEDS"
repository-code: "https://github.com/rvandewater/OMOP_MEDS"
license: "MIT"

GitHub Events

Total
  • Create event: 11
  • Issues event: 18
  • Release event: 7
  • Watch event: 1
  • Issue comment event: 29
  • Public event: 1
  • Push event: 77
  • Pull request event: 9
Last Year
  • Create event: 11
  • Issues event: 18
  • Release event: 7
  • Watch event: 1
  • Issue comment event: 29
  • Public event: 1
  • Push event: 77
  • Pull request event: 9

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 11
  • Total pull requests: 6
  • Average time to close issues: 2 days
  • Average time to close pull requests: less than a minute
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 3.18
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 11
  • Pull requests: 6
  • Average time to close issues: 2 days
  • Average time to close pull requests: less than a minute
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 3.18
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • bschilder (9)
  • rvandewater (2)
Pull Request Authors
  • rvandewater (12)
Top Labels
Issue Labels
documentation (1) enhancement (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 64 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 9
  • Total maintainers: 1
pypi.org: omop-meds

An ETL to convert OMOP data to the MEDS format.

  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 64 Last month
Rankings
Dependent packages count: 9.6%
Average: 31.7%
Dependent repos count: 53.8%
Maintainers (1)
Last synced: 7 months ago

Dependencies

.github/workflows/code-quality-main.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • pre-commit/action v3.0.1 composite
.github/workflows/code-quality-pr.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • pre-commit/action v3.0.1 composite
  • trilom/file-changes-action v1.2.4 composite
.github/workflows/python-build.yaml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
  • sigstore/gh-action-sigstore-python v3.0.0 composite
.github/workflows/tests.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • codecov/codecov-action v4.0.1 composite
  • codecov/test-results-action v1 composite
pyproject.toml pypi
  • beautifulsoup4 *
  • hydra-core *
  • loguru *
  • meds-transforms >=0.1
  • polars *
  • requests *