omop_schema

A basic package that provides OMOP schemas for loading schema-compliant OMOP CSV files, validating them, and casting non-compliant dataframes.

https://github.com/rvandewater/omop_schema

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.8%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Created 11 months ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

OMOP Schema


omop_schema is a Python package designed to read, manage, and convert OMOP (Observational Medical Outcomes Partnership) data into the correct schema. It provides tools to handle OMOP CDM (Common Data Model) tables, convert schemas between different formats (e.g., PyArrow, Polars, Pandas), and load datasets efficiently.

Features

  • Schema Management: Define and manage schemas for OMOP CDM tables across CDM versions (e.g., v5.3 and v5.4).
  • Schema Conversion: Convert PyArrow schemas to Polars or Pandas-compatible schemas.
  • Dataset Loading: Load datasets from CSV files into PyArrow tables, ensuring they match the defined schema.
  • Optional Dependencies: Support for Polars and Pandas as optional dependencies for schema conversion.

Installation

Install the package using pip:

```bash
pip install omop_schema
```

To include optional dependencies:

```bash
# Quotes stop shells such as zsh from expanding the square brackets.
pip install "omop_schema[polars]"
pip install "omop_schema[pandas]"
pip install "omop_schema[polars,pandas]"
```
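The extras only matter for the schema-conversion helpers. As a hedged sketch that is not part of omop_schema, you can check which optional libraries are importable before calling those helpers; `available_extras` is a hypothetical name.

```python
import importlib.util


def available_extras(extras=("polars", "pandas")):
    """Report which optional libraries can be imported in this environment.

    importlib.util.find_spec locates a top-level module without importing it,
    and returns None when the module is not installed.
    """
    return {name: importlib.util.find_spec(name) is not None for name in extras}


if __name__ == "__main__":
    for name, present in available_extras().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```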

Usage

1. Define and Retrieve OMOP Schemas

The package provides predefined schemas for OMOP CDM tables. You can retrieve the schema for a specific table:

```python
from omop_schema.schema.v54 import OMOPSchemaV54

# Retrieve the PyArrow schema of the concept table for CDM v5.4
schema_v54 = OMOPSchemaV54()
concept_schema = schema_v54.get_pyarrow_schema("concept")
print(concept_schema)
```
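To make the idea of a per-version, per-table schema concrete, here is a standard-library sketch. The column names follow the OMOP CDM v5.4 `concept` table and the types are written as Arrow-style type names; the dictionary layout and the `get_table_schema` helper are hypothetical and are not how `OMOPSchemaV54` stores its schemas internally.

```python
# Hypothetical registry: CDM version -> table name -> ordered (column, type) pairs.
OMOP_SCHEMAS = {
    "5.4": {
        "concept": [
            ("concept_id", "int64"),
            ("concept_name", "string"),
            ("domain_id", "string"),
            ("vocabulary_id", "string"),
            ("concept_class_id", "string"),
            ("standard_concept", "string"),
            ("concept_code", "string"),
            ("valid_start_date", "date32"),
            ("valid_end_date", "date32"),
            ("invalid_reason", "string"),
        ],
    },
}


def get_table_schema(version, table):
    """Return the (column, type) pairs for one table; table names are case-insensitive."""
    try:
        return OMOP_SCHEMAS[version][table.lower()]
    except KeyError:
        raise KeyError(f"No schema for table {table!r} in CDM {version}") from None
```

Keeping the columns as an ordered list rather than a dict preserves the CDM column order, which is useful when writing files back out.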

2. Load Datasets

You can load datasets from a folder containing CSV files. The files are matched to the predefined schemas:

```python
from omop_schema.schema.v54 import OMOPSchemaV54

schema_v54 = OMOPSchemaV54()
datasets = schema_v54.load_csv_dataset("path/to/csv/folder")

# Access a specific table
concept_table = datasets["concept"]
print(concept_table)
```
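The README does not spell out the matching rule. A plausible reading is that each file name without its extension is compared with the known table names; the standard-library sketch below assumes exactly that. `match_csv_files` is hypothetical and is not the package's own loader.

```python
from pathlib import Path


def match_csv_files(folder, known_tables):
    """Map OMOP table names to CSV files in `folder` by lower-cased file stem.

    Only files ending in lower-case ".csv" are considered. Files whose stem is
    not a known table name are returned separately so they can be reported
    instead of silently ignored.
    """
    matched, unmatched = {}, []
    for path in sorted(Path(folder).glob("*.csv")):
        table = path.stem.lower()
        if table in known_tables:
            matched[table] = path
        else:
            unmatched.append(path.name)
    return matched, unmatched
```

Returning the unmatched files makes typos such as `persons.csv` visible rather than producing a silently incomplete dataset.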

3. Convert PyArrow Schema to Polars Schema

If Polars is installed, you can convert a PyArrow schema to a Polars-compatible schema:

```python
from omop_schema.utils import pyarrow_to_polars_schema
import pyarrow as pa

arrow_schema = pa.schema([
    pa.field("column1", pa.int64()),
    pa.field("column2", pa.string()),
])

polars_schema = pyarrow_to_polars_schema(arrow_schema)
print(polars_schema)
```
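Conceptually, a schema conversion is a per-field type lookup. The sketch below shows that idea with plain strings so it runs without pyarrow or polars installed; the mapping table and `to_polars_schema` are hypothetical, and the package's own helper operates on real schema objects and may map some types differently.

```python
# Hypothetical lookup from Arrow type names to Polars dtype names
# (pl.Int32, pl.Int64, pl.Float64, pl.String, pl.Boolean, pl.Date).
ARROW_TO_POLARS = {
    "int32": "Int32",
    "int64": "Int64",
    "float64": "Float64",
    "string": "String",
    "bool": "Boolean",
    "date32": "Date",
}


def to_polars_schema(arrow_fields):
    """Translate (name, arrow_type) pairs into a {name: polars_dtype_name} dict.

    Fails loudly on unmapped types instead of guessing a fallback dtype.
    """
    unsupported = [t for _, t in arrow_fields if t not in ARROW_TO_POLARS]
    if unsupported:
        raise ValueError(f"No Polars mapping for: {sorted(set(unsupported))}")
    return {name: ARROW_TO_POLARS[t] for name, t in arrow_fields}
```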

4. Convert PyArrow Schema to Pandas Schema

If Pandas is installed, you can convert a PyArrow schema to a Pandas-compatible schema:

```python
from omop_schema.utils import pyarrow_to_pandas_schema
import pyarrow as pa

arrow_schema = pa.schema([
    pa.field("column1", pa.int64()),
    pa.field("column2", pa.string()),
])

pandas_schema = pyarrow_to_pandas_schema(arrow_schema)
print(pandas_schema)
```
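Pandas needs slightly different handling than Polars: optional OMOP integer columns such as `month_of_birth` can be empty, and a plain NumPy `int64` column cannot hold missing values, while dates are usually parsed rather than cast. The sketch below, which is hypothetical and not the package's implementation, splits a schema into the `dtype` and `parse_dates` arguments accepted by `pandas.read_csv`, using pandas' nullable extension dtypes.

```python
# Hypothetical lookup from Arrow type names to pandas nullable dtype names.
ARROW_TO_PANDAS = {
    "int32": "Int32",
    "int64": "Int64",
    "float64": "Float64",
    "string": "string",
    "bool": "boolean",
}
# Arrow-style names treated as dates/timestamps in this sketch.
DATE_TYPES = {"date32", "timestamp"}


def to_pandas_read_options(arrow_fields):
    """Split (name, arrow_type) pairs into read_csv-style `dtype` and `parse_dates`."""
    dtype, parse_dates = {}, []
    for name, arrow_type in arrow_fields:
        if arrow_type in DATE_TYPES:
            parse_dates.append(name)  # parsed as dates, not cast via dtype=
        else:
            dtype[name] = ARROW_TO_PANDAS[arrow_type]  # KeyError for unmapped types
    return {"dtype": dtype, "parse_dates": parse_dates}
```

With pandas installed, the result could be passed as `pd.read_csv(path, **options)`.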

Optional Dependencies

  • Polars: For converting PyArrow schemas to Polars schemas.
  • Pandas: For converting PyArrow schemas to Pandas schemas.

OMOP Dataset Validation - Graphical Method

This section explains how to use the validate_omop_dataset_graphically function to validate an OMOP dataset and display the results as a table.

The validate_omop_dataset_graphically function validates an OMOP dataset against a predefined schema and displays the results in a graphical table format using the rich library. It also logs the validation process for real-time feedback.

```python
# validator: an OMOPValidator holding the expected schema
validate_omop_dataset_graphically(validator, dataset_path="path/to/dataset")
```

The function will:

  • Log the validation progress in real-time.
  • Display a graphical table summarizing:
    • Missing columns
    • Mismatched columns
    • Extra columns
    • Correct columns
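The four categories can be computed with simple dictionary comparisons. The standard-library sketch below assumes each table's expected and observed columns are available as name-to-type dicts; `compare_columns` is hypothetical and not the package's implementation. The test inputs reuse the column names from the example output in this README.

```python
def compare_columns(expected, actual):
    """Classify columns as missing, mismatched, extra, or correct.

    `expected` and `actual` map column name -> type name.
    Mismatched entries record (expected_type, actual_type).
    """
    missing = {c: t for c, t in expected.items() if c not in actual}
    extra = {c: t for c, t in actual.items() if c not in expected}
    mismatched = {
        c: (t, actual[c]) for c, t in expected.items()
        if c in actual and actual[c] != t
    }
    correct = {
        c: t for c, t in expected.items()
        if c in actual and actual[c] == t
    }
    return {"missing": missing, "mismatched": mismatched,
            "extra": extra, "correct": correct}
```

Every expected column lands in exactly one of missing, mismatched, or correct, while extra covers only columns the schema does not know about.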

Example Output

Console Log

```plaintext
2023-10-01 12:00:00 - INFO - Validating table: person
2023-10-01 12:00:01 - INFO - Validation results for table 'person': Missing: month_of_birth, Mismatched: None, Extra: extra_column_1, Correct: person_id, gender_concept_id
2023-10-01 12:00:01 - INFO - Validation process completed.
```
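The console lines follow a common `logging` layout. If you want your own scripts to log in the same style, a standard-library configuration like the following reproduces it; whether omop_schema uses this exact format string is an assumption, and `make_validation_logger` is a hypothetical helper.

```python
import logging

# Layout of the example log: "YYYY-MM-DD HH:MM:SS - LEVEL - message".
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_validation_logger(name="omop_validation"):
    """Return a logger that writes INFO and above to stderr in the layout above."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```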

Graphical Table

```plaintext
+------------+-----------------------+--------------------+---------------------+--------------------------+
| Table Name | Missing Columns       | Mismatched Columns | Extra Columns       | Correct Columns          |
+------------+-----------------------+--------------------+---------------------+--------------------------+
| person     | month_of_birth: int64 | None               | extra_column_1: str | person_id: int64         |
|            | day_of_birth: int64   |                    |                     | gender_concept_id: int64 |
+------------+-----------------------+--------------------+---------------------+--------------------------+
```


Parameters

| Parameter | Type | Description |
| --------------------------- | --------------- | ------------------------------------------------------------------------ |
| `validator` | `OMOPValidator` | The validator object containing the schema for validation. |
| `dataset_path` | `str \| Path` | Path to the dataset to validate. |
| `load_with_expected_schema` | `bool` | Whether to load the dataset with the expected schema. Default is `True`. |
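`load_with_expected_schema` decides whether column types are taken from the schema when the data is read (otherwise, presumably, they are inferred from the file). The standard-library sketch below illustrates the first behaviour by casting each CSV value according to an expected schema. The `CASTERS` table and `read_with_expected_schema` are hypothetical; the package itself loads data into PyArrow tables.

```python
import csv
import io
from datetime import date

# Hypothetical converters from schema type names to Python callables.
CASTERS = {
    "int64": int,
    "float64": float,
    "string": str,
    "date32": date.fromisoformat,
}


def read_with_expected_schema(text, schema):
    """Parse CSV text, casting each value to the type named in `schema`.

    `schema` maps column name -> type name. Empty or absent values become None,
    the usual treatment of missing OMOP values as NULL.
    Columns not listed in `schema` are kept as strings.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for column, value in raw.items():
            cast = CASTERS[schema[column]] if column in schema else str
            row[column] = None if value is None or value == "" else cast(value)
        rows.append(row)
    return rows
```

A value that cannot be cast (for example `abc` in an `int64` column) raises `ValueError`, which is the point at which a validator would flag a mismatched column.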


Notes

  • Ensure the dataset files are in a supported format (e.g., .csv, .parquet).
  • The function uses the rich library for graphical output and logging for real-time feedback.
  • Missing or mismatched columns are highlighted in the output for easy identification.

Troubleshooting

  • Missing Dependencies: Install the required libraries using pip install rich.
  • Invalid Dataset Path: Ensure the dataset path is correct and accessible.
  • Schema Issues: Verify the schema file is correctly defined and matches the dataset structure.

Contributing

Contributions are welcome! Please open an issue or submit a pull request on the GitHub repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Owner

  • Name: Robin van de Water
  • Login: rvandewater
  • Kind: user
  • Location: Berlin
  • Company: Hasso Plattner Institute

PhD student in Medical Event Prediction at Hasso Plattner Institute in collaboration with the Charité hospital (Berlin)

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "omop_schema: a package for working with the OMOP CDM in Python"
doi: "10.5281/zenodo.15132444"
authors:
  - family-names: "van de Water"
    given-names: "Robin P."
    orcid: "https://orcid.org/0000-0002-2895-4872"
date-released: "2025-02-19"
url: "https://github.com/rvandewater/omop_schema"
repository-code: "https://github.com/rvandewater/omop_schema"
license: "MIT"

GitHub Events

Total
  • Release event: 2
  • Public event: 1
  • Push event: 50
  • Create event: 4
Last Year
  • Release event: 2
  • Public event: 1
  • Push event: 50
  • Create event: 4