omop_schema
A basic package that provides the omop_schema to load schema-compliant OMOP CSV files, validate them, and cast non-compliant dataframes.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: rvandewater
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://rvandewater.github.io/omop_schema/
- Size: 5.76 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
OMOP Schema
omop_schema is a Python package designed to read, manage, and convert OMOP (Observational Medical Outcomes Partnership) data into the correct schema. It provides tools to handle OMOP CDM (Common Data Model) tables, convert schemas between different formats (e.g., PyArrow, Polars, Pandas), and load datasets efficiently.
Features
- Schema Management: Define and manage schemas for OMOP CDM tables for different versions (e.g., v5.3, v5.4).
- Schema Conversion: Convert PyArrow schemas to Polars or Pandas-compatible schemas.
- Dataset Loading: Load datasets from CSV files into PyArrow tables, ensuring they match the defined schema.
- Optional Dependencies: Support for Polars and Pandas as optional dependencies for schema conversion.
Installation
Install the package using pip:
```bash
pip install omop_schema
```
To include optional dependencies:
```bash
pip install omop_schema[polars]
pip install omop_schema[pandas]
pip install omop_schema[polars,pandas]
```
Usage
1. Define and Retrieve OMOP Schemas
The package provides predefined schemas for OMOP CDM tables. You can retrieve the schema for a specific table:
```python
from omop_schema.schema.v54 import OMOPSchemaV54

schema_v54 = OMOPSchemaV54()
concept_schema = schema_v54.get_pyarrow_schema("concept")
print(concept_schema)
```
2. Load Datasets
You can load datasets from a folder containing CSV files. The files are matched to the predefined schemas:
```python
from omop_schema.schema.v54 import OMOPSchemaV54

schema_v54 = OMOPSchemaV54()
datasets = schema_v54.load_csv_dataset("path/to/csv/folder")

# Access a specific table
concept_table = datasets["concept"]
print(concept_table)
```
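How files are matched to schemas is not spelled out here. The following is a minimal sketch of one plausible rule, assuming each CSV file is named after its table (for example `person.csv`) and that matching ignores case. The helper `match_csv_files_to_tables` and the table-name set are hypothetical and are not part of omop_schema.

```python
import tempfile
from pathlib import Path


def match_csv_files_to_tables(folder, known_tables):
    """Map table names to CSV paths, assuming the file stem is the table name."""
    known = {name.lower() for name in known_tables}
    matched = {}
    unmatched = []
    for path in sorted(Path(folder).glob("*.csv")):
        table = path.stem.lower()
        if table in known:
            matched[table] = path
        else:
            # Keep track of files that do not correspond to any known table.
            unmatched.append(path.name)
    return matched, unmatched


# Small demonstration in a temporary folder (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    for file_name in ("person.csv", "concept.csv", "notes.csv"):
        (Path(tmp) / file_name).write_text("id\n1\n")
    matched, unmatched = match_csv_files_to_tables(tmp, {"person", "concept"})
    print(sorted(matched), unmatched)
```

Reporting unmatched files separately makes it easier to spot misnamed exports before loading anything.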
3. Convert PyArrow Schema to Polars Schema
If Polars is installed, you can convert a PyArrow schema to a Polars-compatible schema:
```python
from omop_schema.utils import pyarrow_to_polars_schema
import pyarrow as pa

arrow_schema = pa.schema(
    [
        pa.field("column1", pa.int64()),
        pa.field("column2", pa.string()),
    ]
)

polars_schema = pyarrow_to_polars_schema(arrow_schema)
print(polars_schema)
```
4. Convert PyArrow Schema to Pandas Schema
If Pandas is installed, you can convert a PyArrow schema to a Pandas-compatible schema:
```python
from omop_schema.utils import pyarrow_to_pandas_schema
import pyarrow as pa

arrow_schema = pa.schema(
    [
        pa.field("column1", pa.int64()),
        pa.field("column2", pa.string()),
    ]
)

pandas_schema = pyarrow_to_pandas_schema(arrow_schema)
print(pandas_schema)
```
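Conceptually, both conversions amount to a per-field type lookup. The sketch below illustrates that idea with plain strings, so it runs without PyArrow, Polars, or Pandas installed. `TYPE_MAP` and `convert_schema` are illustrative assumptions, not the package's implementation; the target values are Polars dtype class names and pandas nullable dtype aliases.

```python
# Arrow type name -> (Polars dtype name, pandas nullable dtype alias).
# This table is an assumption for illustration and covers only a few types.
TYPE_MAP = {
    "int64": ("Int64", "Int64"),
    "double": ("Float64", "Float64"),
    "string": ("String", "string"),
    "bool": ("Boolean", "boolean"),
}


def convert_schema(fields, target):
    """Translate (name, arrow_type) pairs into a {name: target_type} dict."""
    column = {"polars": 0, "pandas": 1}[target]
    converted = {}
    for name, arrow_type in fields:
        if arrow_type not in TYPE_MAP:
            # Fail loudly rather than silently guessing a type.
            raise ValueError(f"No {target} mapping for Arrow type {arrow_type!r}")
        converted[name] = TYPE_MAP[arrow_type][column]
    return converted


fields = [("column1", "int64"), ("column2", "string")]
print(convert_schema(fields, "polars"))
print(convert_schema(fields, "pandas"))
```

Raising on an unmapped type is a deliberate choice in this sketch: a silent fallback could hide schema drift.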
Optional Dependencies
- Polars: For converting PyArrow schemas to Polars schemas.
- Pandas: For converting PyArrow schemas to Pandas schemas.
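Because both libraries are optional, it can help to check which ones are importable before calling a conversion function. A minimal sketch using only the standard library; `available_optional_backends` is a hypothetical helper, not part of omop_schema:

```python
from importlib.util import find_spec


def available_optional_backends(candidates=("polars", "pandas")):
    """Return a dict telling which optional packages can be imported."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: find_spec(name) is not None for name in candidates}


print(available_optional_backends())
```

`find_spec` only locates the module without importing it, so the check stays cheap.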
OMOP Dataset Validation - Graphical Method
This documentation provides an overview of how to use the validate_omop_dataset_graphically function to validate OMOP datasets and display the results in a graphical format.
The validate_omop_dataset_graphically function validates an OMOP dataset against a predefined schema and displays the results in a graphical table format using the rich library. It also logs the validation process for real-time feedback.
```python
validate_omop_dataset_graphically(validator, dataset_path="path/to/dataset")
```
The function will:
- Log the validation progress in real-time.
- Display a graphical table summarizing:
- Missing columns
- Mismatched columns
- Extra columns
- Correct columns
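The four categories above can be illustrated with plain dictionaries. This is a minimal sketch of the classification logic only, assuming a schema is a mapping from column name to type name; `compare_columns` is a hypothetical helper and does not reflect how the package implements validation.

```python
def compare_columns(expected, actual):
    """Classify columns as missing, mismatched, extra, or correct."""
    report = {"missing": {}, "mismatched": {}, "extra": {}, "correct": {}}
    for name, dtype in expected.items():
        if name not in actual:
            report["missing"][name] = dtype
        elif actual[name] != dtype:
            # Record both the expected and the observed type.
            report["mismatched"][name] = (dtype, actual[name])
        else:
            report["correct"][name] = dtype
    for name, dtype in actual.items():
        if name not in expected:
            report["extra"][name] = dtype
    return report


expected = {
    "person_id": "int64",
    "gender_concept_id": "int64",
    "month_of_birth": "int64",
    "day_of_birth": "int64",
}
actual = {"person_id": "int64", "gender_concept_id": "int64", "extra_column_1": "str"}

for category, columns in compare_columns(expected, actual).items():
    print(category, columns)
```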
Example Output
Console Log
```plaintext
2023-10-01 12:00:00 - INFO - Validating table: person
2023-10-01 12:00:01 - INFO - Validation results for table 'person': Missing: month_of_birth, Mismatched: None, Extra: extra_column_1, Correct: person_id, gender_concept_id
2023-10-01 12:00:01 - INFO - Validation process completed.
```
Graphical Table
```plaintext
+-------------+-----------------------------+-----------------------------------+----------------------+----------------------+
| Table Name  | Missing Columns             | Mismatched Columns                | Extra Columns        | Correct Columns      |
+-------------+-----------------------------+-----------------------------------+----------------------+----------------------+
| person      | month_of_birth: int64       | None                              | extra_column_1: str  | person_id: int64     |
|             | day_of_birth: int64         |                                   |                      | gender_concept_id:   |
|             |                             |                                   |                      | int64                |
+-------------+-----------------------------+-----------------------------------+----------------------+----------------------+
```
Parameters
| Parameter | Type | Description |
| --------------------------- | --------------- | ------------------------------------------------------------------------ |
| validator | OMOPValidator | The validator object containing the schema for validation. |
| dataset_path | `str \| Path` | Path to the dataset to validate. |
| load_with_expected_schema | bool | Whether to load the dataset with the expected schema. Default is True. |
Notes
- Ensure the dataset files are in a supported format (e.g., `.csv`, `.parquet`).
- The function uses the `rich` library for graphical output and `logging` for real-time feedback.
- Missing or mismatched columns are highlighted in the output for easy identification.
Troubleshooting
- Missing Dependencies: Install the required libraries using `pip install rich`.
- Invalid Dataset Path: Ensure the dataset path is correct and accessible.
- Schema Issues: Verify the schema file is correctly defined and matches the dataset structure.
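A quick pre-flight check can rule out the path and format problems before running validation. The sketch below is an assumption-laden illustration: `find_dataset_files` is a hypothetical helper, and the suffix set simply mirrors the formats named in the notes above.

```python
import tempfile
from pathlib import Path

SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def find_dataset_files(dataset_path):
    """Return supported data files at dataset_path, or raise if the path is missing."""
    root = Path(dataset_path)
    if not root.exists():
        raise FileNotFoundError(f"Dataset path does not exist: {root}")
    # Accept either a single file or a folder of files.
    candidates = [root] if root.is_file() else sorted(root.iterdir())
    return [p for p in candidates if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES]


# Small demonstration in a temporary folder (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    for file_name in ("person.csv", "visit.parquet", "notes.txt"):
        (Path(tmp) / file_name).write_text("")
    print([p.name for p in find_dataset_files(tmp)])
```

An empty result for an existing folder usually points to an unsupported file format rather than a wrong path.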
Contributing
Contributions are welcome! Please open an issue or submit a pull request on the GitHub repository.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Owner
- Name: Robin van de Water
- Login: rvandewater
- Kind: user
- Location: Berlin
- Company: Hasso Plattner Institute
- Website: https://www.rpvandewater.com/
- Repositories: 1
- Profile: https://github.com/rvandewater
PhD student in Medical Event Prediction at Hasso Plattner Institute in collaboration with the Charité hospital (Berlin)
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "omop_schema: a package for working with the OMOP CDM in Python"
doi: "10.5281/zenodo.15132444"
authors:
- family-names: "van de Water"
given-names: "Robin P."
orcid: "https://orcid.org/0000-0002-2895-4872"
date-released: "2025-02-19"
url: "https://github.com/rvandewater/omop_schema"
repository-code: "https://github.com/rvandewater/omop_schema"
license: "MIT"
GitHub Events
Total
- Release event: 2
- Public event: 1
- Push event: 50
- Create event: 4
Last Year
- Release event: 2
- Public event: 1
- Push event: 50
- Create event: 4