rtx-kg2-gateway
Enabling RTX-KG2 data access through various means.
Science Score: 52.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: organization cu-dbmi has institutional domain (medschool.cuanschutz.edu)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CU-DBMI
- License: bsd-3-clause
- Language: Jupyter Notebook
- Default Branch: main
- Size: 653 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 2
- Releases: 1
Metadata Files
README.md
RTX-KG2 Gateway
Enabling RTX-KG2 data access through various means.
Overview
RTX-KG2 provides a knowledge graph composed of many different data sources. The output data from the RTX-KG2 project can benefit from the use of additional specialized graph database tools for analysis purposes. Please find a brief overview of these technologies below for a better understanding of how they're used in context with the RTX-KG2 data.
Graph Database Technologies
- Kuzu: Kuzu is an embeddable property graph database system which provides querying capabilities through Cypher. Kuzu includes a Python package and related API which enables local queries.
- See rtx-kg2-gateway-kuzu-database-details.md for more information on the database schema and data.
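As a rough illustration of local querying through Kuzu's Python package, the sketch below opens a database directory, runs a Cypher statement, and collects the rows. The names `fetch_rows` and `run_query` are helpers invented here for illustration and are not part of this repository; the database path and the query are supplied by the caller, since the actual schema is described in the database details document.

```python
from typing import Any, List


def fetch_rows(result: Any) -> List[list]:
    """Collect all rows from a Kuzu-style query result.

    Relies only on the has_next()/get_next() methods of the result object.
    """
    rows = []
    while result.has_next():
        rows.append(result.get_next())
    return rows


def run_query(database_path: str, query: str) -> List[list]:
    """Open a Kuzu database directory and run a single Cypher query."""
    # Imported here so fetch_rows stays usable without kuzu installed.
    import kuzu

    db = kuzu.Database(database_path)
    conn = kuzu.Connection(db)
    return fetch_rows(conn.execute(query))
```

For example, `run_query("path/to/database", "CALL show_tables() RETURN *;")` should list the tables in a database, assuming a Kuzu release that provides the `show_tables` procedure; the path shown is a placeholder.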
Installation
Python
Usage of the contents found within this repository depends on Python being available on the system.
One suggested way to use and manage Python is through pyenv (there are many other ways too!).
Please reference the pyproject.toml file for more information on Python versions which are compatible with this project.
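A quick way to check whether the active interpreter falls inside the supported range is sketched below. The bounds are an assumption taken from the `python >=3.10,<3.13` constraint in the dependency listing; `pyproject.toml` remains the authoritative source, and `is_supported` is a helper named here for illustration.

```python
import sys
from typing import Tuple

# Assumed range, based on a ">=3.10,<3.13" constraint;
# confirm against pyproject.toml before relying on it.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported(version: Tuple[int, int]) -> bool:
    """Return True when (major, minor) falls inside the assumed range."""
    return MIN_VERSION <= version < MAX_VERSION_EXCLUSIVE


current = sys.version_info[:2]
status = "supported" if is_supported(current) else "outside the assumed range"
print(f"Python {current[0]}.{current[1]} is {status}")
```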
Poetry environment
Please use Poetry to install and run the Python environment for this project.
The Poetry environment for this project includes dependencies which help run IDE environments, manage the data, and run workflows.
See the Poetry documentation for more information about installing Poetry within your environment.
```bash
# context: within the root of the repository
# after installing poetry, create the environment
poetry install
```
Development
Running and updating Jupyter notebooks
Please follow installation steps above and then use a related Jupyter environment to open and explore the notebooks under the notebooks directory.
These notebooks leverage Jupyter Lab extensions (such as jupytext) through the related Poetry environment for this repository.
Using the notebooks outside of Jupyter Lab may yield a different experience.
```bash
# context: within the root of the repository
# after creating poetry environment, run jupyter
poetry run jupyter lab
```
Executing sequences of Python modules as tasks
We use Poe the Poet to run tasks defined within pyproject.toml under the [tool.poe.tasks*] sections.
This makes it possible to define a workflow of tasks that runs multiple procedures in sequence.
For example, use the following to run the notebook_sample_data_generation task:
```bash
# context: within the root of the repository
# run the notebook_sample_data_generation task using poethepoet, as defined within pyproject.toml
poetry run poe notebook_sample_data_generation
```
Existing tasks:
- notebook_sample_data_generation: generates a sample parquet dataset and adds it to a Kuzu database.
- notebook_full_data_generation: generates the full dataset and adds it to a Kuzu database.
- notebook_full_data_generation_with_metanames: generates the full dataset with metanames specificity and adds it to a Kuzu database in a similar fashion.
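To see which tasks a given pyproject.toml declares without running any of them, the following sketch parses the TOML and reads the `tool.poe.tasks` table. The `EXAMPLE` excerpt and its `shell` bodies are hypothetical and written only for illustration; they are not copied from this repository's pyproject.toml, and `list_poe_tasks` is a helper named here for the example.

```python
from typing import List

try:
    import tomllib  # standard library from Python 3.11 onward
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API


def list_poe_tasks(pyproject_text: str) -> List[str]:
    """Return the sorted task names declared under [tool.poe.tasks]."""
    config = tomllib.loads(pyproject_text)
    tasks = config.get("tool", {}).get("poe", {}).get("tasks", {})
    return sorted(tasks)


# Hypothetical excerpt for illustration only; the task bodies are invented.
EXAMPLE = """
[tool.poe.tasks.notebook_sample_data_generation]
shell = "echo 'run the sample data notebook here'"

[tool.poe.tasks.notebook_full_data_generation]
shell = "echo 'run the full data notebook here'"
"""

print(list_poe_tasks(EXAMPLE))
```

To inspect the real file, pass the contents of pyproject.toml read from the repository root in place of `EXAMPLE`.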
Citation and Acknowledgements
Data used by this repo includes RTX-KG2, which was published at the NCATS Biomedical Data Translator repository. Special thanks go to those mentioned in the RTX-KG2 credits. Further data acknowledgments may be found within the data sources documentation.
Owner
- Name: University of Colorado Department of Biomedical Informatics
- Login: CU-DBMI
- Kind: organization
- Location: University of Colorado, School of Medicine, Anschutz Medical Campus
- Website: https://medschool.cuanschutz.edu/dbmi
- Repositories: 34
- Profile: https://github.com/CU-DBMI
Citation (CITATION.cff)
# CITATION.cff file for software work and references.
---
cff-version: 1.2.0
title: rtx-kg2-gateway
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: David
    family-names: Bunten
    orcid: 'https://orcid.org/0000-0001-6041-3665'
  - given-names: Negar
    family-names: Janani
    orcid: 'https://orcid.org/0009-0000-7308-926X'
  - given-names: Faisal
    family-names: Alquaddoomi
    orcid: 'https://orcid.org/0000-0003-4297-8747'
  - given-names: Casey
    family-names: Greene
    orcid: 'https://orcid.org/0000-0001-8713-9213'
repository-code: 'https://github.com/CU-DBMI/rtx-kg2-gateway'
url: 'https://github.com/CU-DBMI/rtx-kg2-gateway'
abstract: >-
  Enabling RTX-KG2 data access through various means.
keywords:
  - python
  - knowledge-graph
  - rtx-kg2
license: BSD-3-Clause
references:
  - title: >-
      RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine
    type: article
    url: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04932-3
    authors:
      - family-names: Wood
        given-names: E. C.
      - family-names: Glen
        given-names: Amy K.
      - family-names: Kvarfordt
        given-names: Lindsey G.
      - family-names: Womack
        given-names: Finn
      - family-names: Acevedo
        given-names: Liliana
      - family-names: Yoon
        given-names: Timothy S.
      - family-names: Ma
        given-names: Chunyu
      - family-names: Flores
        given-names: Veronica
      - family-names: Sinha
        given-names: Meghamala
      - family-names: Chodpathumwan
        given-names: Yodsawalai
      - family-names: Termehchy
        given-names: Arash
      - family-names: Roach
        given-names: Jared C.
      - family-names: Mendoza
        given-names: Luis
      - family-names: Hoffman
        given-names: Andrew S.
      - family-names: Deutsch
        given-names: Eric W.
      - family-names: Koslicki
        given-names: David
      - family-names: Ramsey
        given-names: Stephen A.
    date-released: 2022-09-29
    identifiers:
      - type: doi
        value: 10.1186/s12859-022-04932-3
    notes: >-
      RTX-KG2 data is referenced from https://github.com/ncats/translator-lfs-artifacts as part of this project.
  - title: >-
      Kùzu Graph Database Management System
    type: software
    url: https://github.com/kuzudb/kuzu
    authors:
      - family-names: Feng
        given-names: Xiyang
      - family-names: Jin
        given-names: Guodong
      - family-names: Chen
        given-names: Ziyi
      - family-names: Liu
        given-names: Chang
      - family-names: Salihoğlu
        given-names: Semih
  - title: DuckDB
    type: software
    url: https://github.com/duckdb/duckdb
    authors:
      - family-names: Raasveldt
        given-names: Mark
      - family-names: Muehleisen
        given-names: Hannes
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- pre-commit/action v3.0.0 composite
- 148 dependencies
- black ^24.0.1 develop
- isort ^5.13.0 develop
- jupyterlab ^4.0.0 develop
- jupyterlab-code-formatter ^2.2.1 develop
- jupytext ^1.16.0 develop
- awkward ^2.6.0
- duckdb ^0.10.0
- genson ^1.2.2
- ijson ^3.2.3
- kuzu ^0.3.0
- pandas ^2.2.0
- poethepoet ^0.24.4
- poetry ^1.7.1
- pyarrow ^15.0.0
- python >=3.10,<3.13
- requests ^2.31.0
- tabulate ^0.9.0