https://github.com/cancervariants/therapy-normalization

Services and guidelines for normalizing drug and other therapy terms


Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    3 of 13 committers (23.1%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary

Keywords

bioinformatics bioinformatics-data precision-medicine

Keywords from Contributors

biomedical-informatics genetics genomics obofoundry disease-classification labels
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 12
  • Watchers: 4
  • Forks: 3
  • Open Issues: 54
  • Releases: 32
Topics
bioinformatics bioinformatics-data precision-medicine
Created over 6 years ago · Last pushed 7 months ago
Metadata Files
Readme License

README.md

Thera-Py

DOI: 10.1093/jamiaopen/ooad093

Thera-Py normalizes free-text names and references for drugs and other biomedical therapeutics to stable, unambiguous concept identifiers to support genomic knowledge harmonization.


Live OpenAPI service


Installation

Install from PyPI:

```shell
python3 -m pip install thera-py
```

Usage

Deploying DynamoDB Locally

We use Amazon DynamoDB for data storage. To deploy locally, follow these instructions.

Setting Environment Variables

RxNorm requires a UMLS license, which you can register for here. You must set the UMLS_API_KEY environment variable to your API key, which can be found in the UTS 'My Profile' area after signing in.

```shell
export UMLS_API_KEY=12345-6789-abcdefg-hijklmnop  # make sure to replace with your key!
```

HemOnc.org data requires a Harvard Dataverse API token. Create a user account on the Harvard Dataverse website; you can follow these instructions to create an account and generate an API token. Once you have an API token, set the following environment variable:

```shell
export HARVARD_DATAVERSE_API_KEY=12345-6789-abcdefgh-hijklmnop  # make sure to replace with your key!
```
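Before running an update, it can help to confirm that these credentials are actually present. The following is a minimal Python sketch, not part of Thera-Py: missing_credentials and its source-to-variable mapping are hypothetical, derived only from the two requirements described in this section (RxNorm needs UMLS_API_KEY, HemOnc needs HARVARD_DATAVERSE_API_KEY).

```python
import os

# Hypothetical mapping taken from this README, not from thera-py itself:
# which update sources need which environment variable.
REQUIRED_KEYS = {
    "rxnorm": "UMLS_API_KEY",
    "hemonc": "HARVARD_DATAVERSE_API_KEY",
}


def missing_credentials(sources, env=None):
    """Return the names of required variables that are unset or empty.

    Only variable names are reported; key values are never printed.
    """
    env = os.environ if env is None else env
    missing = []
    for source in sources:
        var = REQUIRED_KEYS.get(source.lower())
        if var and not env.get(var):
            missing.append(var)
    return missing


if __name__ == "__main__":
    gaps = missing_credentials(["rxnorm", "hemonc", "chembl"])
    if gaps:
        print("Set these variables before updating:", ", ".join(gaps))
```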

Update source(s)

The Therapy Normalizer currently aggregates therapy data from:

  • ChEMBL
  • ChemIDPlus
  • DrugBank (using CC0 data only)
  • Drugs@FDA
  • The IUPHAR/BPS Guide to PHARMACOLOGY
  • HemOnc.org (using CC-BY data only)
  • The National Cancer Institute Thesaurus
  • RxNorm
  • Wikidata

Direct data management requires installation of the etl dependency group:

```shell
python3 -m pip install 'thera-py[etl]'
```

To update source(s), pass them as arguments to the command thera-py update. For example, the following command updates ChEMBL and Wikidata:

```shell
thera-py update chembl wikidata
```

You can update all sources at once with the --all flag:

```shell
thera-py update --all
```
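If you drive updates from a Python script, the command line can be assembled from a list of sources. This is a hypothetical helper (build_update_command is not a Thera-Py API); it only uses the arguments and flags that appear in this README (source names, --all, --normalize, --db_url) and does not execute anything. Whether every flag combination is accepted in a single call is not stated here, so check the CLI's help output before relying on one.

```python
import shlex


def build_update_command(sources=(), update_all=False, normalize=False, db_url=None):
    """Assemble argv for `thera-py update` without running it (hypothetical helper)."""
    if update_all and sources:
        raise ValueError("pass either explicit sources or update_all, not both")
    if not (update_all or sources or normalize):
        raise ValueError("nothing to update")
    argv = ["thera-py", "update"]
    argv.extend(sources)  # e.g. "chembl", "wikidata"
    if update_all:
        argv.append("--all")
    if normalize:
        argv.append("--normalize")  # rebuild merged concept groups
    if db_url:
        argv.append(f"--db_url={db_url}")
    return argv


# Prints the shell-quoted form of the README's first example.
print(shlex.join(build_update_command(["chembl", "wikidata"])))
```

To run the assembled command, pass the list to subprocess.run(argv, check=True) from the standard library; passing a list rather than a string avoids shell quoting issues.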

Thera-Py can retrieve all required data itself, using the wags-tails library. By default, data will be housed under ~/.local/share/wags_tails/ in a format like the following:

```
~/.local/share/wags_tails
├── chembl
│   └── chembl_27.db
├── chemidplus
│   └── chemidplus_20200327.xml
├── drugbank
│   └── drugbank_5.1.8.csv
├── guidetopharmacology
│   ├── guidetopharmacology_ligand_id_mapping_2021.3.tsv
│   └── guidetopharmacology_ligands_2021.3.tsv
├── hemonc
│   ├── hemonc_concepts_20210225.csv
│   ├── hemonc_rels_20210225.csv
│   └── hemonc_synonyms_20210225.csv
├── ncit
│   └── ncit_20.09d.owl
├── rxnorm
│   ├── rxnorm_drug_forms_20210104.yaml
│   └── rxnorm_20210104.RRF
└── wikidata
    └── wikidata_20210425.json
```
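To see which releases are already on disk (for example, before removing an incompatible one as described in the FAQ), a short standard-library sketch can walk this layout. It assumes only the default location and the one-subdirectory-per-source structure shown here; list_source_files is a hypothetical name, and the sketch does not consult wags-tails itself.

```python
from pathlib import Path

# Default location stated in this README; this sketch does not check for overrides.
DEFAULT_DATA_DIR = Path.home() / ".local" / "share" / "wags_tails"


def list_source_files(data_dir=DEFAULT_DATA_DIR):
    """Map each source subdirectory name to the sorted file names it contains."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return {}
    inventory = {}
    for source_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        inventory[source_dir.name] = sorted(
            f.name for f in source_dir.iterdir() if f.is_file()
        )
    return inventory


if __name__ == "__main__":
    for source, files in list_source_files().items():
        print(source, files)
```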

Updates to the HemOnc source depend on the Disease Normalizer service. If the Disease Normalizer database appears to be empty or incomplete, updates to HemOnc will also trigger a refresh of the Disease Normalizer database. See its README for additional data requirements.

Create Merged Concept Groups

The /normalize endpoint relies on merged concept groups. The --normalize flag generates these groups:

```shell
thera-py update --normalize
```

Specifying the database URL endpoint

The default URL endpoint is http://localhost:8000. There are two different ways to specify the database URL endpoint.

The first way is to set the --db_url flag to the URL endpoint.

```shell
thera-py update --all --db_url=http://localhost:8001
```

The second way is to set the environment variable THERAPY_NORM_DB_URL to the URL endpoint.

```shell
export THERAPY_NORM_DB_URL="http://localhost:8001"
thera-py update --all
```
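The two mechanisms can be combined into a single lookup in a wrapper script. The README does not say which one wins when both are set, so the precedence below (explicit --db_url value, then THERAPY_NORM_DB_URL, then the documented default http://localhost:8000) is an assumption, and resolve_db_url is a hypothetical helper rather than Thera-Py's own logic.

```python
import os

# Default stated in this README.
DEFAULT_DB_URL = "http://localhost:8000"


def resolve_db_url(cli_value=None, env=None):
    """Choose a DynamoDB endpoint: explicit value, then env var, then default.

    Assumed precedence; Thera-Py's actual resolution order is not documented here.
    """
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    return env.get("THERAPY_NORM_DB_URL") or DEFAULT_DB_URL
```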

Starting the therapy normalization service

From the project root, run the following:

```shell
uvicorn therapy.main:app --reload
```

Next, view the OpenAPI docs on your local machine:

http://127.0.0.1:8000/therapy
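Once the service is running, the /normalize endpoint can be queried over HTTP. The sketch below uses only the Python standard library and assumes two things this README does not state: that the route lives under the /therapy prefix (as the docs URL suggests) and that the search term is passed in a q query parameter. Confirm both against the local OpenAPI docs; normalize_url and fetch_normalized are hypothetical helpers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"


def normalize_url(term, base_url=BASE_URL):
    """Build a normalize query URL (assumed route and parameter name)."""
    return f"{base_url.rstrip('/')}/therapy/normalize?{urlencode({'q': term})}"


def fetch_normalized(term, base_url=BASE_URL, timeout=10):
    """Query a running local service and decode its JSON response."""
    with urlopen(normalize_url(term, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, fetch_normalized("imatinib") returns the decoded JSON body; the structure of that response is described on the OpenAPI docs page rather than here.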

FAQ

A data import method raised a SourceFormatError instance. How do I proceed?

Thera-Py will automatically try to acquire the latest version of data for each source, but sometimes sources alter the structure of their data (e.g. adding or removing CSV columns). If you encounter this error while importing data, please notify us by creating a new issue if one doesn't already exist, and we will attempt to resolve it.

In the meantime, you can force Thera-Py to use an older data release: remove the incompatible version from the source's data folder, manually download an older version of the data and place it there following the structure described above, and then call the CLI with the --use_existing argument.

Citation

If you use Thera-Py in scientific works, please cite the following article:

Matthew Cannon, James Stevenson, Kori Kuzma, Susanna Kiwala, Jeremy L Warner, Obi L Griffith, Malachi Griffith, Alex H Wagner, Normalization of drug and therapeutic concepts with Thera-Py, JAMIA Open, Volume 6, Issue 4, December 2023, ooad093, https://doi.org/10.1093/jamiaopen/ooad093

Development

Clone the repo and create a virtual environment:

```shell
git clone https://github.com/cancervariants/therapy-normalization
cd therapy-normalization
python3 -m virtualenv venv
source venv/bin/activate
```

Install development dependencies and pre-commit:

```shell
python3 -m pip install -e '.[dev,tests]'
pre-commit install
```

Check style with ruff:

```shell
python3 -m ruff format . && python3 -m ruff check --fix .
```

Run tests with pytest:

```shell
python3 -m pytest
```

By default, tests will employ an existing DynamoDB database. For test environments where this is unavailable (e.g. in CI), the THERAPY_TEST environment variable can be set to initialize a local DynamoDB instance with miniature versions of input data files before tests are executed.

```shell
export THERAPY_TEST=true
```

Sometimes, sources will update their data, and our test fixtures and data will become incorrect. The tests/scripts/ subdirectory includes scripts to rebuild data files, although most fixtures will need to be updated manually.

Owner

  • Name: VICC
  • Login: cancervariants
  • Kind: organization

The Variant Interpretation for Cancer Consortium

GitHub Events

Total
  • Create event: 39
  • Release event: 6
  • Issues event: 20
  • Watch event: 2
  • Delete event: 26
  • Issue comment event: 28
  • Push event: 71
  • Pull request review comment event: 10
  • Pull request review event: 43
  • Pull request event: 60
Last Year
  • Create event: 39
  • Release event: 6
  • Issues event: 20
  • Watch event: 2
  • Delete event: 26
  • Issue comment event: 28
  • Push event: 71
  • Pull request review comment event: 10
  • Pull request review event: 43
  • Pull request event: 60

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 1,305
  • Total Committers: 13
  • Avg Commits per committer: 100.385
  • Development Distribution Score (DDS): 0.285
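The average and the Development Distribution Score (DDS) can be reproduced from the commit counts on this page. Assuming DDS is defined as 1 minus the top committer's share of all commits (the usual ecosyste.ms definition), a quick Python check gives:

```python
# Figures copied from the "All Time" and "Top Committers" lists on this page.
total_commits = 1305
total_committers = 13
top_committer_commits = 933  # most commits by any single listed committer

avg_commits = total_commits / total_committers
# Assumed DDS definition: 1 minus the most active committer's share of commits.
dds = 1 - top_committer_commits / total_commits

print(f"avg={avg_commits:.3f} dds={dds:.3f}")
```

This prints avg=100.385 dds=0.285, matching the listed values. A DDS near 0 means nearly all commits come from a single person.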
Top Committers
  • James Stevenson (j****n@n****g): 933 commits
  • korikuzma (k****a@g****m): 287 commits
  • Alex H. Wagner, PhD (A****r@n****g): 32 commits
  • Alex H. Wagner, PhD (a****4@w****u): 17 commits
  • Kori Kuzma (4****a@u****m): 8 commits
  • Susanna Kiwala (s****a@w****u): 8 commits
  • mcannon068nw (M****2@n****g): 8 commits
  • dependabot[bot] (4****]@u****m): 4 commits
  • Alex H. Wagner, PhD (a@a****o): 3 commits
  • Jeremy Warner (j****r@v****g): 2 commits
  • Kuzma (k****2@r****g): 1 commit
  • James Stevenson (j****n@g****m): 1 commit
  • Brian Walsh (w****r@o****u): 1 commit

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 94
  • Total pull requests: 149
  • Average time to close issues: 7 months
  • Average time to close pull requests: 10 days
  • Total issue authors: 5
  • Total pull request authors: 4
  • Average comments per issue: 0.7
  • Average comments per pull request: 0.25
  • Merged pull requests: 126
  • Bot issues: 0
  • Bot pull requests: 7
Past Year
  • Issues: 13
  • Pull requests: 50
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 5 days
  • Issue authors: 3
  • Pull request authors: 2
  • Average comments per issue: 0.69
  • Average comments per pull request: 0.22
  • Merged pull requests: 41
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jsstevenson (58)
  • korikuzma (23)
  • ahwagner (5)
  • mcannon068nw (4)
  • wesleygoar (1)
Pull Request Authors
  • jsstevenson (131)
  • korikuzma (42)
  • dependabot[bot] (7)
  • bwalsh (1)
Top Labels
Issue Labels
bug (22) enhancement (11) priority:high (11) data-cleaning (11) data-structure (9) priority:low (8) ux (8) priority:medium (7) test (7) new-source (7) stale-exempt (4) documentation (4) performance (4) RxNorm (4) build (3) analysis (2) ChemIDplus (2) DrugBank (2) technical debt (2) Wikidata (2) duplicate (1) wontfix (1) GuideToPharmacology (1) ChEMBL (1) NCIt (1) requirement (1) cleanup (1)
Pull Request Labels
priority:low (80) priority:medium (30) priority:high (20) bug (18) build (15) enhancement (6) documentation (6) cleanup (5) dependencies (4) chore (3) test (2) ci/cd (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 1,931 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 2
  • Total versions: 43
  • Total maintainers: 3
pypi.org: thera-py

VICC normalization routines for therapeutics

  • Homepage: https://github.com/cancervariants/therapy-normalization
  • Documentation: https://github.com/cancervariants/therapy-normalization
  • License: MIT License Copyright (c) 2020-2024 VICC Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  • Latest release: 0.11.0
    published 7 months ago
  • Versions: 43
  • Dependent Packages: 1
  • Dependent Repositories: 2
  • Downloads: 1,931 Last month
Rankings
Dependent packages count: 4.8%
Dependent repos count: 11.5%
Average: 14.0%
Forks count: 16.9%
Downloads: 18.3%
Stargazers count: 18.5%
Maintainers (3)
Last synced: 6 months ago