https://github.com/cancervariants/therapy-normalization
Services and guidelines for normalizing drug and other therapy terms
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 7 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 3 of 13 committers (23.1%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cancervariants
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://normalize.cancervariants.org/therapy/
- Size: 3.36 MB
Statistics
- Stars: 12
- Watchers: 4
- Forks: 3
- Open Issues: 54
- Releases: 32
Metadata Files
README.md
Thera-Py
Thera-Py normalizes free-text names and references for drugs and other biomedical therapeutics to stable, unambiguous concept identifiers to support genomic knowledge harmonization.
Installation
Install from PyPI:
shell
python3 -m pip install thera-py
Usage
Deploying DynamoDB Locally
We use Amazon DynamoDB for data storage. To deploy locally, follow these instructions.
Setting Environment Variables
RxNorm requires a UMLS license, which you can register for here.
You must set the UMLS_API_KEY environment variable to your API key, which can be found in the UTS 'My Profile' area after signing in.
shell script
export UMLS_API_KEY=12345-6789-abcdefg-hijklmnop # make sure to replace with your key!
HemOnc.org data requires a Harvard Dataverse API token. Follow these instructions to create a user account on the Harvard Dataverse website and generate an API token. Once you have a token, set the following environment variable:
shell script
export HARVARD_DATAVERSE_API_KEY=12345-6789-abcdefgh-hijklmnop # make sure to replace with your key!
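Before starting a long update, it can help to confirm that both credentials are visible to the process. The following is a minimal sketch and not part of Thera-Py: the helper name `missing_credentials` and the `REQUIRED_KEYS` mapping are hypothetical, and only the two variable names are taken from the text above.

```python
import os
from collections.abc import Mapping

# Variable names come from the README; the mapping to sources follows the text above.
REQUIRED_KEYS = {
    "UMLS_API_KEY": "RxNorm",
    "HARVARD_DATAVERSE_API_KEY": "HemOnc.org",
}


def missing_credentials(env: Mapping[str, str]) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Report each missing variable along with the source that depends on it.
    for name in missing_credentials(os.environ):
        print(f"{name} is not set; {REQUIRED_KEYS[name]} updates will likely fail")
```

Passing the environment as an argument rather than reading `os.environ` inside the function keeps the check easy to exercise in tests.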
Update source(s)
The Therapy Normalizer currently aggregates therapy data from:
* ChEMBL
* ChemIDPlus
* DrugBank (using CC0 data only)
* Drugs@FDA
* The IUPHAR/BPS Guide to PHARMACOLOGY
* HemOnc.org (using CC-BY data only)
* The National Cancer Institute Thesaurus
* RxNorm
* Wikidata
Direct data management requires installation of the etl dependency group:
shell
python3 -m pip install 'thera-py[etl]'
To update source(s), pass them as arguments to the command thera-py update. For example, the following command updates ChEMBL and Wikidata:
commandline
thera-py update chembl wikidata
You can update all sources at once with the --all flag:
commandline
thera-py update --all
Thera-Py can retrieve all required data itself, using the wags-tails library. By default, data will be housed under ~/.local/share/wags_tails/ in a format like the following:
~/.local/share/wags_tails
├── chembl
│   └── chembl_27.db
├── chemidplus
│   └── chemidplus_20200327.xml
├── drugbank
│   └── drugbank_5.1.8.csv
├── guidetopharmacology
│   ├── guidetopharmacology_ligand_id_mapping_2021.3.tsv
│   └── guidetopharmacology_ligands_2021.3.tsv
├── hemonc
│   ├── hemonc_concepts_20210225.csv
│   ├── hemonc_rels_20210225.csv
│   └── hemonc_synonyms_20210225.csv
├── ncit
│   └── ncit_20.09d.owl
├── rxnorm
│   ├── rxnorm_drug_forms_20210104.yaml
│   └── rxnorm_20210104.RRF
└── wikidata
    └── wikidata_20210425.json
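To check which data releases are present locally, the layout above can be walked with the standard library. This is a rough sketch, not a wags-tails API: `index_data_files` is a hypothetical helper, and it assumes every file follows the `<name>_<version>.<extension>` pattern seen in the listing.

```python
from pathlib import Path


def index_data_files(root: Path) -> dict[str, dict[str, str]]:
    """Map each source directory to {file name: version string}."""
    index: dict[str, dict[str, str]] = {}
    for source_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        versions: dict[str, str] = {}
        for data_file in sorted(source_dir.iterdir()):
            if not data_file.is_file():
                continue
            # Drop the final extension, then keep the text after the last underscore.
            stem = data_file.name.rsplit(".", 1)[0]
            versions[data_file.name] = stem.rsplit("_", 1)[-1]
        index[source_dir.name] = versions
    return index


if __name__ == "__main__":
    # Default location described above; skipped quietly if it does not exist.
    default_root = Path.home() / ".local" / "share" / "wags_tails"
    if default_root.is_dir():
        for source, files in index_data_files(default_root).items():
            print(source, files)
```

Splitting on the last period first keeps dotted versions such as `20.09d` or `5.1.8` intact.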
Updates to the HemOnc source depend on the Disease Normalizer service. If the Disease Normalizer database appears to be empty or incomplete, updates to HemOnc will also trigger a refresh of the Disease Normalizer database. See its README for additional data requirements.
Create Merged Concept Groups
The /normalize endpoint relies on merged concept groups. The --normalize flag generates these groups:
commandline
thera-py update --normalize
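When updates run from a scheduled job, it can be convenient to assemble the command line in code. The sketch below only builds an argument list from the flags shown above; `build_update_command` is a hypothetical helper, and both combining `--normalize` with other arguments in one invocation and rejecting `--all` alongside named sources are assumptions of this sketch rather than documented CLI behavior.

```python
from collections.abc import Sequence


def build_update_command(
    sources: Sequence[str] = (),
    update_all: bool = False,
    normalize: bool = False,
) -> list[str]:
    """Build an argv list for `thera-py update` from the options in the README."""
    if update_all and sources:
        # Design choice of this sketch: treat the two selection styles as exclusive.
        raise ValueError("pass either named sources or update_all, not both")
    if not (update_all or sources or normalize):
        raise ValueError("nothing to update")
    command = ["thera-py", "update"]
    command += ["--all"] if update_all else list(sources)
    if normalize:
        command.append("--normalize")
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_update_command(["chembl", "wikidata"])))
```

The returned list can be handed to `subprocess.run(command, check=True)` once Thera-Py is installed.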
Specifying the database URL endpoint
The default URL endpoint is http://localhost:8000.
There are two different ways to specify the database URL endpoint.
The first way is to set the --db_url flag to the URL endpoint.
commandline
thera-py update --all --db_url=http://localhost:8001
The second way is to set the environment variable THERAPY_NORM_DB_URL to the URL endpoint.
commandline
export THERAPY_NORM_DB_URL="http://localhost:8001"
thera-py update --all
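The two options can be pictured as a simple fallback chain. This sketch is illustrative only: `resolve_db_url` is a hypothetical function, and the ordering (flag first, then environment variable, then default) is an assumption, since the text above does not say which setting wins when both are given.

```python
from collections.abc import Mapping
from typing import Optional

DEFAULT_DB_URL = "http://localhost:8000"  # default endpoint stated above


def resolve_db_url(cli_value: Optional[str], env: Mapping[str, str]) -> str:
    """Pick the database endpoint: explicit flag, then environment, then default.

    The flag-before-environment ordering is an assumption of this sketch.
    """
    if cli_value:
        return cli_value
    # An empty or missing variable falls through to the default.
    return env.get("THERAPY_NORM_DB_URL") or DEFAULT_DB_URL
```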
Starting the therapy normalization service
From the project root, run the following:
commandline
uvicorn therapy.main:app --reload
Next, view the OpenAPI docs on your local machine:
http://127.0.0.1:8000/therapy
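Once the service is running, it can be queried over HTTP. The sketch below uses only the standard library. It assumes the normalize endpoint is served at `/therapy/normalize` and takes the search term in a `q` query parameter; neither detail is stated above, so check the OpenAPI docs for the actual path and parameters. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000/therapy"  # docs location given above


def build_normalize_url(term: str, base_url: str = BASE_URL) -> str:
    """Build a request URL for the normalize endpoint (path and `q` are assumed)."""
    return f"{base_url.rstrip('/')}/normalize?{urlencode({'q': term})}"


def normalize(term: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON body; requires a running service."""
    with urlopen(build_normalize_url(term, base_url), timeout=10) as response:
        return json.load(response)
```

With the server up, `normalize("cisplatin")` would return the decoded response body; `urlencode` handles terms containing spaces or punctuation.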
FAQ
A data import method raised a SourceFormatError instance. How do I proceed?
Thera-Py will automatically try to acquire the latest version of data for each source, but sources sometimes alter the structure of their data (e.g. adding or removing CSV columns). If you encounter a SourceFormatError while importing data, please notify us by creating a new issue if one doesn't already exist, and we will attempt to resolve it.
In the meantime, you can force Thera-Py to use an older data release: remove the incompatible version from the source data folder, manually download an older version of the data and place it there per the structure described above, then call the CLI with the --use_existing argument.
Citation
If you use Thera-Py in scientific works, please cite the following article:
Matthew Cannon, James Stevenson, Kori Kuzma, Susanna Kiwala, Jeremy L Warner, Obi L Griffith, Malachi Griffith, Alex H Wagner, Normalization of drug and therapeutic concepts with Thera-Py, JAMIA Open, Volume 6, Issue 4, December 2023, ooad093, https://doi.org/10.1093/jamiaopen/ooad093
Development
Clone the repo and create a virtual environment:
shell
git clone https://github.com/cancervariants/therapy-normalization
cd therapy-normalization
python3 -m virtualenv venv
source venv/bin/activate
Install development dependencies and pre-commit:
shell
python3 -m pip install -e '.[dev,tests]'
pre-commit install
Check style with ruff:
shell
python3 -m ruff format . && python3 -m ruff check --fix .
Run tests with pytest:
commandline
python3 -m pytest
By default, tests will employ an existing DynamoDB database. For test environments where this is unavailable (e.g. in CI), the THERAPY_TEST environment variable can be set to initialize a local DynamoDB instance with miniature versions of input data files before tests are executed.
commandline
export THERAPY_TEST=true
Sometimes, sources will update their data, and our test fixtures and data will become incorrect. The tests/scripts/ subdirectory includes scripts to rebuild data files, although most fixtures will need to be updated manually.
Owner
- Name: VICC
- Login: cancervariants
- Kind: organization
- Website: http://cancervariants.org
- Repositories: 14
- Profile: https://github.com/cancervariants
The Variant Interpretation for Cancer Consortium
GitHub Events
Total
- Create event: 39
- Release event: 6
- Issues event: 20
- Watch event: 2
- Delete event: 26
- Issue comment event: 28
- Push event: 71
- Pull request review comment event: 10
- Pull request review event: 43
- Pull request event: 60
Last Year
- Create event: 39
- Release event: 6
- Issues event: 20
- Watch event: 2
- Delete event: 26
- Issue comment event: 28
- Push event: 71
- Pull request review comment event: 10
- Pull request review event: 43
- Pull request event: 60
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 1,305
- Total Committers: 13
- Avg Commits per committer: 100.385
- Development Distribution Score (DDS): 0.285
Top Committers
| Name | Email | Commits |
|---|---|---|
| James Stevenson | j****n@n****g | 933 |
| korikuzma | k****a@g****m | 287 |
| Alex H. Wagner, PhD | A****r@n****g | 32 |
| Alex H. Wagner, PhD | a****4@w****u | 17 |
| Kori Kuzma | 4****a@u****m | 8 |
| Susanna Kiwala | s****a@w****u | 8 |
| mcannon068nw | M****2@n****g | 8 |
| dependabot[bot] | 4****]@u****m | 4 |
| Alex H. Wagner, PhD | a@a****o | 3 |
| Jeremy Warner | j****r@v****g | 2 |
| Kuzma | k****2@r****g | 1 |
| James Stevenson | j****n@g****m | 1 |
| Brian Walsh | w****r@o****u | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 94
- Total pull requests: 149
- Average time to close issues: 7 months
- Average time to close pull requests: 10 days
- Total issue authors: 5
- Total pull request authors: 4
- Average comments per issue: 0.7
- Average comments per pull request: 0.25
- Merged pull requests: 126
- Bot issues: 0
- Bot pull requests: 7
Past Year
- Issues: 13
- Pull requests: 50
- Average time to close issues: about 1 month
- Average time to close pull requests: 5 days
- Issue authors: 3
- Pull request authors: 2
- Average comments per issue: 0.69
- Average comments per pull request: 0.22
- Merged pull requests: 41
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jsstevenson (58)
- korikuzma (23)
- ahwagner (5)
- mcannon068nw (4)
- wesleygoar (1)
Pull Request Authors
- jsstevenson (131)
- korikuzma (42)
- dependabot[bot] (7)
- bwalsh (1)
Packages
- Total packages: 1
- Total downloads (pypi): 1,931 last month
- Total dependent packages: 1
- Total dependent repositories: 2
- Total versions: 43
- Total maintainers: 3
pypi.org: thera-py
VICC normalization routines for therapeutics
- Homepage: https://github.com/cancervariants/therapy-normalization
- Documentation: https://github.com/cancervariants/therapy-normalization
- License: MIT License, Copyright (c) 2020-2024 VICC
- Latest release: 0.11.0 (published 7 months ago)