hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`

https://github.com/biocommons/hgvs

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    2 of 41 committers (4.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.2%) to scientific vocabulary

Keywords

bioinformatics genome-analysis genomics sequencing variant-analysis variation
Last synced: 6 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: biocommons
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://hgvs.readthedocs.io/
  • Size: 23 MB
Statistics
  • Stars: 274
  • Watchers: 19
  • Forks: 97
  • Open Issues: 52
  • Releases: 2
Created almost 9 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Citation Codeowners Authors

README.md

hgvs - manipulate biological sequence variants according to Human Genome Variation Society recommendations

The hgvs package provides a Python library to parse, format, validate, normalize, and map sequence variants according to Variation Nomenclature (aka Human Genome Variation Society) recommendations.

Specifically, the hgvs package focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package does not attempt to cover the full scope of HGVS recommendations. Please refer to issues for limitations.

Features

  • Parsing is based on formal grammar.
  • An easy-to-use object model that represents most variant types (SNVs, indels, dups, inversions, etc.) and concepts (intronic offsets, uncertain positions, intervals)
  • A variant normalizer that rewrites variants in canonical forms and substitutes reference sequences (if reference and transcript sequences differ)
  • Formatters that generate HGVS strings from internal representations
  • Tools to map variants between genome, transcript, and protein sequences
  • Reliable handling of regions with genome-transcript discrepancies
  • Pluggable data providers support alternative sources of transcript mapping data
  • Extensive automated tests, including those for all variant types and "problematic" transcripts
  • Easily installed using remote data sources. Installation with local data sources is straightforward and completely obviates network access
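
As an illustration of the parsing, object model, and formatting features listed above, here is a minimal sketch. It assumes the hgvs package is installed and uses its `hgvs.parser.Parser` class. The helper name `parse_variant` is hypothetical and exists only for this example. Parsing alone does not contact the remote data sources; mapping and normalization do.

```python
def parse_variant(hgvs_string):
    """Parse an HGVS string into the hgvs object model (hypothetical helper)."""
    # Imported lazily so the sketch can be loaded even where hgvs is absent.
    import hgvs.parser

    # The parser is built from the package's formal grammar.
    parser = hgvs.parser.Parser()
    return parser.parse_hgvs_variant(hgvs_string)
```

Calling `parse_variant("NM_001637.3:c.1582G>A")` returns a variant object whose `ac`, `type`, and `posedit` attributes hold the accession, the coordinate type, and the position plus edit. Passing the object to `str()` formats it back into an HGVS string.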

Citation

Wang M, Callenberg KM, Dalgleish R, Fedtsov A, Fox N, Freeman PJ, et al.
hgvs: A Python package for manipulating sequence variants using HGVS nomenclature: 2018 Update.
Hum Mutat. 2018. doi:10.1002/humu.23615

Important Notes

  • You are encouraged to browse issues. All known issues are listed there. Please report any issues you find.
  • Use a pip package specification to stay within minor releases. For example, hgvs>=1.5,<1.6. hgvs uses Semantic Versioning.
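
To make the effect of a specification such as hgvs>=1.5,<1.6 concrete, the sketch below checks whether a version string falls inside a pinned minor series. It is a simplified illustration that handles plain X.Y.Z release numbers only; it does not model pre-releases or pip's full version rules. The function names are hypothetical.

```python
import re


def major_minor(version):
    """Return the leading (major, minor) pair of a plain release number."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a recognizable version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def stays_in_minor(version, pinned="1.5"):
    """True if `version` is in the same minor series as `pinned`.

    For plain releases this matches what 'hgvs>=1.5,<1.6' admits:
    every 1.5.x release, nothing from 1.4.x or 1.6.x.
    """
    return major_minor(version) == major_minor(pinned)
```

With this pin, patch releases in the 1.5 series are accepted, and a move to 1.6 requires editing the specification deliberately.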

Installing HGVS Locally

Important: For more detailed installation and configuration instructions, see the hgvs documentation on readthedocs.

Prerequisites

libpq
python3
postgresql

Examples for installation:

macOS:

brew install libpq
brew install python3
brew install postgresql@14

Ubuntu:

sudo apt install gcc libpq-dev python3-dev

Installation Steps

By default, hgvs uses remote data sources, which makes installation easy. If you would like to use local instances of the data sources, see the readthedocs.

  1. Create a virtual environment using your preferred method.

    Example:

    python3 -m venv venv
    
  2. Run the following commands in your virtual environment:

    source venv/bin/activate
    pip install --upgrade setuptools
    pip install hgvs
    

See Installation instructions for details, including instructions for installing Universal Transcript Archive (UTA) and SeqRepo locally.
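
After the installation steps above, one way to confirm that the package landed in the active virtual environment is to query the installed distribution metadata. This is a minimal sketch using only the Python standard library (Python 3.8 or later); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package="hgvs"):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("hgvs is not installed in this environment")
    else:
        print(f"hgvs {version} is installed")
```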

Examples and Usage

See examples and readthedocs for usage.

Contributing

The hgvs package is a community effort. Please see Contributing to get started in submitting source code, tests, or documentation. Thanks for getting involved!

Testing

Existing tests use a cache that is committed with the repo, so they do not require network access. Developing new tests requires loading the cache, so you should install UTA and SeqRepo (and the SeqRepo REST service) locally:

docker compose --project-name biocommons -f $PWD/misc/docker-compose.yml up

IMPORTANT: Loading the test caches is currently hampered by #551, #760, and #761. To load reliably, use make test-relearn-iteratively for now.
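
Once the local services are running, hgvs has to be pointed at them instead of the default remote data sources. The sketch below assumes this is done through the `UTA_DB_URL` and `HGVS_SEQREPO_URL` environment variables; confirm the names and URL formats against the hgvs documentation before relying on them. The function name is hypothetical, and the caller must supply the URLs, since the hosts and ports depend on the docker-compose setup.

```python
import os


def point_hgvs_at_local_services(uta_db_url, seqrepo_url, environ=os.environ):
    """Set the variables hgvs is assumed to read for local data sources.

    Assumption: UTA_DB_URL and HGVS_SEQREPO_URL are the variable names
    that hgvs consults; verify them in the hgvs documentation.
    """
    environ["UTA_DB_URL"] = uta_db_url
    environ["HGVS_SEQREPO_URL"] = seqrepo_url
    return environ
```

The variables must be set before hgvs opens its data connections, so call the function at the start of a test session or export the same values in the shell.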

See Also

Other packages that manipulate HGVS variants:

Owner

  • Name: biocommons
  • Login: biocommons
  • Kind: organization

a collection of open source bioinformatics tools

GitHub Events

Total
  • Create event: 13
  • Release event: 2
  • Issues event: 31
  • Watch event: 27
  • Delete event: 14
  • Issue comment event: 90
  • Push event: 54
  • Pull request review comment event: 16
  • Pull request review event: 35
  • Pull request event: 39
  • Fork event: 3
Last Year
  • Create event: 13
  • Release event: 2
  • Issues event: 31
  • Watch event: 27
  • Delete event: 14
  • Issue comment event: 90
  • Push event: 54
  • Pull request review comment event: 16
  • Pull request review event: 35
  • Pull request event: 39
  • Fork event: 3

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 1,491
  • Total Committers: 41
  • Avg Commits per committer: 36.366
  • Development Distribution Score (DDS): 0.317
Past Year
  • Commits: 75
  • Committers: 7
  • Avg Commits per committer: 10.714
  • Development Distribution Score (DDS): 0.6
Top Committers
Name Email Commits
Reece Hart r****t@g****m 1,018
Meng w****5@g****m 160
Rudy Rico r****o@i****m 116
Vincent Fusaro v****o@i****m 36
Katie Stahl k****l@n****g 30
Reece Hart r****e@b****g 23
Lucas Wiman l****n@g****m 19
Robert Queenin 2****a 19
Dave Lawrence d****w@g****m 15
Piotr Kaleta p****a@g****m 9
Alexandre Fedtsov s****r 5
Dimitris Iliopoulos d****s@p****m 3
Kevin Jacobs j****s@b****m 3
jdasilva-invitae j****a@i****m 3
Alan Rubin a****n@w****u 2
Andreas Prlic a****c@i****m 2
Emanuel Langit e****t@f****m 2
Jai Pradeesh j****h@g****m 2
Jake Peacock j****k@g****m 2
Andreas Prlic a****c@g****m 1
Andy McMurry (AndyMC) a****c@a****g 1
pjcoenen 6****n 1
The Gitter Badger b****r@g****m 1
Liang Chen c****c 1
Kori Kuzma 4****a 1
Keith Callenberg k****g 1
Andreas Prlic 3****e 1
Ben Robinson b****3@g****m 1
Caitlin Gong c****o@g****m 1
Caitlin Gong c****g@v****u 1
and 11 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 179
  • Total pull requests: 120
  • Average time to close issues: about 3 years
  • Average time to close pull requests: 3 months
  • Total issue authors: 54
  • Total pull request authors: 28
  • Average comments per issue: 3.12
  • Average comments per pull request: 2.78
  • Merged pull requests: 76
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 18
  • Pull requests: 28
  • Average time to close issues: 13 days
  • Average time to close pull requests: 14 days
  • Issue authors: 10
  • Pull request authors: 7
  • Average comments per issue: 0.56
  • Average comments per pull request: 0.93
  • Merged pull requests: 18
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • reece (78)
  • davmlaw (18)
  • andreasprlic (6)
  • wlymanambry (5)
  • nh13 (4)
  • b0d0nne11 (4)
  • mihaitodor (4)
  • holtgrewe (3)
  • akeeeshi (3)
  • wentgithub (3)
  • AgustinRamiroDiaz (3)
  • worker000000 (3)
  • sptaylor (2)
  • cassiemk (2)
  • ecalifornica (2)
Pull Request Authors
  • davmlaw (22)
  • ecalifornica (20)
  • jsstevenson (12)
  • reece (11)
  • b0d0nne11 (10)
  • andreasprlic (5)
  • korikuzma (5)
  • ccaitlingo (5)
  • csw (2)
  • jdasilva-invitae (2)
  • dmyersturnbull (2)
  • jPleyte (2)
  • katiestahl (2)
  • markgene (2)
  • Estefanos8080 (2)
Top Labels
Issue Labels
stale (61) closed-by-stale (59) enhancement (31) bug (22) resurrected (20) keep alive (17) Invitae (4) question (3) duplicate (1) won't fix (1) project proposal (1) 2.0 goal (1) not-a-bug (1) RFC/proposal (1) task (1) trivial (1) mapper (1) minor (1) Epic (1) wontfix (1) critical (1)
Pull Request Labels
stale (16) closed-by-stale (16) resurrected (10) keep alive (10)

Dependencies

setup.py pypi
  • attrs >=17.4.0
  • biocommons.seqrepo >=0.6.5
  • bioutils >=0.4.0,<1.0
  • configparser >=3.3.0
  • ipython *
  • parsley *
  • psycopg2 *
  • six *
.github/workflows/python-package.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
misc/docker-compose.yml docker
  • biocommons/seqrepo latest
  • biocommons/seqrepo-rest-service latest
  • biocommons/uta uta_20180821
pyproject.toml pypi