ncbitax2lin

🐞 Convert NCBI taxonomy dump into lineages

https://github.com/zyxue/ncbitax2lin

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, sciencedirect.com, nature.com, mdpi.com
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.1%) to scientific vocabulary

Keywords

lineage ncbi ncbi-taxonomy pandas python taxdump taxonomy
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: zyxue
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 395 KB
Statistics
  • Stars: 145
  • Watchers: 7
  • Forks: 30
  • Open Issues: 3
  • Releases: 0
Topics
lineage ncbi ncbi-taxonomy pandas python taxdump taxonomy
Created almost 10 years ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License

README.md

NCBItax2lin

Convert NCBI taxonomy dump into lineages. For example, the lineage for human (tax_id=9606) looks like this:

| tax_id | superkingdom | phylum | class | order | family | genus | species | family1 | forma | genus1 | infraclass | infraorder | kingdom | no rank | no rank1 | no rank10 | no rank11 | no rank12 | no rank13 | no rank14 | no rank15 | no rank16 | no rank17 | no rank18 | no rank19 | no rank2 | no rank20 | no rank21 | no rank22 | no rank3 | no rank4 | no rank5 | no rank6 | no rank7 | no rank8 | no rank9 | parvorder | species group | species subgroup | species1 | subclass | subfamily | subgenus | subkingdom | suborder | subphylum | subspecies | subtribe | superclass | superfamily | superorder | superorder1 | superphylum | tribe | varietas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9606 | Eukaryota | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens | | | | | Simiiformes | Metazoa | cellular organisms | Opisthokonta | Dipnotetrapodomorpha | Tetrapoda | Amniota | Theria | Eutheria | Boreoeutheria | | | | | Eumetazoa | | | | Bilateria | Deuterostomia | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Catarrhini | | | | | Homininae | | | Haplorrhini | Craniata | | | | Hominoidea | Euarchontoglires | | | | |
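To illustrate where a row like this comes from, here is a minimal sketch of walking parent pointers in a taxonomy dump. It is not ncbitax2lin's actual implementation. It assumes the usual taxdump layout, in which fields are separated by `\t|\t`, rows end with `\t|`, `nodes.dmp` starts with tax_id, parent tax_id, and rank, and `names.dmp` carries tax_id, name, unique name, and name class. The taxa, IDs 10 and 11, and all function names below are hypothetical toy values.

```python
from typing import Dict, List, Tuple


def parse_dmp_line(line: str) -> List[str]:
    # Drop the trailing row terminator (tab + pipe), then split on tab-pipe-tab.
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_nodes(text: str) -> Tuple[Dict[int, int], Dict[int, str]]:
    # Map each tax_id to its parent tax_id and to its rank.
    parents: Dict[int, int] = {}
    ranks: Dict[int, str] = {}
    for line in text.splitlines():
        fields = parse_dmp_line(line)
        tax_id = int(fields[0])
        parents[tax_id] = int(fields[1])
        ranks[tax_id] = fields[2]
    return parents, ranks


def load_names(text: str) -> Dict[int, str]:
    # Keep only the scientific name for each tax_id.
    names: Dict[int, str] = {}
    for line in text.splitlines():
        fields = parse_dmp_line(line)
        if fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def trace_lineage(
    tax_id: int,
    parents: Dict[int, int],
    ranks: Dict[int, str],
    names: Dict[int, str],
) -> List[Tuple[str, str]]:
    # Follow parent pointers up to the root, then reverse to get root-first order.
    lineage: List[Tuple[str, str]] = []
    seen = set()
    current = tax_id
    while current in parents and current not in seen:
        seen.add(current)
        lineage.append((ranks[current], names.get(current, "")))
        if parents[current] == current:  # the root node is its own parent
            break
        current = parents[current]
    lineage.reverse()
    return lineage


# Toy, hypothetical data in taxdump layout (a real dump has more columns and rows).
NODES_DMP = (
    "1\t|\t1\t|\tno rank\t|\n"
    "10\t|\t1\t|\tgenus\t|\n"
    "11\t|\t10\t|\tspecies\t|\n"
)
NAMES_DMP = (
    "1\t|\troot\t|\t\t|\tscientific name\t|\n"
    "10\t|\tExamplegenus\t|\t\t|\tscientific name\t|\n"
    "11\t|\tExamplegenus demo\t|\t\t|\tscientific name\t|\n"
    "11\t|\tdemo organism\t|\t\t|\tcommon name\t|\n"
)

parents, ranks = load_nodes(NODES_DMP)
names = load_names(NAMES_DMP)
print(trace_lineage(11, parents, ranks, names))
# [('no rank', 'root'), ('genus', 'Examplegenus'), ('species', 'Examplegenus demo')]
```

Each (rank, name) pair corresponds to one filled cell in a lineage row. Ranks that a lineage does not pass through are left empty, as in the human example.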

Install

ncbitax2lin supports Python 3.7, 3.8, and 3.9.

pip install -U ncbitax2lin

Generate lineages

First, download the taxonomy dump from NCBI:

wget -N ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
mkdir -p taxdump && tar zxf taxdump.tar.gz -C ./taxdump

Then run ncbitax2lin:

ncbitax2lin --nodes-file taxdump/nodes.dmp --names-file taxdump/names.dmp

By default, the generated lineages are saved to ncbi_lineages_[date_of_utcnow].csv.gz. Use the --output option to write to a different file.
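Once the file exists, a single lineage can be looked up without loading the whole table into memory. The sketch below uses only the standard library. It assumes the output is a gzip-compressed, comma-separated file with a header row that includes a `tax_id` column, as in the example table. The function name `find_lineage` is hypothetical.

```python
import csv
import gzip
from typing import Dict, Optional


def find_lineage(csv_gz_path: str, tax_id: int) -> Optional[Dict[str, str]]:
    """Return the non-empty rank -> name cells for one tax_id, or None if absent."""
    with gzip.open(csv_gz_path, "rt", newline="", encoding="utf-8") as handle:
        # Stream rows one at a time; the full table is large.
        for row in csv.DictReader(handle):
            if row["tax_id"] == str(tax_id):
                return {
                    rank: name
                    for rank, name in row.items()
                    if rank != "tax_id" and name
                }
    return None
```

Calling `find_lineage` with the path of the generated file and `9606` should return the filled cells of the human row. For repeated lookups, loading the file once with pandas is likely more convenient.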

FAQ

Q: I have a large number of sequences with their corresponding accession numbers from NCBI. How do I get their lineages?

A: First, map the accession numbers (GI numbers are deprecated) to tax IDs using the nucl_*accession2taxid.gz files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/. Then look up each sequence's full lineage by its tax ID. That tax-ID-to-lineage mapping is what NCBItax2lin generates for you.
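The first step can be sketched as a single streaming pass over one accession2taxid file. This assumes the file is gzip-compressed and tab-separated, with a header row containing `accession.version` and `taxid` columns. Check the README in the accession2taxid directory if your file differs. The function name is hypothetical.

```python
import gzip
from typing import Dict, Iterable


def accessions_to_tax_ids(accession2taxid_gz: str, wanted: Iterable[str]) -> Dict[str, int]:
    """Map versioned accessions (e.g. 'XXXX.1') to tax IDs by scanning the file once."""
    wanted_set = set(wanted)
    found: Dict[str, int] = {}
    with gzip.open(accession2taxid_gz, "rt", encoding="utf-8") as handle:
        # Locate columns by name rather than position (assumed header names).
        header = handle.readline().rstrip("\n").split("\t")
        acc_col = header.index("accession.version")
        taxid_col = header.index("taxid")
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[acc_col] in wanted_set:
                found[fields[acc_col]] = int(fields[taxid_col])
                if len(found) == len(wanted_set):
                    break  # every requested accession has been resolved
    return found
```

The resulting tax IDs can then be matched against the `tax_id` column of the lineage file. Accessions missing from the returned dictionary were not present in the scanned file.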

If you have any questions about this project, please feel free to create a new issue.

Note on taxdump.tar.gz.md5

It appears that NCBI periodically regenerates taxdump.tar.gz and taxdump.tar.gz.md5 even when the content is unchanged. I am not sure how the regeneration works, but taxdump.tar.gz.md5 will differ simply because of a different timestamp.
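One way to check whether two downloads actually differ in content is to hash the files inside each archive rather than the archive itself. The following is a sketch under the assumption that only archive metadata such as timestamps changed between the two downloads. The function name `member_digests` is hypothetical, and SHA-256 is used here instead of MD5 purely as a choice for the illustration.

```python
import hashlib
import tarfile
from typing import Dict


def member_digests(tar_gz_path: str) -> Dict[str, str]:
    """Return a SHA-256 digest for each regular file inside a .tar.gz archive."""
    digests: Dict[str, str] = {}
    with tarfile.open(tar_gz_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            extracted = archive.extractfile(member)
            digest = hashlib.sha256()
            # Read in 1 MiB chunks so large members such as names.dmp fit in memory.
            for chunk in iter(lambda: extracted.read(1 << 20), b""):
                digest.update(chunk)
            digests[member.name] = digest.hexdigest()
    return digests
```

If `member_digests` returns equal dictionaries for an old and a new taxdump.tar.gz, the .dmp files are byte-identical even though the archive checksums differ.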

Used in

  • Mahmoudabadi, G., & Phillips, R. (2018). A comprehensive and quantitative exploration of thousands of viral genomes. eLife, 7. https://doi.org/10.7554/eLife.31955
  • Dombrowski, N. et al. (2020) Undinarchaeota illuminate DPANN phylogeny and the impact of gene transfer on archaeal evolution, Nature Communications. Springer US, 11(1). doi: 10.1038/s41467-020-17408-w. https://www.nature.com/articles/s41467-020-17408-w
  • Schenberger Santos, A. R. et al. (2020) NAD+ biosynthesis in bacteria is controlled by global carbon/nitrogen levels via PII signaling, Journal of Biological Chemistry, 295(18), pp. 6165–6176. doi: 10.1074/jbc.RA120.012793. https://www.sciencedirect.com/science/article/pii/S0021925817482433
  • Villada, J. C., Duran, M. F. and Lee, P. K. H. (2020) Interplay between Position-Dependent Codon Usage Bias and Hydrogen Bonding at the 5' End of ORFeomes, mSystems, 5(4), pp. 1–18. doi: 10.1128/msystems.00613-20. https://msystems.asm.org/content/5/4/e00613-20
  • Byadgi, O. et al. (2020) Transcriptome analysis of Amyloodinium ocellatum tomonts revealed basic information on the major potential virulence factors, Genes, 11(11), pp. 1–12. doi: 10.3390/genes11111252. https://www.mdpi.com/2073-4425/11/11/1252

Development

Install dependencies

poetry shell
poetry install

Testing

make format
make all

Publish (only for administrator)

poetry version [minor/major etc.]
poetry publish --build -u __token__ --password pypi-<token-from-pypi>

Owner

  • Name: Zhuyi Xue
  • Login: zyxue
  • Kind: user
  • Location: Los Angeles

GitHub Events

Total
  • Issues event: 2
  • Watch event: 9
  • Delete event: 1
  • Issue comment event: 3
  • Push event: 11
  • Pull request event: 6
  • Fork event: 1
  • Create event: 2
Last Year
  • Issues event: 2
  • Watch event: 9
  • Delete event: 1
  • Issue comment event: 3
  • Push event: 11
  • Pull request event: 6
  • Fork event: 1
  • Create event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 95
  • Total Committers: 2
  • Avg Commits per committer: 47.5
  • Development Distribution Score (DDS): 0.179
Past Year
  • Commits: 3
  • Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
Zhuyi Xue a****8@g****m 78
Zhuyi Xue z****e@a****a 17

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 18
  • Total pull requests: 12
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 4 days
  • Total issue authors: 17
  • Total pull request authors: 5
  • Average comments per issue: 4.78
  • Average comments per pull request: 0.17
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 3
  • Average time to close issues: 29 days
  • Average time to close pull requests: about 1 hour
  • Issue authors: 2
  • Pull request authors: 2
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.33
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mewu3 (2)
  • xiaodre21 (1)
  • lfaino (1)
  • zousm912zou (1)
  • naurasd (1)
  • 65degnorth (1)
  • eray-sahin (1)
  • tgolubch (1)
  • bpil83 (1)
  • ocstringham (1)
  • josuebarrera (1)
  • Xueliang24 (1)
  • hepcat72 (1)
  • nicolereynolds1 (1)
  • binitl (1)
Pull Request Authors
  • zyxue (10)
  • biocoder (2)
  • alienzj (1)
  • tfrcarvalho (1)
  • cdebourcy (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 94 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 2
  • Total versions: 12
  • Total maintainers: 1
pypi.org: ncbitax2lin

A tool that converts NCBI taxonomy dump into lineages

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 94 Last month
Rankings
Stargazers count: 6.5%
Forks count: 7.4%
Dependent packages count: 10.1%
Average: 10.6%
Dependent repos count: 11.5%
Downloads: 17.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

poetry.lock pypi
  • astroid 2.9.3 develop
  • atomicwrites 1.4.0 develop
  • attrs 21.4.0 develop
  • autoflake 1.4 develop
  • black 22.1.0 develop
  • click 8.0.4 develop
  • colorama 0.4.4 develop
  • coverage 5.5 develop
  • distlib 0.3.4 develop
  • filelock 3.4.2 develop
  • importlib-metadata 4.10.1 develop
  • isort 5.10.1 develop
  • lazy-object-proxy 1.7.1 develop
  • mccabe 0.6.1 develop
  • more-itertools 8.12.0 develop
  • mypy 0.941 develop
  • mypy-extensions 0.4.3 develop
  • packaging 21.3 develop
  • pathspec 0.9.0 develop
  • platformdirs 2.4.1 develop
  • pluggy 0.13.1 develop
  • py 1.11.0 develop
  • pyflakes 2.4.0 develop
  • pylint 2.12.2 develop
  • pyparsing 3.0.7 develop
  • pytest 5.4.3 develop
  • pytest-parallel 0.1.1 develop
  • tblib 1.7.0 develop
  • toml 0.10.2 develop
  • tomli 2.0.1 develop
  • tox 3.24.5 develop
  • typed-ast 1.4.3 develop
  • virtualenv 20.13.0 develop
  • wcwidth 0.2.5 develop
  • wrapt 1.13.3 develop
  • zipp 3.7.0 develop
  • fire 0.3.1
  • numpy 1.21.1
  • pandas 1.1.5
  • python-dateutil 2.8.2
  • pytz 2021.3
  • six 1.16.0
  • termcolor 1.1.0
  • typing-extensions 3.10.0.2
pyproject.toml pypi
  • autoflake ^1.3.1 develop
  • black ^22.1.0 develop
  • coverage ^5.4 develop
  • isort ^5.7.0 develop
  • mypy ^0.941 develop
  • pylint ^2.5.0 develop
  • pytest ^5.2 develop
  • pytest-parallel ^0.1.0 develop
  • tox ^3.21.4 develop
  • fire ^0.3.1
  • pandas ^1.0.3
  • python ^3.7,<3.10
  • typing-extensions ^3.7.4
.github/workflows/python-package.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite