NCBItax2lin
Convert an NCBI taxonomy dump into lineages. For example, the lineage for human (tax_id=9606) looks like this:
| tax_id | superkingdom | phylum | class | order | family | genus | species | family1 | forma | genus1 | infraclass | infraorder | kingdom | no rank | no rank1 | no rank10 | no rank11 | no rank12 | no rank13 | no rank14 | no rank15 | no rank16 | no rank17 | no rank18 | no rank19 | no rank2 | no rank20 | no rank21 | no rank22 | no rank3 | no rank4 | no rank5 | no rank6 | no rank7 | no rank8 | no rank9 | parvorder | species group | species subgroup | species1 | subclass | subfamily | subgenus | subkingdom | suborder | subphylum | subspecies | subtribe | superclass | superfamily | superorder | superorder1 | superphylum | tribe | varietas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9606 | Eukaryota | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens | | | | | Simiiformes | Metazoa | cellular organisms | Opisthokonta | Dipnotetrapodomorpha | Tetrapoda | Amniota | Theria | Eutheria | Boreoeutheria | | | | | Eumetazoa | | | | Bilateria | Deuterostomia | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Catarrhini | | | | | Homininae | | | Haplorrhini | Craniata | | | | Hominoidea | Euarchontoglires | | | | |
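Conceptually, each row of such a table can be obtained by walking from a tax ID up to the root of the taxonomy tree and recording the name found at each rank. The sketch below illustrates that idea only; it is not ncbitax2lin's actual implementation. The helper names (`parse_dmp_line`, `load_tree`, `trace_lineage`) are hypothetical, and the sample lines are an abbreviated toy tree that keeps just the first few fields of each record and skips the many intermediate nodes of the real dump. It assumes the standard taxdump layout, in which fields are separated by `\t|\t` and each line ends with `\t|`.

```python
from typing import Dict, Iterable, List, Tuple

# Toy excerpts in taxdump layout (assumption: real files carry more fields and nodes).
SAMPLE_NODES = [
    "1\t|\t1\t|\tno rank\t|\n",
    "9605\t|\t1\t|\tgenus\t|\n",  # parent shortened to the root for brevity
    "9606\t|\t9605\t|\tspecies\t|\n",
]
SAMPLE_NAMES = [
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "9605\t|\tHomo\t|\t\t|\tscientific name\t|\n",
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
]


def parse_dmp_line(line: str) -> List[str]:
    """Split one .dmp line into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the end-of-record marker
    return [field.strip() for field in line.split("\t|\t")]


def load_tree(nodes_lines: Iterable[str], names_lines: Iterable[str]):
    """Build parent, rank, and scientific-name lookups keyed by tax ID."""
    parents: Dict[int, int] = {}
    ranks: Dict[int, str] = {}
    names: Dict[int, str] = {}
    for line in nodes_lines:
        tax_id, parent_id, rank = parse_dmp_line(line)[:3]
        parents[int(tax_id)] = int(parent_id)
        ranks[int(tax_id)] = rank
    for line in names_lines:
        tax_id, name, _unique_name, name_class = parse_dmp_line(line)[:4]
        if name_class == "scientific name":  # ignore synonyms and common names
            names[int(tax_id)] = name
    return parents, ranks, names


def trace_lineage(tax_id: int, parents, ranks, names) -> List[Tuple[str, str]]:
    """Return (rank, name) pairs from the root down to tax_id."""
    lineage = []
    current = tax_id
    while True:
        lineage.append((ranks[current], names[current]))
        parent = parents[current]
        if parent == current:  # the root node is its own parent
            break
        current = parent
    return lineage[::-1]


# parents, ranks, names = load_tree(SAMPLE_NODES, SAMPLE_NAMES)
# trace_lineage(9606, parents, ranks, names)
# -> [("no rank", "root"), ("genus", "Homo"), ("species", "Homo sapiens")]
```

Because several nodes on one path can share a rank such as "no rank", a tabular output needs numbered columns for the repeats, which is consistent with headers like `no rank1` and `no rank2` in the example.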
Install
ncbitax2lin supports Python 3.7, 3.8, and 3.9.
pip install -U ncbitax2lin
Generate lineages
First download taxonomy dump from NCBI:
```bash
wget -N ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
mkdir -p taxdump && tar zxf taxdump.tar.gz -C ./taxdump
```
Then run ncbitax2lin:
```bash
ncbitax2lin --nodes-file taxdump/nodes.dmp --names-file taxdump/names.dmp
```
By default, the generated lineages are saved to
ncbi_lineages_[date_of_utcnow].csv.gz. The output path can be changed with the
--output option.
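The generated file is a gzipped CSV, so it can be inspected with the Python standard library alone. The following is a minimal sketch, assuming the file has a `tax_id` column as in the example table; `find_lineages` is a hypothetical helper and not part of ncbitax2lin. It streams the file and keeps only the requested rows, so the full lineage table is never held in memory.

```python
import csv
import gzip
from typing import Dict, Iterable


def find_lineages(path: str, tax_ids: Iterable[int]) -> Dict[int, Dict[str, str]]:
    """Return {tax_id: {rank: name}} for the requested tax IDs only."""
    wanted = set(tax_ids)
    found: Dict[int, Dict[str, str]] = {}
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            tax_id = int(row.pop("tax_id"))
            if tax_id in wanted:
                # Drop ranks that are empty for this taxon.
                found[tax_id] = {rank: name for rank, name in row.items() if name}
    return found


# Example call; the file name is a placeholder for whatever ncbitax2lin produced:
# find_lineages("ncbi_lineages_YYYY-MM-DD.csv.gz", [9606])
```

For repeated lookups over the same file, loading it once with pandas and indexing by `tax_id` is likely more convenient, at the cost of higher memory use.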
FAQ
Q: I have a large number of sequences with their corresponding NCBI accession numbers. How do I get their lineages?
A: First, map the accession numbers (GI is deprecated) to tax IDs
using the nucl_*accession2taxid.gz files from
ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/. Second, trace each
sequence's whole lineage from its tax ID. The tax-ID-to-lineage mapping is
what NCBItax2lin generates for you.
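The first step of this answer can be sketched as follows. This is an illustration under stated assumptions, not part of ncbitax2lin: `map_accessions_to_taxids` is a hypothetical helper, and it assumes the accession2taxid files are gzipped, tab-separated tables whose header row names the columns `accession`, `accession.version`, `taxid`, and `gi`. The files are large, so the sketch streams them and stops as soon as every requested accession has been found.

```python
import csv
import gzip
from typing import Dict, Iterable


def map_accessions_to_taxids(
    path: str,
    accessions: Iterable[str],
    column: str = "accession.version",
) -> Dict[str, int]:
    """Return {accession: tax_id} for the requested accessions found in the file."""
    wanted = set(accessions)
    mapping: Dict[str, int] = {}
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            key = row[column]
            if key in wanted:
                mapping[key] = int(row["taxid"])
                if len(mapping) == len(wanted):
                    break  # everything requested has been found
    return mapping
```

Pass `column="accession"` if your accession numbers carry no version suffix. The resulting tax IDs can then be looked up in the lineage CSV generated by ncbitax2lin.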
If you have any questions about this project, please feel free to create a new issue.
Note on taxdump.tar.gz.md5
It appears that NCBI periodically regenerates taxdump.tar.gz and
taxdump.tar.gz.md5 even when the content is unchanged. I am not sure how
the regeneration works, but taxdump.tar.gz.md5 will differ simply because
of a different timestamp.
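If you need to know whether a newly downloaded dump actually differs from a previous one, one option is to compare a digest of the archive's contents rather than the archive's own checksum. The sketch below is an assumption about how this could be done, not something NCBI or ncbitax2lin provides; `content_digest` is a hypothetical helper. It hashes each member's name and bytes and ignores tar metadata such as modification times.

```python
import hashlib
import tarfile


def content_digest(archive_path: str) -> str:
    """Digest of member names and file contents, independent of timestamps."""
    per_file = []
    with tarfile.open(archive_path, "r:*") as archive:
        # Read members in archive order to avoid backward seeks in the gzip stream.
        for member in archive:
            if not member.isfile():
                continue
            file_hash = hashlib.sha256()
            extracted = archive.extractfile(member)
            for chunk in iter(lambda: extracted.read(1 << 20), b""):
                file_hash.update(chunk)
            per_file.append((member.name, file_hash.hexdigest()))
    combined = hashlib.sha256()
    # Sort so the result does not depend on member order inside the archive.
    for name, file_hash in sorted(per_file):
        combined.update(f"{name}\0{file_hash}\n".encode("utf-8"))
    return combined.hexdigest()


# Two downloads hold the same data if their content digests match:
# content_digest("old/taxdump.tar.gz") == content_digest("new/taxdump.tar.gz")
```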
Used in
- Mahmoudabadi, G., & Phillips, R. (2018). A comprehensive and quantitative exploration of thousands of viral genomes. ELife, 7. https://doi.org/10.7554/eLife.31955
- Dombrowski, N. et al. (2020) Undinarchaeota illuminate DPANN phylogeny and the impact of gene transfer on archaeal evolution, Nature Communications. Springer US, 11(1). doi: 10.1038/s41467-020-17408-w. https://www.nature.com/articles/s41467-020-17408-w
- Schenberger Santos, A. R. et al. (2020) NAD+ biosynthesis in bacteria is controlled by global carbon/nitrogen levels via PII signaling, Journal of Biological Chemistry, 295(18), pp. 6165–6176. doi: 10.1074/jbc.RA120.012793. https://www.sciencedirect.com/science/article/pii/S0021925817482433
- Villada, J. C., Duran, M. F. and Lee, P. K. H. (2020) Interplay between Position-Dependent Codon Usage Bias and Hydrogen Bonding at the 5' End of ORFeomes, mSystems, 5(4), pp. 1–18. doi: 10.1128/msystems.00613-20. https://msystems.asm.org/content/5/4/e00613-20
- Byadgi, O. et al. (2020) Transcriptome analysis of Amyloodinium ocellatum tomonts revealed basic information on the major potential virulence factors, Genes, 11(11), pp. 1–12. doi: 10.3390/genes11111252. https://www.mdpi.com/2073-4425/11/11/1252
Development
Install dependencies
poetry shell
poetry install
Testing
make format
make all
Publish (only for administrator)
poetry version [minor/major etc.]
poetry publish --build -u __token__ --password pypi-<token-from-pypi>
Owner
- Name: Zhuyi Xue
- Login: zyxue
- Kind: user
- Location: Los Angeles
- Website: http://zyxue.github.io/
- Repositories: 21
- Profile: https://github.com/zyxue
Packages
- Total packages: 1
- Total downloads: 94 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 2
- Total versions: 12
- Total maintainers: 1
pypi.org: ncbitax2lin
A tool that converts NCBI taxonomy dump into lineages
- Homepage: https://github.com/zyxue/ncbitax2lin
- Documentation: https://ncbitax2lin.readthedocs.io/
- License: MIT
- Latest release: 2.4.1 (published 11 months ago)
Dependencies
- astroid 2.9.3 develop
- atomicwrites 1.4.0 develop
- attrs 21.4.0 develop
- autoflake 1.4 develop
- black 22.1.0 develop
- click 8.0.4 develop
- colorama 0.4.4 develop
- coverage 5.5 develop
- distlib 0.3.4 develop
- filelock 3.4.2 develop
- importlib-metadata 4.10.1 develop
- isort 5.10.1 develop
- lazy-object-proxy 1.7.1 develop
- mccabe 0.6.1 develop
- more-itertools 8.12.0 develop
- mypy 0.941 develop
- mypy-extensions 0.4.3 develop
- packaging 21.3 develop
- pathspec 0.9.0 develop
- platformdirs 2.4.1 develop
- pluggy 0.13.1 develop
- py 1.11.0 develop
- pyflakes 2.4.0 develop
- pylint 2.12.2 develop
- pyparsing 3.0.7 develop
- pytest 5.4.3 develop
- pytest-parallel 0.1.1 develop
- tblib 1.7.0 develop
- toml 0.10.2 develop
- tomli 2.0.1 develop
- tox 3.24.5 develop
- typed-ast 1.4.3 develop
- virtualenv 20.13.0 develop
- wcwidth 0.2.5 develop
- wrapt 1.13.3 develop
- zipp 3.7.0 develop
- fire 0.3.1
- numpy 1.21.1
- pandas 1.1.5
- python-dateutil 2.8.2
- pytz 2021.3
- six 1.16.0
- termcolor 1.1.0
- typing-extensions 3.10.0.2
- autoflake ^1.3.1 develop
- black ^22.1.0 develop
- coverage ^5.4 develop
- isort ^5.7.0 develop
- mypy ^0.941 develop
- pylint ^2.5.0 develop
- pytest ^5.2 develop
- pytest-parallel ^0.1.0 develop
- tox ^3.21.4 develop
- fire ^0.3.1
- pandas ^1.0.3
- python ^3.7,<3.10
- typing-extensions ^3.7.4
- actions/checkout v2 composite
- actions/setup-python v2 composite