gtdbtk
GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
Science Score: 59.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 12 DOI reference(s) in README
- ✓ Academic publication links: Links to ncbi.nlm.nih.gov, nature.com
- ✓ Committers with academic emails: 2 of 22 committers (9.1%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Ecogenomics
- License: gpl-3.0
- Language: Python
- Default Branch: master
- Homepage: https://ecogenomics.github.io/GTDBTk/
- Size: 29.1 MB
Statistics
- Stars: 541
- Watchers: 20
- Forks: 90
- Open Issues: 25
- Releases: 44
Metadata Files
README.md
GTDB-Tk
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB). It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. GTDB-Tk is open source and released under the GNU General Public License (Version 3).
Notifications about GTDB-Tk releases will be available through the GTDB Twitter account and the GTDB Announcements Forum.
Please post questions and issues related to GTDB-Tk in the Issues section of the GitHub repository. Questions related to the GTDB can be posted on the GTDB Forum or sent to the GTDB team.
🚀 Getting started
Be sure to check the hardware requirements, then choose your preferred installation method (see the documentation).
📖 Documentation
Documentation for GTDB-Tk can be found at https://ecogenomics.github.io/GTDBTk/.
✨ New Features
GTDB-Tk v2.5.0+ includes the following new features:
- GTDB-Tk now uses skani exclusively for genome clustering, replacing the previous mash/skani hybrid approach. As a result, users no longer have to supply one of the two mutually exclusive options --mash_db or --skip_ani_screen: --mash_db has been removed, and --skip_ani_screen is now an optional flag.
⚠️ This change is not backward-compatible and may break existing pipelines or scripts that rely on these options.
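For pipelines that build the GTDB-Tk command line programmatically, the migration can be sketched as below. This is a minimal illustration, not part of GTDB-Tk: the helper `migrate_classify_args` is hypothetical, and the subcommand, directory options, and paths in the usage example are placeholders that should be checked against the documentation. It only drops the removed `--mash_db` option together with its value and leaves every other argument, including an optional `--skip_ani_screen`, untouched.

```python
def migrate_classify_args(args):
    """Return a copy of `args` without the removed --mash_db option.

    Hypothetical helper for illustration only. It handles both the
    "--mash_db PATH" and "--mash_db=PATH" spellings and keeps all other
    arguments in their original order.
    """
    migrated = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This token is the value that followed --mash_db.
            skip_next = False
            continue
        if arg == "--mash_db":
            skip_next = True
            continue
        if arg.startswith("--mash_db="):
            continue
        migrated.append(arg)
    return migrated


# Placeholder command line from an older pipeline (paths are made up).
old_args = [
    "gtdbtk", "classify_wf",
    "--genome_dir", "genomes/",
    "--out_dir", "out/",
    "--mash_db", "mash_db.msh",
]
new_args = migrate_classify_args(old_args)
print(" ".join(new_args))
```

Filtering the argument list rather than rewriting the whole command keeps the rest of an existing pipeline configuration intact, which makes the change easier to review.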
📈 Performance
Using the ANI screen can reduce computation by more than 50%, although the benefit depends on the set of input genomes. A set consisting primarily of new species will benefit less than a set of genomes that are largely assigned to existing GTDB species clusters. In the latter case, the ANI screen reduces the number of genomes that must be classified by pplacer, which cut computation time substantially (by between 25% and 60% in our testing).
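The reasoning above can be illustrated with a toy calculation. The sketch below is not GTDB-Tk code: the function names and the input format (a mapping from genome name to whether the ANI screen matched it to a species cluster) are assumptions made for the example. It shows why a genome set dominated by known species leaves far fewer genomes for pplacer than a set of mostly novel species.

```python
def split_by_ani_screen(screen_hits):
    """Partition genomes by whether the ANI screen already classified them.

    `screen_hits` maps a genome name to True if the screen matched it to a
    GTDB species cluster and False otherwise (hypothetical input format).
    """
    classified = sorted(g for g, hit in screen_hits.items() if hit)
    needs_placement = sorted(g for g, hit in screen_hits.items() if not hit)
    return classified, needs_placement


def placement_reduction(screen_hits):
    """Fraction of input genomes that no longer need phylogenetic placement."""
    if not screen_hits:
        return 0.0
    classified, _ = split_by_ani_screen(screen_hits)
    return len(classified) / len(screen_hits)


# Mostly known species: three of four genomes are resolved by the screen.
known_heavy = {"g1": True, "g2": True, "g3": False, "g4": True}
# Mostly novel species: only one of four genomes is resolved by the screen.
novel_heavy = {"g1": False, "g2": False, "g3": True, "g4": False}

print(split_by_ani_screen(known_heavy))
print(placement_reduction(known_heavy), placement_reduction(novel_heavy))
```

Note that this fraction counts genomes, not runtime. The actual time saved also depends on the cost of the screen itself and on the remaining workflow steps.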
📚 References
GTDB-Tk is described in:
- Chaumeil PA, et al. 2022. GTDB-Tk v2: memory friendly classification with the Genome Taxonomy Database. Bioinformatics, btac672.
- Chaumeil PA, et al. 2019. GTDB-Tk: A toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics, btz848.
The Genome Taxonomy Database (GTDB) is described in:
- Parks DH, et al. 2021. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, 50: D785–D794.
- Rinke C, et al. 2021. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nature Microbiology, 6: 946–959.
- Parks DH, et al. 2020. A complete domain-to-species taxonomy for Bacteria and Archaea. Nature Biotechnology, https://doi.org/10.1038/s41587-020-0501-8.
- Parks DH, et al. 2018. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology, http://dx.doi.org/10.1038/nbt.4229.
We strongly encourage you to cite the following 3rd party dependencies:
- Matsen FA, et al. 2010. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11:538.
- Jain C, et al. 2018. High-throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries. Nature Communications, doi: 10.1038/s41467-018-07641-9.
- Shaw J, Yu YW. 2023. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods, 20: 1661–1665.
- Hyatt D, et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11:119. doi: 10.1186/1471-2105-11-119.
- Price MN, et al. 2010. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One, 5, e9490.
- Eddy SR. 2011. Accelerated profile HMM searches. PLOS Comp. Biol., 7:e1002195.
- Ondov BD, et al. 2016. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132. doi: 10.1186/s13059-016-0997-x.
© Copyright
Copyright 2017 Pierre-Alain Chaumeil. See LICENSE for further details.
Owner
- Name: Australian Centre for Ecogenomics
- Login: Ecogenomics
- Kind: organization
- Location: University of Queensland, Australia
- Website: ecogenomic.org
- Repositories: 37
- Profile: https://github.com/Ecogenomics
GitHub Events
Total
- Create event: 3
- Release event: 2
- Issues event: 71
- Watch event: 59
- Issue comment event: 138
- Push event: 36
- Pull request event: 10
- Gollum event: 3
- Fork event: 6
Last Year
- Create event: 3
- Release event: 2
- Issues event: 71
- Watch event: 59
- Issue comment event: 138
- Push event: 36
- Pull request event: 10
- Gollum event: 3
- Fork event: 6
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Aaron Mussig | a****g@g****m | 294 |
| pchaumeil | u****m@u****u | 213 |
| Donovan Parks | d****s@g****m | 162 |
| pchaumeil | p****l@g****m | 49 |
| pchaumeil | p****l@q****g | 24 |
| Cameron Hyde | 4****t | 1 |
| Donovan | 9****s | 1 |
| Florian Plaza Oñate | f****a | 1 |
| Linda Fenske | 1****3 | 1 |
| Nicolai Søborg | N****g | 1 |
| Samuel Aroney | 4****S | 1 |
| Valentyn Bezshapkin | 6****3 | 1 |
| just.in.lee | 3****6 | 1 |
| tr11-sanger | 9****r | 1 |
| Asaf Peer | A****r@j****g | 1 |
| Ben Woodcroft | b****t@g****m | 1 |
| Daniel McDonald | d****d@u****u | 1 |
| Davide Albanese | d****e@g****m | 1 |
| Florian Plaza Oñate | f****e@i****r | 1 |
| Malte Rühlemann | m****n@g****m | 1 |
| Mingye Wang | a****6@g****m | 1 |
| Moritz | m****k@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 204
- Total pull requests: 46
- Average time to close issues: 3 months
- Average time to close pull requests: 9 days
- Total issue authors: 161
- Total pull request authors: 10
- Average comments per issue: 3.03
- Average comments per pull request: 0.28
- Merged pull requests: 39
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 47
- Pull requests: 9
- Average time to close issues: about 1 month
- Average time to close pull requests: 1 minute
- Issue authors: 41
- Pull request authors: 3
- Average comments per issue: 1.68
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- saras224 (5)
- zwets (4)
- donovan-h-parks (3)
- lfenske-93 (3)
- jolespin (3)
- luwenyi111 (3)
- wanxn518 (3)
- pchaumeil (3)
- nick-youngblut (3)
- wwood (3)
- lyisrae1 (2)
- sarehaghababaee (2)
- JiangweiPan1230 (2)
- jianshu93 (2)
- aaronmussig (2)
Pull Request Authors
- pchaumeil (37)
- MartinVad (2)
- juanvillada (2)
- donovan-h-parks (2)
- aaronmussig (2)
- Artoria2e5 (2)
- wasade (2)
- wwood (1)
- AroneyS (1)
- neoformit (1)
Packages
- Total packages: 2
- Total downloads: PyPI, 389 last month
- Total docker downloads: 475
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 2 (may contain duplicates)
- Total versions: 53
- Total maintainers: 4
pypi.org: gtdbtk
A toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
- Homepage: https://github.com/Ecogenomics/GTDBTk
- Documentation: https://gtdbtk.readthedocs.io/
- License: GPL3
- Latest release: 2.5.0 (published 6 months ago)
Rankings
Maintainers (3)
spack.io: py-gtdbtk
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
- Homepage: https://github.com/Ecogenomics/GTDBTk
- License: []
- Latest release: 2.3.2 (published over 2 years ago)
Rankings
Maintainers (1)
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- docker/setup-buildx-action v1 composite
- docker/setup-qemu-action v1 composite
- python 3.8-slim-bullseye build
- jupyter *
- linuxdoc ==20211220
- matplotlib *
- nbsphinx *
- recommonmark *
- sphinx *
- sphinx-argparse *
- sphinx-rtd-theme *
- sphinx-sitemap *
- dendropy >=4.1.0
- pydantic >=1.9.2,<2.0a1
- tqdm >=4.35.0