https://github.com/biocore/nested-classification

Tool for QIIME2 nested-classification

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    2 of 4 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.6%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: biocore
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 171 KB
Statistics
  • Stars: 0
  • Watchers: 6
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created over 4 years ago · Last pushed almost 4 years ago
Metadata Files
Readme License

README.md

nested-classification

A QIIME 2 plugin for performing nested classification. This plugin utilizes the q2-sample-classifier: https://github.com/qiime2/q2-sample-classifier.git

nested-classification is a tool for comparing human microbial data against microbial data from animals (non-human hosts) to see how similar they are.

Install

In a QIIME 2 environment:

    git clone https://github.com/biocore/nested-classification.git
    cd nested-classification
    pip install -e .

How to use

1. training-samples

The first method to run is training-samples, which takes the animal (non-human) metadata and feature table. It trains the classifiers and writes them out for later use.

Inputs

  1. QIIME 2 artifact feature-table
  2. Corresponding non-human metadata including 'sample-id' as index column and 'ncbi-taxon-id' with information of host NCBI taxids
  3. Pre-existing output directory (folder) for storing classifier models

Outputs

The output directory will be populated with estimators labeled by their taxon IDs.
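The metadata requirement above can be checked before training. The following is a minimal sketch, assuming the metadata is a tab-separated file; the function name `load_host_taxids` and the two example rows are hypothetical and not part of the plugin.

```python
import csv
import io

# Column names taken from the input description above.
REQUIRED_COLUMNS = ("sample-id", "ncbi-taxon-id")


def load_host_taxids(tsv_text):
    """Return {sample-id: host NCBI taxid} from tab-separated metadata text.

    Hypothetical helper for illustration; not part of the plugin.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"metadata is missing column(s): {missing}")
    # Taxids are stored as integers so they can be matched against a taxonomy.
    return {row["sample-id"]: int(row["ncbi-taxon-id"]) for row in reader}


# Illustrative rows: S1 is a dog sample (taxid 9615), S2 a cat sample (9685).
example = "sample-id\tncbi-taxon-id\nS1\t9615\nS2\t9685\n"
host_taxids = load_host_taxids(example)
print(host_taxids)
```

A check like this fails early with a clear message if a column is misnamed, rather than partway through training.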

2. predict-samples

The second method queries human data against the previously trained models.

Inputs

  1. QIIME 2 HUMAN artifact feature-table
  2. The same non-human metadata used to train the stored models (this is used as a map)
  3. An input directory (folder) with the stored classifiers

Outputs

  1. probabilities.qzv: the probability of each query belonging to the taxid, i.e. the model's predicted likelihood that the individual is in the taxid's clade
  2. predictions.qzv: predictions of "True" or "False" for whether the individual belongs to that taxid's clade
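The relationship between the two outputs can be pictured as thresholding each probability into a True/False call. This is only a sketch: the 0.5 cutoff, the function name `to_predictions`, and the example values are assumptions for illustration, and the plugin may derive its predictions differently.

```python
def to_predictions(probabilities, threshold=0.5):
    """Turn {sample: {taxid: probability}} into {sample: {taxid: bool}}.

    The threshold of 0.5 is an assumed default, not a documented
    setting of the plugin.
    """
    return {
        sample: {taxid: prob >= threshold for taxid, prob in per_taxid.items()}
        for sample, per_taxid in probabilities.items()
    }


# Illustrative values only: one human sample scored against three clades
# (40674 = Mammalia, 9443 = Primates, 9989 = Rodentia).
probabilities = {"H1": {40674: 0.91, 9443: 0.62, 9989: 0.08}}
predictions = to_predictions(probabilities)
print(predictions)
```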

How does this work?

q2-nc uses ete3.NCBITaxa to pull information from the NCBI Taxonomy database. From the taxids in the metadata, it builds a taxonomy tree that follows the evolutionary hierarchy, with vertebrates at the root and species at the leaves. Starting at the root, we traverse down the tree and train a model on a subset of the animal data at every (valid) node. The same tree can then be rebuilt so that the human data can be passed through the nested classifiers via a depth-first traversal of the taxonomy tree.
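The traversal described above can be sketched in plain Python. Several things here are assumptions made for illustration: the lineages are hard-coded, abbreviated stand-ins for what an NCBI lookup would return, leaf taxids stand in for samples, and a node is treated as "valid" when the data reaching it contains both members and non-members of its clade. The plugin's own validity rule and model training are not shown.

```python
from collections import defaultdict

# Abbreviated root-to-leaf lineages (illustrative, hard-coded stand-ins).
# 7742 = Vertebrata, 40674 = Mammalia, 33554 = Carnivora, 8782 = Aves.
LINEAGES = {
    9615: [7742, 40674, 33554, 9615],  # dog
    9685: [7742, 40674, 33554, 9685],  # cat
    9031: [7742, 8782, 9031],          # chicken
}


def build_tree(lineages):
    """Map each taxid to the set of its child taxids."""
    children = defaultdict(set)
    for lineage in lineages.values():
        for parent, child in zip(lineage, lineage[1:]):
            children[parent].add(child)
    return children


def training_tasks(root, lineages):
    """Depth-first walk returning (node, in_clade, out_of_clade) tuples.

    Each node only sees the leaves that belong to its parent's clade,
    which is what makes the classifiers nested.
    """
    children = build_tree(lineages)
    tasks = []
    stack = [(root, set(lineages))]  # (node, leaves reaching this node)
    while stack:
        node, scope = stack.pop()
        positives = {leaf for leaf in scope if node in lineages[leaf]}
        negatives = scope - positives
        # Assumed validity rule: both classes must be present.
        if positives and negatives:
            tasks.append((node, sorted(positives), sorted(negatives)))
        for child in sorted(children.get(node, ()), reverse=True):
            stack.append((child, positives))
    return tasks


tasks = training_tasks(7742, LINEAGES)
for node, in_clade, out_of_clade in tasks:
    print(node, in_clade, out_of_clade)
```

In this toy tree the root and Carnivora yield no task, because every leaf that reaches them falls inside their clade and there is nothing to tell apart.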

References

QIIME2:

Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37: 852–857. https://doi.org/10.1038/s41587-019-0209-9

q2-sample-classifier:

Bokulich N, Dillon M, Bolyen E, Kaehler BD, Huttley GA, and Caporaso JG. 2018. q2-sample-classifier: machine-learning tools for microbiome classification and regression. Journal of Open Source Software 3(30): 934. https://doi.org/10.21105/joss.00934

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, and Duchesnay É. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12: 2825–2830.

Owner

  • Name: biocore
  • Login: biocore
  • Kind: organization
  • Location: Cyberspace

Collaboratively developed bioinformatics software.

Committers

Last synced: 12 months ago

All Time
  • Total Commits: 48
  • Total Committers: 4
  • Avg Commits per committer: 12.0
  • Development Distribution Score (DDS): 0.375
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name              Email          Commits
jjjoelle          j****h@u****u  30
Daniel McDonald   d****d@u****u  9
NathalieFranklin  6****n         7
Shria Arcot       7****8         2
Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 10
  • Average time to close issues: N/A
  • Average time to close pull requests: 11 days
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.3
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • jjjoelle (5)
  • wasade (4)
  • NathalieFranklin (1)

Dependencies

environment.yml conda
  • pandas