Sourcepredict

Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification - Published in JOSS (2019)

https://github.com/maxibor/sourcepredict

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 30 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, joss.theoj.org, zenodo.org
  • Committers with academic emails
    2 of 3 committers (66.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

machine-learning microbiome source-tracking

Scientific Fields

Biology Life Sciences - 88% confidence
Mathematics Computer Science - 84% confidence
Last synced: 4 months ago

Repository

Prediction/source tracking of metagenomic sample sources using machine learning

Basic Info
  • Host: GitHub
  • Owner: maxibor
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Size: 23.3 MB
Statistics
  • Stars: 9
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 7
Topics
machine-learning microbiome source-tracking
Created about 7 years ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License

README.md


Sourcepredict is a Python package, distributed through Conda, that classifies and predicts the origin of metagenomic samples given a reference dataset of known origins, a problem also known as source tracking. Sourcepredict solves this problem by applying machine learning classification to dimensionally reduced datasets.
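To make the classification idea concrete, here is a minimal pure-Python sketch of two ingredients that appear in the example log further down: the Bray-Curtis dissimilarity and a k-nearest-neighbours vote that yields per-source proportions. It is not Sourcepredict's implementation: it deliberately skips normalization and the embedding (dimension reduction) step and votes directly on distances, and the function names, toy counts, and labels are all hypothetical.

```python
from collections import Counter


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def knn_source_proportions(sink, sources, labels, k=3):
    """Share of each label among the k source samples nearest to the sink."""
    ranked = sorted(zip(sources, labels),
                    key=lambda pair: bray_curtis(sink, pair[0]))
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}


# Hypothetical toy counts for three taxa in five labelled source samples.
toy_sources = [
    [90, 5, 5],
    [80, 10, 10],
    [85, 10, 5],
    [5, 90, 5],
    [10, 80, 10],
]
toy_labels = ["dog", "dog", "dog", "soil", "soil"]
toy_sink = [70, 20, 10]

proportions = knn_source_proportions(toy_sink, toy_sources, toy_labels, k=4)
print(proportions)  # {'dog': 0.75, 'soil': 0.25}
```

With `k=4`, the three dog-like samples and the closest soil sample are selected, so the sink is reported as 75% dog and 25% soil. In Sourcepredict itself, the distances are first turned into a low-dimensional embedding before classification.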

Installation

With conda (recommended)

```bash
$ conda install -c conda-forge -c maxibor sourcepredict
```

With pip

```bash
$ pip install sourcepredict
```

Example

Input

Usage

```bash
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/test/dog_test_sink_sample.csv -O dog_example.csv
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/modern_gut_microbiomes_labels.csv -O sp_labels.csv
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/modern_gut_microbiomes_sources.csv -O sp_sources.csv
$ sourcepredict -s sp_sources.csv -l sp_labels.csv dog_example.csv
Step 1: Checking for unknown proportion
  == Sample: ERR1915662 ==
    Adding unknown
    Normalizing (GMPR)
    Computing Bray-Curtis distance
    Performing MDS embedding in 2 dimensions
    KNN machine learning
    Training KNN classifier on 2 cores...
    -> Testing Accuracy: 1.0
    ----------------------
    - Sample: ERR1915662
        known:98.61%
        unknown:1.39%
Step 2: Checking for source proportion
    Computing weighted_unifrac distance on species rank
    TSNE embedding in 2 dimensions
    KNN machine learning
    Performing 5 fold cross validation on 2 cores...
    Trained KNN classifier with 10 neighbors
    -> Testing Accuracy: 0.99
    ----------------------
    - Sample: ERR1915662
        Canis_familiaris:96.1%
        Homo_sapiens:2.47%
        Soil:1.43%
Sourcepredict result written to dog_test_sample.sourcepredict.csv
```
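When many sink files need to be processed, the same command can be assembled from Python. The sketch below only builds the argument list from the `-s` and `-l` flags used above. The helper name `build_sourcepredict_command` is hypothetical, and passing an integer to `-t` (the parallel-processing flag mentioned under Runtime) is an assumption about its syntax rather than something the example confirms.

```python
import shlex


def build_sourcepredict_command(sink, sources, labels, threads=None):
    """Return the argument list for one Sourcepredict run (hypothetical helper)."""
    command = ["sourcepredict", "-s", sources, "-l", labels]
    if threads is not None:
        # Assumption: -t takes the number of CPUs as an integer.
        command += ["-t", str(threads)]
    command.append(sink)
    return command


command = build_sourcepredict_command(
    "dog_example.csv", "sp_sources.csv", "sp_labels.csv"
)
print(shlex.join(command))
# sourcepredict -s sp_sources.csv -l sp_labels.csv dog_example.csv

# To actually run it (requires Sourcepredict to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Building a list instead of a shell string avoids quoting problems when file names contain spaces.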

Output

Sourcepredict outputs the predicted source contribution to each sink sample, and the embedding of all samples in the lower dimensional space. See documentation for details.
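As a rough illustration of how such a result could be post-processed, the sketch below picks the dominant source for each sink sample. The CSV layout is an assumption (one row per source, one column per sink sample, values as fractions), not a documented format; check the documentation for the real one. The numbers are the Step 2 proportions from the example log above, rewritten as fractions, and `top_source_per_sink` is a hypothetical helper.

```python
import csv
import io

# Assumed layout, NOT the documented format: rows are sources, columns are sinks.
ASSUMED_OUTPUT = """\
,ERR1915662
Canis_familiaris,0.961
Homo_sapiens,0.0247
Soil,0.0143
"""


def top_source_per_sink(handle):
    """Map each sink sample to its (source, proportion) with the largest share."""
    rows = list(csv.reader(handle))
    sinks = rows[0][1:]
    best = {}
    for row in rows[1:]:
        source = row[0]
        for sink, value in zip(sinks, row[1:]):
            proportion = float(value)
            if sink not in best or proportion > best[sink][1]:
                best[sink] = (source, proportion)
    return best


result = top_source_per_sink(io.StringIO(ASSUMED_OUTPUT))
print(result)  # {'ERR1915662': ('Canis_familiaris', 0.961)}
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`.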

Runtime

Depending on the normalization method (-n), the embedding method (-me), the number of CPUs available for parallel processing (-t), and the data, the runtime should range from a few seconds to a few minutes per sink sample.

Documentation

The documentation of Sourcepredict is available here: sourcepredict.readthedocs.io

Sourcepredict example files

Environments included in the example source file

  • Homo sapiens gut microbiome (1, 2, 3, 4, 5, 6)
  • Canis familiaris gut microbiome (1)
  • Soil microbiome (1, 2, 3)

Contributing Code, Documentation, or Feedback

If you wish to contribute to Sourcepredict, you are welcome and encouraged to do so by opening an issue or creating a pull request. All contributions will be made under the GPLv3 license. More information can be found on the contributing page.

How to cite

Sourcepredict has been published in JOSS.

@article{Borry2019Sourcepredict,
  journal = {Journal of Open Source Software},
  doi = {10.21105/joss.01540},
  issn = {2475-9066},
  number = {41},
  publisher = {The Open Journal},
  title = {Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification},
  url = {http://dx.doi.org/10.21105/joss.01540},
  volume = {4},
  author = {Borry, Maxime},
  pages = {1540},
  date = {2019-09-04},
  year = {2019},
  month = {9},
  day = {4}
}

Owner

  • Name: Maxime Borry
  • Login: maxibor
  • Kind: user
  • Location: Mainz, Germany
  • Company: TRON - Translational Oncology Mainz

Bioinformatics Scientist

JOSS Publication

Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification
Published
September 04, 2019
Volume 4, Issue 41, Page 1540
Authors
Maxime Borry ORCID
Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, 07745, Germany
Editor
Lorena Pantano ORCID
Tags
microbiome source tracking machine learning

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 197
  • Total Committers: 3
  • Avg Commits per committer: 65.667
  • Development Distribution Score (DDS): 0.056
Past Year
  • Commits: 8
  • Committers: 1
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name          Email          Commits
maxibor       m****y@g****m  186
Maxime Borry  b****y@m****e  8
Maxime Borry  b****y@m****e  3

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 3
  • Total pull requests: 1
  • Average time to close issues: 6 days
  • Average time to close pull requests: 17 minutes
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 3.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gavinmdouglas (2)
  • will-rowe (1)
Pull Request Authors
  • maxibor (1)
Top Labels
Issue Labels
bug (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 10 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 2
  • Total maintainers: 1
pypi.org: sourcepredict

Classification and prediction of the origin of metagenomic samples

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 10 Last month
  • Docker Downloads: 0
Rankings
Docker downloads count: 1.8%
Dependent packages count: 10.1%
Dependent repos count: 21.6%
Stargazers count: 23.1%
Average: 25.9%
Forks count: 29.8%
Downloads: 68.8%
Maintainers (1)
Last synced: 4 months ago

Dependencies

docs/requirements.txt pypi
  • ipykernel *
  • nbsphinx *
  • sphinxcontrib-napoleon *
setup.py pypi
  • ete3 *
  • numpy *
  • pandas *
  • scikit-bio *
  • scikit-learn *
  • scipy *
  • umap-learn *
.github/workflows/publish_conda.yml actions
  • actions/checkout master composite
  • maxibor/conda-package-publish-action master composite
.github/workflows/publish_pypi.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/sourcepredict_ci.yml actions
  • actions/checkout v2 composite
  • conda-incubator/setup-miniconda v3 composite