Sourcepredict
Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification - Published in JOSS (2019)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 30 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to ncbi.nlm.nih.gov, joss.theoj.org, zenodo.org
- ✓ Committers with academic emails: 2 of 3 committers (66.7%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Prediction/source tracking of metagenomic sample sources using machine learning
Statistics
- Stars: 9
- Watchers: 1
- Forks: 2
- Open Issues: 0
- Releases: 7
README.md

Sourcepredict is a Python package, distributed through Conda, that classifies and predicts the origin of metagenomic samples given a reference dataset of known origins, a problem also known as source tracking. Sourcepredict solves this problem by applying machine learning classification to dimensionally reduced datasets.
Installation
With conda (recommended)
```bash
$ conda install -c conda-forge -c maxibor sourcepredict
```
With pip
```bash
$ pip install sourcepredict
```
Example
Input
- Sink taxonomic count file (see example file and documentation)
- Source taxonomic count file (see example file and documentation)
- Source label file (see example file and documentation)
Usage
```bash
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/test/dog_test_sink_sample.csv -O dog_example.csv
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/modern_gut_microbiomes_labels.csv -O sp_labels.csv
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/modern_gut_microbiomes_sources.csv -O sp_sources.csv
$ sourcepredict -s sp_sources.csv -l sp_labels.csv dog_example.csv
Step 1: Checking for unknown proportion
== Sample: ERR1915662 ==
Adding unknown
Normalizing (GMPR)
Computing Bray-Curtis distance
Performing MDS embedding in 2 dimensions
KNN machine learning
Training KNN classifier on 2 cores...
-> Testing Accuracy: 1.0
----------------------
- Sample: ERR1915662
known:98.61%
unknown:1.39%
Step 2: Checking for source proportion
Computing weighted_unifrac distance on species rank
TSNE embedding in 2 dimensions
KNN machine learning
Performing 5 fold cross validation on 2 cores...
Trained KNN classifier with 10 neighbors
-> Testing Accuracy: 0.99
----------------------
- Sample: ERR1915662
Canis_familiaris:96.1%
Homo_sapiens:2.47%
Soil:1.43%
Sourcepredict result written to dog_test_sample.sourcepredict.csv
```
Output
Sourcepredict outputs the predicted source contribution to each sink sample, and the embedding of all samples in the lower-dimensional space. See documentation for details.
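To make the idea of a predicted source contribution more concrete, here is a minimal Python sketch of two steps named in the usage log: the Bray-Curtis distance and a KNN vote. It is not Sourcepredict's implementation. It skips the normalization and embedding steps, runs the neighbor search directly on the distances, and uses made-up counts. The function names `bray_curtis` and `knn_proportions` are hypothetical.

```python
from collections import Counter


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def knn_proportions(sink, sources, labels, k=3):
    """Share of each label among the k sources closest to the sink."""
    ranked = sorted(range(len(sources)),
                    key=lambda i: bray_curtis(sink, sources[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return {label: count / k for label, count in votes.items()}


# Made-up counts for three taxa; rows are source samples (illustration only).
sources = [[90, 5, 5], [80, 10, 10], [5, 90, 5], [10, 85, 5], [5, 5, 90]]
labels = ["Canis_familiaris", "Canis_familiaris",
          "Homo_sapiens", "Homo_sapiens", "Soil"]
sink = [85, 10, 5]

props = knn_proportions(sink, sources, labels, k=3)
print(props)
```

In this toy setting the proportions are simply the label frequencies among the nearest sources. A real classifier trained on an embedding may weight neighbors and estimate probabilities differently.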
Runtime
Depending on the normalization method (-n), the embedding method (-me), the number of CPUs available for parallel processing (-t), and the data, the runtime should be between a few seconds and a few minutes per sink sample.
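One normalization that appears in the usage log is GMPR (geometric mean of pairwise ratios). As a rough illustration of what such a method computes, the sketch below follows the commonly published definition of GMPR: for each sample, take the median count ratio against every sample over the taxa present in both, then take the geometric mean of those medians. This is an assumption-laden sketch, not Sourcepredict's own code. `gmpr_size_factors` is a hypothetical name and the counts are made up.

```python
import math
from statistics import median


def gmpr_size_factors(samples):
    """Per-sample size factor: geometric mean of median pairwise count ratios."""
    factors = []
    for a in samples:
        log_medians = []
        for b in samples:  # includes the sample itself, whose median ratio is 1
            ratios = [x / y for x, y in zip(a, b) if x > 0 and y > 0]
            if ratios:
                log_medians.append(math.log(median(ratios)))
        if log_medians:
            factors.append(math.exp(sum(log_medians) / len(log_medians)))
        else:
            factors.append(float("nan"))  # sample with no non-zero counts
    return factors


# Made-up counts: the second sample is the first one sequenced twice as deep.
counts = [[10, 20, 30], [20, 40, 60], [5, 10, 0]]
factors = gmpr_size_factors(counts)

# Dividing each sample by its size factor puts the counts on a common scale.
normalized = [[c / f for c in row] for row, f in zip(counts, factors)]
print(factors)
```

The nested loop compares every pair of samples, so the cost of this sketch grows quadratically with the number of samples.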
Documentation
The documentation of Sourcepredict is available here: sourcepredict.readthedocs.io
Sourcepredict example files
- The sources were obtained with a simple Nextflow pipeline, with Kraken2 using the MiniKraken2_v2_8GB database. See the documentation for more information on how to build a custom source file.
- The example source file is here: modern_gut_microbiomes_sources.csv
- The example label file is here: modern_gut_microbiomes_labels.csv
Environments included in the example source file
- Homo sapiens gut microbiome (1, 2, 3, 4, 5, 6)
- Canis familiaris gut microbiome (1)
- Soil microbiome (1, 2, 3)
Contributing Code, Documentation, or Feedback
If you wish to contribute to Sourcepredict, you are welcome and encouraged to do so by opening an issue or creating a pull request. All contributions will be made under the GPLv3 license. More information can be found on the contributing page.
How to cite
Sourcepredict has been published in JOSS.
@article{Borry2019Sourcepredict,
journal = {Journal of Open Source Software},
doi = {10.21105/joss.01540},
issn = {2475-9066},
number = {41},
publisher = {The Open Journal},
title = {Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification},
url = {http://dx.doi.org/10.21105/joss.01540},
volume = {4},
author = {Borry, Maxime},
pages = {1540},
date = {2019-09-04},
year = {2019},
month = {9},
day = {4}
}
Owner
- Name: Maxime Borry
- Login: maxibor
- Kind: user
- Location: Mainz, Germany
- Company: TRON - Translational Oncology Mainz
- Website: https://maximeborry.com
- Repositories: 141
- Profile: https://github.com/maxibor
Bioinformatics Scientist
JOSS Publication
Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification
Tags
microbiome source tracking machine learning
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| maxibor | m****y@g****m | 186 |
| Maxime Borry | b****y@m****e | 8 |
| Maxime Borry | b****y@m****e | 3 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 3
- Total pull requests: 1
- Average time to close issues: 6 days
- Average time to close pull requests: 17 minutes
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 3.33
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- gavinmdouglas (2)
- will-rowe (1)
Pull Request Authors
- maxibor (1)
Packages
- Total packages: 1
- Total downloads: pypi 10 last-month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 2
- Total maintainers: 1
pypi.org: sourcepredict
Classification and prediction of the origin of metagenomic samples
- Homepage: https://github.com/maxibor/sourcepredict
- Documentation: https://sourcepredict.readthedocs.io/
- License: GPLv3
- Latest release: 0.5.1 (published about 1 year ago)
Dependencies
- ipykernel *
- nbsphinx *
- sphinxcontrib-napoleon *
- ete3 *
- numpy *
- pandas *
- scikit-bio *
- scikit-learn *
- scipy *
- umap-learn *
- actions/checkout master composite
- maxibor/conda-package-publish-action master composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v2 composite
- conda-incubator/setup-miniconda v3 composite
