expert-mst

Transfer Learning-enabled context-aware microbial source tracking

https://github.com/hust-ningkang-lab/expert

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 7 DOI reference(s) in README
✓ Academic publication links: Links to biorxiv.org, nature.com
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary

Keywords

deep-learning microbial-source-tracking transfer-learning

Last synced: 6 months ago

Repository

Transfer Learning-enabled context-aware microbial source tracking

Basic Info

Host: GitHub
Owner: HUST-NingKang-Lab
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 432 MB

Statistics

Stars: 14
Watchers: 4
Forks: 4
Open Issues: 3
Releases: 0

Topics

deep-learning microbial-source-tracking transfer-learning

Created over 5 years ago · Last pushed over 3 years ago

Metadata Files

Readme License Code of conduct Citation

EXPERT - a scalable model for quantifying source contributions for microbial communities

Quantifying source origins for microbiome samples in a fast, comprehensive, and context-aware manner remains challenging. Traditional approaches to such quantification face severe trade-offs between efficiency, accuracy, and scalability.

Here, we introduce EXPERT, a scalable community-level microbial source tracking approach. Built upon biome ontology information and transfer learning techniques, EXPERT gains context-aware flexibility and can easily expand the supervised model's search scope to include context-dependent community samples and understudied biomes. At the same time, it is superior to current approaches in source tracking accuracy and speed. EXPERT's superiority has been demonstrated on multiple source tracking tasks, including source tracking of samples collected at different disease stages and of longitudinal samples. For details, refer to our original study.

Supervised learning (with high efficiency and accuracy) meets transfer learning (with inherently high scalability), toward a better understanding of the "dark matter" in microbial communities.

Support

For support using EXPERT, please contact us.

This is our beta version; any comments or insights would be greatly appreciated.

Features

Context-aware ability to adapt to microbiome studies via transfer learning
Fast, accurate and interpretable source tracking via ontology-aware forward propagation
Support for both amplicon sequencing and whole-genome sequencing data
Selective learning from partially-labeled training data
Ultra-fast data cleaning via in-memory NCBI taxonomy database
Parallelized feature encoding via tensorflow.keras

Installation

You can install EXPERT using the pip package manager.

    pip install expert-mst  # Install EXPERT
    expert init             # Initialize EXPERT and install NCBI taxonomy database

Quick start

Here we quickly go through the basic functionality of EXPERT using a case study that was already conducted in our preprint. We also provide more functional showcases in another repository.

Things to know before starting

A key feature of EXPERT is its automatic generalization of fundamental models, which lets users without deep-learning experience adapt the models from the terminal, with no programming required. Here we generalize a fundamental model to monitor the progression of colorectal cancer (CRC) and assess the performance of the generalized model. We use only the disease model, which was trained to quantify contributions from hosts with different disease-associated biomes (refer to our preprint for details).
Microbial source tracking: Bayesian community-wide culture-independent microbial source tracking | Nature Methods
Cross-validation: Cross-validation (statistics) - Wikipedia

Get prepared

Please follow the instructions below and run all commands on a Linux or macOS platform. You may also need to install Anaconda before starting.

Install expert-mst version 0.2 (recommended).

    pip install https://github.com/HUST-NingKang-Lab/EXPERT/releases/download/v0.2/expert-0.2_cpu-py3-none-any.whl
    expert init

Download the fundamental model and the dataset to be used. Here, CM is an abbreviation of countMatrix, a format for abundance data in which each row represents a taxon and each column represents a sample/run. Mapper is another important input of EXPERT; it records the source biomes of the input samples.

    wget -c https://github.com/HUST-NingKang-Lab/EXPERT/releases/download/v0.2-m/disease_model.tgz
    tar zxvf disease_model.tgz  # Decompress the fundamental model.
    for file in {QueryCM.tsv,SourceCM.tsv,QueryMapper.csv,SourceMapper.csv}; do
        wget -c https://raw.githubusercontent.com/HUST-NingKang-Lab/EXPERT/master/data/$file
    done
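To make the countMatrix layout concrete, the following minimal Python sketch (standard library only) parses a tiny tab-separated matrix with taxa as rows and samples as columns. The taxon names, sample IDs, and the `#taxon` header label are invented for illustration; the exact header and taxon naming that EXPERT expects are not described here, so treat them as assumptions.

```python
import csv
import io

# Hypothetical toy count matrix: rows are taxa, columns are samples/runs.
# The taxon names, sample IDs, and the "#taxon" header label are made up.
TOY_CM = (
    "#taxon\tsampleA\tsampleB\n"
    "Bacteroides\t10\t0\n"
    "Fusobacterium\t2\t8\n"
)


def read_count_matrix(handle):
    """Return (sample_ids, {taxon: [count per sample]}) from a TSV handle."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]  # first column holds the taxon name
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = [float(value) for value in row[1:]]
    return sample_ids, counts


def sample_totals(sample_ids, counts):
    """Sum each column to get the total count per sample."""
    totals = dict.fromkeys(sample_ids, 0.0)
    for values in counts.values():
        for sample_id, value in zip(sample_ids, values):
            totals[sample_id] += value
    return totals


samples, matrix = read_count_matrix(io.StringIO(TOY_CM))
print(sample_totals(samples, matrix))
```

To inspect a downloaded file instead of the toy string, pass an open file handle (for example `open("SourceCM.tsv", newline="")`), keeping in mind that the real header may differ from the one assumed above.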

Preprocess the dataset

Construct a biome ontology representing the stages of CRC. The constructed ontology is printed as a tree.

    grep -v "Env" SourceMapper.csv | awk -F ',' '{print $6}' | sort | uniq > microbiomes.txt
    expert construct -i microbiomes.txt -o ontology.pkl
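For readers who prefer to see the idea in Python, here is a rough sketch of what this step does conceptually: collect the unique values of the sixth comma-separated column (approximating the `grep | awk | sort | uniq` pipeline) and arrange them into a tree. The toy mapper content, the colon-separated biome paths, and the function names are assumptions made for illustration; this is not how `expert construct` is implemented.

```python
import io

# Hypothetical mapper content: six comma-separated columns, biome in the sixth.
# The header contains "Env", so it is dropped just like `grep -v "Env"` would.
TOY_MAPPER = (
    "c1,c2,c3,c4,c5,Env\n"
    "s1,-,-,-,-,root:Healthy\n"
    "s2,-,-,-,-,root:CRC:Early\n"
    "s3,-,-,-,-,root:CRC:Late\n"
    "s4,-,-,-,-,root:CRC:Early\n"
)


def unique_biomes(handle, column=5, skip_token="Env"):
    """Approximate: grep -v "Env" | awk -F ',' '{print $6}' | sort | uniq."""
    biomes = set()
    for line in handle:
        if skip_token in line:
            continue  # grep -v drops every line containing the token
        fields = line.rstrip("\n").split(",")  # plain split, like awk -F ','
        if len(fields) > column:
            biomes.add(fields[column])
    return sorted(biomes)


def build_tree(paths, sep=":"):
    """Nest path components (assumed to be colon-separated) into dicts."""
    tree = {}
    for path in paths:
        node = tree
        for name in path.split(sep):
            node = node.setdefault(name, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines


biomes = unique_biomes(io.StringIO(TOY_MAPPER))
tree = build_tree(biomes)
print("\n".join(render(tree)))
```

Note that the plain comma split mirrors `awk -F ','` and, like it, does not handle quoted fields that contain commas.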

Map microbial community samples to the biome ontology to obtain hierarchical labels. You'll see counts of the samples on each biome ontology layer in the printed message.

    expert map --to-otlg -i SourceMapper.csv -t ontology.pkl -o SourceLabels.h5
    expert map --to-otlg -i QueryMapper.csv -t ontology.pkl -o QueryLabels.h5
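The notion of hierarchical labels can be illustrated with a small Python sketch: each sample's biome path yields one label per ontology layer, and samples whose path is shorter than the ontology depth get a placeholder at the deeper layers. The colon-separated paths, the `Unknown` placeholder, and the function names are assumptions for illustration; the actual contents of the `.h5` label files are not described here.

```python
from collections import Counter


def layer_labels(path, depth, sep=":", unknown="Unknown"):
    """Return one label per layer; deeper layers than the path get `unknown`."""
    parts = path.split(sep)
    labels = []
    for layer in range(1, depth + 1):
        if layer <= len(parts):
            labels.append(sep.join(parts[:layer]))  # prefix up to this layer
        else:
            labels.append(unknown)
    return labels


def counts_per_layer(paths, depth):
    """Count how many samples carry each label on each layer."""
    counters = [Counter() for _ in range(depth)]
    for path in paths:
        for counter, label in zip(counters, layer_labels(path, depth)):
            counter[label] += 1
    return counters


# Hypothetical biome paths for four samples.
sample_biomes = ["root:Healthy", "root:CRC:Early", "root:CRC:Late", "root:CRC:Early"]
per_layer = counts_per_layer(sample_biomes, depth=3)
for index, counter in enumerate(per_layer, start=1):
    print(f"layer {index}: {dict(counter)}")
```

The per-layer counts printed here are analogous in spirit to the sample counts that the `map` mode reports, although the real report format may differ.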

Convert the input abundance data to an HDF file that the model accepts. The EXPERT model only accepts standardized abundance data, so here we standardize the abundance data using the convert mode.

    ls SourceCM.tsv > inputList
    expert convert -i inputList -o SourceCM.h5 --in-cm
    ls QueryCM.tsv > inputList
    expert convert -i inputList -o QueryCM.h5 --in-cm
    rm inputList
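What "standardized" means is not spelled out above. One common ingredient of abundance standardization is total-sum scaling, in which each sample's counts are divided by that sample's total so every column sums to 1. The Python sketch below shows only that step, under the assumption that it is a reasonable mental model; it is not a description of what `expert convert` actually does, and the taxon names are invented.

```python
def to_relative_abundance(counts):
    """Scale each sample (column) so that its values sum to 1.

    `counts` maps taxon -> list of counts, one entry per sample.
    Samples with a zero total are left as all zeros.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    totals = [0.0] * n_samples
    for values in counts.values():
        for index, value in enumerate(values):
            totals[index] += value
    scaled = {}
    for taxon, values in counts.items():
        scaled[taxon] = [
            value / total if total > 0 else 0.0
            for value, total in zip(values, totals)
        ]
    return scaled


# Hypothetical toy counts for two samples.
toy_counts = {"Bacteroides": [10, 0], "Fusobacterium": [2, 8]}
relative = to_relative_abundance(toy_counts)
print(relative)
```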

Modeling and evaluation

Transfer knowledge about disease (from the disease model) to the CRC model for better performance on CRC monitoring. The running log and training progress are printed.

    expert transfer -i SourceCM.h5 -l SourceLabels.h5 -t ontology.pkl -m disease_model -o CRC_model

Search the query samples against the model.

    expert search -i QueryCM.h5 -m CRC_model -o quantified_source_contributions

Evaluate the performance of the CRC model. You'll obtain a performance report on each stage of CRC.

    expert evaluate -i quantified_source_contributions -l QueryLabels.h5 -o performance_report
    cat performance_report/overall.csv

You have now acquired the skills for EXPERT modeling for microbial source tracking. Next, you may want to explore a question: which fundamental model gives the best performance on CRC monitoring? To find out, assess the performance using another fundamental model. Good luck.
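When comparing fundamental models, it can help to parameterize the three modeling commands. The Python sketch below only assembles the `transfer`, `search`, and `evaluate` commands shown in this quick start with a different `-m` argument and output names; the helper names and the output naming scheme are our own invention, and the directory name of any other fundamental model depends on the archive you download. By default it prints the commands without running them.

```python
import shlex
import subprocess


def modeling_commands(base_model, out_prefix):
    """Build the transfer/search/evaluate commands from the quick start."""
    model_dir = f"{out_prefix}_model"              # hypothetical naming scheme
    contributions = f"{out_prefix}_contributions"  # hypothetical naming scheme
    report = f"{out_prefix}_report"                # hypothetical naming scheme
    return [
        ["expert", "transfer", "-i", "SourceCM.h5", "-l", "SourceLabels.h5",
         "-t", "ontology.pkl", "-m", base_model, "-o", model_dir],
        ["expert", "search", "-i", "QueryCM.h5", "-m", model_dir,
         "-o", contributions],
        ["expert", "evaluate", "-i", contributions, "-l", "QueryLabels.h5",
         "-o", report],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


# Dry run with the disease model used in this quick start.
run_all(modeling_commands("disease_model", "CRC"))
```

Calling `run_all(..., dry_run=False)` executes the commands in order, which assumes that `expert` is installed and that the preprocessing outputs from the previous steps are in the working directory.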

Advanced usage

EXPERT supports adaptation to context-dependent studies, in which you can choose the potential sources to be estimated. Please follow our documentation: advanced usage.

Model resources

| Model | Biome ontology | Top-level biome | Data source | Dataset size | Download link | Note |
| ------------- | -------------------------------------------------------- | ---------------- | ----------- | ------------ | ------------- | -------------------------------------------------- |
| general model | biome ontology for 132 biomes on earth (as of Jan. 2020) | root | MGnify | 115,892 | download | The samples were not uniformly processed by MGnify |
| human model | biome ontology for 27 human-associated biomes | human | MGnify | 52,537 | download | The samples were not uniformly processed by MGnify |
| disease model | biome ontology for 20 human disease-associated biomes | root (human gut) | GMrepo | 13,642 | download | The samples were uniformly processed by GMrepo |

Note: These models were trained on EXPERT version 0.2.

How-to-cite

If you are using EXPERT in a scientific publication (or inspired by the approach), we would appreciate citations to the following paper:

Hui Chong, Yuguo Zha, Qingyang Yu, Mingyue Cheng, Guangzhou Xiong, Nan Wang, Xinhe Huang, Shijuan Huang, Chuqing Sun, Sicheng Wu, Wei-Hua Chen, Luis Pedro Coelho, Kang Ning*. EXPERT: transfer learning-enabled context-aware microbial community classification. Briefings in Bioinformatics, 2022; bbac396. doi:10.1093/bib/bbac396.

Maintainer

| Name | Email | Organization |
| :-------: | --------------------- | ------------------------------------------------------------ |
| Hui Chong | huichong.me@gmail.com | Research Assistant, School of Life Science and Technology, Huazhong University of Science & Technology |
| Xinhe Huang | huangxinhe@hust.edu.cn | Undergraduate, School of Life Science and Technology, Huazhong University of Science & Technology |
| Shijuan Huang | hshijuan@qq.com | Undergraduate, School of Life Science and Technology, Huazhong University of Science & Technology |
| Kang Ning | ningkang@hust.edu.cn | Professor, School of Life Science and Technology, Huazhong University of Science & Technology |

Owner

Name: NingLab
Login: HUST-NingKang-Lab
Kind: organization
Email: ningkang@hust.edu.cn
Location: Luoyu Road 1037, Wuhan, Hubei, China

Website: http://www.microbioinformatics.org/
Repositories: 35
Profile: https://github.com/HUST-NingKang-Lab

Huazhong University of Science and Technology

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Committers

Last synced: almost 3 years ago

All Time

Total Commits: 226
Total Committers: 6
Avg Commits per committer: 37.667
Development Distribution Score (DDS): 0.442

Top Committers

Name	Email	Commits
AdeBC	c**7@g**m	126
AdeBC	h**e@g**m	89
Hui-Chong	3**C@u**m	7
HugoZha	5**a@u**m	2
HuangShijuan	3**9@q**m	1
HuangShijuan	h**n@q**m	1

Committer Domains (Top 20 + Academic)

qq.com: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 5
Total pull requests: 5
Average time to close issues: about 15 hours
Average time to close pull requests: less than a minute
Total issue authors: 4
Total pull request authors: 1
Average comments per issue: 1.2
Average comments per pull request: 0.0
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

AdeBC (2)
wangnan990826 (1)
XiongGZ (1)

Pull Request Authors

AdeBC (5)

Top Labels

Issue Labels

enhancement (2) good first issue (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 14 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 2
Total maintainers: 1

pypi.org: expert-mst

Exact and pervasive expert model for source tracking based on transfer learning

Homepage: https://github.com/HUST-NingKang-Lab/EXPERT
Documentation: https://expert-mst.readthedocs.io/
License: MIT
Latest release: 0.2
published about 5 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 14 Last month

Rankings

Dependent packages count: 10.1%

Stargazers count: 16.5%

Forks count: 16.8%

Average: 21.2%

Dependent repos count: 21.6%

Downloads: 41.0%

Maintainers (1)

AdeBC

Last synced: 6 months ago

Dependencies

environment.yaml pypi

absl-py ==0.10.0
astunparse ==1.6.3
cachetools ==4.1.1
chardet ==3.0.4
ete3 ==3.1.2
future ==0.18.2
gast ==0.3.3
google-auth ==1.22.1
google-auth-oauthlib ==0.4.1
google-pasta ==0.2.0
grpcio ==1.32.0
h5py ==2.10.0
idna ==2.10
joblib ==0.17.0
keras-preprocessing ==1.1.2
living-tree ==0.0.5
markdown ==3.3.1
numexpr ==2.7.1
numpy ==1.18.5
oauthlib ==3.1.0
opt-einsum ==3.3.0
pandas ==1.1.3
protobuf ==3.13.0
pyasn1 ==0.4.8
pyasn1-modules ==0.2.8
python-dateutil ==2.8.1
pytz ==2020.1
requests ==2.24.0
requests-oauthlib ==1.3.0
rsa ==4.6
scikit-learn ==0.23.2
scipy ==1.5.3
six ==1.15.0
tables ==3.6.1
tensorboard ==2.3.0
tensorboard-plugin-wit ==1.7.0
tensorflow-cpu ==2.3.1
tensorflow-estimator ==2.3.0
termcolor ==1.1.0
threadpoolctl ==2.1.0
treelib ==1.5.5
urllib3 ==1.25.10
werkzeug ==1.0.1
wrapt ==1.12.1
