expert-lightning
Repository
Basic Info
- Host: GitHub
- Owner: LudensZhang
- License: mit
- Language: Python
- Default Branch: main
- Size: 76 MB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
EXPERT-lightning - a scalable model for quantifying source contributions for microbial communities
Quantifying the source origins of microbiome samples in a fast, comprehensive, and context-aware manner remains challenging. Traditional approaches face severe trade-offs between efficiency, accuracy, and scalability.
Here, we introduce EXPERT, a scalable community-level microbial source tracking approach. Built upon biome ontology information and transfer learning techniques, EXPERT is context-aware and can easily expand the supervised model's search scope to include context-dependent community samples and understudied biomes. At the same time, it is superior to current approaches in source tracking accuracy and speed. EXPERT's superiority has been demonstrated on multiple source tracking tasks, including source tracking of samples collected at different disease stages and of longitudinal samples. For details, refer to our original study.
Supervised learning (with high efficiency and accuracy) meets transfer learning (with inherent high scalability), towards a better understanding of the dark matter in microbial communities.

Support
For support using EXPERT, please contact us.
This is a beta version; any comments or insights would be greatly appreciated.
Features
- Context-aware ability to adapt to microbiome studies via transfer learning
- Fast, accurate and interpretable source tracking via ontology-aware forward propagation
- Supports both amplicon sequencing and whole-genome sequencing data
- Selective learning from partially-labeled training data
- Ultra-fast data cleaning via in-memory NCBI taxonomy database
- Parallelized feature encoding via `pytorch_lightning`
Installation
You can install EXPERT-lightning using setuptools. We recommend creating a new virtual environment with conda or virtualenv before installing.
bash
conda create -n expert-lightning # Create a new conda environment
conda activate expert-lightning # Activate the environment
python setup.py install # Install EXPERT-lightning
expert init # Initialize EXPERT and install NCBI taxonomy database
Note: For GPU acceleration, we recommend installing the GPU version of PyTorch (version 2.0.0) by following the official website, or creating the conda environment from our environment.yaml file.
bash
conda env create -f environment.yaml
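After installation, a quick check like the one below can confirm whether PyTorch is importable and whether it sees a CUDA device. This is a minimal sketch and not part of EXPERT-lightning: the helper name `gpu_status` and its return strings are hypothetical, and only `torch.cuda.is_available()` comes from PyTorch itself.

```python
import importlib.util


def gpu_status() -> str:
    """Report whether PyTorch is installed and whether CUDA is usable.

    Hypothetical helper for illustration; not provided by EXPERT-lightning.
    """
    # find_spec avoids an ImportError when torch is not installed.
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu-only"


if __name__ == "__main__":
    print(gpu_status())
```

If this reports `cpu-only` on a machine with a GPU, the CPU build of PyTorch was probably installed.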
Quick start
Here we quickly go through the basic functionality of EXPERT with a case study that was conducted in our preprint. We also provide more showcases in another repository.
Things to know before starting
A key feature of EXPERT is its automatic generalization of fundamental models, which lets users without deep-learning experience adapt the models from the terminal, without any programming. Here we generalize a fundamental model for monitoring the progression of colorectal cancer (CRC) and assess the performance of the generalized model. We use only the disease model, which was trained to quantify contributions from hosts with different disease-associated biomes (refer to our preprint for details).
Microbial source tracking: Bayesian community-wide culture-independent microbial source tracking | Nature Methods
Cross-validation: Cross-validation (statistics) - Wikipedia
Get prepared
Please follow the instructions below and make sure all commands are run on a Linux or macOS platform. You may also need to install Anaconda before starting.
Install EXPERT-lightning.
Download the fundamental model and the dataset to be used. Here `CM` is an abbreviation of `countMatrix`, a format for abundance data (each row represents a taxon, and each column represents a sample/run). `Mapper` is another important input of EXPERT; it records the source biomes of the input samples.
Note: The disease model was trained with EXPERT v0.X, which is not compatible with EXPERT-lightning. You need to convert the model to the EXPERT-lightning format with EXPERT-model_converter.
bash
wget -c https://github.com/HUST-NingKang-Lab/EXPERT/releases/download/v0.2-m/disease_model.tgz
tar zxvf disease_model.tgz # Decompress the fundamental model.
for file in {QueryCM.tsv,SourceCM.tsv,QueryMapper.csv,SourceMapper.csv}; do
wget -c https://raw.githubusercontent.com/HUST-NingKang-Lab/EXPERT/master/data/$file;
done
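To make the count-matrix orientation concrete (rows are taxa, columns are samples/runs), the sketch below parses a tiny tab-separated table with the Python standard library. The toy content, the taxon and sample names, and the function `read_count_matrix` are all hypothetical; the real `SourceCM.tsv` and `QueryCM.tsv` files may carry different headers and taxon naming.

```python
import csv
import io

# Hypothetical toy count matrix (tab-separated): the first column holds taxa,
# the remaining columns hold per-sample abundances. All names are made up.
TOY_CM = "taxon\tsample1\tsample2\ntaxonA\t10\t0\ntaxonB\t5\t7\n"


def read_count_matrix(text):
    """Parse a taxon-by-sample TSV into {sample: {taxon: abundance}}."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    samples = rows[0][1:]  # header row: sample/run identifiers
    by_sample = {sample: {} for sample in samples}
    for row in rows[1:]:
        taxon, values = row[0], row[1:]
        for sample, value in zip(samples, values):
            by_sample[sample][taxon] = float(value)
    return by_sample


if __name__ == "__main__":
    for sample, counts in read_count_matrix(TOY_CM).items():
        print(sample, counts)
```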
Preprocess the dataset
- Construct a biome ontology representing the stages of CRC. The constructed ontology is printed as a tree.
bash
grep -v "Env" SourceMapper.csv | awk -F ',' '{print $6}' | sort | uniq > microbiomes.txt
expert construct -i microbiomes.txt -o ontology.pkl
- Map microbial community samples to the biome ontology to obtain hierarchical labels. You'll see counts of the samples on each biome ontology layer in the printed message.
bash
expert map --to-otlg -i SourceMapper.csv -t ontology.pkl -o SourceLabels.h5
expert map --to-otlg -i QueryMapper.csv -t ontology.pkl -o QueryLabels.h5
- Convert input abundance data to a model-acceptable `hdf` file. The EXPERT model only accepts standardized abundance data. Here we standardize the abundance data using `convert` mode.
bash
ls SourceCM.tsv > inputList; expert convert -i inputList -o SourceCM.h5 --in-cm;
ls QueryCM.tsv > inputList; expert convert -i inputList -o QueryCM.h5 --in-cm;
rm inputList
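The ontology and mapping steps above can be pictured with a small standard-library sketch. It roughly mirrors the `grep`/`awk`/`sort`/`uniq` pipeline (take the 6th mapper column, drop the header), then builds a tree and counts samples per ontology layer. It is an illustration only, not the implementation behind `expert construct` or `expert map`. It assumes biome labels are colon-delimited paths; the toy mapper rows, the stage names, and all function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical mapper content. Only the biome column position (6th) and the
# header name "Env" are taken from the shell pipeline; the rest is made up.
TOY_MAPPER = (
    "c1,c2,c3,c4,c5,Env\n"
    "s1,,,,,root:Control\n"
    "s2,,,,,root:CRC:Stage_A\n"
    "s3,,,,,root:CRC:Stage_B\n"
    "s4,,,,,root:CRC:Stage_A\n"
)


def biome_column(text, index=5):
    """Return the biome label of every sample row (header row dropped)."""
    rows = list(csv.reader(io.StringIO(text)))
    return [row[index] for row in rows[1:]]


def build_tree(biomes, sep=":"):
    """Nest biome paths into a dict-of-dicts tree (assumed path format)."""
    tree = {}
    for biome in biomes:
        node = tree
        for part in biome.split(sep):
            node = node.setdefault(part, {})
    return tree


def render(tree, indent=0):
    """Return the tree as indented lines, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render(tree[name], indent + 1))
    return lines


def layer_counts(biomes, sep=":"):
    """Count samples under each ontology node, grouped by layer (depth)."""
    counts = {}
    for biome in biomes:
        parts = biome.split(sep)
        for depth in range(1, len(parts) + 1):
            counts.setdefault(depth, Counter())[sep.join(parts[:depth])] += 1
    return counts


if __name__ == "__main__":
    labels = biome_column(TOY_MAPPER)
    print("\n".join(render(build_tree(sorted(set(labels))))))
    for depth, counter in sorted(layer_counts(labels).items()):
        print(depth, dict(counter))
```

A sample labeled with a deep biome also counts toward every ancestor, which is why the per-layer totals can differ between layers when some samples are labeled only at a shallow level.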
Modeling and evaluation
- Transfer knowledge about disease (from the disease model) to the CRC model for better performance on CRC monitoring. The running log and training progress are printed.
bash
expert transfer -i SourceCM.h5 -l SourceLabels.h5 -t ontology.pkl -m disease_model -o CRC_model
- Search the query samples against the model.
bash
expert search -i QueryCM.h5 -m CRC_model -o quantified_source_contributions
- Evaluate the performance of the CRC model. You'll obtain a performance report on each stage of CRC.
bash
expert evaluate -i quantified_source_contributions -l QueryLabels.h5 -o performance_report
cat performance_report/overall.csv
You have now acquired the skills for EXPERT modeling in microbial source tracking. Next, you may want to explore a question: which fundamental model gives the best performance on CRC monitoring? You can assess the performance using another fundamental model. Good luck.
Advanced usage
EXPERT can be adapted to context-dependent studies, in which you choose the potential sources to be estimated. Please follow our documentation: advanced usage.
Model resources
| Model | Biome ontology | Top-level biome | Data source | Dataset size | Download link | Note |
| ------------- | -------------------------------------------------------- | ---------------- | ----------- | ------------ | ------------- | -------------------------------------------------- |
| general model | biome ontology for 132 biomes on earth (as of Jan. 2020) | root | MGnify | 115,892 | download | The samples were not uniformly processed by MGnify |
| human model | biome ontology for 27 human-associated biomes | human | MGnify | 52,537 | download | The samples were not uniformly processed by MGnify |
| disease model | biome ontology for 20 human disease-associated biomes | root (human gut) | GMrepo | 13,642 | download | The samples were uniformly processed by GMrepo |

Note: These models were trained with EXPERT version 0.X. To use them in EXPERT-lightning, you need to convert them to the EXPERT-lightning format with EXPERT-model_converter. The general model has already been converted to the EXPERT-lightning format and set as the default model in EXPERT-lightning.
How-to-cite
If you are using EXPERT in a scientific publication (or inspired by the approach), we would appreciate citations to the following paper:
EXPERT: Transfer Learning-enabled context-aware microbial source tracking. Hui Chong, Qingyang Yu, Yuguo Zha, Guangzhou Xiong, Nan Wang, Xinhe Huang, Shijuan Huang, Chuqing Sun, Sicheng Wu, Wei-Hua Chen, Luis Pedro Coelho, Kang Ning. bioRxiv 10.1101/2021.01.29.428751; doi: https://doi.org/10.1101/2021.01.29.428751
Maintainer
| Name | Email | Organization |
| :-------: | --------------------- | ------------------------------------------------------------ |
| Hui Chong | huichong.me@gmail.com | Research Assistant, School of Life Science and Technology, Huazhong University of Science & Technology |
| Xinhe Huang | huangxinhe@hust.edu.cn | Undergraduate, School of Life Science and Technology, Huazhong University of Science & Technology |
| Shijuan Huang | hshijuan@qq.com | Undergraduate, School of Life Science and Technology, Huazhong University of Science & Technology |
| Kang Ning | ningkang@hust.edu.cn | Professor, School of Life Science and Technology, Huazhong University of Science & Technology |
Owner
- Name: HaohongZhang
- Login: LudensZhang
- Kind: user
- Location: Wuhan, China
- Repositories: 3
- Profile: https://github.com/LudensZhang