mantis-ml

mantis-ml: Stochastic semi-supervised learning to prioritise genes from high throughput genomic screens

https://github.com/astrazeneca-cgr-publications/mantis-ml-release

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.1%) to scientific vocabulary

Keywords

auto-ml genomics keras machine-learning scikit-learn tensorflow

Last synced: 6 months ago

Repository

mantis-ml: Stochastic semi-supervised learning to prioritise genes from high throughput genomic screens

Basic Info

Host: GitHub
Owner: astrazeneca-cgr-publications
License: mpl-2.0
Language: Python
Default Branch: master
Homepage:
Size: 274 MB

Statistics

Stars: 30
Watchers: 3
Forks: 9
Open Issues: 1
Releases: 2

Topics

auto-ml genomics keras machine-learning scikit-learn tensorflow

Created over 6 years ago · Last pushed about 2 years ago

Metadata Files

Readme License

mantis-ml

Introduction
Installation
Run

Introduction

mantis-ml is a disease-agnostic gene prioritisation framework, implementing stochastic semi-supervised learning on top of scikit-learn and keras/tensorflow.
mantis-ml takes its name from the Greek word 'μάντις', which means 'fortune teller' or 'predictor'.

| Publication - Please cite: |
| :---- |
| Mantis-ml: Disease-Agnostic Gene Prioritization from High-Throughput Genomic Screens by Stochastic Semi-supervised Learning. Dimitrios Vitsios, Slav Petrovski. The American Journal of Human Genetics (Cell Press), May 7, 2020. https://doi.org/10.1016/j.ajhg.2020.03.012 |

| Gene prioritisation Atlas: |
| :---- |
| https://dvitsios.github.io/mantis-ml-predictions |
| This resource contains gene prediction results extracted by mantis-ml across 10 disease areas in 6 specialties: Cardiology, Immunology, Nephrology, Neurology, Psychiatry and Pulmonology. |

Installation

Requirements: Python3 (tested with v3.6.7)

mantis-ml can be installed through pip: `pip install mantis-ml`

Alternatively, it can be installed from the GitHub repository:

```
git clone https://github.com/astrazeneca-cgr-publications/mantis-ml-release.git
cd mantis-ml-release
pip install -e .
# or
python setup.py install
```

In either case, it is highly recommended to create a new virtual environment (e.g. with conda) before installing mantis-ml:
- `conda create -n mantis_ml python=3.6`
- `conda config --append channels conda-forge` (adds conda-forge to the channels list)
- `conda activate mantis_ml` (activates the newly created conda environment)

You may now call the following scripts from the command line:
- mantisml: run mantis-ml gene prioritisation based on a provided config file (.yaml)
- mantisml-profiler: preview selected phenotypes and features based on a provided config file
- mantisml-overlap: run an enrichment test between mantis-ml predictions and an external ranked gene list to get refined gene predictions

Run each command with -h to see all available options.

Run

You need to provide a config file (.yaml) containing information about the diseases/phenotypes of interest.

Required field:

Disease/Phenotype terms: terms that characterise a phenotype or disease of interest (free text)

Optional fields:

Additional associated terms: terms used along with Disease/Phenotype terms to extract additional disease/phenotype-associated features (free text)
Diseases/Phenotypes to exclude: terms to exclude from disease/phenotype characterisation and feature selection (free text)

Config examples:

```
# Epilepsy_config.yaml
Disease/Phenotype terms: epileptic, epilepsy, seizure
Additional associated terms: brain, nerve, nervous, neuronal, cerebellum, cerebral, hippocampus, hypothalamus
Diseases/Phenotypes to exclude:

# CKD_config.yaml
Disease/Phenotype terms: renal, kidney, nephro, glomerul, distal tubule
Additional associated terms:
Diseases/Phenotypes to exclude: adrenal
```

Other example config files can be found under example-input or mantis-ml/conf.
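If you need configs for many diseases, the format shown above is simple enough to generate programmatically. The following is a minimal sketch, assuming the three field names used in the example configs; `write_mantis_config` is a hypothetical helper, not part of mantis-ml, and it writes plain text instead of using PyYAML so that the comma-separated term lists appear exactly as in the examples.

```python
from pathlib import Path
import tempfile

# Field names as they appear in the example config files.
REQUIRED_FIELD = "Disease/Phenotype terms"
OPTIONAL_FIELDS = ("Additional associated terms", "Diseases/Phenotypes to exclude")


def write_mantis_config(path, disease_terms, associated_terms=(), exclude_terms=()):
    """Hypothetical helper: write a mantis-ml style config (.yaml) as plain text.

    Each field holds a comma-separated list of free-text terms; optional
    fields are written with an empty value when no terms are given.
    """
    if not disease_terms:
        raise ValueError(f"'{REQUIRED_FIELD}' is required and must not be empty")
    fields = (REQUIRED_FIELD,) + OPTIONAL_FIELDS
    values = (disease_terms, associated_terms, exclude_terms)
    # rstrip() drops the trailing space left by an empty optional field.
    lines = [f"{field}: {', '.join(terms)}".rstrip() for field, terms in zip(fields, values)]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    cfg = write_mantis_config(
        out_dir / "CKD_config.yaml",
        disease_terms=["renal", "kidney", "nephro", "glomerul", "distal tubule"],
        exclude_terms=["adrenal"],
    )
    print(cfg.read_text())
```

The generated file can then be passed to mantisml with -c.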

Supervised learning models

mantis-ml runs 6 different supervised models by default: Extra Trees, Random Forest, SVC, Gradient Boosting, XGBoost and Deep Neural Net.
It is also possible to run mantis-ml with the -f / --fast option, which will force mantis-ml to train only 4 classifiers: Extra Trees, Random Forest, SVC and Gradient Boosting.
Additionally, you may explicitly specify which supervised models to use for training via the -m option. The available model codes are:
- et: Extra Trees
- rf: Random Forest
- gb: Gradient Boosting
- xgb: XGBoost
- svc: Support Vector Classifier
- dnn: Deep Neural Net
- stack: Stacking classifier

A single model or a comma-separated list of models may be specified, e.g. -m et or -m et,stack,gb.
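To catch typos before starting a long run, the -m value can be assembled and checked against the model codes listed above. This is a minimal sketch; `build_model_arg` is a hypothetical helper, not part of mantis-ml, and it only builds the string passed to -m without calling mantis-ml.

```python
# Model codes accepted by the -m option, as listed in the README.
MODEL_CODES = {
    "et": "Extra Trees",
    "rf": "Random Forest",
    "gb": "Gradient Boosting",
    "xgb": "XGBoost",
    "svc": "Support Vector Classifier",
    "dnn": "Deep Neural Net",
    "stack": "Stacking classifier",
}


def build_model_arg(models):
    """Hypothetical helper: join model codes into a comma-separated -m value.

    Codes are stripped, lower-cased and de-duplicated (first occurrence kept);
    unknown codes raise ValueError before anything is launched.
    """
    codes = []
    for m in models:
        code = m.strip().lower()
        if code not in MODEL_CODES:
            raise ValueError(f"unknown model code {m!r}; expected one of {sorted(MODEL_CODES)}")
        if code not in codes:
            codes.append(code)
    if not codes:
        raise ValueError("at least one model code is required")
    return ",".join(codes)


if __name__ == "__main__":
    print(build_model_arg(["et", "stack", "gb"]))  # et,stack,gb
```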

Estimated run time

mantis-ml total run time increases as the number of known disease-associated (seed) genes decreases: the fewer the seed genes, the more balanced datasets there are to train.
Example run times for different numbers of seed genes are given in the table below. All results correspond to mantis-ml runs across 10 stochastic iterations, training with 6 different supervised models and using 10 cores.

| Disease example | Num. of seed genes | Total run time |
| -------------- | ------------------ | --------------- |
| Epilepsy | 864 | 2h |
| Chronic Kidney Disease | 587 | 2.5h |
| Amyotrophic Lateral Sclerosis | 77 | 11h |
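The link between the number of seed genes and run time in the table above can be illustrated with a simplified sketch. It assumes that in each stochastic iteration the unlabelled genes are shuffled and split into chunks the size of the seed gene set, each chunk being paired with the seed genes to form one balanced dataset. This is an illustration of that assumption, not mantis-ml's actual implementation, and the gene universe size of 18,000 is a hypothetical round figure.

```python
import math
import random


def balanced_partitions(unlabelled_genes, n_seeds, seed=None):
    """Shuffle the unlabelled genes and split them into chunks of size n_seeds.

    Each chunk would be combined with the seed (positive) genes to form one
    balanced training set; the last chunk may be smaller than n_seeds.
    """
    if n_seeds <= 0:
        raise ValueError("n_seeds must be positive")
    genes = list(unlabelled_genes)
    random.Random(seed).shuffle(genes)
    return [genes[i:i + n_seeds] for i in range(0, len(genes), n_seeds)]


def n_balanced_datasets(total_genes, n_seeds):
    """Number of balanced datasets per iteration under the assumption above."""
    return math.ceil((total_genes - n_seeds) / n_seeds)


if __name__ == "__main__":
    TOTAL_GENES = 18_000  # hypothetical gene universe size
    for disease, n_seeds in [("Epilepsy", 864), ("CKD", 587), ("ALS", 77)]:
        print(disease, n_seeds, n_balanced_datasets(TOTAL_GENES, n_seeds))
```

Under these assumptions ALS (77 seed genes) needs over ten times as many balanced datasets per iteration as Epilepsy (864 seed genes), but each dataset is also much smaller, so run time does not scale one-to-one with the number of datasets.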

Representative examples of run times when using the -f / --fast option, two classifiers with the -m option, or just the Stacking classifier are also given below (CKD dataset, 10 stochastic iterations, 10 cores):

| Number of models | Total run time |
| -------------- | --------------- |
| 6 (default) | 2.5h |
| 4 (-f) | 43m |
| 2 (-m et,rf) | 19m |
| Stacking (-m stack) | 1.5h |

`mantisml`

You need to provide a config file (.yaml) and an output directory.
You may also:
- define the number of threads to use (-n option; default value: 4)
- define the number of stochastic iterations (-i option; default value: 10)
- provide a file with custom seed genes (-k option; the file should contain newline-separated HGNC gene symbols; these are used instead of the HPO-derived seed genes)

mantisml -c [config_file] -o [output_dir] [-n nthreads] [-i iterations] [-k custom_seed_genes.txt]

Example

mantisml -c CKD_config.yaml -o ./CKD-run
mantisml -c Epilepsy_config.yaml -o /tmp/Epilepsy-testing -n 20
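When launching runs from a Python script (e.g. looping over several configs), the command can be assembled from the options documented in this README (-c, -o, -n, -i, -k, -m). The sketch below is hypothetical: `mantisml_command` and `write_seed_genes` are illustrative names, not part of mantis-ml, and the gene symbols are only example input. It builds the argument list without running it, so it can be inspected first and then passed to `subprocess.run`.

```python
from pathlib import Path
import tempfile


def write_seed_genes(path, genes):
    """Write a custom seed gene file for -k: one HGNC gene symbol per line."""
    path = Path(path)
    path.write_text("\n".join(g.strip() for g in genes if g.strip()) + "\n")
    return path


def mantisml_command(config, output_dir, nthreads=None, iterations=None,
                     seed_genes_file=None, models=None):
    """Build the mantisml argument list; options left as None are omitted,
    so mantisml falls back to its own defaults (4 threads, 10 iterations)."""
    cmd = ["mantisml", "-c", str(config), "-o", str(output_dir)]
    if nthreads is not None:
        cmd += ["-n", str(nthreads)]
    if iterations is not None:
        cmd += ["-i", str(iterations)]
    if seed_genes_file is not None:
        cmd += ["-k", str(seed_genes_file)]
    if models is not None:
        cmd += ["-m", models]
    return cmd


if __name__ == "__main__":
    seeds = write_seed_genes(Path(tempfile.mkdtemp()) / "custom_seed_genes.txt",
                             ["PKD1", "PKD2", "COL4A5"])  # example symbols
    cmd = mantisml_command("CKD_config.yaml", "./CKD-run", nthreads=20,
                           seed_genes_file=seeds)
    print(" ".join(cmd))
    # To actually launch the run: subprocess.run(cmd, check=True)
```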

`mantisml` Output

mantisml predictions for all genes and across all classifiers can be found at [output_dir]/Gene-Predictions.
The AUC_performance_by_Classifier.pdf file in the same directory reports the AUC of each classifier, indicating which classifier performs best.
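AUC here is the area under the ROC curve, which equals the probability that a randomly chosen seed (positive) gene is scored above a randomly chosen non-seed gene, with ties counting one half. A minimal sketch that computes AUC in exactly that pairwise way and picks the best classifier is shown below; the scores and the `best_classifier` helper are hypothetical and do not read mantis-ml's output files.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    O(P*N) pairwise comparison: fine for illustration; prefer a rank-based
    implementation such as sklearn.metrics.roc_auc_score for real data.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count one half
    return wins / (len(pos_scores) * len(neg_scores))


def best_classifier(scores_by_clf):
    """Return (name, auc) for the classifier with the highest AUC."""
    aucs = {name: roc_auc(pos, neg) for name, (pos, neg) in scores_by_clf.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


if __name__ == "__main__":
    toy = {  # hypothetical (positive scores, negative scores) per classifier
        "ExtraTrees": ([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]),
        "SVC": ([0.6, 0.4, 0.2], [0.5, 0.3, 0.7]),
    }
    print(best_classifier(toy))
```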

Output figures from all steps during the mantis-ml run (e.g. Exploratory Data Analysis/EDA, supervised-learning, unsupervised-learning) can be found under [output_dir]/Output-Figures.

`mantisml-profiler`

Preview selected phenotypes and features (optional)

You may preview all selected phenotypes and relevant features based on your input config file parameters by running the mantisml-profiler command.

To run mantisml-profiler, you need to provide a config file (.yaml) and an output directory:
mantisml-profiler [-v] -c [config_file] -o [output_dir]

`mantisml-overlap`

Run enrichment test between mantis-ml predictions and an external ranked gene list to get refined gene predictions

To run mantisml-overlap, you need to provide a config file (.yaml), an output directory containing mantisml results, and an external ranked gene list file (mantisml must already have been run with the same output directory):
mantisml-overlap -c [config_file] -o [output_dir] -e [external_ranked_file]

`mantisml-overlap` external ranked file [-e]

The external ranked gene list file may contain a single column with ranked genes or two columns (tab-delimited), with the 2nd column containing p-values. Examples of external ranked lists for both cases are available at example-input.
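A minimal sketch of reading such a file in Python, accepting either layout, is shown below. The function name is hypothetical, and it assumes there is no header line and that the file is tab-delimited when a p-value column is present; mantisml-overlap's own parser may handle other cases differently.

```python
import csv
import io


def read_ranked_genes(handle):
    """Parse an external ranked gene list.

    Returns a list of (gene, p_value) tuples in file order (i.e. rank order);
    p_value is None for single-column files. Blank lines are skipped.
    """
    ranked = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue
        gene = row[0].strip()
        p_value = float(row[1]) if len(row) > 1 and row[1].strip() else None
        ranked.append((gene, p_value))
    return ranked


if __name__ == "__main__":
    one_col = io.StringIO("PKD1\nPKD2\nCOL4A5\n")
    two_col = io.StringIO("PKD1\t1e-8\nPKD2\t3.2e-5\n\nCOL4A5\t0.01\n")
    print(read_ranked_genes(one_col))
    print(read_ranked_genes(two_col))
```

For a file on disk, pass `open(path, newline="")` instead of the in-memory examples.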

`mantisml-overlap` Output

Results are available under [output_dir]/Overlap-Enrichment-Results.

mantisml-overlap generates figures with the enrichment signal between mantis-ml predictions and the external ranked file, based on a hypergeometric test. These can be found under: Overlap-Enrichment-Results/hypergeom-enrichment-figures.
mantisml-overlap also extracts consensus gene predictions with support by multiple classifiers. Results can be found at Overlap-Enrichment-Results/Gene-Predictions-After-Overlap.
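The hypergeometric test behind the enrichment figures can be illustrated as follows: with N genes in total, the top K mantis-ml predictions and the top n genes of the external list, the p-value is the probability of observing at least k genes in both top sets by chance. This stdlib-only sketch is not mantis-ml's code, and how mantisml-overlap chooses the cut-offs and gene universe is not described here. It needs Python 3.8+ for `math.comb`; for large N, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same tail probability.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)) / total


def overlap_enrichment(mantis_top, external_top, universe):
    """Overlap size and one-sided hypergeometric p-value for two top-gene sets."""
    universe = set(universe)
    a = set(mantis_top) & universe
    b = set(external_top) & universe
    k = len(a & b)
    return k, hypergeom_sf(k, len(universe), len(a), len(b))


if __name__ == "__main__":
    universe = [f"G{i}" for i in range(20)]
    mantis_top = ["G0", "G1", "G2", "G3", "G4"]
    external_top = ["G0", "G1", "G2", "G10"]
    print(overlap_enrichment(mantis_top, external_top, universe))
```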

Owner

Name: AstraZeneca Centre for Genomics Research - publications
Login: astrazeneca-cgr-publications
Kind: organization
Email: CGR-Informatics-Support@astrazeneca.com
Location: United Kingdom

Repositories: 3
Profile: https://github.com/astrazeneca-cgr-publications

GitHub Events

Total

Watch event: 3

Last Year

Watch event: 3

Committers

Last synced: over 1 year ago

All Time

Total Commits: 165
Total Committers: 2
Avg Commits per committer: 82.5
Development Distribution Score (DDS): 0.364

Past Year

Commits: 1
Committers: 1
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
kclc950	d**s@g**m	105
Dimitrios Vitsios	d****s	60

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 9
Average time to close issues: N/A
Average time to close pull requests: 22 days
Total issue authors: 0
Total pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.56
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 4

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

Pull Request Authors

dependabot[bot] (4)
dvitsios (4)
Iain-S (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (4)

Packages

Total packages: 1
Total downloads:
- pypi 63 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 11
Total maintainers: 1

pypi.org: mantis-ml

Disease-agnostic gene prioritisation from high-throughput genomic screens by stochastic semi-supervised learning

Homepage: https://github.com/astrazeneca-cgr-publications/mantis-ml-release
Documentation: https://mantis-ml.readthedocs.io/
License: mpl-2.0
Latest release: 1.6.5
published over 5 years ago

Versions: 11
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 63 Last month

Rankings

Dependent packages count: 10.0%

Forks count: 11.9%

Stargazers count: 12.9%

Average: 16.9%

Dependent repos count: 21.7%

Downloads: 27.9%

Maintainers (1)

dvitsios

Last synced: 6 months ago

Dependencies

setup.py pypi

Keras ==2.2.4
PyYAML ==5.1
bokeh ==1.1.0
h5py ==2.9.0
matplotlib ==3.0.3
numpy ==1.16.3
numpydoc ==0.8.0
palettable ==3.1.1
pandas ==0.24.2
plotly ==3.9.0
scikit-learn ==0.20.3
scipy ==1.2.1
seaborn ==0.9.0
setuptools ==39.1.0
tables ==3.5.1
tensorflow ==1.12.0
tqdm ==4.14
twine ==3.0.0
umap-learn ==0.3.8
xgboost ==0.80
