galaxy-ml

Make machine learning simpler with Galaxy

https://github.com/goeckslab/galaxy-ml

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to ncbi.nlm.nih.gov
✓ Committers with academic emails: 2 of 8 committers (25.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary

Keywords from Contributors

sequences dna usegalaxy genomics workflow-engine ngs bioinformatics interactive optimizing-compiler clade

Last synced: 10 months ago

Repository

Make machine learning simpler with Galaxy

Basic Info

Host: GitHub
Owner: goeckslab
License: mit
Language: HTML
Default Branch: main
Homepage: https://goeckslab.github.io/Galaxy-ML/
Size: 613 MB

Statistics

Stars: 11
Watchers: 2
Forks: 7
Open Issues: 7
Releases: 6

Created about 7 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License

Galaxy-ML

Galaxy-ML is a web-based framework for building end-to-end machine learning pipelines, with special support for biomedical data. Cutting-edge machine learning libraries are combined under unified scikit-learn APIs to provide thousands of different pipelines suitable for various needs. Delivered as Galaxy tools, Galaxy-ML provides scalable, reproducible and transparent machine learning computations.

Key features

simple web UI
no coding or minimum coding requirement
fast model deployment and model selection, specialized in hyperparameter tuning using GridSearchCV
high level of parallel and automated computation

Supported modules

A typical machine learning pipeline is composed of a main estimator/model and optional preprocessing component(s).
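As a minimal sketch of that structure in plain scikit-learn (not Galaxy-ML's own tools), the snippet below chains one optional preprocessor with a main estimator and tunes a single hyperparameter with GridSearchCV. The synthetic dataset, the step names and the grid values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Optional preprocessing step followed by the main estimator.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Step parameters are addressed as <step name>__<parameter name>.
param_grid = {"classifier__C": [0.1, 1.0, 10.0]}

grid = GridSearchCV(pipeline, param_grid=param_grid, scoring="roc_auc", cv=5)
grid.fit(X, y)

best_c = grid.best_params_["classifier__C"]
best_score = grid.best_score_
print(best_c, round(best_score, 3))
```

Any of the models and preprocessors listed below that follow the scikit-learn estimator interface could, in principle, be substituted for the two steps shown here.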

Model

scikit-learn
- sklearn.ensemble
- sklearn.linear_model
- sklearn.naive_bayes
- sklearn.neighbors
- sklearn.svm
- sklearn.tree
xgboost
- XGBClassifier
- XGBRegressor
mlxtend
- StackingCVClassifier
- StackingClassifier
- StackingCVRegressor
- StackingRegressor
Keras (Deep learning models are re-implemented to fully support sklearn APIs. Supports swapping or searching parameters, including layer sub-parameters. Supports callbacks)
- KerasGClassifier
- KerasGRegressor
- KerasGBatchClassifier (works best with online data generators, processing images, genomic sequences and so on)
BinarizeTargetClassifier/BinarizeTargetRegressor
IRAPSClassifier

Preprocessor

scikit-learn
- sklearn.preprocessing
- sklearn.feature_selection
- sklearn.decomposition
- sklearn.kernel_approximation
- sklearn.cluster
imbalanced-learn
- imblearn.under_sampling
- imblearn.over_sampling
- imblearn.combine
skrebate
- ReliefF
- SURF
- SURFstar
- MultiSURF
- MultiSURFstar
TDMScaler
DyRFE/DyRFECV
Z_RandomOverSampler
GenomeOneHotEncoder
ProteinOneHotEncoder
FastaDNABatchGenerator
FastaRNABatchGenerator
FastaProteinBatchGenerator
GenomicIntervalBatchGenerator
GenomicVariantBatchGenerator
ImageDataFrameBatchGenerator

Installation

APIs for models, preprocessors and utils implemented in Galaxy-ML can be installed separately.

Installing using anaconda (recommended)

conda install -c bioconda -c conda-forge Galaxy-ML

Installing using pip

pip install -U Galaxy-ML

Installing from source

python setup.py install

Using source code in place

pip install -e .

To install Galaxy-ML tools in Galaxy, please refer to https://galaxyproject.org/admin/tools/add-tool-from-toolshed-tutorial/.

Running the tests

Before running the tests, run the following commands:

conda create --name galaxy_ml python=3.9
conda activate galaxy_ml
pip install -e .
pip install nose nose-htmloutput pytest
cd galaxy_ml

To run all tests and generate an HTML report:

nosetests ./tests --with-html --html-file=./report.html

To run the tests in a specific file (e.g., test_keras_galaxy.py) and generate an HTML report:

nosetests ./tests/test_keras_galaxy.py --with-html --html-file=./report.html

To run a specific test in a specific file (e.g., the test_multi_dimensional_output test in test_keras_galaxy.py) and generate an HTML report:

nosetests ./tests/test_keras_galaxy.py:test_multi_dimensional_output --with-html --html-file=./report.html

Examples for using Galaxy-ML custom models

```
# handle imports
from keras.models import Sequential
from keras.layers import Dense, Activation
from sklearn.base import clone
from sklearn.model_selection import GridSearchCV
from galaxy_ml.keras_galaxy_models import KerasGClassifier

# build a DNN classifier
model = Sequential()
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
config = model.get_config()

classifier = KerasGClassifier(config, random_state=42)

# clone a classifier
clf = clone(classifier)

# Get parameters
params = clf.get_params()

# Set parameters
new_params = dict(
    epochs=60,
    lr=0.01,
    layers_1_Dense__config__kernel_initializer__config__seed=999,
    layers_0_Dense__config__kernel_initializer__config__seed=999
)
clf.set_params(**new_params)

# model evaluation using GridSearchCV
# (X and y are the user's feature matrix and binary labels)
grid = GridSearchCV(clf, param_grid={}, scoring='roc_auc', cv=5, n_jobs=2)
grid.fit(X, y)
```
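Parameter names that address layer sub-parameters can be read as a path into a nested configuration, with the segments joined by double underscores as in scikit-learn's nested-parameter convention. The sketch below illustrates that idea in pure Python. It is not Galaxy-ML's implementation: `set_nested` and the simplified layer dictionary are hypothetical names introduced only for illustration.

```python
def set_nested(config, path, value, sep="__"):
    """Set config[k1][k2]...[kn] = value, where path is 'k1__k2__...__kn'."""
    *parents, leaf = path.split(sep)
    node = config
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    node[leaf] = value
    return config


# Hypothetical, simplified stand-in for one layer's configuration.
layer = {
    "config": {
        "units": 64,
        "kernel_initializer": {"config": {"seed": None}},
    }
}

set_nested(layer, "config__kernel_initializer__config__seed", 999)
print(layer["config"]["kernel_initializer"]["config"]["seed"])  # prints 999
```

Because every sub-parameter gets a flat name of this kind, it can be listed in a search grid alongside ordinary hyperparameters such as the number of epochs.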

Example for using Galaxy-ML to persist a sklearn/keras model

```
from galaxy_ml.model_persist import (dump_model_to_h5, load_model_from_h5)

# dump model to hdf5 (save_path is the destination file path)
dump_model_to_h5(model, save_path, store_hyperparameter=True)

# load model from hdf5
model = load_model_from_h5(path_to_hdf5)
```

Performance comparison

Galaxy-ML's HDF5 saving utils perform faster than cPickle for large, array-rich models.

```
Loading model using pickle...
(1.2471628189086914 s)

Dumping model using pickle...
(3.6942389011383057 s)
File size: 930712861

Dumping model to hdf5...
(3.006715774536133 s)
File size: 930729696

Loading model from hdf5...
(0.6420958042144775 s)

Pipeline(memory=None,
         steps=[('robustscaler',
                 RobustScaler(copy=True, quantile_range=(25.0, 75.0),
                              with_centering=True, with_scaling=True)),
                ('kneighborsclassifier',
                 KNeighborsClassifier(algorithm='auto', leaf_size=30,
                                      metric='minkowski', metric_params=None,
                                      n_jobs=1, n_neighbors=100, p=2,
                                      weights='uniform'))],
         verbose=False)
```
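Timings of this kind could be collected with a small harness such as the sketch below. It is not the benchmark script behind the figures above: it times only the standard-library pickle module, and the helper names (`timed`, `dump_pickle`, `load_pickle`) and the stand-in payload are hypothetical.

```python
import os
import pickle
import tempfile
import time


def timed(label, func, *args):
    """Run func(*args), print the elapsed wall-clock time, return the result."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label} ({elapsed} s)")
    return result


def dump_pickle(obj, path):
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in payload; a real comparison would use a fitted model.
payload = {"weights": [float(i) for i in range(100_000)]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    timed("Dumping model using pickle...", dump_pickle, payload, path)
    file_size = os.path.getsize(path)
    print(f"File size: {file_size}")
    restored = timed("Loading model using pickle...", load_pickle, path)
```

The HDF5 side could be measured the same way by passing the corresponding dump and load functions to `timed`, ideally repeating each measurement several times because single runs are noisy.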

Publication

Gu Q, Kumar A, Bray S, Creason A, Khanteymoori A, Jalili V, et al. (2021) Galaxy-ML: An accessible, reproducible, and scalable machine learning toolkit for biomedicine. PLoS Comput Biol 17(6): e1009014. https://doi.org/10.1371/journal.pcbi.1009014

Owner

Name: goeckslab
Login: goeckslab
Kind: organization

Repositories: 8
Profile: https://github.com/goeckslab

Committers

Last synced: over 2 years ago

All Time

Total Commits: 609
Total Committers: 8
Avg Commits per committer: 76.125
Development Distribution Score (DDS): 0.082

Past Year

Commits: 30
Committers: 1
Avg Commits per committer: 30.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Qiang Gu	g**1@g**m	559
Qiang Gu	3****u	26
Kaivan Kamali	k**2@p**u	14
Anup Kumar	a**z@g**m	6
Marcel Bargull	m**l@u**u	1
kxk302	k**2@g**m	1
dependabot[bot]	4****]	1
Björn Grüning	b**n@g**u	1
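
The all-time summary figures can be reproduced from the table above. The sketch assumes that the Development Distribution Score is one minus the top committer's share of all commits; that definition is an assumption here, although it agrees with the reported values.

```python
# Commit counts per committer, copied from the Top Committers table.
commits = [559, 26, 14, 6, 1, 1, 1, 1]

total_commits = sum(commits)
average = total_commits / len(commits)

# Assumed definition: DDS = 1 - (top committer's commits / total commits).
dds = 1 - max(commits) / total_commits

print(total_commits)   # 609
print(average)         # 76.125
print(round(dds, 3))   # 0.082
```

Under the same assumption, a single committer with 30 commits in the past year gives 1 - 30/30 = 0.0, which matches the Past Year figure.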

Committer Domains (Top 20 + Academic)

gruenings.eu: 1
udo.edu: 1
psu.edu: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 9
Total pull requests: 62
Average time to close issues: 5 days
Average time to close pull requests: 10 days
Total issue authors: 3
Total pull request authors: 6
Average comments per issue: 0.78
Average comments per pull request: 0.48
Merged pull requests: 54
Bot issues: 0
Bot pull requests: 4

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

qiagu (6)
kxk302 (2)
anuprulez (1)

Pull Request Authors

qiagu (50)
dependabot[bot] (4)
kxk302 (4)
qchiujunhao (3)
mbargull (1)
bgruening (1)

Top Labels

Issue Labels

tools / enhancement (2)
tool / usage tips (1)

Pull Request Labels

dependencies (4)
api / enhancement (1)
tools / enhancement (1)

Packages

Total packages: 1
Total downloads:
- pypi 26 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 18
Total maintainers: 1

pypi.org: galaxy-ml

Galaxy Machine Learning Library

Homepage: https://github.com/goeckslab/Galaxy-ML/
Documentation: https://galaxy-ml.readthedocs.io/
License: MIT License
Latest release: 0.10.0
published over 3 years ago

Versions: 18
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 26 Last month

Rankings

Dependent packages count: 10.0%

Forks count: 12.5%

Average: 16.7%

Stargazers count: 17.1%

Dependent repos count: 21.7%

Downloads: 22.1%

Maintainers (1)

qiagu

Last synced: 10 months ago

Dependencies

requirements.txt pypi

asteval >=0.9.14
bleach >=3.3.0
cython >=0.29.11
h5py >=3.1
imbalanced-learn >=0.8.0,<0.9
joblib >=0.13.2,<1.0
matplotlib >=3.1.1
mlxtend >=0.17,<0.18
numpy >=1.18.0,<1.21
pandas >=1.0,<1.3
plotly >=4.10.0,<5.0
pyfaidx *
pytabix *
scikit-learn >=0.24,<0.25
scikit-optimize >=0.9
scipy >=1.3.1
six <=1.15.0
skrebate >=0.60,<0.70
tensorflow >=2.5.0,<2.6
xgboost >=1.3,<1.4

.github/workflows/ci.yaml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/download-artifact v2 composite
actions/setup-python v1 composite
actions/upload-artifact v2 composite
galaxyproject/planemo-ci-action v1 composite
peter-evans/create-or-update-comment v1 composite
postgres 11 docker

.github/workflows/pr.yaml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/download-artifact v2 composite
actions/setup-python v1 composite
actions/upload-artifact v2 composite
galaxyproject/planemo-ci-action v1 composite
postgres 11 docker

.github/workflows/slash.yaml actions

peter-evans/slash-command-dispatch v2 composite

Dockerfile docker

python 3.9-slim build

setup.py pypi
