Science Score: 46.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to ncbi.nlm.nih.gov
- ✓ Committers with academic emails: 2 of 8 committers (25.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary
Repository
Make machine learning simpler with Galaxy
Basic Info
- Host: GitHub
- Owner: goeckslab
- License: mit
- Language: HTML
- Default Branch: main
- Homepage: https://goeckslab.github.io/Galaxy-ML/
- Size: 613 MB
Statistics
- Stars: 11
- Watchers: 2
- Forks: 7
- Open Issues: 7
- Releases: 6
Metadata Files
README.md
Galaxy-ML
Galaxy-ML is a web-based framework for building end-to-end machine learning pipelines, with special support for biomedical data. Behind a unified scikit-learn API, it combines cutting-edge machine learning libraries to provide thousands of different pipelines suitable for various needs. Packaged as Galaxy tools, Galaxy-ML provides scalable, reproducible and transparent machine learning computations.
Key features
- simple web UI
- no coding or minimum coding requirement
- fast model deployment and model selection, specialized in hyperparameter tuning using GridSearchCV
- high level of parallel and automated computation
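As a rough illustration of the hyperparameter tuning and parallel computation mentioned above, here is a minimal sketch that uses plain scikit-learn rather than the Galaxy-ML tools themselves. The synthetic dataset, the choice of `LogisticRegression`, and the `C` grid are assumptions made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary-classification data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical search space over the regularization strength.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# n_jobs=2 lets the candidate/fold fits run in two parallel workers.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=5,
    n_jobs=2,
)
grid.fit(X, y)

print(grid.best_params_, round(grid.best_score_, 3))
```

Each of the four candidates is fitted on five folds, so raising `n_jobs` mostly helps when the grid or the dataset is large.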
Supported modules
A typical machine learning pipeline is composed of a main estimator/model and optional preprocessing component(s).
Model
- scikit-learn
- sklearn.ensemble
- sklearn.linear_model
- sklearn.naive_bayes
- sklearn.neighbors
- sklearn.svm
- sklearn.tree
- xgboost
- XGBClassifier
- XGBRegressor
- mlxtend
- StackingCVClassifier
- StackingClassifier
- StackingCVRegressor
- StackingRegressor
- Keras (Deep learning models are re-implemented to fully support sklearn APIs. Supports swapping or searching parameters, including layer subparameters. Supports callbacks.)
- KerasGClassifier
- KerasGRegressor
- KerasGBatchClassifier (works best with online data generators, processing images, genomic sequences and so on)
- BinarizeTargetClassifier/BinarizeTargetRegressor
Preprocessor
- scikit-learn
- sklearn.preprocessing
- sklearn.feature_selection
- sklearn.decomposition
- sklearn.kernel_approximation
- sklearn.cluster
- imbalanced-learn
- imblearn.under_sampling
- imblearn.over_sampling
- imblearn.combine
- skrebate
- ReliefF
- SURF
- SURFstar
- MultiSURF
- MultiSURFstar
- TDMScaler
- DyRFE/DyRFECV
- Z_RandomOverSampler
- GenomeOneHotEncoder
- ProteinOneHotEncoder
- FastaDNABatchGenerator
- FastaRNABatchGenerator
- FastaProteinBatchGenerator
- GenomicIntervalBatchGenerator
- GenomicVariantBatchGenerator
- ImageDataFrameBatchGenerator
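To make the "main estimator plus optional preprocessing" structure concrete, the following minimal sketch builds such a pipeline with plain scikit-learn. The synthetic data and the specific pairing of `RobustScaler` with `KNeighborsClassifier` are assumptions chosen for illustration; any of the preprocessors and models listed above that follow the scikit-learn API could presumably be combined the same way.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Preprocessing step first, main estimator last.
pipeline = make_pipeline(RobustScaler(), KNeighborsClassifier(n_neighbors=5))
pipeline.fit(X, y)

# make_pipeline names each step after its lowercased class name.
step_names = [name for name, _ in pipeline.steps]
print(step_names)

# Nested parameters are addressed as <step name>__<parameter>.
pipeline.set_params(kneighborsclassifier__n_neighbors=10)
predictions = pipeline.predict(X[:5])
print(predictions)
```

The `<step name>__<parameter>` convention is what allows a grid search to tune parameters of any component inside the pipeline.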
Installation
APIs for models, preprocessors and utils implemented in Galaxy-ML can be installed separately.
Installing using anaconda (recommended)
conda install -c bioconda -c conda-forge Galaxy-ML
Installing using pip
pip install -U Galaxy-ML
Installing from source
python setup.py install
Using source code in place
pip install -e .
To install Galaxy-ML tools in Galaxy, please refer to https://galaxyproject.org/admin/tools/add-tool-from-toolshed-tutorial/.
Running the tests
Before running the tests, run the following commands:
conda create --name galaxy_ml python=3.9
conda activate galaxy_ml
pip install -e .
pip install nose nose-htmloutput pytest
cd galaxy_ml
To run all tests and generate an HTML report:
nosetests ./tests --with-html --html-file=./report.html
To run tests in a specific file (e.g., test_keras_galaxy.py) and generate an HTML report:
nosetests ./tests/test_keras_galaxy.py --with-html --html-file=./report.html
To run a specific test in a specific file (e.g., the test_multi_dimensional_output test in test_keras_galaxy.py) and generate an HTML report:
```
nosetests ./tests/test_keras_galaxy.py:test_multi_dimensional_output --with-html --html-file=./report.html
```
Examples for using Galaxy-ML custom models
```
# handle imports
from keras.models import Sequential
from keras.layers import Dense, Activation
from sklearn.base import clone
from sklearn.model_selection import GridSearchCV
from galaxy_ml.keras_galaxy_models import KerasGClassifier

# build a DNN classifier
model = Sequential()
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
config = model.get_config()

classifier = KerasGClassifier(config, random_state=42)

# clone a classifier
clf = clone(classifier)

# Get parameters
params = clf.get_params()

# Set parameters, including nested layer subparameters
new_params = dict(
    epochs=60,
    lr=0.01,
    layers_1_Dense__config__kernel_initializer__config__seed=999,
    layers_0_Dense__config__kernel_initializer__config__seed=999
)
clf.set_params(**new_params)

# model evaluation using GridSearchCV
# (X and y are the user's feature matrix and binary target)
grid = GridSearchCV(clf, param_grid={}, scoring='roc_auc', cv=5, n_jobs=2)
grid.fit(X, y)
```
Example for using Galaxy-ML to persist a sklearn/keras model
```
from galaxy_ml.model_persist import dump_model_to_h5, load_model_from_h5

# dump model to hdf5
dump_model_to_h5(model, save_path, store_hyperparameter=True)

# load model from hdf5
model = load_model_from_h5(path_to_hdf5)
```
Performance comparison
Galaxy-ML's HDF5 saving utils perform faster than cPickle for large, array-rich models.
```
Loading model using pickle... (1.2471628189086914 s)

Dumping model using pickle... (3.6942389011383057 s)
File size: 930712861

Dumping model to hdf5... (3.006715774536133 s)
File size: 930729696

Loading model from hdf5... (0.6420958042144775 s)

Pipeline(memory=None,
         steps=[('robustscaler',
                 RobustScaler(copy=True, quantile_range=(25.0, 75.0),
                              with_centering=True, with_scaling=True)),
                ('kneighborsclassifier',
                 KNeighborsClassifier(algorithm='auto', leaf_size=30,
                                      metric='minkowski', metric_params=None,
                                      n_jobs=1, n_neighbors=100, p=2,
                                      weights='uniform'))],
         verbose=False)
```
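Timings like the ones above can be collected with a small wall-clock harness. The sketch below is an assumption about how such a measurement might be set up, not the script that produced those numbers: it covers only the pickle side using the standard library, and the `timed`, `dump`, and `load` helpers as well as the stand-in `model` object are hypothetical.

```python
import os
import pickle
import tempfile
import time


def timed(label, func):
    """Run func(), print the elapsed wall-clock time, and return its result."""
    start = time.perf_counter()
    result = func()
    print(f"{label} ({time.perf_counter() - start:.4f} s)")
    return result


def dump(obj, path):
    """Serialize obj to path with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load(path):
    """Deserialize a pickled object from path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a fitted model: any picklable object works here.
model = {"weights": [float(i) for i in range(200_000)]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    timed("Dumping model using pickle...", lambda: dump(model, path))
    file_size = os.path.getsize(path)
    print(f"File size: {file_size}")
    restored = timed("Loading model using pickle...", lambda: load(path))
```

The HDF5 side could presumably be timed the same way by passing the Galaxy-ML dump and load functions to `timed` in place of the pickle helpers.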
Publication
Gu Q, Kumar A, Bray S, Creason A, Khanteymoori A, Jalili V, et al. (2021) Galaxy-ML: An accessible, reproducible, and scalable machine learning toolkit for biomedicine. PLoS Comput Biol 17(6): e1009014. https://doi.org/10.1371/journal.pcbi.1009014
Owner
- Name: goeckslab
- Login: goeckslab
- Kind: organization
- Repositories: 8
- Profile: https://github.com/goeckslab
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Qiang Gu | g****1@g****m | 559 |
| Qiang Gu | 3****u | 26 |
| Kaivan Kamali | k****2@p****u | 14 |
| Anup Kumar | a****z@g****m | 6 |
| Marcel Bargull | m****l@u****u | 1 |
| kxk302 | k****2@g****m | 1 |
| dependabot[bot] | 4****] | 1 |
| Björn Grüning | b****n@g****u | 1 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 9
- Total pull requests: 62
- Average time to close issues: 5 days
- Average time to close pull requests: 10 days
- Total issue authors: 3
- Total pull request authors: 6
- Average comments per issue: 0.78
- Average comments per pull request: 0.48
- Merged pull requests: 54
- Bot issues: 0
- Bot pull requests: 4
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- qiagu (6)
- kxk302 (2)
- anuprulez (1)
Pull Request Authors
- qiagu (50)
- dependabot[bot] (4)
- kxk302 (4)
- qchiujunhao (3)
- mbargull (1)
- bgruening (1)
Packages
- Total packages: 1
- Total downloads: 26 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 18
- Total maintainers: 1
pypi.org: galaxy-ml
Galaxy Machine Learning Library
- Homepage: https://github.com/goeckslab/Galaxy-ML/
- Documentation: https://galaxy-ml.readthedocs.io/
- License: MIT License
- Latest release: 0.10.0 (published over 3 years ago)
Dependencies
- asteval >=0.9.14
- bleach >=3.3.0
- cython >=0.29.11
- h5py >=3.1
- imbalanced-learn >=0.8.0,<0.9
- joblib >=0.13.2,<1.0
- matplotlib >=3.1.1
- mlxtend >=0.17,<0.18
- numpy >=1.18.0,<1.21
- pandas >=1.0,<1.3
- plotly >=4.10.0,<5.0
- pyfaidx *
- pytabix *
- scikit-learn >=0.24,<0.25
- scikit-optimize >=0.9
- scipy >=1.3.1
- six <=1.15.0
- skrebate >=0.60,<0.70
- tensorflow >=2.5.0,<2.6
- xgboost >=1.3,<1.4
- actions/cache v2 composite
- actions/checkout v2 composite
- actions/download-artifact v2 composite
- actions/setup-python v1 composite
- actions/upload-artifact v2 composite
- galaxyproject/planemo-ci-action v1 composite
- peter-evans/create-or-update-comment v1 composite
- postgres 11 docker
- actions/cache v2 composite
- actions/checkout v2 composite
- actions/download-artifact v2 composite
- actions/setup-python v1 composite
- actions/upload-artifact v2 composite
- galaxyproject/planemo-ci-action v1 composite
- postgres 11 docker
- peter-evans/slash-command-dispatch v2 composite
- python 3.9-slim build