propulate
Propulate is an asynchronous population-based optimization algorithm and software package for global optimization and hyperparameter search on high-performance computers.
Basic Info
- Host: GitHub
- Owner: Helmholtz-AI-Energy
- License: bsd-3-clause
- Language: Python
- Default Branch: main
- Homepage: https://doi.org/10.1007/978-3-031-32041-5_6
- Size: 8.08 MB
Parallel Propagator of Populations
Click here to watch our 3 min introduction video!
What Propulate can do for you
Propulate is HPC-tailored software for solving optimization problems in parallel. It is openly accessible and easy
to use. Compared to a widely used competitor, Propulate is consistently faster - at least an order of magnitude for a
set of typical benchmarks - and in some cases even more accurate.
Inspired by biology, Propulate borrows mechanisms from biological evolution, such as selection, recombination, and
mutation. Evolution begins with a population of solution candidates, each with randomly initialized genes. It is an
iterative "survival of the fittest" process where the population at each iteration can be viewed as a generation. For
each generation, the fitness of each candidate in the population is evaluated. The genes of the fittest candidates are
incorporated in the next generation.
As in nature, Propulate does not wait for all compute units to finish evaluating the current generation.
Instead, the compute units communicate the currently available information and use it to breed the next candidate
immediately. This avoids idle waiting for other units and the resulting load imbalance.
Each unit is responsible for evaluating a single candidate. The result is a fitness level corresponding to that
candidate’s genes, allowing us to compare and rank all candidates. This information is sent to other compute units as
soon as it becomes available.
When a unit is finished evaluating a candidate and communicating the resulting fitness, it breeds the candidate for the
next generation using the fitness values of all candidates it evaluated and received from other units so far.
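The evaluate, communicate, breed cycle described above can be illustrated with a small serial simulation. This is only a toy sketch, not Propulate's implementation: there is no MPI, a shared list stands in for the exchange of results between units, and the helper names (`breed`, `simulate`), the pool size, and the mutation width are all assumptions made for illustration.

```python
import heapq
import random


def sphere(params):
    """Toy loss: sum of squares, minimal (0) at the origin."""
    return sum(value ** 2 for value in params.values())


def breed(population, limits, rng, pool_size=4, sigma=0.3):
    """Breed one candidate from the fittest individuals known so far."""
    if len(population) < 2:
        # Too few evaluated individuals yet: sample uniformly at random.
        return {key: rng.uniform(low, high) for key, (low, high) in limits.items()}
    pool = sorted(population, key=lambda individual: individual[0])[:pool_size]
    (_, parent_a), (_, parent_b) = rng.sample(pool, 2)
    child = {}
    for key, (low, high) in limits.items():
        gene = rng.choice((parent_a[key], parent_b[key]))  # Recombination
        gene += rng.gauss(0.0, sigma)  # Mutation
        child[key] = min(max(gene, low), high)  # Stay inside the search space.
    return child


def simulate(num_workers=4, evaluations=200, seed=0):
    """Simulate workers with random evaluation times; return all (loss, params) pairs."""
    rng = random.Random(seed)
    limits = {"x": (-5.12, 5.12), "y": (-5.12, 5.12)}
    population = []  # Every evaluated individual, from any "generation"
    running = []  # Min-heap of (finish_time, worker, params)
    for worker in range(num_workers):
        first = breed(population, limits, rng)
        heapq.heappush(running, (rng.uniform(0.5, 2.0), worker, first))
    while running:
        now, worker, params = heapq.heappop(running)
        population.append((sphere(params), params))  # Evaluate and share the result.
        if len(population) + len(running) < evaluations:
            # Breed immediately from what is known now; no waiting for other workers.
            child = breed(population, limits, rng)
            heapq.heappush(running, (now + rng.uniform(0.5, 2.0), worker, child))
    return population


population = simulate()
best_loss, best_params = min(population, key=lambda individual: individual[0])
print(len(population), best_loss, best_params)
```

The point of the sketch is the missing generation barrier: a worker that finishes early breeds and starts its next evaluation right away, while slower workers are still busy.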
Propulate can be used for hyperparameter optimization and neural architecture search at scale.
It has already been applied successfully in several scientific publications. Applications include grid load
forecasting, remote sensing, and structural molecular biology:
J. Debus, C. Debus, G. Dissertori, et al. PETNet–Coincident Particle Event Detection using Spiking Neural Networks. 2024 Neuro Inspired Computational Elements Conference (NICE), La Jolla, CA, USA, pp. 1-9 (2024). https://doi.org/10.1109/NICE61972.2024.10549584
D. Coquelin, K. Flügel, M. Weiel, et al. AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning. arXiv preprint (2024). https://doi.org/10.48550/arXiv.2405.01067
D. Coquelin, K. Flügel, M. Weiel, et al. Harnessing Orthogonality to Train Low-Rank Neural Networks. arXiv preprint (2024). https://doi.org/10.48550/arXiv.2401.08505
A. Weyrauch, T. Steens, O. Taubert, et al. ReCycle: Fast and Efficient Long Time Series Forecasting with Residual Cyclic Transformers. 2024 IEEE Conference on Artificial Intelligence (CAI), Singapore, pp. 1187-1194 (2024). https://doi.org/10.1109/CAI59869.2024.00212
O. Taubert, F. von der Lehr, A. Bazarova, et al. RNA contact prediction by data efficient deep learning. Communications Biology 6(1), 913 (2023). https://doi.org/10.1038/s42003-023-05244-9
Y. Funk, M. Götz, and H. Anzt. Prediction of optimal solvers for sparse linear systems using deep learning. Proceedings of the 2022 SIAM Conference on Parallel Processing for Scientific Computing (pp. 14-24). Society for Industrial and Applied Mathematics (2022). https://doi.org/10.1137/1.9781611977141.2
D. Coquelin, R. Sedona, M. Riedel, and M. Götz. Evolutionary Optimization of Neural Architectures in Remote Sensing Classification Problems. IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, pp. 1587-1590 (2021). https://doi.org/10.1109/IGARSS47720.2021.9554309
In more technical terms
Propulate is a massively parallel evolutionary hyperparameter optimizer based on the island model with asynchronous
propagation of populations and asynchronous migration.
In contrast to classical genetic algorithms (GAs), Propulate maintains a continuous population of already evaluated
individuals with a softened notion of the typically strictly separated, discrete generations.
Our contributions include:
- A novel parallel genetic algorithm based on a fully asynchronized island model with independently processing workers.
- Massive parallelism by asynchronous propagation of continuous populations and migration via efficient communication using the message passing interface.
- Efficient use of parallel hardware by minimizing idle times in distributed computing environments.
To be more efficient, the generations are less well separated than they usually are in evolutionary algorithms. New individuals are generated from a pool of currently active, already evaluated individuals that may be from any generation. Individuals may be removed from the breeding population based on different criteria.
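As a rough illustration of a breeding pool that mixes generations, the sketch below keeps every evaluated individual in one list and selects parents by fitness alone. The `Individual` class, the `active` flag, and the "keep only the fittest" removal rule are hypothetical stand-ins chosen for this example; they are not taken from Propulate's code.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    params: dict
    loss: float
    generation: int
    active: bool = True  # Inactive individuals no longer take part in breeding.


def breeding_pool(population, pool_size):
    """Return the `pool_size` fittest active individuals, regardless of generation."""
    active = [individual for individual in population if individual.active]
    return sorted(active, key=lambda individual: individual.loss)[:pool_size]


def retire_worst(population, max_active):
    """Hypothetical removal criterion: deactivate all but the `max_active` fittest."""
    ranked = sorted(
        (individual for individual in population if individual.active),
        key=lambda individual: individual.loss,
    )
    for individual in ranked[max_active:]:
        individual.active = False


population = [
    Individual({"x": 1.0}, loss=1.00, generation=0),
    Individual({"x": 0.2}, loss=0.04, generation=3),
    Individual({"x": -2.0}, loss=4.00, generation=1),
    Individual({"x": 0.5}, loss=0.25, generation=7),
]
retire_worst(population, max_active=3)
pool = breeding_pool(population, pool_size=2)
print([(individual.generation, individual.loss) for individual in pool])
# [(3, 0.04), (7, 0.25)]
```

Here the two parents come from generations 3 and 7, which a strictly generational algorithm would never pair.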
You can find the corresponding publication here:
Taubert, O. et al. (2023). Massively Parallel Genetic Optimization Through Asynchronous Propagation of Populations. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_6
Documentation
Check out the full documentation at https://propulate.readthedocs.io/ :rocket:! Here you can find installation instructions, tutorials, theoretical background, and API references.
:point_right: If you have any questions or run into any challenges while using Propulate, don't hesitate to post an
issue :bookmark:, reach out via GitHub
discussions :octocat:, or contact us directly via e-mail
:email: to propulate@lists.kit.edu.
Installation
- You can install the latest stable release from PyPI: `pip install propulate`
- If you need the latest updates, you can also install `Propulate` directly from the master branch. Pull and run `pip install .`.
- If you want to run the tutorials, you can install the required dependencies via: `pip install ."[tutorials]"`
- If you want to contribute to `Propulate` as a developer, you need to install the required dependencies with the package: `pip install -e ."[dev]"`.
Propulate depends on mpi4py and requires an MPI implementation under
the hood. Currently, it is only tested with OpenMPI.
Quickstart
Below, you can find a quick recipe for how to use Propulate in general. Check out the official
ReadTheDocs documentation for more detailed tutorials
and explanations.
Let's minimize the sphere function $f_\text{sphere}\left(x,y\right)=x^2+y^2$ with Propulate as a quick example. The
minimum is at $\left(x, y\right)=\left(0,0\right)$, marked by the orange star in the README's figure.
First, we need to specify the key ingredients that define our optimization problem:
- The search space of the parameters to be optimized as a Python dictionary. Propulate can handle three different
  parameter types:
  - A tuple of `float` for a continuous parameter, e.g., `{"learning_rate": (0.0001, 0.01)}`
  - A tuple of `int` for an ordinal parameter, e.g., `{"conv_layers": (2, 10)}`
  - A tuple of `str` for a categorical parameter, e.g., `{"activation": ("relu", "sigmoid", "tanh")}`
Thus, an exemplary search space might look like this:
```python
search_space = {
    "learning_rate": (0.0001, 0.01),  # Search a continuous space between 0.0001 and 0.01.
    "num_layers": (2, 10),  # Search the integer space between 2 and 10 (inclusive).
    "activation": ("relu", "sigmoid", "tanh"),  # Search the categorical space with the specified possibilities.
}
```
The sphere function has two continuous parameters, $x$ and $y$, and we consider $x,y\in\left[-5.12,5.12\right]$. The
search space in our example thus looks like this:
```python
limits = {
    "x": (-5.12, 5.12),
    "y": (-5.12, 5.12),
}
```
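To make the three parameter types concrete, here is a minimal sketch of how a random candidate could be drawn from such a dictionary by inspecting the type of the tuple entries. The function `sample_candidate` is a hypothetical helper written for this example; it is not part of Propulate and does not claim to mirror how Propulate samples internally.

```python
import random


def sample_candidate(search_space, rng):
    """Draw one random candidate, choosing the sampler by the type of the tuple entries."""
    candidate = {}
    for name, values in search_space.items():
        if all(isinstance(value, str) for value in values):
            candidate[name] = rng.choice(values)  # Categorical: pick one option.
        elif all(isinstance(value, int) for value in values):
            low, high = values
            candidate[name] = rng.randint(low, high)  # Ordinal: both bounds inclusive.
        elif all(isinstance(value, float) for value in values):
            low, high = values
            candidate[name] = rng.uniform(low, high)  # Continuous: uniform in the interval.
        else:
            raise TypeError(f"Mixed or unsupported types for parameter {name!r}")
    return candidate


search_space = {
    "learning_rate": (0.0001, 0.01),
    "num_layers": (2, 10),
    "activation": ("relu", "sigmoid", "tanh"),
}
print(sample_candidate(search_space, random.Random(42)))
```

Note that `(2, 10)` and `(2.0, 10.0)` describe different search spaces in this scheme, so the literal type of the bounds matters.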
- The loss function. This is the function we want to minimize in order to find the best parameters. It can be any
  Python function that
  - takes a set of parameters as a Python dictionary as an input.
  - returns a scalar loss value that determines how good the tested parameter set is.
In this example, the loss function whose minimum we want to find is the sphere function:
```python
from typing import Dict

import numpy


def sphere(params: Dict[str, float]) -> float:
    """
    Sphere function: continuous, convex, separable, differentiable, unimodal

    Input domain: -5.12 <= x, y <= 5.12
    Global minimum 0 at (x, y) = (0, 0)

    Parameters
    ----------
    params: Dict[str, float]
        The function parameters.

    Returns
    -------
    float
        The function value.
    """
    return numpy.sum(numpy.array(list(params.values())) ** 2).item()
```
Next, we need to define the **evolutionary operator** or propagator that we want to use to breed new individuals during the
optimization process. `Propulate` provides a reasonable default propagator via a utility function:
```python
import random

import propulate

# NOTE: `config` is not defined in this snippet; it is assumed to hold your run
# settings (pop_size, generations, checkpoint).

# Set up logger for Propulate optimization.
propulate.set_logger_config()
# Set up separate random number generator for Propulate optimization. DO NOT USE SOMEWHERE ELSE!
rng = random.Random()  # The seed argument is not shown in the source; pass your own seed here.
# Set up evolutionary operator.
propagator = propulate.get_default_propagator(
    pop_size=config.pop_size,  # The breeding population size
    limits=limits,  # The search-space limits
    rng=rng,  # Random number generator
)
```
We also need to set up the asynchronous parallel evolutionary **optimizer**, a so-called `Propulator` instance:
```python
# Set up Propulator performing actual optimization.
propulator = propulate.Propulator(
    loss_fn=sphere,
    propagator=propagator,
    rng=rng,
    generations=config.generations,
    checkpoint_path=config.checkpoint,
)
```
Now we can run the actual optimization. Overall, `generations * mpi4py.MPI.COMM_WORLD.size` evaluations will be
performed:
```python
# Run optimization and print summary of results.
propulator.propulate()
propulator.summarize()
```
The output should look something like this:
```text
PROPULATE: Parallel Propagator of Populations
[2024-03-12 14:37:01,374][propulate.propulator][INFO] - No valid checkpoint file given. Initializing population randomly...
[2024-03-12 14:37:01,374][propulate.propulator][INFO] - Island 0 has 4 workers.
[2024-03-12 14:37:01,374][propulate.propulator][INFO] - Island 0 Worker 0: In generation 0...
[2024-03-12 14:37:01,374][propulate.propulator][INFO] - Island 0 Worker 3: In generation 0...
[2024-03-12 14:37:01,374][propulate.propulator][INFO] - Island 0 Worker 2: In generation 0...
[2024-03-12 14:37:01,374][propulate.propulator][INFO] - Island 0 Worker 1: In generation 0...
[2024-03-12 14:37:01,377][propulate.propulator][INFO] - Island 0 Worker 3: In generation 10...
[2024-03-12 14:37:01,377][propulate.propulator][INFO] - Island 0 Worker 1: In generation 10...
[2024-03-12 14:37:01,378][propulate.propulator][INFO] - Island 0 Worker 0: In generation 10...
[2024-03-12 14:37:01,378][propulate.propulator][INFO] - Island 0 Worker 2: In generation 10...
...
[2024-03-12 14:37:02,197][propulate.propulator][INFO] - Island 0 Worker 1: In generation 960...
[2024-03-12 14:37:02,206][propulate.propulator][INFO] - Island 0 Worker 2: In generation 990...
[2024-03-12 14:37:02,206][propulate.propulator][INFO] - Island 0 Worker 1: In generation 970...
[2024-03-12 14:37:02,215][propulate.propulator][INFO] - Island 0 Worker 1: In generation 980...
[2024-03-12 14:37:02,224][propulate.propulator][INFO] - Island 0 Worker 1: In generation 990...
[2024-03-12 14:37:02,232][propulate.propulator][INFO] - OPTIMIZATION DONE. NEXT: Final checks for incoming messages...
[2024-03-12 14:37:02,244][propulate.propulator][INFO] - ###########
SUMMARY
Number of currently active individuals is 4000.
Expected overall number of evaluations is 4000.
[2024-03-12 14:37:03,703][propulate.propulator][INFO] - Top 1 result(s) on island 0:
(1): [{'a': '2.91E-3', 'b': '-3.05E-3'}, loss 1.78E-5, island 0, worker 0, generation 956]
```
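The numbers in the log can be checked by hand. The short sketch below recomputes the expected number of evaluations and the loss of the reported best individual. It uses a plain-Python stand-in for the sphere function so that it runs without NumPy or MPI. The value of 1000 generations is an inference from the log (4 workers, 4000 expected evaluations, last logged generation 990), not something the log states directly.

```python
def sphere(params):
    """Plain-Python stand-in for the NumPy-based sphere function above."""
    return sum(value ** 2 for value in params.values())


def expected_evaluations(generations, num_workers):
    """Total number of loss evaluations for one run: generations per worker times workers."""
    return generations * num_workers


# 4 workers as reported in the log; 1000 generations is inferred, not stated.
print(expected_evaluations(generations=1000, num_workers=4))  # 4000

# Best individual from the summary, with parameter names as printed in the log.
best = {"a": 2.91e-3, "b": -3.05e-3}
print(f"{sphere(best):.2e}")  # 1.78e-05
```

Both values agree with the summary: 4000 evaluations overall and a best loss of about 1.78E-5.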
Let's get your hands dirty
Do the following to run the example script:
- Make sure you have a working MPI installation on your machine.
- If you have not already done this, create a fresh virtual environment with `Python`: `$ python3 -m venv best-venv-ever`
- Activate it: `$ source best-venv-ever/bin/activate`
- Upgrade `pip`: `$ pip install --upgrade pip`
- Install `Propulate`: `$ pip install propulate`
- Run the example script `propulator_example.py`: `$ mpirun --use-hwthread-cpus python propulator_example.py`
Acknowledgments
This work is supported by the Helmholtz AI platform grant.
Owner
- Name: Helmholtz AI Energy
- Login: Helmholtz-AI-Energy
- Kind: organization
- Email: consultant-helmholtz.ai@kit.edu
- Location: Karlsruhe, Germany
- Website: https://www.helmholtz.ai/
- Repositories: 9
- Profile: https://github.com/Helmholtz-AI-Energy
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
url: "https://github.com/Helmholtz-AI-Energy/propulate"
preferred-citation:
  journal: "Lecture Notes in Computer Science"
  title: "Massively Parallel Genetic Optimization Through Asynchronous Propagation of Populations"
  authors:
    - family-names: "Taubert"
      given-names: "Oskar"
    - family-names: "Weiel"
      given-names: "Marie"
    - family-names: "Coquelin"
      given-names: "Daniel"
    - family-names: "Farshian"
      given-names: "Anis"
    - family-names: "Debus"
      given-names: "Charlotte"
    - family-names: "Schug"
      given-names: "Alexander"
    - family-names: "Streit"
      given-names: "Achim"
    - family-names: "Götz"
      given-names: "Markus"
  volume: 13948
  start: 106
  end: 124
  year: 2023
  doi: 10.1007/978-3-031-32041-5_6
Dependencies
- cycler ==0.11.0
- deepdiff ==5.8.0
- kiwisolver ==1.4.2
- matplotlib ==3.2.1
- mpi4py ==3.0.3
- numpy ==1.22.3
- ordered-set ==4.1.0
- pyparsing ==3.0.7
- python-dateutil ==2.8.2
- six ==1.16.0
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite