Melissa
Melissa: coordinating large-scale ensemble runs for deep learning and sensitivity analyses - Published in JOSS (2023)
Science Score: 89.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org
- ✓ Committers with academic emails: 13 of 33 committers (39.4%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
Melissa is a file-avoiding, fault-tolerant, and elastic framework designed for large-scale sensitivity analysis and large-scale deep surrogate training on supercomputers.
Basic Info
- Host: gitlab.inria.fr
- Owner: melissa
- License: bsd-3-clause
- Default Branch: develop
Statistics
- Stars: 2
- Forks: 3
- Releases: 0
## Melissa

[DOI: 10.21105/joss.05291](https://doi.org/10.21105/joss.05291)

### Summary

Melissa is a file-avoiding, fault-tolerant, and elastic framework designed for _large-scale sensitivity analysis_ and _large-scale deep surrogate training_ on supercomputers. Some of its largest studies have used up to 30,000 cores to run 80,000 parallel simulations while avoiding up to 288 TB of intermediate data storage (see [@ribes2022]).

Traditional sensitivity analysis and deep surrogate training involve running many simulation instances with different input parameters, storing the results on disk, and later reading them back to train a neural network or compute the required statistics. However, storage demands can quickly become overwhelming, leading to long read times and inefficient data processing. To mitigate this, researchers often shrink their studies by running lower-resolution simulations or down-sampling the output data in space and time.

### How it works

Melissa (as shown in the figure below) overcomes these storage limitations by eliminating intermediate files and processing data in transit, which enables large-scale data processing:

- **Sensitivity Analysis Server:** Melissa uses iterative statistical algorithms and an asynchronous client-server model for data transfer. Instead of storing simulation outputs on disk, clients transmit them via NxM communication patterns to a parallel server. This enables statistics to be computed on the fly without disk storage, allowing full-scale studies that produce statistical maps for every mesh element and time step. Melissa supports various statistical measures (_e.g._ mean, variance, skewness, kurtosis, and Sobol indices) and can be extended with new algorithms.
- **Deep Learning Server:** Following a similar approach, client simulations send data in a round-robin manner to a parallel, multithreaded server. The server manages a buffer for training batches, ensuring efficient memory use.
  Once the buffer reaches a predefined safety watermark, selected samples form training batches for distributed training on GPUs or CPUs. Memory is managed dynamically by selecting and evicting samples according to predefined policies; adjusting the buffer size, watermark, and selection/eviction strategies enables both online and pseudo-offline training.

Both sensitivity analysis and deep surrogate training in Melissa depend on three key components:

1. **Melissa Client:** This is the parallel numerical simulation code, adapted to function as a client. Each client runs independently and sends intermediate simulation output to the server whenever `melissa_send()` is called.
2. **Melissa Server:** A parallel executable responsible for computing statistics or training a neural network (more details [here](docs/melissa-server.md)). It updates the statistics or generates training batches whenever it receives new data from any connected client.
3. **Melissa Launcher:** A front-end Python script that orchestrates the execution of the study (more details [here](docs/melissa-launcher.md)). It automates large-scale job scheduling with `OpenMPI` and integrates with cluster schedulers such as `slurm` and `OAR`, handling job submission, monitoring, and fault tolerance.

### User interface

To run an analysis with Melissa, users follow these steps:

1. **Instrument the Simulation Code:** Modify the simulation to use the Melissa API through three main calls (`init`, `send`, and `finalize`) so that it functions as a Melissa client ([details here](docs/use-case/instrument-solver.md)).
2. **Configure the Analysis:** Define how simulation parameters are sampled and select the statistics to compute, or specify the neural network architecture and training settings ([details here](docs/use-case/configuration-file.md)).
3. **Launch the Analysis:** Run the Melissa launcher from a terminal or the supercomputer's front-end node ([quick start guide](docs/first-dl-study.md)).
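As a rough illustration of the in-transit, file-avoiding pattern, the Python sketch below mimics one client per parameter set calling `init`, `send`, and `finalize`, with a server that folds each incoming field into running statistics per mesh element and time step. It is a single-process toy, not Melissa's API or server: `ToyClient`, `ToyServer`, `IterativeMoments`, and `toy_solver` are hypothetical names, the NxM parallel transfer is replaced by a direct method call, and Welford's one-pass algorithm is used here as one standard way to update mean and variance iteratively (Melissa's own update formulas may differ).

```python
from typing import Dict, List, Sequence


class IterativeMoments:
    """Running mean and variance for every element of a field (Welford's algorithm).

    update() folds in one simulation's values, so the raw field can be
    discarded immediately afterwards.
    """

    def __init__(self, size: int) -> None:
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations from the mean

    def update(self, values: Sequence[float]) -> None:
        if len(values) != len(self.mean):
            raise ValueError("field size mismatch")
        self.count += 1
        for i, x in enumerate(values):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self) -> List[float]:
        """Unbiased sample variance; needs at least two samples."""
        if self.count < 2:
            raise ValueError("need at least two samples")
        return [v / (self.count - 1) for v in self.m2]


class ToyServer:
    """Hypothetical server: one accumulator per time step, nothing written to disk."""

    def __init__(self, field_size: int) -> None:
        self.field_size = field_size
        self.stats: Dict[int, IterativeMoments] = {}

    def receive(self, time_step: int, values: Sequence[float]) -> None:
        if time_step not in self.stats:
            self.stats[time_step] = IterativeMoments(self.field_size)
        self.stats[time_step].update(values)


class ToyClient:
    """Hypothetical client mimicking the init/send/finalize call pattern."""

    def __init__(self, server: ToyServer) -> None:
        self.server = server
        self.connected = False

    def init(self) -> None:
        self.connected = True

    def send(self, time_step: int, values: Sequence[float]) -> None:
        if not self.connected:
            raise RuntimeError("send() called before init()")
        self.server.receive(time_step, values)  # in transit: handed straight to the server

    def finalize(self) -> None:
        self.connected = False


def toy_solver(p: float, time_step: int, size: int = 3) -> List[float]:
    """Hypothetical solver output: one value per mesh element."""
    return [p * (time_step + 1) + k for k in range(size)]


def run_toy_study(params: Sequence[float] = (1.0, 2.0, 3.0), n_steps: int = 2) -> ToyServer:
    server = ToyServer(field_size=3)
    for p in params:  # one independent client per parameter set
        client = ToyClient(server)
        client.init()
        for t in range(n_steps):
            client.send(t, toy_solver(p, t))
        client.finalize()
    return server


server = run_toy_study()
print(server.stats[0].mean, server.stats[0].variance())  # [2.0, 3.0, 4.0] [1.0, 1.0, 1.0]
```

Because each field is discarded right after `update()`, the memory held by the toy server depends on the mesh size and the number of time steps, not on the number of simulations.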
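The Deep Learning Server's buffer can be sketched in the same spirit. The class below is a hypothetical, single-threaded illustration, not Melissa's implementation: `WatermarkBuffer`, FIFO eviction when the buffer is full, uniform random selection, and the `evict_on_read` flag are assumptions chosen to show how buffer size, watermark, and selection/eviction policies interact.

```python
import random
from collections import deque
from typing import Any, Deque, List, Optional


class WatermarkBuffer:
    """Hypothetical training buffer with a watermark and simple policies.

    - put(): FIFO eviction of the oldest sample once `capacity` is reached.
    - get_batch(): returns None until at least `watermark` samples are held,
      then draws a uniformly random batch.
    - evict_on_read=True removes the drawn samples (each is used once).
    """

    def __init__(self, capacity: int, watermark: int,
                 evict_on_read: bool = False, seed: int = 0) -> None:
        if not 0 < watermark <= capacity:
            raise ValueError("need 0 < watermark <= capacity")
        self.capacity = capacity
        self.watermark = watermark
        self.evict_on_read = evict_on_read
        self.samples: Deque[Any] = deque()
        self.rng = random.Random(seed)

    def put(self, sample: Any) -> None:
        if len(self.samples) == self.capacity:
            self.samples.popleft()  # make room by dropping the oldest sample
        self.samples.append(sample)

    def ready(self) -> bool:
        return len(self.samples) >= self.watermark

    def get_batch(self, batch_size: int) -> Optional[List[Any]]:
        if not self.ready():
            return None  # below the watermark: wait for more client data
        k = min(batch_size, len(self.samples))
        indices = self.rng.sample(range(len(self.samples)), k)
        batch = [self.samples[i] for i in indices]
        if self.evict_on_read:
            chosen = set(indices)
            self.samples = deque(
                s for i, s in enumerate(self.samples) if i not in chosen
            )
        return batch


# Usage: clients would call put() as data arrives; the trainer polls get_batch().
buf = WatermarkBuffer(capacity=4, watermark=3)
for sample in range(6):
    buf.put(sample)
print(list(buf.samples))  # [2, 3, 4, 5]
print(len(buf.get_batch(2)))  # 2
```

With `evict_on_read=True`, each sample is used in at most one batch, which resembles online training; keeping samples in a large buffer lets them be drawn repeatedly, closer to the pseudo-offline mode described above.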
Melissa handles resource allocation, execution monitoring, and automatic restarts of failed components.

Melissa's API currently supports C, Fortran, and Python solvers and can be extended to other languages by following the approach in the [API folder](https://gitlab.inria.fr/melissa/melissa/-/tree/develop/api).

### List of publications

* **MelissaDL x Breed: Towards Data-Efficient On-line Supervised Training of Multi-parametric Surrogates with Active Learning.** Sofya Dymchenko, Abhishek Purandare, Bruno Raffin. [https://hal-lara.archives-ouvertes.fr/NUMPEX/hal-04712480v1](https://hal-lara.archives-ouvertes.fr/NUMPEX/hal-04712480v1)
* **Melissa: coordinating large-scale ensemble runs for deep learning and sensitivity analyses.** Marc Schouler, Robert Alexander Caulk, Lucas Meyer, Théophile Terraz, Christoph Conrads, Sebastian Friedemann, Achal Agarwal, Juan Manuel Baldonado, Bartłomiej Pogodziński, Anna Sekula, et al. [https://inria.hal.science/hal-04145897](https://inria.hal.science/hal-04145897)
* **Melissa: Large Scale In Transit Sensitivity Analysis Avoiding Intermediate Files.** Théophile Terraz, Alejandro Ribes, Yvan Fournier, Bertrand Iooss, Bruno Raffin. The International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Nov 2017, Denver, United States, pp. 1-14. [PDF](https://hal.inria.fr/hal-01607479/file/main-Sobol-SC-2017-HALVERSION.pdf)
* **The Challenges of In Situ Analysis for Multiple Simulations.** Alejandro Ribés, Bruno Raffin. ISAV 2020: In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, Nov 2020, Atlanta, United States, pp. 1-6. [https://hal.inria.fr/hal-02968789](https://hal.inria.fr/hal-02968789)
Owner
- Name: melissa
- Login: melissa
- Kind: organization
- Repositories: 1
- Profile: https://gitlab.inria.fr/melissa
This group gathers the solutions based on the Melissa architecture for online processing of data produced by large-scale ensemble runs (sensitivity analysis, data assimilation, ...).
JOSS Publication
Melissa: coordinating large-scale ensemble runs for deep learning and sensitivity analyses
Authors
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France; Industrial AI Laboratory SINCLAIR, EDF Lab Paris-Saclay, France
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France
Institute of Bioorganic Chemistry Polish Academy of Sciences, Poznań Supercomputing and Networking Center
Institute of Bioorganic Chemistry Polish Academy of Sciences, Poznań Supercomputing and Networking Center
Industrial AI Laboratory SINCLAIR, EDF Lab Paris-Saclay, France
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France
Tags
supercomputing, sensitivity analysis, deep learning, distributed systems, orchestration

Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Marc Schouler | m****r@i****r | 559 |
| Terraz Theophile | t****z@i****r | 547 |
| rcaulk | r****k@i****r | 377 |
| Abhishek Purandare | a****e@i****r | 359 |
| Christoph Conrads | c****s@i****r | 196 |
| robcaulk | r****k@g****m | 139 |
| Bartlomiej Pogodzinski | b****i@m****l | 86 |
| Lucas Meyer | l****r@i****r | 66 |
| RAFFIN Bruno | b****n@i****r | 60 |
| Anna | s****a@g****m | 59 |
| Sebastian Friedemann | s****n@g****t | 58 |
| Achal Agarwal | a****1@g****m | 19 |
| sfriedem | s****n@i****e | 17 |
| xy124 | q****d@g****t | 17 |
| Achal Agarwal | a****l@i****r | 11 |
| Bartłomiej Pogodziński | b****i@g****m | 10 |
| friedems | s****n@u****r | 7 |
| Anthony Geay | a****y@e****r | 6 |
| Robert Caulk | r****k@f****n | 6 |
| jbaldona | u****o@j****r | 5 |
| tterraz | t****z@t****r | 5 |
| sfriedem | s****n@i****r | 4 |
| Marc Schouler | s****c@g****m | 3 |
| jbaldona | u****o@j****r | 2 |
| Adrien Faure | a****e@p****m | 1 |
| DYMCHENKO Sofya | s****o@i****r | 1 |
| Juan Manuel Baldonado | j****o@f****r | 1 |
| Juan Manuel Baldonado | j****o@g****r | 1 |
| Juan Manuel Baldonado | j****o@M****l | 1 |
| Juan Manuel Baldonado | j****o@e****r | 1 |
| and 3 more... | | |
Packages
- Total packages: 3
- Total downloads: unknown
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 0 (may contain duplicates)
- Total versions: 4
- Total maintainers: 4
spack.io: melissa
Melissa is a file-avoiding, adaptive, fault-tolerant and elastic framework, to run large-scale sensitivity analysis on supercomputers.
- Homepage: https://gitlab.inria.fr/melissa/melissa
- License: []
- Latest release: 0.7.1 (published about 4 years ago)
Maintainers (2)
spack.io: melissa-api
Melissa is a file-avoiding, adaptive, fault-tolerant and elastic framework, to run large-scale sensitivity analysis or deep-surrogate training on supercomputers. This package builds the API used when instrumenting the clients.
- Homepage: https://gitlab.inria.fr/melissa/melissa
- License: []
spack.io: py-melissa-core
Melissa is a file-avoiding, adaptive, fault-tolerant and elastic framework, to run large-scale sensitivity analysis or deep-surrogate training on supercomputers. This package builds the launcher and server modules.
- Homepage: https://gitlab.inria.fr/melissa/melissa
- License: []