Melissa

Melissa: coordinating large-scale ensemble runs for deep learning and sensitivity analyses - Published in JOSS (2023)

https://gitlab.inria.fr/melissa/melissa

Keywords

Modeling&Simulation, high performance computing (HPC)

Keywords from Contributors

cryptocurrencies

Last synced: 10 months ago

Repository

Melissa is a file-avoiding, fault-tolerant, and elastic framework designed for large-scale sensitivity analysis and large-scale deep surrogate training on supercomputers.

Basic Info

Host: gitlab.inria.fr
Owner: melissa
License: bsd-3-clause
Default Branch: develop

Statistics

Stars: 2
Forks: 3
Open Issues:
Releases: 0

Topics

Modeling&Simulation, high performance computing (HPC)

Created over 3 years ago

https://gitlab.inria.fr/melissa/melissa/blob/develop/

## Melissa

[![DOI](https://joss.theoj.org/papers/10.21105/joss.05291/status.svg)](https://doi.org/10.21105/joss.05291)

### Summary

Melissa is a file-avoiding, fault-tolerant, and elastic framework designed for _large-scale sensitivity analysis_ and _large-scale deep surrogate training_ on supercomputers. Some of its largest studies have utilized up to 30,000 cores to run 80,000 parallel simulations while avoiding up to 288 TB of intermediate data storage (see [@ribes2022]).

![Melissa architecture](docs/assets/melissa-architecture.png)

Traditional sensitivity analysis and deep surrogate training involve running multiple simulation instances with different input parameters, storing the results on disk, and later retrieving them to train a neural network or compute required statistics. However, the storage demands can quickly become overwhelming, leading to long read times and inefficient data processing. To mitigate this, researchers often reduce study sizes by running lower-resolution simulations or down-sampling output data in space and time.

### How it works

Melissa (see the architecture figure above) overcomes storage limitations by eliminating intermediate file storage and processing data in transit, enabling large-scale data processing:

- **Sensitivity Analysis Server:** Melissa uses iterative statistical algorithms and an asynchronous client-server model for data transfer. Instead of storing simulation outputs on disk, it transmits them via NxM communication patterns to a parallelized server. This approach enables real-time statistical computations without requiring disk storage, allowing full-scale studies with oblivious statistical mapping for every mesh element and time step. Melissa supports various statistical measures (_e.g._ mean, variance, skewness, kurtosis, and Sobol indices) and can be extended with new algorithms.

- **Deep Learning Server:** Following a similar approach, client simulations send data in a round-robin manner to a parallelized, multithreaded server. The server manages a buffer for training batches, ensuring efficient memory use. Once the buffer reaches a predefined safety watermark, selected samples form training batches for distributed training on GPUs or CPUs. Memory is managed dynamically by selecting and evicting samples based on predefined policies, enabling both online and pseudo-offline training by adjusting the buffer size, watermark, and selection/eviction strategies.
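
The iterative statistics described for the sensitivity analysis server can be illustrated with a minimal sketch. It applies Welford's one-pass update for mean and variance to each element of a field, so every simulation output can be folded in and then discarded. This illustrates the general technique only, not Melissa's implementation; the class name `RunningMoments` and the tiny two-element field are hypothetical.

```python
from typing import List


class RunningMoments:
    """One-pass mean/variance per mesh element (Welford's algorithm).

    Hypothetical illustration; not part of the Melissa code base.
    """

    def __init__(self, n_elements: int) -> None:
        self.count = 0
        self.mean = [0.0] * n_elements
        self.m2 = [0.0] * n_elements  # running sum of squared deviations

    def update(self, field: List[float]) -> None:
        """Fold one simulation output into the statistics, then forget it."""
        self.count += 1
        for i, x in enumerate(field):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self) -> List[float]:
        """Unbiased sample variance; needs at least two samples."""
        if self.count < 2:
            raise ValueError("variance needs at least two samples")
        return [m2 / (self.count - 1) for m2 in self.m2]


stats = RunningMoments(n_elements=2)
for field in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0]):
    stats.update(field)  # in a real study the field would arrive over the network
```

Memory use depends only on the field size, not on the number of simulations, which is what makes this kind of update suitable for in-transit processing.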

![Overview of Melissa's deep learning framework](docs/assets/melissa-dl.png)
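
The buffer behaviour described for the deep learning server can be sketched as a toy model. Batches are served only once the buffer holds at least `watermark` samples, and the oldest sample is evicted when the buffer is full. The class `WatermarkBuffer`, the FIFO eviction rule, and the uniform random selection are assumptions chosen for illustration; Melissa's own policies are configurable and may differ.

```python
import random
from typing import List, Optional


class WatermarkBuffer:
    """Toy training buffer: batches are served only above a watermark."""

    def __init__(self, capacity: int, watermark: int, seed: int = 0) -> None:
        if not 0 < watermark <= capacity:
            raise ValueError("need 0 < watermark <= capacity")
        self.capacity = capacity
        self.watermark = watermark
        self.samples: List[float] = []
        self.rng = random.Random(seed)

    def put(self, sample: float) -> None:
        """Store a sample; when full, evict the oldest one (assumed FIFO policy)."""
        if len(self.samples) == self.capacity:
            self.samples.pop(0)
        self.samples.append(sample)

    def get_batch(self, batch_size: int) -> Optional[List[float]]:
        """Return a random batch, or None while below the watermark."""
        if len(self.samples) < self.watermark:
            return None
        size = min(batch_size, len(self.samples))
        return self.rng.sample(self.samples, size)


buf = WatermarkBuffer(capacity=4, watermark=3)
buf.put(0.1)
buf.put(0.2)
early = buf.get_batch(2)   # None: only 2 samples, the watermark is 3
for s in (0.3, 0.4, 0.5):
    buf.put(s)             # 0.1 is evicted when 0.5 arrives
batch = buf.get_batch(2)   # two samples drawn from the four buffered ones
```

In this toy model, `capacity`, `watermark`, and the two policies are the knobs that the text above refers to when it mentions switching between online and pseudo-offline training.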

Both sensitivity analysis and deep surrogate training in Melissa depend on three key components:

1. **Melissa Client:** This is the parallel numerical simulation code, adapted to function as a client. Each client runs independently and sends mid-simulation output to the server whenever `melissa_send()` is called.

2. **Melissa Server:** A parallel executable responsible for computing statistics or training a Neural Network (more details [here](docs/melissa-server.md)). It updates statistics and generates training batches upon receiving new data from any connected client.

3. **Melissa Launcher:** A front-end Python script that orchestrates the execution of the study (more details [here](docs/melissa-launcher.md)). It automates large-scale job scheduling in `OpenMPI` and integrates with cluster schedulers like `slurm` and `OAR`, handling job submission, monitoring, and fault tolerance.
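
The NxM transfer between a parallel client and the parallel server can be pictured as an overlap computation between two partitions of the same field. The sketch below assumes contiguous block partitions on both sides and computes, for each client rank, which index ranges go to which server rank. The function names and the block layout are hypothetical; the partitioning actually used by Melissa may differ.

```python
from typing import Dict, List, Tuple


def block_bounds(size: int, parts: int) -> List[Tuple[int, int]]:
    """Split range(size) into `parts` contiguous [start, stop) blocks."""
    base, extra = divmod(size, parts)
    bounds, start = [], 0
    for rank in range(parts):
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def nxm_plan(size: int, n_clients: int,
             m_servers: int) -> Dict[int, List[Tuple[int, int, int]]]:
    """For each client rank, list the (server_rank, start, stop) slices to send."""
    servers = block_bounds(size, m_servers)
    plan = {}
    for c, (c0, c1) in enumerate(block_bounds(size, n_clients)):
        plan[c] = [
            (s, max(c0, s0), min(c1, s1))
            for s, (s0, s1) in enumerate(servers)
            if max(c0, s0) < min(c1, s1)  # keep only non-empty overlaps
        ]
    return plan


# A 10-element field held by 3 client ranks and 2 server ranks.
plan = nxm_plan(size=10, n_clients=3, m_servers=2)
```

Here client rank 1 owns elements 4 to 6 and therefore sends one element to server rank 0 and two elements to server rank 1, while the other client ranks each talk to a single server rank.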

### User interface

To run an analysis with Melissa, users need to follow these steps:

1. **Instrument the Simulation Code:** Modify the simulation to use the Melissa API with three main calls (`init`, `send`, and `finalize`) so it functions as a Melissa client ([details here](docs/use-case/instrument-solver.md)).

2. **Configure the Analysis:** Define how simulation parameters are sampled, select statistical computations, or specify the Neural Network architecture and training settings ([details here](docs/use-case/configuration-file.md)).

3. **Launch the Analysis:** Run the Melissa launcher via the terminal or the supercomputer's front-end ([quick start guide](docs/first-dl-study.md)). Melissa handles resource allocation, execution monitoring, and automatic restarts for failed components.
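
The automatic restart of failed components mentioned in step 3 can be sketched as a bounded retry loop. This is a deliberately simplified model: jobs are plain Python callables rather than batch-scheduler submissions, and `run_with_restarts` and `max_attempts` are hypothetical names, not launcher options.

```python
from typing import Callable, Dict


def run_with_restarts(jobs: Dict[str, Callable[[int], bool]],
                      max_attempts: int = 3) -> Dict[str, int]:
    """Run each job until it succeeds or its attempts run out.

    Each job takes the attempt number and returns True on success.
    Returns the number of attempts used per job.
    """
    attempts_used = {}
    for name, job in jobs.items():
        for attempt in range(1, max_attempts + 1):
            if job(attempt):
                attempts_used[name] = attempt
                break
        else:
            # The loop ended without a successful attempt.
            raise RuntimeError(f"{name} failed {max_attempts} times")
    return attempts_used


# Two toy "simulations": one succeeds at once, one fails on its first attempt.
jobs = {
    "sim-0": lambda attempt: True,
    "sim-1": lambda attempt: attempt >= 2,
}
result = run_with_restarts(jobs)
```

A real launcher would additionally have to detect failures through the scheduler (job state, timeouts) instead of a return value.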

Melissa's API currently supports C, Fortran, and Python solvers but can be extended to other languages by following the approach in the [API folder](https://gitlab.inria.fr/melissa/melissa/-/tree/develop/api).
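
The shape of an instrumented solver (one `init`, one `send` per time step, one `finalize`) can be sketched with a stand-in object. `StubMelissaClient` is hypothetical and only records calls; its method signatures are assumptions made for this example and should not be read as the real Melissa API, which is documented in the instrumentation guide linked above.

```python
from typing import List


class StubMelissaClient:
    """Hypothetical stand-in for the client API; it only records calls."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def init(self, field_name: str, local_size: int) -> None:
        self.log.append(f"init {field_name} size={local_size}")

    def send(self, field_name: str, values: List[float]) -> None:
        self.log.append(f"send {field_name} n={len(values)}")

    def finalize(self) -> None:
        self.log.append("finalize")


def run_solver(client: StubMelissaClient, n_steps: int, n_cells: int) -> None:
    """Toy time loop: one init, one send per time step, one finalize."""
    field = [0.0] * n_cells
    client.init("temperature", n_cells)
    for _ in range(n_steps):
        field = [v + 1.0 for v in field]   # placeholder for the real solver step
        client.send("temperature", field)  # hand results over instead of writing a file
    client.finalize()


client = StubMelissaClient()
run_solver(client, n_steps=3, n_cells=4)
```

The point of the pattern is that the existing time loop stays intact: the `send` call takes the place of the per-step file output.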

### List of publications

* **MelissaDL x Breed: Towards Data-Efficient On-line Supervised Training of Multi-parametric Surrogates with Active Learning.** Sofya Dymchenko, Abhishek Purandare, Bruno Raffin [https://hal-lara.archives-ouvertes.fr/NUMPEX/hal-04712480v1](https://hal-lara.archives-ouvertes.fr/NUMPEX/hal-04712480v1)

* **Melissa: coordinating large-scale ensemble runs for deep learning and sensitivity analyses.** Marc Schouler, Robert Alexander Caulk, Lucas Meyer, Théophile Terraz, Christoph Conrads, Sebastian Friedemann, Achal Agarwal, Juan Manuel Baldonado, Bartłomiej Pogodziński, Anna Sekuła, et al. [https://inria.hal.science/hal-04145897](https://inria.hal.science/hal-04145897)

* **Melissa: Large Scale In Transit Sensitivity Analysis Avoiding Intermediate Files.** Théophile Terraz, Alejandro Ribes, Yvan Fournier, Bertrand Iooss, Bruno Raffin. The International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Nov 2017, Denver, United States. pp.1 - 14. [PDF](https://hal.inria.fr/hal-01607479/file/main-Sobol-SC-2017-HALVERSION.pdf)

* **The Challenges of In Situ Analysis for Multiple Simulations.** Alejandro Ribés, Bruno Raffin. ISAV 2020 In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, Nov 2020, Atlanta, United States. pp.1-6. [https://hal.inria.fr/hal-02968789](https://hal.inria.fr/hal-02968789)

Owner

Name: melissa
Login: melissa
Kind: organization

Repositories: 1
Profile: https://gitlab.inria.fr/melissa

This group gathers the solutions based on the Melissa architecture for on-line processing of data produced by large-scale ensemble runs (sensitivity analysis, data assimilation, ...).

JOSS Publication

Melissa: coordinating large-scale ensemble runs for deep learning and sensitivity analyses

Published

June 16, 2023

DOI

10.21105/joss.05291

Volume 8, Issue 86, Page 5291

Authors

Marc Schouler

Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France

Robert Alexander Caulk

Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France

Lucas Meyer

Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France; Industrial AI Laboratory SINCLAIR, EDF Lab Paris-Saclay, France

Théophile Terraz

Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France

Christoph Conrads

Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France

Sebastian Friedemann

Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France

Achal Agarwal

Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France

Juan Manuel Baldonado

Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France

Bartłomiej Pogodziński

Institute of Bioorganic Chemistry Polish Academy of Sciences, Poznań Supercomputing and Networking Center

Anna Sekuła

Institute of Bioorganic Chemistry Polish Academy of Sciences, Poznań Supercomputing and Networking Center

Alejandro Ribes

Industrial AI Laboratory SINCLAIR, EDF Lab Paris-Saclay, France

Bruno Raffin

Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, France

Editor

Patrick Diehl

Committers

Last synced: 10 months ago

All Time

Total Commits: 2,627
Total Committers: 33
Avg Commits per committer: 79.606
Development Distribution Score (DDS): 0.787

Past Year

Commits: 360
Committers: 2
Avg Commits per committer: 180.0
Development Distribution Score (DDS): 0.003

Top Committers

Name	Email	Commits
Marc Schouler	m**r@i**r	559
Terraz Theophile	t**z@i**r	547
rcaulk	r**k@i**r	377
Abhishek Purandare	a**e@i**r	359
Christoph Conrads	c**s@i**r	196
robcaulk	r**k@g**m	139
Bartlomiej Pogodzinski	b**i@m**l	86
Lucas Meyer	l**r@i**r	66
RAFFIN Bruno	b**n@i**r	60
Anna	s**a@g**m	59
Sebastian Friedemann	s**n@g**t	58
Achal Agarwal	a**1@g**m	19
sfriedem	s**n@i**e	17
xy124	q**d@g**t	17
Achal Agarwal	a**l@i**r	11
Bartłomiej Pogodziński	b**i@g**m	10
friedems	s**n@u**r	7
Anthony Geay	a**y@e**r	6
Robert Caulk	r**k@f**n	6
jbaldona	u**o@j**r	5
tterraz	t**z@t**r	5
sfriedem	s**n@i**r	4
Marc Schouler	s**c@g**m	3
jbaldona	u**o@j**r	2
Adrien Faure	a**e@p**m	1
DYMCHENKO Sofya	s**o@i**r	1
Juan Manuel Baldonado	j**o@f**r	1
Juan Manuel Baldonado	j**o@g**r	1
Juan Manuel Baldonado	j**o@M**l	1
Juan Manuel Baldonado	j**o@e**r	1
and 3 more...

Committer Domains (Top 20 + Academic)

inria.fr: 10
edf.fr: 2
gmx.net: 2
total.com: 1
eduroam-112180.grenet.fr: 1
grisou-46.nancy.grid5000.fr: 1
jean-zay1.idris.fr: 1
terraz-pc.imag.fr: 1
jean-zay3.idris.fr: 1
univ-grenoble-alpes.fr: 1
inria.de: 1
man.poznan.pl: 1

Issues and Pull Requests

Last synced: 10 months ago

Packages

Total packages: 3
Total downloads: unknown

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 0
(may contain duplicates)
Total versions: 4
Total maintainers: 4

spack.io: melissa

Melissa is a file-avoiding, adaptive, fault-tolerant and elastic framework, to run large-scale sensitivity analysis on supercomputers.

Homepage: https://gitlab.inria.fr/melissa/melissa
License: []
Latest release: 0.7.1
published about 4 years ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 0.0%

Average: 28.6%

Dependent packages count: 57.3%

Maintainers (2)

raffino abhishek1297

Last synced: 10 months ago

spack.io: melissa-api

Melissa is a file-avoiding, adaptive, fault-tolerant and elastic framework, to run large-scale sensitivity analysis or deep-surrogate training on supercomputers. This package builds the API used when instrumenting the clients.

Homepage: https://gitlab.inria.fr/melissa/melissa
License: []

Versions: 0
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 0.0%

Average: 29.0%

Dependent packages count: 57.9%

Maintainers (3)

raffino robcaulk mschouler

Last synced: 10 months ago

spack.io: py-melissa-core

Melissa is a file-avoiding, adaptive, fault-tolerant and elastic framework, to run large-scale sensitivity analysis or deep-surrogate training on supercomputers. This package builds the launcher and server modules.

Homepage: https://gitlab.inria.fr/melissa/melissa
License: []

Versions: 0
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 0.0%

Average: 29.0%

Dependent packages count: 58.0%

Maintainers (2)

raffino abhishek1297

Last synced: 10 months ago

Science Score: 89.0%
