https://github.com/fgnt/dlp_mpi
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 5 committers (40.0%) from academic institutions
- ✓ Institutional organization owner: organization fgnt has institutional domain (nt.uni-paderborn.de)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: fgnt
- License: MIT
- Language: Python
- Default Branch: master
- Homepage: https://pypi.org/project/dlp-mpi/
- Size: 208 KB
Statistics
- Stars: 5
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
dlp_mpi - Data-level parallelism with MPI for Python
**Run a serial algorithm on multiple examples:**

```python
# python script.py
import time

examples = list(range(10))
results = []

for example in examples:
    # Some heavy workload:
    # CPU or IO
    time.sleep(0.2)
    result = example

    # Remember the results
    results.append(result)

# Summarize your experiment
print(sum(results))
```

**Use dlp_mpi to run the loop body in parallel:**

```python
# mpiexec -np 8 python script.py
import time
import dlp_mpi

examples = list(range(10))
results = []

for example in dlp_mpi.split_managed(examples):
    # Some heavy workload:
    # CPU or IO
    time.sleep(0.2)
    result = example

    # Remember the results
    results.append(result)

results = dlp_mpi.gather(results)

if dlp_mpi.IS_MASTER:
    results = [
        result
        for worker_results in results
        for result in worker_results
    ]
    # Summarize your experiment
    print(results)
```

**Use dlp_mpi to run a function in parallel:**

```python
# mpiexec -np 8 python script.py
import time
import dlp_mpi

examples = list(range(10))
results = []

def work_load(example):
    # Some heavy workload:
    # CPU or IO
    time.sleep(0.2)
    result = example
    return result

for result in dlp_mpi.map_unordered(work_load, examples):
    # Remember the results
    results.append(result)

if dlp_mpi.IS_MASTER:
    # Summarize your experiment
    print(results)
```
This package uses mpi4py to provide utilities to parallelize algorithms that are applied to multiple examples.
The core idea is: start N processes, and each process works on a subset of all examples.
To start the processes, mpiexec can be used. Most HPC systems support MPI to scatter the workload across multiple hosts. For the exact command, consult the documentation of your HPC system and search for its MPI launcher.
Since each process should operate on different examples, MPI provides the variables RANK and SIZE, where SIZE is the number of workers and RANK is a unique identifier from 0 to SIZE - 1.
The easiest way to improve the execution time is to let each worker process examples[RANK::SIZE].
This is round-robin load balancing (dlp_mpi.split_round_robin).
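For illustration, a minimal sketch of this manual split, using only `dlp_mpi.RANK` and `dlp_mpi.SIZE` (the workload here is a placeholder):

```python
# mpiexec -np 4 python script.py
import time
import dlp_mpi

examples = list(range(10))

# Each worker processes every SIZE-th example, starting at its RANK.
for example in examples[dlp_mpi.RANK::dlp_mpi.SIZE]:
    time.sleep(0.2)  # placeholder for the actual CPU or IO workload
```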
A more advanced load balancing is dlp_mpi.split_managed, where one process manages the load and assigns a new task to a worker once it finishes its previous task.
When, at the end of a program, all results should be summarized or written to a single file, communication between the processes is necessary.
For this purpose, dlp_mpi.gather (mpi4py.MPI.COMM_WORLD.gather) can be used. This function sends all data to the root process (pickle is used for serialization).
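A minimal sketch of this pattern, assuming the goal is to write all results to a single file on the master (the file name and workload are illustrative):

```python
# mpiexec -np 4 python script.py
import dlp_mpi

examples = list(range(10))
results = []

for example in dlp_mpi.split_round_robin(examples):
    results.append(example ** 2)  # placeholder workload

# Send every worker's result list to the root process (pickled).
results = dlp_mpi.gather(results)

if dlp_mpi.IS_MASTER:
    # On the master, gather returns a list of per-worker lists;
    # on the workers it returns None.
    flat = [r for worker_results in results for r in worker_results]
    with open('results.txt', 'w') as f:  # hypothetical output file
        f.write('\n'.join(map(str, sorted(flat))))
```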
As an alternative to splitting the data, this package also provides a map-style parallelization (see the example at the beginning):
The function dlp_mpi.map_unordered calls work_load in parallel on the workers and executes the loop body serially on the master.
The communication between the processes consists only of the result and the index used to get the i-th example from examples, i.e., the examples themselves aren't transferred between the processes.
Available utilities and functions
Note: dlp_mpi provides dummy implementations when mpi4py is not installed and the environment indicates that no MPI is used (useful for running on a laptop).
- `dlp_mpi.RANK` or `mpi4py.MPI.COMM_WORLD.rank`: The rank of the process. To avoid programming errors, `if dlp_mpi.RANK: ...` will fail.
- `dlp_mpi.SIZE` or `mpi4py.MPI.COMM_WORLD.size`: The number of processes.
- `dlp_mpi.IS_MASTER`: A flag that indicates whether the process is the default master/controller/root.
- `dlp_mpi.bcast(...)` or `mpi4py.MPI.COMM_WORLD.bcast(...)`: Broadcast data from the root to all workers.
- `dlp_mpi.gather(...)` or `mpi4py.MPI.COMM_WORLD.gather(...)`: Send data from all workers to the root.
- `dlp_mpi.barrier()` or `mpi4py.MPI.COMM_WORLD.Barrier()`: Sync all processes.
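A short sketch of the broadcast and barrier primitives, following the interface listed above (the config dictionary is just an illustrative payload):

```python
# mpiexec -np 4 python script.py
import dlp_mpi

# Only the master creates the data; bcast distributes it to all workers.
config = {'iterations': 10} if dlp_mpi.IS_MASTER else None
config = dlp_mpi.bcast(config)
assert config['iterations'] == 10  # now available on every process

dlp_mpi.barrier()  # wait until every process reaches this point
```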
The advanced functions provided in this package are:
- `split_round_robin(examples)`: Zero-communication split of the data. The default is identical to `examples[dlp_mpi.RANK::dlp_mpi.SIZE]`.
- `split_managed(examples)`: The master process manages the load balance while the others do the work. Note: The master process does not distribute the examples; it is assumed that the examples have the same order on each worker.
- `map_unordered(work_load, examples)`: The master process manages the load balance, while the others execute the `work_load` function. The result is sent back to the master process.
Runtime
Without this package, your code runs in serial.
The execution time of the following code snippets is demonstrated by running them with this package.
Regarding the colors: examples = ... is the setup code, so both this code and the corresponding block representing its execution time are shown in blue in the diagrams.
The easiest way to parallelize the workload (dark orange) is a round-robin assignment of the load:
for example in dlp_mpi.split_round_robin(examples).
This function call is equivalent to for example in examples[dlp_mpi.RANK::dlp_mpi.SIZE].
Thus, there is zero communication between the workers.
Only when the results of all data need some final processing (e.g. calculating average metrics) is communication necessary.
This is done with the gather function.
This function returns the worker results as a list on the master process, while the worker processes get a None return value.
Depending on the workload, the round-robin assignment can be suboptimal.
See the example block diagram: worker 1 got tasks that are relatively long, so this worker needed much more time than the others.
To overcome the limitations of the round-robin assignment, this package provides a manager that assigns the work to the workers. This optimizes the utilization of the workers: once a worker finishes an example, it requests a new one from the manager and gets the next one assigned. Note: The communication consists only of which example should be processed (i.e. the index of the example), not the example itself.
An alternative to splitting the iterator is to use a map function.
The function is then executed on a worker, and the return value is sent back to the manager.
Be careful that the loop body is fast enough; otherwise, it can become a bottleneck.
You should use the loop body only for bookkeeping, not for the actual workload.
When a worker sends a result to the manager, the manager sends back a new task and enters the loop body.
While the manager is in the loop body, it cannot react to requests from other workers; see the block diagram.
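To keep the manager responsive, restrict the loop body to cheap bookkeeping, as in this sketch (the progress printout stands in for any lightweight bookkeeping):

```python
# mpiexec -np 8 python script.py
import time
import dlp_mpi

def work_load(example):
    time.sleep(0.2)  # placeholder for the expensive part
    return example

examples = list(range(100))
results = []

for i, result in enumerate(dlp_mpi.map_unordered(work_load, examples)):
    # Cheap bookkeeping only: while this body runs, the manager
    # cannot answer requests from other workers.
    results.append(result)
    if (i + 1) % 10 == 0:
        print(f'{i + 1}/{len(examples)} done')
```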
Installation
You can install this package from PyPI:

```bash
pip install mpi4py
pip install dlp_mpi
```

where mpi4py is a backend for this package.
You can skip the installation of mpi4py if you want to use the internal backend, which is called ame.
To check if the installation was successful, try the following command:
```bash
$ mpiexec -np 4 python -c 'import dlp_mpi; print(dlp_mpi.RANK)'
3
0
1
2
```
The command should print the numbers 0, 1, 2 and 3; the order is random.
If the command prints a zero four times, something went wrong.
This can happen when no MPI is installed or the installation is broken.
On Debian-based Linux you can install it with sudo apt install libopenmpi-dev.
If you do not have the rights to install packages with apt, you can also install mpi4py with conda.
The above pip install installs mpi4py from PyPI.
Be aware that an installation from conda may conflict with your locally installed MPI.
Especially in High Performance Computing (HPC) environments this can cause trouble.
AME Backend
The ame backend can be activated by setting the environment variable DLP_MPI_BACKEND to ame:
```bash
export DLP_MPI_BACKEND=ame
```
It implements the interface of mpi4py that is used by dlp_mpi.
It has the following properties:
- Pure Python implementation with sockets:
  - No issues with binaries: the actual motivation for ame.
  - Most likely slower than mpi4py: MPI has many optimizations that are not implemented in ame.
- Communication only between root and workers, i.e. no communication between workers. So you cannot change the root in any function of dlp_mpi. But it is also unlikely that you need this feature; at least, I never needed it.
- Assumes a trusted environment: the communication is not encrypted, so do not use it in an untrusted environment.
- Supported launchers (mpiexec and srun):
  - mpiexec built with PMI (uses PMI to set up the environment)
  - mpiexec built with PMIx (uses file-based setup)
  - slurm/srun with PMIx (uses file-based setup)
FAQ
Q: Can I run a script that uses dlp_mpi on my laptop, which has no working MPI (i.e. a broken installation)?
A: Yes, if you uninstall mpi4py (i.e. pip uninstall mpi4py) after installing this package. Whether MPI is working or missing, code written with dlp_mpi should work.
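A quick way to check the fallback, assuming the dummy implementations report a single process when launched without mpiexec (this expected output is an assumption, not documented behavior):

```python
# python script.py  (no mpiexec, no mpi4py)
import dlp_mpi

# With the dummy implementations there is only one process,
# so presumably RANK == 0 and SIZE == 1, and every split
# yields all examples, i.e. the script runs serially.
print(dlp_mpi.RANK, dlp_mpi.SIZE)
```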
Owner
- Name: Department of Communications Engineering University of Paderborn
- Login: fgnt
- Kind: organization
- Location: Paderborn, Germany
- Website: http://nt.uni-paderborn.de
- Repositories: 37
- Profile: https://github.com/fgnt
GitHub Events
Total
- Issue comment event: 1
- Push event: 2
- Pull request review event: 2
- Pull request review comment event: 3
Last Year
- Issue comment event: 1
- Push event: 2
- Pull request review event: 2
- Pull request review comment event: 3
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Christoph | c****j@m****e | 68 |
| Lukas Drude | m****l@l****e | 16 |
| Jahn Heymann | h****n@n****e | 2 |
| Dennis Ruppel | d****u@m****e | 1 |
| RautenbergFrederik | 1****k | 1 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: 4 days
- Total issue authors: 0
- Total pull request authors: 4
- Average comments per issue: 0
- Average comments per pull request: 0.6
- Merged pull requests: 9
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Pull Request Authors
- boeddeker (9)
- LukasDrude (1)
- RautenbergFrederik (1)
- MoreDelay (1)
Packages
- Total packages: 1
- Total downloads: 186 last-month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 4
- Total maintainers: 2
pypi.org: dlp-mpi
Data-level parallelism with mpi in python
- Homepage: https://github.com/fgnt/dlp_mpi
- Documentation: https://dlp-mpi.readthedocs.io/
- License: MIT License
- Latest release: 0.0.4 (published almost 2 years ago)
Dependencies
- mpi4py *
- peppercorn *