https://github.com/radiantone/blazer

An HPC abstraction over MPI with built-in parallel compute primitives

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
○
.zenodo.json file
○
DOI references
○
Academic publication links
✓
Committers with academic emails
1 of 2 committers (50.0%) from academic institutions
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (10.0%) to scientific vocabulary

Keywords

hpc mpi mpi4py parallel-computing pipeline python supercomputing workflows

Last synced: 6 months ago · JSON representation

To run: (venv) $ export PYTHONPATH=. (venv) $ mpirun -n 4 python blazer/examples/example1.py PARALLEL1: [{'this': 1}, {'this': 2}, {'this': 3}, {'this': 4}, {'this': 5}] PARALLEL2: [{'this': 4}, {'this': 6}, {'this': 2}, {'this': 8},; {'this': 10}] PIPELINE: {'this': {'some': ({'this': 'DATA'},), 'date': '2022-02-11 02:47:23.356461'}} SCATTER_DATA: [{'some': 0}, {'some': 1}, {'some': 2}, {'some': 3}, {'some': 4}, {'some': 5}, {'some': 6}, {'some': 7}, {'some': 8}, {'some': 9}, {'some': None}, {'some': None}] PIPELINE RESULT: [{'this': [{'this': ([{'some': 0}, {'some': 1}, {'some': 2}, {'some': 3}, {'some': 4}, {'some': 5}, {'some': 6}, {'some': 7}, {'some': 8}, {'some': 9}, {'some': None}, {'some': None}],)}, {'some': {'some': [{'some': 0}, {'some': 1}, {'some': 2}, {'some': 3}, {'some': 4}, {'some': 5}, {'some': 6}, {'some': 7}, {'some': 8}, {'some': 9}, {'some': None}, {'some': None}]}}]}, {'some': 'some'}, {'more': 'stuff'}] [0, 1, 2, 3, 4, 5, 6, 7] SCATTER: [{'some': 0}, {'some': 1}, {'some': 2}, {'some': 3}, {'some': 4}, {'some': 5}, {'some': 6}, {'some': 7}]

A map/reduce example

```python import blazer from blazer.hpc.mpi import map, reduce

def sqr(x): return x * x

def add(x, y=0): return x+y

with blazer.begin(): result = map(sqr, [1, 2, 3, 4])

blazer.print(result)
result = reduce(add, result)

blazer.print(result)

```

To run: (venv) $ export PYTHONPATH=. (venv) $ mpirun -n 4 python blazer/examples/example3.py [1, 4, 9, 16] 30

Streaming Compute

Blazer supports the notion of streaming compute to handle jobs where the data can't fit into memory on a single machine. In this example we implement a map/reduce computation where everything is streaming from the source data through the results without loading all the data into memory.

```python """ Streaming map/reduce example """ from itertools import groupby from random import randrange from typing import Generator

import blazer from blazer.hpc.mpi import stream

def datagen() -> Generator: for i in range(0, 1000): r = randrange(2) v = randrange(100) if r: yield {"one": 1, "value": v} else: yield {"zero": 0, "value": v}

def key_func(k): return k["key"]

def map(datum): datum["key"] = list(datum.keys())[0] return datum

def reduce(datalist): from blazer.hpc.mpi import rank

_list = sorted(datalist, key=key_func)
grouped = groupby(_list, key_func)
return [{"rank": rank, key: list(group)} for key, group in grouped]

with blazer.begin(): import json

mapper = stream(datagen(), map, results=True)
reducer = stream(mapper, reduce, results=True)
if blazer.ROOT:
    for result in reducer:
        blazer.print("RESULT", json.dumps(result, indent=4))

```

NOTE: blazer has (currently) only been tested on mpirun (Open MPI) 4.1.0

Overview

Blazer is a high-performance computing (HPC) library that hides the complexities of a super computer's message-passing interface or (MPI). Users want to focus on their code and their data and not fuss with low-level API's for orchestrating results, building pipelines and running fast, parallel code. This is why blazer exists!

With blazer, a user only needs to work with simple, straightforward python. No cumbersome API's, idioms, or decorators are needed. This means they can get started quicker, run faster code, and get their jobs done faster!

General Design

Blazer is designed around the concept of computing primitives or operations. Some of the primitives include:

parallel - For computing a list of tasks in parallel
pipeline - For computing a list of tasks in sequence, passing the results along
map - For mapping a task to a dataset
reduce - For mapping a task to a data list and computing a single result

In addition there are other primitives to help manipulate lists of tasks or data, such as:

where - Filter a list of tasks or data elements based on a function or lambda
select - Apply a function to each list element and return the result

Context Handlers

Blazer uses convenient context handlers to control blocks of code that need to be scheduled to MPI processes behind the scenes. There are two types of context handlers currently.

MPI Context Handler

blazer.begin() is a mandatory context that enables the MPI scheduler behind the various primitives to operate correctly.

```python

import blazer

blazer.begin(): def get_data(): """ Data generator """ for i in range(0, (size * 2)): yield i

result = scatter(get_data(), calc_some)
blazer.print("SCATTER:", result)

```

GPU Context Handler

blazer.gpu() is a context that requests (from the invisible MPI scheduler) dedicated access to a specific GPU on your MPI node fabric.

```python import logging import blazer import numpy as np

from blazer.hpc.mpi.primitives import host, rank from numba import vectorize from timeit import default_timer as timer

def dovectors():

@vectorize(['float32(float32, float32)'], target='cuda')
def dopow(a, b):
    return a ** b

vec_size = 100

a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
c = np.zeros(vec_size, dtype=np.float32)

start = timer()
dopow(a, b)
duration = timer() - start
return duration

with blazer.begin(gpu=True): # on-fabric MPI scheduler with blazer.gpu() as gpu: # on-metal GPU scheduler # gpu object contains metadata about the GPU assigned print(dovectors()) ```

Shared Data Context Handler

Blazer supports synchronizing shared data across ranks seamlessly. Here is an example of sharing a dictionary where each rank adds its own data to the dictionary and it is available to all other ranks magically!

```python from random import randrange

import blazer from blazer.hpc.mpi.primitives import rank

with blazer.environment() as vars: rv = randrange(10) vars["rank" + str(rank)] = [ {"key": randrange(10)}, randrange(10), randrange(10), randrange(10), ]

print("RANK:", rank, "DATA", vars.vars)

blazer.print(vars["rank1"]) ```

Cross-Cluster Supercomputing

Blazer comes with a built-in design pattern for performing cross-cluster HPC. This is useful if you want to allocate compute resources on different super-computers and then build a pipeline of jobs across them. Here is a simple example using ALCF's Cooley and Theta systems (which are built into blazer).

```python from blazer.hpc.alcf import cooley, thetagpu from blazer.hpc.local import parallel, pipeline, partial as p

Log into each cluster using MFA password from MobilePASS

cooleyjob = cooley.job(user='dgovoni', n=1, q="debug", A="datascience", password=True, script="/home/dgovoni/git/blazer/testcooley.sh").login()
thetajob = thetagpu.job(user='dgovoni', n=1, q="single-gpu", A="datascience", password=True, script="/home/dgovoni/git/blazer/testthetagpu.sh").login()

def hello(data, *args): return "Hello "+str(data)

Mix and match cluster compute jobs with local code tasks

in serial chaining

cooleyjob("some data").then(hello).then(thetajob).then(hello)

Run a cross cluster compute job

result = pipeline([ p(thetajob,"some data2"), p(cooleyjob,"some data1") ])

print("Done") ```

When each job .login() method is run, it will gather the MFA login credentials for that system and then use that to schedule jobs on that system via ssh.

Notice the use of the pipeline primitive above. It's the same primitive you would use to build your compute workflows! Composable tasks and composable super-computers.

Owner

Name: Darren Govoni
Login: radiantone
Kind: user
Location: Washington DC

Repositories: 8
Profile: https://github.com/radiantone

High-Performance Computing Expert, Full-Stack Python, Typescript/Javascript, VueJS/Quasar, API Developer and more...

GitHub Events

Total

Last Year

Committers

Last synced: about 2 years ago

All Time

Total Commits: 311
Total Committers: 2
Avg Commits per committer: 155.5
Development Distribution Score (DDS): 0.01

Past Year

Commits: 1
Committers: 1
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Darren Govoni	d**n@r**i	308
Darren Govoni	d**i@a**v	3

Committer Domains (Top 20 + Academic)

anl.gov: 1 rapidquest.ai: 1

Issues and Pull Requests

Last synced: 8 months ago

All Time

Total issues: 0
Total pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

Pull Request Authors

radiantone (4)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 26 last-month

Total dependent packages: 0
Total dependent repositories: 3
Total versions: 18
Total maintainers: 1

pypi.org: blazer

An HPC abstraction over MPI that uses pipes and pydash primitives for composable super-computing.

Homepage: https://github.com/radiantone/blazer
Documentation: https://blazer.readthedocs.io/
License: CC0-1.0 License
Latest release: 0.1.35
published almost 4 years ago

Versions: 18
Dependent Packages: 0
Dependent Repositories: 3
Downloads: 26 Last month

Rankings

Dependent packages count: 7.3%

Dependent repos count: 9.1%

Stargazers count: 20.4%

Average: 20.9%

Forks count: 30.0%

Downloads: 37.6%

Maintainers (1)

radiantone

Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi

Babel ==2.9.1
GPUtil ==1.4.0
HeapDict ==1.0.1
Jinja2 ==3.0.3
MarkupSafe ==2.0.1
Pillow ==9.0.1
PyNaCl ==1.5.0
PyYAML ==6.0
Pygments ==2.11.2
SecretStorage ==3.3.1
Sphinx ==4.4.0
alabaster ==0.7.12
amqp ==5.0.9
astroid ==2.9.3
attrs ==21.4.0
autopep8 ==1.6.0
bcrypt ==3.2.0
billiard ==3.6.4.0
black ==22.1.0
bleach ==4.1.0
bokeh ==2.4.2
celery ==5.2.3
certifi ==2021.10.8
cffi ==1.15.0
charset-normalizer ==2.0.11
click ==8.0.3
click-didyoumean ==0.3.0
click-plugins ==1.1.1
click-repl ==0.2.0
cloudpickle ==2.0.0
colorama ==0.4.4
cryptography ==36.0.1
dask ==2022.1.1
dill ==0.3.4
distributed ==2022.1.1
docutils ==0.17.1
flake8 ==4.0.1
fsspec ==2022.1.0
idna ==3.3
imagesize ==1.3.0
importlib-metadata ==4.10.1
iniconfig ==1.1.1
install ==1.3.5
isort ==5.10.1
iterlib ==1.1.6
jeepney ==0.7.1
keyring ==23.5.0
kombu ==5.2.3
lazy-object-proxy ==1.7.1
llvmlite ==0.38.0
locket ==0.2.1
loguru ==0.6.0
mccabe ==0.6.1
msgpack ==1.0.3
mypy ==0.941
mypy-extensions ==0.4.3
numba ==0.55.1
numpy ==1.21.5
packaging ==21.3
pandas ==1.4.0
paramiko ==2.9.2
partd ==1.2.0
pathspec ==0.9.0
pipe ==1.6.0
pkginfo ==1.8.2
platformdirs ==2.4.1
pluggy ==1.0.0
prompt-toolkit ==3.0.28
psutil ==5.9.0
py ==1.11.0
pycodestyle ==2.8.0
pycparser ==2.21
pydash ==5.1.0
pyflakes ==2.4.0
pylint ==2.12.2
pyparsing ==3.0.7
pytest ==7.0.1
python-dateutil ==2.8.2
pytz ==2021.3
readme-renderer ==34.0
requests ==2.27.1
requests-toolbelt ==0.9.1
rfc3986 ==2.0.0
six ==1.16.0
snowballstemmer ==2.2.0
sortedcontainers ==2.4.0
sphinx-click ==3.1.0
sphinx-rtd-theme ==1.0.0
sphinxcontrib-applehelp ==1.0.2
sphinxcontrib-devhelp ==1.0.2
sphinxcontrib-htmlhelp ==2.0.0
sphinxcontrib-jsmath ==1.0.1
sphinxcontrib-qthelp ==1.0.3
sphinxcontrib-serializinghtml ==1.1.5
tblib ==1.7.0
toml ==0.10.2
tomli ==2.0.1
toolz ==0.11.2
torch ==1.10.2
tornado ==6.1
tqdm ==4.62.3
twine ==3.8.0
types-cryptography ==3.3.18
types-paramiko ==2.8.16
types-setuptools ==57.4.10
typing-extensions ==4.0.1
urllib3 ==1.26.8
vine ==5.0.0
wcwidth ==0.2.5
webencodings ==0.5.1
wrapt ==1.13.3
zict ==2.0.0
zipp ==3.7.0

requirements.txt pypi

Babel ==2.9.1
GPUtil ==1.4.0
HeapDict ==1.0.1
Jinja2 ==3.0.3
MarkupSafe ==2.0.1
Pillow ==9.0.1
PyNaCl ==1.5.0
PyYAML ==6.0
Pygments ==2.11.2
SecretStorage ==3.3.1
Sphinx ==4.4.0
alabaster ==0.7.12
amqp ==5.0.9
astroid ==2.9.3
attrs ==21.4.0
autopep8 ==1.6.0
bcrypt ==3.2.0
billiard ==3.6.4.0
black ==22.1.0
bleach ==4.1.0
bokeh ==2.4.2
celery ==5.2.3
certifi ==2021.10.8
cffi ==1.15.0
charset-normalizer ==2.0.11
click ==8.0.3
click-didyoumean ==0.3.0
click-plugins ==1.1.1
click-repl ==0.2.0
cloudpickle ==2.0.0
colorama ==0.4.4
cryptography ==36.0.1
dask ==2022.1.1
dill ==0.3.4
distributed ==2022.1.1
docutils ==0.17.1
flake8 ==4.0.1
fsspec ==2022.1.0
idna ==3.3
imagesize ==1.3.0
importlib-metadata ==4.10.1
iniconfig ==1.1.1
install ==1.3.5
isort ==5.10.1
iterlib ==1.1.6
jeepney ==0.7.1
keyring ==23.5.0
kombu ==5.2.3
lazy-object-proxy ==1.7.1
llvmlite ==0.38.0
locket ==0.2.1
loguru ==0.6.0
mccabe ==0.6.1
mpi4py ==3.1.3
msgpack ==1.0.3
mypy ==0.941
mypy-extensions ==0.4.3
numba ==0.55.1
numpy ==1.21.5
packaging ==21.3
pandas ==1.4.0
paramiko ==2.9.2
partd ==1.2.0
pathspec ==0.9.0
pipe ==1.6.0
pkginfo ==1.8.2
platformdirs ==2.4.1
pluggy ==1.0.0
prompt-toolkit ==3.0.28
psutil ==5.9.0
py ==1.11.0
pycodestyle ==2.8.0
pycparser ==2.21
pydash ==5.1.0
pyflakes ==2.4.0
pylint ==2.12.2
pyparsing ==3.0.7
pytest ==7.0.1
python-dateutil ==2.8.2
pytz ==2021.3
readme-renderer ==34.0
requests ==2.27.1
requests-toolbelt ==0.9.1
rfc3986 ==2.0.0
six ==1.16.0
snowballstemmer ==2.2.0
sortedcontainers ==2.4.0
sphinx-click ==3.1.0
sphinx-rtd-theme ==1.0.0
sphinxcontrib-applehelp ==1.0.2
sphinxcontrib-devhelp ==1.0.2
sphinxcontrib-htmlhelp ==2.0.0
sphinxcontrib-jsmath ==1.0.1
sphinxcontrib-qthelp ==1.0.3
sphinxcontrib-serializinghtml ==1.1.5
tblib ==1.7.0
toml ==0.10.2
tomli ==2.0.1
toolz ==0.11.2
torch ==1.10.2
tornado ==6.1
tqdm ==4.62.3
twine ==3.8.0
types-cryptography ==3.3.18
types-paramiko ==2.8.16
types-setuptools ==57.4.10
typing-extensions ==4.0.1
urllib3 ==1.26.8
vine ==5.0.0
wcwidth ==0.2.5
webencodings ==0.5.1
wrapt ==1.13.3
zict ==2.0.0
zipp ==3.7.0

setup.py pypi

dill ==0.3.4
mpi4py ==3.1.3
numba *
pipe ==1.6.0
pydash ==5.1.0
pytest *

.github/workflows/python-publish.yml actions

actions/checkout master composite
actions/checkout v1 composite
actions/create-release v1 composite
actions/setup-python v1 composite
heinrichreimer/github-changelog-generator-action v2.1.1 composite
pypa/gh-action-pypi-publish release/v1 composite

https://github.com/radiantone/blazer

Science Score: 23.0%

Keywords

Repository

Basic Info

Statistics

Topics

Metadata Files

README.md

Install

Linting

Tests

Examples

Streaming Compute

Overview

General Design

Context Handlers

MPI Context Handler

GPU Context Handler

Shared Data Context Handler

Cross-Cluster Supercomputing

Log into each cluster using MFA password from MobilePASS

Mix and match cluster compute jobs with local code tasks

in serial chaining

Run a cross cluster compute job

Owner

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Committer Domains (Top 20 + Academic)

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Packages

pypi.org: blazer

Rankings

Maintainers (1)

Dependencies