# Recent Releases of esi-acme
## esi-acme - 2025.3
Included a new convenience function `bic_cluster_setup` for the HPC cluster at
CoBIC Frankfurt. Analogous to the similarly named helper function built for
the ESI HPC cluster, `bic_cluster_setup` simplifies creating a Dask parallel
computing client. For instance, the following command transparently launches 10
SLURM workers in the 8GBSppc partition:
```python
from acme import bic_cluster_setup  # import path assumed, analogous to esi_cluster_setup

client = bic_cluster_setup(n_workers=10, partition="8GBSppc")
```
Additionally, ACME's automatic partition selection has been extended to also support
workloads running on the CoBIC HPC cluster. Similarly, all customization settings
supported by `esi_cluster_setup` are also available in `bic_cluster_setup`: for
instance, `cores_per_worker` can be used together with `mem_per_worker`
and `job_extra` to create specialized computing clients custom-tailored
to specific workload requirements, e.g.,
```python
client = bic_cluster_setup(n_workers=10,
                           cores_per_worker=3,
                           mem_per_worker="12GB",
                           job_extra=["--job-name='myjob'"],
                           partition="32GBSppc")
```
More information can be found in ACME's online documentation.
### NEW
- New convenience function `bic_cluster_setup` to streamline managing Dask parallel computing clients on the CoBIC HPC cluster
- Added two new (optional) keywords to `slurm_cluster_setup`: `worker_extra_args` can be used to pass additional options for configuring Dask workers. Similarly, `scheduler_options` propagates custom settings to the Dask scheduler.
- New helper function `is_bic_node` determines if ACME is running on the CoBIC HPC cluster
- New helper function `get_interface` finds the name of the network interface associated with a given IP address
- New helper function `get_free_port` finds the lowest open port in a given range (see the sketch after this list)
### CHANGED
- Changed the default partition type from "XS" to "S" on the ESI HPC cluster when letting ACME automatically choose a partition
- Updated the testing setup: use a centralized `pytest.ini` configuration to avoid polluting tests with duplicate `PYTEST_ADDOPTS` exports
- Modernized the convenience script `run_tests.sh`: the script can now process arbitrary pytest options (run single tests, drop to PDB on error, etc.)
### REMOVED
- Support for the deprecated keywords `n_jobs`, `mem_per_job`, `n_jobs_startup` and `workers_per_job` has been removed. Code that still uses these keywords has to be modified to replace them with their corresponding counterparts `n_workers`, `mem_per_worker`, `n_workers_startup` and `processes_per_worker`, respectively.
### FIXED
- Adapted the helper script `run_tests.sh` to use SLURM defaults when running on unknown HPC clusters
## esi-acme - 2025.1
Implementation of a user's feature request: ACME can now allocate result datasets
with arbitrary dimensions via the `result_shape` keyword. In case it is not clear
(or cumbersome) to determine the shape of an aggregate results dataset a priori,
setting the appropriate dimension(s) to `np.inf` prompts ACME to create a
resizable HDF5 dataset.
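A minimal sketch of what this can look like; the toy function and the exact axis semantics are illustrative, consult the ACME documentation for the authoritative behavior:

```python
import numpy as np
from acme import ParallelMap

def make_row(i):
    # toy computation producing one 100-element result per call
    return np.full((100,), i, dtype=float)

# An np.inf entry in result_shape makes ACME allocate a resizable HDF5 dataset
with ParallelMap(make_row, range(20), result_shape=(np.inf, 100)) as pmap:
    pmap.compute()
```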
### NEW
- Added support for "unlimited" datasets to allow flexible dimension specifications in `result_shape`. When setting the size of a dimension in `result_shape` to `np.inf`, ACME allocates a resizable HDF5 dataset for the results. This works for both virtual and regular datasets.
### REMOVED
- As announced in the previous release, the `start_client` keyword has been removed from `local_cluster_setup` (starting a dask `LocalCluster` always starts a client anyway)
### DEPRECATED
- Dropped support for Windows (ACME should work but is not tested any more)
- Dropped support for Python 3.7
### FIXED
- Custom resource allocations were not correctly propagated to dask workers (especially in the "E880" partition on the ESI cluster). This has been fixed (cf #60)
- A bug in `python-msgpack` under Python 3.12 triggered de-serialization errors; temporarily pinned `python-msgpack` to version 1.0.5 (newer versions do not exhibit this problem, cf #59)
## esi-acme - 2023.12
Better support for non-x86 micro-architectures. On the ESI HPC cluster,
the convenience function `esi_cluster_setup` now transparently works with the
local "E880" partition comprising our IBM POWER E880 servers. As with
the x86 nodes, a simple
```python
client = esi_cluster_setup(n_workers=10, partition="E880")
```
is enough to launch ten SLURM workers, each equipped with four POWER8 cores
and 16 GB RAM by default. Similarly, ACME's automatic partition selection has been
extended to also support workloads running inside the "E880" partition.
Beyond getting simpler to use, `esi_cluster_setup` now also
comes with more (still completely optional) customization settings:
the new keyword `cores_per_worker` can be used together with `mem_per_worker`
and `job_extra` to create specialized computing clients custom-tailored
to specific workload requirements, e.g.,
```python
client = esi_cluster_setup(n_workers=10,
                           cores_per_worker=3,
                           mem_per_worker="12GB",
                           job_extra=["--job-name='myjob'"],
                           partition="E880")
```
For more details, see "Advanced Usage and Customization" in ACME's online documentation.
### NEW
- New keyword `cores_per_worker` in `esi_cluster_setup` to explicitly set the core count of SLURM workers
- Extended ACME's partition auto-selection on the ESI HPC cluster to include IBM POWER machines in the "E880" partition
- Added new "Tutorials" section in documentation
- Added new tutorial on using ACME for parallel evaluation of classifier accuracy (Thanks to @timnaher, cf #53)
- Added new tutorial on using ACME for parallel neural net model evaluation (Thanks to @timnaher, cf #53)
- Added type hints following PEP 484 to support static code analyzers (e.g., `mypy`) and clarify type conventions in internal functions with "sparse" docstrings (see the sketch after this list)
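The annotation style looks roughly like this; the function below is purely hypothetical and only illustrates PEP 484 conventions, it is not an actual ACME routine:

```python
from typing import Optional, Union

def normalize_memory(mem_per_worker: Optional[Union[int, str]] = None) -> str:
    """Illustrative only: canonicalize a memory spec such as 8 or "8GB"."""
    if mem_per_worker is None:
        return "8GB"
    if isinstance(mem_per_worker, int):
        return f"{mem_per_worker}GB"
    return mem_per_worker
```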
### CHANGED
- To avoid dubious (and hard-to-debug) errors, `esi_cluster_setup` now checks the micro-architecture of the submitting host against the chosen partition. This avoids accidental start attempts of ppc64le SLURM jobs from inside an x86_64 Python interpreter and vice versa (see the sketch below).
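Conceptually, such a guard can be as simple as the following sketch; the partition-to-architecture mapping shown is a hypothetical illustration, not ACME's actual code:

```python
import platform

PPC64LE_PARTITIONS = {"E880"}  # hypothetical mapping used for illustration

def check_host_architecture(partition: str) -> None:
    """Refuse submissions whose host architecture does not match the partition."""
    host = platform.machine()  # e.g., "x86_64" or "ppc64le"
    target = "ppc64le" if partition in PPC64LE_PARTITIONS else "x86_64"
    if host != target:
        raise ValueError(
            f"Cannot launch {target} SLURM jobs in partition '{partition}' "
            f"from a {host} host")
```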
### REMOVED
- The `partition` keyword in `esi_cluster_setup` no longer has a default value (the old default of "8GBXS" was inappropriate most of the time)
- The (undocumented) "anonymous" keyword `n_cores` of `esi_cluster_setup` has been removed in favor of the explicit `cores_per_worker` (now also visible in the API). Just like `n_cores`, setting the new `cores_per_worker` parameter is still optional: by default, `esi_cluster_setup` derives the core count from `DefMemPerCPU` and the chosen value of `mem_per_worker`.
- In `slurm_cluster_setup`, do not use `DefMemPerCPU` as a fallback substitute in case `MaxMemPerCPU` is not defined for the chosen partition (may be overly restrictive on requested memory settings)
### DEPRECATED
- Using `start_client` in `local_cluster_setup` no longer has any effect: starting a dask `LocalCluster` always starts a client.
### FIXED
- Fixed a partition bug in `run_tests.sh` (thanks to @timnaher, cf #53)
- Simplified and fixed interactive user queries: use the builtin `select` module everywhere except Jupyter and rely on the `input` module inside notebooks
- Clarified the docstring discussing `result_dtype`: it must not be `None` but `str` (still defaults to "float")
- Numerous corrections of errata/outdated information in docstrings
## esi-acme - 2023.4
Re-designed ACME's logs and command line output.
### NEW
- Created templates for filing issues and opening Pull Requests for ACME on GitHub.
- Enabled private security reporting in ACME's GitHub repository and added a security policy for ACME (in compliance with the OpenSSF Best Practices Badge)
### CHANGED
- Overhauled ACME's logging facilities: many print messages have been marked "DEBUG" to make ACME's default output less "noisy". To this end, the Python `logging` module is now used more extensively than before. The canonical name of ACME's logger is simply "ACME" (see the snippet after this list).
- By default, ACME now creates a log file alongside any auto-generated output files to keep a record of file creation and attribution.
- Reworked ACME's SyNCoPy interface: a dedicated module `spy_interface.py` now manages ACME's I/O direction if ACME is called by SyNCoPy. This allows for (much) cleaner exception handling in ACME's cluster helpers (`esi_cluster_setup`, `cluster_cleanup`, etc.), which ultimately permits a more streamlined extension of ACME to more HPC infrastructure.
- Redesigned ACME's online documentation: increased font size to enhance readability, included a contribution guide and reworked the overall page navigation and visual layout.
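Since the canonical logger name is "ACME", the verbose output demoted in this release can be restored with the standard `logging` machinery:

```python
import logging

# Re-enable the messages that were demoted to DEBUG level
logging.getLogger("ACME").setLevel(logging.DEBUG)
```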
## esi-acme - 2022.12
Bugfix release.
### CHANGED
- If not provided, a new lower default value of one is used for `n_workers_startup`
### FIXED
- Updated the memory estimation logic on the ESI HPC cluster: if ACME does not handle result output distribution but memory estimation is still requested, do not perform `memEstRun` keyword injection
## esi-acme - 2022.11
Major changes in managing auto-generated files:
- If `write_worker_results` is `True`, ACME now creates an aggregate results
  container comprised of external links that point to actual data in HDF5
  payload files generated by parallel workers.
- Optionally, results can be slotted into a single dataset/array (via the
  `result_shape` keyword).
- If `single_file` is `True`, ACME stores results of parallel compute runs
  not in dedicated payload files; instead, all workers write to a single aggregate
  results container.
- By providing `output_dir`, the location of auto-generated HDF5/pickle files can
  be customized.
- Entities in a distributed computing client that concurrently process tasks
  are now consistently called "workers" (in line with dask terminology).
  Accordingly, the keywords `n_jobs`, `mem_per_job`, `n_jobs_startup` and
  `workers_per_job` have been renamed to `n_workers`, `mem_per_worker`,
  `n_workers_startup` and `processes_per_worker`, respectively. To ensure
  compatibility with existing code, the former names have been marked
  deprecated but were not removed and are still functional (see the combined
  sketch below).
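Taken together, a call using the new keyword names and storage options might look like this; the toy function and path are illustrative, and the authoritative signatures live in the ACME documentation:

```python
from acme import ParallelMap

def add(x, y):
    return x + y

with ParallelMap(add, range(100), 2,
                 n_workers=10,                   # formerly n_jobs
                 mem_per_worker="8GB",           # formerly mem_per_job
                 output_dir="/path/to/output",   # custom storage location
                 result_shape=(100,),            # slot results into one dataset
                 single_file=True) as pmap:      # all workers share one container
    pmap.compute()
```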
A full list of changes is provided below.
### NEW
- Included keyword `output_dir` in `ParallelMap` that allows customizing the storage location of files auto-generated by ACME (HDF5 and pickle). Only effective if `write_worker_results` is `True`.
- Added keyword `result_shape` in `ParallelMap` to permit specifying the shape of an aggregate dataset/array that results from all computational runs are slotted into. In conjunction with the shape specification, the new keyword `result_dtype` offers the option to control the numerical type (set to "float64" by default) of the resulting dataset (if `write_worker_results = True`) or array (`write_worker_results = False`). On-disk dataset results collection is only available for auto-generated HDF5 containers (i.e., `write_pickle = False`)
- Introduced keyword `single_file` in `ParallelMap` to control whether parallel workers store results of computational runs in dedicated HDF5 files (`single_file = False`, default) or share a single results container for saving (`single_file = True`). This option is only available for auto-generated HDF5 containers, pickle files are not supported (i.e., `write_worker_results = True` and `write_pickle = False`)
- Included options to specify worker count and memory consumption in `local_cluster_setup`
- Added a new section "Advanced Usage and Customization" in the online documentation that discusses settings and associated technical details
- Added support for Python 3.10 and updated dask dependencies
### CHANGED
- Modified the terminology employed throughout the package: to clearly delineate the difference between compute runs and worker processes (and to minimize friction between the documentation of ACME and dask), the term "worker" is now consistently used throughout the code base. If ACME is running on a SLURM cluster, a dask "worker" corresponds to a SLURM "job".
- In line with the above change, the following input arguments have been
  renamed:
  - in `ParallelMap`: `n_jobs` -> `n_workers`, `mem_per_job` -> `mem_per_worker`
  - in `esi_cluster_setup` and `slurm_cluster_setup`: `n_jobs` -> `n_workers`, `mem_per_job` -> `mem_per_worker`, `n_jobs_startup` -> `n_workers_startup`
  - in `slurm_cluster_setup`: `workers_per_job` -> `processes_per_worker`
- Made `esi_cluster_setup` respect already running clients so that new parallel computing clients are not launched on top of existing ones (thanks to @timnaher)
- Introduced support for positional/keyword arguments of unit-length in `ParallelMap` so that `n_inputs` can be used as a scaling parameter to launch `n_inputs` calls of a user-provided function
- All docstrings and the online documentation have been re-written (and in parts clarified) to account for the newly introduced features.
- Code coverage is no longer computed by a GitHub Actions workflow but is now calculated by the GitLab CI job that invokes SLURM to run tests on the ESI HPC cluster.
### DEPRECATED
The keywords `n_jobs`, `mem_per_job`, `n_jobs_startup` and `workers_per_job`
have been renamed. Using these keywords is still supported but raises a
`DeprecationWarning`.
- The keywords `n_jobs` and `mem_per_job` in both `ParallelMap` and `esi_cluster_setup` are deprecated. To specify the number of parallel workers and their memory resources, please use `n_workers` and `mem_per_worker`, respectively (see the corresponding item in the Section CHANGED above)
- The keyword `n_jobs_startup` in `esi_cluster_setup` is deprecated. Please use `n_workers_startup` instead
### FIXED
- Updated dependency versions (pin `click` to version < 8.1) and fixed Syncopy compatibility (increase recursion depth of input size estimation to one million calls)
- Streamlined the dryrun stopping logic invoked if the user chooses not to continue with the computation after performing a dry-run
- Modified tests that are supposed to use an existing distributed computing client so they do not shut down that very client
- Updated the memory estimation routine to deactivate auto-generation of results files so as not to accidentally corrupt pre-allocated containers before launching the actual concurrent computation
## esi-acme - 2022.8
[2022.8] - 2022-08-05
Bugfixes, new automatic ESI-HPC SLURM partition selection, expanded Python version compatibility, updated dependencies, and an online documentation overhaul.
### NEW
- On the ESI HPC cluster, using `partition="auto"` in `ParallelMap` now launches a heuristic automatic SLURM partition selection algorithm (instead of simply falling back to the "8GBXS" partition); see the example below
### CHANGED
- Updated package dependencies (allow `h5py` version 3.x) and expanded support for recent Python versions (include 3.9)
- Restructured and expanded the online documentation based on suggestions from @naehert: moved most examples and usage notes from `ParallelMap`'s docstring to dedicated documentation pages and added a new "Troubleshooting + FAQ" site.
### FIXED
- Repeated `ParallelMap` calls ignored differing `logfile` specifications. This has been corrected. In addition, the logging setup routine now ensures that only one `FileHandler` is used (any existing non-default log-file locations are removed from the logger to avoid generating multiple logs and/or accidentally appending to existing logs from previous runs).
## esi-acme - 2022.7
[2022.7] - 2022-07-06
Bugfixes, new versioning scheme and updated dependencies.
### CHANGED
- Modified versioning scheme: use date-based version tags instead of increasing numbers
- Updated `dask`, `dask-jobqueue` and `scipy` dependency requirements
- Removed any mentions of "hpx" from the code after upgrading the main file-server of the ESI cluster
### FIXED
- Repaired broken FQDN detection in `is_esi_node`
## esi-acme - 0.21
[0.21] - 2022-03-01
Performance improvements, new `dryrun` keyword and preparations for deploying
ACME on other clusters.
### NEW
- Re-designed cluster startup code: added a new function `slurm_cluster_setup` that includes SLURM-specific (but ESI-agnostic) code for spinning up a `SLURMCluster`
- Included new `dryrun` keyword in `ParallelMap` to test-drive ACME's automatically generated argument lists by simulating a single (randomly picked) worker call prior to the actual concurrent computation (addresses #39; see the sketch after this list)
- Added helper function `is_esi_node` to determine if ACME is running on the ESI HPC cluster
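A sketch of test-driving a computation before committing resources (toy function for illustration):

```python
from acme import ParallelMap

def scale(x, factor):
    return x * factor

# Simulate a single randomly picked call before the full concurrent run
with ParallelMap(scale, range(1000), 3, dryrun=True) as pmap:
    pmap.compute()
```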
### CHANGED
- Do not parse scalars using `numbers.Number`; use `numpy.number` instead to catch Boolean values
- Included `conda clean` in the CD pipeline to avoid disk fill-up by unused conda packages/cache
### DEPRECATED
- Retired `conda2pip` in favor of the modern `setup.cfg` dependency management system. ACME's dependencies are now listed in `setup.cfg`, which is used to populate the conda environment file `acme.yml` at setup time.
- Retired Travis CI tests since free test runs were exhausted. Migrated to GitHub Actions (and re-included codecov)
### FIXED
- On the ESI HPC cluster, set the job CPU count depending on the chosen partition if not explicitly provided by the user (one core per 8 GB of RAM, e.g., jobs in a 32 GB RAM partition now use 4 cores instead of just one)
## esi-acme - v0.2c
[v0.2c] - 2021-10-19
### NEW
- Included function `local_cluster_setup` to launch a local distributed Dask multi-processing cluster running on the host machine (see the sketch below)
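A minimal usage sketch (keyword options omitted; consult the documentation for the full signature):

```python
from acme import local_cluster_setup

# Spin up a Dask multi-processing cluster on the local machine
client = local_cluster_setup()
```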
### CHANGED
- Refined integration with SyNCoPy
### FIXED
- Repaired auto-generated semantic version strings (use only release number + letter, remove local ".dev0" suffix from official release versions)
## esi-acme - v0.2b
[v0.2b] - 2021-08-04
### NEW
- Support for custom `sbatch` arguments (thanks to @KatharineShapcott)
### FIXED
- Made ID fetching of crashed SLURM jobs more robust
- Corrected faulty override of `print`/`showwarning` in case ACME was called from within SyNCoPy
- Cleaned up fetching of SLURM worker memory
- Corrected keywords in `CITATION.cff`
## esi-acme - v0.2a
[v0.2a] - 2021-05-18
### NEW
- Made ACME PEP 517 compliant: added `pyproject.toml` and modified `setup.py` accordingly
- Added IBM POWER testing pipeline (via dedicated GitLab Runner)
### CHANGED
- New default SLURM partition set to "8GBXS" in `esi_cluster_setup`
### REMOVED
- Retired tox in the `slurmtest` CI pipeline in favor of a "simple" pytest testing session due to file-locking problems of tox environments on NFS mounts
### FIXED
- Streamlined GitLab Runner setup: use cluster-wide conda instead of local installations (that differ slightly across runners) and leverage `tox-conda` to fetch pre-built dependencies
- Opt-in pickling was not propagated correctly in daemon-reentry situations
## esi-acme - v0.2
[v0.2] - 2021-05-05
### NEW
- New keyword `write_pickle` can be used to override HDF5 as the default storage format in favor of pickle (see the sketch below)
- Included code-coverage information and corresponding requirements for pull requests in the ACME repo
- Added software citation file `CITATION.cff`
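A short sketch (toy function for illustration):

```python
from acme import ParallelMap

def describe(x):
    return {"input": x, "square": x ** 2}

# Store results as pickle files instead of the default HDF5 containers
with ParallelMap(describe, range(10), write_pickle=True) as pmap:
    pmap.compute()
```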
### CHANGED
- Changed the job submission system: instead of using dask bags, input arguments are directly propagated using dask-client methods. This has the side effect that the layout of in-memory results changed: instead of returning a nested list of lists, the user namespace is populated with a plain list of objects (simplifying result handling in the process)
### DEPRECATED
- In-memory list-of-lists returns are no longer supported; `ParallelMap` now returns plain (non-nested) lists.
### FIXED
- If auto-saving to HDF5 fails, a new "emergency pickling" mechanic kicks in and attempts to pickle the offending return values instead
- User-provided functions in custom modules are now correctly propagated by inheriting `sys.path` from the parent client
- Argument distribution is more memory-efficient: input arguments are no longer held in memory by the scheduler and then propagated to workers. Instead, arguments shared by all workers are broadcast to the cluster and referenced by the workers.
- Any user-issued `KeyboardInterrupt` (`CTRL+C` button press) is caught and triggers a graceful shutdown of all worker jobs managed by the current client (specifically, SLURM jobs are no longer left running in the background, detached from the client)
- Fixed progress bars that were left broken after an exception was raised
## esi-acme - v0.1b
[v0.1b] - 2020-01-15
### NEW
- This CHANGELOG file
### CHANGED
- Modified dependencies to not include Jupyter-related packages
### FIXED
- Fixed markdown syntax and URLs
- Fixed CI pipelines and repaired `h5py` version mismatch in dependencies
- Pinned ACME to Python 3.8.x due to various packages not working properly (yet) in Python 3.9