Recent Releases of parsec

parsec - parsec-4.0.2411

Curated Change log

Added

  • PaRSEC API 4.0.
  • Add DTD CUDA support including NEW tiles in DTD.
  • Add ROCm/HIP device support.
  • Add IrisXE/Level0 device support (experimental).
  • Enable users to manage their own data copies without PaRSEC interfering. Each data copy is marked as owned by PaRSEC or not, and as managed by PaRSEC or not. A data copy owned by PaRSEC can be reclaimed by PaRSEC when its reference count reaches 0. A data copy managed by PaRSEC can be copied or moved onto a different device, while a data copy not managed by PaRSEC will never be moved by the runtime.
  • Add an info system, and introduce two info hooks. See parsec/class/info.h for details. The info system allows the user to register info objects with different levels of structures and dynamic objects in the PaRSEC runtime.
  • PTG supports user-defined routines to move data between GPU and CPU, and user-defined sizes for buffers allocated on the GPU.
  • PTG supports reshaping data propagated between local tasks and the specification of two types on accesses to data collections.
  • PINS logs SCHEDULE_BEGIN and SCHEDULE_END events to better track the task lifecycle.
  • Detect and report oversubscribed binding of core resources.
  • PaRSEC Thread binding can be disabled (bind_threads 0 MCA parameter).
  • Load balancing between GPUs can be tuned (device_load_balance_skew MCA parameter).
  • Load balancing exclusivity between CPU/GPUs can be disabled (device_load_balance_allow_cpu MCA parameter).
  • Data sent in messages can be of variable size.
  • New API parsec_context_query can be used to obtain information on the system, like the number of devices, ranks, etc.
  • New active-message communication API gives low-level access to the PaRSEC communication system to DSLs.
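
The ownership and management rules for data copies described above can be pictured as two independent flags. The following is a minimal, self-contained model in C, not PaRSEC's actual API: the type `copy_t`, the flag names, and both helper functions are hypothetical and exist only to illustrate the rules.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags modelling the two independent properties of a data copy. */
enum {
    COPY_OWNED_BY_RUNTIME   = 1 << 0, /* runtime may reclaim it at refcount 0 */
    COPY_MANAGED_BY_RUNTIME = 1 << 1  /* runtime may copy/move it across devices */
};

/* Hypothetical stand-in for a data copy; not a PaRSEC type. */
typedef struct {
    unsigned flags;
    int refcount;
    int device; /* index of the device currently holding the copy */
} copy_t;

/* A copy can be reclaimed only if the runtime owns it and nobody references it. */
static bool can_reclaim(const copy_t *c)
{
    return (c->flags & COPY_OWNED_BY_RUNTIME) && c->refcount == 0;
}

/* Try to move the copy; copies not managed by the runtime stay where they are. */
static bool try_move(copy_t *c, int target_device)
{
    assert(target_device >= 0);
    if (!(c->flags & COPY_MANAGED_BY_RUNTIME))
        return false;
    c->device = target_device;
    return true;
}
```

In this model a copy with neither flag set is never reclaimed and never moved, which mirrors the case of a user who keeps full control of a buffer.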

Changed

  • Single letter command line options have been replaced with --mca parameters. --help is now --parsec-help.
  • Renamed symbols related to data distribution to properly prefix them with the parsec_ prefix. The old symbols have been deprecated.
  • DTD interface change: the global array parsec_dtd_arena_datatypes is replaced with functions to create, destroy, and get arena datatypes for DTD, and these objects now live inside the parsec context.
  • PARSEC_SUCCESS changed to 0 (from -1), all values for PARSEC_ERR_XYZ changed.
  • PaRSEC now requires CMake 3.21.
  • PaRSEC profiling tools now require Python 3.x.
  • The PaRSEC profiling system no longer requires local dictionaries to be identical across ranks.
  • time_estimate functions can be used to control task load balancing (replaces weight PTG property).
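
Because PARSEC_SUCCESS moved from -1 to 0 and the PARSEC_ERR_XYZ values changed, code that compares return codes against numeric literals may silently misbehave after upgrading. The sketch below illustrates the difference with stand-in constants. `OLD_SUCCESS`, `NEW_SUCCESS`, and the two helper functions are hypothetical names used only for this illustration; real code would compare against `PARSEC_SUCCESS` from the PaRSEC headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the success value before and after the change. */
#define OLD_SUCCESS (-1)
#define NEW_SUCCESS 0

/* Fragile: hard-codes the old numeric value of the success code. */
static bool ok_by_literal(int rc)
{
    return rc == -1;
}

/* Robust: compares against the symbolic constant supplied by the headers,
 * passed here as a parameter so both versions can be exercised. */
static bool ok_by_symbol(int rc, int success_code)
{
    return rc == success_code;
}
```

With the new value, `ok_by_literal(NEW_SUCCESS)` is false even though the call succeeded, while the symbolic comparison keeps working under either numbering.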

Deprecated

  • Data distribution symbols without the parsec_ prefix. Further documentation (including a sed script) can be found in contrib/renaming.

Removed

  • PaRSEC API 3.0
  • RECURSIVE Device support (this is temporary and will be restored in a future version).
  • Removed obsolete dbp2paje tool; h5totrace is the replacement tool to use. This removes the optional dependency on GTG.
  • Removed all command line options not prefixed by --mca, except for --parsec-help and --parsec-version.
  • Using more than PARSEC_GPU_MAX_WORKSPACE workspaces per device will now cause an error (instead of computing incorrect values).
  • PTG property weight (replaced by time_estimate).

Fixed

  • DTD Termination detection would occasionally assert.
  • Multiple bugs with GPU data ownership causing crashes and incorrect results when executing with more than 1 GPU.
  • Device-to-device memory copies would not work in some scenarios.
  • Suboptimal ordering of members in broadcast tree could cause performance reduction.
  • Cray MPI and MPICH would crash in MPI_Cancel and when using NULL datatypes.
  • Do not report incorrect flops/s capabilities (device_show_capabilities MCA parameter).
  • On some systems PaRSEC would allocate more GPU memory than is available on the device.
  • Performance with a large number of GPU tasks with the same priority would be poor due to the overhead of sorting by priority.
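
To illustrate why the ordering of members in a broadcast tree matters, here is one common way to build a binomial tree around an arbitrary root: ranks are renumbered relative to the root, so the root becomes virtual rank 0. This is a generic sketch and is not claimed to be PaRSEC's implementation; all function names are hypothetical.

```c
#include <assert.h>

/* Virtual rank: position of `rank` when numbering starts at `root`. */
static int to_virtual(int rank, int root, int nranks)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    assert(root >= 0 && root < nranks);
    return (rank - root + nranks) % nranks;
}

/* Inverse mapping from a virtual rank back to the real rank. */
static int to_real(int vrank, int root, int nranks)
{
    return (vrank + root) % nranks;
}

/* Parent of `rank` in a binomial tree rooted at `root`.
 * Returns -1 for the root itself. */
static int bcast_parent(int rank, int root, int nranks)
{
    int v = to_virtual(rank, root, nranks);
    if (v == 0)
        return -1;
    /* Clearing the lowest set bit of the virtual rank gives the parent. */
    return to_real(v & (v - 1), root, nranks);
}
```

With 8 ranks and root 3, rank 4 receives directly from rank 3, and rank 2 (virtual rank 7) receives from rank 1 (virtual rank 6).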

Known Bugs

  • PaRSEC Thread binding ignores externally provided binding (e.g., a cpuset enforced by srun); see issue icldisco/dplasma#9.
  • Enabling the RECURSIVE device will cause crashes (it is disabled by default in this release); see issues #548, #541.
  • Running out of GPU memory when using the NEW keyword in PTG may cause deadlocks; see issue #527.

Security

Merged Pull Requests

List of merged pull requests

  • [BBT#582] bugfix/atomic lifo: The offsetof was incorrect leading to lifo padding being wrong in external lifo by @abouteiller in https://github.com/ICLDisco/parsec/pull/316
  • First sketch of a github action for building by @bosilca in https://github.com/ICLDisco/parsec/pull/309
  • Miscellaneous profiling fixes by @omor1 in https://github.com/ICLDisco/parsec/pull/320
  • Per-language compiler flags by @therault in https://github.com/ICLDisco/parsec/pull/326
  • [BBT#541] A new way to install the internal headers by @bosilca in https://github.com/ICLDisco/parsec/pull/322
  • Doc/GitHub by @abouteiller in https://github.com/ICLDisco/parsec/pull/330
  • Provide a temporary fix for the flag detection. by @bosilca in https://github.com/ICLDisco/parsec/pull/336
  • We need BISON 3, and try to automatically pick the brew variant on Mac OSX by @abouteiller in https://github.com/ICLDisco/parsec/pull/331
  • Clean strings usages in CMake. by @bosilca in https://github.com/ICLDisco/parsec/pull/340
  • Allow the runtime to compile even when PTG support is not possible. by @bosilca in https://github.com/ICLDisco/parsec/pull/332
  • Work around GCC bug for atomic_thread_fence with memory order acquire by @devreal in https://github.com/ICLDisco/parsec/pull/343
  • Fix parsec_future: volatile and memory barriers by @devreal in https://github.com/ICLDisco/parsec/pull/342
  • Reshape test: variable used for polling should be volatile by @devreal in https://github.com/ICLDisco/parsec/pull/344
  • Dust off the cmake_modules by @abouteiller in https://github.com/ICLDisco/parsec/pull/346
  • New CMake versions use MPI_ROOT to find MPI by @abouteiller in https://github.com/ICLDisco/parsec/pull/345
  • Fallback using a compatible HWLOC. by @bosilca in https://github.com/ICLDisco/parsec/pull/341
  • hotfix: compile failure when Ayudame not found by @abouteiller in https://github.com/ICLDisco/parsec/pull/348
  • Fix/quick fixes by @bosilca in https://github.com/ICLDisco/parsec/pull/350
  • Update issue template to make it easier to read and easier to fill-up by @abouteiller in https://github.com/ICLDisco/parsec/pull/349
  • Update the installation instructions by @abouteiller in https://github.com/ICLDisco/parsec/pull/354
  • Cleanup/ptgpp assignments by @abouteiller in https://github.com/ICLDisco/parsec/pull/352
  • Apply -g3 to DEBUG only, set default config to Release by @abouteiller in https://github.com/ICLDisco/parsec/pull/347
  • Profiling msync and header commit by @therault in https://github.com/ICLDisco/parsec/pull/337
  • Removing hard flex/bison dependency: only devs need to run the parser by @abouteiller in https://github.com/ICLDisco/parsec/pull/335
  • Hicma/recursive by @bosilca in https://github.com/ICLDisco/parsec/pull/328
  • Fix/deprecated support by @bosilca in https://github.com/ICLDisco/parsec/pull/362
  • Add the filename to the generated profiling event name. by @bosilca in https://github.com/ICLDisco/parsec/pull/359
  • Fix atomics on macosX not working properly (missing header) by @abouteiller in https://github.com/ICLDisco/parsec/pull/356
  • Remove never compiled in '64bit' lifo implementation by @abouteiller in https://github.com/ICLDisco/parsec/pull/360
  • Fix/many small updates by @bosilca in https://github.com/ICLDisco/parsec/pull/363
  • Make the ParsecCompilerFlags.cmake self contained by @abouteiller in https://github.com/ICLDisco/parsec/pull/364
  • Profiling fix: parsec_init(NULL, NULL) by @therault in https://github.com/ICLDisco/parsec/pull/339
  • GitHub runner with spack by @bosilca in https://github.com/ICLDisco/parsec/pull/333
  • Update PAPI SDE to fit the current API by @therault in https://github.com/ICLDisco/parsec/pull/365
  • ucontext is not supported on OSX. by @bosilca in https://github.com/ICLDisco/parsec/pull/366
  • recursive cb type was not correct by @abouteiller in https://github.com/ICLDisco/parsec/pull/368
  • Since new policy, setting the non-cache variable creates an empty cache by @abouteiller in https://github.com/ICLDisco/parsec/pull/367
  • Do now allow spack to be updated automatically. by @bosilca in https://github.com/ICLDisco/parsec/pull/375
  • flex: on some machines, flex cannot work if parsec/utils is not created by @abouteiller in https://github.com/ICLDisco/parsec/pull/374
  • Attempt to backport the revamp of the communication engine by @devreal in https://github.com/ICLDisco/parsec/pull/380
  • Respect DISTDIR is provided. by @bosilca in https://github.com/ICLDisco/parsec/pull/383
  • [RFC] profiling tools: more efficient cross-stream event matching by @omor1 in https://github.com/ICLDisco/parsec/pull/372
  • Hash table: count used buckets only when needed by @devreal in https://github.com/ICLDisco/parsec/pull/379
  • Print the debug rank from device_show_statistics by @abouteiller in https://github.com/ICLDisco/parsec/pull/386
  • Handle error in CUDA/HIP module init and configurable max_streams by @therault in https://github.com/ICLDisco/parsec/pull/351
  • Update to a newer spack compiler by @bosilca in https://github.com/ICLDisco/parsec/pull/392
  • Make the PUSHOUT and other DTD GPU concepts generic by @abouteiller in https://github.com/ICLDisco/parsec/pull/387
  • Workaround current CUDA/HIP "solution suspicious" bug... by @therault in https://github.com/ICLDisco/parsec/pull/381
  • dtd_bench_simple_gemm.c relies on non-standard cblas.h file by @therault in https://github.com/ICLDisco/parsec/pull/317
  • profiling tools: improve large buffer performance by @omor1 in https://github.com/ICLDisco/parsec/pull/390
  • Add a load-balancing skew so that we favor locality up to a configurable limit by @abouteiller in https://github.com/ICLDisco/parsec/pull/389
  • Fix redistribute wrapper by @devreal in https://github.com/ICLDisco/parsec/pull/395
  • [BBT#509] Dtd cuda with new by @therault in https://github.com/ICLDisco/parsec/pull/318
  • Only enable CUDA language if supported. by @bosilca in https://github.com/ICLDisco/parsec/pull/404
  • [BBT#572] Implement hash table API providing a key handle during lock by @devreal in https://github.com/ICLDisco/parsec/pull/307
  • profiling tools: fix format for padded structures by @omor1 in https://github.com/ICLDisco/parsec/pull/402
  • Don't pass the execution stream around to recursive calls. by @bosilca in https://github.com/ICLDisco/parsec/pull/413
  • Protect the inline functions by they device support. by @bosilca in https://github.com/ICLDisco/parsec/pull/415
  • Allow the DSL to provide a task_snprintf function and use it when displaying the DOT by @devreal in https://github.com/ICLDisco/parsec/pull/409
  • Execution stream keeps the highest priority task for local execution. by @bosilca in https://github.com/ICLDisco/parsec/pull/399
  • TTG/termdet by @therault in https://github.com/ICLDisco/parsec/pull/391
  • More data transfer statistics between devices by @therault in https://github.com/ICLDisco/parsec/pull/426
  • Remove startup tasks from DTD by @therault in https://github.com/ICLDisco/parsec/pull/425
  • tools/profiling: PTT v2 by @omor1 in https://github.com/ICLDisco/parsec/pull/418
  • Fix/more warnings by @bosilca in https://github.com/ICLDisco/parsec/pull/416
  • Fix possible access to unowned memory in `parsec_dtd_task_class_add_chore` by @DSMishler in https://github.com/ICLDisco/parsec/pull/427
  • fix remote_dep_mpi.c by @cflinto in https://github.com/ICLDisco/parsec/pull/429
  • bugfix for profiling without multiprocessing by @DSMishler in https://github.com/ICLDisco/parsec/pull/432
  • Project_dyn test missing libm dependency by @abouteiller in https://github.com/ICLDisco/parsec/pull/433
  • Ttg/termdet dynamic PTG by @therault in https://github.com/ICLDisco/parsec/pull/430
  • Update API versions in examples by @abouteiller in https://github.com/ICLDisco/parsec/pull/435
  • Make sure PaRSEC compiles for all rwlock implementations. by @bosilca in https://github.com/ICLDisco/parsec/pull/434
  • comm: set parsec_tls_execution_stream in comm thread by @omor1 in https://github.com/ICLDisco/parsec/pull/422
  • Use normal for loops to iterate over local index variables when they are a range. by @therault in https://github.com/ICLDisco/parsec/pull/329
  • [BBT#559] Add llp scheduler: local lifo with priorities by @devreal in https://github.com/ICLDisco/parsec/pull/325
  • [BBT#536] Add AMD RoCM/HIP device by @abouteiller in https://github.com/ICLDisco/parsec/pull/315
  • TSL variables are not static by default. by @bosilca in https://github.com/ICLDisco/parsec/pull/437
  • Allow overwriting of the completion and enqueue callbacks. by @bosilca in https://github.com/ICLDisco/parsec/pull/439
  • Bring back support for changing the PaRSEC communicator by @bosilca in https://github.com/ICLDisco/parsec/pull/401
  • Fix: termination detector race condition by @therault in https://github.com/ICLDisco/parsec/pull/438
  • Fixes/tls in comm thread by @therault in https://github.com/ICLDisco/parsec/pull/440
  • profiling: disambiguation between certain MPI events by @omor1 in https://github.com/ICLDisco/parsec/pull/376
  • TTG/building system by @therault in https://github.com/ICLDisco/parsec/pull/424
  • A small script to help us find which files might need copyright update. by @therault in https://github.com/ICLDisco/parsec/pull/442
  • Remove dependency on argv[0] by @abouteiller in https://github.com/ICLDisco/parsec/pull/445
  • Make configurable the treshold for warning about wrong binding on by @abouteiller in https://github.com/ICLDisco/parsec/pull/453
  • Remove some warnings about unused variables by @abouteiller in https://github.com/ICLDisco/parsec/pull/451
  • Fix statistics management with multiple GPU. by @bosilca in https://github.com/ICLDisco/parsec/pull/454
  • Debug output about reshapping was too verbose by @abouteiller in https://github.com/ICLDisco/parsec/pull/456
  • Topic/update spack by @bosilca in https://github.com/ICLDisco/parsec/pull/462
  • Increase function name length limit in debug output by @devreal in https://github.com/ICLDisco/parsec/pull/463
  • initial flags modification PR. Open to review. by @DSMishler in https://github.com/ICLDisco/parsec/pull/461
  • Introduce parsec_taskpool_wait and parsec_taskpool_test by @devreal in https://github.com/ICLDisco/parsec/pull/411
  • Fix profile dtd by @therault in https://github.com/ICLDisco/parsec/pull/475
  • Correctly identify the Intel compiler. by @bosilca in https://github.com/ICLDisco/parsec/pull/472
  • Proper case and imported target for hwloc by @abouteiller in https://github.com/ICLDisco/parsec/pull/457
  • Use the profiling key macros by @bosilca in https://github.com/ICLDisco/parsec/pull/476
  • No more tag collisions. by @bosilca in https://github.com/ICLDisco/parsec/pull/477
  • profiling: fix multiple EXEC_END events in some circumstances by @omor1 in https://github.com/ICLDisco/parsec/pull/421
  • DSL profiling by @therault in https://github.com/ICLDisco/parsec/pull/469
  • Put END_C_DECLS at the end of device_cuda.h by @devreal in https://github.com/ICLDisco/parsec/pull/480
  • Drop support for python2. by @bosilca in https://github.com/ICLDisco/parsec/pull/482
  • Hotfix dtd dsl and warning by @therault in https://github.com/ICLDisco/parsec/pull/488
  • Use pip to build and install the python support. by @bosilca in https://github.com/ICLDisco/parsec/pull/489
  • Termdet callback order by @therault in https://github.com/ICLDisco/parsec/pull/494
  • Fix the branching test in distributed by @therault in https://github.com/ICLDisco/parsec/pull/495
  • Fix profiling tests by @therault in https://github.com/ICLDisco/parsec/pull/490
  • Log all types of runtime-system events in task_profiler, and ensure that all time of compute threads is accounted for by @devreal in https://github.com/ICLDisco/parsec/pull/410
  • Fix missing loop counter increment in pins task profiler by @devreal in https://github.com/ICLDisco/parsec/pull/498
  • Close files between operations. by @bosilca in https://github.com/ICLDisco/parsec/pull/484
  • Make parsec_data_t::device_copies a flexible array member by @devreal in https://github.com/ICLDisco/parsec/pull/499
  • Fix an issue about GPU statistics (Issue#505) by @QingleiCao in https://github.com/ICLDisco/parsec/pull/506
  • Do not cancel persistent requests with cray-mpich, it is broken. by @abouteiller in https://github.com/ICLDisco/parsec/pull/519
  • Make sure data_in is not NULL on GPU when accessing nb_elts by @QingleiCao in https://github.com/ICLDisco/parsec/pull/492
  • volatile uint32 is not always a valid type for MPI_SUM by @abouteiller in https://github.com/ICLDisco/parsec/pull/524
  • Profiling corrected. by @josephjohnjj in https://github.com/ICLDisco/parsec/pull/526
  • Remove all options that are not prefixed by --mca by @abouteiller in https://github.com/ICLDisco/parsec/pull/447
  • Implement a universal hash function by @devreal in https://github.com/ICLDisco/parsec/pull/528
  • Can't use NULL to pass-in MPI datatypes with MPICH derivatives by @abouteiller in https://github.com/ICLDisco/parsec/pull/518
  • Update to the SDE available in PAPI-7.0.1 by @therault in https://github.com/ICLDisco/parsec/pull/522
  • More flexible paranoid checks in data_dist/matrix implementations by @therault in https://github.com/ICLDisco/parsec/pull/530
  • Handle the tertiary case for startup tasks. by @bosilca in https://github.com/ICLDisco/parsec/pull/508
  • Fix some errors with ctest for profiling by @abouteiller in https://github.com/ICLDisco/parsec/pull/523
  • pip install --prefix has version-dependent behaviors by @therault in https://github.com/ICLDisco/parsec/pull/532
  • Fix old typo in termdet modules management: by @therault in https://github.com/ICLDisco/parsec/pull/533
  • Allow for tag registration/deregistration any time. by @bosilca in https://github.com/ICLDisco/parsec/pull/521
  • Topic/device naming by @bosilca in https://github.com/ICLDisco/parsec/pull/493
  • Gpu workspace fix by @therault in https://github.com/ICLDisco/parsec/pull/510
  • Fix bug: some scenarios would call a nullptr function in profiling by @therault in https://github.com/ICLDisco/parsec/pull/535
  • Profiling hotfix: error in python3 when passing {char[64]} as conversion type by @therault in https://github.com/ICLDisco/parsec/pull/536
  • Limit the number of recv requests to ensure there is space for sends by @devreal in https://github.com/ICLDisco/parsec/pull/538
  • Flex flags bugfix and python venv log by @abouteiller in https://github.com/ICLDisco/parsec/pull/537
  • Fix hash function generation for PTG by @therault in https://github.com/ICLDisco/parsec/pull/479
  • Small warnings from clang14 by @abouteiller in https://github.com/ICLDisco/parsec/pull/542
  • Print -wflags found only when results differ from cache by @abouteiller in https://github.com/ICLDisco/parsec/pull/540
  • Fix the start/stop test. by @bosilca in https://github.com/ICLDisco/parsec/pull/546
  • Gpu fix load balancing by @therault in https://github.com/ICLDisco/parsec/pull/517
  • Tear-down parsec_ce after high-level communication by @devreal in https://github.com/ICLDisco/parsec/pull/549
  • relabel mislabelled tests by @abouteiller in https://github.com/ICLDisco/parsec/pull/550
  • Find the right spack environment. by @bosilca in https://github.com/ICLDisco/parsec/pull/559
  • Add device async/again support. by @bosilca in https://github.com/ICLDisco/parsec/pull/544
  • hotfix by @therault in https://github.com/ICLDisco/parsec/pull/562
  • Profiling and PAPI SDE updates by @therault in https://github.com/ICLDisco/parsec/pull/565
  • Fix the management of GPU copies. by @bosilca in https://github.com/ICLDisco/parsec/pull/563
  • cmake -> CMAKE_COMMAND by @evaleev in https://github.com/ICLDisco/parsec/pull/567
  • Correct the logic for passing over CPU/RECURSIVE devices by @abouteiller in https://github.com/ICLDisco/parsec/pull/557
  • HAVE_PEER_ACCESS is always present in all relevant versions of CUDA or by @abouteiller in https://github.com/ICLDisco/parsec/pull/572
  • CUDA: disable timer support on cuda events by @devreal in https://github.com/ICLDisco/parsec/pull/576
  • Pick a stable spack branch by @bosilca in https://github.com/ICLDisco/parsec/pull/578
  • Add an option to skip HWLOC compat run by @bosilca in https://github.com/ICLDisco/parsec/pull/582
  • Allow locals and parameters to be defined via CMake. by @bosilca in https://github.com/ICLDisco/parsec/pull/583
  • Prevent race condition in accelerator copies management by @bosilca in https://github.com/ICLDisco/parsec/pull/575
  • Fix/rocm detect and unknown device warning by @abouteiller in https://github.com/ICLDisco/parsec/pull/577
  • [python] do not build python support unless pandas is available by @evaleev in https://github.com/ICLDisco/parsec/pull/584
  • Refactor GPU device to increase code factorization between the devices. by @therault in https://github.com/ICLDisco/parsec/pull/570
  • Disable recursive device by default (temporarily) by @abouteiller in https://github.com/ICLDisco/parsec/pull/585
  • remote_dep: rotate bcast topology around root by @omor1 in https://github.com/ICLDisco/parsec/pull/481
  • Discover atomic support for __int128_t. by @bosilca in https://github.com/ICLDisco/parsec/pull/587
  • Fix warnings. by @bosilca in https://github.com/ICLDisco/parsec/pull/591
  • Typo in level zero component by @therault in https://github.com/ICLDisco/parsec/pull/592
  • dbpreader: missing corner case in cache building by @therault in https://github.com/ICLDisco/parsec/pull/593
  • Skip profiling for task classes without profiling information by @DSMishler in https://github.com/ICLDisco/parsec/pull/594
  • cmake logic fixes for level zero and half-installed systems by @therault in https://github.com/ICLDisco/parsec/pull/596
  • l0: dpccpp can't create output files in the build dir if the enclosing by @abouteiller in https://github.com/ICLDisco/parsec/pull/597
  • gpu: some errors introduced during gpu despecialization caused deadlocks by @abouteiller in https://github.com/ICLDisco/parsec/pull/598
  • ze: the queue need to be reset when task completes by @abouteiller in https://github.com/ICLDisco/parsec/pull/599
  • Fixes for GPU memory oversubscription by @devreal in https://github.com/ICLDisco/parsec/pull/602
  • If the DSL defines a task_snprintf function, use that function. by @therault in https://github.com/ICLDisco/parsec/pull/603
  • Make sure the w2r task has a stageout set by @devreal in https://github.com/ICLDisco/parsec/pull/604
  • Consistently use size_t for nb_elts in data and flows by @devreal in https://github.com/ICLDisco/parsec/pull/605
  • Minor cleanup of the DTD parameters manipulation. by @bosilca in https://github.com/ICLDisco/parsec/pull/609
  • Fix the parsec_future_t. by @bosilca in https://github.com/ICLDisco/parsec/pull/608
  • Topic/add evaluate keyword by @bosilca in https://github.com/ICLDisco/parsec/pull/569
  • Relative path, symbolic links, and python examples by @therault in https://github.com/ICLDisco/parsec/pull/606
  • Update process_name.c. Fixes the issue #610 by @bimalgaudel in https://github.com/ICLDisco/parsec/pull/611
  • CMAKE: Bring back checks for atomic CAS by @devreal in https://github.com/ICLDisco/parsec/pull/595
  • HOTFIX: make the default number of devices be all the devices seen by… by @therault in https://github.com/ICLDisco/parsec/pull/613
  • bugfix: in dtd sometimes the cpu incarnation and gpu incarnations are by @abouteiller in https://github.com/ICLDisco/parsec/pull/616
  • Re-enable CI tests for cuda caps as they now work again. by @abouteiller in https://github.com/ICLDisco/parsec/pull/614
  • Add capability of saving GPU statistics and printing diff vs saved stats by @abouteiller in https://github.com/ICLDisco/parsec/pull/558
  • Fix the argument _NSGetExecutablePath. by @bosilca in https://github.com/ICLDisco/parsec/pull/620
  • Add a context-level query capability. by @bosilca in https://github.com/ICLDisco/parsec/pull/621
  • Prevent CI from running OOM when oversubscribing GPUs by @abouteiller in https://github.com/ICLDisco/parsec/pull/629
  • Fix CUDA protection macro use by @bosilca in https://github.com/ICLDisco/parsec/pull/632
  • Move comm profiling initialization into comm thread by @devreal in https://github.com/ICLDisco/parsec/pull/626
  • Cleanup/cosmetics by @abouteiller in https://github.com/ICLDisco/parsec/pull/631
  • Alternative solution to the CI problem with GPUs by @therault in https://github.com/ICLDisco/parsec/pull/633
  • ci: all tests must use parsec_addtest by @abouteiller in https://github.com/ICLDisco/parsec/pull/635
  • [BBT#237] Allow sender to send data of any size. by @bosilca in https://github.com/ICLDisco/parsec/pull/321
  • bugfix: dtd taskpool destructor should work symetric to contructor by @abouteiller in https://github.com/ICLDisco/parsec/pull/637
  • fixes memory leaks by @BrieucNicolas in https://github.com/ICLDisco/parsec/pull/639
  • Fix the lack of direct GPU to GPU communications in multi-device runs. by @therault in https://github.com/ICLDisco/parsec/pull/642
  • Compute CPU and GPU versions without lying during kernel epilog (enable TTG/PTG versioning to coexist) by @abouteiller in https://github.com/ICLDisco/parsec/pull/648
  • Initialize the parsec's HWLOC subsystem before starting threads. by @abouteiller in https://github.com/ICLDisco/parsec/pull/650
  • Fix overflow when calling parsec_data_create by @QingleiCao in https://github.com/ICLDisco/parsec/pull/646
  • cmakery: let find_package find HIP v6 by @abouteiller in https://github.com/ICLDisco/parsec/pull/652
  • Fix computation of available memory on gpu (avoid truncation and conversions) by @abouteiller in https://github.com/ICLDisco/parsec/pull/651
  • bugfix: when hip is not found, its ok. by @abouteiller in https://github.com/ICLDisco/parsec/pull/656
  • Add sanity check for free memory by @devreal in https://github.com/ICLDisco/parsec/pull/658
  • Explicit message when outputing the warning about being unable to allocate memory in GPU code by @therault in https://github.com/ICLDisco/parsec/pull/655
  • config: osx would not find bison on newer fink/brew by @abouteiller in https://github.com/ICLDisco/parsec/pull/657
  • Consolidated error handling when GPU only tests execute on CPU systems by @abouteiller in https://github.com/ICLDisco/parsec/pull/644
  • Add the number of copies evicted in the statistics of the devices. by @therault in https://github.com/ICLDisco/parsec/pull/666
  • Fix use of calloc. by @bosilca in https://github.com/ICLDisco/parsec/pull/669
  • Add: mca control for cpu load balancing (and don't report Gflops figures for cpus we can't determine) by @abouteiller in https://github.com/ICLDisco/parsec/pull/663
  • Suffix-increment is deprecated on volatile variables in C++ by @devreal in https://github.com/ICLDisco/parsec/pull/674
  • show-caps: don't report flops for unknown cuda devs, report peer access by @abouteiller in https://github.com/ICLDisco/parsec/pull/672
  • Apply does not release user-defined memory by @QingleiCao in https://github.com/ICLDisco/parsec/pull/676
  • Release lock in create_w2r_task if readers are readers are not zero by @devreal in https://github.com/ICLDisco/parsec/pull/678
  • w2r task should unlock the lock if readers are not 0 by @devreal in https://github.com/ICLDisco/parsec/pull/682
  • Refactored CI by @G-Ragghianti in https://github.com/ICLDisco/parsec/pull/667
  • C11 atomic lock alignment in data_t by @abouteiller in https://github.com/ICLDisco/parsec/pull/685
  • The device task is now released by the DSL by @bosilca in https://github.com/ICLDisco/parsec/pull/688
  • Reenable the memory eviction code by @bosilca in https://github.com/ICLDisco/parsec/pull/679
  • List ordered push: search from back if lower than pivot by @devreal in https://github.com/ICLDisco/parsec/pull/693
  • Add an icl platform file, move saturn platform to legacy by @abouteiller in https://github.com/ICLDisco/parsec/pull/692
  • Contrib/copycheck by @abouteiller in https://github.com/ICLDisco/parsec/pull/574
  • bugfix: dtd would not run cpu hooks when compiled with cuda by @abouteiller in https://github.com/ICLDisco/parsec/pull/697
  • v4.0.2411 changelog by @abouteiller in https://github.com/ICLDisco/parsec/pull/699
  • Fix a race condition in DTD for the local termdet by @therault in https://github.com/ICLDisco/parsec/pull/698
  • Bring back support for MPI allow_overtake by @bosilca in https://github.com/ICLDisco/parsec/pull/704
  • Fix function name for parsec_atomic_fetch_add_int64 by @devreal in https://github.com/ICLDisco/parsec/pull/705

New Contributors

  • @omor1 made their first contribution in https://github.com/ICLDisco/parsec/pull/320
  • @devreal made their first contribution in https://github.com/ICLDisco/parsec/pull/343
  • @DSMishler made their first contribution in https://github.com/ICLDisco/parsec/pull/427
  • @cflinto made their first contribution in https://github.com/ICLDisco/parsec/pull/429
  • @QingleiCao made their first contribution in https://github.com/ICLDisco/parsec/pull/506
  • @josephjohnjj made their first contribution in https://github.com/ICLDisco/parsec/pull/526
  • @evaleev made their first contribution in https://github.com/ICLDisco/parsec/pull/567
  • @bimalgaudel made their first contribution in https://github.com/ICLDisco/parsec/pull/611
  • @BrieucNicolas made their first contribution in https://github.com/ICLDisco/parsec/pull/639
  • @G-Ragghianti made their first contribution in https://github.com/ICLDisco/parsec/pull/667

Full Changelog: https://github.com/ICLDisco/parsec/commits/parsec-4.0.2411

- C
Published by abouteiller over 1 year ago

parsec - v3.0.2209

PaRSEC 22.09 (September 2022) API 3.0

  • Fix PaRSEC not compiling with gcc 10+

- C
Published by abouteiller almost 4 years ago

parsec - v3.0.2012

PaRSEC 20.12 (December 2020)

  • PaRSEC API 3.0

  • PaRSEC now requires CMake 3.16.

  • New configure system to ease the installation of PaRSEC. See INSTALL for details. This system automates installation on most DOE leadership systems.

  • Split DPLASMA and PaRSEC into separate repositories. PaRSEC moves from cmake-2.0 to cmake-3.12, using targets. Targets are exported for third-party integration.

  • Add visualization tools to extract user-defined properties from the application (see: PR 229 visualization-tools)

  • Automate the expression of the host-to-device and device-to-host data transfers required to satisfy dependencies (and anti-dependencies). PaRSEC tracks multiple versions of the same data as data copies, with a coherency algorithm that initiates data transfers as needed. The heuristic for the eviction policy used when the GPU runs out of memory has been optimized to allow efficient operation on problems larger than GPU memory.

  • Add support for MPI out-of-order matching capabilities. Add the capability for compute threads to send direct control messages to indicate completion of tasks to remote nodes (without delegation to the communication thread).

  • Remove communication mode EAGER from the runtime. It had a hard-to-correct bug that would occasionally cause a deadlock, and the performance benefit was small.

  • Add a Map operator on the Block Cyclic matrix data collection that performs in-place data transformation on the collection with a user-provided operator.

  • Add support in the runtime for user-defined properties evaluated at runtime and easy to export through a shared memory region (see: PR 229 visualization-tools)

  • Add a PAPI-SDE interface to the parsec library, to expose internal counters via the PAPI-Software Defined Events interface.

  • Add backend support for OTF2 in the profiling mechanism. OTF2 is used automatically if an OTF2 installation is found.

  • Add an MCA parameter to control the number of ejected blocks from GPU memory (device_cuda_max_number_of_ejected_data). Add an MCA parameter to control whether the GPU engine takes some time to sort the first N tasks of the pending queue (device_cuda_sort_pending_list).

  • Reshape the user's view of PaRSEC: users only have to include a single header (parsec.h) for most usages, and link with a single library (-lparsec).

  • Update the PaRSEC DSL handling of initial tasks. We now rely on 2 pieces of information: the number of DSL tasks, and the number of tasks imposed by the system (all types of data transfer).

  • Add a purely local scheduler (ll), that uses a single LIFO per thread. Each schedule operation does 1 atomic (push in local queue), each select operation does up to t atomics (pop in local queue, then try any other thread's queue until they are all tested empty).

  • Add a --ignore-properties=... option to parsec_ptgpp.

  • Change the API of hash tables to allow keys of arbitrary size. The API specifies how to build a key from a task, how to hash a key into 1 <= N <= 64 bits, and how to compare two keys (plus a printing function for debugging).

  • Change the behavior of DEBUG_HISTORY: log all information inside a per-thread buffer of fixed size (MCA parameter), do not allocate memory during logging, and use timestamps to re-order the output when the user calls dump().

  • The DTD interface is updated (new flag to send a pointer as a parameter, simpler unpacking of parameters, etc.).

  • DTD provides an MCA parameter (dtd_debug_verbose) to print information about the traversal of the DAG in an output stream separate from the default.
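
The purely local scheduler (ll) described in the list above pushes into the calling thread's own LIFO and, on select, pops locally before probing the other threads' LIFOs. The following is a simplified, single-threaded model of that policy, not the PaRSEC implementation: it uses plain arrays instead of atomic operations, and `lifo_t`, `ll_schedule`, `ll_select`, `NTHREADS`, and `CAPACITY` are hypothetical names. Task identifiers are assumed to be non-negative so that -1 can signal "no task".

```c
#include <assert.h>

#define NTHREADS 4
#define CAPACITY 16

/* One LIFO per thread; a real scheduler would use atomic operations here. */
typedef struct {
    int items[CAPACITY];
    int top; /* number of stored tasks */
} lifo_t;

static lifo_t queues[NTHREADS];

/* schedule: a single push into the calling thread's own LIFO.
 * Returns 0 on success, -1 if the local LIFO is full. */
static int ll_schedule(int thread, int task)
{
    assert(thread >= 0 && thread < NTHREADS);
    lifo_t *q = &queues[thread];
    if (q->top == CAPACITY)
        return -1;
    q->items[q->top++] = task;
    return 0;
}

/* select: pop locally first, then probe every other thread's LIFO in turn.
 * *attempts counts the pops tried, which is at most NTHREADS.
 * Returns the task, or -1 when all queues were found empty. */
static int ll_select(int thread, int *attempts)
{
    assert(thread >= 0 && thread < NTHREADS);
    *attempts = 0;
    for (int i = 0; i < NTHREADS; i++) {
        lifo_t *q = &queues[(thread + i) % NTHREADS];
        (*attempts)++;
        if (q->top > 0)
            return q->items[--q->top];
    }
    return -1;
}
```

In this model a thread with local work pays for one pop, while a thread that finds nothing pays for one probe per thread, which corresponds to the "up to t atomics" cost mentioned for select.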

- C
Published by abouteiller over 4 years ago