Recent Releases of parsec
parsec - parsec-4.0.2411
Curated Change log
Added
- PaRSEC API 4.0.
- Add DTD CUDA support including NEW tiles in DTD.
- Add ROCm/HIP device support.
- Add IrisXE/Level0 device support (experimental).
- Enable users to manage their own data copies without PaRSEC interfering. Data copies are marked as being owned by PaRSEC or not and managed by PaRSEC or not. A data copy owned by PaRSEC can be reclaimed by PaRSEC when its reference count reaches 0, a data copy managed by PaRSEC can be copied / moved onto a different device, while a data copy not managed by PaRSEC will never be moved by the runtime.
- Add an info system, and introduce two info hooks. See `parsec/class/info.h` for details. The info system allows the user to register info objects with different levels of structures and dynamic objects in the PaRSEC runtime.
- PTG supports user-defined routines to move data between GPU and CPU, and user-defined sizes for buffers allocated on the GPU.
- PTG supports reshaping data propagated between local tasks and the specification of two types on accesses to data collections.
- PINS log `SCHEDULE_BEGIN` and `SCHEDULE_END` events to better track tasks lifecycle.
- Detect and report oversubscribed binding of core resources.
- PaRSEC Thread binding can be disabled (`bind_threads 0` MCA parameter).
- Load balancing between GPUs can be tuned (`device_load_balance_skew` MCA parameter).
- Load balancing exclusivity between CPU/GPUs can be disabled (`device_load_balance_allow_cpu` MCA parameter).
- Data sent in messages can be of variable size.
- New API `parsec_context_query` can be used to obtain information on the system, like the number of devices, ranks, etc.
- New active-message communication API gives low-level access to the PaRSEC communication system to DSLs.
Changed
- Single letter command line options have been replaced with `--mca` parameters. `--help` is now `--parsec-help`.
- Renamed symbols related to data distribution to properly prefix them with the `parsec_` prefix. The old symbols have been deprecated.
- DTD interface change: the global array `parsec_dtd_arena_datatypes` is replaced with functions to create, destroy, and get arena datatypes for DTD, and these objects now live inside the parsec context.
- `PARSEC_SUCCESS` changed to `0` (from `-1`), all values for `PARSEC_ERR_XYZ` changed.
- PaRSEC now requires CMake 3.21.
- PaRSEC profiling tools now require Python 3.x.
- PaRSEC profiling system no longer requires local dictionaries to be identical between ranks.
- `time_estimate` functions can be used to control task load balancing (replaces `weight` PTG property).
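Because the numeric values behind `PARSEC_SUCCESS` and the `PARSEC_ERR_XYZ` codes changed, application code that compared return values against literals may need a review. The following is a minimal sketch of the difference between a literal check and a symbolic check. It does not include the PaRSEC headers: `DEMO_SUCCESS`, `DEMO_ERR_GENERIC`, and `demo_runtime_call` are stand-ins invented for illustration, and the error value shown is an assumption.

```c
/* Stand-in values for illustration only. Real code should include parsec.h
 * and use the constants it defines. DEMO_ERR_GENERIC is hypothetical. */
#define DEMO_SUCCESS       0
#define DEMO_ERR_GENERIC (-1)

/* Hypothetical runtime call that reports success or failure. */
static int demo_runtime_call(int should_fail)
{
    return should_fail ? DEMO_ERR_GENERIC : DEMO_SUCCESS;
}

/* Fragile check: hard-codes the old numeric value of "success" (-1),
 * so it stops recognizing success once the constant becomes 0. */
static int fragile_check(int rc)
{
    return rc == -1;
}

/* Portable check: compares against the symbolic constant, so it keeps
 * working when the numeric value behind the name changes. */
static int call_succeeded(int rc)
{
    return rc == DEMO_SUCCESS;
}
```

With these stand-ins, `call_succeeded(demo_runtime_call(0))` is true, while `fragile_check` no longer recognizes the same successful call.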
Deprecated
- data distribution w/o the `parsec_` prefix. Further documentation (including a sed script) can be found in `contrib/renaming`.
Removed
- PaRSEC API 3.0
- RECURSIVE Device support (this is temporary and will be restored in a future version).
- Removed obsolete `dbp2paje` tool; `h5totrace` is the replacement tool to use. This removes the optional dependency on GTG.
- Removed all command line options not prefixed by `--mca`, except for `--parsec-help` and `--parsec-version`.
- Using more than `PARSEC_GPU_MAX_WORKSPACE` workspaces per device will now cause an error (instead of computing incorrect values).
- PTG property `weight` (replaced by `time_estimate`).
Fixed
- DTD Termination detection would occasionally assert.
- Multiple bugs with GPU data ownership causing crashes and incorrect results when executing with more than 1 GPU.
- Device-to-device memory copies would not work in some scenarios.
- Suboptimal ordering of members in broadcast tree could cause performance reduction.
- Cray MPI and MPICH would crash in `MPI_Cancel` and when using `NULL` datatypes.
- Do not report incorrect flops/s capabilities (`device_show_capabilities` MCA parameter).
- On some systems PaRSEC would allocate more GPU memory than is available on the device.
- Performance with large number of GPU tasks with the same priority would be poor due to overhead of sorting by priority.
Known Bugs
- PaRSEC Thread binding ignores externally provided binding (e.g., a cpuset enforced by `srun`); see issue icldisco/dplasma#9.
- Enabling the `RECURSIVE` device will cause crashes (it is disabled by default in this release); see issues #548, #541.
- Running out of GPU memory when using the NEW keyword in PTG may cause deadlocks; see issue #527.
Security
Merged Pull Requests
List of merged pull requests
* [BBT#582] bugfix/atomic lifo: The offsetof was incorrect leading to lifo padding being wrong in external lifo by @abouteiller in https://github.com/ICLDisco/parsec/pull/316
* First sketch of a github action for building by @bosilca in https://github.com/ICLDisco/parsec/pull/309
* Miscellaneous profiling fixes by @omor1 in https://github.com/ICLDisco/parsec/pull/320
* Per-language compiler flags by @therault in https://github.com/ICLDisco/parsec/pull/326
* [BBT#541] A new way to install the internal headers by @bosilca in https://github.com/ICLDisco/parsec/pull/322
* Doc/GitHub by @abouteiller in https://github.com/ICLDisco/parsec/pull/330
* Provide a temporary fix for the flag detection. by @bosilca in https://github.com/ICLDisco/parsec/pull/336
* We need BISON 3, and try to automatically pick the brew variant on Mac OSX by @abouteiller in https://github.com/ICLDisco/parsec/pull/331
* Clean strings usages in CMake. by @bosilca in https://github.com/ICLDisco/parsec/pull/340
* Allow the runtime to compile even when PTG support is not possible. by @bosilca in https://github.com/ICLDisco/parsec/pull/332
* Work around GCC bug for atomic_thread_fence with memory order acquire by @devreal in https://github.com/ICLDisco/parsec/pull/343
* Fix parsec_future: volatile and memory barriers by @devreal in https://github.com/ICLDisco/parsec/pull/342
* Reshape test: variable used for polling should be volatile by @devreal in https://github.com/ICLDisco/parsec/pull/344
* Dust off the cmake_modules by @abouteiller in https://github.com/ICLDisco/parsec/pull/346
* New CMake versions use MPI_ROOT to find MPI by @abouteiller in https://github.com/ICLDisco/parsec/pull/345
* Fallback using a compatible HWLOC. by @bosilca in https://github.com/ICLDisco/parsec/pull/341
* hotfix: compile failure when Ayudame not found by @abouteiller in https://github.com/ICLDisco/parsec/pull/348
* Fix/quick fixes by @bosilca in https://github.com/ICLDisco/parsec/pull/350
* Update issue template to make it easier to read and easier to fill-up by @abouteiller in https://github.com/ICLDisco/parsec/pull/349
* Update the installation instructions by @abouteiller in https://github.com/ICLDisco/parsec/pull/354
* Cleanup/ptgpp assignments by @abouteiller in https://github.com/ICLDisco/parsec/pull/352
* Apply -g3 to DEBUG only, set default config to Release by @abouteiller in https://github.com/ICLDisco/parsec/pull/347
* Profiling msync and header commit by @therault in https://github.com/ICLDisco/parsec/pull/337
* Removing hard flex/bison dependency: only devs need to run the parser by @abouteiller in https://github.com/ICLDisco/parsec/pull/335
* Hicma/recursive by @bosilca in https://github.com/ICLDisco/parsec/pull/328
* Fix/deprecated support by @bosilca in https://github.com/ICLDisco/parsec/pull/362
* Add the filename to the generated profiling event name. by @bosilca in https://github.com/ICLDisco/parsec/pull/359
* Fix atomics on macosX not working properly (missing header) by @abouteiller in https://github.com/ICLDisco/parsec/pull/356
* Remove never compiled in '64bit' lifo implementation by @abouteiller in https://github.com/ICLDisco/parsec/pull/360
* Fix/many small updates by @bosilca in https://github.com/ICLDisco/parsec/pull/363
* Make the ParsecCompilerFlags.cmake self contained by @abouteiller in https://github.com/ICLDisco/parsec/pull/364
* Profiling fix: parsec_init(NULL, NULL) by @therault in https://github.com/ICLDisco/parsec/pull/339
* GitHub runner with spack by @bosilca in https://github.com/ICLDisco/parsec/pull/333
* Update PAPI SDE to fit the current API by @therault in https://github.com/ICLDisco/parsec/pull/365
* ucontext is not supported on OSX. by @bosilca in https://github.com/ICLDisco/parsec/pull/366
* recursive cb type was not correct by @abouteiller in https://github.com/ICLDisco/parsec/pull/368
* Since new policy, setting the non-cache variable creates an empty cache by @abouteiller in https://github.com/ICLDisco/parsec/pull/367
* Do now allow spack to be updated automatically. by @bosilca in https://github.com/ICLDisco/parsec/pull/375
* flex: on some machines, flex cannot work if parsec/utils is not created by @abouteiller in https://github.com/ICLDisco/parsec/pull/374
* Attempt to backport the revamp of the communication engine by @devreal in https://github.com/ICLDisco/parsec/pull/380
* Respect DISTDIR is provided. by @bosilca in https://github.com/ICLDisco/parsec/pull/383
* [RFC] profiling tools: more efficient cross-stream event matching by @omor1 in https://github.com/ICLDisco/parsec/pull/372
* Hash table: count used buckets only when needed by @devreal in https://github.com/ICLDisco/parsec/pull/379
* Print the debug rank from device_show_statistics by @abouteiller in https://github.com/ICLDisco/parsec/pull/386
* Handle error in CUDA/HIP module init and configurable max_streams by @therault in https://github.com/ICLDisco/parsec/pull/351
* Update to a newer spack compiler by @bosilca in https://github.com/ICLDisco/parsec/pull/392
* Make the PUSHOUT and other DTD GPU concepts generic by @abouteiller in https://github.com/ICLDisco/parsec/pull/387
* Workaround current CUDA/HIP "solution suspicious" bug... by @therault in https://github.com/ICLDisco/parsec/pull/381
* dtd_bench_simple_gemm.c relies on non-standard cblas.h file by @therault in https://github.com/ICLDisco/parsec/pull/317
* profiling tools: improve large buffer performance by @omor1 in https://github.com/ICLDisco/parsec/pull/390
* Add a load-balancing skew so that we favor locality up to a configurable limit by @abouteiller in https://github.com/ICLDisco/parsec/pull/389
* Fix redistribute wrapper by @devreal in https://github.com/ICLDisco/parsec/pull/395
* [BBT#509] Dtd cuda with new by @therault in https://github.com/ICLDisco/parsec/pull/318
* Only enable CUDA language if supported. by @bosilca in https://github.com/ICLDisco/parsec/pull/404
* [BBT#572] Implement hash table API providing a key handle during lock by @devreal in https://github.com/ICLDisco/parsec/pull/307
* profiling tools: fix format for padded structures by @omor1 in https://github.com/ICLDisco/parsec/pull/402
* Don't pass the execution stream around to recursive calls. by @bosilca in https://github.com/ICLDisco/parsec/pull/413
* Protect the inline functions by they device support. by @bosilca in https://github.com/ICLDisco/parsec/pull/415
* Allow the DSL to provide a task_snprintf function and use it when displaying the DOT by @devreal in https://github.com/ICLDisco/parsec/pull/409
* Execution stream keeps the highest priority task for local execution. by @bosilca in https://github.com/ICLDisco/parsec/pull/399
* TTG/termdet by @therault in https://github.com/ICLDisco/parsec/pull/391
* More data transfer statistics between devices by @therault in https://github.com/ICLDisco/parsec/pull/426
* Remove startup tasks from DTD by @therault in https://github.com/ICLDisco/parsec/pull/425
* tools/profiling: PTT v2 by @omor1 in https://github.com/ICLDisco/parsec/pull/418
* Fix/more warnings by @bosilca in https://github.com/ICLDisco/parsec/pull/416
* Fix possible access to unowned memory in `parsec_dtd_task_class_add_chore` by @DSMishler in https://github.com/ICLDisco/parsec/pull/427
* fix remote_dep_mpi.c by @cflinto in https://github.com/ICLDisco/parsec/pull/429
* bugfix for profiling without multiprocessing by @DSMishler in https://github.com/ICLDisco/parsec/pull/432
* Project_dyn test missing libm dependency by @abouteiller in https://github.com/ICLDisco/parsec/pull/433
* Ttg/termdet dynamic PTG by @therault in https://github.com/ICLDisco/parsec/pull/430
* Update API versions in examples by @abouteiller in https://github.com/ICLDisco/parsec/pull/435
* Make sure PaRSEC compiles for all rwlock implementations. by @bosilca in https://github.com/ICLDisco/parsec/pull/434
* comm: set parsec_tls_execution_stream in comm thread by @omor1 in https://github.com/ICLDisco/parsec/pull/422
* Use normal for loops to iterate over local index variables when they are a range. by @therault in https://github.com/ICLDisco/parsec/pull/329
* [BBT#559] Add llp scheduler: local lifo with priorities by @devreal in https://github.com/ICLDisco/parsec/pull/325
* [BBT#536] Add AMD RoCM/HIP device by @abouteiller in https://github.com/ICLDisco/parsec/pull/315
* TSL variables are not static by default. by @bosilca in https://github.com/ICLDisco/parsec/pull/437
* Allow overwriting of the completion and enqueue callbacks. by @bosilca in https://github.com/ICLDisco/parsec/pull/439
* Bring back support for changing the PaRSEC communicator by @bosilca in https://github.com/ICLDisco/parsec/pull/401
* Fix: termination detector race condition by @therault in https://github.com/ICLDisco/parsec/pull/438
* Fixes/tls in comm thread by @therault in https://github.com/ICLDisco/parsec/pull/440
* profiling: disambiguation between certain MPI events by @omor1 in https://github.com/ICLDisco/parsec/pull/376
* TTG/building system by @therault in https://github.com/ICLDisco/parsec/pull/424
* A small script to help us find which files might need copyright update. by @therault in https://github.com/ICLDisco/parsec/pull/442
* Remove dependency on argv[0] by @abouteiller in https://github.com/ICLDisco/parsec/pull/445
* Make configurable the treshold for warning about wrong binding on by @abouteiller in https://github.com/ICLDisco/parsec/pull/453
* Remove some warnings about unused variables by @abouteiller in https://github.com/ICLDisco/parsec/pull/451
* Fix statistics management with multiple GPU. by @bosilca in https://github.com/ICLDisco/parsec/pull/454
* Debug output about reshapping was too verbose by @abouteiller in https://github.com/ICLDisco/parsec/pull/456
* Topic/update spack by @bosilca in https://github.com/ICLDisco/parsec/pull/462
* Increase function name length limit in debug output by @devreal in https://github.com/ICLDisco/parsec/pull/463
* initial flags modification PR. Open to review. by @DSMishler in https://github.com/ICLDisco/parsec/pull/461
* Introduce parsec_taskpool_wait and parsec_taskpool_test by @devreal in https://github.com/ICLDisco/parsec/pull/411
* Fix profile dtd by @therault in https://github.com/ICLDisco/parsec/pull/475
* Correctly identify the Intel compiler. by @bosilca in https://github.com/ICLDisco/parsec/pull/472
* Proper case and imported target for hwloc by @abouteiller in https://github.com/ICLDisco/parsec/pull/457
* Use the profiling key macros by @bosilca in https://github.com/ICLDisco/parsec/pull/476
* No more tag collisions. by @bosilca in https://github.com/ICLDisco/parsec/pull/477
* profiling: fix multiple EXEC_END events in some circumstances by @omor1 in https://github.com/ICLDisco/parsec/pull/421
* DSL profiling by @therault in https://github.com/ICLDisco/parsec/pull/469
* Put END_C_DECLS at the end of device_cuda.h by @devreal in https://github.com/ICLDisco/parsec/pull/480
* Drop support for python2. by @bosilca in https://github.com/ICLDisco/parsec/pull/482
* Hotfix dtd dsl and warning by @therault in https://github.com/ICLDisco/parsec/pull/488
* Use pip to build and install the python support. by @bosilca in https://github.com/ICLDisco/parsec/pull/489
* Termdet callback order by @therault in https://github.com/ICLDisco/parsec/pull/494
* Fix the branching test in distributed by @therault in https://github.com/ICLDisco/parsec/pull/495
* Fix profiling tests by @therault in https://github.com/ICLDisco/parsec/pull/490
* Log all types of runtime-system events in task_profiler, and ensure that all time of compute threads is accounted for by @devreal in https://github.com/ICLDisco/parsec/pull/410
* Fix missing loop counter increment in pins task profiler by @devreal in https://github.com/ICLDisco/parsec/pull/498
* Close files between operations. by @bosilca in https://github.com/ICLDisco/parsec/pull/484
* Make parsec_data_t::device_copies a flexible array member by @devreal in https://github.com/ICLDisco/parsec/pull/499
* Fix an issue about GPU statistics (Issue#505) by @QingleiCao in https://github.com/ICLDisco/parsec/pull/506
* Do not cancel persistent requests with cray-mpich, it is broken. by @abouteiller in https://github.com/ICLDisco/parsec/pull/519
* Make sure data_in is not NULL on GPU when accessing nb_elts by @QingleiCao in https://github.com/ICLDisco/parsec/pull/492
* volatile uint32 is not always a valid type for MPI_SUM by @abouteiller in https://github.com/ICLDisco/parsec/pull/524
* Profiling corrected. by @josephjohnjj in https://github.com/ICLDisco/parsec/pull/526
* Remove all options that are not prefixed by --mca by @abouteiller in https://github.com/ICLDisco/parsec/pull/447
* Implement a universal hash function by @devreal in https://github.com/ICLDisco/parsec/pull/528
* Can't use NULL to pass-in MPI datatypes with MPICH derivatives by @abouteiller in https://github.com/ICLDisco/parsec/pull/518
* Update to the SDE available in PAPI-7.0.1 by @therault in https://github.com/ICLDisco/parsec/pull/522
* More flexible paranoid checks in data_dist/matrix implementations by @therault in https://github.com/ICLDisco/parsec/pull/530
* Handle the tertiary case for startup tasks. by @bosilca in https://github.com/ICLDisco/parsec/pull/508
* Fix some errors with ctest for profiling by @abouteiller in https://github.com/ICLDisco/parsec/pull/523
* pip install --prefix has version-dependent behaviors by @therault in https://github.com/ICLDisco/parsec/pull/532
* Fix old typo in termdet modules management: by @therault in https://github.com/ICLDisco/parsec/pull/533
* Allow for tag registration/deregistration any time. by @bosilca in https://github.com/ICLDisco/parsec/pull/521
* Topic/device naming by @bosilca in https://github.com/ICLDisco/parsec/pull/493
* Gpu workspace fix by @therault in https://github.com/ICLDisco/parsec/pull/510
* Fix bug: some scenarios would call a nullptr function in profiling by @therault in https://github.com/ICLDisco/parsec/pull/535
* Profiling hotfix: error in python3 when passing {char[64]} as conversion type by @therault in https://github.com/ICLDisco/parsec/pull/536
* Limit the number of recv requests to ensure there is space for sends by @devreal in https://github.com/ICLDisco/parsec/pull/538
* Flex flags bugfix and python venv log by @abouteiller in https://github.com/ICLDisco/parsec/pull/537
* Fix hash function generation for PTG by @therault in https://github.com/ICLDisco/parsec/pull/479
* Small warnings from clang14 by @abouteiller in https://github.com/ICLDisco/parsec/pull/542
* Print -wflags found only when results differ from cache by @abouteiller in https://github.com/ICLDisco/parsec/pull/540
* Fix the start/stop test. by @bosilca in https://github.com/ICLDisco/parsec/pull/546
* Gpu fix load balancing by @therault in https://github.com/ICLDisco/parsec/pull/517
* Tear-down parsec_ce after high-level communication by @devreal in https://github.com/ICLDisco/parsec/pull/549
* relabel mislabelled tests by @abouteiller in https://github.com/ICLDisco/parsec/pull/550
* Find the right spack environment. by @bosilca in https://github.com/ICLDisco/parsec/pull/559
* Add device async/again support. by @bosilca in https://github.com/ICLDisco/parsec/pull/544
* hotfix by @therault in https://github.com/ICLDisco/parsec/pull/562
* Profiling and PAPI SDE updates by @therault in https://github.com/ICLDisco/parsec/pull/565
* Fix the management of GPU copies. by @bosilca in https://github.com/ICLDisco/parsec/pull/563
* cmake -> CMAKE_COMMAND by @evaleev in https://github.com/ICLDisco/parsec/pull/567
* Correct the logic for passing over CPU/RECURSIVE devices by @abouteiller in https://github.com/ICLDisco/parsec/pull/557
* HAVE_PEER_ACCESS is always present in all relevant versions of CUDA or by @abouteiller in https://github.com/ICLDisco/parsec/pull/572
* CUDA: disable timer support on cuda events by @devreal in https://github.com/ICLDisco/parsec/pull/576
* Pick a stable spack branch by @bosilca in https://github.com/ICLDisco/parsec/pull/578
* Add an option to skip HWLOC compat run by @bosilca in https://github.com/ICLDisco/parsec/pull/582
* Allow locals and parameters to be defined via CMake. by @bosilca in https://github.com/ICLDisco/parsec/pull/583
* Prevent race condition in accelerator copies management by @bosilca in https://github.com/ICLDisco/parsec/pull/575
* Fix/rocm detect and unknown device warning by @abouteiller in https://github.com/ICLDisco/parsec/pull/577
* [python] do not build python support unless pandas is available by @evaleev in https://github.com/ICLDisco/parsec/pull/584
* Refactor GPU device to increase code factorization between the devices. by @therault in https://github.com/ICLDisco/parsec/pull/570
* Disable recursive device by default (temporarily) by @abouteiller in https://github.com/ICLDisco/parsec/pull/585
* remote_dep: rotate bcast topology around root by @omor1 in https://github.com/ICLDisco/parsec/pull/481
* Discover atomic support for __int128_t. by @bosilca in https://github.com/ICLDisco/parsec/pull/587
* Fix warnings. by @bosilca in https://github.com/ICLDisco/parsec/pull/591
* Typo in level zero component by @therault in https://github.com/ICLDisco/parsec/pull/592
* dbpreader: missing corner case in cache building by @therault in https://github.com/ICLDisco/parsec/pull/593
* Skip profiling for task classes without profiling information by @DSMishler in https://github.com/ICLDisco/parsec/pull/594
* cmake logic fixes for level zero and half-installed systems by @therault in https://github.com/ICLDisco/parsec/pull/596
* l0: dpccpp can't create output files in the build dir if the enclosing by @abouteiller in https://github.com/ICLDisco/parsec/pull/597
* gpu: some errors introduced during gpu despecialization caused deadlocks by @abouteiller in https://github.com/ICLDisco/parsec/pull/598
* ze: the queue need to be reset when task completes by @abouteiller in https://github.com/ICLDisco/parsec/pull/599
* Fixes for GPU memory oversubscription by @devreal in https://github.com/ICLDisco/parsec/pull/602
* If the DSL defines a task_snprintf function, use that function. by @therault in https://github.com/ICLDisco/parsec/pull/603
* Make sure the w2r task has a stageout set by @devreal in https://github.com/ICLDisco/parsec/pull/604
* Consistently use size_t for nb_elts in data and flows by @devreal in https://github.com/ICLDisco/parsec/pull/605
* Minor cleanup of the DTD parameters manipulation. by @bosilca in https://github.com/ICLDisco/parsec/pull/609
* Fix the parsec_future_t. by @bosilca in https://github.com/ICLDisco/parsec/pull/608
* Topic/add evaluate keyword by @bosilca in https://github.com/ICLDisco/parsec/pull/569
* Relative path, symbolic links, and python examples by @therault in https://github.com/ICLDisco/parsec/pull/606
* Update process_name.c. Fixes the issue #610 by @bimalgaudel in https://github.com/ICLDisco/parsec/pull/611
* CMAKE: Bring back checks for atomic CAS by @devreal in https://github.com/ICLDisco/parsec/pull/595
* HOTFIX: make the default number of devices be all the devices seen by… by @therault in https://github.com/ICLDisco/parsec/pull/613
* bugfix: in dtd sometimes the cpu incarnation and gpu incarnations are by @abouteiller in https://github.com/ICLDisco/parsec/pull/616
* Re-enable CI tests for cuda caps as they now work again. by @abouteiller in https://github.com/ICLDisco/parsec/pull/614
* Add capability of saving GPU statistics and printing diff vs saved stats by @abouteiller in https://github.com/ICLDisco/parsec/pull/558
* Fix the argument _NSGetExecutablePath. by @bosilca in https://github.com/ICLDisco/parsec/pull/620
* Add a context-level query capability. by @bosilca in https://github.com/ICLDisco/parsec/pull/621
* Prevent CI from running OOM when oversubscribing GPUs by @abouteiller in https://github.com/ICLDisco/parsec/pull/629
* Fix CUDA protection macro use by @bosilca in https://github.com/ICLDisco/parsec/pull/632
* Move comm profiling initialization into comm thread by @devreal in https://github.com/ICLDisco/parsec/pull/626
* Cleanup/cosmetics by @abouteiller in https://github.com/ICLDisco/parsec/pull/631
* Alternative solution to the CI problem with GPUs by @therault in https://github.com/ICLDisco/parsec/pull/633
* ci: all tests must use parsec_addtest by @abouteiller in https://github.com/ICLDisco/parsec/pull/635
* [BBT#237] Allow sender to send data of any size. by @bosilca in https://github.com/ICLDisco/parsec/pull/321
* bugfix: dtd taskpool destructor should work symetric to contructor by @abouteiller in https://github.com/ICLDisco/parsec/pull/637
* fixes memory leaks by @BrieucNicolas in https://github.com/ICLDisco/parsec/pull/639
* Fix the lack of direct GPU to GPU communications in multi-device runs. by @therault in https://github.com/ICLDisco/parsec/pull/642
* Compute CPU and GPU versions without lying during kernel epilog (enable TTG/PTG versioning to coexist) by @abouteiller in https://github.com/ICLDisco/parsec/pull/648
* Initialize the parsec's HWLOC subsystem before starting threads. by @abouteiller in https://github.com/ICLDisco/parsec/pull/650
* Fix overflow when calling parsec_data_create by @QingleiCao in https://github.com/ICLDisco/parsec/pull/646
* cmakery: let find_package find HIP v6 by @abouteiller in https://github.com/ICLDisco/parsec/pull/652
* Fix computation of available memory on gpu (avoid truncation and conversions) by @abouteiller in https://github.com/ICLDisco/parsec/pull/651
* bugfix: when hip is not found, its ok. by @abouteiller in https://github.com/ICLDisco/parsec/pull/656
* Add sanity check for free memory by @devreal in https://github.com/ICLDisco/parsec/pull/658
* Explicit message when outputing the warning about being unable to allocate memory in GPU code by @therault in https://github.com/ICLDisco/parsec/pull/655
* config: osx would not find bison on newer fink/brew by @abouteiller in https://github.com/ICLDisco/parsec/pull/657
* Consolidated error handling when GPU only tests execute on CPU systems by @abouteiller in https://github.com/ICLDisco/parsec/pull/644
* Add the number of copies evicted in the statistics of the devices. by @therault in https://github.com/ICLDisco/parsec/pull/666
* Fix use of calloc. by @bosilca in https://github.com/ICLDisco/parsec/pull/669
* Add: mca control for cpu load balancing (and don't report Gflops figures for cpus we can't determine) by @abouteiller in https://github.com/ICLDisco/parsec/pull/663
* Suffix-increment is deprecated on volatile variables in C++ by @devreal in https://github.com/ICLDisco/parsec/pull/674
* show-caps: don't report flops for unknown cuda devs, report peer access by @abouteiller in https://github.com/ICLDisco/parsec/pull/672
* Apply does not release user-defined memory by @QingleiCao in https://github.com/ICLDisco/parsec/pull/676
* Release lock in create_w2r_task if readers are readers are not zero by @devreal in https://github.com/ICLDisco/parsec/pull/678
* w2r task should unlock the lock if readers are not 0 by @devreal in https://github.com/ICLDisco/parsec/pull/682
* Refactored CI by @G-Ragghianti in https://github.com/ICLDisco/parsec/pull/667
* C11 atomic lock alignment in data_t by @abouteiller in https://github.com/ICLDisco/parsec/pull/685
* The device task is now released by the DSL by @bosilca in https://github.com/ICLDisco/parsec/pull/688
* Reenable the memory eviction code by @bosilca in https://github.com/ICLDisco/parsec/pull/679
* List ordered push: search from back if lower than pivot by @devreal in https://github.com/ICLDisco/parsec/pull/693
* Add an icl platform file, move saturn platform to legacy by @abouteiller in https://github.com/ICLDisco/parsec/pull/692
* Contrib/copycheck by @abouteiller in https://github.com/ICLDisco/parsec/pull/574
* bugfix: dtd would not run cpu hooks when compiled with cuda by @abouteiller in https://github.com/ICLDisco/parsec/pull/697
* v4.0.2411 changelog by @abouteiller in https://github.com/ICLDisco/parsec/pull/699
* Fix a race condition in DTD for the local termdet by @therault in https://github.com/ICLDisco/parsec/pull/698
* Bring back support for MPI allow_overtake by @bosilca in https://github.com/ICLDisco/parsec/pull/704
* Fix function name for parsec_atomic_fetch_add_int64 by @devreal in https://github.com/ICLDisco/parsec/pull/705

### New Contributors

* @omor1 made their first contribution in https://github.com/ICLDisco/parsec/pull/320
* @devreal made their first contribution in https://github.com/ICLDisco/parsec/pull/343
* @DSMishler made their first contribution in https://github.com/ICLDisco/parsec/pull/427
* @cflinto made their first contribution in https://github.com/ICLDisco/parsec/pull/429
* @QingleiCao made their first contribution in https://github.com/ICLDisco/parsec/pull/506
* @josephjohnjj made their first contribution in https://github.com/ICLDisco/parsec/pull/526
* @evaleev made their first contribution in https://github.com/ICLDisco/parsec/pull/567
* @bimalgaudel made their first contribution in https://github.com/ICLDisco/parsec/pull/611
* @BrieucNicolas made their first contribution in https://github.com/ICLDisco/parsec/pull/639
* @G-Ragghianti made their first contribution in https://github.com/ICLDisco/parsec/pull/667

**Full Changelog**: https://github.com/ICLDisco/parsec/commits/parsec-4.0.2411
Published by abouteiller over 1 year ago
parsec - v3.0.2012
PaRSEC 20.12 (December 2020)
PaRSEC API 3.0
PaRSEC now requires CMake 3.16.
New configure system to ease the installation of PaRSEC. See INSTALL for details. This system automates installation on most DOE leadership systems.
Split DPLASMA and PaRSEC into separate repositories. PaRSEC moves from cmake-2.0 to cmake-3.12, using targets. Targets are exported for third-party integration.
Add visualization tools to extract user-defined properties from the application (see: PR 229 visualization-tools)
Automate expression of required data transfers from host-to-device and device-to-host to satisfy dependencies (and anti-dependencies). PaRSEC tracks multiple versions of the same data as data copies with a coherency algorithm that initiates data transfers as needed. The heuristic for the eviction policy when the GPU runs out of memory has been optimized to allow for efficient operation on problems larger than GPU memory.
Add support for MPI out-of-order matching capabilities. Add capability for compute threads to send direct control messages to indicate completion of tasks to remote nodes (without delegation to the communication thread).
Remove communication mode EAGER from the runtime. It had a hard-to-correct bug that would occasionally deadlock, and the performance benefit was small.
Add a Map operator on the Block Cyclic matrix data collection that performs in-place data transformation on the collection with a user provided operator.
Add support in the runtime for user-defined properties evaluated at runtime and easy to export through a shared memory region (see: PR 229 visualization-tools)
Add a PAPI-SDE interface to the parsec library, to expose internal counters via the PAPI-Software Defined Events interface.
Add backend support for OTF2 in the profiling mechanism. OTF2 is used automatically if an OTF2 installation is found.
Add an MCA parameter to control the number of ejected blocks from GPU memory (`device_cuda_max_number_of_ejected_data`). Add an MCA parameter to control whether or not the GPU engine will take some time to sort the first N tasks of the pending queue (`device_cuda_sort_pending_list`).
Simplify the user's view of PaRSEC: users only have to include a single header (parsec.h) for most usages, and link with a single library (-lparsec).
Update the PaRSEC DSL handling of initial tasks. We now rely on 2 pieces of information: the number of DSL tasks, and the number of tasks imposed by the system (all types of data transfer).
Add a purely local scheduler (ll), that uses a single LIFO per thread. Each schedule operation does 1 atomic (push in local queue), each select operation does up to t atomics (pop in local queue, then try any other thread's queue until they are all tested empty).
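The schedule/select behavior described for the `ll` scheduler can be illustrated with a small single-threaded model. This is only a sketch of the idea, not the PaRSEC implementation: the names (`lifo_t`, `ll_schedule`, `ll_select`), the fixed capacities, and the round-robin order in which other queues are visited are all assumptions, and the atomic operations the real scheduler relies on are replaced by plain array accesses.

```c
#include <stddef.h>

#define NB_THREADS 4   /* hypothetical number of compute threads */
#define QUEUE_CAP  64  /* hypothetical per-thread queue capacity */

/* One LIFO per thread; tasks are modeled as non-negative integer ids. */
typedef struct {
    int    tasks[QUEUE_CAP];
    size_t top;  /* number of tasks currently stored */
} lifo_t;

static lifo_t queues[NB_THREADS];

/* schedule: push onto the calling thread's own LIFO (one operation).
 * Returns 0 on success, -1 if the local queue is full. */
static int ll_schedule(int thread_id, int task)
{
    lifo_t *q = &queues[thread_id];
    if (q->top == QUEUE_CAP) return -1;
    q->tasks[q->top++] = task;
    return 0;
}

/* select: try the local LIFO first, then each other thread's LIFO once,
 * so at most NB_THREADS pop attempts. Returns a task id, or -1 if every
 * queue was found empty. */
static int ll_select(int thread_id)
{
    for (int i = 0; i < NB_THREADS; i++) {
        lifo_t *q = &queues[(thread_id + i) % NB_THREADS];
        if (q->top > 0) return q->tasks[--q->top];
    }
    return -1;
}
```

In this model a thread that scheduled two tasks gets the most recent one back first, and an idle thread picks up the remaining one from its neighbor's queue.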
Add a --ignore-properties=... option to parsec_ptgpp
Change API of hash tables: allow keys of arbitrary size. The API features how to build a key from a task; how to hash a key into 1 <= N <= 64 bits; and how to compare two keys (plus a printing function to debug).
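To make the four roles concrete (build a key from a task, hash it into N bits, compare two keys, print a key), here is a hedged sketch of what a set of user-provided key functions could look like. None of these names come from the PaRSEC headers: `demo_task_t`, `demo_key_t`, `demo_key_fn_t`, and the FNV-style mixing are assumptions chosen for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical task with three integer locals. */
typedef struct { int32_t m, n, k; } demo_task_t;

/* Hypothetical key: 96 bits, so it does not fit in a single 64-bit word. */
typedef struct { int32_t m, n, k; } demo_key_t;

/* Hypothetical bundle of key functions a hash table could be given. */
typedef struct {
    int      (*equal)(const void *a, const void *b);
    uint64_t (*hash)(const void *key, int nb_bits);
    int      (*print)(char *buf, size_t len, const void *key);
} demo_key_fn_t;

/* Build a key from a task. */
static demo_key_t demo_make_key(const demo_task_t *t)
{
    demo_key_t key = { t->m, t->n, t->k };
    return key;
}

/* Compare two keys; returns non-zero when they are equal. */
static int demo_equal(const void *a, const void *b)
{
    const demo_key_t *x = a, *y = b;
    return x->m == y->m && x->n == y->n && x->k == y->k;
}

/* Hash a key into nb_bits bits (1 <= nb_bits <= 64), using an
 * FNV-1a-style mix applied per field. */
static uint64_t demo_hash(const void *key, int nb_bits)
{
    const demo_key_t *k = key;
    const int32_t fields[3] = { k->m, k->n, k->k };
    uint64_t h = 0xcbf29ce484222325ULL;
    for (int i = 0; i < 3; i++) {
        h ^= (uint32_t)fields[i];
        h *= 0x100000001b3ULL;
    }
    return nb_bits >= 64 ? h : (h & ((UINT64_C(1) << nb_bits) - 1));
}

/* Print a key for debugging; returns what snprintf returns. */
static int demo_print(char *buf, size_t len, const void *key)
{
    const demo_key_t *k = key;
    return snprintf(buf, len, "(%d, %d, %d)", (int)k->m, (int)k->n, (int)k->k);
}

static const demo_key_fn_t demo_key_fns = { demo_equal, demo_hash, demo_print };
```

Passing the requested number of bits to the hash function lets the table resize without rehashing through a different function; the key itself stays opaque to the table.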
Change behavior of DEBUG_HISTORY: log all information inside a buffer of fixed size (MCA parameter) per thread, do not allocate memory during logging, and use timestamp to re-order output when the user calls dump()
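The logging scheme described above (a fixed-size buffer per thread, no allocation while logging, timestamps used to re-order at dump time) can be sketched as a set of ring buffers. This is an illustrative model under stated assumptions, not the PaRSEC code: the names, the buffer sizes, and the use of a simple counter in place of a real timestamp are all hypothetical, and thread safety is left out.

```c
#include <stdlib.h>
#include <string.h>

#define NB_THREADS  2   /* hypothetical number of threads */
#define HISTORY_LEN 4   /* hypothetical entries kept per thread */
#define MSG_LEN     32  /* hypothetical maximum message length */

typedef struct {
    unsigned long stamp;   /* when the entry was logged */
    int           thread;  /* which thread logged it */
    char          msg[MSG_LEN];
} entry_t;

/* Fixed-size ring per thread: logging never allocates memory. */
typedef struct {
    entry_t       ring[HISTORY_LEN];
    unsigned long count;  /* total entries ever logged by this thread */
} history_t;

static history_t     histories[NB_THREADS];
static unsigned long logical_clock;  /* stand-in for a real timestamp */

/* Log into the calling thread's ring, overwriting its oldest entry. */
static void history_log(int thread, const char *msg)
{
    history_t *h = &histories[thread];
    entry_t *e = &h->ring[h->count % HISTORY_LEN];
    e->stamp = ++logical_clock;
    e->thread = thread;
    strncpy(e->msg, msg, MSG_LEN - 1);
    e->msg[MSG_LEN - 1] = '\0';
    h->count++;
}

static int by_stamp(const void *a, const void *b)
{
    const entry_t *x = a, *y = b;
    return (x->stamp > y->stamp) - (x->stamp < y->stamp);
}

/* Dump: copy the surviving entries of every thread into out (capacity
 * NB_THREADS * HISTORY_LEN) and order them by timestamp.
 * Returns the number of entries written. */
static size_t history_dump(entry_t *out)
{
    size_t n = 0;
    for (int t = 0; t < NB_THREADS; t++) {
        unsigned long kept = histories[t].count < HISTORY_LEN
                           ? histories[t].count : HISTORY_LEN;
        for (unsigned long i = 0; i < kept; i++)
            out[n++] = histories[t].ring[i];
    }
    qsort(out, n, sizeof(entry_t), by_stamp);
    return n;
}
```

The cost of merging and sorting is paid only when the user asks for a dump, which keeps the logging path itself cheap.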
DTD interface is updated (new flag to send pointer as parameter, unpacking of parameters is simpler, etc.).
DTD provides an MCA parameter (`dtd_debug_verbose`) to print information about traversal of the DAG in a separate output stream from the default.
Published by abouteiller over 4 years ago