Recent Releases of mscclpp
mscclpp - MSCCL++ v0.7.0
What's Changed
- Move pipeline to official org by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/406
- Disable CuMemMap check for ROCm by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/411
- NVLS support for NCCL API by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/410
- Supporting multi-node executors in NCCL API by @caiomcbr in https://github.com/microsoft/mscclpp/pull/412
- Fix synchronization in allreduce8 kernel by @dsidler in https://github.com/microsoft/mscclpp/pull/407
- Add ncclBcast / ncclBroadcast support by @SreevatsaAnantharamu in https://github.com/microsoft/mscclpp/pull/419
- Update README by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/414
- Fix nccl-test failure issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/421
- Tackle build warnings by @chhwang in https://github.com/microsoft/mscclpp/pull/422
- trigger ci for release branches by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/426
- Fix CI trigger issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/428
- Fix typos in the pipeline by @chhwang in https://github.com/microsoft/mscclpp/pull/420
- Update version number by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/433
- Enhance the nccl error message handling by @seagater in https://github.com/microsoft/mscclpp/pull/434
- [NPKIT] Adding the NPKIT support for kernel allreduce7 in mscclpp-nccl by @PedramAlizadeh in https://github.com/microsoft/mscclpp/pull/399
- Fix azure pipeline by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/437
- Add GpuBuffer class by @chhwang in https://github.com/microsoft/mscclpp/pull/423
- Fix CMake build messages by @chhwang in https://github.com/microsoft/mscclpp/pull/443
- Flushing Proxy Channels at CPU side upon reaching the Inflight Request Limit by @caiomcbr in https://github.com/microsoft/mscclpp/pull/415
- Fix Python binding of exceptions by @chhwang in https://github.com/microsoft/mscclpp/pull/444
- Auto-update version numbers in CMakeLists.txt by @chhwang in https://github.com/microsoft/mscclpp/pull/450
- Resolve cuMemMap error by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/451
- Manage runtime environments by @chhwang in https://github.com/microsoft/mscclpp/pull/452
- Lazily create streams for CudaIpcConnection by @chhwang in https://github.com/microsoft/mscclpp/pull/449
- Fix PR #449 by @chhwang in https://github.com/microsoft/mscclpp/pull/453
- Merge mscclpp-lang to mscclpp project by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/442
- Renaming channels by @chhwang in https://github.com/microsoft/mscclpp/pull/436
- Add multi-nodes example & update doc by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/455
- Adjusting BFS to seek circular dependencies in the msccl-tools DAG by @caiomcbr in https://github.com/microsoft/mscclpp/pull/459
- remove unnecessary sync by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/461
- Support ReduceScatter in the NCCL interface by @caiomcbr in https://github.com/microsoft/mscclpp/pull/460
- Updating MSCCLLang Examples by @caiomcbr in https://github.com/microsoft/mscclpp/pull/462
- Disable channel cache by @seagater in https://github.com/microsoft/mscclpp/pull/463
- Adjusting AllGather Collective in MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/466
- Adding Read Put Packet operation at Executor by @caiomcbr in https://github.com/microsoft/mscclpp/pull/441
- NPKit Support to Read Put Packet Operation by @caiomcbr in https://github.com/microsoft/mscclpp/pull/471
- Adjust NPKit IB Event by @caiomcbr in https://github.com/microsoft/mscclpp/pull/472
- Fix minor typos and errors in documentation by @RyoYang in https://github.com/microsoft/mscclpp/pull/474
- Improving Get Operation at MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/475
- Fix memory OOM issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/479
- Mark mscclpp-test as deprecated in the doc by @chhwang in https://github.com/microsoft/mscclpp/pull/478
- Update allgather fallback algo by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/476
- Add min operation for allreduce by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/481
- NCCL API CI Test for ReduceScatter by @caiomcbr in https://github.com/microsoft/mscclpp/pull/465
- Fix correctness issue when mscclppDisableChannelCache set to true by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/483
- nccl/rccl integration by @seagater in https://github.com/microsoft/mscclpp/pull/469
- Fix reduceMin failure issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/486
- Reduce Operation Support to the Executor by @caiomcbr in https://github.com/microsoft/mscclpp/pull/484
- Add CI test for fallback allgather, allreduce, broadcast and reducescatter to NCCL operations by @seagater in https://github.com/microsoft/mscclpp/pull/485
- Remove the requirement for CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_FABRIC_SUPPORTED for NVLS support by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/489
- Add CUDA 12.8 images by @chhwang in https://github.com/microsoft/mscclpp/pull/488
- Add a devcontainer configuration by @chhwang in https://github.com/microsoft/mscclpp/pull/490
- Fix CMake installation in Dockerfile for arm64 by @chhwang in https://github.com/microsoft/mscclpp/pull/491
- Export mscclpp GpuBuffer to dlpack format by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/492
- Fix the virtual address mapping issue of cuMemMap in fallback code by @seagater in https://github.com/microsoft/mscclpp/pull/501
- Improve signal/wait performance and fix barrier issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/499
- Fix performance issue introduced in PR #499 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/505
- Add flag to disable nvls by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/500
- Optimized allreduce fallback for ~10KB sizes by @chhwang in https://github.com/microsoft/mscclpp/pull/506
- Automatic creation of Scratch Buffer at MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/510
- Use implicit ctors for default device ctors by @chhwang in https://github.com/microsoft/mscclpp/pull/512
- apps/nccl: fix a bug in allreduce kernels for graph mode by @nusislam in https://github.com/microsoft/mscclpp/pull/502
- Revised MemoryChannel interfaces by @chhwang in https://github.com/microsoft/mscclpp/pull/508
- Fix #508 by @chhwang in https://github.com/microsoft/mscclpp/pull/515
- Add NVLS based fallback algo by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/507
- Enhance Collective Check at MSCCLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/511
- Support ibv_reg_dmabuf_mr for buffer allocated by cuMemMalloc by @seagater in https://github.com/microsoft/mscclpp/pull/513
- Fix the issue of echo message for nccl fallback in CI test by @seagater in https://github.com/microsoft/mscclpp/pull/520
- Asynchronous setup by @chhwang in https://github.com/microsoft/mscclpp/pull/514
- Adding maxSpinCount to port channel flush by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/518
- Fix device assert by @chhwang in https://github.com/microsoft/mscclpp/pull/522
- Fix #514 by @chhwang in https://github.com/microsoft/mscclpp/pull/521
- Add a CMake option MSCCLPP_GPU_ARCHS by @chhwang in https://github.com/microsoft/mscclpp/pull/525
- Update citations by @chhwang in https://github.com/microsoft/mscclpp/pull/524
- Set Up a CI Pipeline for H100 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/526
- Properly setting up the device in Ethernet Connection by @caiomcbr in https://github.com/microsoft/mscclpp/pull/527
- Add device semaphore API by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/523
- Address NVCC warning #20012-D by @chhwang in https://github.com/microsoft/mscclpp/pull/528
- Rename ChannelTrigger fields and check field values in debug builds by @chhwang in https://github.com/microsoft/mscclpp/pull/529
- DLPack fixes by @chhwang in https://github.com/microsoft/mscclpp/pull/537
- Improved documentation & minor interface revision by @chhwang in https://github.com/microsoft/mscclpp/pull/541
- Use a stream pool for gpuCalloc*() by @chhwang in https://github.com/microsoft/mscclpp/pull/509
- Multi-stream CUDA IPC by @chhwang in https://github.com/microsoft/mscclpp/pull/326
- Fix #509 by @chhwang in https://github.com/microsoft/mscclpp/pull/546
- Fix build processes by @chhwang in https://github.com/microsoft/mscclpp/pull/545
- Do not use tail replica by default by @chhwang in https://github.com/microsoft/mscclpp/pull/544
- DeviceSemaphore fix by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/553
- Fix some typos in docs by @Edenzzzz in https://github.com/microsoft/mscclpp/pull/555
- New FIFO test by @chhwang in https://github.com/microsoft/mscclpp/pull/558
- FIFO improvements by @chhwang in https://github.com/microsoft/mscclpp/pull/557
- Fix #557 by @chhwang in https://github.com/microsoft/mscclpp/pull/560
- Support connection between local endpoints by @chhwang in https://github.com/microsoft/mscclpp/pull/561
- Fix multi-nodes CI pipeline by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/564
- Support any number of GPUs per node for NCCL_API by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/566
- Fix pytest failure by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/567
- Fix a FIFO correctness bug by @chhwang in https://github.com/microsoft/mscclpp/pull/549
- New semaphore constructors by @chhwang in https://github.com/microsoft/mscclpp/pull/559
- Revise NVLS interface by @chhwang in https://github.com/microsoft/mscclpp/pull/458
- update readme & bump version by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/550
New Contributors
- @dsidler made their first contribution in https://github.com/microsoft/mscclpp/pull/407
- @seagater made their first contribution in https://github.com/microsoft/mscclpp/pull/434
- @PedramAlizadeh made their first contribution in https://github.com/microsoft/mscclpp/pull/399
- @RyoYang made their first contribution in https://github.com/microsoft/mscclpp/pull/474
- @nusislam made their first contribution in https://github.com/microsoft/mscclpp/pull/502
- @Edenzzzz made their first contribution in https://github.com/microsoft/mscclpp/pull/555
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.6.0...v0.7.0
- C++
Published by chhwang 9 months ago
mscclpp - MSCCL++ v0.6.0
Highlights
- Improved NCCL API integration in MSCCL++ for better performance and usability
- Enhanced execution plan-based executor in MSCCL++
- Fixed several bugs to improve stability and reliability
What's Changed
- Add support for different vector sizes in multimem instructions by @roshandathathri in https://github.com/microsoft/mscclpp/pull/332
- NCCL API Executor Integration by @caiomcbr in https://github.com/microsoft/mscclpp/pull/331
- Fix missing import in executor test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/334
- bfloat16 support by @chhwang in https://github.com/microsoft/mscclpp/pull/336
- Dynamically load libibverbs by @caiomcbr in https://github.com/microsoft/mscclpp/pull/337
- Auto-tune vector sizes for NVLS allreduce6 by @roshandathathri in https://github.com/microsoft/mscclpp/pull/338
- Make ibverbs optional at compile time by @chhwang in https://github.com/microsoft/mscclpp/pull/340
- ProxyChannel Support in Executor by @caiomcbr in https://github.com/microsoft/mscclpp/pull/342
- Support executors to send packets over ProxyChannel by @caiomcbr in https://github.com/microsoft/mscclpp/pull/344
- Fix for ROCm 6.0 by @chhwang in https://github.com/microsoft/mscclpp/pull/347
- Fix bug when constructing semaphore by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/341
- Add proxy channel related operations by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/351
- Add CI for rocm by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/346
- Tune threads per block for mscclpp executor by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/345
- Fix NPKit exit event offset by @yzygitzh in https://github.com/microsoft/mscclpp/pull/356
- Use IB transport flags only when an IB device exists by @chhwang in https://github.com/microsoft/mscclpp/pull/355
- Update ROCm CI by @chhwang in https://github.com/microsoft/mscclpp/pull/357
- Fixing RegisterMemory Allocation for ProxyChannels by @caiomcbr in https://github.com/microsoft/mscclpp/pull/353
- Fix NCCL API bugs by @chhwang in https://github.com/microsoft/mscclpp/pull/363
- Perf optimization & support clipping by @chhwang in https://github.com/microsoft/mscclpp/pull/364
- Fix copyright messages by @chhwang in https://github.com/microsoft/mscclpp/pull/367
- [Doc] mscclpp docs by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/348
- Executor AllGather In-Place Support by @caiomcbr in https://github.com/microsoft/mscclpp/pull/365
- Fix algo repo name by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/369
- Update docker image for cuda12.4 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/370
- Fix in-place all-gather input buffer in executor_test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/372
- [docs] fix quickstart link by @jeffra in https://github.com/microsoft/mscclpp/pull/374
- Add kernel-based verification for executor_test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/378
- Lazily create the context stream by @chhwang in https://github.com/microsoft/mscclpp/pull/381
- Fixing Bug Const Offset in Execution Plan by @caiomcbr in https://github.com/microsoft/mscclpp/pull/380
- Fix light load bug by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/379
- Small Adjust in Test Data AllGather at Executor Test by @caiomcbr in https://github.com/microsoft/mscclpp/pull/384
- Fix missing packet parameter for executor by @yzygitzh in https://github.com/microsoft/mscclpp/pull/385
- NVLS support for msccl++ executor by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/375
- Fix typo by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/389
- Improve CMake options by @chhwang in https://github.com/microsoft/mscclpp/pull/376
- Fixing Message Boundary AllReduce Fallback Code by @caiomcbr in https://github.com/microsoft/mscclpp/pull/391
- Fix mscclpp_benchmark by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/392
- Add cross threadblock barrier by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/383
- AllGather Executor Support in NCCL Interface by @caiomcbr in https://github.com/microsoft/mscclpp/pull/393
- Providing reduce-scatter test support by @caiomcbr in https://github.com/microsoft/mscclpp/pull/390
- Select algo according to json config by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/396
- Add connection events for NPKit by @yzygitzh in https://github.com/microsoft/mscclpp/pull/386
- Revised ProxyChannel interfaces by @chhwang in https://github.com/microsoft/mscclpp/pull/400
- Setup pipeline for mscclpp over nccl by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/401
- Exception Max Number Operation per Tb by @caiomcbr in https://github.com/microsoft/mscclpp/pull/405
- Reduce memory usage for scratch buffer by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/403
- [Cherry-pick] Move pipeline to official org (#406) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/416
- [Cherry-pick] trigger ci for release branches (#426) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/427
- [Cherry-pick] Disable CuMemMap check for ROCm (#411) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/424
- [Cherry-pick] NVLS support for NCCL API (#410) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/425
- [Cherry-pick] Fix nccl-test failure issue (#421) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/429
New Contributors
- @jeffra made their first contribution in https://github.com/microsoft/mscclpp/pull/374
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.5.2...v0.6.0
- C++
Published by Binyang2014 over 1 year ago
mscclpp - MSCCL++ v0.5.2
What's Changed
- Add C++ executor test by @chhwang in https://github.com/microsoft/mscclpp/pull/304
- Cumulative Updates by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/309
- Add NPKit GPU event support by @yzygitzh in https://github.com/microsoft/mscclpp/pull/310
- Fix NPKit support for AMD by @yzygitzh in https://github.com/microsoft/mscclpp/pull/312
- Add "packet type" option for executor test by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/313
- Add support for multicast reduce instruction by @roshandathathri in https://github.com/microsoft/mscclpp/pull/316
- Update quickstart.md by @angelica-moreira in https://github.com/microsoft/mscclpp/pull/314
- Simplify/improve barrier in AllReduce6 by @roshandathathri in https://github.com/microsoft/mscclpp/pull/317
- Support NCCL APIs by @caiomcbr in https://github.com/microsoft/mscclpp/pull/319
- Update allreduce_bench.py by @angelica-moreira in https://github.com/microsoft/mscclpp/pull/318
- Separate NPKit CPU timestamp access from different blocks for AMD platform by @yzygitzh in https://github.com/microsoft/mscclpp/pull/321
- AllReduce Kernel for Small Messages by @caiomcbr in https://github.com/microsoft/mscclpp/pull/322
- Resolve clang++ warnings by @chhwang in https://github.com/microsoft/mscclpp/pull/325
- Support to write packets via uint2 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/327
- Double buffering for NCCL APIs by @caiomcbr in https://github.com/microsoft/mscclpp/pull/324
- v0.5.2 by @chhwang in https://github.com/microsoft/mscclpp/pull/328
New Contributors
- @angelica-moreira made their first contribution in https://github.com/microsoft/mscclpp/pull/314
- @caiomcbr made their first contribution in https://github.com/microsoft/mscclpp/pull/319
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.5.1...v0.5.2
- C++
Published by chhwang over 1 year ago
mscclpp - MSCCL++ v0.5.1
What's Changed
- Upgrade gtest by @chhwang in https://github.com/microsoft/mscclpp/pull/300
- Rename executor.cpp to executor_py.cpp by @chhwang in https://github.com/microsoft/mscclpp/pull/301
- Fix assert declaration & add a compile test by @chhwang in https://github.com/microsoft/mscclpp/pull/303
- Fix security issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/305
- v0.5.1 by @chhwang in https://github.com/microsoft/mscclpp/pull/308
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.5.0...v0.5.1
- C++
Published by chhwang almost 2 years ago
mscclpp - MSCCL++ v0.5.0
What's Changed
- Fix a typo name by @chhwang in https://github.com/microsoft/mscclpp/pull/286
- Add executor to execute schedule-plan file by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/283
- Allow binding allocated memory to NVLS multicast pointer by @roshandathathri in https://github.com/microsoft/mscclpp/pull/290
- Separate headers for GPU data types by @chhwang in https://github.com/microsoft/mscclpp/pull/291
- Refactoring NVLS interfaces by @chhwang in https://github.com/microsoft/mscclpp/pull/293
- Include GPU data types only for kernel code by @chhwang in https://github.com/microsoft/mscclpp/pull/292
- Ethernet support by @chhwang in https://github.com/microsoft/mscclpp/pull/284
- Resolve multi-nodes test failure issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/295
- Move pipeline to Azure org by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/296
- Optimized the execution kernel by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/294
- Allow obtaining cuda stream handle from PyTorch stream when launching kernel by @aashaka in https://github.com/microsoft/mscclpp/pull/297
- v0.5.0 by @chhwang in https://github.com/microsoft/mscclpp/pull/298
New Contributors
- @roshandathathri made their first contribution in https://github.com/microsoft/mscclpp/pull/290
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.4.3...v0.5.0
- C++
Published by chhwang almost 2 years ago
mscclpp - MSCCL++ v0.4.3
What's Changed
- Add optional prefix to installation paths by @chhwang in https://github.com/microsoft/mscclpp/pull/235
- Fix #235 by @chhwang in https://github.com/microsoft/mscclpp/pull/239
- Check nvidia_peermem during runtime by @chhwang in https://github.com/microsoft/mscclpp/pull/234
- Do not check value of __HIP_PLATFORM_AMD__ by @chhwang in https://github.com/microsoft/mscclpp/pull/240
- Fix crash in static variable destructor by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/238
- Update interface to let user change fifo size by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/243
- Mask each field of the trigger by @chhwang in https://github.com/microsoft/mscclpp/pull/244
- Minor improvement on device syncer by @chhwang in https://github.com/microsoft/mscclpp/pull/231
- remove make pylib-copy command by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/249
- Increase MSCCLPP_BITS_REGMEM_HANDLE to 9 by @aashaka in https://github.com/microsoft/mscclpp/pull/251
- Add putWithSignal() latency tests by @chhwang in https://github.com/microsoft/mscclpp/pull/246
- NVLS support by @saeedmaleki in https://github.com/microsoft/mscclpp/pull/250
- Fix wrong offset calculation by @chhwang in https://github.com/microsoft/mscclpp/pull/257
- Fix NVLS support by @chhwang in https://github.com/microsoft/mscclpp/pull/258
- Allow MSCCL++ CommGroup to take PyTorch tensors in args by @aashaka in https://github.com/microsoft/mscclpp/pull/255
- Fix multi-nodes test failure by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/262
- Allow semaphores and memory to be registered separately in ProxyService by @aashaka in https://github.com/microsoft/mscclpp/pull/264
- Remove cuda-python from project by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/245
- Fix the comm.py for nvls by @saeedmaleki in https://github.com/microsoft/mscclpp/pull/267
- New packet format & optimizations by @chhwang in https://github.com/microsoft/mscclpp/pull/256
- Fix multi-node ci pipeline by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/272
- add launch_bounds for mscclpp_test by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/273
- Fix bootstrapping mechanism by @chhwang in https://github.com/microsoft/mscclpp/pull/278
- v0.4.3 by @chhwang in https://github.com/microsoft/mscclpp/pull/279
New Contributors
- @aashaka made their first contribution in https://github.com/microsoft/mscclpp/pull/251
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.4.2...v0.4.3
- C++
Published by chhwang about 2 years ago
mscclpp - MSCCL++ v0.4.2
What's Changed
- Include cstdint in packet_device.hpp by @chhwang in https://github.com/microsoft/mscclpp/pull/233
- Fix & improve perf for ROCm by @chhwang in https://github.com/microsoft/mscclpp/pull/232
- v0.4.2 by @chhwang in https://github.com/microsoft/mscclpp/pull/236
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.4.1...v0.4.2
- C++
Published by chhwang over 2 years ago
mscclpp - MSCCL++ v0.4.1
What's Changed
- Fix performance downgrade issue & update doc by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/229
- Add a documentation issue template by @chhwang in https://github.com/microsoft/mscclpp/pull/230
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.4.0...v0.4.1
- C++
Published by chhwang over 2 years ago
mscclpp - MSCCL++ v0.4.0
- Add Python benchmark
- Update documentation
- Add ROCm support
- Bug fixes
See details at https://github.com/microsoft/mscclpp/issues/160.
- C++
Published by chhwang over 2 years ago
mscclpp - MSCCL++ v0.3.0
- Updated interfaces
- Add Python bindings and interfaces
- Add Python unit tests
- Add more configurable parameters
- Add a new single-node AllReduce kernel
- Fix bugs
See details at https://github.com/microsoft/mscclpp/issues/89.
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.2.0...v0.3.0
- C++
Published by chhwang over 2 years ago
mscclpp - MSCCL++ v0.2.0
Communication Features and Interfaces
GPU-side communication interfaces (DeviceChannel)
- [x] Proxy-based Interfaces: ProxyChannel (#66)
- [x] In-SM Copy Interfaces: SmChannel (#55)
- [x] Packet Copy Interfaces: putPackets(), getPackets(), signalPacket() (#85, #90, #102)
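Packet-style copies typically avoid a separate signal by storing a flag in the same word as the data, so the reader knows the payload has arrived as soon as it sees the expected flag. The following is a minimal host-only sketch of that idea, assuming an 8-byte packet with a 32-bit payload and a 32-bit flag published by a single atomic store, with two CPU threads standing in for GPU writer and reader. Packet8 and sendThroughPackets are hypothetical names, and the layout is illustrative rather than the one MSCCL++ uses.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <thread>

// Hypothetical 8-byte packet: low 32 bits carry the payload, high 32 bits the flag.
// Illustrative only; this is not the packet layout MSCCL++ uses.
struct Packet8 {
  std::atomic<uint64_t> word{0};

  // One 64-bit atomic store publishes payload and flag together.
  void write(uint32_t data, uint32_t flag) {
    word.store((static_cast<uint64_t>(flag) << 32) | data, std::memory_order_relaxed);
  }

  // Spin until the flag matches; the payload from that same load is then valid.
  uint32_t read(uint32_t expectedFlag) const {
    uint64_t v = word.load(std::memory_order_relaxed);
    while (static_cast<uint32_t>(v >> 32) != expectedFlag) {
      v = word.load(std::memory_order_relaxed);
    }
    return static_cast<uint32_t>(v);
  }
};

// Sends the values 0, 3, 6, ... through n packets and returns their sum as seen by the reader.
uint64_t sendThroughPackets(uint32_t n, uint32_t flag) {
  assert(flag != 0);  // buffers start zeroed, so flag 0 would look "already written"
  std::unique_ptr<Packet8[]> buf(new Packet8[n]);
  uint64_t sum = 0;
  std::thread reader([&] {
    for (uint32_t i = 0; i < n; ++i) sum += buf[i].read(flag);
  });
  for (uint32_t i = 0; i < n; ++i) buf[i].write(i * 3, flag);  // no separate signal needed
  reader.join();
  return sum;
}
```

Relaxed ordering is enough here only because payload and flag live in one atomic object; publishing other memory through the flag would need release/acquire ordering. Reusing the buffer for a later round just means switching to a new flag value, which is why flag-based schemes often increment the flag per iteration.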
Host-side interfaces
- [x] Bootstrap: fix socket performance issue & bugs (#92, #100, #113)
- [x] Communicator: implement (#66)
Transports support
- [x] NVLink: implement (#66)
- [x] InfiniBand: implement (#66)
- [x] InfiniBand: tackle memory consistency issues (#96)
Performance Optimization
- [x] Throughput: pass AllGather perf qualification (#77)
- [x] Throughput: pass AllReduce perf qualification (#83, #90)
- [x] Throughput: pass AllToAll perf qualification (#87)
- [x] Latency: pass AllReduce perf qualification (#85, #90)
- [x] Latency: pass 2-node AllReduce perf qualification (#109, #118)
Development Pipeline
- [x] Unit Tests: cover all interfaces (#81, #91)
- [x] mscclpp-test: add AllGather (#77)
- [x] mscclpp-test: add AllReduce (#83)
- [x] mscclpp-test: add AllToAll (#87)
- [x] CI: lint, spelling, CodeQL (#79)
- [x] CI: unit test (#81)
- [x] CI: mscclpp-test (#93, #103)
- [x] Package: publish Docker images (#104)
Documents
- [x] Doxygen: add configuration (#72)
- [x] README: enhance details (#88)
- [x] License: add license comments on all files (#106)
- [x] Code: cleanup & comments (#86, #119)
Full Changelog: https://github.com/microsoft/mscclpp/commits/v0.2.0
- C++
Published by chhwang over 2 years ago
mscclpp - MSCCL++ v0.1.0
Features
- Transport setup
  - Bootstrap (initial metadata exchange between ranks)
  - Connection setup for P2P NVLink and InfiniBand
  - CPU proxies for P2P NVLink and InfiniBand
- Transport interface
  - Trigger FIFO
  - put-signal-wait interface
- Tests
  - AllToAll
  - AllGather based on AllToAll
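The transport interface above pairs a trigger FIFO with CPU proxies and put-signal-wait semantics. The following is a host-only sketch of that pattern, assuming a simplified model in which one thread stands in for a GPU kernel (it pushes "put" requests into a bounded FIFO and then waits) and a second thread stands in for the CPU proxy (it pops each request, performs the copy, and signals a counter). Trigger, TriggerFifo, and putSignalWait are hypothetical names for illustration; they are not MSCCL++'s API, trigger encoding, or FIFO implementation.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical trigger: "copy size bytes from src+srcOffset to dst+dstOffset".
struct Trigger {
  uint64_t srcOffset;
  uint64_t dstOffset;
  uint64_t size;
  bool stop;  // tells the proxy thread to exit
};

// Bounded single-producer/single-consumer ring buffer of triggers.
template <size_t N>
class TriggerFifo {
 public:
  void push(const Trigger& t) {
    uint64_t h = head_.load(std::memory_order_relaxed);
    // Wait while full; acquire pairs with the consumer's release of tail_.
    while (h - tail_.load(std::memory_order_acquire) >= N) std::this_thread::yield();
    slots_[h % N] = t;
    head_.store(h + 1, std::memory_order_release);  // publish the slot
  }

  Trigger pop() {
    uint64_t t = tail_.load(std::memory_order_relaxed);
    // Wait while empty; acquire pairs with the producer's release of head_.
    while (head_.load(std::memory_order_acquire) == t) std::this_thread::yield();
    Trigger trig = slots_[t % N];
    tail_.store(t + 1, std::memory_order_release);  // free the slot
    return trig;
  }

 private:
  std::array<Trigger, N> slots_{};
  std::atomic<uint64_t> head_{0};
  std::atomic<uint64_t> tail_{0};
};

// Copies src into a fresh buffer chunk by chunk via the proxy and returns it.
std::vector<int> putSignalWait(const std::vector<int>& src, size_t chunk) {
  assert(chunk > 0);
  std::vector<int> dst(src.size(), 0);
  TriggerFifo<4> fifo;
  std::atomic<uint64_t> semaphore{0};  // counts completed puts

  // Proxy role: execute each put, then signal.
  std::thread proxy([&] {
    for (;;) {
      Trigger t = fifo.pop();
      if (t.stop) break;
      std::memcpy(reinterpret_cast<char*>(dst.data()) + t.dstOffset,
                  reinterpret_cast<const char*>(src.data()) + t.srcOffset, t.size);
      semaphore.fetch_add(1, std::memory_order_release);
    }
  });

  // "GPU" role: issue puts, then wait until all of them are signaled.
  uint64_t issued = 0;
  for (size_t off = 0; off < src.size(); off += chunk) {
    size_t n = std::min(chunk, src.size() - off);
    uint64_t bytes = off * sizeof(int);
    fifo.push({bytes, bytes, n * sizeof(int), false});
    ++issued;
  }
  while (semaphore.load(std::memory_order_acquire) < issued) std::this_thread::yield();

  fifo.push({0, 0, 0, true});
  proxy.join();
  return dst;
}
```

The small FIFO capacity (4 slots) is deliberate: when the producer issues more puts than fit, push() spins until the proxy drains a slot, which mirrors how a fixed-size trigger FIFO applies back-pressure to the issuing side.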
Full Changelog: https://github.com/microsoft/mscclpp/commits/v0.1.0
- C++
Published by chhwang about 3 years ago