Recent Releases of mscclpp

mscclpp - MSCCL++ v0.7.0

What's Changed

  • Move pipeline to official org by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/406
  • Disable CuMemMap check for ROCm by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/411
  • NVLS support for NCCL API by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/410
  • Supporting multi-node executors in NCCL API by @caiomcbr in https://github.com/microsoft/mscclpp/pull/412
  • Fix synchronization in allreduce8 kernel by @dsidler in https://github.com/microsoft/mscclpp/pull/407
  • Add ncclBcast / ncclBroadcast support by @SreevatsaAnantharamu in https://github.com/microsoft/mscclpp/pull/419
  • Update README by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/414
  • Fix nccl-test failure issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/421
  • Tackle build warnings by @chhwang in https://github.com/microsoft/mscclpp/pull/422
  • trigger ci for release branches by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/426
  • Fix CI trigger issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/428
  • Fix typos in the pipeline by @chhwang in https://github.com/microsoft/mscclpp/pull/420
  • Update version number by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/433
  • Enhance the nccl error message handling by @seagater in https://github.com/microsoft/mscclpp/pull/434
  • [NPKIT] Adding the NPKIT support for kernel allreduce7 in mscclpp-nccl by @PedramAlizadeh in https://github.com/microsoft/mscclpp/pull/399
  • Fix azure pipeline by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/437
  • Add GpuBuffer class by @chhwang in https://github.com/microsoft/mscclpp/pull/423
  • Fix CMake build messages by @chhwang in https://github.com/microsoft/mscclpp/pull/443
  • Flushing Proxy Channels at CPU side upon reaching the Inflight Request Limit by @caiomcbr in https://github.com/microsoft/mscclpp/pull/415
  • Fix Python binding of exceptions by @chhwang in https://github.com/microsoft/mscclpp/pull/444
  • Auto-update version numbers in CMakeLists.txt by @chhwang in https://github.com/microsoft/mscclpp/pull/450
  • Resolve cuMemMap error by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/451
  • Manage runtime environments by @chhwang in https://github.com/microsoft/mscclpp/pull/452
  • Lazily create streams for CudaIpcConnection by @chhwang in https://github.com/microsoft/mscclpp/pull/449
  • Fix PR #449 by @chhwang in https://github.com/microsoft/mscclpp/pull/453
  • Merge mscclpp-lang to mscclpp project by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/442
  • Renaming channels by @chhwang in https://github.com/microsoft/mscclpp/pull/436
  • Add multi-nodes example & update doc by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/455
  • Adjusting BFS to seek circular dependencies in the msccl-tools DAG by @caiomcbr in https://github.com/microsoft/mscclpp/pull/459
  • remove unnecessary sync by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/461
  • Support ReduceScatter in the NCCL interface by @caiomcbr in https://github.com/microsoft/mscclpp/pull/460
  • Updating MSCCLLang Examples by @caiomcbr in https://github.com/microsoft/mscclpp/pull/462
  • Disable channel cache by @seagater in https://github.com/microsoft/mscclpp/pull/463
  • Adjusting AllGather Collective in MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/466
  • Adding Read Put Packet operation at Executor by @caiomcbr in https://github.com/microsoft/mscclpp/pull/441
  • NPKit Support to Read Put Packet Operation by @caiomcbr in https://github.com/microsoft/mscclpp/pull/471
  • Adjust NPKit IB Event by @caiomcbr in https://github.com/microsoft/mscclpp/pull/472
  • Fix minor typos and errors in documentation by @RyoYang in https://github.com/microsoft/mscclpp/pull/474
  • Improving Get Operation at MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/475
  • Fix memory OOM issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/479
  • Mark mscclpp-test as deprecated in the doc by @chhwang in https://github.com/microsoft/mscclpp/pull/478
  • Update allgather fallback algo by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/476
  • Add min operation for allreduce by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/481
  • NCCL API CI Test for ReduceScatter by @caiomcbr in https://github.com/microsoft/mscclpp/pull/465
  • Fix correctness issue when mscclppDisableChannelCache set to true by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/483
  • nccl/rccl integration by @seagater in https://github.com/microsoft/mscclpp/pull/469
  • Fix reduceMin failure issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/486
  • Reduce Operation Support to the Executor by @caiomcbr in https://github.com/microsoft/mscclpp/pull/484
  • Add CI test for fallback allgather, allreduce, broadcast, and reducescatter to NCCL operations by @seagater in https://github.com/microsoft/mscclpp/pull/485
  • Remove the requirement for CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_FABRIC_SUPPORTED for NVLS support by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/489
  • Add CUDA 12.8 images by @chhwang in https://github.com/microsoft/mscclpp/pull/488
  • Add a devcontainer configuration by @chhwang in https://github.com/microsoft/mscclpp/pull/490
  • Fix CMake installation in Dockerfile for arm64 by @chhwang in https://github.com/microsoft/mscclpp/pull/491
  • Export mscclpp GpuBuffer to dlpack format by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/492
  • Fix the virtual address mapping issue of cuMemMap in fallback code by @seagater in https://github.com/microsoft/mscclpp/pull/501
  • Improve signal/wait performance and fix barrier issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/499
  • Fix performance issue introduced in PR #499 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/505
  • Add flag to disable nvls by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/500
  • Optimized allreduce fallback for ~10KB sizes by @chhwang in https://github.com/microsoft/mscclpp/pull/506
  • Automatic creation of Scratch Buffer at MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/510
  • Use implicit ctors for default device ctors by @chhwang in https://github.com/microsoft/mscclpp/pull/512
  • apps/nccl: fix a bug in allreduce kernels for graph mode by @nusislam in https://github.com/microsoft/mscclpp/pull/502
  • Revised MemoryChannel interfaces by @chhwang in https://github.com/microsoft/mscclpp/pull/508
  • Fix #508 by @chhwang in https://github.com/microsoft/mscclpp/pull/515
  • Add NVLS based fallback algo by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/507
  • Enhance Collective Check at MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/511
  • Support ibv_reg_dmabuf_mr for buffer allocated by cuMemMalloc by @seagater in https://github.com/microsoft/mscclpp/pull/513
  • Fix the issue of echo message for nccl fallback in CI test by @seagater in https://github.com/microsoft/mscclpp/pull/520
  • Asynchronous setup by @chhwang in https://github.com/microsoft/mscclpp/pull/514
  • Adding maxSpinCount to port channel flush by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/518
  • Fix device assert by @chhwang in https://github.com/microsoft/mscclpp/pull/522
  • Fix #514 by @chhwang in https://github.com/microsoft/mscclpp/pull/521
  • Add a CMake option MSCCLPP_GPU_ARCHS by @chhwang in https://github.com/microsoft/mscclpp/pull/525
  • Update citations by @chhwang in https://github.com/microsoft/mscclpp/pull/524
  • Set Up a CI Pipeline for H100 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/526
  • Properly setting up the device in Ethernet Connection by @caiomcbr in https://github.com/microsoft/mscclpp/pull/527
  • Add device semaphore API by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/523
  • Address NVCC warning #20012-D by @chhwang in https://github.com/microsoft/mscclpp/pull/528
  • Rename ChannelTrigger fields and check field values in debug builds by @chhwang in https://github.com/microsoft/mscclpp/pull/529
  • DLPack fixes by @chhwang in https://github.com/microsoft/mscclpp/pull/537
  • Improved documentation & minor interface revision by @chhwang in https://github.com/microsoft/mscclpp/pull/541
  • Use a stream pool for gpuCalloc*() by @chhwang in https://github.com/microsoft/mscclpp/pull/509
  • Multi-stream CUDA IPC by @chhwang in https://github.com/microsoft/mscclpp/pull/326
  • Fix #509 by @chhwang in https://github.com/microsoft/mscclpp/pull/546
  • Fix build processes by @chhwang in https://github.com/microsoft/mscclpp/pull/545
  • Do not use tail replica by default by @chhwang in https://github.com/microsoft/mscclpp/pull/544
  • DeviceSemaphore fix by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/553
  • Fix some typos in docs by @Edenzzzz in https://github.com/microsoft/mscclpp/pull/555
  • New FIFO test by @chhwang in https://github.com/microsoft/mscclpp/pull/558
  • FIFO improvements by @chhwang in https://github.com/microsoft/mscclpp/pull/557
  • Fix #557 by @chhwang in https://github.com/microsoft/mscclpp/pull/560
  • Support connection between local endpoints by @chhwang in https://github.com/microsoft/mscclpp/pull/561
  • Fix multi-nodes CI pipeline by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/564
  • Support any GPUs per node for NCCL_API by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/566
  • Fix pytest failure by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/567
  • Fix a FIFO correctness bug by @chhwang in https://github.com/microsoft/mscclpp/pull/549
  • New semaphore constructors by @chhwang in https://github.com/microsoft/mscclpp/pull/559
  • Revise NVLS interface by @chhwang in https://github.com/microsoft/mscclpp/pull/458
  • update readme & bump version by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/550

New Contributors

  • @dsidler made their first contribution in https://github.com/microsoft/mscclpp/pull/407
  • @seagater made their first contribution in https://github.com/microsoft/mscclpp/pull/434
  • @PedramAlizadeh made their first contribution in https://github.com/microsoft/mscclpp/pull/399
  • @RyoYang made their first contribution in https://github.com/microsoft/mscclpp/pull/474
  • @nusislam made their first contribution in https://github.com/microsoft/mscclpp/pull/502
  • @Edenzzzz made their first contribution in https://github.com/microsoft/mscclpp/pull/555

Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.6.0...v0.7.0

- C++
Published by chhwang 9 months ago

mscclpp - MSCCL++ v0.6.0

Highlights

  • Improved NCCL API integration in MSCCL++ for better performance and usability
  • Enhanced execution plan-based executor in MSCCL++
  • Fixed several bugs to improve stability and reliability

What's Changed

  • Add support for different vector sizes in multimem instructions by @roshandathathri in https://github.com/microsoft/mscclpp/pull/332
  • NCCL API Executor Integration by @caiomcbr in https://github.com/microsoft/mscclpp/pull/331
  • Fix missing import in executor test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/334
  • bfloat16 support by @chhwang in https://github.com/microsoft/mscclpp/pull/336
  • Dynamically load libibverbs by @caiomcbr in https://github.com/microsoft/mscclpp/pull/337
  • Auto-tune vector sizes for NVLS allreduce6 by @roshandathathri in https://github.com/microsoft/mscclpp/pull/338
  • Make ibverbs optional at compile time by @chhwang in https://github.com/microsoft/mscclpp/pull/340
  • ProxyChannel Support in Executor by @caiomcbr in https://github.com/microsoft/mscclpp/pull/342
  • Support executors to send packets over ProxyChannel by @caiomcbr in https://github.com/microsoft/mscclpp/pull/344
  • Fix for ROCm 6.0 by @chhwang in https://github.com/microsoft/mscclpp/pull/347
  • Fix bug in semaphore construction by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/341
  • Add proxy channel related operations by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/351
  • Add CI for rocm by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/346
  • Tune threads per block for mscclpp executor by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/345
  • Fix NPKit exit event offset by @yzygitzh in https://github.com/microsoft/mscclpp/pull/356
  • Use IB transport flags only when an IB device exists by @chhwang in https://github.com/microsoft/mscclpp/pull/355
  • Update ROCm CI by @chhwang in https://github.com/microsoft/mscclpp/pull/357
  • Fixing RegisterMemory Allocation for ProxyChannels by @caiomcbr in https://github.com/microsoft/mscclpp/pull/353
  • Fix NCCL API bugs by @chhwang in https://github.com/microsoft/mscclpp/pull/363
  • Perf optimization & support clipping by @chhwang in https://github.com/microsoft/mscclpp/pull/364
  • Fix copyright messages by @chhwang in https://github.com/microsoft/mscclpp/pull/367
  • [Doc] mscclpp docs by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/348
  • Executor AllGather In-Place Support by @caiomcbr in https://github.com/microsoft/mscclpp/pull/365
  • Fix algo repo name by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/369
  • Update docker image for cuda12.4 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/370
  • Fix in-place all-gather input buffer in executor_test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/372
  • [docs] fix quickstart link by @jeffra in https://github.com/microsoft/mscclpp/pull/374
  • Add kernel-based verification for executor_test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/378
  • Lazily create the context stream by @chhwang in https://github.com/microsoft/mscclpp/pull/381
  • Fixing Bug Const Offset in Execution Plan by @caiomcbr in https://github.com/microsoft/mscclpp/pull/380
  • Fix light load bug by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/379
  • Small Adjust in Test Data AllGather at Executor Test by @caiomcbr in https://github.com/microsoft/mscclpp/pull/384
  • Fix missing packet parameter for executor by @yzygitzh in https://github.com/microsoft/mscclpp/pull/385
  • NVLS support for msccl++ executor by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/375
  • Fix typo by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/389
  • Improve CMake options by @chhwang in https://github.com/microsoft/mscclpp/pull/376
  • Fixing Message Boundary AllReduce Fallback Code by @caiomcbr in https://github.com/microsoft/mscclpp/pull/391
  • Fix mscclpp_benchmark by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/392
  • Add cross threadblock barrier by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/383
  • AllGather Executor Support in NCCL Interface by @caiomcbr in https://github.com/microsoft/mscclpp/pull/393
  • Providing reduce-scatter test support by @caiomcbr in https://github.com/microsoft/mscclpp/pull/390
  • Select algo according to json config by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/396
  • Add connection events for NPKit by @yzygitzh in https://github.com/microsoft/mscclpp/pull/386
  • Revised ProxyChannel interfaces by @chhwang in https://github.com/microsoft/mscclpp/pull/400
  • Setup pipeline for mscclpp over nccl by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/401
  • Exception Max Number Operation per Tb by @caiomcbr in https://github.com/microsoft/mscclpp/pull/405
  • Reduce memory usage for scratch buffer by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/403
  • [Cherry-pick] Move pipeline to official org (#406) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/416
  • [Cherry-pick] trigger ci for release branches (#426) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/427
  • [Cherry-pick] Disable CuMemMap check for ROCm (#411) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/424
  • [Cherry-pick] NVLS support for NCCL API (#410) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/425
  • [Cherry-pick] Fix nccl-test failure issue (#421) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/429

New Contributors

  • @jeffra made their first contribution in https://github.com/microsoft/mscclpp/pull/374

Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.5.2...v0.6.0

- C++
Published by Binyang2014 over 1 year ago

mscclpp - MSCCL++ v0.5.2

What's Changed

  • Add C++ executor test by @chhwang in https://github.com/microsoft/mscclpp/pull/304
  • Cumulative Updates by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/309
  • Add NPKit GPU event support by @yzygitzh in https://github.com/microsoft/mscclpp/pull/310
  • Fix NPKit support for AMD by @yzygitzh in https://github.com/microsoft/mscclpp/pull/312
  • Add "packet type" option for executor test by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/313
  • Add support for multicast reduce instruction by @roshandathathri in https://github.com/microsoft/mscclpp/pull/316
  • Update quickstart.md by @angelica-moreira in https://github.com/microsoft/mscclpp/pull/314
  • Simplify/improve barrier in AllReduce6 by @roshandathathri in https://github.com/microsoft/mscclpp/pull/317
  • Support NCCL APIs by @caiomcbr in https://github.com/microsoft/mscclpp/pull/319
  • Update allreduce_bench.py by @angelica-moreira in https://github.com/microsoft/mscclpp/pull/318
  • Separate NPKit CPU timestamp access from different blocks for AMD platform by @yzygitzh in https://github.com/microsoft/mscclpp/pull/321
  • AllReduce Kernel for Small Messages by @caiomcbr in https://github.com/microsoft/mscclpp/pull/322
  • Resolve clang++ warnings by @chhwang in https://github.com/microsoft/mscclpp/pull/325
  • Support to write packets via uint2 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/327
  • Double buffering for NCCL APIs by @caiomcbr in https://github.com/microsoft/mscclpp/pull/324
  • v0.5.2 by @chhwang in https://github.com/microsoft/mscclpp/pull/328

New Contributors

  • @angelica-moreira made their first contribution in https://github.com/microsoft/mscclpp/pull/314
  • @caiomcbr made their first contribution in https://github.com/microsoft/mscclpp/pull/319

Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.5.1...v0.5.2

- C++
Published by chhwang over 1 year ago

mscclpp - MSCCL++ v0.5.1

What's Changed

  • Upgrade gtest by @chhwang in https://github.com/microsoft/mscclpp/pull/300
  • Rename executor.cpp to executor_py.cpp by @chhwang in https://github.com/microsoft/mscclpp/pull/301
  • Fix assert declaration & add a compile test by @chhwang in https://github.com/microsoft/mscclpp/pull/303
  • Fix security issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/305
  • v0.5.1 by @chhwang in https://github.com/microsoft/mscclpp/pull/308

Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.5.0...v0.5.1

- C++
Published by chhwang almost 2 years ago

mscclpp - MSCCL++ v0.5.0

What's Changed

  • Fix a typo name by @chhwang in https://github.com/microsoft/mscclpp/pull/286
  • Add executor to execute schedule-plan file by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/283
  • Allow binding allocated memory to NVLS multicast pointer by @roshandathathri in https://github.com/microsoft/mscclpp/pull/290
  • Separate headers for GPU data types by @chhwang in https://github.com/microsoft/mscclpp/pull/291
  • Refactoring NVLS interfaces by @chhwang in https://github.com/microsoft/mscclpp/pull/293
  • Include GPU data types only for kernel code by @chhwang in https://github.com/microsoft/mscclpp/pull/292
  • Ethernet support by @chhwang in https://github.com/microsoft/mscclpp/pull/284
  • Resolve multi-nodes test failure issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/295
  • Move pipeline to Azure org by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/296
  • Optimized the execution kernel by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/294
  • Allow obtaining cuda stream handle from PyTorch stream when launching kernel by @aashaka in https://github.com/microsoft/mscclpp/pull/297
  • v0.5.0 by @chhwang in https://github.com/microsoft/mscclpp/pull/298

New Contributors

  • @roshandathathri made their first contribution in https://github.com/microsoft/mscclpp/pull/290

Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.4.3...v0.5.0

- C++
Published by chhwang almost 2 years ago

mscclpp - MSCCL++ v0.4.3

What's Changed

  • Add optional prefix to installation paths by @chhwang in https://github.com/microsoft/mscclpp/pull/235
  • Fix #235 by @chhwang in https://github.com/microsoft/mscclpp/pull/239
  • Check nvidia_peermem during runtime by @chhwang in https://github.com/microsoft/mscclpp/pull/234
  • Do not check value of __HIP_PLATFORM_AMD__ by @chhwang in https://github.com/microsoft/mscclpp/pull/240
  • Fix crash in static variable deconstructor by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/238
  • Update interface to let user change fifo size by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/243
  • Mask each field of the trigger by @chhwang in https://github.com/microsoft/mscclpp/pull/244
  • Minor improvement on device syncer by @chhwang in https://github.com/microsoft/mscclpp/pull/231
  • remove make pylib-copy command by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/249
  • Increase MSCCLPP_BITS_REGMEM_HANDLE to 9 by @aashaka in https://github.com/microsoft/mscclpp/pull/251
  • Add putWithSignal() latency tests by @chhwang in https://github.com/microsoft/mscclpp/pull/246
  • NVLS support. by @saeedmaleki in https://github.com/microsoft/mscclpp/pull/250
  • Fix wrong offset calculation by @chhwang in https://github.com/microsoft/mscclpp/pull/257
  • Fix NVLS support by @chhwang in https://github.com/microsoft/mscclpp/pull/258
  • Allow MSCCL++ CommGroup to take PyTorch tensors in args by @aashaka in https://github.com/microsoft/mscclpp/pull/255
  • Fix multi-nodes test failure by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/262
  • Allow semaphores and memory to be registered separately in ProxyService by @aashaka in https://github.com/microsoft/mscclpp/pull/264
  • Remove cuda-python from project by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/245
  • Fix the comm.py for nvls by @saeedmaleki in https://github.com/microsoft/mscclpp/pull/267
  • New packet format & optimizations by @chhwang in https://github.com/microsoft/mscclpp/pull/256
  • Fix multi-node ci pipeline by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/272
  • add __launch_bounds__ for mscclpp_test by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/273
  • Fix bootstrapping mechanism by @chhwang in https://github.com/microsoft/mscclpp/pull/278
  • v0.4.3 by @chhwang in https://github.com/microsoft/mscclpp/pull/279

New Contributors

  • @aashaka made their first contribution in https://github.com/microsoft/mscclpp/pull/251

Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.4.2...v0.4.3

- C++
Published by chhwang about 2 years ago

mscclpp - MSCCL++ v0.4.2

What's Changed

  • Include cstdint in packet_device.hpp by @chhwang in https://github.com/microsoft/mscclpp/pull/233
  • Fix & improve perf for ROCm by @chhwang in https://github.com/microsoft/mscclpp/pull/232
  • v0.4.2 by @chhwang in https://github.com/microsoft/mscclpp/pull/236

Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.4.1...v0.4.2

- C++
Published by chhwang over 2 years ago

mscclpp - MSCCL++ v0.4.1

What's Changed

  • Fix performance downgrade issue & update doc by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/229
  • Add a documentation issue template by @chhwang in https://github.com/microsoft/mscclpp/pull/230

Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.4.0...v0.4.1

- C++
Published by chhwang over 2 years ago

mscclpp - MSCCL++ v0.4.0

  • Add Python benchmark
  • Update documentation
  • Add ROCm support
  • Bug fixes

See details in https://github.com/microsoft/mscclpp/issues/160.

- C++
Published by chhwang over 2 years ago

mscclpp - MSCCL++ v0.3.0

  • Updated interfaces
  • Add Python bindings and interfaces
  • Add Python unit tests
  • Add more configurable parameters
  • Add a new single-node AllReduce kernel
  • Fix bugs

See details in https://github.com/microsoft/mscclpp/issues/89.

Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.2.0...v0.3.0

- C++
Published by chhwang over 2 years ago

mscclpp - MSCCL++ v0.2.0

Communication Features and Interfaces

GPU-side communication interfaces (DeviceChannel)

  1. [x] Proxy-based Interfaces: ProxyChannel (#66)
  2. [x] In-SM Copy Interfaces: SmChannel (#55)
  3. [x] Packet Copy Interfaces: putPackets(), getPackets(), signalPacket() (#85, #90, #102)

Host-side interfaces

  1. [x] Bootstrap: fix socket performance issue & bugs (#92, #100, #113)
  2. [x] Communicator: implement (#66)

Transports support

  1. [x] NVLink: implement (#66)
  2. [x] InfiniBand: implement (#66)
  3. [x] InfiniBand: tackle memory consistency issues (#96)

Performance Optimization

  1. [x] Throughput: pass AllGather perf qualification (#77)
  2. [x] Throughput: pass AllReduce perf qualification (#83, #90)
  3. [x] Throughput: pass AllToAll perf qualification (#87)
  4. [x] Latency: pass AllReduce perf qualification (#85, #90)
  5. [x] Latency: pass 2-node AllReduce perf qualification (#109, #118)

Development Pipeline

  1. [x] Unit Tests: cover all interfaces (#81, #91)
  2. [x] mscclpp-test: add AllGather (#77)
  3. [x] mscclpp-test: add AllReduce (#83)
  4. [x] mscclpp-test: add AllToAll (#87)
  5. [x] CI: lint, spelling, CodeQL (#79)
  6. [x] CI: unit test (#81)
  7. [x] CI: mscclpp-test (#93, #103)
  8. [x] Package: publish Docker images (#104)

Documents

  1. [x] Doxygen: add configuration (#72)
  2. [x] README: enhance details (#88)
  3. [x] License: add license comments on all files (#106)
  4. [x] Code: cleanup & comments (#86, #119)

Full Changelog: https://github.com/microsoft/mscclpp/commits/v0.2.0

- C++
Published by chhwang over 2 years ago

mscclpp - MSCCL++ v0.1.0

Features

  • Transport setup
    • Bootstrap (initial meta-data exchange between ranks)
    • Connection setup for P2P NVLink and InfiniBand
    • CPU proxies for P2P NVLink and InfiniBand
  • Transport interface
    • Trigger FIFO
    • put-signal-wait interface
  • Tests
    • AllToAll
    • AllGather based on AllToAll
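
The put-signal-wait interface listed above can be pictured with a small host-only analogy. The sketch below is not the MSCCL++ API: ToyChannel, its put(), signal(), and wait() members, and runPutSignalWaitDemo() are hypothetical names invented for illustration, and two CPU threads stand in for the two communicating ranks. It only shows the ordering idea: the sender writes data, then signals; the receiver waits for the signal before reading.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>

// Hypothetical stand-in for a channel: a receiver-visible buffer plus a counter.
// This is an illustration only and does not mirror any MSCCL++ class.
struct ToyChannel {
  int dst[4] = {0, 0, 0, 0};
  std::atomic<unsigned> semaphore{0};

  // put: copy the payload into the receiver-visible buffer.
  void put(const int* src, std::size_t count) {
    std::memcpy(dst, src, count * sizeof(int));
  }

  // signal: release-increment so the preceding put is visible to the waiter.
  void signal() { semaphore.fetch_add(1, std::memory_order_release); }

  // wait: spin until the counter reaches the expected value (acquire load).
  void wait(unsigned expected) const {
    while (semaphore.load(std::memory_order_acquire) < expected) {
      std::this_thread::yield();
    }
  }
};

// Runs one put-signal-wait exchange between two threads and returns the sum
// of the values the receiver observed.
int runPutSignalWaitDemo() {
  ToyChannel chan;
  const int payload[4] = {1, 2, 3, 4};
  int sum = 0;

  std::thread receiver([&] {
    chan.wait(1);  // blocks until the sender has signaled once
    for (int v : chan.dst) sum += v;
  });
  std::thread sender([&] {
    chan.put(payload, 4);
    chan.signal();
  });

  sender.join();
  receiver.join();
  return sum;
}
```

Calling runPutSignalWaitDemo() returns 10, because the release/acquire pair on the counter guarantees that the receiver sees the full payload once wait() returns. Separating the data write from the signal is the design point: the sender can issue several puts and signal once.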

Full Changelog: https://github.com/microsoft/mscclpp/commits/v0.1.0

- C++
Published by chhwang about 3 years ago