ark

A GPU-driven system framework for scalable AI applications

https://github.com/microsoft/ark

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.0%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: microsoft
License: mit
Language: C++
Default Branch: main
Homepage:
Size: 2.16 MB

Statistics

Stars: 117
Watchers: 11
Forks: 17
Open Issues: 11
Releases: 7

Created about 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme, License, Code of conduct, Citation, Security, Support

ARK

A GPU-driven system framework for scalable AI applications.

| Pipelines         | Build Status      |
|-------------------|-------------------|
| Unit Tests (CUDA) |                   |
| Unit Tests (ROCm) |                   |

NOTE (Nov 2023): ROCm unit tests will be migrated to an Azure pipeline in the future.

See Quick Start to quickly get started.

Overview

ARK is a deep learning framework designed for highly optimized performance over distributed GPUs. Specifically, ARK adopts a GPU-driven execution model, in which the GPU autonomously schedules and executes both computation and communication without any CPU intervention.

ARK provides a set of APIs for users to express their distributed deep learning applications. ARK then automatically schedules a GPU-driven execution plan for the application and generates GPU kernel code from it, called a loop kernel. The loop kernel is a single GPU kernel containing a loop that iteratively executes the entire application, including both computation and communication. Finally, ARK executes the loop kernel on the distributed GPUs.
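The loop-kernel idea can be illustrated with a small CPU-only sketch. This is a conceptual analogy, not ARK's API: the names `compute`, `communicate`, and `build_loop_kernel` are hypothetical, and plain Python functions stand in for GPU operators. It only shows how a scheduled plan of computation and communication steps might be fused into one function that runs every iteration in a single launch.

```python
# Conceptual, CPU-only illustration of a "loop kernel".
# NOT ARK's API: every name below is hypothetical.
from typing import Callable, Dict, List

State = Dict[str, float]
Op = Callable[[State], None]


def compute(state: State) -> None:
    # Stand-in for a computation operator (e.g., a matmul on the GPU).
    state["local"] = state["local"] * 2.0 + 1.0


def communicate(state: State) -> None:
    # Stand-in for a communication operator (e.g., a send to a peer GPU).
    state["sent"] = state["local"]


def build_loop_kernel(plan: List[Op]) -> Callable[[State, int], None]:
    """Fuse a scheduled plan into one function that loops over the whole application."""

    def loop_kernel(state: State, iterations: int) -> None:
        for _ in range(iterations):
            # Each iteration runs every operator in plan order,
            # without returning control to the caller in between.
            for op in plan:
                op(state)

    return loop_kernel


kernel = build_loop_kernel([compute, communicate])
state = {"local": 0.0, "sent": 0.0}
kernel(state, 3)  # a single call covers all three iterations
```

In this analogy the caller plays the role of the CPU: it invokes the kernel once and is not involved again until all iterations have finished.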

GPU-driven System Architecture

Status & Roadmap

ARK is under active development, and some of its features will be added in future releases. The key features of each version are listed below.

New in ARK v0.5 (Latest Release)

Integrate with MSCCL++
Remove dependency on gpudma
Add AMD CDNA3 architecture support
Support communication for AMD GPUs
Optimize OpGraph scheduling
Add a multi-GPU Llama2 example

See details in https://github.com/microsoft/ark/issues/168.

ARK v0.6 (TBU, Jan. 2024)

Overall performance optimization
Improve Python unit tests & code coverage

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Citations

ARK is a collaborative research initiative between KAIST and Microsoft Research. If you use this project in your research, please cite our NSDI'23 paper:

@inproceedings{HwangPSQCX23,
  author = {Changho Hwang and KyoungSoo Park and Ran Shu and Xinyuan Qu and Peng Cheng and Yongqiang Xiong},
  title = {ARK: GPU-driven Code Execution for Distributed Deep Learning},
  booktitle = {20th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 23)},
  year = {2023},
  publisher = {{USENIX} Association},
}

Owner

Name: Microsoft
Login: microsoft
Kind: organization
Email: opensource@microsoft.com
Location: Redmond, WA

Website: https://opensource.microsoft.com
Twitter: OpenAtMicrosoft
Repositories: 7,257
Profile: https://github.com/microsoft

Open source projects and samples from Microsoft

Citation (CITATION.cff)

cff-version: 1.2.0
title: "ARK: A GPU-driven system framework for scalable AI applications"
version: 0.5.0
message: >-
  If you use this project in your research, please cite it as below.
authors:
  - given-names: Changho
    family-names: Hwang
    affiliation: Microsoft Research

repository-code: 'https://github.com/microsoft/ark'
abstract: >-
  ARK is a deep learning framework especially designed for highly optimized
  performance over distributed GPUs. Specifically, ARK adopts a GPU-driven
  execution model, where the GPU autonomously schedules and executes both
  computation and communication without any CPU intervention.
  ARK provides a set of APIs for users to express their distributed deep
  learning applications. ARK then automatically schedules a GPU-driven
  execution plan for the application, which generates a GPU kernel code
  called loop kernel. The loop kernel is a GPU kernel that contains a loop
  that iteratively executes the entire application, including both
  computation and communication. ARK then executes the loop kernel on the
  distributed GPUs.
license: MIT
license-url: https://github.com/microsoft/ark/blob/main/LICENSE

preferred-citation:
  type: conference-paper
  title: "ARK: GPU-driven Code Execution for Distributed Deep Learning"
  authors:
  - given-names: Changho
    family-names: Hwang
    affiliation: Microsoft Research, KAIST
  - given-names: KyoungSoo
    family-names: Park
    affiliation: KAIST
  - given-names: Ran
    family-names: Shu
    affiliation: Microsoft Research
  - given-names: Xinyuan
    family-names: Qu
    affiliation: Microsoft Research
  - given-names: Peng
    family-names: Cheng
    affiliation: Microsoft Research
  - given-names: Yongqiang
    family-names: Xiong
    affiliation: Microsoft Research
  conference:
    name: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23)
    city: Boston
    region: MA
    country: US
  month: 4
  year: 2023
  url: https://www.usenix.org/conference/nsdi23/presentation/hwang

GitHub Events

Total

Watch event: 9
Push event: 7
Fork event: 3

Last Year

Watch event: 9
Push event: 7
Fork event: 3

Committers

Last synced: about 1 year ago

All Time

Total Commits: 194
Total Committers: 4
Avg Commits per committer: 48.5
Development Distribution Score (DDS): 0.175

Past Year

Commits: 18
Committers: 2
Avg Commits per committer: 9.0
Development Distribution Score (DDS): 0.111

Top Committers

Name	Email	Commits
Changho Hwang	c**g@m**m	160
Lifan Wu	v**u@m**m	22
Binyang Li	b**i@m**m	7
Microsoft Open Source	m****e	5
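The all-time figures above can be reproduced from the committer table. The sketch below assumes the commonly used definition of the Development Distribution Score, DDS = 1 - (commits by the top committer / total commits); that formula is an assumption, since the page does not state how the score is computed. The helper names are illustrative only.

```python
# Recompute the all-time committer statistics from the table above.
# Assumption: DDS = 1 - (top committer's commits / total commits).
from typing import Dict

commits: Dict[str, int] = {
    "Changho Hwang": 160,
    "Lifan Wu": 22,
    "Binyang Li": 7,
    "Microsoft Open Source": 5,
}


def average_commits(per_committer: Dict[str, int]) -> float:
    """Mean number of commits per committer."""
    return sum(per_committer.values()) / len(per_committer)


def development_distribution_score(per_committer: Dict[str, int]) -> float:
    """1 minus the share of commits made by the most active committer."""
    total = sum(per_committer.values())
    return 1.0 - max(per_committer.values()) / total


total_commits = sum(commits.values())
avg = average_commits(commits)
dds = development_distribution_score(commits)
print(total_commits, avg, round(dds, 3))
```

Under that assumption the result matches the reported totals: 194 commits, 48.5 commits per committer, and a DDS of 0.175. A score this low means a single committer accounts for most of the history.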

Committer Domains (Top 20 + Academic)

microsoft.com: 3

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 11
Total pull requests: 89
Average time to close issues: 24 days
Average time to close pull requests: 16 days
Total issue authors: 5
Total pull request authors: 4
Average comments per issue: 0.82
Average comments per pull request: 0.76
Merged pull requests: 67
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 34
Average time to close issues: N/A
Average time to close pull requests: 5 days
Issue authors: 0
Pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.53
Merged pull requests: 25
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

chhwang (5)
elliottall (2)
liuw666-bruce (1)
shenyanmei2020 (1)

Pull Request Authors

chhwang (91)
naturalcandy (19)
Binyang2014 (11)
wusar (2)
trapp3rhat (1)
testerofpen (1)
deas23 (1)

Dependencies

.github/workflows/codeql.yml actions

actions/checkout v3 composite
github/codeql-action/analyze v2 composite
github/codeql-action/autobuild v2 composite
github/codeql-action/init v2 composite
styfle/cancel-workflow-action 0.8.0 composite

.github/workflows/lint.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

docs/sphinx/requirements.txt pypi

myst-parser *
rinohtype *
sphinx *
sphinx-book-theme *
sphinx-prompt *

pyproject.toml pypi

requirements.txt pypi

numpy *
pickle *
pyproject_metadata *
scikit-build-core *

.github/workflows/ut-cuda.yml actions

actions/checkout v4 composite

.github/workflows/ut-rocm.yml actions

actions/checkout v4 composite

examples/llama/requirements.txt pypi

fairscale *
sentencepiece *
torch *
