Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.0%) to scientific vocabulary
Repository
A GPU-driven system framework for scalable AI applications
Basic Info
Statistics
- Stars: 117
- Watchers: 11
- Forks: 17
- Open Issues: 11
- Releases: 7
Metadata Files
README.md
ARK
A GPU-driven system framework for scalable AI applications.
| Pipelines         | Build Status |
|-------------------|--------------|
| Unit Tests (CUDA) |              |
| Unit Tests (ROCm) |              |
NOTE (Nov 2023): ROCm unit tests will be moved to an Azure pipeline in the future.
See Quick Start to get started.
Overview
ARK is a deep learning framework designed for highly optimized performance on distributed GPUs. Specifically, ARK adopts a GPU-driven execution model, in which the GPU autonomously schedules and executes both computation and communication without any CPU intervention.
ARK provides a set of APIs for users to express their distributed deep learning applications. ARK then automatically schedules a GPU-driven execution plan for the application and generates GPU kernel code called a loop kernel: a single GPU kernel containing a loop that iteratively executes the entire application, including both computation and communication. Finally, ARK executes this loop kernel on the distributed GPUs.
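The loop-kernel idea can be sketched in plain Python. This is a conceptual illustration only, not ARK's actual API: the `Op` and `loop_kernel` names are hypothetical, and the two "GPUs" are just dictionary entries standing in for device state.

```python
# Conceptual sketch of a loop kernel: one persistent loop that executes
# the entire application (compute + communication ops, in order) each
# iteration, with no per-op host-side kernel launches.
# Op and loop_kernel are illustrative names, not part of ARK's API.

class Op:
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def run(self, state):
        self.fn(state)

def loop_kernel(ops, state, iterations):
    """A single 'kernel' whose body loops over the whole op graph."""
    for _ in range(iterations):   # the loop inside the loop kernel
        for op in ops:            # compute and comm ops interleaved
            op.run(state)
    return state

# Toy two-"GPU" application: local compute, then an all-reduce-like exchange.
state = {"gpu0": 1.0, "gpu1": 3.0}
ops = [
    Op("compute", lambda s: s.update(gpu0=s["gpu0"] * 2, gpu1=s["gpu1"] * 2)),
    Op("allreduce", lambda s: s.update(gpu0=s["gpu0"] + s["gpu1"],
                                       gpu1=s["gpu0"] + s["gpu1"])),
]
result = loop_kernel(ops, state, iterations=1)  # → {'gpu0': 8.0, 'gpu1': 8.0}
```

The point of the structure is that the host launches one kernel once; everything else, including communication, happens inside the loop on the device.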
Status & Roadmap
ARK is under active development, and some of its features will be added in future releases. The following describes the key features of each version.
New in ARK v0.5 (Latest Release)
- Integrate with MSCCL++
- Remove dependency on gpudma
- Add AMD CDNA3 architecture support
- Support communication for AMD GPUs
- Optimize OpGraph scheduling
- Add a multi-GPU Llama2 example

See details at https://github.com/microsoft/ark/issues/168.
ARK v0.6 (TBU, Jan. 2024)
- Overall performance optimization
- Improve Python unit tests & code coverage
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
Citations
ARK is a collaborative research initiative between KAIST and Microsoft Research. If you use this project in your research, please cite our NSDI'23 paper:
```bibtex
@inproceedings{HwangPSQCX23,
  author    = {Changho Hwang and
               KyoungSoo Park and
               Ran Shu and
               Xinyuan Qu and
               Peng Cheng and
               Yongqiang Xiong},
  title     = {ARK: GPU-driven Code Execution for Distributed Deep Learning},
  booktitle = {20th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 23)},
  year      = {2023},
  publisher = {{USENIX} Association},
}
```
Owner
- Name: Microsoft
- Login: microsoft
- Kind: organization
- Email: opensource@microsoft.com
- Location: Redmond, WA
- Website: https://opensource.microsoft.com
- Twitter: OpenAtMicrosoft
- Repositories: 7,257
- Profile: https://github.com/microsoft
- Description: Open source projects and samples from Microsoft
Citation (CITATION.cff)
cff-version: 1.2.0
title: "ARK: A GPU-driven system framework for scalable AI applications"
version: 0.5.0
message: >-
If you use this project in your research, please cite it as below.
authors:
- given-names: Changho
family-names: Hwang
affiliation: Microsoft Research
repository-code: 'https://github.com/microsoft/ark'
abstract: >-
ARK is a deep learning framework especially designed for highly optimized
performance over distributed GPUs. Specifically, ARK adopts a GPU-driven
  execution model, where the GPU autonomously schedules and executes both
computation and communication without any CPU intervention.
ARK provides a set of APIs for users to express their distributed deep
learning applications. ARK then automatically schedules a GPU-driven
execution plan for the application, which generates a GPU kernel code
called loop kernel. The loop kernel is a GPU kernel that contains a loop
that iteratively executes the entire application, including both
computation and communication. ARK then executes the loop kernel on the
distributed GPUs.
license: MIT
license-url: https://github.com/microsoft/ark/blob/main/LICENSE
preferred-citation:
type: conference-paper
title: "ARK: GPU-driven Code Execution for Distributed Deep Learning"
authors:
- given-names: Changho
family-names: Hwang
affiliation: Microsoft Research, KAIST
- given-names: KyoungSoo
family-names: Park
affiliation: KAIST
- given-names: Ran
family-names: Shu
affiliation: Microsoft Research
- given-names: Xinyuan
family-names: Qu
affiliation: Microsoft Research
- given-names: Peng
family-names: Cheng
affiliation: Microsoft Research
- given-names: Yongqiang
family-names: Xiong
affiliation: Microsoft Research
conference:
name: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23)
city: Boston
region: MA
country: US
month: 4
year: 2023
url: https://www.usenix.org/conference/nsdi23/presentation/hwang
GitHub Events
Total
- Watch event: 9
- Push event: 7
- Fork event: 3
Last Year
- Watch event: 9
- Push event: 7
- Fork event: 3
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Changho Hwang | c****g@m****m | 160 |
| Lifan Wu | v****u@m****m | 22 |
| Binyang Li | b****i@m****m | 7 |
| Microsoft Open Source | m****e | 5 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 11
- Total pull requests: 89
- Average time to close issues: 24 days
- Average time to close pull requests: 16 days
- Total issue authors: 5
- Total pull request authors: 4
- Average comments per issue: 0.82
- Average comments per pull request: 0.76
- Merged pull requests: 67
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 34
- Average time to close issues: N/A
- Average time to close pull requests: 5 days
- Issue authors: 0
- Pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 0.53
- Merged pull requests: 25
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- chhwang (5)
- elliottall (2)
- liuw666-bruce (1)
- shenyanmei2020 (1)
Pull Request Authors
- chhwang (91)
- naturalcandy (19)
- Binyang2014 (11)
- wusar (2)
- trapp3rhat (1)
- testerofpen (1)
- deas23 (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3 composite
- github/codeql-action/analyze v2 composite
- github/codeql-action/autobuild v2 composite
- github/codeql-action/init v2 composite
- styfle/cancel-workflow-action 0.8.0 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- myst-parser *
- rinohtype *
- sphinx *
- sphinx-book-theme *
- sphinx-prompt *
- numpy *
- pickle *
- pyproject_metadata *
- scikit-build-core *
- actions/checkout v4 composite
- actions/checkout v4 composite
- fairscale *
- sentencepiece *
- torch *