fast-llm
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.5%) to scientific vocabulary
Repository
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
Basic Info
- Host: GitHub
- Owner: ServiceNow
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://servicenow.github.io/Fast-LLM/
- Size: 12.4 MB
Statistics
- Stars: 224
- Watchers: 19
- Forks: 35
- Open Issues: 82
- Releases: 0
Metadata Files
README.md
Overview
Fast-LLM is a cutting-edge open-source library for training large language models with exceptional speed, scalability, and flexibility. Built on PyTorch and Triton, Fast-LLM empowers AI teams to push the limits of generative AI, from research to production.
Optimized for training models of all sizes—from small 1B-parameter models to massive clusters with 70B+ parameters—Fast-LLM delivers faster training, lower costs, and seamless scalability. Its fine-tuned kernels, advanced parallelism techniques, and efficient memory management make it the go-to choice for diverse training needs.
As a truly open-source project, Fast-LLM allows full customization and extension without proprietary restrictions. Developed transparently by a community of professionals on GitHub, the library benefits from collaborative innovation, with every change discussed and reviewed in the open to ensure trust and quality. Fast-LLM combines professional-grade tools with unified support for GPT-like architectures, offering the cost efficiency and flexibility that serious AI practitioners demand.
> [!NOTE]
> Fast-LLM is not affiliated with Fast.AI, FastHTML, FastAPI, FastText, or other similarly named projects. Our library's name refers to its speed and efficiency in language model training.
Why Fast-LLM?
🚀 Fast-LLM is Blazingly Fast:
- ⚡️ Optimized kernel efficiency and reduced overheads.
- 🔋 Optimized memory usage for best performance.
- ⏳ Minimizes training time and cost.
📈 Fast-LLM is Highly Scalable:
- 📡 Distributed training across multiple GPUs and nodes using 3D parallelism (Data, Tensor, and Pipeline).
- 🔗 Supports sequence length parallelism to handle longer sequences effectively.
- 🧠 ZeRO-1, ZeRO-2, and ZeRO-3 implementations for improved memory efficiency.
- 🎛️ Mixed precision training support for better performance.
- 🏋️‍♂️ Large batch training and gradient accumulation support.
- 🔄 Reproducible training with deterministic behavior.
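To make the batch-scaling features above concrete, the interaction of per-GPU micro-batches, gradient accumulation, and data parallelism can be sketched in plain Python. The variable names here are illustrative only, not Fast-LLM configuration keys:

```python
def effective_batch_size(micro_batch: int, accumulation_steps: int, data_parallel: int) -> int:
    """Number of sequences contributing to a single optimizer step.

    Each data-parallel rank processes `micro_batch` sequences per forward
    pass and accumulates gradients over `accumulation_steps` passes before
    the optimizer runs once on the summed gradients.
    """
    return micro_batch * accumulation_steps * data_parallel

# Example: 4 sequences per GPU, 2 accumulation steps, 32 data-parallel GPUs
print(effective_batch_size(4, 2, 32))  # 256
```

Gradient accumulation trades wall-clock time per step for memory: it lets a large effective batch fit on GPUs that could never hold it in a single forward pass.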
🎨 Fast-LLM is Incredibly Flexible:
- 🤖 Compatible with all common language model architectures in a unified class.
- ⚡ Efficient dropless Mixture-of-Experts (MoE) implementation with SoTA performance.
- 🧩 Customizable language model architectures, data loaders, loss functions, and optimizers (in progress).
- 🤗 Seamless integration with Hugging Face Transformers.
🎯 Fast-LLM is Super Easy to Use:
- 📦 Pre-built Docker images for quick deployment.
- 📝 Simple YAML configuration for hassle-free setup.
- 💻 Command-line interface for easy launches.
- 📊 Detailed logging and real-time monitoring features.
- 📚 Extensive documentation and practical tutorials (in progress).
🌐 Fast-LLM is Truly Open Source:
- ⚖️ Licensed under Apache 2.0 for maximum freedom to use Fast-LLM at work, in your projects, or for research.
- 💻 Transparently developed on GitHub with public roadmap and issue tracking.
- 🤝 Contributions and collaboration are always welcome!
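The "simple YAML configuration" mentioned above might look roughly like the following. This is a purely hypothetical sketch: the keys below are NOT Fast-LLM's real schema; consult examples/mistral-4-node-benchmark.yaml in the repository for the actual configuration format.

```yaml
# Hypothetical sketch only -- these keys are not Fast-LLM's real schema.
# See examples/mistral-4-node-benchmark.yaml in the repository.
training:
  train_iters: 100        # number of optimizer steps
  batch_size: 32          # sequences per optimizer step
model:
  sequence_length: 8192
distributed:
  tensor_parallel: 1
  pipeline_parallel: 1
```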
Usage
We'll walk you through how to use Fast-LLM to train a large language model on a cluster with multiple nodes and GPUs, with example setups for both a Slurm cluster and a Kubernetes cluster.
For this demo, we will train a Mistral-7B model from scratch for 100 steps on random data. The config file examples/mistral-4-node-benchmark.yaml is pre-configured for a multi-node setup with 4 DGX nodes, each with 8 A100-80GB or H100-80GB GPUs.
> [!NOTE]
> Fast-LLM scales from a single GPU to large clusters. You can start small and expand based on your resources.
Expect to see a significant speedup in training time compared to other libraries! For training Mistral-7B, Fast-LLM is expected to achieve a throughput of 9,800 tokens/s/H100 (batch size 32, sequence length 8k) on a 4-node cluster with 32 H100s.
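A quick back-of-the-envelope check of the quoted figure, assuming the batch size counts sequences per optimizer step across the whole cluster (an assumption; the README does not spell this out):

```python
# Aggregate throughput and demo runtime implied by 9,800 tokens/s/H100.
tokens_per_gpu_per_s = 9_800
num_gpus = 32
seq_len = 8 * 1024          # "8k" sequence length
batch_size = 32             # sequences per optimizer step (assumed global)

aggregate_tokens_per_s = tokens_per_gpu_per_s * num_gpus   # 313,600 tokens/s
tokens_per_step = batch_size * seq_len                     # 262,144 tokens
seconds_for_100_steps = 100 * tokens_per_step / aggregate_tokens_per_s

print(aggregate_tokens_per_s)        # 313600
print(round(seconds_for_100_steps))  # 84
```

So at the quoted rate, the 100-step demo run should finish in well under two minutes of pure compute time (excluding startup and checkpointing overhead).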
Running Fast-LLM on a Slurm Cluster
Prerequisites
- A Slurm cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each.
- CUDA 12.1 or higher.
- Dependencies: PyTorch, Triton, and Apex installed on all nodes.
Steps
- Deploy the nvcr.io/nvidia/pytorch:24.07-py3 Docker image to all nodes (recommended), since it contains all the necessary dependencies.
- Install Fast-LLM on all nodes:
```bash
sbatch <<EOF
#!/bin/bash
#SBATCH --nodes=$(scontrol show node | grep -c NodeName)
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks=$(scontrol show node | grep -c NodeName)
#SBATCH --exclusive
srun bash -c 'pip install --no-cache-dir -e "git+https://github.com/ServiceNow/Fast-LLM.git#egg=llm[CORE,OPTIONAL,DEV]"'
EOF
```
- Use the example Slurm job script examples/fast-llm.sbat to submit the job to the cluster:

```bash
sbatch examples/fast-llm.sbat
```

- Monitor the job's progress:
- Logs: Follow `job_output.log` and `job_error.log` in your working directory for logs.
- Status: Use `squeue -u $USER` to see the job status.
Now, you can sit back and relax while Fast-LLM trains your model at full speed! ☕
Running Fast-LLM on a Kubernetes Cluster
Prerequisites
- A Kubernetes cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each.
- KubeFlow installed.
- Locked memory limit set to unlimited at the host level on all nodes. Ask your cluster admin to do this if needed.
Steps
- Create a Kubernetes PersistentVolumeClaim (PVC) named `fast-llm-home`, which will be mounted to `/home/fast-llm` in the container, using examples/fast-llm-pvc.yaml:

```bash
kubectl apply -f examples/fast-llm-pvc.yaml
```

- Create a PyTorchJob resource using the example configuration file examples/fast-llm.pytorchjob.yaml:

```bash
kubectl apply -f examples/fast-llm.pytorchjob.yaml
```

- Monitor the job status:
- Use `kubectl get pytorchjobs` to see the job status.
- Use `kubectl logs -f fast-llm-master-0 -c pytorch` to follow the logs.
That's it! You're now up and running with Fast-LLM on Kubernetes. 🚀
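For orientation, a PVC manifest of the kind referenced above might look like this. This is an illustrative sketch only; the real examples/fast-llm-pvc.yaml in the repository may differ, and the storage size and access mode below are assumptions:

```yaml
# Illustrative sketch; not the repository's actual examples/fast-llm-pvc.yaml.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-llm-home      # name referenced by the PyTorchJob
spec:
  accessModes:
    - ReadWriteMany        # assumed: shared across the job's pods
  resources:
    requests:
      storage: 100Gi       # assumed size; adjust to your checkpoints
```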
Next Steps
📖 Want to learn more? Check out our documentation for more information on how to use Fast-LLM.
🔨 We welcome contributions to Fast-LLM! Have a look at our contribution guidelines.
🐞 Something doesn't work? Open an issue!
License
Fast-LLM is licensed by ServiceNow, Inc. under the Apache 2.0 License. See LICENSE for more information.
Vulnerability Reporting
For security issues, email disclosure@servicenow.com. See our security policy.
Owner
- Name: ServiceNow
- Login: ServiceNow
- Kind: organization
- Website: https://www.servicenow.com
- Repositories: 147
- Profile: https://github.com/ServiceNow
Works for you™
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use Fast-LLM in your research, please cite it as follows:"
title: "Fast-LLM"
repository-code: "https://github.com/ServiceNow/Fast-LLM"
url: "https://github.com/ServiceNow/Fast-LLM"
license: "Apache-2.0"
keywords:
- large language models
- machine learning
- deep learning
- distributed training
- open source
authors:
- family-names: "Lamy Poirier"
given-names: "Joel"
- family-names: "Tian"
given-names: "Max"
- family-names: "Li"
given-names: "Raymond"
- family-names: "Guille-Escuret"
given-names: "Charles"
- family-names: "Kumar"
given-names: "Luke Nitish"
- family-names: "Kocetkov"
given-names: "Denis"
- family-names: "Scholak"
given-names: "Torsten"
date-released: "2024-10-19"
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 90
- Total pull requests: 136
- Average time to close issues: about 1 month
- Average time to close pull requests: 12 days
- Total issue authors: 7
- Total pull request authors: 15
- Average comments per issue: 0.77
- Average comments per pull request: 0.99
- Merged pull requests: 70
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 90
- Pull requests: 136
- Average time to close issues: about 1 month
- Average time to close pull requests: 12 days
- Issue authors: 7
- Pull request authors: 15
- Average comments per issue: 0.77
- Average comments per pull request: 0.99
- Merged pull requests: 70
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jlamypoirier (41)
- tscholak (36)
- bigximik (13)
- sohamparikh (8)
- RaymondLi0 (6)
- oleksost (2)
- chrish42 (1)
- shruthan (1)
Pull Request Authors
- jlamypoirier (87)
- tscholak (27)
- RaymondLi0 (16)
- nitsanluke (14)
- sohamparikh (14)
- oleksost (10)
- bigximik (10)
- akshaykalkunte (2)
- nandahkrishna (2)
- gopeshh (2)
- shruthan (1)
- harshitpawar64 (1)
- tobyzl2 (1)
- chrish42 (1)
- nimasheikholeslami (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- nvcr.io/nvidia/pytorch 24.07-py3 build
- actions/checkout v4 composite
- actions/setup-python v5 composite
- docker/build-push-action v6 composite
- docker/login-action v3 composite
- docker/metadata-action v5 composite
- docker/setup-buildx-action v3 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite