fast-llm
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.5%) to scientific vocabulary
Repository
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
Basic Info
- Host: GitHub
- Owner: ServiceNow
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://servicenow.github.io/Fast-LLM/
- Size: 12.4 MB
Statistics
- Stars: 224
- Watchers: 19
- Forks: 35
- Open Issues: 82
- Releases: 0
Metadata Files
README.md
Overview
Fast-LLM is a cutting-edge open-source library for training large language models with exceptional speed, scalability, and flexibility. Built on PyTorch and Triton, Fast-LLM empowers AI teams to push the limits of generative AI, from research to production.
Optimized for training models of all sizes—from small 1B-parameter models to massive clusters with 70B+ parameters—Fast-LLM delivers faster training, lower costs, and seamless scalability. Its fine-tuned kernels, advanced parallelism techniques, and efficient memory management make it the go-to choice for diverse training needs.
As a truly open-source project, Fast-LLM allows full customization and extension without proprietary restrictions. Developed transparently by a community of professionals on GitHub, the library benefits from collaborative innovation, with every change discussed and reviewed in the open to ensure trust and quality. Fast-LLM combines professional-grade tools with unified support for GPT-like architectures, offering the cost efficiency and flexibility that serious AI practitioners demand.
> [!NOTE]
> Fast-LLM is not affiliated with Fast.AI, FastHTML, FastAPI, FastText, or other similarly named projects. Our library's name refers to its speed and efficiency in language model training.
Why Fast-LLM?
🚀 Fast-LLM is Blazingly Fast:
- ⚡️ Optimized kernel efficiency and reduced overheads.
- 🔋 Optimized memory usage for best performance.
- ⏳ Minimizes training time and cost.
📈 Fast-LLM is Highly Scalable:
- 📡 Distributed training across multiple GPUs and nodes using 3D parallelism (Data, Tensor, and Pipeline).
- 🔗 Supports sequence length parallelism to handle longer sequences effectively.
- 🧠 ZeRO-1, ZeRO-2, and ZeRO-3 implementations for improved memory efficiency.
- 🎛️ Mixed precision training support for better performance.
- 🏋️‍♂️ Large batch training and gradient accumulation support.
- 🔄 Reproducible training with deterministic behavior.
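To make the batch-scaling features above concrete, the interaction of per-GPU micro-batches, gradient accumulation, and data parallelism can be sketched in plain Python. The variable names here are illustrative only, not Fast-LLM configuration keys:

```python
def effective_batch_size(micro_batch: int, accumulation_steps: int, data_parallel: int) -> int:
    """Number of sequences contributing to a single optimizer step.

    Each data-parallel rank processes `micro_batch` sequences per forward
    pass and accumulates gradients over `accumulation_steps` passes before
    the optimizer runs once on the summed gradients.
    """
    return micro_batch * accumulation_steps * data_parallel

# Example: 4 sequences per GPU, 2 accumulation steps, 32 data-parallel GPUs
print(effective_batch_size(4, 2, 32))  # 256
```

Gradient accumulation trades wall-clock time per step for memory: it lets a large effective batch fit on GPUs that could never hold it in a single forward pass.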
🎨 Fast-LLM is Incredibly Flexible:
- 🤖 Compatible with all common language model architectures in a unified class.
- ⚡ Efficient dropless Mixture-of-Experts (MoE) implementation with SoTA performance.
- 🧩 Customizable language model architectures, data loaders, loss functions, and optimizers (in progress).
- 🤗 Seamless integration with Hugging Face Transformers.
🎯 Fast-LLM is Super Easy to Use:
- 📦 Pre-built Docker images for quick deployment.
- 📝 Simple YAML configuration for hassle-free setup.
- 💻 Command-line interface for easy launches.
- 📊 Detailed logging and real-time monitoring features.
- 📚 Extensive documentation and practical tutorials (in progress).
🌐 Fast-LLM is Truly Open Source:
- ⚖️ Licensed under Apache 2.0 for maximum freedom to use Fast-LLM at work, in your projects, or for research.
- 💻 Transparently developed on GitHub with public roadmap and issue tracking.
- 🤝 Contributions and collaboration are always welcome!
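The "simple YAML configuration" mentioned above might look roughly like the following. This is a purely hypothetical sketch: the keys below are NOT Fast-LLM's real schema; consult examples/mistral-4-node-benchmark.yaml in the repository for the actual configuration format.

```yaml
# Hypothetical sketch only -- these keys are not Fast-LLM's real schema.
# See examples/mistral-4-node-benchmark.yaml in the repository.
training:
  train_iters: 100        # number of optimizer steps
  batch_size: 32          # sequences per optimizer step
model:
  sequence_length: 8192
distributed:
  tensor_parallel: 1
  pipeline_parallel: 1
```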
Usage
We'll walk you through how to use Fast-LLM to train a large language model on a cluster with multiple nodes and GPUs, with example setups for both a Slurm cluster and a Kubernetes cluster.
For this demo, we will train a Mistral-7B model from scratch for 100 steps on random data. The config file examples/mistral-4-node-benchmark.yaml is pre-configured for a multi-node setup with 4 DGX nodes, each with 8 A100-80GB or H100-80GB GPUs.
> [!NOTE]
> Fast-LLM scales from a single GPU to large clusters. You can start small and expand based on your resources.
Expect to see a significant speedup in training time compared to other libraries! For training Mistral-7B, Fast-LLM is expected to achieve a throughput of 9,800 tokens/s/H100 (batch size 32, sequence length 8k) on a 4-node cluster with 32 H100s.
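A quick back-of-the-envelope check of the quoted figure, assuming the batch size counts sequences per optimizer step across the whole cluster (an assumption; the README does not spell this out):

```python
# Aggregate throughput and demo runtime implied by 9,800 tokens/s/H100.
tokens_per_gpu_per_s = 9_800
num_gpus = 32
seq_len = 8 * 1024          # "8k" sequence length
batch_size = 32             # sequences per optimizer step (assumed global)

aggregate_tokens_per_s = tokens_per_gpu_per_s * num_gpus   # 313,600 tokens/s
tokens_per_step = batch_size * seq_len                     # 262,144 tokens
seconds_for_100_steps = 100 * tokens_per_step / aggregate_tokens_per_s

print(aggregate_tokens_per_s)        # 313600
print(round(seconds_for_100_steps))  # 84
```

So at the quoted rate, the 100-step demo run should finish in well under two minutes of pure compute time (excluding startup and checkpointing overhead).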
Running Fast-LLM on a Slurm Cluster
Prerequisites
- A Slurm cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each.
- CUDA 12.1 or higher.
- Dependencies: PyTorch, Triton, and Apex installed on all nodes.
Steps
- Deploy the nvcr.io/nvidia/pytorch:24.07-py3 Docker image to all nodes (recommended), since it contains all the necessary dependencies.
- Install Fast-LLM on all nodes:
```bash
sbatch <<EOF
#!/bin/bash
#SBATCH --nodes=$(scontrol show node | grep -c NodeName)
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks=$(scontrol show node | grep -c NodeName)
#SBATCH --exclusive
srun bash -c 'pip install --no-cache-dir -e "git+https://github.com/ServiceNow/Fast-LLM.git#egg=llm[CORE,OPTIONAL,DEV]"'
EOF
```
- Use the example Slurm job script examples/fast-llm.sbat to submit the job to the cluster:

```bash
sbatch examples/fast-llm.sbat
```

- Monitor the job's progress:
- Logs: Follow `job_output.log` and `job_error.log` in your working directory for logs.
- Status: Use `squeue -u $USER` to see the job status.
Now, you can sit back and relax while Fast-LLM trains your model at full speed! ☕
Running Fast-LLM on a Kubernetes Cluster
Prerequisites
- A Kubernetes cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each.
- KubeFlow installed.
- Locked memory limit set to unlimited at the host level on all nodes. Ask your cluster admin to do this if needed.
Steps
- Create a Kubernetes PersistentVolumeClaim (PVC) named `fast-llm-home`, which will be mounted to `/home/fast-llm` in the container, using examples/fast-llm-pvc.yaml:

```bash
kubectl apply -f examples/fast-llm-pvc.yaml
```

- Create a PyTorchJob resource using the example configuration file examples/fast-llm.pytorchjob.yaml:

```bash
kubectl apply -f examples/fast-llm.pytorchjob.yaml
```

- Monitor the job status:
- Use `kubectl get pytorchjobs` to see the job status.
- Use `kubectl logs -f fast-llm-master-0 -c pytorch` to follow the logs.
That's it! You're now up and running with Fast-LLM on Kubernetes. 🚀
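For orientation, a PVC manifest of the kind referenced above might look like this. This is an illustrative sketch only; the real examples/fast-llm-pvc.yaml in the repository may differ, and the storage size and access mode below are assumptions:

```yaml
# Illustrative sketch; not the repository's actual examples/fast-llm-pvc.yaml.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-llm-home      # name referenced by the PyTorchJob
spec:
  accessModes:
    - ReadWriteMany        # assumed: shared across the job's pods
  resources:
    requests:
      storage: 100Gi       # assumed size; adjust to your checkpoints
```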
Next Steps
📖 Want to learn more? Check out our documentation for more information on how to use Fast-LLM.
🔨 We welcome contributions to Fast-LLM! Have a look at our contribution guidelines.
🐞 Something doesn't work? Open an issue!
License
Fast-LLM is licensed by ServiceNow, Inc. under the Apache 2.0 License. See LICENSE for more information.
Vulnerability Reporting
For security issues, email disclosure@servicenow.com. See our security policy.
Owner
- Name: ServiceNow
- Login: ServiceNow
- Kind: organization
- Website: https://www.servicenow.com
- Repositories: 147
- Profile: https://github.com/ServiceNow
Works for you™
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use Fast-LLM in your research, please cite it as follows:"
title: "Fast-LLM"
repository-code: "https://github.com/ServiceNow/Fast-LLM"
url: "https://github.com/ServiceNow/Fast-LLM"
license: "Apache-2.0"
keywords:
- large language models
- machine learning
- deep learning
- distributed training
- open source
authors:
- family-names: "Lamy Poirier"
given-names: "Joel"
- family-names: "Tian"
given-names: "Max"
- family-names: "Li"
given-names: "Raymond"
- family-names: "Guille-Escuret"
given-names: "Charles"
- family-names: "Kumar"
given-names: "Luke Nitish"
- family-names: "Kocetkov"
given-names: "Denis"
- family-names: "Scholak"
given-names: "Torsten"
date-released: "2024-10-19"
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 90
- Total pull requests: 136
- Average time to close issues: about 1 month
- Average time to close pull requests: 12 days
- Total issue authors: 7
- Total pull request authors: 15
- Average comments per issue: 0.77
- Average comments per pull request: 0.99
- Merged pull requests: 70
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 90
- Pull requests: 136
- Average time to close issues: about 1 month
- Average time to close pull requests: 12 days
- Issue authors: 7
- Pull request authors: 15
- Average comments per issue: 0.77
- Average comments per pull request: 0.99
- Merged pull requests: 70
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jlamypoirier (41)
- tscholak (36)
- bigximik (13)
- sohamparikh (8)
- RaymondLi0 (6)
- oleksost (2)
- chrish42 (1)
- shruthan (1)
Pull Request Authors
- jlamypoirier (87)
- tscholak (27)
- RaymondLi0 (16)
- nitsanluke (14)
- sohamparikh (14)
- oleksost (10)
- bigximik (10)
- akshaykalkunte (2)
- nandahkrishna (2)
- gopeshh (2)
- shruthan (1)
- harshitpawar64 (1)
- tobyzl2 (1)
- chrish42 (1)
- nimasheikholeslami (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- nvcr.io/nvidia/pytorch 24.07-py3 build
- actions/checkout v4 composite
- actions/setup-python v5 composite
- docker/build-push-action v6 composite
- docker/login-action v3 composite
- docker/metadata-action v5 composite
- docker/setup-buildx-action v3 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite