https://github.com/awslabs/hlat
Science Score: 23.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: awslabs
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 117 KB
Statistics
- Stars: 6
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
HLAT
Open-source code for the paper "HLAT: High-quality Large Language Model Pre-trained on AWS Trainium" (https://arxiv.org/abs/2404.10630) and the accompanying blog post (https://aws.amazon.com/blogs/machine-learning/end-to-end-llm-training-on-instance-clusters-with-over-100-nodes-using-aws-trainium).
Get Started
Steps to reproduce the results:
- Prepare a Slurm cluster that has trn1.32xlarge instances.
- Request access to the Llama tokenizer and download it from https://huggingface.co/meta-llama/Llama-2-7b.
- Download the repo to a shared disk.
- Prepare a dataset. Any dataset works as long as it is in Apache Arrow format and each row contains the field "text".
- Change the tokenizer path and dataset path:
  - 7b: 7b/tp_zero1_llama2_7b_hf_pretrain.sh, lines 67, 194, and 196.
  - 70b: 70b/tp_zero1_llama2_70b_hf_pretrain.sh, lines 103 and 187.
- Also change the TensorBoard path at the end of the training script.
- Compile and run the job:
  - compile: sbatch run_llama.slurm in folder 7b or 70b
  - run: change "compile" to "run" in run_llama.slurm, then sbatch run_llama.slurm
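The dataset requirement above can be sketched in Python as follows. This is a minimal sketch, not the repository's own tooling: the helper names rows_to_text_column and write_arrow_file and the file name sample.arrow are hypothetical, and it assumes the pyarrow package is installed. The README does not say which loader the training script uses, so the plain Arrow IPC file written here is an assumption. Check the dataset-loading code in the training script and adjust the on-disk layout if it expects something else (for example, a directory saved by another library).

```python
from typing import Iterable, Mapping


def rows_to_text_column(rows: Iterable[Mapping[str, object]]) -> dict:
    """Validate that every row has a non-empty "text" string and collect them into one column."""
    texts = []
    for i, row in enumerate(rows):
        text = row.get("text")
        if not isinstance(text, str) or not text:
            raise ValueError(f"row {i} has no non-empty 'text' field")
        texts.append(text)
    return {"text": texts}


def write_arrow_file(rows: Iterable[Mapping[str, object]], path: str) -> None:
    """Write the rows to an Arrow IPC file (assumption: this layout suits your loader)."""
    import pyarrow as pa  # imported lazily so the validation helper works without pyarrow

    table = pa.table(rows_to_text_column(rows))
    with pa.OSFile(path, "wb") as sink:
        with pa.ipc.new_file(sink, table.schema) as writer:
            writer.write_table(table)


# Example usage (hypothetical data and output path):
# write_arrow_file([{"text": "first document"}, {"text": "second document"}], "sample.arrow")
```

Validating the "text" field before writing surfaces malformed rows early, rather than partway through a long training job.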
Tips:
- The installation script sometimes fails. If it does, just retry.
- Make sure the output path has enough space to store checkpoints.
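One way to check the free-space tip before submitting a job is a small Python helper like the one below. It is only a sketch: the function name has_enough_space is hypothetical, the path must already exist, and the README does not state how large the checkpoints are, so the required size is your own estimate.

```python
import shutil


def has_enough_space(path: str, required_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * 1024**3


# Example usage (hypothetical output path and size estimate):
# if not has_enough_space("/shared/output", 500):
#     raise SystemExit("Not enough free space for checkpoints")
```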
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.
Reference
If you found HLAT useful in your research or applications, please cite using the following BibTeX:
@software{HLAT,
title = {HLAT: High-quality Large Language Model Pre-trained on AWS Trainium},
author = {Haozheng Fan and Hao Zhou and Guangtai Huang and Parameswaran Raman and Xinwei Fu and Gaurav Gupta and Dhananjay Ram and Yida Wang and Jun Huan},
url = {https://github.com/awslabs/HLAT/},
year = {2024}
}
@article{HLAT_paper,
title = {HLAT: High-quality Large Language Model Pre-trained on AWS Trainium},
author = {Haozheng Fan and Hao Zhou and Guangtai Huang and Parameswaran Raman and Xinwei Fu and Gaurav Gupta and Dhananjay Ram and Yida Wang and Jun Huan},
journal = {arXiv preprint arXiv:2404.10630},
year = {2024}
}
Owner
- Name: Amazon Web Services - Labs
- Login: awslabs
- Kind: organization
- Location: Seattle, WA
- Website: http://amazon.com/aws/
- Repositories: 914
- Profile: https://github.com/awslabs
AWS Labs
GitHub Events
Total
- Watch event: 2
- Push event: 1
- Fork event: 1
Last Year
- Watch event: 2
- Push event: 1
- Fork event: 1