https://github.com/awslabs/hlat

Repository

Basic Info

Host: GitHub
Owner: awslabs
License: apache-2.0
Language: Python
Default Branch: main
Size: 117 KB

Statistics

Stars: 6
Watchers: 2
Forks: 1
Open Issues: 0
Releases: 0

Created almost 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme Contributing License Code of conduct

HLAT

Open-source code for the paper "HLAT: High-quality Large Language Model Pre-trained on AWS Trainium" (https://arxiv.org/abs/2404.10630) and the AWS Machine Learning blog post "End-to-end LLM training on instance clusters with over 100 nodes using AWS Trainium" (https://aws.amazon.com/blogs/machine-learning/end-to-end-llm-training-on-instance-clusters-with-over-100-nodes-using-aws-trainium).

Get Started

Steps to reproduce the results:

1. Prepare a Slurm cluster with trn1.32xlarge instances.
2. Request access to Llama 2 and download its tokenizer from https://huggingface.co/meta-llama/Llama-2-7b.
3. Clone or download the repository to a shared disk that all cluster nodes can access.
4. Prepare a dataset. Any dataset in Apache Arrow format works, as long as each row contains a "text" field.
5. Set the tokenizer path and dataset path in the training script:
   - 7b: 7b/tp_zero1_llama2_7b_hf_pretrain.sh, lines 67, 194, and 196.
   - 70b: 70b/tp_zero1_llama2_70b_hf_pretrain.sh, lines 103 and 187.
   Also change the TensorBoard path at the end of the training script.
6. Compile and run the job from the 7b or 70b folder:
   - Compile: sbatch run_llama.slurm
   - Run: change "compile" to "run" in run_llama.slurm, then submit it with sbatch run_llama.slurm.

Tips:

The installation script sometimes fails; if it does, rerun it.
Make sure the output path has enough space to store checkpoints.
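A rough estimate can help size the output volume. The sketch below assumes that each full checkpoint, summed over all ZeRO-1 shards, stores three things: bf16 model weights (2 bytes per parameter), fp32 master weights (4 bytes), and two fp32 Adam moments (8 bytes). That totals about 14 bytes per parameter. These byte counts, the nominal parameter counts, the number of checkpoints kept, and the helper names are all illustrative assumptions; what HLAT's scripts actually save may differ. Under these assumptions, one 7B checkpoint is about 7e9 × 14 = 98 GB, and one 70B checkpoint is about 980 GB.

```python
import shutil

# Assumed bytes per parameter in one full checkpoint (not confirmed for HLAT):
# bf16 weights (2) + fp32 master weights (4) + two fp32 Adam moments (4 + 4).
ASSUMED_BYTES_PER_PARAM = 2 + 4 + 4 + 4


def estimate_checkpoint_bytes(num_params, bytes_per_param=ASSUMED_BYTES_PER_PARAM):
    """Hypothetical helper: rough size of one checkpoint, summed over all shards."""
    return num_params * bytes_per_param


def has_room_for_checkpoints(path, num_params, checkpoints_kept,
                             bytes_per_param=ASSUMED_BYTES_PER_PARAM):
    """Hypothetical helper: True if the filesystem holding `path` has enough free space."""
    needed = checkpoints_kept * estimate_checkpoint_bytes(num_params, bytes_per_param)
    return shutil.disk_usage(path).free >= needed


if __name__ == "__main__":
    # Nominal parameter counts; the real Llama 2 models are slightly smaller.
    for name, params in (("7b", 7_000_000_000), ("70b", 70_000_000_000)):
        size_gb = estimate_checkpoint_bytes(params) / 1e9
        print(f"{name}: ~{size_gb:.0f} GB per checkpoint")
    # Example: is there room for 3 checkpoints of the 7b model in the current directory?
    print(has_room_for_checkpoints(".", 7_000_000_000, checkpoints_kept=3))
```

Passing bytes_per_param explicitly lets you adjust the estimate once you know what the training script actually writes, for example weights only.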

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Reference

If you find HLAT useful in your research or applications, please cite it using the following BibTeX:

@software{HLATcode,
  title = {HLAT: High-quality Large Language Model Pre-trained on AWS Trainium},
  author = {Haozheng Fan and Hao Zhou and Guangtai Huang and Parameswaran Raman and Xinwei Fu and Gaurav Gupta and Dhananjay Ram and Yida Wang and Jun Huan},
  url = {https://github.com/awslabs/HLAT/},
  year = {2024}
}

@article{HLAT,
  title = {HLAT: High-quality Large Language Model Pre-trained on AWS Trainium},
  author = {Haozheng Fan and Hao Zhou and Guangtai Huang and Parameswaran Raman and Xinwei Fu and Gaurav Gupta and Dhananjay Ram and Yida Wang and Jun Huan},
  journal = {arXiv preprint arXiv:2404.10630},
  year = {2024}
}

Owner

Name: Amazon Web Services - Labs
Login: awslabs
Kind: organization
Location: Seattle, WA

Website: http://amazon.com/aws/
Repositories: 914
Profile: https://github.com/awslabs

AWS Labs

GitHub Events

Total

Watch event: 2
Push event: 1
Fork event: 1

Last Year

Watch event: 2
Push event: 1
Fork event: 1
