https://github.com/awslabs/hlat

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: awslabs
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 117 KB
Statistics
  • Stars: 6
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Code of conduct

README.md

HLAT

Open-source code for the paper "HLAT: High-quality Large Language Model Pre-trained on AWS Trainium" (https://arxiv.org/abs/2404.10630) and the blog post (https://aws.amazon.com/blogs/machine-learning/end-to-end-llm-training-on-instance-clusters-with-over-100-nodes-using-aws-trainium).

Get Started

Steps to reproduce the results:

  1. Prepare a Slurm cluster with trn1.32xlarge instances.
  2. Request access to the Llama tokenizer and download it from https://huggingface.co/meta-llama/Llama-2-7b.
  3. Download this repository to a shared disk.
  4. Prepare a dataset. Any dataset in Apache Arrow format works, as long as each row contains the field "text".
  5. Change the tokenizer path and dataset path:
    • 7b: 7b/tp_zero1_llama2_7b_hf_pretrain.sh, lines 67, 194, and 196.
    • 70b: 70b/tp_zero1_llama2_70b_hf_pretrain.sh, lines 103 and 187.
  6. Change the TensorBoard path at the end of the training script.
  7. Compile and run the job:
    • Compile: run sbatch run_llama.slurm in the 7b or 70b folder.
    • Run: change "compile" to "run" in run_llama.slurm, then run sbatch run_llama.slurm again.
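The path edits in step 5 are given by line number, and those numbers can drift if the scripts change. The following is a minimal sketch, not part of the repository, that prints the listed lines so you can confirm they hold the tokenizer and dataset paths before editing. The helper name show_lines and the LINES_TO_EDIT table are hypothetical; the file names and line numbers are taken from the steps above.

```python
from pathlib import Path


def show_lines(script_path, line_numbers):
    """Return {line_number: text} for the requested 1-indexed lines of a file.

    Line numbers outside the file are silently skipped.
    """
    lines = Path(script_path).read_text().splitlines()
    selected = {}
    for number in line_numbers:
        if 1 <= number <= len(lines):
            selected[number] = lines[number - 1]
    return selected


# Hypothetical lookup table built from the steps above.
# The line numbers may shift if the training scripts are updated.
LINES_TO_EDIT = {
    "7b/tp_zero1_llama2_7b_hf_pretrain.sh": [67, 194, 196],
    "70b/tp_zero1_llama2_70b_hf_pretrain.sh": [103, 187],
}

if __name__ == "__main__":
    for script, numbers in LINES_TO_EDIT.items():
        if not Path(script).exists():
            print(f"{script}: not found (run this from the repository root)")
            continue
        for number, text in show_lines(script, numbers).items():
            print(f"{script}:{number}: {text}")
```

Run it from the repository root. If a printed line does not look like a tokenizer or dataset path, search the script for the path instead of relying on the line number.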

Tips:

  • The installation script sometimes fails; if it does, retry it.
  • Make sure the output path has enough space to store checkpoints.
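One way to act on the disk-space tip is to check free space on the output path before submitting the job. This is a minimal sketch using only the Python standard library. The function names, the output directory, and the 100 GiB budget are hypothetical placeholders; the repository does not state how much space the checkpoints need.

```python
import shutil


def free_space_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room_for(path, required_gib):
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    return free_space_gib(path) >= required_gib


if __name__ == "__main__":
    output_dir = "."      # hypothetical: replace with your checkpoint output path
    required_gib = 100    # hypothetical budget, not a figure from the repository
    free = free_space_gib(output_dir)
    if has_room_for(output_dir, required_gib):
        print(f"OK: {free:.1f} GiB free at {output_dir}")
    else:
        print(f"Warning: only {free:.1f} GiB free at {output_dir}, wanted {required_gib} GiB")
```

Pick the budget from the size of a single checkpoint multiplied by the number of checkpoints you plan to keep.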

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Reference

If you found HLAT useful in your research or applications, please cite using the following BibTeX:

    @software{HLAT,
      title = {HLAT: High-quality Large Language Model Pre-trained on AWS Trainium},
      author = {Haozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun Huan},
      url = {https://github.com/awslabs/HLAT/},
      year = {2024}
    }

    @article{HLAT,
      title = {HLAT: High-quality Large Language Model Pre-trained on AWS Trainium},
      author = {Haozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun Huan},
      journal = {arXiv preprint arXiv:2404.10630},
      year = {2024}
    }

Owner

  • Name: Amazon Web Services - Labs
  • Login: awslabs
  • Kind: organization
  • Location: Seattle, WA

AWS Labs

GitHub Events

Total
  • Watch event: 2
  • Push event: 1
  • Fork event: 1
Last Year
  • Watch event: 2
  • Push event: 1
  • Fork event: 1