Updated 9 months ago

gpt-neox • Rank 13.8 • Science 64%

An implementation of model-parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries.