Shen, S., Walsh, P., Keutzer, K., Dodge, J., Peters, M., & Beltagy, I. (2022). Staged Training for Transformer Language Models. https://doi.org/10.48550/arXiv.2203.06211