He, L. (2022). The full derivation of Transformer gradient (Version 0.0.0) [Computer software]. https://github.com/Say-Hello2y/Transformer-attention.git