Miller, J., Chughtai, B., & Saunders, W. (2024). Transformer Circuit Evaluation Metrics Are Not Robust [Conference paper]. First Conference on Language Modeling. https://openreview.net/forum?id=zSf8PJyQb2