Muller, S., Loison, A., Omrani, B., & Viaud, G. (2024). GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering. arXiv. https://doi.org/10.48550/arXiv.2409.06595