Updated 6 months ago
https://github.com/amazon-science/beyondcorrelation
Implementation of the paper "Beyond Correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge".