https://github.com/amazon-science/madisse
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
○DOI references
-
✓Academic publication links
Links to: arxiv.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (8.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: amazon-science
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 3.91 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation
Authors: Mahnaz Koupaee, Jake W. Vincent, Saab Mansour, Igor Shalyminov, Han He, Hwanjun Song, Raphael Shu, Jianfeng He, Yi Nian, Amy Wing-mei Wong, Kyu J. Han, Hang Su
Please check out our paper on arXiv: https://arxiv.org/abs/2502.08514
Madisse
Madisse is our debate approach to summary faithfulness evaluation. A group of agents, each with an initially imposed belief about the summary's faithfulness, engage in discussion to resolve any inconsistencies; the approach is shown below. Each debate session consists of three stages: 1) stance initialization, in which each agent is assigned a belief about the summary's faithfulness (faithful or unfaithful); 2) debate, in which evaluator agents engage in multiple rounds of debate to persuade each other of whether the summary is faithful; and 3) adjudication, in which the final label is assigned to the summary based on the arguments from the debate. Madisse can run several debate sessions simultaneously.
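The three stages can be sketched in a few lines of Python. This is only an illustrative outline, not the repository's implementation: `run_debate`, `toy_agent`, the agent signature, and the majority-vote adjudication (ties resolved as "unfaithful") are all assumptions made for this example. In practice each agent would presumably wrap an LLM call rather than a string match.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical agent signature (an assumption of this sketch):
# (document, summary, current_stance, transcript) -> (new_stance, argument)
Agent = Callable[[str, str, str, List[str]], Tuple[str, str]]


def run_debate(doc: str, summary: str, agents: List[Agent],
               initial_stances: List[str], rounds: int = 2):
    """Run one debate session and return (final_label, transcript)."""
    # Stage 1: stance initialization. Each agent starts from an imposed belief.
    stances = list(initial_stances)
    transcript: List[str] = []

    # Stage 2: debate. Agents take turns and may revise their stance
    # after seeing the arguments made so far.
    for _ in range(rounds):
        for i, agent in enumerate(agents):
            stance, argument = agent(doc, summary, stances[i], transcript)
            stances[i] = stance
            transcript.append(f"agent{i} [{stance}]: {argument}")

    # Stage 3: adjudication. Assumption: simple majority over final stances,
    # with ties resolved as "unfaithful".
    counts = Counter(stances)
    label = "faithful" if counts["faithful"] > counts["unfaithful"] else "unfaithful"
    return label, transcript


def toy_agent(doc, summary, stance, transcript):
    """Stand-in for an LLM agent: 'faithful' only if the summary appears verbatim."""
    new_stance = "faithful" if summary.lower() in doc.lower() else "unfaithful"
    return new_stance, "checked whether the summary appears verbatim in the source"


label, transcript = run_debate(
    "The council approved the budget.",
    "the council approved the budget",
    agents=[toy_agent, toy_agent],
    initial_stances=["faithful", "unfaithful"],
)
```

Simultaneous sessions could be modeled by calling `run_debate` several times with different initial stance assignments and combining the resulting labels; how the labels are combined is left open here.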

Ambiguity annotation on MeetingBank
MeetingBank_ambiguity_annotated.json in the data folder contains the ambiguity annotations for MeetingBank summaries. The following table describes each column.
| Column Name | Description |
| -------- | ----- |
| doc | source document |
| summary | a generated summary sentence for the given document |
| ambiguity | 0 if the given summary is not ambiguous or 1 if the summary is ambiguous |
| category | if the summary is deemed ambiguous, the selected high-level ambiguity category |
| sub-category | if the summary is deemed ambiguous, the selected fine-grained ambiguity sub-category from the taxonomy |
| explanation | a short description of why there exists an ambiguity in the given summary |
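A minimal sketch of reading the annotations and counting ambiguous summaries follows. It assumes the file is a JSON array of objects keyed by the column names above; the actual layout may differ, so check the file before relying on this. The helper names `load_annotations` and `ambiguity_stats` are hypothetical.

```python
import json
from collections import Counter


def load_annotations(path):
    """Load the annotation file (assumed to be a JSON array of records)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def ambiguity_stats(records):
    """Return (number of ambiguous summaries, counts per high-level category)."""
    ambiguous = [r for r in records if int(r["ambiguity"]) == 1]
    by_category = Counter(r.get("category") for r in ambiguous)
    return len(ambiguous), by_category


# Usage, assuming the repository layout described above:
# records = load_annotations("data/MeetingBank_ambiguity_annotated.json")
# n_ambiguous, by_category = ambiguity_stats(records)
```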
Madisse with ambiguity detection module
An ideal faithfulness evaluation system should handle ambiguity first: identify the ambiguous summaries, filter them out, and then evaluate only the non-ambiguous summaries. The overall view of a faithfulness evaluator with the ambiguity detection module is shown below:
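The filter-then-evaluate flow can be sketched as below. Both `is_ambiguous` and `evaluate_faithfulness` are hypothetical callables supplied by the caller (for example, an ambiguity detector and a debate-based evaluator); neither name comes from the repository.

```python
def evaluate_with_ambiguity_filter(examples, is_ambiguous, evaluate_faithfulness):
    """Split examples into ambiguous ones and faithfulness-labeled ones.

    examples: iterable of dicts with "doc" and "summary" keys.
    is_ambiguous: hypothetical callable (doc, summary) -> bool.
    evaluate_faithfulness: hypothetical callable (doc, summary) -> label.
    """
    evaluated, filtered_out = [], []
    for ex in examples:
        # Step 1: ambiguity detection. Ambiguous summaries are set aside.
        if is_ambiguous(ex["doc"], ex["summary"]):
            filtered_out.append(ex)
            continue
        # Step 2: faithfulness evaluation on the remaining summaries only.
        label = evaluate_faithfulness(ex["doc"], ex["summary"])
        evaluated.append({**ex, "label": label})
    return evaluated, filtered_out
```

Returning the filtered-out examples separately, rather than discarding them, keeps them available for inspection or for reporting an "ambiguous" label.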

Citation
@misc{koupaee2025faithfulunfaithfulambiguousmultiagent,
title={Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation},
author={Mahnaz Koupaee and Jake W. Vincent and Saab Mansour and Igor Shalyminov and Han He and Hwanjun Song and Raphael Shu and Jianfeng He and Yi Nian and Amy Wing-mei Wong and Kyu J. Han and Hang Su},
year={2025},
eprint={2502.08514},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.08514},
}
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
- Member event: 1
- Public event: 1
- Push event: 5
Last Year
- Member event: 1
- Public event: 1
- Push event: 5
Dependencies
- boto3 *
- datasets *
- openai *