https://github.com/amazon-science/madisse

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: amazon-science
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 3.91 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Code of conduct

README.md

Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation

Authors: Mahnaz Koupaee, Jake W. Vincent, Saab Mansour, Igor Shalyminov, Han He, Hwanjun Song, Raphael Shu, Jianfeng He, Yi Nian, Amy Wing-mei Wong, Kyu J. Han, Hang Su

Please check out our paper on arXiv: https://arxiv.org/abs/2502.08514

Madisse

Our debate approach to summary faithfulness evaluation, shown below, uses a group of agents that are each assigned an initial belief about the summary's faithfulness and then engage in discussion to resolve any inconsistencies. Each debate session consists of three stages: 1) stance initialization, in which agents are assigned a belief about the summary's faithfulness (faithful or unfaithful), 2) debate, in which evaluator agents engage in multiple rounds of debate to persuade each other of whether the summary is faithful or not, and 3) adjudication, in which the final label is assigned to the summary based on the arguments from the debate. Madisse can run multiple debate sessions simultaneously.

[Figure: overview of the Madisse debate approach]
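The three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the `Agent` signature, the alternating stance assignment, the majority-vote adjudicator, and `toy_agent` (a word-overlap stand-in for an LLM call) are all hypothetical.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical agent signature: (document, summary, current stance, history) -> new stance.
Agent = Callable[[str, str, str, List[str]], str]


def initialize_stances(n_agents: int) -> List[str]:
    """Stage 1: impose an initial belief on each agent (alternating is an assumption)."""
    return ["faithful" if i % 2 == 0 else "unfaithful" for i in range(n_agents)]


def adjudicate(stances: List[str]) -> str:
    """Stage 3: majority vote as a stand-in adjudicator; ties resolve to 'unfaithful'."""
    counts = Counter(stances)
    return "faithful" if counts["faithful"] > counts["unfaithful"] else "unfaithful"


def run_debate(doc: str, summary: str, agents: List[Agent], n_rounds: int = 2) -> str:
    """Run one debate session and return the final label."""
    stances = initialize_stances(len(agents))
    history: List[str] = []
    for _ in range(n_rounds):
        # Stage 2: every agent sees the history so far and may revise its stance.
        stances = [agent(doc, summary, s, history) for agent, s in zip(agents, stances)]
        history.extend(f"agent {i}: {s}" for i, s in enumerate(stances))
    return adjudicate(stances)


def toy_agent(doc: str, summary: str, stance: str, history: List[str]) -> str:
    """Placeholder for an LLM call: checks word overlap, ignoring stance and history."""
    doc_words = set(doc.lower().split())
    supported = all(word in doc_words for word in summary.lower().split())
    return "faithful" if supported else "unfaithful"


if __name__ == "__main__":
    doc = "the council approved the budget"
    print(run_debate(doc, "council approved budget", [toy_agent] * 3))
```

In a real system each agent would prompt a language model with its imposed stance and the other agents' arguments, and the adjudicator would weigh those arguments rather than count votes.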

Ambiguity annotation on MeetingBank

MeetingBank_ambiguity_annotated.json in the data folder contains the ambiguity annotations for MeetingBank summaries. The columns are described below.

| Column Name | Description |
| -------- | ----- |
| doc | source document |
| summary | a generated summary sentence for the given document |
| ambiguity | 0 if the given summary is not ambiguous, 1 if it is ambiguous |
| category | if the summary is deemed ambiguous, the selected high-level ambiguity category |
| sub-category | if the summary is deemed ambiguous, the selected fine-grained ambiguity sub-category from the taxonomy |
| explanation | a short description of why the given summary is ambiguous |
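As a rough illustration of working with these columns, the sketch below splits records by the `ambiguity` flag. It assumes the file is a JSON array of objects keyed by the column names above, which the README does not confirm; the sample records, their category values, and `split_by_ambiguity` are placeholders invented for the example.

```python
import json

# Hypothetical sample mirroring the documented columns; the real file layout may differ.
SAMPLE = """
[
  {"doc": "The council voted on item 12.",
   "summary": "The council voted on an item.",
   "ambiguity": 0, "category": null, "sub-category": null, "explanation": null},
  {"doc": "The council discussed item 12.",
   "summary": "They agreed on it.",
   "ambiguity": 1, "category": "placeholder-category",
   "sub-category": "placeholder-sub-category",
   "explanation": "Placeholder explanation."}
]
"""


def split_by_ambiguity(records):
    """Return (ambiguous, unambiguous) lists based on the 0/1 ambiguity column."""
    ambiguous = [r for r in records if r["ambiguity"] == 1]
    unambiguous = [r for r in records if r["ambiguity"] == 0]
    return ambiguous, unambiguous


# For the real data, something like the following may work (layout is an assumption):
#   with open("data/MeetingBank_ambiguity_annotated.json") as f:
#       records = json.load(f)
records = json.loads(SAMPLE)
ambiguous, unambiguous = split_by_ambiguity(records)
print(len(ambiguous), len(unambiguous))
```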

Madisse with ambiguity detection module

An ideal faithfulness evaluation system should handle ambiguities first. This can be done by identifying the ambiguous summaries and filtering them out and then evaluating the non-ambiguous summaries. The overall view of a faithfulness evaluator with the ambiguity detection module is shown below:

[Figure: faithfulness evaluator with the ambiguity detection module]
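The filter-then-evaluate flow can be sketched as follows. This is an assumption-laden outline rather than the project's code: `evaluate_with_ambiguity_filter`, `toy_is_ambiguous` (a pronoun check), and `toy_evaluate` (a word-overlap check) are hypothetical stand-ins for the ambiguity detection module and the faithfulness evaluator.

```python
from typing import Callable, Dict, List, Tuple


def evaluate_with_ambiguity_filter(
    pairs: List[Tuple[str, str]],
    is_ambiguous: Callable[[str, str], bool],
    evaluate: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Label ambiguous summaries first; send only the rest to the evaluator."""
    results = []
    for doc, summary in pairs:
        if is_ambiguous(doc, summary):
            label = "ambiguous"
        else:
            label = evaluate(doc, summary)
        results.append({"summary": summary, "label": label})
    return results


def toy_is_ambiguous(doc: str, summary: str) -> bool:
    """Placeholder detector: flags summaries containing an unresolved pronoun."""
    return any(word in {"it", "they"} for word in summary.lower().split())


def toy_evaluate(doc: str, summary: str) -> str:
    """Placeholder evaluator: 'faithful' only if every summary word occurs in the doc."""
    doc_words = set(doc.lower().split())
    supported = all(word in doc_words for word in summary.lower().split())
    return "faithful" if supported else "unfaithful"


if __name__ == "__main__":
    doc = "the council approved the budget"
    pairs = [(doc, "council approved budget"), (doc, "they approved it")]
    for row in evaluate_with_ambiguity_filter(pairs, toy_is_ambiguous, toy_evaluate):
        print(row)
```

Keeping the detector and evaluator as injected callables means either one can be swapped for a model-backed component without changing the control flow.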

Citation

@misc{koupaee2025faithfulunfaithfulambiguousmultiagent,
      title={Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation},
      author={Mahnaz Koupaee and Jake W. Vincent and Saab Mansour and Igor Shalyminov and Han He and Hwanjun Song and Raphael Shu and Jianfeng He and Yi Nian and Amy Wing-mei Wong and Kyu J. Han and Hang Su},
      year={2025},
      eprint={2502.08514},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.08514},
}

Owner

  • Name: Amazon Science
  • Login: amazon-science
  • Kind: organization

GitHub Events

Total
  • Member event: 1
  • Public event: 1
  • Push event: 5
Last Year
  • Member event: 1
  • Public event: 1
  • Push event: 5

Dependencies

requirements.txt pypi
  • boto3 *
  • datasets *
  • openai *