https://github.com/amazon-science/fact-graph

Implementation of the paper "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations (NAACL 2022)"

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary

Keywords

abstractive-summarization factuality

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: amazon-science
License: other
Language: Python
Default Branch: main
Homepage:
Size: 2.24 MB

Statistics

Stars: 47
Watchers: 6
Forks: 5
Open Issues: 9
Releases: 0

Topics

abstractive-summarization factuality

Created about 4 years ago · Last pushed almost 3 years ago

Metadata Files

Readme Contributing License Code of conduct

FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations (NAACL 2022)

This repository contains the code for the paper "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations".

FactGraph is an adapter-based method for assessing factuality that decomposes the document and the summary into structured meaning representations (MR).

In FactGraph, the summary and document graphs are encoded by a graph encoder with structure-aware adapters, and the corresponding texts are encoded by an adapter-based text encoder. The text and graph encoders share the same pretrained model, and only the adapters are trained.

Environment

The easiest way to proceed is to create a conda environment:

    conda create -n factgraph python=3.7
    conda activate factgraph

Further, install PyTorch and PyTorch Geometric:

    pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
    pip install torch-scatter==2.0.9 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
    pip install torch-sparse==0.6.12 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
    pip install torch-geometric==2.0.3

Install the required packages:

    pip install -r requirements.txt
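Because the PyTorch and PyTorch Geometric wheels above are pinned to exact versions, it can help to confirm what actually got installed before moving on. The following is a minimal sketch, not part of the repository: the `PINS` table is copied from the install commands above, while `installed_version` and `check_pins` are hypothetical helper names. On Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib_metadata backport is installed
    import importlib_metadata as metadata

# Versions taken from the pip commands in this section.
PINS = {
    "torch": "1.9.0",
    "torch-scatter": "2.0.9",
    "torch-sparse": "0.6.12",
    "torch-geometric": "2.0.3",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package name to a tuple (expected, found, ok)."""
    report = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Local build tags such as "+cu111" are ignored in the comparison.
        ok = found is not None and found.split("+")[0] == expected
        report[name] = (expected, found, ok)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_pins(PINS).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {found} -> {status}")
```

Running the script inside the activated `factgraph` environment prints one line per package. A mismatch does not necessarily mean the code will fail, only that the environment differs from the one described here.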

Finally, create the environment for AMR preprocessing:

    cd data/preprocess
    ./create_envs_preprocess.sh
    cd ../../

FactCollect Dataset

FactCollect is created by consolidating the following datasets:

| Dataset | Datapoints | |
| ------------- |:-------------:|:-------------:|
| Wang et al. (2020) | 953 | Link |
| Kryscinski et al. (2020) | 1434 | Link |
| Maynez et al. (2020) | 2500 | Link |
| Pagnoni et al. (2021) | 4942 | Link |

FactCollect uses two datasets that are released under the following licenses:
- FactCC is under BSD-3. Copyright (c) 2019, Salesforce.com, Inc. All rights reserved.
- XSum Hallucinations is under CC BY 4.0.

To generate the FactCollect dataset, execute:

    conda activate factgraph
    cd data
    ./create_dataset.sh
    cd ..

Running trained FactGraph Models

First, download the trained FactGraph checkpoints:

    cd src
    ./download_trained_models.sh

To run FactGraph:

    ./evaluate.sh factgraph <file> <gpu_id>

To run FactGraph edge-level:

    ./evaluate.sh factgraph-edge <file> <gpu_id>

<file> is a JSON Lines file with the following format, where 'summary' is a single-sentence summary:

    {'summary': summary1, 'article': article1}
    {'summary': summary2, 'article': article2}
    ...
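An input file in this format can be produced with a few lines of Python. The sketch below is an illustration rather than code from the repository: `write_jsonl` and `read_jsonl` are hypothetical helper names, and it assumes the evaluation script parses standard JSON (double-quoted keys and strings), which is worth confirming against the repository since the format above is shown with single quotes.

```python
import json


def write_jsonl(pairs, path):
    """Write (summary, article) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for summary, article in pairs:
            record = {"summary": summary, "article": article}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Placeholder texts; each summary should be a single sentence.
    examples = [
        ("The council approved the budget.", "Full text of the first article."),
        ("The team won the final.", "Full text of the second article."),
    ]
    write_jsonl(examples, "input.jsonl")
    print(len(read_jsonl("input.jsonl")), "records written")
```

The resulting `input.jsonl` would then be passed as `<file>` to `evaluate.sh`. Multi-sentence summaries would need to be split into one record per sentence beforehand.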

Training FactGraph

Preprocess

Convert the dataset into the format required for the model:

    cd data/preprocess
    ./process_dataset_for_model.sh <gpu_id>
    cd ../../

This step generates AMR graphs using the SPRING model. Check their repository for more details.

Download the pretrained parameters of the adapters:

    cd src
    ./download_pretrained_adapters.sh

Training

To train FactGraph on the FactCollect dataset, execute:

    conda activate factgraph
    ./train.sh <gpu_id>

Predicting

To predict, run:

    ./predict.sh <checkpoint_folder> <gpu_id>

Training FactGraph - Edge-level

Preprocess

Download the files train.tsv and test.tsv from this link provided by Goyal and Durrett (2021). Copy those files to data/edge_level_data.

Convert the dataset into the format required for the model:

    cd data/preprocess
    ./process_dataset_for_edge_model.sh <gpu_id>
    cd ../../

Training

To train the edge-level FactGraph model, execute:

    conda activate factgraph
    ./train_edgelevel.sh <gpu_id>

Predicting

To predict, run:

    ./predict_edgelevel.sh <checkpoint_folder> <gpu_id>

Trained Models

A FactGraph checkpoint trained on FactCollect dataset can be found here. Test set results: {'accuracy': 0.89, 'bacc': 0.8904, 'f1': 0.89, 'size': 600, 'cnndm': {'bacc': 0.7717, 'f1': 0.8649, 'size': 370}, 'xsum': {'bacc': 0.6833, 'f1': 0.9304, 'size': 230}}

A FactGraph-edge checkpoint trained on the Maynez dataset can be found here. This checkpoint was selected using the test set. Test set results: {'accuracy': 0.8371, 'bacc': 0.8447, 'f1': 0.8371, 'f1_macro': 0.7362, 'accuracy_edge': 0.6948, 'bacc_edge': 0.6592, 'f1_edge': 0.6948}
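Both result dictionaries report a 'bacc' value next to 'accuracy'. This page does not define the key; assuming it denotes balanced accuracy (the mean of per-class recall), the sketch below shows how that metric differs from plain accuracy on imbalanced labels. The function names and the toy labels are illustrative only and are not taken from the repository's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in indices if y_pred[i] == label)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy labels: 1 = factual, 0 = non-factual (hypothetical data).
    gold = [1, 1, 1, 1, 0, 0]
    pred = [1, 1, 1, 0, 0, 1]
    print("accuracy:", round(accuracy(gold, pred), 4))
    print("balanced accuracy:", round(balanced_accuracy(gold, pred), 4))
```

With these toy labels, recall is 0.75 on the majority class and 0.5 on the minority class, so the balanced score (0.625) falls below plain accuracy (about 0.667). The gap shows why the two numbers can diverge on subsets with skewed label distributions.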

Security

See CONTRIBUTING for more information.

License Summary

The documentation is made available under the CC-BY-NC-4.0 License. See the LICENSE file.

Citation

@inproceedings{ribeiro-etal-2022-factgraph,
    title = "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations",
    author = "Ribeiro, Leonardo F. R. and Liu, Mengwen and Gurevych, Iryna and Dreyer, Markus and Bansal, Mohit",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    year = "2022"
}

Owner

Name: Amazon Science
Login: amazon-science
Kind: organization

Website: https://amazon.science
Twitter: AmazonScience
Repositories: 80
Profile: https://github.com/amazon-science

GitHub Events

Total

Watch event: 2
Fork event: 1

Last Year

Watch event: 2
Fork event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 10
Total pull requests: 21
Average time to close issues: N/A
Average time to close pull requests: 3 days
Total issue authors: 4
Total pull request authors: 2
Average comments per issue: 0.2
Average comments per pull request: 0.14
Merged pull requests: 13
Bot issues: 0
Bot pull requests: 11

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

BunsenFeng (2)
Srinivas-R (1)
ShuyangCao (1)
osmalpkoras (1)

Pull Request Authors

leoribeiro (7)
dependabot[bot] (3)

Top Labels

Issue Labels

Pull Request Labels

dependencies (3)

Dependencies

data/preprocess/requirements-preprocess.txt pypi

Penman ==1.2.0
amrlib ==0.7.1
datasets ==1.8.0
dill ==0.3.4
numpy ==1.22.0
packaging ==20.9
pandas ==1.2.4
plac ==1.1.3
pluggy ==0.13.1
preshed ==3.0.5
protobuf ==3.17.3
py ==1.10.0
pyarrow ==3.0.0
pyparsing ==2.4.7
pytest ==6.2.4
python-dateutil ==2.8.1
pytz ==2021.1
requests ==2.25.1
s3transfer ==0.4.2
scipy ==1.5.2
sentence-transformers ==1.2.0
six ==1.16.0
smatch ==1.0.4
spacy ==3.0.6
stanza ==1.2
torch ==1.9.0
unidecode ==1.2.0

requirements.txt pypi

datasets ==1.7.0
rdflib ==6.1.1
sacremoses ==0.0.47
tokenizers ==0.10.3
unidecode ==1.3.3
