https://github.com/amazon-science/fact-graph
Implementation of the paper "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations (NAACL 2022)"
FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations (NAACL 2022)
This repository contains the code for the paper "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations".
FactGraph is an adapter-based method for assessing factuality that decomposes the document and the summary into structured meaning representations (MRs).
In FactGraph, the summary and document graphs are encoded by a graph encoder with structure-aware adapters, while their text is encoded by an adapter-based text encoder. The text and graph encoders share the same pretrained model, and only the adapters are trained.
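To make the adapter idea concrete, here is a minimal, dependency-free sketch of a generic residual bottleneck adapter, h + W_up · ReLU(W_down · h), of the kind commonly inserted into a frozen pretrained Transformer. It is an illustration under assumptions, not FactGraph's implementation: the class name, the sizes, the omission of biases and layer normalization, and the zero initialization of the up-projection are hypothetical choices. FactGraph's structure-aware graph adapters additionally use the graph connectivity, which this sketch does not model.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix `w` (rows x cols) by vector `x` (length cols)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: h + W_up . relu(W_down . h).

    Only these two matrices would be trained; the pretrained model stays frozen.
    """

    def __init__(self, hidden: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection hidden -> bottleneck, small random initialization.
        self.w_down: Matrix = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(bottleneck)
        ]
        # Up-projection bottleneck -> hidden, zero-initialized so the adapter
        # starts out as the identity function.
        self.w_up: Matrix = [[0.0] * bottleneck for _ in range(hidden)]

    def __call__(self, h: Vector) -> Vector:
        delta = matvec(self.w_up, relu(matvec(self.w_down, h)))
        # Residual connection around the bottleneck.
        return [h_i + d_i for h_i, d_i in zip(h, delta)]

    def num_params(self) -> int:
        """Number of trainable weights (biases are omitted in this sketch)."""
        return sum(len(row) for row in self.w_down) + sum(len(row) for row in self.w_up)
```

Because the up-projection starts at zero, the adapter initially returns its input unchanged, so inserting it does not disturb the pretrained model before training. With a hypothetical hidden size of 1024 and bottleneck size of 64, each adapter adds 2 × 1024 × 64 = 131,072 weights, far fewer than a full Transformer layer.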
Environment
The easiest way to proceed is to create a conda environment:
conda create -n factgraph python=3.7
conda activate factgraph
Next, install PyTorch and PyTorch Geometric:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter==2.0.9 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
pip install torch-sparse==0.6.12 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
pip install torch-geometric==2.0.3
Install the required packages:
pip install -r requirements.txt
Finally, create the environment for AMR preprocessing:
cd data/preprocess
./create_envs_preprocess.sh
cd ../../
FactCollect Dataset
FactCollect is created by consolidating the following datasets:
| Dataset | Datapoints | Link |
| ------------- |:-------------:|:-------------:|
| Wang et al. (2020) | 953 | Link |
| Kryscinski et al. (2020) | 1434 | Link |
| Maynez et al. (2020) | 2500 | Link |
| Pagnoni et al. (2021) | 4942 | Link |
- FactCollect uses two datasets released under licenses.
To generate the FactCollect dataset, execute:
conda activate factgraph
cd data
./create_dataset.sh
cd ..
Running trained FactGraph Models
First, download the trained FactGraph checkpoints:
cd src
./download_trained_models.sh
To run FactGraph:
./evaluate.sh factgraph <file> <gpu_id>
To run the edge-level FactGraph model:
./evaluate.sh factgraph-edge <file> <gpu_id>
<file> is a JSON Lines file (one JSON object per line) with the following format:
{"summary": "summary1", "article": "article1"}
{"summary": "summary2", "article": "article2"}
...
where each "summary" is a single-sentence summary.
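The following sketch writes and checks an input file in this format using only the Python standard library. The helper names are hypothetical, and it assumes the evaluation script reads standard JSON (double-quoted strings, one object per line); the parsing code in the repository may be more or less strict.

```python
import json
from typing import Dict, Iterable, List, Tuple


def to_jsonl_lines(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Serialize (summary, article) pairs as one JSON object per line."""
    return [
        json.dumps({"summary": summary, "article": article}, ensure_ascii=False)
        for summary, article in pairs
    ]


def parse_jsonl_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse and sanity-check lines; raise ValueError on a malformed record."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)  # json.JSONDecodeError is a ValueError subclass
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        for key in ("summary", "article"):
            if not isinstance(record.get(key), str):
                raise ValueError(f"line {lineno}: missing or non-string {key!r}")
        records.append(record)
    return records


def write_jsonl(path: str, pairs: Iterable[Tuple[str, str]]) -> None:
    """Write the pairs to `path` as UTF-8 JSON Lines."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_jsonl_lines(pairs):
            f.write(line + "\n")
```

For example, `write_jsonl("summaries.jsonl", pairs)` followed by `./evaluate.sh factgraph summaries.jsonl 0`; the file name and GPU id are placeholders. Note that Python-style single-quoted dictionaries are rejected by `json.loads`, which is why the records above use double quotes.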
Training FactGraph
Preprocess
Convert the dataset into the format required for the model:
cd data/preprocess
./process_dataset_for_model.sh <gpu_id>
cd ../../
This step generates AMR graphs using the SPRING model; see the SPRING repository for more details.
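AMR graphs are usually serialized in PENMAN notation, and the Penman library (listed among the dependencies) can decode them into (source, role, target) triples in which `:instance` triples assign concepts to variables. As an illustration of how such triples could be turned into node and edge lists for a graph encoder, here is a stdlib-only sketch. The function name and output layout are hypothetical and do not reproduce the repository's preprocessing; the sketch also drops attribute triples (constants such as `:polarity -`), which a real pipeline would likely keep as extra nodes.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (source, role, target), as in PENMAN triples
Edge = Tuple[int, int, str]    # (source node index, target node index, role)


def triples_to_graph(triples: Sequence[Triple]) -> Tuple[List[str], List[Edge]]:
    """Map AMR variables to node indices and keep variable-to-variable relations."""
    var_to_idx: Dict[str, int] = {}
    concepts: List[str] = []
    # First pass: every ":instance" triple introduces one node labelled by its concept.
    for source, role, target in triples:
        if role == ":instance":
            var_to_idx[source] = len(concepts)
            concepts.append(target)
    # Second pass: relations between two variables become labelled edges.
    edges: List[Edge] = []
    for source, role, target in triples:
        if role != ":instance" and source in var_to_idx and target in var_to_idx:
            edges.append((var_to_idx[source], var_to_idx[target], role))
        # Attribute triples (e.g. ":polarity" "-") are skipped in this sketch.
    return concepts, edges


# "The boy wants to go": (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))
EXAMPLE_TRIPLES: List[Triple] = [
    ("w", ":instance", "want-01"),
    ("w", ":ARG0", "b"),
    ("b", ":instance", "boy"),
    ("w", ":ARG1", "g"),
    ("g", ":instance", "go-02"),
    ("g", ":ARG0", "b"),
]
```

The reentrant variable `b` (the boy is both the one wanting and the one going) becomes a single node with two incoming `:ARG0` edges, which is the kind of structure a graph encoder can exploit and a plain text encoder cannot see directly.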
Download the pretrained parameters of the adapters:
cd src
./download_pretrained_adapters.sh
Training
To train FactGraph on the FactCollect dataset, execute:
conda activate factgraph
./train.sh <gpu_id>
Predicting
To generate predictions, run:
./predict.sh <checkpoint_folder> <gpu_id>
Training FactGraph - Edge-level
Preprocess
Download the files train.tsv and test.tsv provided by Goyal and Durrett (2021) and copy them to data/edge_level_data.
Convert the dataset into the format required for the model:
cd data/preprocess
./process_dataset_for_edge_model.sh <gpu_id>
cd ../../
Training
To train the edge-level FactGraph model on the data prepared above, execute:
conda activate factgraph
./train_edgelevel.sh <gpu_id>
Predicting
To generate predictions, run:
./predict_edgelevel.sh <checkpoint_folder> <gpu_id>
Trained Models
A FactGraph checkpoint trained on the FactCollect dataset can be found here. Test set results:
{'accuracy': 0.89, 'bacc': 0.8904, 'f1': 0.89, 'size': 600, 'cnndm': {'bacc': 0.7717, 'f1': 0.8649, 'size': 370}, 'xsum': {'bacc': 0.6833, 'f1': 0.9304, 'size': 230}}
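The results report `bacc` and `f1` without defining them. Assuming `bacc` is standard balanced accuracy (the mean of per-class recall) and `f1` is micro-averaged F1, which equals accuracy for single-label classification and would explain why `accuracy` and `f1` coincide above, these metrics can be sketched in plain Python as follows. This is not the repository's evaluation code; `macro_f1` corresponds to the `f1_macro` key reported for the edge-level model.

```python
from typing import Sequence, Tuple


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def balanced_accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Mean per-class recall over the classes present in `gold`."""
    recalls = []
    for c in sorted(set(gold)):
        idx = [i for i, g in enumerate(gold) if g == c]
        recalls.append(sum(pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over classes seen in gold or pred."""
    scores = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def size_weighted_mean(parts: Sequence[Tuple[float, int]]) -> float:
    """Combine per-subset scores, weighting each by its subset size."""
    total = sum(n for _, n in parts)
    return sum(score * n for score, n in parts) / total
```

As a consistency check under that assumption, the size-weighted mean of the per-domain F1 scores, (0.8649 × 370 + 0.9304 × 230) / 600 ≈ 0.8900, matches the overall `f1` of 0.89. Balanced accuracy does not decompose this way, so the overall `bacc` is not a weighted mean of the per-domain values.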
A FactGraph-edge checkpoint trained on the Maynez dataset can be found here. This checkpoint was selected using the test set. Test set results:
{'accuracy': 0.8371, 'bacc': 0.8447, 'f1': 0.8371, 'f1_macro': 0.7362, 'accuracy_edge': 0.6948, 'bacc_edge': 0.6592, 'f1_edge': 0.6948}
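These results presumably mix summary-level scores (`accuracy`, `bacc`, `f1`, `f1_macro`) with edge-level scores (the `*_edge` keys). One natural way to relate the two levels is to label a summary non-factual if any of its edges is predicted non-factual. The sketch below assumes that rule and a 1 = factual, 0 = non-factual encoding, both hypothetical; whether FactGraph-edge aggregates edges the same way is not stated here and should be checked against the code.

```python
from typing import Sequence, Tuple

FACTUAL, NON_FACTUAL = 1, 0  # hypothetical label encoding


def summary_label_from_edges(edge_labels: Sequence[int]) -> int:
    """Assumed rule: a summary is factual only if every one of its edges is factual.

    A summary with no scored edges counts as factual here (all() of an empty
    sequence is True); a real pipeline may want to handle that case explicitly.
    """
    return FACTUAL if all(label == FACTUAL for label in edge_labels) else NON_FACTUAL


def edge_and_summary_accuracy(
    gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]
) -> Tuple[float, float]:
    """Return (edge-level accuracy, summary-level accuracy) for per-summary edge labels."""
    edge_pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    edge_acc = sum(g == p for g, p in edge_pairs) / len(edge_pairs)
    summary_acc = sum(
        summary_label_from_edges(gs) == summary_label_from_edges(ps)
        for gs, ps in zip(gold, pred)
    ) / len(gold)
    return edge_acc, summary_acc
```

Under this rule a single wrong edge prediction can flip a whole summary's label, which is one reason edge-level and summary-level scores can differ substantially.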
Security
See CONTRIBUTING for more information.
License Summary
The documentation is made available under the CC-BY-NC-4.0 License. See the LICENSE file.
Citation
@inproceedings{ribeiro-etal-2022-factgraph,
title = "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations",
author = "Ribeiro, Leonardo F. R. and
Liu, Mengwen and
Gurevych, Iryna and
Dreyer, Markus and
Bansal, Mohit",
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
year = "2022"
}
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
Dependencies
- Penman ==1.2.0
- amrlib ==0.7.1
- datasets ==1.8.0
- dill ==0.3.4
- numpy ==1.22.0
- packaging ==20.9
- pandas ==1.2.4
- plac ==1.1.3
- pluggy ==0.13.1
- preshed ==3.0.5
- protobuf ==3.17.3
- py ==1.10.0
- pyarrow ==3.0.0
- pyparsing ==2.4.7
- pytest ==6.2.4
- python-dateutil ==2.8.1
- pytz ==2021.1
- requests ==2.25.1
- s3transfer ==0.4.2
- scipy ==1.5.2
- sentence-transformers ==1.2.0
- six ==1.16.0
- smatch ==1.0.4
- spacy ==3.0.6
- stanza ==1.2
- torch ==1.9.0
- unidecode ==1.2.0
- datasets ==1.7.0
- rdflib ==6.1.1
- sacremoses ==0.0.47
- tokenizers ==0.10.3
- unidecode ==1.3.3