https://github.com/amazon-science/adasum

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: amazon-science
  • License: other
  • Language: Python
  • Default Branch: main
  • Size: 439 KB
Statistics
  • Stars: 5
  • Watchers: 8
  • Forks: 1
  • Open Issues: 3
  • Releases: 0
Created about 4 years ago · Last pushed about 3 years ago
Metadata Files
Readme Contributing License Code of conduct

README.md

Few-shot Fine-tuning for Opinion Summarization

This repository contains the main codebase for the corresponding NAACL Findings paper. In this work, we store in-domain knowledge in adapters by pre-training them on customer reviews with a leave-one-out objective. We then fine-tune the pre-trained adapters on a handful of summaries. This method yields state-of-the-art ROUGE scores and reduces semantic mistakes in generated summaries.

1. Conda environment

This project uses conda to manage its environment. To re-create the environment, run the command below.

conda env create --file environment.yml

Then, activate it:

conda activate adasum
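After activating the environment, it can help to confirm that the installed packages match the pins in environment.yml. The snippet below is a minimal sketch and not part of the repository: the `PINNED` dictionary holds only a few of the pins from the dependency list, and the helper name `find_mismatches` is hypothetical.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, also pinned in environment.yml

# A few pins copied from the environment.yml dependency list; extend as needed.
PINNED = {
    "hydra-core": "1.0.6",
    "omegaconf": "2.0.6",
    "sacrebleu": "1.5.1",
    "pyrouge": "0.1.3",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed pin is satisfied. The `get_version` parameter exists only so the check can be exercised without touching the real environment.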

2. FAIRSEQ

The codebase relies on FAIRSEQ, which can be downloaded and installed in a parent folder as follows.

```
git clone https://github.com/pytorch/fairseq.git
mv fairseq fairseqlib
cd fairseqlib
git reset --hard 81046fc
pip install --editable ./
```

Please make sure you use the correct commit to avoid incompatibility issues. Also, set the following environment variable.

export MKL_THREADING_LAYER=GNU
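Both requirements above (the pinned FAIRSEQ commit and the environment variable) can be checked from Python before launching a run. This is a sketch under assumptions, not code from the repository: the helper names are hypothetical, and the path `../fairseqlib` in the usage comment assumes the checkout sits in the parent folder as described. Setting the variable from Python only affects the current process and its children, and it most likely needs to happen before numpy or torch is imported.

```python
import os
import subprocess

EXPECTED_COMMIT = "81046fc"  # short hash from the `git reset --hard` step


def head_commit(repo_dir):
    """Return the full hash of HEAD in repo_dir using `git rev-parse`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def is_expected_commit(full_hash, expected=EXPECTED_COMMIT):
    """True if the full commit hash begins with the expected short hash."""
    return full_hash.startswith(expected)


def ensure_mkl_threading_layer(env=os.environ):
    """Set MKL_THREADING_LAYER=GNU unless it is already set; return the final value."""
    return env.setdefault("MKL_THREADING_LAYER", "GNU")


# Example usage (assumed checkout location, adjust to your layout):
#   ensure_mkl_threading_layer()
#   if not is_expected_commit(head_commit("../fairseqlib")):
#       raise SystemExit("fairseqlib is not at commit " + EXPECTED_COMMIT)
```

The sketch uses `setdefault` so that a value already exported in the shell is left untouched.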

3. Folder structure

The main codebase is stored at adasum.

  • artifacts: checkpoints and model-generated summaries (checkpoints need to be downloaded separately);
  • data: contains pre-training and fine-tuning datasets (see the preprocessing folder for instructions on how to obtain data);
  • adasum: fairseq files for adasum and adaqsum models;
  • preprocessing: scripts for data pre-processing;
  • shared: files shared between adasum and preprocessing scripts.
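The layout above can be verified with a short script before running any experiments. This is a minimal sketch: the function name `missing_dirs` is hypothetical, and the root folder is left as a parameter because the text does not state whether these folders sit at the repository root or inside adasum.

```python
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_DIRS = ["artifacts", "data", "adasum", "preprocessing", "shared"]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected folder names that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # "." is an assumption; pass the folder that actually holds the codebase.
    missing = missing_dirs(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders are present.")
```

A missing artifacts or data folder is expected on a fresh clone, since checkpoints and datasets are obtained separately.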

4. Citation

@inproceedings{brazinskas-etal-2022-efficient,
    title = "Efficient Few-Shot Fine-Tuning for Opinion Summarization",
    author = "Brazinskas, Arthur and Nallapati, Ramesh and Bansal, Mohit and Dreyer, Markus",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.113",
    pages = "1509--1523"
}

5. Security

See CONTRIBUTING for more information.

6. License

This project is licensed under the CC-BY-NC-4.0 License.

Owner

  • Name: Amazon Science
  • Login: amazon-science
  • Kind: organization

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 10
  • Total pull requests: 4
  • Average time to close issues: 5 days
  • Average time to close pull requests: about 18 hours
  • Total issue authors: 3
  • Total pull request authors: 1
  • Average comments per issue: 0.6
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • avshalomman (3)
  • Zypressen021 (2)
  • pulkitbv (1)
Pull Request Authors
  • abrazinskas (3)

Dependencies

environment.yml pypi
  • antlr4-python3-runtime ==4.8
  • cffi ==1.14.5
  • click ==8.0.1
  • cython ==0.29.23
  • filelock ==3.0.12
  • hydra-core ==1.0.6
  • importlib-metadata ==4.5.0
  • importlib-resources ==5.1.4
  • joblib ==1.0.1
  • nltk ==3.6.2
  • omegaconf ==2.0.6
  • pandas ==1.2.5
  • portalocker ==2.0.0
  • psutil ==5.8.0
  • pycparser ==2.20
  • pyrouge ==0.1.3
  • python-dateutil ==2.8.1
  • pytz ==2021.1
  • pyyaml ==5.4.1
  • regex ==2021.4.4
  • sacrebleu ==1.5.1
  • scipy ==1.7.1
  • tqdm ==4.61.1
  • zipp ==3.4.1