scisum
Resources for the paper Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals? (ACL 2024).
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.8%) to scientific vocabulary
Keywords
Repository
Resources for the paper Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals? (ACL 2024).
Basic Info
- Host: GitHub
- Owner: fonsc
- Language: Python
- Default Branch: main
- Homepage: https://aclanthology.org/2024.findings-acl.508/
- Size: 15.1 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
Scientific Summarization with LLMs
Resources for the paper Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals? (ACL 2024).
Requirements
It is recommended to set up a Python 3.12 environment. Using Miniconda, the environment can be created as follows:
```bash
conda create -n scisum python=3.12
```
Then, activate the environment, clone this repository, and install the dependencies:
```bash
conda activate scisum
git clone https://github.com/thefonseca/scisum.git
cd scisum
pip install -r requirements.txt
```
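If you want to sanity-check the installation, a generic (optional) check is to confirm that the pinned dependencies resolved, for example the fire ==0.6.0 package listed under Dependencies below:
```bash
# Optional: confirm the pinned fire dependency is installed and that no conflicts remain
pip show fire
pip check
```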
Abstract generation experiments (arXiv and PubMed)
The following commands evaluate Llama-2 and OpenAI models on abstract generation for arXiv papers (Section 4.2 of the paper). The evaluation includes four types of prompts:
- Baseline (without guidance)
- Target conciseness (fixed sentence budget)
- Target conciseness + first-person narrative
- Target conciseness + first-person narrative + target keyword coverage
To evaluate abstract generation with Llama-2, use the run.py script:
```bash
python run.py --dataset arxiv --model llama-2-7b-chat --model_checkpoint_path /path/to/llama2/checkpoint --keyword_model factorsum
```
By default, logs, metrics, and predictions are written to the ./output folder. These outputs can be disabled by setting --output_dir None. To run the same experiments on the PubMed dataset (Appendix D), use --dataset pubmed (an example is given after the next command). For Llama-2 with classifier-free guidance (CFG), add the --guidance_scale and --negative_prompt parameters:
```bash
python run.py --dataset arxiv --model llama-2-7b-chat --model_checkpoint_path /path/to/llama2/checkpoint --keyword_model factorsum --guidance_scale 1.5 --negative_prompt "Write a summary of the article above."
```
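As noted above, the PubMed experiments reuse the same command. For example, the baseline run with output files disabled (flags taken from the options described above; the checkpoint path is a placeholder):
```bash
# Same baseline evaluation on PubMed (Appendix D), with logs/metrics/predictions disabled
python run.py --dataset pubmed --model llama-2-7b-chat --model_checkpoint_path /path/to/llama2/checkpoint --keyword_model factorsum --output_dir None
```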
The OpenAI model used in the paper, gpt-3.5-turbo-0301, is now deprecated. As an alternative, this command performs evaluation with gpt-4o-mini:
```bash
export OPENAI_API_KEY=your-openai-api-key
python run.py --dataset arxiv --model gpt-4o-mini --keyword_model factorsum
```
To evaluate on the 500 held-out arXiv samples (Table 5 in the paper):
```bash
python run.py --dataset_name data/arxiv-cs_CL-202401.json --model gpt-4o-mini --max_samples 500 --keyword_model factorsum
```
Lay summarization experiments (eLife)
The experiments with the eLife dataset work in the same way as the abstract generation commands described above.
```bash
# Llama-2
python run.py --dataset elife --model llama-2-7b-chat --model_checkpoint_path /path/to/llama2/checkpoint --keyword_model bart

# OpenAI
python run.py --dataset elife --model gpt-4o-mini --keyword_model bart

# Llama-2 with classifier-free guidance (CFG)
python run.py --dataset elife --model llama-2-7b-chat --model_checkpoint_path /path/to/llama2/checkpoint --keyword_model bart --guidance_scale 1.5 --negative_prompt "Write a summary of the article above."
```
For the experiments with varying sentence budgets (Figure 2):
```bash
# Llama-2
./budgetsweepllama.sh

# OpenAI
./budgetsweepopenai.sh
```
Review summarization experiments (MuP)
The dataset described in Section 4.1 is already provided in data/mup_human_random_validation.csv. To generate a new dataset based on MuP, use:
```bash
python mup.py --overwrite
```
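To get a quick look at the provided CSV before regenerating it (a generic inspection, not a script from this repository):
```bash
# Print the header row and the number of records in the provided MuP validation set
head -n 1 data/mup_human_random_validation.csv
wc -l data/mup_human_random_validation.csv
```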
Note that this new dataset may yield slightly different results compared to Table 1 in the paper. The evaluation commands are as follows:
```bash
# Human-written summaries
python run.py --model human --dataset mup

# Llama-2
python run.py --dataset mup --model llama-2-7b-chat --model_checkpoint_path /path/to/llama2/checkpoint

# OpenAI
python run.py --dataset mup --model gpt-4o-mini
```
Citation
@inproceedings{fonseca-cohen-2024-large-language,
title = "Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?",
author = "Fonseca, Marcio and
Cohen, Shay",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.508",
pages = "8599--8618",
}
Owner
- Name: Marcio Fonseca
- Login: fonsc
- Kind: user
- Location: Edinburgh, UK
- Company: University of Edinburgh
- Website: marciofonseca.me
- Repositories: 5
- Profile: https://github.com/fonsc
Citation (CITATION.bib)
@inproceedings{fonseca-cohen-2024-large-language,
title = "Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?",
author = "Fonseca, Marcio and
Cohen, Shay",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.508",
pages = "8599--8618",
}
Dependencies
- fire ==0.6.0