scisynthesis

for prompts, dataset, and code addressing the task of scientific synthesis

https://github.com/jd-coderepos/scisynthesis

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

corpus dataset evaluation-datasets large-language-models natural-language-generation natural-language-understanding scientific-summarization scientific-synthesis
Last synced: 6 months ago

Repository

for prompts, dataset, and code addressing the task of scientific synthesis

Basic Info
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
corpus dataset evaluation-datasets large-language-models natural-language-generation natural-language-understanding scientific-summarization scientific-synthesis
Created almost 2 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

ORKG Synthesis Dataset

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/) [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
This work was accepted for publication at the JCDL 2024 conference.

What is the ORKG Synthesis Dataset?

We develop a methodology to collect and process scientific papers into a format ready for synthesis using the Open Research Knowledge Graph (ORKG), a multidisciplinary platform that facilitates the comparison of scientific contributions. We then introduce three new synthesis types, namely paper-wise, methodological, and thematic, each focusing on different aspects of the extracted insights. Using Mistral-7B and GPT-4, we generate a large-scale dataset of these syntheses. We also establish nine quality criteria for evaluating the syntheses, assessed by both an automated LLM evaluator (GPT-4) and a crowdsourced human survey.

Directories

  • corpus: Contains the ORKG Synthesis dataset for both GPT-4 and Mistral-7B across the three synthesis objectives (paper-wise, methodological, and thematic), along with the Prolific human survey results.
  • gpt-4 synthesis-evaluator: Contains the evaluation system prompt and the evaluator script.
  • orkg-comparison-data-gen-scripts: Synthesis generation scripts.
  • synthesis-generation-prompts: Synthesis generation prompts for paper-wise, methodological, and thematic objectives.
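The generation scripts rely on the OpenAI API (see the dependency list below). As a rough illustration of the workflow, and not the repository's actual code, here is a minimal sketch of how a synthesis prompt for one objective might be assembled and sent to GPT-4; the prompt wording, function names, and model identifier are all assumptions:

```python
# Hypothetical sketch of a synthesis-generation call. The prompt template,
# function names, and model choice are illustrative assumptions, not the
# repository's actual implementation.

def build_synthesis_prompt(objective: str, abstracts: list[str]) -> str:
    """Assemble a synthesis prompt for one objective from paper abstracts."""
    papers = "\n\n".join(
        f"Paper {i + 1}: {abstract}" for i, abstract in enumerate(abstracts)
    )
    return (
        f"Write a {objective} synthesis of the following papers "
        f"in a single concise paragraph.\n\n{papers}"
    )

def generate_synthesis(objective: str, abstracts: list[str]) -> str:
    """Send the assembled prompt to GPT-4 and return the synthesis text."""
    from openai import OpenAI  # deferred import; requires the openai package

    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    prompt = build_synthesis_prompt(objective, abstracts)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(build_synthesis_prompt("thematic", ["Abstract A.", "Abstract B."]))
```

The actual prompts used by the authors live in the `synthesis-generation-prompts` directory and differ per objective.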

Prolific Survey

The Prolific survey participant demographics are available in Table 1 in the corpus/prolific directory.

The average human and automatic (LLM) evaluation scores are available in Table 2 in the corpus/prolific directory, broken down by characteristic comparison. For each domain/characteristic, the human scores are an average of 18 judgements (6 syntheses (2 samples x 3 synthesis types) x 3 participants), while the auto scores are an average of 6 judgements (6 syntheses (2 samples x 3 synthesis types) x 1 LLM evaluation).
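The judgement counts follow directly from the survey design; a quick arithmetic check:

```python
# Per domain/characteristic: 2 samples x 3 synthesis types = 6 syntheses.
samples = 2
synthesis_types = 3
syntheses = samples * synthesis_types

participants = 3     # human raters per synthesis
llm_evaluations = 1  # one GPT-4 evaluation per synthesis

human_judgements = syntheses * participants    # 6 x 3 = 18
auto_judgements = syntheses * llm_evaluations  # 6 x 1 = 6
print(human_judgements, auto_judgements)  # → 18 6
```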

LLMs4Synthesis

The LLMs4Synthesis framework on top of this dataset is available at https://github.com/HamedBabaei/LLMs4Synthesis.

Citation

If you find this work useful, please consider citing our research papers listed below.

```
@inproceedings{evans-etal-2024-large,
  title     = "Large Language Models as Evaluators for Scientific Synthesis",
  author    = {Evans, Julia and D{'}Souza, Jennifer and Auer, S{\"o}ren},
  editor    = "Luz de Araujo, Pedro Henrique and Baumann, Andreas and Gromann, Dagmar and Krenn, Brigitte and Roth, Benjamin and Wiegand, Michael",
  booktitle = "Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)",
  month     = sep,
  year      = "2024",
  address   = "Vienna, Austria",
  publisher = "Association for Computational Linguistics",
  url       = "https://aclanthology.org/2024.konvens-main.1/",
  pages     = "1--22"
}

@inbook{babaei-giglou-etal-2025-synthesis,
  author    = {Babaei Giglou, Hamed and D'Souza, Jennifer and Auer, S\"{o}ren},
  title     = {LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis},
  year      = {2025},
  isbn      = {9798400710933},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3677389.3702565},
  articleno = {31},
  numpages  = {12}
}
```

Owner

  • Login: jd-coderepos
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this repository, please cite the following work."
title: "Large Language Models as Evaluators for Scientific Synthesis"
type: data
authors:
  - family-names: "Evans"
    given-names: "Julia"
  - family-names: "D'Souza"
    given-names: "Jennifer"
  - family-names: "Auer"
    given-names: "Sören"
editors:
  - family-names: "Luz de Araujo"
    given-names: "Pedro Henrique"
  - family-names: "Baumann"
    given-names: "Andreas"
  - family-names: "Gromann"
    given-names: "Dagmar"
  - family-names: "Krenn"
    given-names: "Brigitte"
  - family-names: "Roth"
    given-names: "Benjamin"
  - family-names: "Wiegand"
    given-names: "Michael"
year: "2024"
month: 9
booktitle: "Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)"
address: "Vienna, Austria"
publisher: "Association for Computational Linguistics"
url: "https://aclanthology.org/2024.konvens-main.1/"
pages: "1--22"

GitHub Events

Total
  • Watch event: 3
  • Push event: 2
Last Year
  • Watch event: 3
  • Push event: 2

Dependencies

requirements.txt pypi
  • beautifulsoup4 *
  • openai *
  • orkg *
  • pandas *
  • tqdm *