grapheval
Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.0%) to scientific vocabulary
Repository
Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
Basic Info
Statistics
- Stars: 29
- Watchers: 1
- Forks: 2
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
GraphEval: Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
We propose GraphEval to evaluate an LLM's performance using a substantially large test dataset. Specifically, the test dataset is retrieved from a large knowledge graph with more than 10 million facts, without expensive human effort. Unlike conventional methods that evaluate LLMs based on generated responses, GraphEval streamlines the evaluation process by creating a judge model to estimate the correctness of the answers given by the LLM. Our experiments demonstrate that the judge model's factuality assessment aligns closely with the correctness of the LLM's generated outputs, while also substantially reducing evaluation costs. In addition, our findings offer valuable insights into LLM performance across different metrics and highlight the potential for future improvements in ensuring the factual integrity of LLM outputs.
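The judge-model idea above can be sketched as a small binary classifier over (question, answer) pairs. The features, toy data, and model choice below are illustrative assumptions only, not the paper's actual architecture, which trains on facts drawn from the knowledge graph.

```python
# Minimal sketch of a "judge" model: a binary classifier that estimates
# whether an LLM's answer to a factual question is correct, instead of
# scoring free-form generations directly. Toy data and bag-of-words
# features are illustrative assumptions, not the paper's actual setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: (question, answer) pairs labeled 1 = correct, 0 = incorrect.
pairs = [
    ("Where was Einstein born?", "Ulm"),
    ("Where was Einstein born?", "Paris"),
    ("Who wrote Hamlet?", "Shakespeare"),
    ("Who wrote Hamlet?", "Dickens"),
]
labels = [1, 0, 1, 0]

# Represent each pair as one text; a real judge would use an LLM encoder.
texts = [q + " [SEP] " + a for q, a in pairs]
vec = CountVectorizer().fit(texts)
judge = LogisticRegression().fit(vec.transform(texts), labels)

# The trained judge now scores new (question, answer) pairs for correctness.
score = judge.predict_proba(vec.transform(["Where was Einstein born? [SEP] Ulm"]))[0, 1]
print(round(score, 2))
```

The point of the design is cost: once trained, the judge classifies answers cheaply, without re-generating or re-grading full LLM outputs for every evaluated fact.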
🔬 Dependencies
```bash
pip install -r requirements.txt
```
Details
- Python (>= 3.7)
- PyTorch (>= 1.13.1)
- numpy (>= 1.19.2)
- Transformers (== 4.38.2)
📚 Data Preparation
Please download `mappingbased-objects_lang=en.ttl.bzip2` from the DBpedia dataset and unzip it. A program argument is provided to specify the path to the file.
The DBpedia dataset can be downloaded from the DBpedia downloads page.
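The mappingbased-objects dump is a bzip2-compressed file of N-Triples-style lines (`<subject> <predicate> <object> .`). As a rough sketch of what reading it involves (this is not the repo's actual loading code, and the regex handles only URI objects):

```python
# Sketch: stream (subject, predicate, object) facts out of a DBpedia
# mappingbased-objects dump. Illustrative only; the repo's own loader
# and any file path are supplied via the scripts' arguments.
import bz2
import re

# Matches lines whose subject, predicate, and object are all URIs.
TRIPLE = re.compile(r'<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.')

def iter_facts(lines):
    """Yield (subject, predicate, object) URI triples, skipping comments."""
    for line in lines:
        if line.startswith("#"):
            continue
        m = TRIPLE.match(line)
        if m:
            yield m.groups()

def load_dump(path):
    # bz2.open streams the archive without decompressing it to disk first.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        yield from iter_facts(f)

# Example on two in-memory lines (no download needed):
sample = [
    "# comment line",
    "<http://dbpedia.org/resource/Berlin> "
    "<http://dbpedia.org/ontology/country> "
    "<http://dbpedia.org/resource/Germany> .",
]
print(list(iter_facts(sample)))
```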
🚀 Running the code
The three steps in the paper are implemented in the following files:
- collect.py
- train.py
- eval.py
The code provides arguments to specify settings, paths, and hyperparameters. To see the arguments, run the following command:
```bash
python collect.py --help
```
The `--help` flag works the same way with any of collect.py, train.py, and eval.py.
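The scripts expose their settings, paths, and hyperparameters through command-line arguments; the snippet below sketches the general argparse pattern involved. The flag names and defaults here are hypothetical, so run `--help` on the actual scripts to see their real arguments.

```python
# Illustrative argparse setup in the style of collect.py/train.py/eval.py.
# Flag names and defaults are hypothetical, not the scripts' real interface.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="GraphEval step (sketch)")
    p.add_argument("--kg-path", type=str, required=True,
                   help="path to the unzipped DBpedia .ttl dump")
    p.add_argument("--output-dir", type=str, default="out",
                   help="where to write intermediate results")
    p.add_argument("--batch-size", type=int, default=32,
                   help="batch size for the judge model")
    return p

# argparse converts `--kg-path` into the attribute `args.kg_path`.
args = build_parser().parse_args(["--kg-path", "mappingbased-objects.ttl"])
print(args.output_dir, args.batch_size)
```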
🤝 Cite:
Please consider citing this paper if you use the code or data from our work.
Thanks a lot :)
```bibtex
@article{liu2024evaluating,
  title={Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs},
  author={Xiaoze Liu and Feijie Wu and Tianyang Xu and Zhuo Chen and Yichi Zhang and Xiaoqian Wang and Jing Gao},
  year={2024},
  journal={arXiv preprint arXiv:2404.00942}
}
```
Owner
- Name: Xiaoze Liu
- Login: xz-liu
- Kind: user
- Website: https://xz-liu.github.io/
- Repositories: 1
- Profile: https://github.com/xz-liu
Don't beg for it; win it, and it shall be granted.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software in your work, please cite it using the following metadata"
preferred-citation:
  type: article
  title: "Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs"
  authors:
    - family-names: Liu
      given-names: Xiaoze
    - family-names: Wu
      given-names: Feijie
    - family-names: Xu
      given-names: Tianyang
    - family-names: Chen
      given-names: Zhuo
    - family-names: Zhang
      given-names: Yichi
    - family-names: Wang
      given-names: Xiaoqian
    - family-names: Gao
      given-names: Jing
  year: 2024
  journal: "arXiv preprint arXiv:2404.00942"
GitHub Events
Total
- Issues event: 1
- Watch event: 10
- Issue comment event: 1
Last Year
- Issues event: 1
- Watch event: 10
- Issue comment event: 1
Dependencies
- datasets ==2.18.0
- flash_attn ==2.5.3
- huggingface_hub ==0.20.2
- matplotlib ==3.7.2
- networkx ==3.2.1
- networkx_metis ==1.0
- numpy ==1.24.3
- openai ==0.28.0
- pandas ==2.2.1
- peft ==0.9.0
- pytorch_lightning ==2.2.0.post0
- pytorch_partial_crf ==0.2.1
- scikit_learn ==1.3.1
- scipy ==1.12.0
- sentence_transformers ==2.2.2
- sqlitedict ==2.1.0
- torch ==1.13.1
- tqdm ==4.65.0
- transformers ==4.38.2