specificityplus

👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"

https://github.com/apartresearch/specificityplus

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 11 committers (18.2%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.8%) to scientific vocabulary

Keywords

benchmarking llm
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 20
  • Watchers: 2
  • Forks: 4
  • Open Issues: 2
  • Releases: 0
Topics
benchmarking llm
Created about 3 years ago · Last pushed about 2 years ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

Detecting Edit Failures in LLMs: An Improved Specificity Benchmark (website)

This repository contains the code for the paper Detecting Edit Failures in LLMs: An Improved Specificity Benchmark (ACL Findings 2023).

It extends previous work on model editing by Meng et al. [1] by introducing a new benchmark, called CounterFact+, for measuring the specificity of model edits.
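To illustrate the idea behind a specificity benchmark, here is a minimal, hypothetical sketch (not this repository's API): after editing one fact in a model, prompts about unrelated facts should still receive the same answers as before the edit.

```python
# Hypothetical sketch of a specificity check. `answer_before` / `answer_after`
# are stand-in callables mapping a prompt to the model's answer; they are not
# part of this repository's code.

def specificity(answer_before, answer_after, neighborhood_prompts):
    """Fraction of unrelated prompts whose answer is unchanged by the edit."""
    unchanged = sum(
        answer_before(p) == answer_after(p)
        for p in neighborhood_prompts
    )
    return unchanged / len(neighborhood_prompts)

# Toy data: an edit placing the Eiffel Tower in Rome should not bleed into
# answers about genuinely unrelated landmarks.
before = {"Where is the Louvre?": "Paris", "Where is Big Ben?": "London"}
after  = {"Where is the Louvre?": "Paris", "Where is Big Ben?": "Rome"}  # bleed-over
print(specificity(before.get, after.get, list(before)))  # → 0.5
```

A score of 1.0 would mean the edit was perfectly specific; lower scores indicate the edit leaked into unrelated knowledge.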

Attribution

This repository is a fork of MEMIT, which implements the model editing algorithms MEMIT (Mass Editing Memory in a Transformer) and ROME (Rank-One Model Editing). Our fork extends this code with additional evaluation scripts implementing the CounterFact+ benchmark. For installation instructions see the original repository.

Installation

We recommend conda for managing Python, CUDA, and PyTorch, and pip for everything else. To get started, install conda and run:

```bash
CONDA_HOME=$CONDA_HOME ./scripts/setup_conda.sh
```

`$CONDA_HOME` should be the path to your conda installation, e.g., `~/miniconda3`.

Running Experiments

See INSTRUCTIONS.md for instructions on how to run the experiments and evaluations.

How to Cite

If you find our paper useful, please consider citing as:

```bibtex
@inproceedings{jason2023detecting,
  title = {Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark},
  author = {Hoelscher-Obermaier, Jason and Persson, Julia and Kran, Esben and Konstas, Ioannis and Barez, Fazl},
  booktitle = {Findings of ACL},
  year = {2023},
  organization = {Association for Computational Linguistics}
}
```
Owner

  • Name: apartresearch
  • Login: apartresearch
  • Kind: organization
  • Email: operations@apartresearch.com


Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 454
  • Total Committers: 11
  • Avg Commits per committer: 41.273
  • Development Distribution Score (DDS): 0.467
Past Year
  • Commits: 197
  • Committers: 7
  • Avg Commits per committer: 28.143
  • Development Distribution Score (DDS): 0.635
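The Development Distribution Score can be reproduced from the per-committer commit counts: it is one minus the top committer's share of all commits, so higher values mean development is spread across more people. A short sketch, consistent with the figures shown (242 of 454 all-time commits by the top committer):

```python
def dds(commits_per_committer):
    """Development Distribution Score: 1 minus the share of commits
    made by the single most active committer."""
    total = sum(commits_per_committer)
    return 1 - max(commits_per_committer) / total

# All-time commit counts from the Top Committers table.
all_time = [242, 104, 55, 33, 6, 3, 3, 3, 2, 2, 1]
print(round(dds(all_time), 3))  # → 0.467
```

A DDS near 0 means one person wrote almost everything; the past-year value of 0.635 indicates contributions became more evenly distributed over the last year.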
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| JuliaHPersson | j****n@g****m | 242 |
| Jason Hoelscher-Obermaier | j****r@g****m | 104 |
| - | - | 55 |
| Esben Kran | e****n@k****i | 33 |
| Kevin Meng | k****1@g****m | 6 |
| Fazl Barez | s****9@l****k | 3 |
| Jason Hoelscher-Obermaier | j****o | 3 |
| fbarez | 3****z | 3 |
| David Bau | d****u@g****m | 2 |
| Julia Persson | 5****n | 2 |
| Fazl Barez | s****9@u****k | 1 |

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 24
  • Total pull requests: 39
  • Average time to close issues: 20 days
  • Average time to close pull requests: 2 days
  • Total issue authors: 3
  • Total pull request authors: 4
  • Average comments per issue: 0.92
  • Average comments per pull request: 0.33
  • Merged pull requests: 36
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 11
  • Pull requests: 27
  • Average time to close issues: about 1 month
  • Average time to close pull requests: about 5 hours
  • Issue authors: 3
  • Pull request authors: 4
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.37
  • Merged pull requests: 26
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jas-ho (20)
  • fbarez (3)
  • esbenkc (1)
Pull Request Authors
  • jas-ho (25)
  • JuliaHPersson (7)
  • fbarez (5)
  • esbenkc (3)
Top Labels
Issue Labels
enhancement (1)
Pull Request Labels
enhancement (1)

Dependencies

baselines/mend/requirements.txt pypi
  • allennlp *
  • click ==7.1.2
  • datasets *
  • hydra-core *
  • jsonlines *
  • numpy *
  • spacy *
  • torch *
  • wandb *