https://github.com/assert-kth/repairllama

RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair http://arxiv.org/pdf/2312.15698

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found codemeta.json file
○ .zenodo.json file
✓ DOI references: found 1 DOI reference(s) in README
✓ Academic publication links: links to arxiv.org
✓ Committers with academic emails: 2 of 7 committers (28.6%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (9.9%) to scientific vocabulary

Keywords

apr codellama llama llms lora repair

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: ASSERT-KTH
Language: Jupyter Notebook
Default Branch: main
Homepage: http://arxiv.org/abs/2312.15698
Size: 144 MB

Statistics

Stars: 35
Watchers: 4
Forks: 8
Open Issues: 1
Releases: 1

Topics

apr codellama llama llms lora repair

Created over 2 years ago · Last pushed 6 months ago

Metadata Files

Readme

RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

If you use RepairLLaMA in academic research, please cite "RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair", IEEE Transactions on Software Engineering, 2025.

@article{repairllama2023,
  title = {RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair},
  author = {Silva, Andr{\'e} and Fang, Sen and Monperrus, Martin},
  journal = {IEEE Transactions on Software Engineering},
  doi = {10.1109/TSE.2025.3581062},
  url = {http://arxiv.org/abs/2312.15698}
}

This repository contains the code, model, and results to replicate the paper "RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair".

It is structured as follows:
- repairllama-lora contains the RepairLLaMA low-rank adaptation of CodeLLaMA-7B, called the "repair adapter"
- results contains all generated patches for Defects4J and HumanEval-Java by all models (incl. full fine-tuning, LoRA, and code representations)
- src contains the training and inference scripts, and scripts to generate datasets for different input-output representations (IRxOR)
- example contains an example notebook explaining how to load and prompt the RepairLLaMA model
- benchmarks contains the datasets for different input-output representations (IRxOR)

Models

All fine-tuned models are available on HuggingFace. The specific links are:

IR1xOR1: https://huggingface.co/ASSERT-KTH/RepairLLaMA-IR1-OR1
IR1xOR3: https://huggingface.co/ASSERT-KTH/RepairLLaMA-IR1-OR3
IR1xOR4: https://huggingface.co/ASSERT-KTH/RepairLLaMA-IR1-OR4
IR2xOR2: https://huggingface.co/ASSERT-KTH/RepairLLaMA-IR2-OR2
IR3xOR2: https://huggingface.co/ASSERT-KTH/RepairLLaMA-IR3-OR2
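
As a rough illustration of how one of the adapters listed above might be loaded, the sketch below builds the HuggingFace repository ID from an IRxOR pair and attaches it to a base model with `peft`. It assumes the published repositories hold LoRA adapter weights and that the base checkpoint is `codellama/CodeLlama-7b-hf`; neither detail is confirmed on this page, and the helper names are hypothetical. The example notebook in the repository remains the authoritative reference.

```python
# Assumed base checkpoint; this ID is not stated on this page.
BASE_MODEL_ID = "codellama/CodeLlama-7b-hf"

# (input representation, output representation) pairs taken from the links above.
AVAILABLE_PAIRS = {(1, 1), (1, 3), (1, 4), (2, 2), (3, 2)}


def adapter_repo_id(ir: int, out_rep: int) -> str:
    """Build the HuggingFace repo ID for a published IRxOR pair."""
    if (ir, out_rep) not in AVAILABLE_PAIRS:
        raise ValueError(f"No published model for IR{ir}xOR{out_rep}")
    return f"ASSERT-KTH/RepairLLaMA-IR{ir}-OR{out_rep}"


def load_repair_model(ir: int, out_rep: int):
    """Load the base model and attach the adapter (downloads several GB of weights)."""
    # Imported lazily so adapter_repo_id works without these packages installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base_model, adapter_repo_id(ir, out_rep))
    return tokenizer, model
```

Calling `load_repair_model(3, 2)` would fetch the IR3xOR2 weights under these assumptions; `adapter_repo_id` alone can be used to check a pair without downloading anything.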

Datasets

The processed fine-tuning datasets are available on HuggingFace at https://huggingface.co/datasets/ASSERT-KTH/repairllama-datasets. The repository contains the datasets used for training the RepairLLaMA models, one subset per input/output representation pair. To obtain the 30k..50k datasets, we filtered further, keeping only the pairs whose combined input + output length is less than 1024 tokens.
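
The token-length filter described above could look roughly like the following. This is a minimal sketch, not the project's actual preprocessing script: the function names are hypothetical, and a whitespace split stands in for the model tokenizer so the example runs without any downloads.

```python
from typing import Callable, Iterable, List, Tuple

MAX_TOKENS = 1024  # threshold stated in the text above


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer for illustration; a real run would use the model tokenizer."""
    return text.split()


def filter_by_token_length(
    pairs: Iterable[Tuple[str, str]],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
    max_tokens: int = MAX_TOKENS,
) -> List[Tuple[str, str]]:
    """Keep (input, output) pairs whose combined token count is below max_tokens."""
    kept = []
    for source, target in pairs:
        total = len(tokenize(source)) + len(tokenize(target))
        if total < max_tokens:
            kept.append((source, target))
    return kept
```

Passing a different `tokenize` callable, for example one wrapping a HuggingFace tokenizer, would make the counts match what the model actually sees.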

The following related datasets are also available on our HuggingFace org:
- Megadiff (original dataset, in HF format): https://huggingface.co/datasets/ASSERT-KTH/megadiff
- Megadiff Single-Function (single-function diffs only, with the buggy and fixed functions extracted): https://huggingface.co/datasets/ASSERT-KTH/megadiff-single-function

Benchmarks

The evaluation benchmarks are Defects4J v2, HumanEval-Java, and GitBug-Java.

We focus on single-function bugs (i.e., bugs whose developer patch exclusively changes one function):
- Defects4J contains 488 single-function bugs: defects4j_sf.txt
- HumanEval-Java contains 162 single-function bugs: humanevaljava_sf.txt
- GitBug-Java contains 90 single-function bugs: gitbugjava_sf.txt

Note that the original HumanEval-Java contains a duplicate bug.
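
For working with the bug lists named above, a small helper along these lines may be handy. It assumes each `*_sf.txt` file holds one bug identifier per line, which is a guess about the file layout rather than something stated here; the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def parse_bug_ids(text: str) -> List[str]:
    """Return bug IDs in file order, skipping blank lines and repeated IDs."""
    seen = set()
    bug_ids = []
    for line in text.splitlines():
        bug_id = line.strip()
        if bug_id and bug_id not in seen:
            seen.add(bug_id)
            bug_ids.append(bug_id)
    return bug_ids


def load_bug_ids(path: str) -> List[str]:
    """Read a bug list such as defects4j_sf.txt (assumed: one ID per line)."""
    return parse_bug_ids(Path(path).read_text(encoding="utf-8"))
```

Comparing `len(load_bug_ids(...))` against the counts quoted above is a quick sanity check that a local copy of the list is complete.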

Owner

Name: ASSERT
Login: ASSERT-KTH
Kind: organization
Location: Sweden

Website: https://github.com/ASSERT-KTH/
Repositories: 87
Profile: https://github.com/ASSERT-KTH

assertEquals("Research group at KTH Royal Institute of Technology, Stockholm, Sweden", description);

GitHub Events

Total

Issues event: 3
Watch event: 7
Issue comment event: 5
Push event: 40
Pull request event: 2
Fork event: 3
Create event: 1

Last Year

Issues event: 3
Watch event: 7
Issue comment event: 5
Push event: 40
Pull request event: 2
Fork event: 3
Create event: 1

Committers

Last synced: 5 months ago

All Time

Total Commits: 139
Total Committers: 7
Avg Commits per committer: 19.857
Development Distribution Score (DDS): 0.288

Past Year

Commits: 52
Committers: 5
Avg Commits per committer: 10.4
Development Distribution Score (DDS): 0.365

Top Committers

Name	Email	Commits
André Silva	a**e@h**m	99
TomasAndersonFang	f**6@g**m	14
Kiki1643	s**6@g**m	9
Sen Fang	3**g@u**m	7
Martin Monperrus	m**s@g**g	6
TomasAndersonFang	s**f@k**e	2
sfang9	s**9@n**u	2
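
The all-time averages reported above can be reproduced from the commit counts in this table. The sketch below assumes the Development Distribution Score is one minus the top committer's share of all commits; that definition is not given on this page, but it agrees with the reported 0.288.

```python
# Commit counts per committer, copied from the Top Committers table.
ALL_TIME_COMMITS = [99, 14, 9, 7, 6, 2, 2]


def average_commits(commits):
    """Mean number of commits per committer."""
    return sum(commits) / len(commits)


def development_distribution_score(commits):
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    return 1 - max(commits) / sum(commits)


print(sum(ALL_TIME_COMMITS))                                       # 139
print(round(average_commits(ALL_TIME_COMMITS), 3))                 # 19.857
print(round(development_distribution_score(ALL_TIME_COMMITS), 3))  # 0.288
```

A score near 0 means one person wrote almost everything; here the top committer accounts for 99 of 139 commits.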

Committer Domains (Top 20 + Academic)

ncsu.edu: 1
kth.se: 1
gnieh.org: 1

Dependencies

example/requirements.txt pypi

accelerate ==0.22.0
aiohttp ==3.8.6
aiosignal ==1.3.1
anyio ==4.0.0
appdirs ==1.4.4
appnope ==0.1.3
argon2-cffi ==23.1.0
argon2-cffi-bindings ==21.2.0
arrow ==1.3.0
asttokens ==2.4.1
async-lru ==2.0.4
async-timeout ==4.0.3
attrs ==23.1.0
babel ==2.13.1
backcall ==0.2.0
backoff ==2.2.1
bcrypt ==4.0.1
beautifulsoup4 ==4.12.2
bitsandbytes ==0.41.1
black ==23.10.1
bleach ==6.1.0
certifi ==2023.7.22
cffi ==1.16.0
charset-normalizer ==3.3.1
click ==8.1.7
colorama ==0.4.6
comm ==0.1.4
cryptography ==41.0.5
datasets ==2.14.6
debugpy ==1.8.0
decorator ==5.1.1
defusedxml ==0.7.1
dill ==0.3.7
docker-pycreds ==0.4.0
docstring-parser ==0.15
evaluate ==0.4.1
exceptiongroup ==1.1.3
executing ==2.0.0
fastjsonschema ==2.18.1
filelock ==3.12.4
fire ==0.5.0
fqdn ==1.5.1
frozenlist ==1.4.0
fsspec ==2023.10.0
gitdb ==4.0.11
gitpython ==3.1.40
greenlet ==3.0.1
huggingface-hub ==0.17.3
idna ==3.4
importlib-metadata ==6.8.0
ipykernel ==6.26.0
ipython ==8.16.1
ipython-genutils ==0.2.0
ipywidgets ==8.1.1
isoduration ==20.11.0
javalang ==0.13.0
jedi ==0.19.1
jinja2 ==3.1.2
json5 ==0.9.14
jsonpointer ==2.4
jsonschema ==4.19.1
jsonschema-specifications ==2023.7.1
jupyter ==1.0.0
jupyter-client ==8.5.0
jupyter-console ==6.6.3
jupyter-core ==5.4.0
jupyter-events ==0.8.0
jupyter-lsp ==2.2.0
jupyter-server ==2.9.1
jupyter-server-terminals ==0.4.4
jupyterlab ==4.0.7
jupyterlab-pygments ==0.2.2
jupyterlab-server ==2.25.0
jupyterlab-widgets ==3.0.9
markdown-it-py ==3.0.0
markupsafe ==2.1.3
matplotlib-inline ==0.1.6
mdurl ==0.1.2
mistune ==3.0.2
mpmath ==1.3.0
multidict ==6.0.4
multiprocess ==0.70.15
mypy-extensions ==1.0.0
nbclient ==0.8.0
nbconvert ==7.9.2
nbformat ==5.9.2
nest-asyncio ==1.5.8
networkx ==3.2
ninja ==1.11.1.1
notebook ==7.0.6
notebook-shim ==0.2.3
numpy ==1.26.1
nvidia-cublas-cu12 ==12.1.3.1
nvidia-cuda-cupti-cu12 ==12.1.105
nvidia-cuda-nvrtc-cu12 ==12.1.105
nvidia-cuda-runtime-cu12 ==12.1.105
nvidia-cudnn-cu12 ==8.9.2.26
nvidia-cufft-cu12 ==11.0.2.54
nvidia-curand-cu12 ==10.3.2.106
nvidia-cusolver-cu12 ==11.4.5.107
nvidia-cusparse-cu12 ==12.1.0.106
nvidia-nccl-cu12 ==2.19.3
nvidia-nvjitlink-cu12 ==12.3.101
nvidia-nvtx-cu12 ==12.1.105
openai ==0.27.10
overrides ==7.4.0
packaging ==23.2
pandas ==2.1.1
pandocfilters ==1.5.0
paramiko ==3.3.1
parso ==0.8.3
pathspec ==0.11.2
pathtools ==0.1.2
peft ==0.5.0
pexpect ==4.8.0
pickleshare ==0.7.5
platformdirs ==3.11.0
prometheus-client ==0.17.1
prompt-toolkit ==3.0.39
protobuf ==4.24.4
psutil ==5.9.6
psycopg ==3.1.12
psycopg-binary ==3.1.12
psycopg-pool ==3.1.8
ptyprocess ==0.7.0
pure-eval ==0.2.2
pyarrow ==13.0.0
pycparser ==2.21
pygments ==2.16.1
pynacl ==1.5.0
python-dateutil ==2.8.2
python-dotenv ==0.21.1
python-json-logger ==2.0.7
pytz ==2023.3.post1
pywin32 ==306
pywinpty ==2.0.12
pyyaml ==6.0.1
pyzmq ==25.1.1
qtconsole ==5.4.4
qtpy ==2.4.1
referencing ==0.30.2
regex ==2023.10.3
requests ==2.31.0
responses ==0.18.0
rfc3339-validator ==0.1.4
rfc3986-validator ==0.1.1
rich ==13.6.0
rpds-py ==0.10.6
safetensors ==0.3.3
scipy ==1.11.3
send2trash ==1.8.2
sentencepiece ==0.1.99
sentry-sdk ==1.32.0
setproctitle ==1.3.3
setuptools ==68.2.2
shtab ==1.6.4
six ==1.16.0
smmap ==5.0.1
sniffio ==1.3.0
soupsieve ==2.5
sqlalchemy ==2.0.22
stack-data ==0.6.3
sympy ==1.12
termcolor ==2.3.0
terminado ==0.17.1
tinycss2 ==1.2.1
tokenizers ==0.14.1
tomli ==2.0.1
torch ==2.2.1
tornado ==6.3.3
tqdm ==4.66.1
traitlets ==5.12.0
transformers ==4.34.1
triton ==2.2.0
trl ==0.7.2
types-python-dateutil ==2.8.19.14
typing-extensions ==4.8.0
tyro ==0.5.10
tzdata ==2023.3
unidiff ==0.7.5
uri-template ==1.3.0
urllib3 ==2.0.7
wandb ==0.15.12
wcwidth ==0.2.8
webcolors ==1.13
webencodings ==0.5.1
websocket-client ==1.6.4
whatthepatch ==1.0.5
widgetsnbextension ==4.0.9
xxhash ==3.4.1
yarl ==1.9.2
zipp ==3.17.0

src/patch_analysis/requirements.txt pypi

fire ==0.6.0
pygments ==2.17.2
tqdm ==4.66.2
