https://github.com/assert-kth/repairllama
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair http://arxiv.org/pdf/2312.15698
Science Score: 46.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org
- ✓ Committers with academic emails: 2 of 7 committers (28.6%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ASSERT-KTH
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: http://arxiv.org/abs/2312.15698
- Size: 144 MB
Statistics
- Stars: 35
- Watchers: 4
- Forks: 8
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair
If you use RepairLLaMA in academic research, please cite "RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair", IEEE Transactions on Software Engineering, 2025.
@article{repairllama2023,
  title = {RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair},
  author = {Silva, Andr{\'e} and Fang, Sen and Monperrus, Martin},
  journal = {IEEE Transactions on Software Engineering},
  year = {2025},
  doi = {10.1109/TSE.2025.3581062},
  url = {http://arxiv.org/abs/2312.15698}
}
This repository contains the code, model, and results to replicate the paper "RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair".
It is structured as follows:
- repairllama-lora contains the RepairLLaMA low-rank adaptation of CodeLLaMA-7B, called the "repair adapter"
- results contains all patches generated for Defects4J and HumanEval-Java by all models (incl. full fine-tuning, LoRA, and code representations)
- src contains the training and inference scripts, as well as the scripts that generate datasets for the different input-output representations (IRxOR)
- example contains an example notebook explaining how to load and prompt the RepairLLaMA model
- benchmarks contains the datasets for the different input-output representations (IRxOR)
Models
All fine-tuned models are available on HuggingFace. Direct links:
- IR1xOR1: https://huggingface.co/ASSERT-KTH/RepairLLaMA-IR1-OR1
- IR1xOR3: https://huggingface.co/ASSERT-KTH/RepairLLaMA-IR1-OR3
- IR1xOR4: https://huggingface.co/ASSERT-KTH/RepairLLaMA-IR1-OR4
- IR2xOR2: https://huggingface.co/ASSERT-KTH/RepairLLaMA-IR2-OR2
- IR3xOR2: https://huggingface.co/ASSERT-KTH/RepairLLaMA-IR3-OR2
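The model names above follow a regular pattern, so the repository id for a given representation pair can be derived programmatically. The sketch below is illustrative only: `adapter_repo_id` and `adapter_url` are hypothetical helpers that are not part of this repository, and the set of pairs is copied from the URLs listed above.

```python
# Representation pairs (input representation, output representation)
# copied from the model URLs listed above.
PUBLISHED_PAIRS = {(1, 1), (1, 3), (1, 4), (2, 2), (3, 2)}
HF_ORG = "ASSERT-KTH"


def adapter_repo_id(ir: int, out_rep: int) -> str:
    """Return the HuggingFace repo id for the IR{ir}xOR{out_rep} model.

    Hypothetical helper: raises ValueError for pairs that are not listed.
    """
    if (ir, out_rep) not in PUBLISHED_PAIRS:
        raise ValueError(f"no published model for IR{ir}xOR{out_rep}")
    return f"{HF_ORG}/RepairLLaMA-IR{ir}-OR{out_rep}"


def adapter_url(ir: int, out_rep: int) -> str:
    """Return the model page URL for a listed representation pair."""
    return f"https://huggingface.co/{adapter_repo_id(ir, out_rep)}"


if __name__ == "__main__":
    for ir, out_rep in sorted(PUBLISHED_PAIRS):
        print(f"IR{ir}xOR{out_rep}: {adapter_url(ir, out_rep)}")
```

Such an id is the kind of string that HuggingFace loaders accept; for instance, it could presumably be passed to `peft.PeftModel.from_pretrained` on top of a CodeLLaMA-7B base model. The notebook in example remains the reference for how to load and prompt the model.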
Datasets
The processed fine-tuning datasets are available on HuggingFace at https://huggingface.co/datasets/ASSERT-KTH/repairllama-datasets. This repository contains the datasets used to train the RepairLLaMA models, one subset per input/output representation pair. To obtain the 30k..50k datasets, we further filtered the samples, keeping only those whose input and output together are shorter than 1024 tokens.
The following datasets are also available on our HuggingFace organization:
- Megadiff (original dataset, in HF format): https://huggingface.co/datasets/ASSERT-KTH/megadiff
- Megadiff Single-Function (single-function diffs only, with the buggy and fixed functions extracted from them): https://huggingface.co/datasets/ASSERT-KTH/megadiff-single-function
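The length filter described above can be sketched as follows. This is a minimal illustration, not the script used for the paper: the whitespace-based `whitespace_token_count` is a stand-in assumption for the model tokenizer, and `filter_by_length` is a hypothetical name.

```python
from typing import Callable, Iterable, List, Tuple

MAX_TOKENS = 1024  # threshold stated in the text above


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated pieces.

    A real pipeline would count tokens with the model's own tokenizer.
    """
    return len(text.split())


def filter_by_length(
    pairs: Iterable[Tuple[str, str]],
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = MAX_TOKENS,
) -> List[Tuple[str, str]]:
    """Keep (input, output) pairs whose combined length is below max_tokens."""
    return [
        (source, target)
        for source, target in pairs
        if count_tokens(source) + count_tokens(target) < max_tokens
    ]


if __name__ == "__main__":
    samples = [
        ("int add(int a, int b) { return a - b; }", "return a + b;"),
        ("x " * 1000, "y " * 24),  # 1024 tokens in total, so it is dropped
    ]
    print(len(filter_by_length(samples)))
```

Passing the counting function as a parameter keeps the threshold logic independent of the tokenizer, so the stand-in can be swapped for a real one without touching the filter.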
Benchmarks
The evaluation benchmarks are Defects4J v2, HumanEval-Java, and GitBug-Java.
We focus on single-function bugs (i.e., bugs whose developer patch changes exactly one function):
- Defects4J contains 488 single-function bugs: defects4j_sf.txt
- HumanEval-Java contains 162 single-function bugs: humanevaljava_sf.txt
- GitBug-Java contains 90 single-function bugs: gitbugjava_sf.txt
Note that the original HumanEval-Java contains a duplicate bug.
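The single-function criterion can be illustrated with a short sketch. It assumes that the functions touched by each developer patch have already been extracted (how that is done is not described here); the bug identifiers, function signatures, and helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def is_single_function(changed_functions: Iterable[str]) -> bool:
    """True when a patch changes exactly one distinct function."""
    return len(set(changed_functions)) == 1


def select_single_function_bugs(patches: Dict[str, List[str]]) -> List[str]:
    """Return the sorted ids of bugs whose patch touches exactly one function.

    `patches` maps a bug id to the functions changed by its developer patch.
    """
    return sorted(
        bug_id
        for bug_id, functions in patches.items()
        if is_single_function(functions)
    )


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    example_patches = {
        "bug-a": ["Foo.bar(int)"],
        "bug-b": ["Foo.bar(int)", "Foo.baz()"],
        "bug-c": ["Qux.run()", "Qux.run()"],  # two hunks, same function
        "bug-d": [],  # patch changes no function body
    }
    print(select_single_function_bugs(example_patches))
```

Deduplicating with a set means a patch with several hunks inside the same function still counts as a single-function bug, which matches the definition given above.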
Owner
- Name: ASSERT
- Login: ASSERT-KTH
- Kind: organization
- Location: Sweden
- Website: https://github.com/ASSERT-KTH/
- Repositories: 87
- Profile: https://github.com/ASSERT-KTH
assertEquals("Research group at KTH Royal Institute of Technology, Stockholm, Sweden", description);
GitHub Events
Total
- Issues event: 3
- Watch event: 7
- Issue comment event: 5
- Push event: 40
- Pull request event: 2
- Fork event: 3
- Create event: 1
Last Year
- Issues event: 3
- Watch event: 7
- Issue comment event: 5
- Push event: 40
- Pull request event: 2
- Fork event: 3
- Create event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| André Silva | a****e@h****m | 99 |
| TomasAndersonFang | f****6@g****m | 14 |
| Kiki1643 | s****6@g****m | 9 |
| Sen Fang | 3****g@u****m | 7 |
| Martin Monperrus | m****s@g****g | 6 |
| TomasAndersonFang | s****f@k****e | 2 |
| sfang9 | s****9@n****u | 2 |
Dependencies
- accelerate ==0.22.0
- aiohttp ==3.8.6
- aiosignal ==1.3.1
- anyio ==4.0.0
- appdirs ==1.4.4
- appnope ==0.1.3
- argon2-cffi ==23.1.0
- argon2-cffi-bindings ==21.2.0
- arrow ==1.3.0
- asttokens ==2.4.1
- async-lru ==2.0.4
- async-timeout ==4.0.3
- attrs ==23.1.0
- babel ==2.13.1
- backcall ==0.2.0
- backoff ==2.2.1
- bcrypt ==4.0.1
- beautifulsoup4 ==4.12.2
- bitsandbytes ==0.41.1
- black ==23.10.1
- bleach ==6.1.0
- certifi ==2023.7.22
- cffi ==1.16.0
- charset-normalizer ==3.3.1
- click ==8.1.7
- colorama ==0.4.6
- comm ==0.1.4
- cryptography ==41.0.5
- datasets ==2.14.6
- debugpy ==1.8.0
- decorator ==5.1.1
- defusedxml ==0.7.1
- dill ==0.3.7
- docker-pycreds ==0.4.0
- docstring-parser ==0.15
- evaluate ==0.4.1
- exceptiongroup ==1.1.3
- executing ==2.0.0
- fastjsonschema ==2.18.1
- filelock ==3.12.4
- fire ==0.5.0
- fqdn ==1.5.1
- frozenlist ==1.4.0
- fsspec ==2023.10.0
- gitdb ==4.0.11
- gitpython ==3.1.40
- greenlet ==3.0.1
- huggingface-hub ==0.17.3
- idna ==3.4
- importlib-metadata ==6.8.0
- ipykernel ==6.26.0
- ipython ==8.16.1
- ipython-genutils ==0.2.0
- ipywidgets ==8.1.1
- isoduration ==20.11.0
- javalang ==0.13.0
- jedi ==0.19.1
- jinja2 ==3.1.2
- json5 ==0.9.14
- jsonpointer ==2.4
- jsonschema ==4.19.1
- jsonschema-specifications ==2023.7.1
- jupyter ==1.0.0
- jupyter-client ==8.5.0
- jupyter-console ==6.6.3
- jupyter-core ==5.4.0
- jupyter-events ==0.8.0
- jupyter-lsp ==2.2.0
- jupyter-server ==2.9.1
- jupyter-server-terminals ==0.4.4
- jupyterlab ==4.0.7
- jupyterlab-pygments ==0.2.2
- jupyterlab-server ==2.25.0
- jupyterlab-widgets ==3.0.9
- markdown-it-py ==3.0.0
- markupsafe ==2.1.3
- matplotlib-inline ==0.1.6
- mdurl ==0.1.2
- mistune ==3.0.2
- mpmath ==1.3.0
- multidict ==6.0.4
- multiprocess ==0.70.15
- mypy-extensions ==1.0.0
- nbclient ==0.8.0
- nbconvert ==7.9.2
- nbformat ==5.9.2
- nest-asyncio ==1.5.8
- networkx ==3.2
- ninja ==1.11.1.1
- notebook ==7.0.6
- notebook-shim ==0.2.3
- numpy ==1.26.1
- nvidia-cublas-cu12 ==12.1.3.1
- nvidia-cuda-cupti-cu12 ==12.1.105
- nvidia-cuda-nvrtc-cu12 ==12.1.105
- nvidia-cuda-runtime-cu12 ==12.1.105
- nvidia-cudnn-cu12 ==8.9.2.26
- nvidia-cufft-cu12 ==11.0.2.54
- nvidia-curand-cu12 ==10.3.2.106
- nvidia-cusolver-cu12 ==11.4.5.107
- nvidia-cusparse-cu12 ==12.1.0.106
- nvidia-nccl-cu12 ==2.19.3
- nvidia-nvjitlink-cu12 ==12.3.101
- nvidia-nvtx-cu12 ==12.1.105
- openai ==0.27.10
- overrides ==7.4.0
- packaging ==23.2
- pandas ==2.1.1
- pandocfilters ==1.5.0
- paramiko ==3.3.1
- parso ==0.8.3
- pathspec ==0.11.2
- pathtools ==0.1.2
- peft ==0.5.0
- pexpect ==4.8.0
- pickleshare ==0.7.5
- platformdirs ==3.11.0
- prometheus-client ==0.17.1
- prompt-toolkit ==3.0.39
- protobuf ==4.24.4
- psutil ==5.9.6
- psycopg ==3.1.12
- psycopg-binary ==3.1.12
- psycopg-pool ==3.1.8
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- pyarrow ==13.0.0
- pycparser ==2.21
- pygments ==2.16.1
- pynacl ==1.5.0
- python-dateutil ==2.8.2
- python-dotenv ==0.21.1
- python-json-logger ==2.0.7
- pytz ==2023.3.post1
- pywin32 ==306
- pywinpty ==2.0.12
- pyyaml ==6.0.1
- pyzmq ==25.1.1
- qtconsole ==5.4.4
- qtpy ==2.4.1
- referencing ==0.30.2
- regex ==2023.10.3
- requests ==2.31.0
- responses ==0.18.0
- rfc3339-validator ==0.1.4
- rfc3986-validator ==0.1.1
- rich ==13.6.0
- rpds-py ==0.10.6
- safetensors ==0.3.3
- scipy ==1.11.3
- send2trash ==1.8.2
- sentencepiece ==0.1.99
- sentry-sdk ==1.32.0
- setproctitle ==1.3.3
- setuptools ==68.2.2
- shtab ==1.6.4
- six ==1.16.0
- smmap ==5.0.1
- sniffio ==1.3.0
- soupsieve ==2.5
- sqlalchemy ==2.0.22
- stack-data ==0.6.3
- sympy ==1.12
- termcolor ==2.3.0
- terminado ==0.17.1
- tinycss2 ==1.2.1
- tokenizers ==0.14.1
- tomli ==2.0.1
- torch ==2.2.1
- tornado ==6.3.3
- tqdm ==4.66.1
- traitlets ==5.12.0
- transformers ==4.34.1
- triton ==2.2.0
- trl ==0.7.2
- types-python-dateutil ==2.8.19.14
- typing-extensions ==4.8.0
- tyro ==0.5.10
- tzdata ==2023.3
- unidiff ==0.7.5
- uri-template ==1.3.0
- urllib3 ==2.0.7
- wandb ==0.15.12
- wcwidth ==0.2.8
- webcolors ==1.13
- webencodings ==0.5.1
- websocket-client ==1.6.4
- whatthepatch ==1.0.5
- widgetsnbextension ==4.0.9
- xxhash ==3.4.1
- yarl ==1.9.2
- zipp ==3.17.0
- fire ==0.6.0
- pygments ==2.17.2
- tqdm ==4.66.2