https://github.com/assert-kth/repairllama

RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair http://arxiv.org/pdf/2312.15698

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    2 of 7 committers (28.6%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.9%) to scientific vocabulary

Keywords

apr codellama llama llms lora repair
Last synced: 5 months ago

Repository

Statistics
  • Stars: 35
  • Watchers: 4
  • Forks: 8
  • Open Issues: 1
  • Releases: 1
Created over 2 years ago · Last pushed 6 months ago

README.md

RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

If you use RepairLLaMA in academic research, please cite "RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair", IEEE Transactions on Software Engineering, 2025.

@article{repairllama2023,
  title={RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair},
  author={Silva, Andr{\'e} and Fang, Sen and Monperrus, Martin},
  journal={IEEE Transactions on Software Engineering},
  doi={10.1109/TSE.2025.3581062},
  url={http://arxiv.org/abs/2312.15698}
}

This repository contains the code, model, and results needed to replicate the paper "RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair".

It is structured as follows:
- repairllama-lora contains the RepairLLaMA low-rank adaptation of CodeLLaMA-7B, called "repair adapter"
- results contains all generated patches for Defects4J and HumanEval-Java by all models (incl. full fine-tuning, lora, and code representations)
- src contains the training and inference scripts, and scripts to generate datasets for different input-output representations (IRxOR)
- example contains an example notebook explaining how to load and prompt the RepairLLaMA model
- benchmarks contains the datasets for different input-output representations (IRxOR)
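As a rough illustration of what loading the repair adapter could look like, the sketch below attaches a LoRA adapter to a base model with the `peft` library. It rests on assumptions not stated in this README: that the `repairllama-lora` directory can be passed directly as the adapter path, and that `codellama/CodeLlama-7b-hf` is the matching base checkpoint. The function name `load_repair_adapter` is hypothetical. The notebook in `example` remains the authoritative reference.

```python
def load_repair_adapter(adapter_path="repairllama-lora",
                        base_model_id="codellama/CodeLlama-7b-hf"):
    """Attach a LoRA adapter to a base causal LM and return (tokenizer, model).

    Both default arguments are assumptions made for illustration; adjust them
    to the adapter and base checkpoint you actually use.
    """
    # Imported lazily so that defining this function does not require the
    # libraries (or any model download) until it is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

    # Wrap the base model with the low-rank adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_path)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling the function downloads the base model weights, so it is best run on a machine with enough memory for a 7B-parameter model.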

Models

All fine-tuned models are available on HuggingFace.

Datasets

The processed fine-tuning datasets are available on HuggingFace at https://huggingface.co/datasets/ASSERT-KTH/repairllama-datasets. The repository contains the datasets used for training the RepairLLaMA models, one subset per input/output representation pair. To obtain the 30k..50k datasets, we filtered the data further, keeping only samples whose input and output together are shorter than 1024 tokens.
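The length filter described above can be sketched as follows. This is a minimal illustration rather than the script used to build the datasets: the names `filter_by_token_length` and `whitespace_token_count` are hypothetical, and the whitespace counter is only a stand-in for the real model tokenizer, which is not specified here.

```python
from typing import Callable, Iterable, List, Tuple


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated pieces."""
    return len(text.split())


def filter_by_token_length(
    pairs: Iterable[Tuple[str, str]],
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = 1024,
) -> List[Tuple[str, str]]:
    """Keep (input, output) pairs whose combined length is below max_tokens."""
    kept = []
    for source, target in pairs:
        # The bound is strict: a pair with exactly max_tokens tokens is dropped.
        if count_tokens(source) + count_tokens(target) < max_tokens:
            kept.append((source, target))
    return kept
```

To use a real tokenizer, pass a function such as `lambda text: len(tokenizer.encode(text))` as `count_tokens`, where `tokenizer` is whichever tokenizer matches the model being fine-tuned.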

The following related datasets are also available on our HuggingFace organization:
- Megadiff (original dataset, in HF format): https://huggingface.co/datasets/ASSERT-KTH/megadiff
- Megadiff Single-Function (single-function diffs only, with the buggy and fixed functions extracted from each diff): https://huggingface.co/datasets/ASSERT-KTH/megadiff-single-function

Benchmarks

The evaluation benchmarks are Defects4J v2, HumanEval-Java, and GitBug-Java.

We focus on single-function bugs (i.e. bugs whose developer patch exclusively changes one function):
- Defects4J contains 488 single-function bugs: defects4j_sf.txt
- HumanEval-Java contains 162 single-function bugs: humanevaljava_sf.txt
- GitBug-Java contains 90 single-function bugs: gitbugjava_sf.txt
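To make the single-function criterion concrete, the sketch below checks whether every changed line of a patch falls inside one and the same function. It is an illustration only: the function names and the representation of a patch as a list of changed line numbers plus a map of function line ranges are assumptions, and this README does not describe how the lists above were actually produced.

```python
from typing import Dict, Iterable, Optional, Tuple


def enclosing_function(
    line: int, functions: Dict[str, Tuple[int, int]]
) -> Optional[str]:
    """Return the name of the function whose line range contains `line`."""
    for name, (start, end) in functions.items():
        if start <= line <= end:
            return name
    return None  # the line lies outside every known function


def is_single_function_patch(
    changed_lines: Iterable[int], functions: Dict[str, Tuple[int, int]]
) -> bool:
    """True if all changed lines belong to exactly one function."""
    touched = {enclosing_function(line, functions) for line in changed_lines}
    # Exactly one function is touched, and no change lies outside a function.
    return len(touched) == 1 and None not in touched
```

For example, with `functions = {"parse": (10, 30), "format": (40, 60)}`, a patch changing lines 12 and 15 qualifies, while a patch changing lines 12 and 45 does not.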

Note that the original HumanEval-Java contains a duplicate bug.

Owner

  • Name: ASSERT
  • Login: ASSERT-KTH
  • Kind: organization
  • Location: Sweden

assertEquals("Research group at KTH Royal Institute of Technology, Stockholm, Sweden", description);

GitHub Events

Total
  • Issues event: 3
  • Watch event: 7
  • Issue comment event: 5
  • Push event: 40
  • Pull request event: 2
  • Fork event: 3
  • Create event: 1
Last Year
  • Issues event: 3
  • Watch event: 7
  • Issue comment event: 5
  • Push event: 40
  • Pull request event: 2
  • Fork event: 3
  • Create event: 1

Committers

All Time
  • Total Commits: 139
  • Total Committers: 7
  • Avg Commits per committer: 19.857
  • Development Distribution Score (DDS): 0.288
Past Year
  • Commits: 52
  • Committers: 5
  • Avg Commits per committer: 10.4
  • Development Distribution Score (DDS): 0.365
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| André Silva | a****e@h****m | 99 |
| TomasAndersonFang | f****6@g****m | 14 |
| Kiki1643 | s****6@g****m | 9 |
| Sen Fang | 3****g@u****m | 7 |
| Martin Monperrus | m****s@g****g | 6 |
| TomasAndersonFang | s****f@k****e | 2 |
| sfang9 | s****9@n****u | 2 |
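The all-time figures above can be reproduced from the commit counts in the table. The sketch below assumes that the Development Distribution Score is one minus the top committer's share of all commits. That definition is not stated on this page, but it matches the reported 0.288. The function names are hypothetical.

```python
from typing import Sequence


def average_commits(commits: Sequence[int]) -> float:
    """Mean number of commits per committer."""
    return sum(commits) / len(commits)


def development_distribution_score(commits: Sequence[int]) -> float:
    """Assumed definition: 1 - (commits by top committer / total commits)."""
    return 1 - max(commits) / sum(commits)


# All-time commit counts, one entry per row of the Top Committers table.
all_time = [99, 14, 9, 7, 6, 2, 2]

print(sum(all_time))                                      # total commits
print(round(average_commits(all_time), 3))                # average per committer
print(round(development_distribution_score(all_time), 3))  # DDS
```

A score near 0 means one committer wrote almost everything, and a higher score means the work is spread across more people.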

Dependencies

example/requirements.txt pypi
  • accelerate ==0.22.0
  • aiohttp ==3.8.6
  • aiosignal ==1.3.1
  • anyio ==4.0.0
  • appdirs ==1.4.4
  • appnope ==0.1.3
  • argon2-cffi ==23.1.0
  • argon2-cffi-bindings ==21.2.0
  • arrow ==1.3.0
  • asttokens ==2.4.1
  • async-lru ==2.0.4
  • async-timeout ==4.0.3
  • attrs ==23.1.0
  • babel ==2.13.1
  • backcall ==0.2.0
  • backoff ==2.2.1
  • bcrypt ==4.0.1
  • beautifulsoup4 ==4.12.2
  • bitsandbytes ==0.41.1
  • black ==23.10.1
  • bleach ==6.1.0
  • certifi ==2023.7.22
  • cffi ==1.16.0
  • charset-normalizer ==3.3.1
  • click ==8.1.7
  • colorama ==0.4.6
  • comm ==0.1.4
  • cryptography ==41.0.5
  • datasets ==2.14.6
  • debugpy ==1.8.0
  • decorator ==5.1.1
  • defusedxml ==0.7.1
  • dill ==0.3.7
  • docker-pycreds ==0.4.0
  • docstring-parser ==0.15
  • evaluate ==0.4.1
  • exceptiongroup ==1.1.3
  • executing ==2.0.0
  • fastjsonschema ==2.18.1
  • filelock ==3.12.4
  • fire ==0.5.0
  • fqdn ==1.5.1
  • frozenlist ==1.4.0
  • fsspec ==2023.10.0
  • gitdb ==4.0.11
  • gitpython ==3.1.40
  • greenlet ==3.0.1
  • huggingface-hub ==0.17.3
  • idna ==3.4
  • importlib-metadata ==6.8.0
  • ipykernel ==6.26.0
  • ipython ==8.16.1
  • ipython-genutils ==0.2.0
  • ipywidgets ==8.1.1
  • isoduration ==20.11.0
  • javalang ==0.13.0
  • jedi ==0.19.1
  • jinja2 ==3.1.2
  • json5 ==0.9.14
  • jsonpointer ==2.4
  • jsonschema ==4.19.1
  • jsonschema-specifications ==2023.7.1
  • jupyter ==1.0.0
  • jupyter-client ==8.5.0
  • jupyter-console ==6.6.3
  • jupyter-core ==5.4.0
  • jupyter-events ==0.8.0
  • jupyter-lsp ==2.2.0
  • jupyter-server ==2.9.1
  • jupyter-server-terminals ==0.4.4
  • jupyterlab ==4.0.7
  • jupyterlab-pygments ==0.2.2
  • jupyterlab-server ==2.25.0
  • jupyterlab-widgets ==3.0.9
  • markdown-it-py ==3.0.0
  • markupsafe ==2.1.3
  • matplotlib-inline ==0.1.6
  • mdurl ==0.1.2
  • mistune ==3.0.2
  • mpmath ==1.3.0
  • multidict ==6.0.4
  • multiprocess ==0.70.15
  • mypy-extensions ==1.0.0
  • nbclient ==0.8.0
  • nbconvert ==7.9.2
  • nbformat ==5.9.2
  • nest-asyncio ==1.5.8
  • networkx ==3.2
  • ninja ==1.11.1.1
  • notebook ==7.0.6
  • notebook-shim ==0.2.3
  • numpy ==1.26.1
  • nvidia-cublas-cu12 ==12.1.3.1
  • nvidia-cuda-cupti-cu12 ==12.1.105
  • nvidia-cuda-nvrtc-cu12 ==12.1.105
  • nvidia-cuda-runtime-cu12 ==12.1.105
  • nvidia-cudnn-cu12 ==8.9.2.26
  • nvidia-cufft-cu12 ==11.0.2.54
  • nvidia-curand-cu12 ==10.3.2.106
  • nvidia-cusolver-cu12 ==11.4.5.107
  • nvidia-cusparse-cu12 ==12.1.0.106
  • nvidia-nccl-cu12 ==2.19.3
  • nvidia-nvjitlink-cu12 ==12.3.101
  • nvidia-nvtx-cu12 ==12.1.105
  • openai ==0.27.10
  • overrides ==7.4.0
  • packaging ==23.2
  • pandas ==2.1.1
  • pandocfilters ==1.5.0
  • paramiko ==3.3.1
  • parso ==0.8.3
  • pathspec ==0.11.2
  • pathtools ==0.1.2
  • peft ==0.5.0
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • platformdirs ==3.11.0
  • prometheus-client ==0.17.1
  • prompt-toolkit ==3.0.39
  • protobuf ==4.24.4
  • psutil ==5.9.6
  • psycopg ==3.1.12
  • psycopg-binary ==3.1.12
  • psycopg-pool ==3.1.8
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • pyarrow ==13.0.0
  • pycparser ==2.21
  • pygments ==2.16.1
  • pynacl ==1.5.0
  • python-dateutil ==2.8.2
  • python-dotenv ==0.21.1
  • python-json-logger ==2.0.7
  • pytz ==2023.3.post1
  • pywin32 ==306
  • pywinpty ==2.0.12
  • pyyaml ==6.0.1
  • pyzmq ==25.1.1
  • qtconsole ==5.4.4
  • qtpy ==2.4.1
  • referencing ==0.30.2
  • regex ==2023.10.3
  • requests ==2.31.0
  • responses ==0.18.0
  • rfc3339-validator ==0.1.4
  • rfc3986-validator ==0.1.1
  • rich ==13.6.0
  • rpds-py ==0.10.6
  • safetensors ==0.3.3
  • scipy ==1.11.3
  • send2trash ==1.8.2
  • sentencepiece ==0.1.99
  • sentry-sdk ==1.32.0
  • setproctitle ==1.3.3
  • setuptools ==68.2.2
  • shtab ==1.6.4
  • six ==1.16.0
  • smmap ==5.0.1
  • sniffio ==1.3.0
  • soupsieve ==2.5
  • sqlalchemy ==2.0.22
  • stack-data ==0.6.3
  • sympy ==1.12
  • termcolor ==2.3.0
  • terminado ==0.17.1
  • tinycss2 ==1.2.1
  • tokenizers ==0.14.1
  • tomli ==2.0.1
  • torch ==2.2.1
  • tornado ==6.3.3
  • tqdm ==4.66.1
  • traitlets ==5.12.0
  • transformers ==4.34.1
  • triton ==2.2.0
  • trl ==0.7.2
  • types-python-dateutil ==2.8.19.14
  • typing-extensions ==4.8.0
  • tyro ==0.5.10
  • tzdata ==2023.3
  • unidiff ==0.7.5
  • uri-template ==1.3.0
  • urllib3 ==2.0.7
  • wandb ==0.15.12
  • wcwidth ==0.2.8
  • webcolors ==1.13
  • webencodings ==0.5.1
  • websocket-client ==1.6.4
  • whatthepatch ==1.0.5
  • widgetsnbextension ==4.0.9
  • xxhash ==3.4.1
  • yarl ==1.9.2
  • zipp ==3.17.0
src/patch_analysis/requirements.txt pypi
  • fire ==0.6.0
  • pygments ==2.17.2
  • tqdm ==4.66.2