repilot
Repilot, a patch generation tool introduced in the ESEC/FSE'23 paper "Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair"
README-Artifact.md
Artifact Documentation for ⚙️$\mathbb{R}\mathrm{e}\mathbf{pilot}$🛠️
Welcome to the artifact repository for Repilot, a patch generation tool introduced in the ESEC/FSE'23 paper "Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair"!
[!IMPORTANT]
Environment requirements
- OS: A Linux system with Docker support.
- Optional: NVIDIA Docker support.
- Hardware: x86/x64 CPU; 32 GB RAM; 1 TB storage; a good network connection to Docker Hub.
- Optional (a): NVIDIA GPU(s) with more than 6 GB of memory (for CodeT5 patch generation)
- Optional (b): NVIDIA GPU(s) with more than 30 GB of memory (for Incoder-6.7B patch generation)
Although it is recommended to run the artifact with NVIDIA GPUs for faster patch generation, it is not a requirement. When there is no GPU available, the CPU will be responsible for the patch generation. In this artifact documentation, we only explain the CPU-only Docker-based pipeline for conciseness. We encourage advanced readers who want to run the artifact with GPU support to check the documentation of NVIDIA Docker.
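As a rough aid, the requirements above can be checked with a small preflight script before pulling the image. This is only a sketch: the function names are illustrative, it assumes a Linux host with `/proc/meminfo`, and it checks free space on `/` even though Docker may store images elsewhere on your machine. The RAM threshold is set slightly below 32 because the kernel reports a bit less than the installed amount.

```bash
#!/usr/bin/env bash
# Preflight sketch: compare this machine against the requirements listed above.
# Function names and thresholds are illustrative assumptions, not part of Repilot.

# Succeeds when the first integer is at least the second.
at_least() {
    [ "$1" -ge "$2" ]
}

# Total RAM in GiB, read from /proc/meminfo (Linux only).
total_ram_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Free space in GiB on the filesystem that holds the given path.
free_disk_gib() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

preflight() {
    local status=0
    command -v docker >/dev/null 2>&1 || { echo "docker: not found"; status=1; }
    # A 32 GB machine usually reports slightly less, so compare against 30.
    at_least "$(total_ram_gib)" 30 || { echo "RAM: less than about 32 GB"; status=1; }
    at_least "$(free_disk_gib /)" 1024 || { echo "disk: less than 1 TiB free on /"; status=1; }
    # GPUs are optional, so a missing nvidia-smi is reported but is not a failure.
    command -v nvidia-smi >/dev/null 2>&1 || echo "nvidia-smi: not found (CPU-only run)"
    return "$status"
}
```

Calling `preflight` prints one line per unmet requirement and returns a non-zero status if a mandatory one is missing.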
Before we start
Before we start, let's first make sure Docker is installed: Installation Guide.
To check the installation:
```bash
docker --version # Test docker availability
# Docker version 20.10.21, build 20.10.21-0ubuntu1~20.04.2
```
Now we'll fetch the Docker image of Repilot that includes the implementation of the Algorithm, the Completion Engine, and all the dependencies needed:
```bash
# Recommended: pull the image from Docker Hub
docker pull universefly/repilot:fse23
# Alternatively, download the image file repilot-docker-image-fse23.tar.gz from https://doi.org/10.5281/zenodo.8280747
# Then load this image
docker load --input repilot-docker-image-fse23.tar.gz
# Run the docker image
docker run -it --name repilot universefly/repilot:fse23
# Now you will get into a "virtual environment" provided by Docker
# Enter the repilot directory
cd /root/Repilot
echo "Hello Repilot!"
```
Congratulations! We are now ready for the artifact evaluation.
Whet your appetite
Let's run some example scripts to see how Repilot works.
```bash
# The full repilot approach with CodeT5 as the base model
# Generate 5 patches for Chart-9 and save to chart-9-repilot
ACTIVE=1 python -m repilot.cli.main repair -b "Chart-9" --method pruned-mem -d chart-9-repilot -n 5
# You will see logs about the patch generation and which tokens are accepted/rejected.
# Validate the patch generation
python -m repilot.cli.main validate -d chart-9-repilot
# Print a table of the evaluation results
python -m repilot.cli.main evaluate -d chart-9-repilot
```
If everything works correctly, you will see a similar output table as follows:
root@1d7fea7789ed:/repilot# python -m repilot.cli.main evaluate -d chart-9-repilot
[chart-9-repilot] Loading raw generation data...
Done
[chart-9-repilot] Loading transformed raw generation data...
Done
[chart-9-repilot] Loading validation raw data...
Done
Repilot Evaluation Results
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Tag             ┃ Average Gen Time ┃ %Compilable Patches ┃ %Plausible Patches ┃ #Plausible Fixes ┃ #Correct Fixes ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ chart-9-repilot │ 1.33s            │ 100.0%              │ 0.000%             │ 0                │ -              │
└─────────────────┴──────────────────┴─────────────────────┴────────────────────┴──────────────────┴────────────────┘
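To try the same repair, validate, and evaluate steps on more than one bug, the three commands can be wrapped in a loop. The following is a minimal sketch that reuses only the CLI invocations shown above; the wrapper functions, the `DRY_RUN` switch, and the `<bug>-repilot` directory naming are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch: run repair -> validate -> evaluate for each bug ID given as an argument.
# The helper names and DRY_RUN are hypothetical; the python commands come from this document.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Chart-9 -> chart-9-repilot, matching the directory name used above.
output_dir_for() {
    echo "$(echo "$1" | tr '[:upper:]' '[:lower:]')-repilot"
}

repair_and_evaluate() {
    local bug dir
    for bug in "$@"; do
        dir="$(output_dir_for "$bug")"
        # Stop at the first failing step so later steps never see incomplete data.
        run env ACTIVE=1 python -m repilot.cli.main repair -b "$bug" --method pruned-mem -d "$dir" -n 5 &&
        run python -m repilot.cli.main validate -d "$dir" &&
        run python -m repilot.cli.main evaluate -d "$dir" || return 1
    done
}
```

Running `DRY_RUN=1 repair_and_evaluate Chart-9` prints the three commands without executing them, which is a cheap way to review them before a real run inside the container.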
Reproduce RQ Evaluation
We will now show how each RQ can be reproduced with this artifact by applying the Repilot evaluation scripts to pre-generated patches.
[!WARNING] We also provide documentation to reproduce the entire patch generation in different RQs, but it is not recommended for the readers to go through the entire process as it may take days or weeks to finish.
RQ1: Comparison with existing tools
We will now reproduce Table 1, Figure 6, and the number of bugs fixed after removing the bugs that overlap with the CodeT5 training data, which is discussed in Section 8 THREATS TO VALIDITY.
```bash
python -m repilot.cli.rq1
```
You will see two tables printed in the console. The first table corresponds to Table 1, and the second corresponds to the following sentence in Section 8:
> For comparison fairness, if we were to exclude these 7 and 6 bugs and compare them with the previous baseline tools on the remaining bugs, we are still able to achieve the highest bug fixes at 59 and 44 (best baseline at 45 and 29)
The detailed correct patches can be found through the following links:
- Defects4j 1.2 correct patches
- Defects4j 2.0 correct patches

Also, the two Venn diagrams shown in Figure 6 are saved in the plots directory. To check the plots, you may need to temporarily exit the Docker container and save the plots to your local machine:
```bash
# Exit the docker container with e.g., Ctrl-D
# Save the plots to your local machine
sudo docker cp repilot:/root/Repilot/plots /path/to/your/local/directory
# Now you can open the plots with your favorite image viewer
# Return to the docker container
docker start -ai repilot
# Return to the repilot directory
cd /root/Repilot
```
RQ2: Compilation rate analysis
We will now reproduce Table 2. This script may take longer to run as it needs to iterate through 5000 generated patches per bug. We also compressed the patches beforehand due to the large size. Therefore, let's first decompress the patches:
```bash
tar -xvf data/large.tar.xz
```
Then we can run the command for RQ2:
```bash
python -m repilot.cli.rq2
```
This command will print a table in the console, which corresponds to Table 2.
RQ3: Component contribution
We now reproduce Table 3.
```bash
python -m repilot.cli.rq3
```
The detailed correct patches can be found through the following links:
- [Vanilla] correct patches
- [NoMem] correct patches
- [Mem] correct patches
- [Repilot] correct patches
RQ4: Generalizability
This script will reproduce Table 4.
```bash
python -m repilot.cli.rq4
```
The detailed correct patches can be found through the following links:
- CodeT5/D4J1.2 vanilla
- CodeT5/D4J1.2 repilot
- CodeT5/D4J2.0 vanilla
- CodeT5/D4J2.0 repilot
- Incoder/D4J1.2 vanilla
- Incoder/D4J1.2 repilot
- Incoder/D4J2.0 vanilla
- Incoder/D4J2.0 repilot
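If you would like to keep the printed tables for later comparison with the paper, the four RQ scripts can be run in sequence with their console output saved to files. This is a sketch under assumptions: the function names and the `rq-logs` directory are hypothetical, and it expects `data/large.tar.xz` to have been decompressed already, as described for RQ2.

```bash
#!/usr/bin/env bash
# Sketch: run the four RQ scripts in order and keep each console table in a log file.
# Function names and the default log directory are illustrative assumptions.

# Print the module names of the RQ scripts used in this document, one per line.
rq_modules() {
    printf 'repilot.cli.rq%s\n' 1 2 3 4
}

run_all_rqs() {
    local log_dir="${1:-rq-logs}" module
    mkdir -p "$log_dir"
    for module in $(rq_modules); do
        echo "==> python -m $module"
        # tee shows the table in the console and also writes it to rqN.log.
        python -m "$module" 2>&1 | tee "$log_dir/${module##*.}.log"
        # Stop if the script itself failed (tee would otherwise hide the status).
        [ "${PIPESTATUS[0]}" -eq 0 ] || return 1
    done
}
```

Calling `run_all_rqs` from `/root/Repilot` leaves `rq1.log` through `rq4.log` in `rq-logs`, which can then be copied out of the container in the same way as the plots.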
🎉🎉🎉 Congratulations! You have successfully reproduced all the results in the paper! 🎉🎉🎉
Reproduce patch generation
[!WARNING] ⚠️⚠️⚠️ This section is mainly for advanced readers who have time to reproduce the entire patch generation process. These commands may take days or weeks to finish. Also, the generation time may vary significantly depending on the hardware used for patch generation. ⚠️⚠️⚠️
RQ1
We generate patches for the Defects4j 1.2 single-hunk bugs and the Defects4j 2.0 single-line bugs with the help of repair templates. This is achieved through the following commands:
```bash
D4J1_SINGLE_HUNK=1 ACTIVE=1 TEMPLATE=1 python -m repilot.cli.main repair -b ".*" --method pruned-mem -n 5000 -d rq1-d4j1
D4J2_SINGLE_LINE=1 ACTIVE=1 TEMPLATE=1 python -m repilot.cli.main repair -b ".*" --method pruned-mem -n 5000 -d rq1-d4j2
```
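Because these runs can take days, it may help to start them detached from the terminal and to log their output. The helper below is a generic shell sketch, not part of Repilot; its name and the log file names are assumptions. Note that `nohup` only protects against the terminal closing: the container itself has to keep running, so leave it with Docker's detach key sequence (Ctrl-P Ctrl-Q by default) rather than Ctrl-D.

```bash
#!/usr/bin/env bash
# Sketch: start a long-running command in the background and send its output to a log.
# launch_detached is a hypothetical helper, not a Repilot command.

launch_detached() {
    local log_file="$1"
    shift
    # nohup keeps the command alive after the terminal hangs up.
    nohup "$@" >"$log_file" 2>&1 &
    # Print the PID so the run can be monitored or stopped later.
    echo "$!"
}

# Example (commented out because it starts a multi-day run):
# pid="$(launch_detached rq1-d4j1.log env D4J1_SINGLE_HUNK=1 ACTIVE=1 TEMPLATE=1 \
#     python -m repilot.cli.main repair -b ".*" --method pruned-mem -n 5000 -d rq1-d4j1)"
# tail -f rq1-d4j1.log
```

Environment variables are passed through `env` here because the `VAR=1 command` prefix form cannot be forwarded as arguments to a shell function.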
RQ2
RQ2 is based on RQ1's generated patches, so we don't need to run any additional commands.
RQ3
In RQ3, we generate 500 patches for each bug with 4 different configurations, using the following commands:
```bash
# Vanilla
D4J1_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method plain -n 500 -d rq3-vanilla
# NoMem
D4J1_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method pruned-nomem -n 500 -d rq3-nomem
# Mem
D4J1_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method pruned-mem -n 500 -d rq3-mem
# Repilot
ACTIVE=1 D4J1_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method pruned-mem -n 500 -d rq3-repilot
```
RQ4
We further include Incoder-6.7B as the base model to generate patches for RQ4.
```bash
# The first two configurations are the same as RQ3
D4J1_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method plain -n 500 -d rq3-vanilla
ACTIVE=1 D4J1_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method pruned-mem -n 500 -d rq3-repilot
# CodeT5 on Defects4j 2.0
D4J2_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method plain -n 500 -d rq4-codet5-d4j2-vanilla
ACTIVE=1 D4J2_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method pruned-mem -n 500 -d rq4-codet5-d4j2-repilot
# Incoder on Defects4j 1.2
INCODER=1 D4J1_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method plain -n 500 -d rq4-incoder-d4j1-vanilla
INCODER=1 ACTIVE=1 D4J1_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method pruned-mem -n 500 -d rq4-incoder-d4j1-repilot
# Incoder on Defects4j 2.0
INCODER=1 D4J2_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method plain -n 500 -d rq4-incoder-d4j2-vanilla
INCODER=1 ACTIVE=1 D4J2_SINGLE_HUNK=1 python -m repilot.cli.main repair -b ".*" --method pruned-mem -n 500 -d rq4-incoder-d4j2-repilot
```
Owner
- Name: iSE-UIUC
- Login: ise-uiuc
- Kind: organization
- Repositories: 14
- Profile: https://github.com/ise-uiuc
Citation (CITATION.bib)
@misc{wei2023copiloting,
title={Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair},
author={Yuxiang Wei and Chunqiu Steven Xia and Lingming Zhang},
year={2023},
eprint={2309.00608},
archivePrefix={arXiv},
primaryClass={cs.SE}
}