https://github.com/assert-kth/coderepairrl

Reinforcement Fine-Tuning

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary

Keywords

code-repair coding grpo llm rl
Last synced: 5 months ago

Repository

Reinforcement Fine-Tuning

Basic Info
  • Host: GitHub
  • Owner: ASSERT-KTH
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 1.43 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Topics
code-repair coding grpo llm rl
Created about 1 year ago · Last pushed 6 months ago
Metadata Files
Readme

README.md

CodeRepairRL - Reinforcement Learning for Program Repair

Overview

CodeRepairRL leverages recent advancements in applying Reinforcement Learning (RL) to Large Language Models (LLMs) to fine-tune them for domain-specific tasks. Our ultimate goal is to develop models similar to RepairLLama and Llama-3-SWE-RL, which "punch above their weight class" in terms of parameter count, demonstrating exceptional performance in software engineering benchmarks.

The project uses a two-stage training approach:

1. Supervised Fine-Tuning (SFT): Initial fine-tuning on high-quality code repair demonstrations
2. Group Relative Policy Optimization (GRPO): Reinforcement learning to further improve performance on specific tasks
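The "group relative" part of GRPO can be illustrated in a few lines: several completions are sampled per prompt (a group), and each completion's advantage is its reward normalized against that group's own statistics. This is a stdlib-only sketch for intuition, not code from this repository:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's statistics.

    In GRPO, several completions are sampled per prompt (a "group");
    a completion's advantage is its reward minus the group mean,
    divided by the group standard deviation.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: rewards for 4 sampled repairs of one bug (e.g. from a test-based reward)
advantages = group_relative_advantages([0.0, 1.0, 1.0, 0.0])
```

Completions that beat their group's average get positive advantages and are reinforced; the ones below average are penalized, with no separate value network required.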

For more details on the project's objectives, conceptual background, and implementation specifics, see docs/PROJECT.md.

Academic Paper

The methodology and findings of this project are documented in an academic paper. The LaTeX repository for the paper is available at CodeRepairRL-Paper.

Getting Started

Building the Container

To build the Apptainer container:

```bash
# Build the training container
apptainer build crrl.sif scripts/train_container.def
```

(the build process may take several minutes)

Reproducing on different compute setups

Using our Apptainer/SLURM setup

Before launching jobs, set CRRL_WORKDIR in your environment; otherwise, large files such as model weights are downloaded to $HOME/.cache:

```bash
# Choose your working directory (pick a location with plenty of fast storage)
export CRRL_WORKDIR="/path/to/your/crrl_workspace"
```

Then follow the container build and SLURM job submission steps above. This ensures that large model files and datasets are stored in a location with sufficient space rather than your home directory.

Alternative: Local reproduction with uv

If you do not have Apptainer/SLURM or want to reproduce runs locally, you can use uv. Below are self-contained bash snippets.

1) Install uv

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

2) Create the environment and install dependencies

```bash
# Install project dependencies (creates/uses a virtualenv automatically)
uv sync --extra vllm --extra flash
```

3) Exact 14B GRPO reproduction (3x ≥80GB GPUs) — run in two terminals

  • Requires 3 GPUs with at least 80 GB VRAM each (e.g., A100 80GB/H100 80GB)
  • Terminal 1 runs the vLLM server on GPU 0; Terminal 2 runs training on GPUs 1–2

Terminal 1 (vLLM server on GPU 0):

```bash
CUDA_VISIBLE_DEVICES=0 uv run trl vllm-serve-async \
    --model "Qwen/Qwen3-14B" \
    --max-model-len 14336 \
    --gpu-memory-utilization 0.94 \
    --async-scheduling \
    --enable-prefix-caching \
    --max-num-seqs 16 \
    --max-num-batched-tokens 8192 \
    --long-prefill-token-threshold 2048 \
    --disable-log-stats \
    --enable-auto-tool-choice \
    --reasoning-parser qwen3 \
    --tool-call-parser hermes

# Leave this terminal running
```

Terminal 2 (trainer on GPUs 1–2):

```bash
CUDA_VISIBLE_DEVICES=1,2 uv run accelerate launch \
    --config_file scripts/deepspeed/zero2.yaml \
    --num_processes 2 \
    --module src.train_grpo -- \
    run=repo_repair \
    model=medium_qwen \
    agent.time_limit=60 \
    grpo=multi_turn_gspo \
    grpo.max_prompt_length=1024 \
    grpo.max_completion_length=12288 \
    grpo.num_train_epochs=10 \
    grpo.num_generations=8 \
    grpo.generation_batch_size=8 \
    grpo.per_device_train_batch_size=4 \
    grpo.gradient_accumulation_steps=4 \
    grpo.optim=adamw_torch \
    grpo.run_name="your-run-name"
```
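As a sanity check on the flags above, the batch arithmetic works out as follows (back-of-the-envelope arithmetic only, assuming each training sample is one completion):

```python
# Values taken from the command-line flags above (illustrative arithmetic only)
num_processes = 2                 # trainer GPUs 1-2 via accelerate
per_device_train_batch_size = 4   # grpo.per_device_train_batch_size
gradient_accumulation_steps = 4   # grpo.gradient_accumulation_steps
num_generations = 8               # completions sampled per prompt (one GRPO group)

# Completions contributing to each optimizer step
effective_batch = num_processes * per_device_train_batch_size * gradient_accumulation_steps

# Whole prompt groups consumed per optimizer step
prompt_groups_per_step = effective_batch // num_generations
```

With these settings, each optimizer step sees 32 completions, i.e. 4 complete groups of 8 generations, so group statistics are never split across steps.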

Notes:

  • If you plan to push to the HuggingFace Hub, run huggingface-cli login first and drop run.push_to_hub=false.
  • You can override any config at the CLI via Hydra (e.g., change model, learning rate, batch sizes, etc.).
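For intuition, Hydra's dotted override syntax (e.g. grpo.num_generations=8) maps key paths onto a nested config. Below is a deliberately simplified stdlib-only re-implementation of that mapping, not Hydra itself, which additionally handles config groups, typing, and interpolation:

```python
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply Hydra-style dotted key=value overrides to a nested dict.

    Simplified illustration of the override syntax used above.
    """
    for item in overrides:
        dotted, _, raw = item.partition("=")
        keys = dotted.split(".")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # create nested levels on demand
        # Crude literal parsing: int, then float, then bool, else string
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = {"true": True, "false": False}.get(raw.lower(), raw)
        node[keys[-1]] = value
    return config

cfg = apply_overrides({}, ["grpo.num_generations=8", "grpo.run_name=my-run"])
```

This is why any key in the training config can be overridden from the sbatch command line without editing YAML files.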

Running Supervised Fine-Tuning (SFT)

Before GRPO training, you can optionally run SFT to create a better starting point:

```bash
# Run SFT training job (small model)
sbatch scripts/small_sft_lora_train_job.sh

# Run SFT training job (large model)
sbatch scripts/large_sft_lora_train_job.sh

# Or run locally for testing
uv run -m src.train_sft
```

The SFT stage uses curated datasets of high-quality code repair examples to provide the model with a strong foundation before RL training.

Running GRPO Training Jobs

We provide specialized SLURM scripts for different model sizes, each pre-configured with appropriate compute resource allocations:

```bash
# For small models (8B), defaults to Qwen/Qwen3-8B
sbatch scripts/grpo/small_grpo_lora_train_job.sh grpo.run_name="custom-experiment-name"   # LoRA training (3 GPUs)

# For medium models (14B), defaults to Qwen/Qwen3-14B
sbatch scripts/grpo/medium_grpo_lora_train_job.sh grpo.run_name="custom-experiment-name"  # LoRA training (3 GPUs)
```

Each script includes pre-tuned GRPO parameters optimized for the corresponding model size category. The scripts support three task types:

  • detection: Binary vulnerability detection
  • repair: Single-file code repair with search-replace diffs
  • repo_repair: Repository-level code repair using agentic approaches
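To illustrate the search-replace diff idea behind the repair task, here is a minimal applier. The function name and the match-exactly-once rule are assumptions made for this sketch; the project's actual diff implementation (exercised by the search-replace diff tests) may differ:

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply a single search/replace edit to a source file.

    A search-replace diff specifies the exact buggy snippet to find
    and the fixed snippet to substitute; requiring the search text to
    occur exactly once keeps the edit unambiguous.
    """
    if source.count(search) != 1:
        raise ValueError("search block must match exactly once")
    return source.replace(search, replace)

# Example: repair an off-by-operator bug in a toy file
buggy = "def add(a, b):\n    return a - b\n"
fixed = apply_search_replace(buggy, "return a - b", "return a + b")
```

Emitting edits in this form lets the model describe a fix without regenerating the whole file, which keeps completions short and makes reward computation (does the patch apply and pass tests?) straightforward.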

You can customize training with Hydra overrides:

```bash
# Change task type
sbatch scripts/grpo/medium_grpo_lora_train_job.sh run=detection

# Use a different model
sbatch scripts/grpo/medium_grpo_train_job.sh model=medium_llama
```

Local Development

For "local" development and testing without Apptainer containers, you can use uv directly.

Installing uv

Install the uv package manager with:

macOS / Linux:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Windows (project not tested on Windows):

```bash
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Testing

```bash
# run all tests
uv run pytest

# run a specific test file
uv run pytest tests/test_search_replace_diff.py

# run a specific test
uv run pytest tests/test_search_replace_diff.py::test_specific_function
```

Documentation Structure

This repository uses several Markdown files to organize information:

  • README.md: (This file) Provides a high-level overview, setup instructions, and basic usage examples.
  • docs/PROJECT.md: Contains detailed information about the project's goals, implementation notes, theoretical background, and conceptual insights.
  • docs/DIARY.md: A development diary tracking progress, challenges, and decisions.
  • docs/AGENT_RL_INTEGRATION.md: Describes our approach to integrating agent frameworks into RL training loops using OpenAI-compatible API servers.
  • docs/DATASETS.md: Describes the datasets used in the project.
  • docs/RESOURCES.md: Lists relevant research papers, literature and broader resources reviewed for the project.
  • docs/VOCABULARY.md: Defines key terms and concepts used throughout the project.
  • docs/PAPER.md: Outlines the structure and key points for the academic paper.

Owner

  • Name: ASSERT
  • Login: ASSERT-KTH
  • Kind: organization
  • Location: Sweden

assertEquals("Research group at KTH Royal Institute of Technology, Stockholm, Sweden", description);

GitHub Events

Total
  • Watch event: 1
  • Delete event: 2
  • Public event: 1
  • Push event: 126
  • Fork event: 1
  • Create event: 4
Last Year
  • Watch event: 1
  • Delete event: 2
  • Public event: 1
  • Push event: 126
  • Fork event: 1
  • Create event: 4