https://github.com/amazon-science/llm-code-preference
Training and Benchmarking LLMs for Code Preference.
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: amazon-science
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://llm-code-preference.github.io/
- Size: 755 KB
Statistics
- Stars: 33
- Watchers: 3
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Learning Code Preference via Synthetic Evolution
📰 TL;DR • 🔎 Evaluation • 🧪 Training • 🔮 Synthetic Data Generation • 📜 Citation • 🙏 Acknowledgement
📰 TL;DR
How to effectively and efficiently obtain code preferences and judgements is an important yet under-studied topic!
To this end, our work provides:
- CodeFavor: an open recipe to train code preference models with from-scratch data!
  - Commit-Instruct: code commits -> code preference
  - Critic-Evol: code critique & revising -> code preference
- CodePrefBench: 1364 code preference tasks covering both verifiable and human objectives:
  - Code Correctness
  - Code Efficiency
  - Code Security
  - Human Preference
- Study: our paper provides comprehensive studies!
  - Human studies: quantifying the cost and performance of human preference based on 18 developers
  - Case studies: the appendix presents case studies of model preferences on code correctness, efficiency, and security
  - Controlled experiments: impact of data, comment, criteria, modeling, etc. on training preference models
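The two data recipes above both end in the same kind of sample: a task description plus a preferred and a less preferred piece of code. The following is a minimal sketch of that idea only. `PreferencePair`, `pair_from_commit`, and `pair_from_critique` are hypothetical names that do not come from the repository, and the sketch leaves out the LLM calls that the actual scripts make (they take `--model-id`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical container for one code preference sample."""
    instruction: str  # what the code is supposed to do
    chosen: str       # the preferred code
    rejected: str     # the less preferred code
    source: str       # which recipe produced the pair


def pair_from_commit(message: str, before: str, after: str) -> PreferencePair:
    """Commit-Instruct idea: treat the post-commit code as preferred."""
    return PreferencePair(message, chosen=after, rejected=before, source="commit-instruct")


def pair_from_critique(task: str, draft: str, revised: str) -> PreferencePair:
    """Critic-Evol idea: treat the revised code as preferred over the draft."""
    return PreferencePair(task, chosen=revised, rejected=draft, source="critic-evol")


# Made-up commit used purely for illustration.
commit_pair = pair_from_commit(
    "Fix off-by-one in range sum",
    before="def total(n):\n    return sum(range(n))",
    after="def total(n):\n    return sum(range(n + 1))",
)
print(commit_pair.source, "->", commit_pair.chosen.splitlines()[-1].strip())
```

Keeping both recipes on one record type is a design choice of this sketch: it lets a downstream training step ignore where a pair came from while still retaining `source` for ablations.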

🔎 Evaluation
Environment
- Python requirements: 3.10 or higher.
```bash
conda create -n codefavor python=3.10 -y
conda activate codefavor
pip install -r requirements.txt
```
CodePrefBench
```bash
# OpenAI server
python codefavor/evaluate.py --model-id "gpt-4o-2024-05-13" --model-type openai --concurrency 80
# Other OpenAI-compatible servers (vLLM, DeepSeek APIs, etc.)
python codefavor/evaluate.py --model-id "google/gemma-2-27b-it" --model-type openai --concurrency 80 --model-url http://localhost:8000/v1
# Claude models via Bedrock
python codefavor/evaluate.py --model-id "anthropic.claude-3-sonnet-20240229-v1:0" --model-type bedrock --concurrency 10
# Pairwise RM
python codefavor/evaluate.py --model-id ./models/mix-cls-mistral-7b-itbs32ep1_lr5e-6-l3-70b/checkpoint-688 --model-type pair-rm
```
- Supported `--model-type` values: `huggingface`, `openai`, `bedrock`, `pair-rm`, and `google`
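Because CodePrefBench groups its tasks by objective, results are naturally reported per category. The sketch below shows one simple way such a score could be computed. It is an illustration under assumptions, not the scoring used by `codefavor/evaluate.py`: the record fields (`category`, `label`, `prediction`) and the exact-match rule are hypothetical, and the records are made up.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: fraction of records where prediction == label}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        totals[rec["category"]] += 1
        hits[rec["category"]] += int(rec["prediction"] == rec["label"])
    return {cat: hits[cat] / totals[cat] for cat in totals}


# Made-up records for illustration only; not benchmark data.
records = [
    {"category": "correctness", "label": "A", "prediction": "A"},
    {"category": "correctness", "label": "B", "prediction": "A"},
    {"category": "security", "label": "B", "prediction": "B"},
]
scores = accuracy_by_category(records)
print(scores)
```

A real evaluator also has to decide how to treat ties or unparseable model answers; this sketch simply counts anything other than an exact match as wrong.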
🧪 Training
Environment
```bash
git clone https://github.com/axolotl-ai-cloud/axolotl.git axolotl-dep
cd axolotl-dep

pip install torch==2.3.0
pip install packaging ninja wandb
pip install -e '.[flash-attn,deepspeed]'
```
Use existing dataset
```bash
python scripts/axolotl/prepare_data.py \
    --decomposed-dataset datasets/train/editpackft-Llama-3-70B-Instruct.commit_instruct.decompose.jsonl \
    --judge-type classification --both-order
python scripts/axolotl/prepare_data.py \
    --decomposed-dataset datasets/train/Llama-3-8B-Instruct-SOSS.teacher.Llama-3-70B-Instruct.critic_evol.decompose.jsonl \
    --judge-type classification --both-order
```
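One plausible reading of `--both-order` is that each preference pair is written out twice, once in each presentation order, so that a classifier cannot learn to favor a fixed position. The sketch below illustrates that reading only. It has not been checked against `prepare_data.py`, and the function name and field names (`code_a`, `code_b`, `label`) are hypothetical.

```python
def both_orders(chosen: str, rejected: str):
    """Return the pair in both presentation orders with the matching label."""
    return [
        # Preferred code shown first, so the correct answer is "A".
        {"code_a": chosen, "code_b": rejected, "label": "A"},
        # Preferred code shown second, so the correct answer is "B".
        {"code_a": rejected, "code_b": chosen, "label": "B"},
    ]


samples = both_orders("return sorted(xs)", "return xs")
for sample in samples:
    print(sample["label"], "|", sample["code_a"], "|", sample["code_b"])
```

Under this assumption the prepared dataset is twice the size of the decomposed one, with labels balanced between the two positions.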
Train models using Axolotl
```bash
accelerate launch -m axolotl.cli.train \
    scripts/axolotl/recipe/gemma/cls-commit-instruct-from-llama3-70b.yaml \
    --deepspeed scripts/axolotl/zero3.json

# or use torchrun if your accelerate is complaining
torchrun --nproc_per_node 8 -m axolotl.cli.train \
    scripts/axolotl/recipe/gemma/cls-commit-instruct-from-llama3-70b.yaml \
    --deepspeed scripts/axolotl/zero3.json
```
🔮 Synthetic Data Generation
Commit-Instruct from Scratch
```bash
# Supports OpenAI and Bedrock interfaces
# OAI interface
python codefavor/prompt/commit_instruct.py --model-id "deepseek-chat" --model-type "openai" --concurrency 256 --dataset editpackft --model-url "https://api.deepseek.com/v1"
# Bedrock interface
python codefavor/prompt/commit_instruct.py --model-id "meta.llama3-1-405b-instruct-v1:0" --model-type "bedrock" --concurrency 10 --dataset editpackft
```
Critic-Evol from Scratch
```bash
python codefavor/prompt/critic_evol.py --weak-dataset ./datasets/train/Llama-3-8B-Instruct-SOSS.jsonl \
    --model-id "deepseek-coder" --model-url "https://api.deepseek.com/v1"
python codefavor/prompt/critic_evol.py --weak-dataset ./datasets/train/Llama-3-8B-Instruct-SOSS.jsonl \
    --model-id "meta.llama3-1-405b-instruct-v1:0" --concurrency 10
```
- Pairwise training code is partially adapted from https://github.com/RLHFlow/RLHF-Reward-Modeling/tree/main/pair-pm
📜 Citation
```bibtex
@article{codefavor,
  title = {Learning Code Preference via Synthetic Evolution},
  author = {Liu, Jiawei and Nguyen, Thanh and Shang, Mingyue and Ding, Hantian and Li, Xiaopeng and Yu, Yu and Kumar, Varun and Wang, Zijian},
  journal = {arXiv preprint arXiv:2410.03837},
  year = {2024},
}
```
🙏 Acknowledgement
- Our training code is partially adapted from RLHFlow.
- Our evaluation code is partially adapted from RepoQA.
- The seed corpus used in this paper comes from EditPackFT and Self-OSS-Instruct.
🎓 Research Use Only
This source code is being released solely for academic and scientific reproducibility purposes, in support of the methods and findings described in the associated publication. Pull requests are not being accepted in order to maintain the code exactly as it was used in the paper, but interested parties are encouraged to open an issue requesting open source community development.
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
- Watch event: 32
- Member event: 1
- Push event: 8
- Public event: 1
- Fork event: 1
Last Year
- Watch event: 32
- Member event: 1
- Push event: 8
- Public event: 1
- Fork event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Thanh Nguyen | m****h@a****m | 6 |
| Zijian Wang | 2****g | 1 |
| Thanh Nguyen | f****l@i****m | 1 |
| Jiawei Liu | j****u@g****m | 1 |
| Amazon GitHub Automation | 5****o | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0