Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.2%) to scientific vocabulary
Repository
This is the repository for DOCE.
Basic Info
- Host: GitHub
- Owner: deep-spin
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 5.14 MB
Statistics
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
DOCE
This repo contains the code for our arXiv paper:
DOCE: Finding the Sweet Spot for Execution-Based Code Generation
Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins
Contact person: Haau-Sing Li
Usage
- Install packages from `requirements*.txt`.
- Inference on the HumanEval/MBPP tasks
```bash
python3 codegen/generate.py \
  --model ${model} \
  --bs ${batch_size} \
  --temperature ${temperature} \
  --n_samples ${num_of_samples_for_reranking} \
  --dataset ${humaneval/mbpp} \
  --resume \
  --root ${path_to_store_output}
```
- Evaluation
```bash
evalplus.evaluate \
  --dataset {humaneval/mbpp} \
  --samples ${path to generated samples} \
  --parallel 30 \
  --test-details
```
- Get execution outputs of generated samples (for MBR-Exec)
```bash
python3 evalplus/gen_outputs.py \
  --gen_dir {model_name_plus_temperature} \
  --dataset {humaneval/mbpp} \
  --gen_fast
```
- Self-Debugging
First, collect execution feedback:
```bash
python3 evalplus/error_feedback.py \
  --gen_dir {model_name_plus_temperature} \
  --dataset {humaneval/mbpp}
```
Then run self-debugging:
```bash
python3 codegen/ape_sd_ut.py \
  --model ${model} \
  --bs ${batch_size} \
  --temperature ${temperature} \
  --n_samples ${num_of_samples_for_reranking} \
  --dataset ${humaneval/mbpp} \
  --resume \
  --root ${path_to_store_output} \
  --debugging_turn ${ith_debugging_turn}
```
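Conceptually, each debugging turn executes the current candidate, collects error feedback, and asks the model to revise the code. The loop can be sketched as follows; this is a minimal illustration, not the repo's implementation, and the helper names (`run_tests`, `self_debug`, a `solution` entry point) are assumptions:

```python
def run_tests(program, tests):
    """Execute `program` (Python source defining `solution`) on (input, expected) pairs.

    Returns (ok, feedback): ok is True iff all tests pass; feedback is an
    error message suitable for prompting the model on failure.
    """
    ns = {}
    exec(program, ns)
    for inp, expected in tests:
        try:
            got = ns["solution"](inp)
        except Exception as e:
            return False, f"solution({inp!r}) raised {e!r}"
        if got != expected:
            return False, f"solution({inp!r}) returned {got!r}, expected {expected!r}"
    return True, ""

def self_debug(program, tests, model, max_turns=3):
    """Iteratively repair `program` using execution feedback.

    `model` is any callable (code, feedback) -> revised code, standing in
    for an LLM call; `max_turns` bounds the number of debugging turns.
    """
    for _ in range(max_turns):
        ok, feedback = run_tests(program, tests)
        if ok:  # stop as soon as all tests pass
            return program
        program = model(program, feedback)
    return program
```

In the repo, the `--debugging_turn` flag plays the role of the loop index, with each turn run as a separate invocation.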
- For MBR and N-Best-Reranking, please refer to our notebooks for now.
We will release our generated candidates soon, for those who want to save compute.
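MBR-Exec, as used above, selects from the sampled candidates the one whose execution outputs agree with the most other candidates. A minimal illustrative sketch, not the repo's implementation (the function name and data layout are assumptions):

```python
from collections import Counter

def mbr_exec_select(candidates, exec_outputs):
    """Majority-vote MBR over execution results.

    candidates:   list of generated programs
    exec_outputs: exec_outputs[i] is a hashable tuple of candidate i's
                  outputs on a shared set of test inputs
    """
    # Count how many candidates produced each distinct output tuple.
    votes = Counter(exec_outputs)
    # Score each candidate by the size of its agreement class and
    # return the candidate with the largest one.
    best_idx = max(range(len(candidates)), key=lambda i: votes[exec_outputs[i]])
    return candidates[best_idx]
```

With outputs `[(1, 2), (1, 2), (3, 4)]`, the first two candidates form the largest agreement class, so the first of them is selected.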
Our code is built upon EvalPlus.
Owner
- Name: DeepSPIN
- Login: deep-spin
- Kind: organization
- Location: Lisbon, PT
- Website: https://deep-spin.github.io/
- Repositories: 58
- Profile: https://github.com/deep-spin
Deep Structured Prediction in NLP
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this work and love it, consider citing it as below \U0001F917"
title: EvalPlus
authors:
  - family-names: EvalPlus Team
url: https://github.com/evalplus/evalplus
doi: https://doi.org/10.48550/arXiv.2305.01210
date-released: 2023-05-01
license: Apache-2.0
preferred-citation:
  type: article
  title: "Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation"
  authors:
    - family-names: Liu
      given-names: Jiawei
    - family-names: Xia
      given-names: Chunqiu Steven
    - family-names: Wang
      given-names: Yuyao
    - family-names: Zhang
      given-names: Lingming
  year: 2023
  journal: "arXiv preprint arXiv:2305.01210"
  doi: https://doi.org/10.48550/arXiv.2305.01210
  url: https://arxiv.org/abs/2305.01210
GitHub Events
Total
- Push event: 10
- Fork event: 1
Last Year
- Push event: 10
- Fork event: 1
Dependencies
- python 3.8-slim-buster build
- accelerate *
- openai *
- rich *
- vllm *
- matplotlib *
- numpy *
- rich *
- tempdir *
- termcolor *
- tqdm *
- coverage *
- mutmut ==2.1.0
- rich *
- appdirs *
- multipledispatch *
- numpy *
- tempdir *
- termcolor *
- tqdm *
- wget *
- pytest * test