Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.2%) to scientific vocabulary
Repository
This is the repository for DOCE.
Basic Info
- Host: GitHub
- Owner: deep-spin
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 5.14 MB
Statistics
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
DOCE
This repo contains the code for our arXiv paper:
DOCE: Finding the Sweet Spot for Execution-Based Code Generation
Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins
Contact person: Haau-Sing Li
Usage
- Install packages from `requirements*.txt`.
- Inference on the HumanEval/MBPP tasks
```bash
python3 codegen/generate.py \
  --model ${model} \
  --bs ${batch_size} \
  --temperature ${temperature} \
  --n_samples ${num_of_samples_for_reranking} \
  --dataset ${humaneval/mbpp} \
  --resume \
  --root ${path_to_store_output}
```
- Evaluation
```bash
evalplus.evaluate \
  --dataset {humaneval/mbpp} \
  --samples ${path to generated samples} \
  --parallel 30 \
  --test-details
```
- Get execution outputs of generated samples (for MBR-Exec)
```bash
python3 evalplus/gen_outputs.py \
  --gen_dir {model_name_plus_temperature} \
  --dataset {humaneval/mbpp} \
  --gen_fast
```
- Self-Debugging
First, collect execution feedback:
```bash
python3 evalplus/error_feedback.py \
  --gen_dir {model_name_plus_temperature} \
  --dataset {humaneval/mbpp}
```
Then run self-debugging:
```bash
python3 codegen/ape_sd_ut.py \
  --model ${model} \
  --bs ${batch_size} \
  --temperature ${temperature} \
  --n_samples ${num_of_samples_for_reranking} \
  --dataset ${humaneval/mbpp} \
  --resume \
  --root ${path_to_store_output} \
  --debugging_turn ${ith_debugging_turn}
```
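Conceptually, each debugging turn executes the current candidate, collects error feedback, and asks the model to revise the code. The loop can be sketched as follows; this is a minimal illustration, not the repo's implementation, and the helper names (`run_tests`, `self_debug`, a `solution` entry point) are assumptions:

```python
def run_tests(program, tests):
    """Execute `program` (Python source defining `solution`) on (input, expected) pairs.

    Returns (ok, feedback): ok is True iff all tests pass; feedback is an
    error message suitable for prompting the model on failure.
    """
    ns = {}
    exec(program, ns)
    for inp, expected in tests:
        try:
            got = ns["solution"](inp)
        except Exception as e:
            return False, f"solution({inp!r}) raised {e!r}"
        if got != expected:
            return False, f"solution({inp!r}) returned {got!r}, expected {expected!r}"
    return True, ""

def self_debug(program, tests, model, max_turns=3):
    """Iteratively repair `program` using execution feedback.

    `model` is any callable (code, feedback) -> revised code, standing in
    for an LLM call; `max_turns` bounds the number of debugging turns.
    """
    for _ in range(max_turns):
        ok, feedback = run_tests(program, tests)
        if ok:  # stop as soon as all tests pass
            return program
        program = model(program, feedback)
    return program
```

In the repo, the `--debugging_turn` flag plays the role of the loop index, with each turn run as a separate invocation.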
- For MBR and N-Best-Reranking, please refer to our notebooks for now.
We will release our generated candidates soon, for those who want to save compute.
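MBR-Exec, as used above, selects from the sampled candidates the one whose execution outputs agree with the most other candidates. A minimal illustrative sketch, not the repo's implementation (the function name and data layout are assumptions):

```python
from collections import Counter

def mbr_exec_select(candidates, exec_outputs):
    """Majority-vote MBR over execution results.

    candidates:   list of generated programs
    exec_outputs: exec_outputs[i] is a hashable tuple of candidate i's
                  outputs on a shared set of test inputs
    """
    # Count how many candidates produced each distinct output tuple.
    votes = Counter(exec_outputs)
    # Score each candidate by the size of its agreement class and
    # return the candidate with the largest one.
    best_idx = max(range(len(candidates)), key=lambda i: votes[exec_outputs[i]])
    return candidates[best_idx]
```

With outputs `[(1, 2), (1, 2), (3, 4)]`, the first two candidates form the largest agreement class, so the first of them is selected.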
Our code is built upon EvalPlus.
Owner
- Name: DeepSPIN
- Login: deep-spin
- Kind: organization
- Location: Lisbon, PT
- Website: https://deep-spin.github.io/
- Repositories: 58
- Profile: https://github.com/deep-spin
Deep Structured Prediction in NLP
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this work and love it, consider citing it as below \U0001F917"
title: EvalPlus
authors:
  - family-names: EvalPlus Team
url: https://github.com/evalplus/evalplus
doi: https://doi.org/10.48550/arXiv.2305.01210
date-released: 2023-05-01
license: Apache-2.0
preferred-citation:
  type: article
  title: "Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation"
  authors:
    - family-names: Liu
      given-names: Jiawei
    - family-names: Xia
      given-names: Chunqiu Steven
    - family-names: Wang
      given-names: Yuyao
    - family-names: Zhang
      given-names: Lingming
  year: 2023
  journal: "arXiv preprint arXiv:2305.01210"
  doi: https://doi.org/10.48550/arXiv.2305.01210
  url: https://arxiv.org/abs/2305.01210
GitHub Events
Total
- Push event: 10
- Fork event: 1
Last Year
- Push event: 10
- Fork event: 1
Dependencies
- python 3.8-slim-buster build
- accelerate *
- openai *
- rich *
- vllm *
- matplotlib *
- numpy *
- rich *
- tempdir *
- termcolor *
- tqdm *
- coverage *
- mutmut ==2.1.0
- rich *
- appdirs *
- multipledispatch *
- numpy *
- tempdir *
- termcolor *
- tqdm *
- wget *
- pytest * test