doce

This is the repo of DOCE

https://github.com/deep-spin/doce

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.2%) to scientific vocabulary

Repository

This is the repo of DOCE

Basic Info
  • Host: GitHub
  • Owner: deep-spin
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 5.14 MB
Statistics
  • Stars: 2
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 12 months ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

DOCE

This repo contains the code for our arXiv paper:

DOCE: Finding the Sweet Spot for Execution-Based Code Generation
Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins

Contact person: Haau-Sing Li

Usage

  1. Install the packages from requirements*.txt.

  2. Inference on the HumanEval/MBPP task:

```bash
python3 codegen/generate.py \
    --model ${model} \
    --bs ${batch_size} \
    --temperature ${temperature} \
    --n_samples ${num_of_samples_for_reranking} \
    --dataset ${humaneval/mbpp} \
    --resume \
    --root ${path_to_store_output}
```

  3. Evaluation:

```bash
evalplus.evaluate \
    --dataset {humaneval/mbpp} \
    --samples ${path to generated samples} \
    --parallel 30 \
    --test-details
```

  4. Get execution outputs of generated samples (for MBR-Exec):

```bash
python3 evalplus/gen_outputs.py \
    --gen_dir {model_name_plus_temperature} \
    --dataset {humaneval/mbpp} \
    --gen_fast
```

  5. Self-Debugging

First, get execution feedback:

```bash
python3 evalplus/error_feedback.py \
    --gen_dir {model_name_plus_temperature} \
    --dataset {humaneval/mbpp}
```

Then we can do self-debugging:

```bash
python3 codegen/ape_sd_ut.py \
    --model ${model} \
    --bs ${batch_size} \
    --temperature ${temperature} \
    --n_samples ${num_of_samples_for_reranking} \
    --dataset ${humaneval/mbpp} \
    --resume \
    --root ${path_to_store_output} \
    --debugging_turn ${ith_debugging_turn}
```
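Conceptually, each debugging turn feeds execution feedback back to the model and regenerates the sample. The loop below is only a minimal sketch of that idea; `run_tests` and `fix_with_model` are hypothetical stand-ins for the repo's `error_feedback.py` and `ape_sd_ut.py` pipeline, not its actual API.

```python
def self_debug(sample, run_tests, fix_with_model, max_turns=3):
    """Iteratively repair a generated code sample using execution feedback.

    run_tests(sample)        -> (passed: bool, feedback: str)
    fix_with_model(sample, feedback) -> revised sample
    Both callables are placeholders for the model/evaluator calls.
    """
    for _ in range(max_turns):
        passed, feedback = run_tests(sample)
        if passed:
            return sample  # stop as soon as the tests pass
        # Otherwise ask the model to revise, conditioning on the feedback.
        sample = fix_with_model(sample, feedback)
    return sample  # best effort after max_turns debugging turns
```

In the repo, the number of iterations corresponds to `--debugging_turn`.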

  6. For MBR and N-Best-Reranking, please refer to our notebooks for now.
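The notebooks implement the actual rerankers. As a rough illustration of the MBR-Exec idea (selecting the candidate whose execution outputs agree with the most other candidates), here is a minimal sketch with made-up inputs; it is not the repo's implementation:

```python
def mbr_exec_select(candidates, exec_outputs):
    """Pick the candidate whose execution outputs agree with the most peers.

    candidates   -- list of generated code samples
    exec_outputs -- exec_outputs[i] is the output of running candidates[i]
                    on the shared test inputs
    """
    scores = []
    for i, out_i in enumerate(exec_outputs):
        # Agreement = number of other candidates producing identical outputs.
        scores.append(sum(out_i == out_j
                          for j, out_j in enumerate(exec_outputs) if j != i))
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]
```

Candidates that execute to the same outputs form a consensus cluster, and the selected sample is a member of the largest cluster.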

We will release our generated candidates soon, in case you want to save compute.

Our code is built upon EvalPlus.

Owner

  • Name: DeepSPIN
  • Login: deep-spin
  • Kind: organization
  • Location: Lisbon, PT

Deep Structured Prediction in NLP

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this work and love it, consider citing it as below \U0001F917"
title: EvalPlus
authors:
  - family-names: EvalPlus Team
url: https://github.com/evalplus/evalplus
doi: https://doi.org/10.48550/arXiv.2305.01210
date-released: 2023-05-01
license: Apache-2.0
preferred-citation:
  type: article
  title: "Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation"
  authors:
    - family-names: Liu
      given-names: Jiawei
    - family-names: Xia
      given-names: Chunqiu Steven
    - family-names: Wang
      given-names: Yuyao
    - family-names: Zhang
      given-names: Lingming
  year: 2023
  journal: "arXiv preprint arXiv:2305.01210"
  doi: https://doi.org/10.48550/arXiv.2305.01210
  url: https://arxiv.org/abs/2305.01210

GitHub Events

Total
  • Push event: 10
  • Fork event: 1
Last Year
  • Push event: 10
  • Fork event: 1

Dependencies

Dockerfile docker
  • python 3.8-slim-buster build
pyproject.toml pypi
requirements-llm.txt pypi
  • accelerate *
  • openai *
  • rich *
  • vllm *
requirements-tools.txt pypi
  • matplotlib *
  • numpy *
  • rich *
  • tempdir *
  • termcolor *
  • tqdm *
requirements-tsr.txt pypi
  • coverage *
  • mutmut ==2.1.0
  • rich *
requirements.txt pypi
  • appdirs *
  • multipledispatch *
  • numpy *
  • tempdir *
  • termcolor *
  • tqdm *
  • wget *
tests/requirements.txt pypi
  • pytest * test