
Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

https://github.com/nuprl/canitedit

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
    Organization nuprl has institutional domain (www.ccs.neu.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.1%) to scientific vocabulary

Repository

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

Basic Info
  • Host: GitHub
  • Owner: nuprl
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 235 KB
Statistics
  • Stars: 47
  • Watchers: 5
  • Forks: 7
  • Open Issues: 3
  • Releases: 0
Created about 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

CanItEdit is a benchmark for evaluating LLMs on instructional code editing, the task of updating a program given a natural language instruction. The benchmark contains 105 hand-crafted Python programs with before and after code blocks, two types of natural language instructions (descriptive and lazy), and a hidden test suite.
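To illustrate the task format, here is a minimal, self-contained sketch of how one benchmark item might be checked against its hidden tests. The field names (`before`, `after`, `instruction_descriptive`, `instruction_lazy`, `tests`) and the sample program are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch of a CanItEdit-style item; field names and contents
# are illustrative assumptions, not the real dataset schema.
item = {
    "before": "def add(a, b):\n    return a - b\n",
    "instruction_descriptive": (
        "The add function subtracts instead of adding; "
        "change the operator so it returns a + b."
    ),
    "instruction_lazy": "fix add",
    "after": "def add(a, b):\n    return a + b\n",
    # Hidden test suite: raises AssertionError if the edit is wrong.
    "tests": "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n",
}

def passes_hidden_tests(program: str, tests: str) -> bool:
    """Run the candidate program, then the hidden tests, in one namespace."""
    namespace = {}
    try:
        exec(program, namespace)   # defines add()
        exec(tests, namespace)     # assertions over add()
        return True
    except Exception:
        return False

# The unedited program fails the hidden tests; the reference edit passes.
print(passes_hidden_tests(item["before"], item["tests"]))  # False
print(passes_hidden_tests(item["after"], item["tests"]))   # True
```

A real harness would substitute a model-generated edit for `item["after"]` and aggregate pass rates over all 105 problems.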

See our paper for more details.

This repository provides code for evaluating models on the benchmark, as well as code to reproduce EditPackFT and EditCoder, a dataset and an LLM built for instructional code editing.

The CanItEdit benchmark dataset, EditCoder model, and EditPackFT dataset can be found on HuggingFace:

  • CanItEdit: https://huggingface.co/datasets/nuprl/CanItEdit
  • EditCoder: https://huggingface.co/nuprl/EditCoder-6.7b-v1
  • EditPackFT: https://huggingface.co/datasets/nuprl/EditPackFT

Cloning the repository

It is very important to clone this repository and initialize all submodules recursively. This can be done with the following command:

```bash
git clone --recurse-submodules https://github.com/nuprl/CanItEdit
```
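If the repository was already cloned without `--recurse-submodules`, the submodules can still be fetched afterwards with git's standard fallback (not specific to this project):

```shell
git submodule update --init --recursive
```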

Structure

  • ./benchmark contains the CanItEdit benchmark dataset and code for generating and evaluating completions
  • ./editcoder contains code to train an EditCoder model
  • ./editpackft contains code to reproduce the EditPackFT dataset
  • ./requirements.txt contains the requirements for running the code in this repository

Citation

If you use this code or the CanItEdit benchmark, please cite our paper:

@inproceedings{cassano:canitedit,
  title={Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions},
  author={Federico Cassano and Luisa Li and Akul Sethi and Noah Shinn and Abby Brennan-Jones and Anton Lozhkov and Carolyn Jane Anderson and Arjun Guha},
  booktitle={Conference on Language Modeling (COLM)},
  year={2024},
}

Owner

  • Name: Northeastern University Programming Research Lab
  • Login: nuprl
  • Kind: organization
  • Location: Boston

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Cassano"
    given-names: "Federico"
  - family-names: "Li"
    given-names: "Luisa"
  - family-names: "Sethi"
    given-names: "Akul"
  - family-names: "Shinn"
    given-names: "Noah"
  - family-names: "Brennan-Jones"
    given-names: "Abby"
  - family-names: "Lozhkov"
    given-names: "Anton"
  - family-names: "Anderson"
    given-names: "Carolyn Jane"
  - family-names: "Guha"
    given-names: "Arjun"
title: "Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions"
version: 1.0.0
date-released: 2024
url: "https://github.com/nuprl/CanItEdit"
preferred-citation:
  type: conference-paper
  authors:
    - family-names: "Cassano"
      given-names: "Federico"
    - family-names: "Li"
      given-names: "Luisa"
    - family-names: "Sethi"
      given-names: "Akul"
    - family-names: "Shinn"
      given-names: "Noah"
    - family-names: "Brennan-Jones"
      given-names: "Abby"
    - family-names: "Lozhkov"
      given-names: "Anton"
    - family-names: "Anderson"
      given-names: "Carolyn Jane"
    - family-names: "Guha"
      given-names: "Arjun"
  title: "Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions"
  year: 2024
  conference:
    name: "Conference on Language Modeling (COLM)"

GitHub Events

Total
  • Commit comment event: 1
  • Issues event: 4
  • Watch event: 7
  • Issue comment event: 1
  • Push event: 5
  • Fork event: 4
Last Year
  • Commit comment event: 1
  • Issues event: 4
  • Watch event: 7
  • Issue comment event: 1
  • Push event: 5
  • Fork event: 4

Dependencies

benchmark/Dockerfile docker
  • ubuntu 22.04 build
benchmark/requirements.txt pypi
  • coverage ==7.3.2
  • pandas ==2.0.2
  • torch ==2.1.0
  • z3-solver ==4.12.2.0
requirements.txt pypi
  • accelerate ==0.24.1
  • bitsandbytes ==0.41.0
  • datasets ==2.15.0
  • deepspeed ==0.12.3
  • editdistance ==0.6.2
  • huggingface-hub ==0.19.4
  • openai ==1.2.0
  • peft ==0.4.0
  • ray ==2.8.0
  • rouge-rs ==0.1.0
  • scikit-learn ==1.3.0
  • tokenizers ==0.15.0
  • torch ==2.1.0
  • tqdm ==4.65.0
  • transformers ==4.35.2
  • vllm ==0.2.2
  • wandb ==0.15.4
  • wordcloud ==1.9.2