codeprompteval

Dataset to evaluate the impact of five prompt programming techniques on the code generated by LLMs. This repository also includes the replication package for the study "The Impact of Prompt Programming on Function-Level Code Generation" by Khojah et al.

https://github.com/icetlab/codeprompteval

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: icetlab
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 5.53 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Citation

README.md

CodePromptEval: Evaluating the impact of prompt programming on code generation


This repository contains a dataset, CodePromptEval, based on the functions of the CoderEval Python dataset (Yu et al., 2024). CodePromptEval consists of 7,072 prompts covering 221 code-generation tasks; each task is instantiated with all 32 unique combinations of five prompt techniques. The prompt techniques we cover are Few-shot learning, Persona, Chain-of-Thought, Function Signature (context), and List of Packages (context).
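The 32 combinations arise from toggling each of the five techniques on or off (2^5 = 32). A minimal sketch of enumerating them, illustrative only and not code from this repository:

```python
from itertools import product

# The five prompt techniques named in CodePromptEval.
TECHNIQUES = ["few-shot", "persona", "chain-of-thought",
              "function-signature", "packages"]

# Each technique is either applied (True) or not (False),
# giving 2**5 = 32 unique combinations per task.
combinations = [
    dict(zip(TECHNIQUES, flags))
    for flags in product([False, True], repeat=len(TECHNIQUES))
]

print(len(combinations))        # 32 combinations
print(len(combinations) * 221)  # 7072 prompts across the 221 tasks
```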

In addition, we provide the replication package of the study "The Impact of Prompt Programming on Function-Level Code Generation" by Khojah et al. (2024). The replication package contains the original CoderEval, the additional tests and few-shot examples that we added to CoderEval, the scripts that we used to construct and evaluate CodePromptEval on five LLMs (GPT-3.5, GPT-4o, Llama3-70B, Llama2-7B, and Mistral), as well as the LLMs' outputs with the generated functions and the evaluation results.
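Evaluating generated functions typically means executing each candidate against the task's unit tests and recording pass/fail. A minimal, hypothetical sketch of that idea (not the repository's actual evaluation script; `evaluate_candidate` is an invented helper):

```python
# Hypothetical sketch: judge a generated function by executing the
# task's assertion-based tests in-process. Real harnesses would add
# sandboxing and timeouts.

def evaluate_candidate(candidate_source: str, test_source: str) -> bool:
    """Return True if the generated code passes the task's tests."""
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # define the generated function
        exec(test_source, namespace)       # run the task's assertions
        return True
    except Exception:
        return False

generated = "def add(a, b):\n    return a + b\n"
tests = "assert add(1, 2) == 3\nassert add(-1, 1) == 0\n"
print(evaluate_candidate(generated, tests))  # True
```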

This replication package also includes the raw results of a manual inspection of 40 functions that passed or failed as a result of applying one or more prompt techniques.

To cite this work:

```bibtex
@article{khojah2024impact,
  title={{The Impact of Prompt Programming on Function-Level Code Generation}},
  author={Khojah, Ranim and Neto, Francisco Gomes de Oliveira and Mohamad, Mazen and Leitner, Philipp},
  journal={arXiv preprint arXiv:2412.20545},
  year={2024}
}
```

Install dependencies

```shell
# (optional) create a virtual environment
pip install virtualenv
python -m venv .
source ./bin/activate

# install packages
pip install -r requirements.txt
```

Contact

Please contact khojah{at}chalmers.se if you have any questions.

Owner

  • Name: Internet Computing and Emerging Technologies lab (ICET-lab)
  • Login: icetlab
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this dataset or the replication package, please cite it as below."
authors:
- family-names: "Khojah"
  given-names: "Ranim"
  orcid: "https://orcid.org/0000-0002-1090-3153"
- family-names: "de Oliveira Neto"
  given-names: "Francisco Gomes"
  orcid: "https://orcid.org/0000-0001-9226-5417"
- family-names: "Mohamad"
  given-names: "Mazen"
  orcid: "https://orcid.org/0000-0002-3446-1265"
- family-names: "Leitner"
  given-names: "Philipp"
  orcid: "https://orcid.org/0000-0003-2777-528X"
  
title: "CodePromptEval"
version: 1.0.0
date-released: 2024-12-11
url: "https://github.com/icetlab/CodePromptEval"

GitHub Events

Total
  • Watch event: 2
  • Member event: 1
  • Public event: 1
  • Push event: 5
  • Create event: 1
Last Year
  • Watch event: 2
  • Member event: 1
  • Public event: 1
  • Push event: 5
  • Create event: 1

Dependencies

requirements.txt pypi
  • Jinja2 ==3.1.3
  • MarkupSafe ==2.1.5
  • MutPy-Pynguin ==0.7.1
  • PyYAML ==6.0.1
  • Pygments ==2.17.2
  • aiohttp ==3.9.5
  • aiosignal ==1.3.1
  • annotated-types ==0.6.0
  • anyio ==4.3.0
  • astmonkey ==0.3.6
  • astor ==0.8.1
  • async-timeout ==4.0.3
  • attrs ==23.2.0
  • bytecode ==0.15.1
  • certifi ==2024.2.2
  • charset-normalizer ==3.3.2
  • colorama ==0.4.6
  • commonmark ==0.9.1
  • contourpy ==1.2.1
  • cycler ==0.12.1
  • distro ==1.9.0
  • exceptiongroup ==1.2.1
  • fonttools ==4.51.0
  • frozenlist ==1.4.1
  • h11 ==0.14.0
  • httpcore ==1.0.5
  • httpx ==0.27.0
  • idna ==3.7
  • importlib_resources ==6.4.0
  • iniconfig ==2.0.0
  • jellyfish ==0.11.2
  • kiwisolver ==1.4.5
  • matplotlib ==3.8.4
  • multidict ==6.0.5
  • mypy-extensions ==1.0.0
  • networkx ==2.8.8
  • numpy ==1.26.4
  • openai ==0.28.0
  • ordered-set ==4.1.0
  • packaging ==24.0
  • pillow ==10.3.0
  • pluggy ==1.4.0
  • py ==1.11.0
  • pydantic ==2.7.1
  • pydantic_core ==2.18.2
  • pydot ==1.4.2
  • pynguin ==0.17.0
  • pyparsing ==3.1.2
  • pytest ==6.2.5
  • python-dateutil ==2.9.0.post0
  • python-dotenv ==1.0.1
  • requests ==2.31.0
  • rich ==11.2.0
  • simple-parsing ==0.0.17
  • six ==1.16.0
  • sniffio ==1.3.1
  • termcolor ==2.4.0
  • toml ==0.10.2
  • tqdm ==4.66.2
  • typing-inspect ==0.9.0
  • typing_extensions ==4.10.0
  • urllib3 ==2.2.1
  • yarl ==1.9.4
  • zipp ==3.18.1