codeprompteval
Dataset to evaluate the impact of five prompt programming techniques on the code generated by LLMs. This repository also includes the replication package for the study "The Impact of Prompt Programming on Function-Level Code Generation" by Khojah et al.
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.7%) to scientific vocabulary
Basic Info
- Host: GitHub
- Owner: icetlab
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 5.53 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
CodePromptEval: Evaluating the impact of prompt programming on code generation
This repository contains CodePromptEval, a dataset built on the functions of the CoderEval Python dataset (Yu et al., 2024). CodePromptEval consists of 7,072 prompts covering 221 code-generation tasks; each task is instantiated with all 32 unique combinations of five prompt techniques: Few-shot learning, Persona, Chain-of-Thought, Function Signature (context), and List of Packages (context).
In addition, we provide the replication package for the study "The Impact of Prompt Programming on Function-Level Code Generation" by Khojah et al. (2024). The replication package contains the original CoderEval, the additional tests and few-shot examples that we added to CoderEval, the scripts we used to construct CodePromptEval and evaluate it on five LLMs (GPT-3.5, GPT-4o, Llama3-70B, Llama2-7B, and Mistral), as well as the LLM outputs with the generated functions and the evaluation results.
The replication package also includes the raw results of a manual inspection of 40 functions whose generated code passed or failed depending on the prompt techniques used.
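As a sanity check on the dataset's numbers: the 32 combinations correspond to toggling each of the five techniques on or off (2^5 = 32), and 221 tasks × 32 combinations yields 7,072 prompts. A minimal sketch of that enumeration (the names below are illustrative, not taken from the repository's actual scripts):

```python
# Hypothetical sketch of how the 32 prompt-technique combinations arise.
# Technique names follow the README; the code is not from CodePromptEval itself.
from itertools import product

TECHNIQUES = ["few_shot", "persona", "chain_of_thought",
              "function_signature", "list_of_packages"]

def technique_combinations():
    """Yield one dict per combination, mapping each technique to on/off."""
    for flags in product([False, True], repeat=len(TECHNIQUES)):
        yield dict(zip(TECHNIQUES, flags))

combos = list(technique_combinations())
print(len(combos))        # 2**5 = 32 combinations per task
print(len(combos) * 221)  # 32 * 221 = 7,072 prompts in total
```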
To cite this work:
```bibtex
@article{khojah2024impact,
  title={{The Impact of Prompt Programming on Function-Level Code Generation}},
  author={Khojah, Ranim and Neto, Francisco Gomes de Oliveira and Mohamad, Mazen and Leitner, Philipp},
  journal={arXiv preprint arXiv:2412.20545},
  year={2024}
}
```
Install dependencies
```shell
# (optional) create a virtual environment
pip install virtualenv
python -m venv .

# install packages
pip install -r requirements.txt
```
Contact
Please contact khojah{at}chalmers.se if you have any questions.
Owner
- Name: Internet Computing and Emerging Technologies lab (ICET-lab)
- Login: icetlab
- Kind: organization
- Website: https://icet-lab.eu/
- Repositories: 1
- Profile: https://github.com/icetlab
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this dataset or the replication package, please cite it as below."
authors:
  - family-names: "Khojah"
    given-names: "Ranim"
    orcid: "https://orcid.org/0000-0002-1090-3153"
  - family-names: "de Oliveira Neto"
    given-names: "Francisco Gomes"
    orcid: "https://orcid.org/0000-0001-9226-5417"
  - family-names: "Mohamad"
    given-names: "Mazen"
    orcid: "https://orcid.org/0000-0002-3446-1265"
  - family-names: "Leitner"
    given-names: "Philipp"
    orcid: "https://orcid.org/0000-0003-2777-528X"
title: "CodePromptEval"
version: 1.0.0
date-released: 2024-12-11
url: "https://github.com/icetlab/CodePromptEval"
```
GitHub Events
Total
- Watch event: 2
- Member event: 1
- Public event: 1
- Push event: 5
- Create event: 1
Last Year
- Watch event: 2
- Member event: 1
- Public event: 1
- Push event: 5
- Create event: 1
Dependencies
- Jinja2 ==3.1.3
- MarkupSafe ==2.1.5
- MutPy-Pynguin ==0.7.1
- PyYAML ==6.0.1
- Pygments ==2.17.2
- aiohttp ==3.9.5
- aiosignal ==1.3.1
- annotated-types ==0.6.0
- anyio ==4.3.0
- astmonkey ==0.3.6
- astor ==0.8.1
- async-timeout ==4.0.3
- attrs ==23.2.0
- bytecode ==0.15.1
- certifi ==2024.2.2
- charset-normalizer ==3.3.2
- colorama ==0.4.6
- commonmark ==0.9.1
- contourpy ==1.2.1
- cycler ==0.12.1
- distro ==1.9.0
- exceptiongroup ==1.2.1
- fonttools ==4.51.0
- frozenlist ==1.4.1
- h11 ==0.14.0
- httpcore ==1.0.5
- httpx ==0.27.0
- idna ==3.7
- importlib_resources ==6.4.0
- iniconfig ==2.0.0
- jellyfish ==0.11.2
- kiwisolver ==1.4.5
- matplotlib ==3.8.4
- multidict ==6.0.5
- mypy-extensions ==1.0.0
- networkx ==2.8.8
- numpy ==1.26.4
- openai ==0.28.0
- ordered-set ==4.1.0
- packaging ==24.0
- pillow ==10.3.0
- pluggy ==1.4.0
- py ==1.11.0
- pydantic ==2.7.1
- pydantic_core ==2.18.2
- pydot ==1.4.2
- pynguin ==0.17.0
- pyparsing ==3.1.2
- pytest ==6.2.5
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.0.1
- requests ==2.31.0
- rich ==11.2.0
- simple-parsing ==0.0.17
- six ==1.16.0
- sniffio ==1.3.1
- termcolor ==2.4.0
- toml ==0.10.2
- tqdm ==4.66.2
- typing-inspect ==0.9.0
- typing_extensions ==4.10.0
- urllib3 ==2.2.1
- yarl ==1.9.4
- zipp ==3.18.1