codeprompteval
Dataset to evaluate the impact of five prompt programming techniques on the code generated by LLMs. This repository also includes the replication package for the study "The Impact of Prompt Programming on Function-Level Code Generation" by Khojah et al.
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.7%) to scientific vocabulary
Basic Info
- Host: GitHub
- Owner: icetlab
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 5.53 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
CodePromptEval: Evaluating the impact of prompt programming on code generation
This repository contains CodePromptEval, a dataset built on the functions of the CoderEval Python dataset (Yu et al., 2024). CodePromptEval consists of 7,072 prompts covering 221 code-generation tasks; each task is instantiated with all 32 unique combinations of five prompt techniques: Few-shot learning, Persona, Chain-of-Thought, Function Signature (context), and List of Packages (context).
In addition, we provide the replication package for the study "The Impact of Prompt Programming on Function-Level Code Generation" by Khojah et al. (2024). The replication package contains the original CoderEval, the additional tests and few-shot examples that we added to CoderEval, the scripts we used to construct CodePromptEval and evaluate it on five LLMs (GPT-3.5, GPT-4o, Llama3-70B, Llama2-7B, and Mistral), as well as the LLM outputs with the generated functions and the evaluation results.
The replication package also includes the raw results of a manual inspection of 40 functions whose generated code passed or failed depending on the prompt techniques used.
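As a sanity check on the dataset's numbers: the 32 combinations correspond to toggling each of the five techniques on or off (2^5 = 32), and 221 tasks × 32 combinations yields 7,072 prompts. A minimal sketch of that enumeration (the names below are illustrative, not taken from the repository's actual scripts):

```python
# Hypothetical sketch of how the 32 prompt-technique combinations arise.
# Technique names follow the README; the code is not from CodePromptEval itself.
from itertools import product

TECHNIQUES = ["few_shot", "persona", "chain_of_thought",
              "function_signature", "list_of_packages"]

def technique_combinations():
    """Yield one dict per combination, mapping each technique to on/off."""
    for flags in product([False, True], repeat=len(TECHNIQUES)):
        yield dict(zip(TECHNIQUES, flags))

combos = list(technique_combinations())
print(len(combos))        # 2**5 = 32 combinations per task
print(len(combos) * 221)  # 32 * 221 = 7,072 prompts in total
```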
To cite this work:
```bibtex
@article{khojah2024impact,
  title={{The Impact of Prompt Programming on Function-Level Code Generation}},
  author={Khojah, Ranim and Neto, Francisco Gomes de Oliveira and Mohamad, Mazen and Leitner, Philipp},
  journal={arXiv preprint arXiv:2412.20545},
  year={2024}
}
```
Install dependencies
```shell
# (optional) create a virtual environment
pip install virtualenv
python -m venv .

# install packages
pip install -r requirements.txt
```
Contact
Please contact khojah{at}chalmers.se if you have any questions.
Owner
- Name: Internet Computing and Emerging Technologies lab (ICET-lab)
- Login: icetlab
- Kind: organization
- Website: https://icet-lab.eu/
- Repositories: 1
- Profile: https://github.com/icetlab
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this dataset or the replication package, please cite it as below."
authors:
  - family-names: "Khojah"
    given-names: "Ranim"
    orcid: "https://orcid.org/0000-0002-1090-3153"
  - family-names: "de Oliveira Neto"
    given-names: "Francisco Gomes"
    orcid: "https://orcid.org/0000-0001-9226-5417"
  - family-names: "Mohamad"
    given-names: "Mazen"
    orcid: "https://orcid.org/0000-0002-3446-1265"
  - family-names: "Leitner"
    given-names: "Philipp"
    orcid: "https://orcid.org/0000-0003-2777-528X"
title: "CodePromptEval"
version: 1.0.0
date-released: 2024-12-11
url: "https://github.com/icetlab/CodePromptEval"
```
GitHub Events
Total
- Watch event: 2
- Member event: 1
- Public event: 1
- Push event: 5
- Create event: 1
Last Year
- Watch event: 2
- Member event: 1
- Public event: 1
- Push event: 5
- Create event: 1
Dependencies
- Jinja2 ==3.1.3
- MarkupSafe ==2.1.5
- MutPy-Pynguin ==0.7.1
- PyYAML ==6.0.1
- Pygments ==2.17.2
- aiohttp ==3.9.5
- aiosignal ==1.3.1
- annotated-types ==0.6.0
- anyio ==4.3.0
- astmonkey ==0.3.6
- astor ==0.8.1
- async-timeout ==4.0.3
- attrs ==23.2.0
- bytecode ==0.15.1
- certifi ==2024.2.2
- charset-normalizer ==3.3.2
- colorama ==0.4.6
- commonmark ==0.9.1
- contourpy ==1.2.1
- cycler ==0.12.1
- distro ==1.9.0
- exceptiongroup ==1.2.1
- fonttools ==4.51.0
- frozenlist ==1.4.1
- h11 ==0.14.0
- httpcore ==1.0.5
- httpx ==0.27.0
- idna ==3.7
- importlib_resources ==6.4.0
- iniconfig ==2.0.0
- jellyfish ==0.11.2
- kiwisolver ==1.4.5
- matplotlib ==3.8.4
- multidict ==6.0.5
- mypy-extensions ==1.0.0
- networkx ==2.8.8
- numpy ==1.26.4
- openai ==0.28.0
- ordered-set ==4.1.0
- packaging ==24.0
- pillow ==10.3.0
- pluggy ==1.4.0
- py ==1.11.0
- pydantic ==2.7.1
- pydantic_core ==2.18.2
- pydot ==1.4.2
- pynguin ==0.17.0
- pyparsing ==3.1.2
- pytest ==6.2.5
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.0.1
- requests ==2.31.0
- rich ==11.2.0
- simple-parsing ==0.0.17
- six ==1.16.0
- sniffio ==1.3.1
- termcolor ==2.4.0
- toml ==0.10.2
- tqdm ==4.66.2
- typing-inspect ==0.9.0
- typing_extensions ==4.10.0
- urllib3 ==2.2.1
- yarl ==1.9.4
- zipp ==3.18.1