https://github.com/aise-tudelft/codered

Replication package for the FSE'25 paper "Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks"

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.3%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: AISE-TUDelft
License: unlicense
Language: Jupyter Notebook
Default Branch: main
Size: 599 MB

Statistics

Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme

Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks

Replication Package for the FSE'25 Paper Titled: Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks

Additional visualisations of our data can be found in the AISE-TUDelft/Code-Red-Benchmark repository.

Instructions

To run the project, follow these steps:

1. Create a virtual environment with conda: conda create --name myenv python=3.10.8 (note that Python versions newer than 3.10 are not supported by Autosklearn).
2. Activate the virtual environment: conda activate myenv
3. Install the required Python packages: pip install -r requirements.txt
4. Create two files containing the respective API keys: openai.key and openrouter.key. To run a single generation with the models we selected, you will need approximately 1 in OpenAI credits and 30 in OpenRouter credits.
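
Before launching the notebooks, it can help to confirm that the setup steps above were completed. The following is a minimal sketch, not part of the replication package: the helper names read_key and check_environment are hypothetical, and it assumes each key file holds nothing but the key itself. Only the file names and the Python version limit are taken from the instructions.

```python
import sys
from pathlib import Path

# File names come from the instructions above; everything else is illustrative.
KEY_FILES = ("openai.key", "openrouter.key")


def read_key(path):
    """Return the API key stored in `path`, without surrounding whitespace."""
    return Path(path).read_text(encoding="utf-8").strip()


def check_environment(key_dir=".", version=None):
    """Return a list of problems found; an empty list means all checks passed."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    # The instructions state that Python versions newer than 3.10 are unsupported.
    if version > (3, 10):
        problems.append(f"Python {version[0]}.{version[1]} is newer than 3.10")
    for name in KEY_FILES:
        path = Path(key_dir) / name
        if not path.is_file():
            problems.append(f"missing key file: {name}")
        elif not read_key(path):
            problems.append(f"empty key file: {name}")
    return problems
```

Calling check_environment() from the repository root returns an empty list when both key files are present and non-empty and the interpreter is Python 3.10 or older. How the notebooks themselves load the keys is not shown here and may differ.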

The code is tested on an Ubuntu 20.04 LTS machine with 32GB of RAM and an Intel Core i9-12900HK processor.

Results

The results of the experiments can be found in the ./results folder. Each file contains the results of a single model for a single generation over the entire dataset. The trained classifier can be found in the ./classification_models folder; to save space, we only provide the best-performing model.
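
To get an overview of what is available, the result files can be enumerated with a few lines of Python. This is a sketch under assumptions: the function name list_result_files is hypothetical, and because the file format is not specified here, the code only lists files and does not try to parse them.

```python
from pathlib import Path


def list_result_files(results_dir="./results"):
    """Return the regular files directly inside `results_dir`, sorted by name.

    Subfolders (for example ./results/tagged) and hidden files are skipped.
    """
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir() if p.is_file() and not p.name.startswith(".")
    )
```

For example, `for path in list_result_files(): print(path.name)` prints one line per model and generation run.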

Replication steps

1. Classifier Training: The classifier.ipynb notebook runs the experiments with the labelled data to create the classifiers. The classifiers are saved in the ./classification_models folder.
2. Sample Generation: The generation.ipynb notebook runs generation with all the models. The results are saved in the ./results folder.
3. Sample Tagging: The tagging.ipynb notebook uses the classifier from step 1 to label the samples generated in the previous step. The results are saved in the ./results/tagged folder.
4. Plotting: The plots.ipynb notebook takes all the results and compiles them into the figures used in the paper. Each figure is saved in the ./plots folder.
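
The notebooks can also be executed non-interactively in the order listed above. The sketch below is one possible way to do this, not the authors' tooling: it assumes the jupyter nbconvert command is installed in the active environment, and the names build_command and run_all are hypothetical. Note that the generation step calls the model APIs and therefore consumes credits.

```python
import subprocess

# Notebook names and their order come from the replication steps above.
NOTEBOOKS = ["classifier.ipynb", "generation.ipynb", "tagging.ipynb", "plots.ipynb"]


def build_command(notebook):
    """Build a `jupyter nbconvert` call that executes a notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_all(notebooks=NOTEBOOKS, runner=subprocess.run):
    """Execute each notebook in order, stopping at the first failure."""
    for notebook in notebooks:
        # check=True makes subprocess.run raise if a notebook fails,
        # so later steps never run on incomplete outputs.
        runner(build_command(notebook), check=True)
```

Calling run_all() from the repository root runs the full pipeline; passing a shorter list, such as run_all(["plots.ipynb"]), reruns a single step against the provided results.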

Citation

Please cite our paper if you find our work useful:

@misc{alkaswan2025codered,
  title={Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks},
  author={Ali Al-Kaswan and Sebastian Deatc and Begüm Koç and Arie van Deursen and Maliheh Izadi},
  year={2025},
  eprint={2504.01850},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2504.01850},
}

Owner

Name: AISE-TUDelft
Login: AISE-TUDelft
Kind: organization

Repositories: 1
Profile: https://github.com/AISE-TUDelft

GitHub Events

Total

Member event: 1
Push event: 1
Create event: 2

Last Year

Member event: 1
Push event: 1
Create event: 2

Issues and Pull Requests

Last synced: 12 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
