https://github.com/aise-tudelft/codered
Replication package for the FSE'25 paper "Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks"
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 14.3%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: AISE-TUDelft
- License: unlicense
- Language: Jupyter Notebook
- Default Branch: main
- Size: 599 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks
Replication Package for the FSE'25 Paper Titled: Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks
Additional visualisations of our data can be found in AISE-TUDelft/Code-Red-Benchmark.
Instructions
To run the project, follow these steps:
- Create a virtual environment using conda by running: `conda create --name myenv python=3.10.8` (note that Python versions newer than 3.10 are not supported by Autosklearn).
- Activate the virtual environment by running: `conda activate myenv`
- Install the required Python packages by running: `pip install -r requirements.txt`
- Create 2 files with the respective API keys: `openai.key` and `openrouter.key`. To run a single generation with the models we selected, you will need approximately 1 in OpenAI and 30 in Openrouter credits.
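The key files from the last setup step could be read with a small helper like the one below. This is a minimal sketch, not the repository's own loading code: it assumes each file holds nothing but the key as plain text and sits in the working directory, and the function names `read_api_key` and `load_api_keys` are hypothetical.

```python
from pathlib import Path


def read_api_key(path):
    """Return the key stored in a key file, stripped of surrounding whitespace."""
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"Missing key file: {key_file}")
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"Key file is empty: {key_file}")
    return key


def load_api_keys(directory="."):
    """Load both keys; the file names are the ones named in the instructions.

    Assumption: plain-text files containing only the key.
    """
    base = Path(directory)
    return {
        "openai": read_api_key(base / "openai.key"),
        "openrouter": read_api_key(base / "openrouter.key"),
    }
```

Failing early on a missing or empty file avoids starting a paid generation run that would only fail at the first API call.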
The code is tested on an Ubuntu 20.04 LTS machine with 32GB of RAM and an Intel Core i9-12900HK processor.
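Because the setup pins Python 3.10.8 and notes that newer versions are not supported by Autosklearn, a quick interpreter check before installing the requirements can save a failed install. The sketch below is an illustration only: the helper name `is_supported_python` is hypothetical, and treating every Python 3 release up to 3.10 as acceptable is an assumption, since only 3.10.8 is named in the instructions.

```python
import sys


def is_supported_python(version_info=None):
    """Return True for Python 3 interpreters no newer than 3.10.

    Assumption: the upper bound comes from the Autosklearn note in the
    instructions; no lower bound is stated there, so none is enforced
    beyond requiring Python 3.
    """
    major, minor = (version_info or sys.version_info)[:2]
    return major == 3 and minor <= 10


if __name__ == "__main__":
    if not is_supported_python():
        print("Warning: this interpreter is newer than Python 3.10.")
```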
Results
The results of the experiments can be found in the ./results folder. Each file contains the results of a single model for a single generation over the entire dataset. The trained classifier can be found in the ./classification_models folder; we only provide the best-performing model to save space.
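Since each file in ./results corresponds to one model and one generation, a first step when inspecting the results is to enumerate those files. The sketch below makes no assumption about file names or formats, which are not described here; the helper name `list_result_files` is hypothetical. It only lists files directly inside the folder, so subfolders such as ./results/tagged are skipped.

```python
from pathlib import Path


def list_result_files(results_dir="./results"):
    """Return the files directly inside the results folder, sorted by path.

    Subdirectories are ignored; file naming and format are not assumed.
    """
    root = Path(results_dir)
    return sorted(p for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    for path in list_result_files():
        print(path.name)
```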
Replication steps
- Classifier Training: The `classifier.ipynb` notebook runs the experiments with the labelled data to create the classifiers. The classifiers are saved in the `./classification_models` folder.
- Sample Generation: The `generation.ipynb` notebook runs generation with all the models. The results are saved in the `./results` folder.
- Sample Tagging: The `tagging.ipynb` notebook uses the classifier from step 1 to label the samples generated in the previous step. The results are saved in the `./results/tagged` folder.
- Plotting: The `plots.ipynb` notebook takes all the results and compiles them into several figures used in the paper. Each figure is saved in the `./plots` folder.
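The four notebooks depend on each other's outputs, so they need to run in the order listed. One way to do that without opening each notebook by hand is to call `jupyter nbconvert` with `--execute`. The following is a sketch under assumptions: it presumes `jupyter` and `nbconvert` are installed in the active environment and that the notebooks run unattended, neither of which is stated in the README. The names `NOTEBOOKS`, `build_command`, and `run_pipeline` are hypothetical.

```python
import subprocess

# Notebook order taken from the replication steps.
NOTEBOOKS = [
    "classifier.ipynb",
    "generation.ipynb",
    "tagging.ipynb",
    "plots.ipynb",
]


def build_command(notebook):
    """Build a jupyter nbconvert call that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_pipeline(notebooks=NOTEBOOKS, dry_run=True):
    """Return the commands in order; run them only when dry_run is False.

    check=True stops the pipeline at the first failing notebook, so a
    later step never runs on missing inputs.
    """
    commands = [build_command(nb) for nb in notebooks]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in run_pipeline(dry_run=True):
        print(" ".join(command))
```

The default is a dry run because the generation step calls paid APIs; pass `dry_run=False` only once the key files and credits are in place.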
Citation
Please cite our paper if you find our work useful:
@misc{alkaswan2025codered,
title={Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks},
author={Ali Al-Kaswan and Sebastian Deatc and Begüm Koç and Arie van Deursen and Maliheh Izadi},
year={2025},
eprint={2504.01850},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2504.01850},
}
Owner
- Name: AISE-TUDelft
- Login: AISE-TUDelft
- Kind: organization
- Repositories: 1
- Profile: https://github.com/AISE-TUDelft
GitHub Events
Total
- Member event: 1
- Push event: 1
- Create event: 2
Last Year
- Member event: 1
- Push event: 1
- Create event: 2
Issues and Pull Requests
Last synced: 12 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0