https://github.com/aise-tudelft/codered

Replication package for FSE'25 paper "Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks"

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: AISE-TUDelft
  • License: unlicense
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 599 MB
Statistics
  • Stars: 1
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme

README.md

Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks

Replication Package for the FSE'25 Paper Titled: Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks

Additional visualisations of our data can be found at AISE-TUDelft/Code-Red-Benchmark.

Instructions

To run the project, follow these steps:

  1. Create a virtual environment using conda by running: conda create --name myenv python=3.10.8 (note that Python versions newer than 3.10 are not supported by auto-sklearn).
  2. Activate the virtual environment by running: conda activate myenv.
  3. Install the required Python packages by running: pip install -r requirements.txt.
  4. Create two files containing the respective API keys: openai.key and openrouter.key. Running a single generation with the models we selected requires approximately 1 in OpenAI credits and 30 in OpenRouter credits.
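
Step 4 can be illustrated with a small Python sketch. It is not taken from the notebooks: the helper name read_key is hypothetical, and the sketch assumes each key file holds nothing but the key as plain text.

```python
from pathlib import Path


def read_key(path):
    """Return the API key stored in `path`, stripped of surrounding whitespace.

    Hypothetical helper: assumes the file contains only the key as plain text.
    """
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"Missing API key file: {key_file}")
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file is empty: {key_file}")
    return key
```

Under these assumptions, read_key("openai.key") and read_key("openrouter.key") would return the two keys, and a missing or empty file fails early instead of producing an authentication error mid-run.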

The code is tested on an Ubuntu 20.04 LTS machine with 32GB of RAM and an Intel Core i9-12900HK processor.

Results

The results of the experiments can be found in the ./results folder. Each file contains the results of a single model for a single generation over the entire dataset. The trained classifier can be found in the ./classification_models folder; to save space, we only provide the best-performing model.
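
As a rough sketch of how the result files could be enumerated, the hypothetical helper below lists the files directly inside a results folder. It assumes the per-model files sit at the top level of ./results and makes no assumption about their names or format.

```python
from pathlib import Path


def list_result_files(results_dir="./results"):
    """Return the files directly inside `results_dir`, sorted by path.

    Subfolders (such as ./results/tagged) are skipped, so only the
    top-level files are returned. A missing folder yields an empty list.
    """
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file())
```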

Replication steps

  1. Classifier Training: The classifier.ipynb notebook runs the experiments on the labelled data to create the classifiers. The classifiers are saved in the ./classification_models folder.
  2. Sample Generation: The generation.ipynb notebook runs generation with all the models. The results are saved in the ./results folder.
  3. Sample Tagging: The tagging.ipynb notebook uses the classifier from step 1 to label the samples generated in step 2. The results are saved in the ./results/tagged folder.
  4. Plotting: The plots.ipynb notebook takes all the results and compiles them into the figures used in the paper. Each figure is saved in the ./plots folder.
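
The four steps above could also be run without opening each notebook by hand. The sketch below is one possible way to do so, not part of the replication package: it assumes the jupyter command and nbconvert are installed in the environment, and the names NOTEBOOKS, nbconvert_command, and run_all are hypothetical. Note that executing generation.ipynb consumes API credits.

```python
import subprocess

# Notebook names as listed in the replication steps, in execution order.
NOTEBOOKS = ["classifier.ipynb", "generation.ipynb", "tagging.ipynb", "plots.ipynb"]


def nbconvert_command(notebook):
    """Build a `jupyter nbconvert` command that executes a notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_all(notebooks=NOTEBOOKS, dry_run=True):
    """Build the commands for each notebook and optionally execute them in order.

    With dry_run=True (the default) the commands are only returned, not run.
    With dry_run=False each command runs in turn and the first failure
    raises subprocess.CalledProcessError, so later steps are not started.
    """
    commands = [nbconvert_command(nb) for nb in notebooks]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_all() first shows the commands that would be issued; run_all(dry_run=False) executes them in sequence.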

Citation

Please cite our paper if you find our work useful:

@misc{alkaswan2025codered,
  title={Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks},
  author={Ali Al-Kaswan and Sebastian Deatc and Begüm Koç and Arie van Deursen and Maliheh Izadi},
  year={2025},
  eprint={2504.01850},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2504.01850},
}

Owner

  • Name: AISE-TUDelft
  • Login: AISE-TUDelft
  • Kind: organization

GitHub Events

Total
  • Member event: 1
  • Push event: 1
  • Create event: 2
Last Year
  • Member event: 1
  • Push event: 1
  • Create event: 2

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0