ai_requirements_generation_rr
This repository accompanies the paper “A Case Study on Cyber‑Security Requirement Elicitation: Leveraging Large‑Language‑Model Capabilities.” It contains every script, dataset, prompt template and result needed to fully reproduce our empirical study.
AI-augmented Cybersecurity Requirements Generation using LLMs | Reproducible Research Package
This repository accompanies the paper “Experimental Evaluation of AI-Augmented Cybersecurity Requirements Generation Leveraging LLMs’ Capabilities.” It contains every script, dataset, prompt template and result needed to fully reproduce our empirical study.
Research Description
This project investigates the practical use of state‑of‑the‑art Large Language Models (LLMs) to transform high‑level, standard‑driven cyber‑security controls into concrete, system‑specific requirements. Using a synthetic yet industrially plausible case study (AI4I4, an IoT‑enabled automotive logistics platform), we benchmark thirteen frontier models (GPT‑4, Llama 3, Mistral, Qwen, etc.), representing the state of the art as of September 2024, across four prompting pipelines and three temperature regimes.
Key contributions include:
- Annotated benchmark of 54 ISO‑27002 clauses with placeholder semantics suitable for automatic instantiation.
- LangChain pipelines that decompose the task into applicability filtering, domain‑element search, requirement generation, and JSON formatting.
- Comprehensive evaluation of accuracy (precision, recall, F2), creativity (F2‑synthetic), and consistency (Jaccard overlap across runs).
- Prompt library enumerating >180 templates, showing how subtle changes in instruction design affect hallucination rate and coverage.
The artefacts and scripts below allow full replication—from raw prompts to final figures—on any infrastructure with access to the referenced models.
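As a rough illustration of the four-stage decomposition listed above (applicability filtering, domain-element search, requirement generation, JSON formatting), the following plain-Python sketch chains four toy stages over a shared state dictionary. It is not the repository's LangChain code: every function name, the keyword-matching rules, the control identifier `C-01` and the sample data are hypothetical stand-ins for steps that the real pipelines delegate to an LLM.

```python
import json

def applicability_filter(state: dict) -> dict:
    # Toy rule (assumption): a control applies if one of its keywords occurs in the description.
    text = state["domain"].lower()
    state["applicable"] = any(k in text for k in state["control"]["keywords"])
    return state

def domain_element_search(state: dict) -> dict:
    # Toy rule (assumption): keep the known components that the description mentions.
    text = state["domain"].lower()
    state["elements"] = [c for c in state["components"] if c.lower() in text]
    return state

def requirement_generation(state: dict) -> dict:
    # Instantiate the control's placeholder once per matching domain element.
    template = state["control"]["template"]
    state["requirements"] = [template.format(element=e) for e in state["elements"]]
    return state

def json_formatting(state: dict) -> dict:
    payload = {"control": state["control"]["id"], "requirements": state["requirements"]}
    state["output"] = json.dumps(payload, indent=2)
    return state

def run_pipeline(state: dict) -> dict:
    state = applicability_filter(state)
    if state["applicable"]:
        state = requirement_generation(domain_element_search(state))
    else:
        state["requirements"] = []  # non-applicable controls yield no requirements
    return json_formatting(state)

example = {
    "domain": "Gateways forward sensor data to the cloud backend.",
    "components": ["gateway", "sensor", "warehouse"],
    "control": {
        "id": "C-01",  # hypothetical identifier, not an ISO clause number
        "keywords": ["sensor"],
        "template": "The {element} shall authenticate every connection.",
    },
}
result = run_pipeline(example)
print(result["output"])
```

Keeping each stage as a function from state to state makes it easy to swap a toy rule for a model call without touching the other stages.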
Repository Structure
text
.
├── data/ # Experimental inputs
│ ├── ai4i4.md # Functional specification of the AI4I4 case study
│ ├── annotated_standard_subset.json # Annotated subset of ISO‑27002 clauses
│ └── prompt/ # Prompt templates organised by task and model
├── src/ # LangChain pipelines and helper scripts
│ ├── generate_requirements/ # End‑to‑end automation
│ └── graph/ # Scripts to render result figures
├── results/ # Raw outputs and aggregated metrics
│ ├── requirements/ # Requirement lists (human + models)
│ ├── analysis/ # Coverage, F‑scores, Jaccard, etc.
│ └── graph/ # Re‑generated figures from the manuscript
├── doc/ # Execution logs for every configuration
├── LICENSE, LICENSE_DATA.txt
└── README.md # This document
Getting Started
These steps assume that python3 and pip are installed and correctly configured on your system, and that you have the following, depending on the model(s) you intend to use:
- A valid Hugging Face PRO token.
- Access granted to the intended models on AWS Bedrock.
- A valid OpenAI API key.
- A valid Mistral API key.
You may follow the steps below to set up the environment and run the scripts.
Prerequisites
- Clone this repository locally.
bash
git clone git@github.com:STRAST-UPM/ai_requirements_generation_rr.git
- Change to the generate_requirements directory.
bash
cd src/generate-requirements
- Create a Python virtual environment and activate it (recommended).
bash
python -m venv .venv
source .venv/bin/activate
- Install all required dependencies.
bash
pip install -r requirements.txt
- Create a .env file with the following content (depending on the models you want to use):
bash
HUGGINGFACE_API_TOKEN=<your_token>
MISTRAL_API_TOKEN=<your_token>
OPENAI_API_TOKEN=<your_token>
[!TIP] You may find an example of the .env file at .env.example.
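Before launching a run, it can help to check which of the tokens above are actually available. The sketch below uses only the standard library and assumes the variables have already been loaded into the process environment (for example by python-dotenv, which appears in the dependency list). The helper name `missing_tokens` is hypothetical and not part of the repository.

```python
import os

# Variable names taken from the .env listing above; which ones you need
# depends on the models you intend to call.
TOKEN_VARS = ("HUGGINGFACE_API_TOKEN", "MISTRAL_API_TOKEN", "OPENAI_API_TOKEN")

def missing_tokens(env=os.environ, names=TOKEN_VARS):
    """Return the names that are unset, empty, or still the <your_token> placeholder."""
    missing = []
    for name in names:
        value = env.get(name, "").strip()
        if not value or value == "<your_token>":
            missing.append(name)
    return missing

if __name__ == "__main__":
    for name in missing_tokens():
        print(f"warning: {name} is not set")
```

A missing token is only a problem if the selected chain uses the corresponding provider, so the sketch warns rather than aborts.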
- If you want to use models provided by AWS, configure AWS CLI with the credentials provided by the AWS administration console.
bash
aws configure
Execution
Generation of Cybersecurity Requirements
To generate cybersecurity requirements for a given system description, use the /src/generate-requirements/main.py script. It accepts the following parameters:
-s STANDARDS, the path of the .json file containing the adapted cybersecurity standards.
-d DOMAIN, the path of the .md file containing the system description.
-o OUTPUT, the path of the folder where the generated cybersecurity requirements (a .json file) and the execution details are written.
-c CHAIN, the name of the LangChain chain topology declaration to use (located at /src/generate-requirements/templates/chain).
--help, to show the help message for the script.
Example:
bash
python main.py \
--standards ../../data/annotated_standard_subset.json \
--domain ../../data/ai4i4.md \
--output ../../results/requirements \
--chain cot_llama
[!IMPORTANT] In its default configuration, the requirements generation script makes use of the meta.llama3-1-405b-instruct-v1:0 model provided by AWS for serverless inference.
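For readers who want to wrap or script the command above, an argument parser with the same short and long options could look like the following. This is an illustrative reconstruction based only on the options documented here, not the repository's actual main.py. Marking all four options as required is an assumption, since the note above suggests the real script has defaults.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser mirroring the documented options; not the real main.py.
    parser = argparse.ArgumentParser(
        description="Generate cybersecurity requirements for a system description."
    )
    parser.add_argument("-s", "--standards", required=True,
                        help="path of the .json file with the adapted standards")
    parser.add_argument("-d", "--domain", required=True,
                        help="path of the .md file with the system description")
    parser.add_argument("-o", "--output", required=True,
                        help="folder for the generated requirements and execution details")
    parser.add_argument("-c", "--chain", required=True,
                        help="name of the chain topology declaration to use")
    return parser

# Parse the same arguments as the example invocation above.
args = build_parser().parse_args([
    "--standards", "../../data/annotated_standard_subset.json",
    "--domain", "../../data/ai4i4.md",
    "--output", "../../results/requirements",
    "--chain", "cot_llama",
])
print(args.chain)
```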
Key Artifacts
| Path | Brief description |
| ---------------------------------- | ----------------------------------------------------- |
| data/ai4i4.md | System specification of the pilot use‑case. |
| data/annotated_standard_subset.json | Parameterised ISO‑27002 controls. |
| data/prompt/** | 180+ prompt templates, categorised by task and model. |
| results/analysis/summary.csv | Precision, recall, F2 and relative F2 for every run. |
| results/analysis/consistency.csv | Jaccard indices across successive runs. |
| doc/*_execution_details.md | Detailed execution logs per configuration. |
[!IMPORTANT] Complete dataset datasheets are provided in the data/README.md and results/README.md files.
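The precision, recall and F2 figures mentioned for results/analysis/summary.csv follow the standard F-beta definition with beta = 2, which weights recall more heavily than precision. A minimal sketch is given below, assuming requirements can be compared as sets of identifiers. How the study actually matches generated requirements against the reference list is not described here, and the function name is hypothetical.

```python
def precision_recall_fbeta(generated, reference, beta=2.0):
    """Return (precision, recall, F-beta) for two collections of requirement IDs."""
    generated, reference = set(generated), set(reference)
    true_positives = len(generated & reference)
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta

# Toy identifiers, not taken from the dataset.
p, r, f2 = precision_recall_fbeta({"R1", "R2", "R3", "R4"}, {"R1", "R2", "R5"})
print(f"precision={p:.3f} recall={r:.3f} F2={f2:.3f}")
# precision=0.500 recall=0.667 F2=0.625
```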
Reproducibility Notes
- Determinism: Because of the inherent stochasticity of LLMs, results may vary across runs. Please refer to the consistency metrics in results/analysis/consistency.csv to assess stability.
- Data licensing: ISO‑27002 excerpts are replaced by identifiers to comply with copyright; users must possess the full standard.
- Model access: Some models (e.g., GPT‑4, Mistral) require API keys or specific access permissions. Ensure you have the necessary credentials before running the scripts.
- Environment: The scripts are tested on Python 3.10+ with the dependencies listed in requirements.txt. Ensure your environment matches these specifications to avoid compatibility issues.
[!IMPORTANT] Model selection references and rationale are documented in doc/selectionofmodels.md.
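The consistency metric referred to in the Determinism note is the Jaccard index, i.e. the size of the intersection of two requirement sets divided by the size of their union. The sketch below averages the index over all pairs of runs. That aggregation is an assumption (the study may instead compare successive runs only), and the function names and sample identifiers are hypothetical.

```python
from itertools import combinations

def jaccard(a, b) -> float:
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty outputs are considered identical
    return len(a & b) / len(a | b)

def mean_pairwise_jaccard(runs) -> float:
    """Average Jaccard index over every unordered pair of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("at least two runs are needed")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Three toy runs of the same configuration.
runs = [{"R1", "R2", "R3"}, {"R1", "R2"}, {"R2", "R3", "R4"}]
print(round(mean_pairwise_jaccard(runs), 3))  # 0.472
```

To compare successive runs only, replace `combinations(runs, 2)` with `zip(runs, runs[1:])`.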
Ethics and Intended Use
This research is conducted under the principles of responsible AI. The generated requirements are intended for educational and research purposes only. Users must ensure compliance with local laws and ethical guidelines when applying these results in real-world scenarios.
Any use involving production compliance auditing, legal certification, or critical system design should involve human oversight and validation by qualified cybersecurity professionals.
Version History
| Version | Date       | Highlights              |
| ------- | ---------- | ----------------------- |
| 1.0     | 2025-07-31 | Initial public release. |
License and Citation
This repository uses two licenses:
- Software: Proprietary license — personal, non-commercial research use only; no modification, redistribution, or commercial use permitted (see LICENSE).
- Data: Creative Commons Attribution 4.0 International (CC BY 4.0) (see LICENSE_DATA.txt).
If you use this repository in your research, please cite it as follows:
bibtex
@misc{llmsec2025iso,
author={Yelmo, Juan Carlos and Martín, Yod-Samuel and Perez-Acuna, Santiago},
title={A Case Study on AI-augmented Cybersecurity Requirements Generation leveraging LLMs Capabilities | Reproducible Research Package},
year={2025},
url={https://github.com/STRAST-UPM/ai_requirements_generation_rr},
doi={10.5281/zenodo.15641295},
version={1.0},
}
Contact
Juan Carlos Yelmo García - juancarlos.yelmo@upm.es
Yod Samuel Martín García - ys.martin@upm.es
Santiago Pérez Acuña - santiago.perez.acuna@upm.es
Last updated: 2025-07-31
Citation (CITATION.cff)
cff-version: 1.2.0
title: "A Case Study on AI-augmented Cybersecurity Requirements Generation leveraging LLMs Capabilities | Reproducible Research Package"
message: "Please cite this repository using the metadata from `preferred-citation` in CITATION.cff"
type: data
authors:
- family-names: Yelmo
given-names: Juan Carlos
orcid: "0000-0001-7491-0961"
affiliation: "Universidad Politécnica de Madrid"
- family-names: Martín
given-names: Yod-Samuel
orcid: "0000-0002-0065-5117"
affiliation: "Universidad Politécnica de Madrid"
- family-names: Perez-Acuna
given-names: Santiago
orcid: "0009-0006-8305-2325"
affiliation: "Universidad Politécnica de Madrid"
identifiers:
- type: doi
value: 10.5281/zenodo.15641295
description: "Zenodo"
license:
- spdx: LicenseRef-Proprietary
- spdx: CC-BY-4.0
version: 1.0.0
date-released: 2025-06-30
url: "https://github.com/STRAST-UPM/ai_requirements_generation_rr"
Dependencies
- Jinja2 ==3.1.6
- MarkupSafe ==3.0.2
- PyYAML ==6.0.2
- SQLAlchemy ==2.0.41
- aiohappyeyeballs ==2.6.1
- aiohttp ==3.12.8
- aiosignal ==1.3.2
- annotated-types ==0.7.0
- anyio ==4.9.0
- attrs ==25.3.0
- boto3 ==1.38.29
- botocore ==1.38.29
- certifi ==2025.4.26
- charset-normalizer ==3.4.2
- dataclasses-json ==0.6.7
- filelock ==3.18.0
- frozenlist ==1.6.2
- fsspec ==2025.5.1
- greenlet ==3.2.2
- h11 ==0.16.0
- hf-xet ==1.1.3
- httpcore ==1.0.9
- httpx ==0.28.1
- httpx-sse ==0.4.0
- huggingface-hub ==0.32.4
- idna ==3.10
- jmespath ==1.0.1
- joblib ==1.5.1
- jsonpatch ==1.33
- jsonpointer ==3.0.0
- langchain ==0.3.25
- langchain-aws ==0.2.24
- langchain-community ==0.3.24
- langchain-core ==0.3.63
- langchain-huggingface ==0.2.0
- langchain-mistralai ==0.2.10
- langchain-text-splitters ==0.3.8
- langsmith ==0.3.44
- marshmallow ==3.26.1
- mpmath ==1.3.0
- multidict ==6.4.4
- mypy_extensions ==1.1.0
- networkx ==3.5
- numpy ==1.26.4
- nvidia-cublas-cu12 ==12.6.4.1
- nvidia-cuda-cupti-cu12 ==12.6.80
- nvidia-cuda-nvrtc-cu12 ==12.6.77
- nvidia-cuda-runtime-cu12 ==12.6.77
- nvidia-cudnn-cu12 ==9.5.1.17
- nvidia-cufft-cu12 ==11.3.0.4
- nvidia-cufile-cu12 ==1.11.1.6
- nvidia-curand-cu12 ==10.3.7.77
- nvidia-cusolver-cu12 ==11.7.1.2
- nvidia-cusparse-cu12 ==12.5.4.2
- nvidia-cusparselt-cu12 ==0.6.3
- nvidia-nccl-cu12 ==2.26.2
- nvidia-nvjitlink-cu12 ==12.6.85
- nvidia-nvtx-cu12 ==12.6.77
- orjson ==3.10.18
- packaging ==24.2
- pillow ==11.2.1
- propcache ==0.3.1
- pydantic ==2.11.5
- pydantic-settings ==2.9.1
- pydantic_core ==2.33.2
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.1.0
- regex ==2024.11.6
- requests ==2.32.3
- requests-toolbelt ==1.0.0
- s3transfer ==0.13.0
- safetensors ==0.5.3
- scikit-learn ==1.6.1
- scipy ==1.15.3
- sentence-transformers ==4.1.0
- six ==1.17.0
- sniffio ==1.3.1
- sympy ==1.14.0
- tenacity ==9.1.2
- threadpoolctl ==3.6.0
- tokenizers ==0.21.1
- torch ==2.7.0
- tqdm ==4.67.1
- transformers ==4.52.4
- triton ==3.3.0
- typing-inspect ==0.9.0
- typing-inspection ==0.4.1
- typing_extensions ==4.14.0
- urllib3 ==2.4.0
- yarl ==1.20.0
- zstandard ==0.23.0
- contourpy ==1.3.2
- cycler ==0.12.1
- fonttools ==4.58.2
- kiwisolver ==1.4.8
- matplotlib ==3.10.3
- numpy ==2.3.0
- packaging ==25.0
- pillow ==11.2.1
- pyparsing ==3.2.3
- python-dateutil ==2.9.0.post0
- six ==1.17.0