ai_requirements_generation_rr

This repository accompanies the paper “A Case Study on Cyber‑Security Requirement Elicitation: Leveraging Large‑Language‑Model Capabilities.” It contains every script, dataset, prompt template and result needed to fully reproduce our empirical study.

https://github.com/strast-upm/ai_requirements_generation_rr


Repository

Basic Info
  • Host: GitHub
  • Owner: STRAST-UPM
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 2.7 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 10 months ago

README.md

AI-augmented Cybersecurity Requirements Generation using LLMs | Reproducible Research Package

This repository accompanies the paper “Experimental Evaluation of AI-Augmented Cybersecurity Requirements Generation Leveraging LLMs’ Capabilities.” It contains every script, dataset, prompt template and result needed to fully reproduce our empirical study.

Research Description

This project investigates the practical use of state-of-the-art Large Language Models (LLMs) to transform high-level, standard-driven cybersecurity controls into concrete, system-specific requirements. Using a synthetic yet industrially plausible case study (AI4I4, an IoT-enabled automotive logistics platform), we benchmark thirteen frontier models (GPT-4, Llama 3, Mistral, Qwen, etc.), representing the state of the art as of September 2024, across four prompting pipelines and three temperature regimes.

Key contributions include:

  1. Annotated benchmark of 54 ISO‑27002 clauses with placeholder semantics suitable for automatic instantiation.
  2. LangChain pipelines that decompose the task into applicability filtering, domain‑element search, requirement generation, and JSON formatting.
  3. Comprehensive evaluation of accuracy (precision, recall, F2), creativity (F2‑synthetic), and consistency (Jaccard overlap across runs).
  4. Prompt library enumerating >180 templates, showing how subtle changes in instruction design affect hallucination rate and coverage.
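The accuracy metrics named in contribution 3 can be illustrated with a small sketch. It assumes that generated and reference requirements are compared as sets of identifiers; the function name `precision_recall_fbeta` and the example identifiers are hypothetical and are not taken from the repository's analysis scripts. F2 is the F-beta score with beta = 2, which weights recall more heavily than precision.

```python
def precision_recall_fbeta(generated, reference, beta=2.0):
    """Return (precision, recall, F-beta) for two collections of identifiers."""
    generated, reference = set(generated), set(reference)
    true_positives = len(generated & reference)
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision == 0.0 and recall == 0.0:
        # Avoid a division by zero when nothing matches.
        return precision, recall, 0.0
    beta_sq = beta ** 2
    f_beta = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
    return precision, recall, f_beta


# Hypothetical identifiers, for illustration only.
reference = {"R1", "R2", "R3", "R4"}
generated = {"R1", "R2", "R5"}
p, r, f2 = precision_recall_fbeta(generated, reference)
print(f"precision={p:.3f} recall={r:.3f} F2={f2:.3f}")
```

With beta = 2, a model that misses applicable requirements is penalised more than one that proposes a few superfluous ones.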

The artefacts and scripts below allow full replication—from raw prompts to final figures—on any infrastructure with access to the referenced models.

Repository Structure

    .
    ├── data/                               # Experimental inputs
    │   ├── ai4i4.md                        # Functional specification of the AI4I4 case study
    │   ├── annotated_standard_subset.json  # Annotated subset of ISO-27002 clauses
    │   └── prompt/                         # Prompt templates organised by task and model
    ├── src/                                # LangChain pipelines and helper scripts
    │   ├── generate_requirements/          # End-to-end automation
    │   └── graph/                          # Scripts to render result figures
    ├── results/                            # Raw outputs and aggregated metrics
    │   ├── requirements/                   # Requirement lists (human + models)
    │   ├── analysis/                       # Coverage, F-scores, Jaccard, etc.
    │   └── graph/                          # Re-generated figures from the manuscript
    ├── doc/                                # Execution logs for every configuration
    ├── LICENSE, LICENSE_DATA.txt
    └── README.md                           # This document

Getting Started

The steps below assume that python3 and pip are installed and correctly configured, and that you hold the credentials required by the model(s) you intend to use (the API tokens and AWS configuration described in the prerequisites). Follow them to set up the environment and run the scripts.

Prerequisites

  1. Clone this repository locally.

      git clone git@github.com:STRAST-UPM/ai_requirements_generation_rr.git

  2. Change to the generate_requirements directory.

      cd ai_requirements_generation_rr/src/generate_requirements

  3. Create a Python virtual environment and activate it (recommended).

      python3 -m venv .venv
      source .venv/bin/activate

  4. Install all required dependencies.

      pip install -r requirements.txt

  5. Create a .env file with the following content (depending on the models you want to use):

      HUGGINGFACE_API_TOKEN=<your_token>
      MISTRAL_API_TOKEN=<your_token>
      OPENAI_API_TOKEN=<your_token>

[!TIP] You may find an example of the .env file at .env.example.

  6. If you want to use models provided by AWS, configure the AWS CLI with the credentials provided by the AWS administration console.

      aws configure
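Before launching a run, it can help to check which of the tokens from the .env step are actually available. The following is a minimal sketch using only the standard library; the helper names `read_env_file` and `missing_tokens` are hypothetical, and the way the repository's own scripts load the .env file is not shown here (python-dotenv appears in the pinned dependencies, so they may rely on it).

```python
import os
from pathlib import Path

# Variable names as listed in the .env step above.
TOKEN_VARIABLES = ("HUGGINGFACE_API_TOKEN", "MISTRAL_API_TOKEN", "OPENAI_API_TOKEN")


def read_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_tokens(path=".env", environ=None):
    """Return the token variables set neither in the file nor in the environment."""
    environ = os.environ if environ is None else environ
    file_values = read_env_file(path)
    return [name for name in TOKEN_VARIABLES
            if not (file_values.get(name) or environ.get(name))]


if __name__ == "__main__":
    for name in missing_tokens():
        print(f"{name} is not set; models that depend on it will be unavailable")
```

A missing token is only a problem for the providers you intend to use, so the check reports rather than fails.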

Execution

Generation of Cybersecurity Requirements

To generate cybersecurity requirements for a given system description, use the src/generate_requirements/main.py script. It accepts the following parameters:

  • -s STANDARDS, --standards STANDARDS: path of the .json file containing the adapted cybersecurity standards.
  • -d DOMAIN, --domain DOMAIN: path of the .md file containing the system description.
  • -o OUTPUT, --output OUTPUT: path of the folder where the generated cybersecurity requirements (a .json file) and the execution details are written.
  • -c CHAIN, --chain CHAIN: name of the LangChain chain topology declaration to use (located at src/generate_requirements/templates/chain).
  • --help: show the help message for the script.

Example:

    python main.py \
      --standards ../../data/annotated_standard_subset.json \
      --domain ../../data/ai4i4.md \
      --output ../../results/requirements \
      --chain cot_llama

[!IMPORTANT] In its default configuration, the requirements generation script makes use of the meta.llama3-1-405b-instruct-v1:0 model provided by AWS for serverless inference.
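To repeat the generation step for several chain declarations, the command above can be assembled programmatically. This is a sketch under stated assumptions: `build_command` and `run_chains` are hypothetical helpers, not part of the repository, and the script is assumed to be started from the src/generate_requirements directory, as in the example. It defaults to a dry run that only returns the commands.

```python
import subprocess
import sys


def build_command(chain, standards="../../data/annotated_standard_subset.json",
                  domain="../../data/ai4i4.md", output="../../results/requirements"):
    """Assemble the argument list for one run of main.py."""
    return [
        sys.executable, "main.py",
        "--standards", standards,
        "--domain", domain,
        "--output", output,
        "--chain", chain,
    ]


def run_chains(chains, dry_run=True):
    """Run main.py once per chain; with dry_run=True only return the commands."""
    commands = [build_command(chain) for chain in chains]
    if not dry_run:
        for command in commands:
            # check=True stops the batch at the first failing run.
            subprocess.run(command, check=True)
    return commands


# "cot_llama" is the only chain name shown in this README.
for command in run_chains(["cot_llama"]):
    print(" ".join(command))
```

Whether successive runs overwrite each other's files in the output folder is not stated here, so passing a separate `output` path per run may be prudent.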

Key Artifacts

| Path                                | Brief description                                     |
| ----------------------------------- | ----------------------------------------------------- |
| data/ai4i4.md                       | System specification of the pilot use-case.           |
| data/annotated_standard_subset.json | Parameterised ISO-27002 controls.                     |
| data/prompt/**                      | 180+ prompt templates, categorised by task and model. |
| results/analysis/summary.csv        | Precision, recall, F2 and relative F2 for every run.  |
| results/analysis/consistency.csv    | Jaccard indices across successive runs.               |
| doc/*_execution_details.md          | Detailed execution logs per configuration.            |

[!IMPORTANT] Complete dataset datasheets are provided in the data/README.md and results/README.md files.
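The column layout of the two CSV files is documented in the datasheets rather than here, so a first look is best taken without assuming column names. The sketch below assumes only that each file is comma-separated with a header row; `describe_csv` is a hypothetical helper.

```python
import csv
from pathlib import Path


def describe_csv(path):
    """Return (column names, number of data rows) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


# Paths from the table above, relative to the repository root.
for path in ("results/analysis/summary.csv", "results/analysis/consistency.csv"):
    if Path(path).is_file():
        columns, rows = describe_csv(path)
        print(f"{path}: {rows} rows, columns = {columns}")
```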

Reproducibility Notes

  • Determinism: Because LLMs are inherently stochastic, results may vary across runs. Refer to the consistency metrics in results/analysis/consistency.csv to assess how stable the outputs are.
  • Data licensing: ISO-27002 excerpts are replaced by identifiers to comply with copyright; users must possess the full standard.
  • Model access: Some models (e.g., GPT-4, Mistral) require API keys or specific access permissions. Ensure you have the necessary credentials before running the scripts.
  • Environment: The scripts are tested on Python 3.10+ with the dependencies listed in requirements.txt. Match these specifications to avoid compatibility issues.
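The consistency metric mentioned under Determinism can be sketched as follows. The example assumes that each run is reduced to a set of requirement identifiers and that consistency is summarised as the mean Jaccard index over all pairs of runs; the aggregation actually used to produce consistency.csv is not described here, and the function names and identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(runs):
    """Average the Jaccard index over every pair of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("at least two runs are needed")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs of three runs with the same configuration.
runs = [{"R1", "R2", "R3"}, {"R1", "R2"}, {"R1", "R2", "R3", "R4"}]
print(f"mean pairwise Jaccard = {mean_pairwise_jaccard(runs):.3f}")
```

A value of 1.0 means every run produced the same set; lower values indicate that repeated runs of one configuration diverge.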

[!IMPORTANT] Model selection references and rationale are documented in doc/selectionofmodels.md.

Ethics and Intended Use

This research is conducted under the principles of responsible AI. The generated requirements are intended for educational and research purposes only. Users must ensure compliance with local laws and ethical guidelines when applying these results in real-world scenarios.

Any use involving production compliance auditing, legal certification, or critical system design should involve human oversight and validation by qualified cybersecurity professionals.

Version History

| Version | Date       | Highlights              |
| ------- | ---------- | ----------------------- |
| 1.0     | 2025-07-31 | Initial public release. |

License and Citation

This repository uses two licenses:

  • Software: Proprietary license for personal, non-commercial research use only; no modification, redistribution, or commercial use permitted (see LICENSE).
  • Data: Creative Commons Attribution 4.0 International (CC BY 4.0) (see LICENSE_DATA.txt).

If you use this repository in your research, please cite it as follows:

    @misc{llmsec2025iso,
      author={Yelmo, Juan Carlos and Martín, Yod-Samuel and Perez-Acuna, Santiago},
      title={A Case Study on AI-augmented Cybersecurity Requirements Generation leveraging LLMs Capabilities | Reproducible Research Package},
      year={2025},
      url={https://github.com/STRAST-UPM/ai_requirements_generation_rr},
      doi={10.5281/zenodo.15641295},
      version={1.0},
    }

Contact

Juan Carlos Yelmo García - juancarlos.yelmo@upm.es

Yod Samuel Martín García - ys.martin@upm.es

Santiago Pérez Acuña - santiago.perez.acuna@upm.es


Last updated: 2025-07-31

Owner

  • Name: STRAST-UPM
  • Login: STRAST-UPM
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: "A Case Study on AI-augmented Cybersecurity Requirements Generation leveraging LLMs Capabilities | Reproducible Research Package"
message: "Please cite this repository using the metadata from `preferred-citation` in CITATION.cff"
type: data
authors:
  - family-names: Yelmo
    given-names: Juan Carlos
    orcid: "0000-0001-7491-0961"
    affiliation: "Universidad Politécnica de Madrid"
  - family-names: Martín
    given-names: Yod-Samuel
    orcid: "0000-0002-0065-5117"
    affiliation: "Universidad Politécnica de Madrid"
  - family-names: Perez-Acuna
    given-names: Santiago
    orcid: "0009-0006-8305-2325"
    affiliation: "Universidad Politécnica de Madrid"
identifiers:
  - type: doi
    value: 10.5281/zenodo.15641295
    description: "Zenodo"
license:
  - spdx: LicenseRef-Proprietary
  - spdx: CC-BY-4.0
version: 1.0.0
date-released: 2025-06-30
url: "https://github.com/STRAST-UPM/ai_requirements_generation_rr"

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2

Dependencies

src/generate_requirements/requirements.txt pypi
  • Jinja2 ==3.1.6
  • MarkupSafe ==3.0.2
  • PyYAML ==6.0.2
  • SQLAlchemy ==2.0.41
  • aiohappyeyeballs ==2.6.1
  • aiohttp ==3.12.8
  • aiosignal ==1.3.2
  • annotated-types ==0.7.0
  • anyio ==4.9.0
  • attrs ==25.3.0
  • boto3 ==1.38.29
  • botocore ==1.38.29
  • certifi ==2025.4.26
  • charset-normalizer ==3.4.2
  • dataclasses-json ==0.6.7
  • filelock ==3.18.0
  • frozenlist ==1.6.2
  • fsspec ==2025.5.1
  • greenlet ==3.2.2
  • h11 ==0.16.0
  • hf-xet ==1.1.3
  • httpcore ==1.0.9
  • httpx ==0.28.1
  • httpx-sse ==0.4.0
  • huggingface-hub ==0.32.4
  • idna ==3.10
  • jmespath ==1.0.1
  • joblib ==1.5.1
  • jsonpatch ==1.33
  • jsonpointer ==3.0.0
  • langchain ==0.3.25
  • langchain-aws ==0.2.24
  • langchain-community ==0.3.24
  • langchain-core ==0.3.63
  • langchain-huggingface ==0.2.0
  • langchain-mistralai ==0.2.10
  • langchain-text-splitters ==0.3.8
  • langsmith ==0.3.44
  • marshmallow ==3.26.1
  • mpmath ==1.3.0
  • multidict ==6.4.4
  • mypy_extensions ==1.1.0
  • networkx ==3.5
  • numpy ==1.26.4
  • nvidia-cublas-cu12 ==12.6.4.1
  • nvidia-cuda-cupti-cu12 ==12.6.80
  • nvidia-cuda-nvrtc-cu12 ==12.6.77
  • nvidia-cuda-runtime-cu12 ==12.6.77
  • nvidia-cudnn-cu12 ==9.5.1.17
  • nvidia-cufft-cu12 ==11.3.0.4
  • nvidia-cufile-cu12 ==1.11.1.6
  • nvidia-curand-cu12 ==10.3.7.77
  • nvidia-cusolver-cu12 ==11.7.1.2
  • nvidia-cusparse-cu12 ==12.5.4.2
  • nvidia-cusparselt-cu12 ==0.6.3
  • nvidia-nccl-cu12 ==2.26.2
  • nvidia-nvjitlink-cu12 ==12.6.85
  • nvidia-nvtx-cu12 ==12.6.77
  • orjson ==3.10.18
  • packaging ==24.2
  • pillow ==11.2.1
  • propcache ==0.3.1
  • pydantic ==2.11.5
  • pydantic-settings ==2.9.1
  • pydantic_core ==2.33.2
  • python-dateutil ==2.9.0.post0
  • python-dotenv ==1.1.0
  • regex ==2024.11.6
  • requests ==2.32.3
  • requests-toolbelt ==1.0.0
  • s3transfer ==0.13.0
  • safetensors ==0.5.3
  • scikit-learn ==1.6.1
  • scipy ==1.15.3
  • sentence-transformers ==4.1.0
  • six ==1.17.0
  • sniffio ==1.3.1
  • sympy ==1.14.0
  • tenacity ==9.1.2
  • threadpoolctl ==3.6.0
  • tokenizers ==0.21.1
  • torch ==2.7.0
  • tqdm ==4.67.1
  • transformers ==4.52.4
  • triton ==3.3.0
  • typing-inspect ==0.9.0
  • typing-inspection ==0.4.1
  • typing_extensions ==4.14.0
  • urllib3 ==2.4.0
  • yarl ==1.20.0
  • zstandard ==0.23.0
src/graph/requirements.txt pypi
  • contourpy ==1.3.2
  • cycler ==0.12.1
  • fonttools ==4.58.2
  • kiwisolver ==1.4.8
  • matplotlib ==3.10.3
  • numpy ==2.3.0
  • packaging ==25.0
  • pillow ==11.2.1
  • pyparsing ==3.2.3
  • python-dateutil ==2.9.0.post0
  • six ==1.17.0