vulntrain

A tool to generate datasets and models based on vulnerability descriptions from @Vulnerability-Lookup.

https://github.com/vulnerability-lookup/vulntrain

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
○ Academic publication links
✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.1%) to scientific vocabulary

Keywords

dataset llm nlp text-generation vulnerability vulnerability-lookup

Last synced: 6 months ago

Repository

A tool to generate datasets and models based on vulnerability descriptions from @Vulnerability-Lookup.

Basic Info

Host: GitHub
Owner: vulnerability-lookup
License: gpl-3.0
Language: Python
Default Branch: main
Homepage: https://pypi.org/project/VulnTrain
Size: 948 KB

Statistics

Stars: 13
Watchers: 4
Forks: 2
Open Issues: 3
Releases: 13

Topics

dataset llm nlp text-generation vulnerability vulnerability-lookup

Created about 1 year ago · Last pushed 6 months ago

Metadata Files

Readme Changelog License Citation Authors

VulnTrain

VulnTrain offers a suite of commands to generate diverse AI datasets and train models using comprehensive vulnerability data from Vulnerability-Lookup. It harnesses over one million JSON records from all supported advisory sources to build high-quality, domain-specific models.

Additionally, data from the vulnerability-lookup:meta container, including enrichment sources such as vulnrichment and Fraunhofer FKIE, is incorporated to enhance model quality.

Check out the datasets and models on Hugging Face.

For more information about the use of AI in Vulnerability-Lookup, please refer to the user manual.

Usage

Install VulnTrain:

$ pipx install VulnTrain

Three types of commands are available:

- Dataset generation: Create and prepare datasets.
- Model training: Train models using the prepared datasets.
  - Train a model to classify vulnerabilities by severity.
  - Train a model for text generation to assist in writing vulnerability descriptions.
- Model validation: Assess the performance of trained models (validations, benchmarks, etc.).
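
As a rough illustration of the first and third command types, the sketch below turns vulnerability records into labelled rows for severity classification and scores a set of predictions by accuracy. It is not VulnTrain's code. The record fields (`id`, `description`, `cvss_score`), the helper names, and the sample records are all hypothetical, and the score bands are the CVSS v3.x qualitative severity ratings, which may differ from the labels VulnTrain actually uses.

```python
from typing import Iterable

# CVSS v3.x qualitative severity bands as (inclusive upper bound, label).
_BANDS = [(0.0, "None"), (3.9, "Low"), (6.9, "Medium"), (8.9, "High"), (10.0, "Critical")]


def severity_from_cvss(score: float) -> str:
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    for upper, label in _BANDS:
        if score <= upper:
            return label
    raise AssertionError("unreachable: the last band covers 10.0")


def build_rows(records: Iterable[dict]) -> list[dict]:
    """Build labelled rows from records (hypothetical record shape)."""
    rows = []
    for rec in records:
        text = (rec.get("description") or "").strip()
        score = rec.get("cvss_score")
        if not text or score is None:
            continue  # skip records that cannot be labelled
        rows.append(
            {"id": rec.get("id"), "text": text, "label": severity_from_cvss(float(score))}
        )
    return rows


def accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


# Made-up records for illustration only.
records = [
    {"id": "EXAMPLE-0001", "description": "Buffer overflow in a parser.", "cvss_score": 9.8},
    {"id": "EXAMPLE-0002", "description": "  ", "cvss_score": 5.0},  # dropped: empty text
    {"id": "EXAMPLE-0003", "description": "Reflected XSS in a search form.", "cvss_score": 6.1},
]
rows = build_rows(records)
```

Deriving the label from a numeric score keeps the dataset step independent of any particular model. The same rows could feed either a classifier or a held-out evaluation set.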

Check out the documentation for more information.

How to cite

Bonhomme, C., & Dulaunoy, A. (2025). VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (Version 1.4.0) [Computer software]. https://doi.org/10.48550/arXiv.2507.03607

@misc{bonhomme2025vlai,
  title={VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification},
  author={Cédric Bonhomme and Alexandre Dulaunoy},
  year={2025},
  eprint={2507.03607},
  archivePrefix={arXiv},
  primaryClass={cs.CR}
}

License

VulnTrain is licensed under the GNU General Public License version 3.

~~~
Copyright (c) 2025 Computer Incident Response Center Luxembourg (CIRCL)
Copyright (C) 2025 Cédric Bonhomme - https://github.com/cedricbonhomme
Copyright (C) 2025 Léa Ulusan - https://github.com/3LS3-1F
~~~

Owner

Name: Vulnerability-Lookup
Login: vulnerability-lookup
Kind: organization
Email: info@circl.lu

Website: https://www.vulnerability-lookup.org
Repositories: 1
Profile: https://github.com/vulnerability-lookup

Vulnerability-Lookup facilitates quick correlation of vulnerabilities from various sources, independent of vulnerability IDs.

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use VulnTrain or one of our models, please cite the following work."
title: "VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification"
version: 1.4.0
doi: 10.48550/arXiv.2507.03607
url: https://www.vulnerability-lookup.org
repository-code: https://github.com/vulnerability-lookup/VulnTrain
date-released: 2025-07-04
abstract: >
  This paper presents VLAI, a transformer-based model that predicts software vulnerability severity levels 
  directly from text descriptions. Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities 
  and achieves over 82% accuracy in predicting severity categories, enabling faster and more consistent triage 
  ahead of manual CVSS scoring. The model and dataset are open-source and integrated into the Vulnerability-Lookup service.
authors:
  - family-names: Bonhomme
    given-names: Cédric
    orcid: https://orcid.org/0009-0003-7679-0109
  - family-names: Dulaunoy
    given-names: Alexandre
    orcid: https://orcid.org/0000-0002-5437-4652

GitHub Events

Total

Create event: 20
Issues event: 4
Release event: 13
Watch event: 12
Delete event: 2
Member event: 1
Issue comment event: 4
Push event: 125
Pull request event: 4
Fork event: 2

Last Year

Create event: 20
Issues event: 4
Release event: 13
Watch event: 12
Delete event: 2
Member event: 1
Issue comment event: 4
Push event: 125
Pull request event: 4
Fork event: 2

Committers

Last synced: 9 months ago

All Time

Total Commits: 72
Total Committers: 2
Avg Commits per committer: 36.0
Development Distribution Score (DDS): 0.014

Past Year

Commits: 72
Committers: 2
Avg Commits per committer: 36.0
Development Distribution Score (DDS): 0.014

Top Committers

Name	Email	Commits
Cédric Bonhomme	c**c@c**g	71
Else-If-05	l**n@e**r	1
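
The committer figures above can be reproduced from the commit counts in the table. This is a minimal sketch, assuming the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is an assumption, but it matches the reported 0.014. The function name is illustrative.

```python
def development_distribution_score(commits_per_committer: list[int]) -> float:
    """DDS = 1 - (commits by the top committer / total commits)."""
    total = sum(commits_per_committer)
    if total <= 0:
        raise ValueError("at least one commit is required")
    return 1 - max(commits_per_committer) / total


commits = [71, 1]  # commit counts from the Top Committers table
dds = development_distribution_score(commits)
average = sum(commits) / len(commits)

print(round(dds, 3), average)  # prints: 0.014 36.0
```

A score near 0 means one person wrote almost all commits. A score near 1 means the work is spread across many committers.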

Committer Domains (Top 20 + Academic)

edu.ece.fr: 1
cedricbonhomme.org: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 5
Total pull requests: 7
Average time to close issues: N/A
Average time to close pull requests: about 3 hours
Total issue authors: 2
Total pull request authors: 3
Average comments per issue: 0.8
Average comments per pull request: 0.43
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 5
Pull requests: 7
Average time to close issues: N/A
Average time to close pull requests: about 3 hours
Issue authors: 2
Pull request authors: 3
Average comments per issue: 0.8
Average comments per pull request: 0.43
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

cedricbonhomme (4)
3LS3-1F (1)

Pull Request Authors

3LS3-1F (3)
Else-If-05 (2)
cedricbonhomme (2)

Top Labels

Issue Labels

enhancement (3)
dataset (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 60 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 14
Total maintainers: 1

pypi.org: vulntrain

Generate datasets and models based on vulnerabilities data from Vulnerability-Lookup.

Homepage: https://github.com/vulnerability-lookup/VulnTrain
Documentation: https://vulntrain.readthedocs.io/
License: GPL-3.0-or-later
Latest release: 2.0.0 (published 6 months ago)

Versions: 14
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 60 Last month

Rankings

Dependent packages count: 9.6%

Average: 31.8%

Dependent repos count: 54.1%

Maintainers (1)

cedricbonhomme

Last synced: 6 months ago

Dependencies

.github/workflows/release.yml actions

actions/checkout v4 composite
pypa/gh-action-pypi-publish release/v1 composite

poetry.lock pypi

aiohappyeyeballs 2.4.6
aiohttp 3.11.12
aiosignal 1.3.2
async-timeout 5.0.1
attrs 25.1.0
certifi 2025.1.31
charset-normalizer 3.4.1
click 8.1.8
colorama 0.4.6
datasets 3.3.1
dill 0.3.8
filelock 3.17.0
frozenlist 1.5.0
fsspec 2024.12.0
huggingface-hub 0.29.0
idna 3.10
joblib 1.4.2
multidict 6.1.0
multiprocess 0.70.16
nltk 3.9.1
numpy 2.2.3
packaging 24.2
pandas 2.2.3
propcache 0.2.1
pyarrow 19.0.1
python-dateutil 2.9.0.post0
pytz 2025.1
pyyaml 6.0.2
regex 2024.11.6
requests 2.32.3
six 1.17.0
tqdm 4.67.1
typing-extensions 4.12.2
tzdata 2025.1
urllib3 2.3.0
valkey 6.1.0
xxhash 3.5.0
yarl 1.18.3

pyproject.toml pypi

datasets ^3.3.1
nltk ^3.9.1
pandas ^2.2.3
valkey ^6.1.0
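
The `^` prefix in the `pyproject.toml` entries above is a caret constraint. Assuming Poetry's documented caret semantics (the presence of `poetry.lock` suggests Poetry is in use), it allows updates that leave the leftmost non-zero version component unchanged. The hypothetical helper below, which is not part of Poetry or VulnTrain, expands a caret constraint into its lower and upper bounds.

```python
def caret_range(spec: str) -> tuple[str, str]:
    """Return (inclusive lower bound, exclusive upper bound) for a constraint like ^3.3.1."""
    if not spec.startswith("^"):
        raise ValueError(f"not a caret constraint: {spec}")
    parts = [int(p) for p in spec[1:].split(".")]
    # Bump the leftmost non-zero component; everything to its right becomes 0.
    for i, part in enumerate(parts):
        if part != 0:
            upper = parts[:i] + [part + 1] + [0] * (len(parts) - i - 1)
            break
    else:
        # All components are zero (e.g. ^0.0): bump the last one.
        upper = parts[:-1] + [parts[-1] + 1]
    return ".".join(map(str, parts)), ".".join(map(str, upper))


# Direct dependencies as listed under pyproject.toml above.
declared = {"datasets": "^3.3.1", "nltk": "^3.9.1", "pandas": "^2.2.3", "valkey": "^6.1.0"}
for name, spec in declared.items():
    lower, upper = caret_range(spec)
    print(f"{name}: >={lower},<{upper}")
```

Under this reading, `datasets ^3.3.1` accepts any 3.x release from 3.3.1 onward but not 4.0.0. The versions pinned in `poetry.lock` fall within these ranges.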
