vulntrain
A tool to generate datasets and models based on vulnerability descriptions from @Vulnerability-Lookup.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: vulnerability-lookup
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Homepage: https://pypi.org/project/VulnTrain
- Size: 948 KB
Statistics
- Stars: 13
- Watchers: 4
- Forks: 2
- Open Issues: 3
- Releases: 13
Metadata Files
README.md
VulnTrain
VulnTrain offers a suite of commands to generate diverse AI datasets and train models using comprehensive vulnerability data from Vulnerability-Lookup. It harnesses over one million JSON records from all supported advisory sources to build high-quality, domain-specific models.
Additionally, data from the vulnerability-lookup:meta container, including enrichment sources such as vulnrichment and Fraunhofer FKIE,
is incorporated to enhance model quality.
The datasets and models are available on Hugging Face.
For more information about the use of AI in Vulnerability-Lookup, please refer to the user manual.
Usage
Install VulnTrain:
~~~bash
$ pipx install VulnTrain
~~~
Three types of commands are available:
- Dataset generation: Create and prepare datasets.
- Model training: Train models using the prepared datasets.
- Model validation: Assess the performance of trained models (validations, benchmarks, etc.).
Check out the documentation for more information.
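To make the dataset-generation step more concrete, here is a minimal sketch of how one vulnerability record could be turned into a labelled training row. It is not VulnTrain's actual implementation: the function names `cvss_to_severity` and `record_to_row` are hypothetical, the input is assumed to follow the CVE Record Format 5.x layout (`containers.cna.descriptions` and `containers.cna.metrics`), and the sample record is invented for illustration. The score thresholds follow the standard CVSS v3.x qualitative severity rating scale.

```python
from typing import Optional


def cvss_to_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS base score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def record_to_row(record: dict) -> Optional[dict]:
    """Build a (description, severity) row from a CVE-style record.

    Hypothetical helper: returns None when the record has no English
    description or no CVSS v3.x base score.
    """
    cna = record.get("containers", {}).get("cna", {})

    # Take the first English description, if any.
    description = next(
        (
            d.get("value", "").strip()
            for d in cna.get("descriptions", [])
            if d.get("lang", "").lower().startswith("en")
        ),
        "",
    )

    # Take the first CVSS v3.1 or v3.0 base score, if any.
    score = None
    for metric in cna.get("metrics", []):
        for key in ("cvssV3_1", "cvssV3_0"):
            if "baseScore" in metric.get(key, {}):
                score = float(metric[key]["baseScore"])
                break
        if score is not None:
            break

    if not description or score is None:
        return None
    return {
        "id": record.get("cveMetadata", {}).get("cveId"),
        "description": description,
        "severity": cvss_to_severity(score),
    }


# Invented sample record, shaped like CVE Record Format 5.x.
sample = {
    "cveMetadata": {"cveId": "CVE-0000-0000"},
    "containers": {
        "cna": {
            "descriptions": [
                {"lang": "en", "value": "Example buffer overflow in a parser."}
            ],
            "metrics": [{"cvssV3_1": {"baseScore": 9.8}}],
        }
    },
}
print(record_to_row(sample))
```

Records without a usable description or score are skipped rather than given a default label, so that missing data does not end up in the training set as noise.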
How to cite
Bonhomme, C., & Dulaunoy, A. (2025). VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (Version 1.4.0) [Computer software]. https://doi.org/10.48550/arXiv.2507.03607
~~~bibtex
@misc{bonhomme2025vlai,
  title={VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification},
  author={Cédric Bonhomme and Alexandre Dulaunoy},
  year={2025},
  eprint={2507.03607},
  archivePrefix={arXiv},
  primaryClass={cs.CR}
}
~~~
License
VulnTrain is licensed under the GNU General Public License version 3.
~~~
Copyright (c) 2025 Computer Incident Response Center Luxembourg (CIRCL)
Copyright (C) 2025 Cédric Bonhomme - https://github.com/cedricbonhomme
Copyright (C) 2025 Léa Ulusan - https://github.com/3LS3-1F
~~~
Owner
- Name: Vulnerability-Lookup
- Login: vulnerability-lookup
- Kind: organization
- Email: info@circl.lu
- Website: https://www.vulnerability-lookup.org
- Repositories: 1
- Profile: https://github.com/vulnerability-lookup
Vulnerability-Lookup facilitates quick correlation of vulnerabilities from various sources, independent of vulnerability IDs.
Citation (CITATION.cff)
~~~yaml
cff-version: 1.1.0
message: "If you use VulnTrain or one of our models, please cite the following work."
title: "VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification"
version: 1.4.0
doi: 10.48550/arXiv.2507.03607
url: https://www.vulnerability-lookup.org
repository-code: https://github.com/vulnerability-lookup/VulnTrain
date-released: 2025-07-04
abstract: >
  This paper presents VLAI, a transformer-based model that predicts software vulnerability severity levels
  directly from text descriptions. Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities
  and achieves over 82% accuracy in predicting severity categories, enabling faster and more consistent triage
  ahead of manual CVSS scoring. The model and dataset are open-source and integrated into the Vulnerability-Lookup service.
authors:
  - family-names: Bonhomme
    given-names: Cédric
    orcid: https://orcid.org/0009-0003-7679-0109
  - family-names: Dulaunoy
    given-names: Alexandre
    orcid: https://orcid.org/0000-0002-5437-4652
~~~
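The abstract above reports accuracy over severity categories. As a rough illustration of what that metric measures, the sketch below computes overall accuracy and per-category recall from two lists of labels. The helper names `accuracy` and `per_class_recall` and the sample labels are hypothetical; this is not the evaluation code used for VLAI.

```python
from collections import Counter


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the reference label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true: list, y_pred: list) -> dict:
    """For each reference category, the share of its items predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Invented labels for illustration only.
y_true = ["High", "Low", "Medium", "Critical", "High"]
y_pred = ["High", "Medium", "Medium", "Critical", "Critical"]

print(accuracy(y_true, y_pred))
print(per_class_recall(y_true, y_pred))
```

Per-category recall is worth reporting next to overall accuracy because severity labels tend to be imbalanced, and a single accuracy figure can hide weak performance on a rare category.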
GitHub Events
Total
- Create event: 20
- Issues event: 4
- Release event: 13
- Watch event: 12
- Delete event: 2
- Member event: 1
- Issue comment event: 4
- Push event: 125
- Pull request event: 4
- Fork event: 2
Last Year
- Create event: 20
- Issues event: 4
- Release event: 13
- Watch event: 12
- Delete event: 2
- Member event: 1
- Issue comment event: 4
- Push event: 125
- Pull request event: 4
- Fork event: 2
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Cédric Bonhomme | c****c@c****g | 71 |
| Else-If-05 | l****n@e****r | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 5
- Total pull requests: 7
- Average time to close issues: N/A
- Average time to close pull requests: about 3 hours
- Total issue authors: 2
- Total pull request authors: 3
- Average comments per issue: 0.8
- Average comments per pull request: 0.43
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 7
- Average time to close issues: N/A
- Average time to close pull requests: about 3 hours
- Issue authors: 2
- Pull request authors: 3
- Average comments per issue: 0.8
- Average comments per pull request: 0.43
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- cedricbonhomme (4)
- 3LS3-1F (1)
Pull Request Authors
- 3LS3-1F (3)
- Else-If-05 (2)
- cedricbonhomme (2)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 60 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 14
- Total maintainers: 1
pypi.org: vulntrain
Generate datasets and models based on vulnerability data from Vulnerability-Lookup.
- Homepage: https://github.com/vulnerability-lookup/VulnTrain
- Documentation: https://vulntrain.readthedocs.io/
- License: GPL-3.0-or-later
- Latest release: 2.0.0 (published 6 months ago)
Dependencies
- actions/checkout v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- aiohappyeyeballs 2.4.6
- aiohttp 3.11.12
- aiosignal 1.3.2
- async-timeout 5.0.1
- attrs 25.1.0
- certifi 2025.1.31
- charset-normalizer 3.4.1
- click 8.1.8
- colorama 0.4.6
- datasets 3.3.1
- dill 0.3.8
- filelock 3.17.0
- frozenlist 1.5.0
- fsspec 2024.12.0
- huggingface-hub 0.29.0
- idna 3.10
- joblib 1.4.2
- multidict 6.1.0
- multiprocess 0.70.16
- nltk 3.9.1
- numpy 2.2.3
- packaging 24.2
- pandas 2.2.3
- propcache 0.2.1
- pyarrow 19.0.1
- python-dateutil 2.9.0.post0
- pytz 2025.1
- pyyaml 6.0.2
- regex 2024.11.6
- requests 2.32.3
- six 1.17.0
- tqdm 4.67.1
- typing-extensions 4.12.2
- tzdata 2025.1
- urllib3 2.3.0
- valkey 6.1.0
- xxhash 3.5.0
- yarl 1.18.3
- datasets ^3.3.1
- nltk ^3.9.1
- pandas ^2.2.3
- valkey ^6.1.0