Recent Releases of vulntrain
vulntrain - Release 1.5.0
News
- Dataset generation: Associating Git Fixes with Common Weakness Enumerations (CWEs) found in security advisories. (#4)
- Documentation is now available. (8a345ca)
Changes
- Model generation: Added a boolean parameter to map_cvss_to_severity to switch between using the first non-null CVSS score and the mean of all available CVSS scores. (ff6616e)
- Dataset generation: Removed unnecessary keys in extract_cnvd. (b7d694)
- Python
Published by cedricbonhomme 10 months ago
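The switch between the first non-null CVSS score and the mean of all available scores, described in the 1.5.0 changes, can be illustrated with a minimal sketch. The function name `severity_from_scores`, the parameter name `use_mean`, and the returned labels are assumptions made for illustration and are not taken from vulntrain. The thresholds follow the CVSS v3.x qualitative severity rating scale.

```python
from statistics import mean
from typing import Optional, Sequence


def severity_from_scores(
    scores: Sequence[Optional[float]], use_mean: bool = False
) -> Optional[str]:
    """Map CVSS base scores to a qualitative severity label.

    Hypothetical helper: `use_mean` stands in for the boolean parameter
    mentioned in the release notes; its real name is not given there.
    """
    available = [s for s in scores if s is not None]
    if not available:
        return None  # no usable score at all

    # Either average every available score or keep the first non-null one.
    score = mean(available) if use_mean else available[0]

    # CVSS v3.x qualitative severity rating scale.
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


if __name__ == "__main__":
    print(severity_from_scores([None, 9.8, 5.0]))
    print(severity_from_scores([None, 9.8, 5.0], use_mean=True))
```

With the sample scores, the first call uses 9.8 alone, while the second averages 9.8 and 5.0, so the two modes can produce different labels for the same vulnerability.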
vulntrain - Release 1.4.0
This version adds support for creating new AI-ready datasets based on the China National Vulnerability Database (CNVD). It also introduces a new training module that classifies vulnerabilities using text classification models tailored to CNVD data. By default, hfl/chinese-macbert-base is used, but hfl/chinese-bert-wwm-ext or google-bert/bert-base-chinese can be used instead.
By @3LS3-1F
Published by cedricbonhomme 11 months ago
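The default model and its two alternatives named in the 1.4.0 notes could be exposed through a command-line option along the following lines. This is only a sketch: the `--base-model` flag and the decision to restrict the option to these three identifiers are assumptions, not vulntrain's actual interface.

```python
import argparse

# Model identifiers taken from the 1.4.0 release notes.
DEFAULT_MODEL = "hfl/chinese-macbert-base"
SUPPORTED_MODELS = [
    DEFAULT_MODEL,
    "hfl/chinese-bert-wwm-ext",
    "google-bert/bert-base-chinese",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical --base-model option."""
    parser = argparse.ArgumentParser(description="CNVD classifier (sketch)")
    parser.add_argument(
        "--base-model",  # hypothetical flag name, for illustration only
        choices=SUPPORTED_MODELS,
        default=DEFAULT_MODEL,
        help="Base model to fine-tune.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.base_model)
```

Using `choices` makes argparse reject any other identifier; a real tool might instead accept arbitrary Hugging Face model names.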
vulntrain - Release 1.3.1
Updated dependencies and fixed issues due to changes in transformers.
Published by cedricbonhomme about 1 year ago
vulntrain - Release 1.3.0
Changes
- Updated dependencies.
Published by cedricbonhomme about 1 year ago
vulntrain - Release 1.2.0
Changes
- Dataset generation: CVSS scores are now extracted from GitHub and PySec security advisories.
- Dataset generation: CVSS, CPE, title and description (summary) are now extracted from CSAF documents.
Published by cedricbonhomme about 1 year ago
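As an illustration of the CSAF extraction mentioned in the 1.2.0 changes, the sketch below pulls the title, summary, CVSS base scores, and CPE identifiers out of a document that follows the CSAF 2.0 JSON layout. The helper names (`iter_cpes`, `extract_csaf_fields`), the output keys, and the sample document are hypothetical; the fields vulntrain actually reads may differ.

```python
from typing import Any, Dict, Iterator, List


def iter_cpes(branches: List[Dict[str, Any]]) -> Iterator[str]:
    """Walk nested product_tree branches and yield any CPE strings found."""
    for branch in branches:
        helper = branch.get("product", {}).get("product_identification_helper", {})
        if "cpe" in helper:
            yield helper["cpe"]
        yield from iter_cpes(branch.get("branches", []))


def extract_csaf_fields(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Collect title, summary, CVSS v3 base scores and CPEs (hypothetical helper)."""
    document = doc.get("document", {})
    # The first note with category "summary" is used as the description.
    summary = next(
        (n.get("text") for n in document.get("notes", []) if n.get("category") == "summary"),
        None,
    )
    scores = [
        s["cvss_v3"]["baseScore"]
        for vuln in doc.get("vulnerabilities", [])
        for s in vuln.get("scores", [])
        if "baseScore" in s.get("cvss_v3", {})
    ]
    branches = doc.get("product_tree", {}).get("branches", [])
    return {
        "title": document.get("title"),
        "description": summary,
        "cvss": scores,
        "cpes": list(iter_cpes(branches)),
    }


# Made-up document, trimmed to the fields used above.
SAMPLE = {
    "document": {
        "title": "Example advisory",
        "notes": [{"category": "summary", "text": "Example summary."}],
    },
    "product_tree": {
        "branches": [
            {
                "category": "vendor",
                "name": "Example",
                "branches": [
                    {
                        "category": "product_name",
                        "name": "Widget",
                        "product": {
                            "product_id": "P1",
                            "name": "Widget 1.0",
                            "product_identification_helper": {
                                "cpe": "cpe:2.3:a:example:widget:1.0:*:*:*:*:*:*:*"
                            },
                        },
                    }
                ],
            }
        ]
    },
    "vulnerabilities": [
        {"scores": [{"products": ["P1"], "cvss_v3": {"version": "3.1", "baseScore": 9.8}}]}
    ],
}

if __name__ == "__main__":
    print(extract_csaf_fields(SAMPLE))
```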
vulntrain - Release 1.1.0
News
- Trainers: Support for roberta-base in the text classifier, with improved settings for TrainingArguments.
- Validators: Validator for severity classification.
Published by cedricbonhomme over 1 year ago
vulntrain - Release 1.0.0
News
- Introduced a new trainer to automatically classify vulnerabilities based on their descriptions, even when CVSS scores are unavailable.
- Added CVSS parsing to the dataset generation script.
Changes
- Refactored the project structure for better organization.
- Improved CPE parsing.
- Enhanced the dataset generation script.
- Optimized the trainer for text generation on vulnerability descriptions.
- Improved command-line argument parsing.
- Improved the process of pushing the tokenizer and trainer to Hugging Face.
Published by cedricbonhomme over 1 year ago
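The 1.0.0 notes do not say what the CVSS parsing covers. One plausible reading is splitting a CVSS v3.x vector string into its metrics, sketched below. The function name `parse_cvss_vector` is hypothetical, and this is not necessarily how the dataset generation script handles CVSS data.

```python
from typing import Dict, Tuple


def parse_cvss_vector(vector: str) -> Tuple[str, Dict[str, str]]:
    """Split a CVSS v3.x vector string into (version, metrics).

    Hypothetical helper for illustration; it checks the overall shape of
    the vector but does not validate metric names or values.
    """
    prefix, _, rest = vector.partition("/")
    if not prefix.startswith("CVSS:") or not rest:
        raise ValueError(f"not a CVSS v3.x vector: {vector!r}")
    version = prefix.split(":", 1)[1]

    metrics: Dict[str, str] = {}
    for part in rest.split("/"):
        key, sep, value = part.partition(":")
        if not key or not sep or not value:
            raise ValueError(f"malformed metric {part!r} in {vector!r}")
        metrics[key] = value
    return version, metrics


if __name__ == "__main__":
    version, metrics = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
    print(version, metrics["AV"], metrics["C"])
```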
vulntrain - Release 0.5.1
Fixed configuration module name.
Published by cedricbonhomme over 1 year ago
vulntrain - Release 0.5.0
Added support for a configuration file.
Published by cedricbonhomme over 1 year ago
vulntrain - Release 0.4.0
The dataset generation step now uses data from GitHub Advisories, and the VulnExtractor cleans the summary and details fields.
Published by cedricbonhomme over 1 year ago
vulntrain - Release 0.3.0
News
- Dataset generation: Allow specifying a commit message when uploading to Hugging Face.
- Validation: Added a simple validation script for a model optimized for text generation. The script can pull a model and send tasks via a Pipeline.
Changes
- Training step: Added a choice of models: gpt2, distilgpt2, meta-llama/Llama-3.3-70B-Instruct, and distilbert-base-uncased.
- Various improvements to command-line parsing.
Published by cedricbonhomme over 1 year ago
vulntrain - Release 0.2.0
News
- Added a trainer.
- Experimenting with distilbert-base-uncased (AutoModelForMaskedLM) and gpt2 (AutoModelForCausalLM). The goal is to generate text.
Changes
- Various improvements to the dataset generator, and added a command-line parser.
Published by cedricbonhomme over 1 year ago
vulntrain - Release 0.1.0
First release, with upload of datasets to Hugging Face.
Datasets are built from NIST data with enrichment from FKIE and vulnrichment.
Published by cedricbonhomme over 1 year ago