Recent Releases of vulntrain

vulntrain - Release 1.5.0

News

  • Dataset generation: Git fixes are now associated with the Common Weakness Enumerations (CWEs) found in security advisories. (#4)
  • Documentation is now available. (8a345ca)
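
As an illustration of the fix-to-CWE association, the sketch below pairs commit URLs with CWE identifiers from a single advisory record. It assumes an OSV-style record as published in the GitHub Advisory Database (a `references` list and `database_specific.cwe_ids`). The function name `fixes_with_cwes`, the URL heuristic and the sample record are hypothetical and are not taken from vulntrain.

```python
import re

# Heuristic (an assumption): a reference counts as a fix when it is typed
# "FIX" or when its URL ends with a commit hash.
COMMIT_URL = re.compile(r"/commit/[0-9a-f]{7,40}$")


def fixes_with_cwes(advisory: dict) -> list[dict]:
    """Pair every fix commit URL of an advisory with the advisory's CWE IDs."""
    cwes = advisory.get("database_specific", {}).get("cwe_ids", [])
    pairs = []
    for ref in advisory.get("references", []):
        url = ref.get("url", "")
        if ref.get("type") == "FIX" or COMMIT_URL.search(url):
            pairs.append({"id": advisory.get("id"), "fix": url, "cwes": list(cwes)})
    return pairs


# Hypothetical sample record, for illustration only.
sample = {
    "id": "GHSA-xxxx-xxxx-xxxx",
    "references": [
        {"type": "WEB", "url": "https://example.org/project/commit/0123abc"},
        {"type": "ADVISORY", "url": "https://example.org/advisories/1"},
    ],
    "database_specific": {"cwe_ids": ["CWE-79"]},
}

print(fixes_with_cwes(sample))
```

Each resulting pair could serve as one dataset row; an advisory without a commit reference simply yields no rows.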

Changes

  • Model generation: Added a boolean parameter to mapcvssto_severity to switch between using the first non-null CVSS score and the mean of all available CVSS scores. (ff6616e)
  • Dataset generation: Removed unneeded keys in extract_cnvd. (b7d694)
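
The two scoring strategies can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `severity_from_scores`, the parameter name `use_mean` and the returned labels are assumptions. The thresholds follow the CVSS v3.x qualitative severity rating scale.

```python
from statistics import mean
from typing import Optional


def severity_from_scores(scores: list, use_mean: bool = False) -> Optional[str]:
    """Map CVSS base scores to a severity label.

    use_mean=False: use the first non-null score.
    use_mean=True: use the mean of all non-null scores.
    """
    available = [float(s) for s in scores if s is not None]
    if not available:
        return None  # no CVSS score available at all
    score = mean(available) if use_mean else available[0]
    # CVSS v3.x qualitative severity rating scale.
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


scores = [None, 9.8, 5.0]
print(severity_from_scores(scores))                 # based on the first non-null score
print(severity_from_scores(scores, use_mean=True))  # based on the mean of 9.8 and 5.0
```

With several diverging scores the two modes can yield different labels, which is why a switch between them is useful.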

- Python
Published by cedricbonhomme 10 months ago

vulntrain - Release 1.4.0

This version adds support for creating new AI-ready datasets based on the China National Vulnerability Database (CNVD). It also introduces a new training module that classifies vulnerabilities using text classification models tailored to CNVD data. By default, hfl/chinese-macbert-base is used, but hfl/chinese-bert-wwm-ext or google-bert/bert-base-chinese can be used instead. Contributed by @3LS3-1F.

- Python
Published by cedricbonhomme 11 months ago

vulntrain - Release 1.3.1

Updated dependencies and fixed issues due to changes in transformers.

- Python
Published by cedricbonhomme about 1 year ago

vulntrain - Release 1.3.0

Changes

  • Updated dependencies.

- Python
Published by cedricbonhomme about 1 year ago

vulntrain - Release 1.2.0

Changes

  • Dataset generation: CVSS scores are now extracted from GitHub and PySec security advisories.
  • Dataset generation: CVSS scores, CPEs, title and description (summary) are now extracted from CSAF documents.
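
To show where these fields live in a CSAF 2.0 document, here is a minimal sketch that reads them from an already parsed JSON dictionary. The helper names `walk_cpes` and `extract_from_csaf` and the sample document are hypothetical. Which fields vulntrain actually reads is not stated in these notes, so the choice of the first "summary" or "description" note as the description is an assumption.

```python
def walk_cpes(branches: list) -> list:
    """Collect CPE strings from a (possibly nested) CSAF product tree."""
    cpes = []
    for branch in branches:
        helper = branch.get("product", {}).get("product_identification_helper", {})
        if "cpe" in helper:
            cpes.append(helper["cpe"])
        cpes.extend(walk_cpes(branch.get("branches", [])))
    return cpes


def extract_from_csaf(doc: dict) -> dict:
    """Return title, description, CVSS base scores and CPEs from a CSAF 2.0 dict."""
    document = doc.get("document", {})
    # Assumption: the first "summary" or "description" note is the description.
    summary = next(
        (n.get("text") for n in document.get("notes", [])
         if n.get("category") in ("summary", "description")),
        None,
    )
    scores = [
        score["cvss_v3"]["baseScore"]
        for vuln in doc.get("vulnerabilities", [])
        for score in vuln.get("scores", [])
        if "cvss_v3" in score
    ]
    return {
        "title": document.get("title"),
        "description": summary,
        "cvss": scores,
        "cpes": walk_cpes(doc.get("product_tree", {}).get("branches", [])),
    }


# Hypothetical, heavily trimmed CSAF document for illustration only.
sample = {
    "document": {
        "title": "Example advisory",
        "notes": [{"category": "summary", "text": "A buffer overflow in Example."}],
    },
    "product_tree": {
        "branches": [{
            "category": "vendor", "name": "Example",
            "branches": [{
                "category": "product_name", "name": "Widget",
                "product": {
                    "name": "Example Widget 1.0",
                    "product_id": "P1",
                    "product_identification_helper": {
                        "cpe": "cpe:2.3:a:example:widget:1.0:*:*:*:*:*:*:*"
                    },
                },
            }],
        }],
    },
    "vulnerabilities": [{
        "scores": [{"products": ["P1"], "cvss_v3": {"version": "3.1", "baseScore": 7.5}}],
    }],
}

print(extract_from_csaf(sample))
```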

- Python
Published by cedricbonhomme about 1 year ago

vulntrain - Release 1.1.0

News

  • Trainers: Added support for roberta-base in the text classifier, with improved settings for TrainingArguments.
  • Validators: Added a validator for severity classification.

- Python
Published by cedricbonhomme over 1 year ago

vulntrain - Release 1.0.0

News

  • Introduced a new trainer to automatically classify vulnerabilities based on their descriptions,
    even when CVSS scores are unavailable.
  • Added CVSS parsing to the dataset generation script.

Changes

  • Refactored the project structure for better organization.
  • Improved CPE parsing.
  • Enhanced the dataset generation script.
  • Optimized the trainer for text generation on vulnerability descriptions.
  • Improved command-line argument parsing.
  • Improved the process of pushing the tokenizer and trainer to Hugging Face.

- Python
Published by cedricbonhomme over 1 year ago

vulntrain - Release 0.5.1

Fixed configuration module name.

- Python
Published by cedricbonhomme over 1 year ago

vulntrain - Release 0.5.0

Added support for a configuration file.

- Python
Published by cedricbonhomme over 1 year ago

vulntrain - Release 0.4.0

The dataset generation step now uses data from GitHub Advisories, and the VulnExtractor cleans the summary and details fields.

- Python
Published by cedricbonhomme over 1 year ago

vulntrain - Release 0.3.0

News

  • Dataset generation: A commit message can now be specified when uploading to Hugging Face.
  • Validation: Added a simple validation script for a model optimized for text generation. The script pulls a model and sends tasks to it via a Pipeline.

Changes

  • Training step: Added a choice of models: gpt2, distilgpt2, meta-llama/Llama-3.3-70B-Instruct, and distilbert-base-uncased.
  • Various improvements to the command-line parsing.

- Python
Published by cedricbonhomme over 1 year ago

vulntrain - Release 0.2.0

News

  • Added a trainer.
  • Experimenting with distilbert-base-uncased (AutoModelForMaskedLM) and gpt2 (AutoModelForCausalLM). The goal is to generate text.

Changes

  • Various improvements to the dataset generator, and added a command-line parser.

- Python
Published by cedricbonhomme over 1 year ago

vulntrain - Release 0.1.0

First release, with upload of datasets to Hugging Face.

Datasets are built from NIST data, with enrichment from FKIE and vulnrichment.

- Python
Published by cedricbonhomme over 1 year ago