trovon

Learning from what we know: How to perform vulnerability prediction using noisy historical data, Empirical Software Engineering (EMSE)

https://github.com/garghub/trovon

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, springer.com, ieee.org, acm.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.1%) to scientific vocabulary

Keywords

encoder-decoder vulnerability-detection
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: garghub
  • License: apache-2.0
  • Default Branch: main
  • Homepage:
  • Size: 12.7 GB
Statistics
  • Stars: 14
  • Watchers: 1
  • Forks: 8
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

Learning from what we know: How to perform vulnerability prediction using noisy historical data

This repository contains the source code and dataset for the paper "Learning from what we know: How to perform vulnerability prediction using noisy historical data", published in Empirical Software Engineering (EMSE).

The paper is available here: Paper

The bib entry for citing the paper is available here: Cite

In addition to the source code of our proposed approach, TROVON, this repository contains our own implementations of the existing approaches we compare TROVON with. We implemented these ourselves because the original authors' implementations were unavailable. Please refer to the details below.


Dataset

The dataset is composed of the following:

1) We gathered the vulnerabilities (i.e., the vulnerable components and their corresponding fixed versions) of 36 releases of the Linux Kernel, 10 releases of OpenSSL, and 10 releases of Wireshark. For this task, we used VulData7, a vulnerability patch gathering tool that uses the commit IDs provided by the National Vulnerability Database (NVD) to collect them. These are available in the vulnerabilities directory.

2) We also gathered the codebases of these releases. For this task, we used FrameVPM, a framework built to evaluate and compare vulnerability prediction models. The framework is available here.
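The following Python sketch gives a rough illustration of how the collected vulnerabilities could be arranged for an encoder-decoder model. It pairs each vulnerable component with its fixed version and emits two aligned lists, with one sequence per line. The record layout (`VulnerabilityPair` and its fields) and the helpers `one_line` and `to_parallel` are hypothetical and do not reflect the actual format of the vulnerabilities directory. Which side serves as the model input is left as a parameter because this README does not state it. Plain whitespace splitting also stands in for the repository's own preprocessing, which may tokenize differently (srcML is among its dependencies).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VulnerabilityPair:
    """Hypothetical record: one vulnerable component and its fixed version."""
    project: str     # e.g. "linux", "openssl", "wireshark"
    release: str     # release the component belongs to
    cve_id: str      # vulnerability identifier as reported by NVD
    vulnerable: str  # source text of the vulnerable component
    fixed: str       # source text of the corresponding fixed component


def one_line(code: str) -> str:
    """Collapse all whitespace so a component fits on a single line."""
    return " ".join(code.split())


def to_parallel(pairs: List[VulnerabilityPair],
                source: str = "vulnerable") -> Tuple[List[str], List[str]]:
    """Return aligned (sources, targets) lists, one sequence per entry.

    Which side acts as the encoder input is an assumption in this sketch;
    pass source="fixed" to flip the direction.
    """
    if source not in ("vulnerable", "fixed"):
        raise ValueError("source must be 'vulnerable' or 'fixed'")
    target = "fixed" if source == "vulnerable" else "vulnerable"
    sources = [one_line(getattr(p, source)) for p in pairs]
    targets = [one_line(getattr(p, target)) for p in pairs]
    return sources, targets
```

If you write `sources` and `targets` to two text files line by line, you get the parallel-file layout that many seq2seq toolkits read. Check the repository's scripts for the file names and layout they actually expect.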


Source code

The source code of the vulnerability prediction approaches (TROVON and the existing approaches we compare it with) is available as follows:

1) Source code of our proposed approach TROVON is available in the code directory.

2) Source code to replicate the Software Metrics, Text Mining, Imports, and Function Calls approaches is available in the FrameVPM repository.

3) Source code of our implementation of the approach Devign is available in the devign directory.

4) Source code of our implementation of the approaches LSTM and LSTM-RF is available in the lstm-rf directory.


Required tools and dependencies

1) Apache Maven
2) srcML
3) seq2seq
4) Tkinter
5) TensorFlow
6) PyYAML
7) Perl
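The sketch below offers a quick way to confirm that these dependencies are present before you run the scripts. The command names (`mvn`, `srcml`, `perl`) and import names (`seq2seq`, `tkinter`, `tensorflow`, and `yaml` for PyYAML) are assumptions about a typical installation, and `check_environment` is a hypothetical helper. It only reports whether each item can be found. It does not check whether the installed version is compatible with the repository.

```python
import importlib.util
import shutil
from typing import Dict, Iterable

# Assumed command-line names for the non-Python tools.
DEFAULT_TOOLS = ("mvn", "srcml", "perl")
# Assumed import names for the Python packages (PyYAML imports as "yaml").
DEFAULT_MODULES = ("seq2seq", "tkinter", "tensorflow", "yaml")


def check_environment(tools: Iterable[str] = DEFAULT_TOOLS,
                      modules: Iterable[str] = DEFAULT_MODULES) -> Dict[str, bool]:
    """Map each tool/module name to whether it appears to be available."""
    status = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = shutil.which(tool) is not None
    for module in modules:
        # find_spec returns None for a top-level module that cannot be found.
        status[module] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12} {'found' if ok else 'MISSING'}")
```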


Model training

To train a model, use the script train.sh as follows:

./train.sh [dirpath] [training-samples-num * epoch-num] [dirpath]/model [config] 1 [training-samples-num] [training-samples-num] 0
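You can also assemble the positional arguments of train.sh programmatically, which avoids miscomputing the `[training-samples-num * epoch-num]` product by hand. The sketch below only mirrors the argument order of the usage line above. `build_train_command` is a hypothetical helper, and the literal `1` and `0` are copied unchanged because this README does not document their meaning.

```python
import shlex
from typing import List


def build_train_command(dirpath: str, num_samples: int, num_epochs: int,
                        config: str) -> List[str]:
    """Assemble the train.sh arguments in the order shown in the README.

    The literal "1" and "0" are copied from the usage line as-is.
    """
    if num_samples <= 0 or num_epochs <= 0:
        raise ValueError("num_samples and num_epochs must be positive")
    return [
        "./train.sh",
        dirpath,                         # [dirpath]
        str(num_samples * num_epochs),   # [training-samples-num * epoch-num]
        f"{dirpath}/model",              # [dirpath]/model
        config,                          # [config]
        "1",
        str(num_samples),                # [training-samples-num]
        str(num_samples),                # [training-samples-num]
        "0",
    ]


if __name__ == "__main__":
    # Illustrative values only: 1,000 samples trained for 10 epochs.
    print(shlex.join(build_train_command("data", 1000, 10, "length_50-l-1-2.yml")))
```

You can pass the returned list to `subprocess.run(..., check=True)` from the directory that contains train.sh. The values in the `__main__` block are placeholders, not settings taken from the paper.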

For the model configuration, please refer to length_50-l-1-2.yml. It is configured to train on sequences of length 50; you can change this value to suit your requirements.
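Before changing the sequence length, it may help to check how many of your sequences exceed it. The following sketch counts whitespace-separated tokens per line and reports how many lines are longer than `max_len`, along with a nearest-rank 95th-percentile length. Whitespace tokens are only an approximation, since the model's own tokenization may differ. `length_report` is a hypothetical helper. Whether over-long sequences are truncated or dropped during training depends on the seq2seq pipeline, not on this script.

```python
from typing import Dict, Iterable


def length_report(lines: Iterable[str], max_len: int = 50) -> Dict[str, int]:
    """Summarize whitespace-token lengths against a maximum sequence length.

    Returns the number of non-empty sequences, how many exceed max_len,
    and the 95th-percentile length using the nearest-rank method.
    """
    lengths = sorted(len(line.split()) for line in lines if line.strip())
    if not lengths:
        return {"count": 0, "over_limit": 0, "p95": 0}
    n = len(lengths)
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integers
    # to avoid floating-point rounding at exact multiples.
    rank = (95 * n + 99) // 100
    return {
        "count": n,
        "over_limit": sum(1 for k in lengths if k > max_len),
        "p95": lengths[rank - 1],
    }
```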


Model testing

To test a model, use the script test.sh as follows:

./test.sh [dirpath]/test [dirpath]/model [desired-generated-sequences-file-name]
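If you have a reference file aligned line by line with the test inputs, a simple sanity check on the generated sequences file is the fraction of lines that match exactly. The sketch below is a hypothetical helper, not TROVON's vulnerability prediction step. It assumes the generated file holds one sequence per line, in the same order as the references, and the file names you pass to `read_lines` are up to you.

```python
from typing import List, Sequence


def read_lines(path: str) -> List[str]:
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def exact_match_rate(generated: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of aligned lines that are identical after whitespace normalization."""
    if len(generated) != len(references):
        raise ValueError(
            f"length mismatch: {len(generated)} generated vs {len(references)} references")
    if not references:
        return 0.0
    hits = sum(" ".join(g.split()) == " ".join(r.split())
               for g, r in zip(generated, references))
    return hits / len(references)
```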

Owner

  • Name: Aayush Garg, PhD
  • Login: garghub
  • Kind: user
  • Location: Luxembourg
  • Company: Luxembourg Institute of Science and Technology (LIST)

GitHub Events

Total
  • Watch event: 3
Last Year
  • Watch event: 3

Dependencies

code/pom.xml maven
  • com.google.code.gson:gson 2.8.5
  • org.apache.commons:commons-text 1.8
  • org.apache.maven.shared:maven-shared-utils 3.2.1