https://github.com/bstee615/devign


Repository

Basic Info
  • Host: GitHub
  • Owner: bstee615
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 44.9 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of saikat107/Devign
Created almost 5 years ago · Last pushed over 4 years ago

# Devign - Implementation

This repository provides a lightweight implementation of [Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks](https://arxiv.org/pdf/1909.03496.pdf).

### Requirements
1. Python 3.6
2. PyTorch 1.4.0
3. [Deep Graph Library](https://www.dgl.ai/)

### Usage
```shell
python main.py \
      --dataset  \
      --input_dir ;
```

### Dataset
The `input_dir` should contain three JSON files:
1. `train_GGNNinput.json`
2. `valid_GGNNinput.json`
3. `test_GGNNinput.json`
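
As a quick sanity check before training, the layout above can be verified with a few lines of Python. This is only a sketch: the helper name `find_split_files` and the `SPLIT_FILES` mapping are hypothetical and not part of this repository; only the three file names come from the list above.

```python
import os

# File names come from the list above; the mapping and helper are hypothetical.
SPLIT_FILES = {
    "train": "train_GGNNinput.json",
    "valid": "valid_GGNNinput.json",
    "test": "test_GGNNinput.json",
}


def find_split_files(input_dir):
    """Return {split: path} for the three expected files, or raise if any is missing."""
    paths = {
        split: os.path.join(input_dir, name)
        for split, name in SPLIT_FILES.items()
    }
    missing = [path for path in paths.values() if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError("missing input files: " + ", ".join(missing))
    return paths
```

Calling `find_split_files("path/to/input_dir")` either returns the three paths or fails early with the names of the missing files.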

Each JSON file should contain a list of JSON objects with the following structure:
```shell
{
  'node_features': <2D list of shape (n, 100)>,
  'graph': <list of [source, edge_type, destination] edges>,
  'target': <0 or 1 representing the vulnerability>
}
```

* Let's assume `n` nodes in the graph are indexed as `0` to `n-1`. The length of `node_features` list should be `n`. Each feature vector should be 100 elements long. Thus the `node_features` list should be a 2D list of shape `(n, 100)`.
  
* The length of the `graph` list should equal the number of edges. Each edge should be represented as a three-element list `[source, edge_type, destination]`, where `source` and `destination` are indices of the corresponding nodes in the `node_features` list. Edge types should range from `0` to `max_edge_types`.
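
The constraints above can be checked mechanically. The following is a minimal sketch, not code from this repository: `validate_example` and `FEATURE_DIM` are hypothetical names, and the upper bound on edge types is treated as inclusive because that is how the text reads (an assumption; tighten it if the model expects an exclusive bound). Note that real JSON files need double-quoted keys, which `json.dumps` produces.

```python
import json

FEATURE_DIM = 100  # feature vector length stated above


def validate_example(example, max_edge_types):
    """Raise ValueError if `example` violates the format described above.

    Returns (number_of_nodes, number_of_edges) on success.
    """
    features = example["node_features"]
    n = len(features)
    for i, vec in enumerate(features):
        if len(vec) != FEATURE_DIM:
            raise ValueError(
                f"node {i}: expected {FEATURE_DIM} features, got {len(vec)}"
            )
    for source, edge_type, destination in example["graph"]:
        if not (0 <= source < n and 0 <= destination < n):
            raise ValueError(f"edge endpoint out of range: {source} -> {destination}")
        # Assumption: the upper bound is inclusive, as the text reads.
        if not (0 <= edge_type <= max_edge_types):
            raise ValueError(f"edge type out of range: {edge_type}")
    if example["target"] not in (0, 1):
        raise ValueError(f"target must be 0 or 1, got {example['target']!r}")
    return n, len(example["graph"])


# A toy graph with 3 nodes and 2 edges (values are illustrative only).
example = {
    "node_features": [[0.0] * FEATURE_DIM for _ in range(3)],
    "graph": [[0, 0, 1], [1, 2, 2]],
    "target": 1,
}

# Each input file holds a list of such objects.
serialized = json.dumps([example])
validate_example(json.loads(serialized)[0], max_edge_types=3)
```

Running a check like this over every object in the three files catches shape and index errors before they surface as harder-to-read failures during graph construction.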

## Note 
1. This implementation follows the Devign paper. However, we could **NOT** reproduce the results reported in the original paper.

## Reference
[1] Zhou, Yaqin, et al. "Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks." arXiv preprint arXiv:1909.03496 (2019).

Owner

3rd year PhD student @ ISU. Interests and research: deep learning, program analysis
