sivand
ESEC/FSE'21: Prediction-Preserving Program Simplification
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file
- ✓ .zenodo.json file
- ✓ DOI references: found 9 DOI reference(s) in README
- ✓ Academic publication links: arxiv.org, acm.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 10
- Watchers: 3
- Forks: 3
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
SIVAND: Prediction-Preserving Program Simplification
This repository contains the code for prediction-preserving simplification, along with the simplified data produced by the DD module, for our paper 'Understanding Neural Code Intelligence Through Program Simplification', accepted at ESEC/FSE'21.
Artifact for Article (SIVAND):
- ACM DL: https://dl.acm.org/do/10.1145/3462296
- Zenodo: https://doi.org/10.5281/zenodo.5154090
Reproducible Capsule of FeatureExtractor:
- CodeOcean: https://codeocean.com/capsule/7985340/tree/v1
Structure
```
./                    # code for model-agnostic DD framework
data/
  selected_input      # randomly selected test inputs from different datasets
  simplified_input    # traces of simplified inputs for different models
  summary_result      # summary results of all experiments as CSV
models/
  dd-code2seq         # DD module with code2seq model
  dd-code2vec         # DD module with code2vec model
  dd-great            # DD module with RNN/Transformer models
others/               # related helper functions
save/                 # images of SIVAND
```
Workflow
Delta Debugging (DD) was originally implemented in Python 2. We modified the core modules (DD.py, MyDD.py) to run under Python 3 (specifically Python 3.7.3), and then adapted the DD modules for prediction-preserving program simplification with different models. The approach, SIVAND, is model-agnostic and can be applied to any model by loading the model and making predictions with it for a task.
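The idea behind the DD loop can be sketched as follows. This is a simplified, complement-only variant of ddmin, not the project's actual DD.py/MyDD.py code; the `keeps_prediction` callback is a hypothetical stand-in for querying a model and checking that its prediction is unchanged.

```python
def ddmin(tokens, keeps_prediction):
    """Minimize `tokens` while `keeps_prediction` stays True (ddmin sketch)."""
    n = 2  # current number of chunks
    while len(tokens) >= 2:
        chunk = len(tokens) // n
        subsets = [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]
        reduced = False
        for i in range(len(subsets)):
            # Try removing one chunk: test the complement of subsets[i].
            complement = [t for j, s in enumerate(subsets) if j != i for t in s]
            if keeps_prediction(complement):
                tokens = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(tokens):
                break  # finest granularity reached; tokens is 1-minimal
            n = min(n * 2, len(tokens))  # refine granularity
    return tokens

# Toy oracle: the "prediction" is preserved iff the method name token survives.
program = "void onCreate ( Bundle saved ) { init ( ) ; }".split()
minimal = ddmin(program, lambda ts: "onCreate" in ts)
```

With this toy oracle, DD strips everything except the single token the "model" relies on, which is exactly the kind of shortcut the paper's minimized examples expose.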
How to Start:
To apply SIVAND (for the MethodName task, as an example), first update <g_test_file> (path to a file that contains all selected inputs) and <g_deltas_type> (token- or char-type delta for DD) in helper.py.
Then, modify load_model_M() to load a target model (i.e., code2vec/code2seq) from <model_path>, and prediction_with_M() to get the predicted name, score, and loss value from <model> for an input <file_path>.
Also, implement is_parsable() to check whether <code> is parsable, and load_method() to load a method by language (e.g., Java).
Finally, run MyDD.py, which will simplify the programs one by one and save all simplified traces in the dd_data/ folder.
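The hooks above can be pictured roughly as below. The function names come from the README, but the bodies and signatures here are illustrative assumptions (a trivial "model" that scans the source text), not the real helper.py implementations.

```python
g_deltas_type = "token"  # or "char"; selects the DD delta granularity

def load_model_M(model_path=None):
    # Stand-in for loading a code2vec/code2seq model from <model_path>.
    return object()

def is_parsable(code):
    # Stand-in parsability check; the real hook parses the code (e.g., Java).
    return code.count("{") == code.count("}")

def prediction_with_M(model, code):
    # Stand-in prediction returning (predicted_name, score, loss).
    name = "onCreate" if "onCreate" in code else "unknown"
    return name, 1.0, 0.0

model = load_model_M()
name, score, loss = prediction_with_M(model, "void onCreate() { }")
```

MyDD.py drives DD through these hooks: each candidate reduction is kept only if it is still parsable and the model's prediction is unchanged.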
More Details:
Check models/dd-code2vec/ and models/dd-code2seq/ folders to see how SIVAND works with code2vec and code2seq models for MethodName task on Java program.
Similarly, for VarMisuse task (RNN & Transformer models, Python program), check the models/dd-great/ folder for our modified code.
Motivating Example
[Figure] Example of an original and minimized method in which the target is to predict onCreate.
[Figure] Reduction of a program while preserving the predicted method name onCreate by the code2vec model.
The minimized example clearly shows that the model has learned to take shortcuts, in this case looking for the name in the function's body.
Experimental Settings
Tasks:
- [MN] MethodName
- [VM] VarMisuse
Models:
- [MN] code2vec & code2seq
- [VM] RNN & Transformer
Datasets:
- [MN] Java-Large
- [VM] Py150
Sample Inputs:
- [MN] Correctly predicted samples, Wrongly predicted samples
- [VM] Buggy (correct location and target; wrong location), Non-buggy (bug-free)
Delta Types:
- [MN] Token & Char
- [VM] Token
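The two delta granularities simply correspond to different ways of splitting an input before DD runs. A rough illustration (whitespace splitting is an assumption here; the real tokenizers are language-aware):

```python
code = "int add(int a, int b) { return a + b; }"

token_deltas = code.split()  # token-level units for DD (coarser)
char_deltas = list(code)     # char-level units for DD (finer)

# Token deltas make DD converge in fewer steps; char deltas can reach
# smaller programs at the cost of many more DD iterations.
```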
Results
The data/summary_result/ folder contains the summary results of all experiments as CSV; each file has the following fields:

- filename: ID for the input file in the data/simplified_input folder
- model: {code2vec, code2seq, RNN, or Transformer}
- task: METHODNAME or VARIABLEMISUSE
- filter_type:
  - {tokencorrect, charcorrect, or tokenwrong} for task == METHODNAME
  - {buggycorrect, nonbuggycorrect, or buggywronglocation} for task == VARIABLEMISUSE
- initial_score: score of the actual program
- final_score: score of the minimal program
- initial_loss: loss of the actual program
- final_loss: loss of the minimal program
- dd_pass: total/valid/correct DD steps for reduction
- dd_time: total time spent on reduction
- initial_program: actual raw program
- final_program: minimal simplified program
- initial_tokens: tokens in the actual program
- final_tokens: tokens in the minimal program
- len_{initial/final/minimal}_{tokens/chars}: number of corresponding {tokens/chars}
- per_removed_{chars/tokens}: percentage of removed {chars/tokens}
- attn_nodes: top-k AST nodes based on attention score (k ≈ len_final_nodes)
- final_nodes: all AST nodes in the minimal program
- common_nodes: common nodes between attention & reduction
- len_{attn/final/common}_nodes: number of corresponding AST nodes
- per_common_tokens: percentage of common nodes between attention & reduction
- ground_truth: True (for correct prediction) or False (for wrong prediction)
Note that a <null>, -1, or <empty> value indicates that the value is not available for that particular input/experiment.
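These CSVs can be aggregated with the standard csv module, e.g. averaging per_removed_tokens per model. The two inline sample rows below are made up for illustration, not taken from the released data:

```python
import csv
import io
from collections import defaultdict

# Made-up sample mimicking a data/summary_result/ file (not real results).
sample = """model,task,per_removed_tokens
code2vec,METHODNAME,80.0
code2vec,METHODNAME,70.0
"""

by_model = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample)):
    by_model[row["model"]].append(float(row["per_removed_tokens"]))

# Average percentage of tokens removed, per model.
avg = {m: sum(v) / len(v) for m, v in by_model.items()}
```

Replace the inline string with `open("data/summary_result/<file>.csv")` to run this over the released summaries.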
[Figure] Summary of reduction results on correctly predicted samples.
Citation:
Understanding Neural Code Intelligence through Program Simplification
@inproceedings{rabin2021sivand,
author = {Rabin, Md Rafiqul Islam and Hellendoorn, Vincent J. and Alipour, Mohammad Amin},
title = {Understanding Neural Code Intelligence through Program Simplification},
year = {2021},
isbn = {9781450385626},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3468264.3468539},
doi = {10.1145/3468264.3468539},
booktitle = {Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
pages = {441--452},
numpages = {12},
location = {Athens, Greece},
series = {ESEC/FSE 2021}
}
Owner
- Name: M.R.I. Rabin
- Login: mdrafiqulrabin
- Kind: user
- Location: Houston, TX
- Company: University of Houston
- Website: https://sites.google.com/view/mdrafiqulrabin
- Twitter: mdrafiqulrabin
- Repositories: 34
- Profile: https://github.com/mdrafiqulrabin
Ph.D. Candidate in Computer Science at the University of Houston.