sivand

ESEC/FSE'21: Prediction-Preserving Program Simplification

https://github.com/mdrafiqulrabin/sivand


Repository

Basic Info

Host: GitHub
Owner: mdrafiqulrabin
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 100 MB

Statistics

Stars: 10
Watchers: 3
Forks: 3
Open Issues: 0
Releases: 2

Created over 5 years ago · Last pushed over 3 years ago

SIVAND: Prediction-Preserving Program Simplification

This repository contains the code for prediction-preserving program simplification, together with the data simplified by the Delta Debugging (DD) module, for our paper 'Understanding Neural Code Intelligence Through Program Simplification', accepted at ESEC/FSE'21.

Artifact for Article (SIVAND):
- ACM DL: https://dl.acm.org/do/10.1145/3462296
- Zenodo: https://doi.org/10.5281/zenodo.5154090

Reproducible Capsule of FeatureExtractor:
- CodeOcean: https://codeocean.com/capsule/7985340/tree/v1

Structure

./                       # code for model-agnostic DD framework
data/
    selected_input       # randomly selected test inputs from different datasets
    simplified_input     # traces of simplified inputs for different models
    summary_result       # summary results of all experiments as csv
models/
    dd-code2seq          # DD module with code2seq model
    dd-code2vec          # DD module with code2vec model
    dd-great             # DD module with RNN/Transformer model
others/                  # related helper functions
save/                    # images of SIVAND

Workflow

[Figure: Workflow in SIVAND]

The original Delta Debugging (DD) implementation was written in Python 2. We modified its core modules (DD.py, MyDD.py) to run in Python 3 (specifically, Python 3.7.3), and then adapted the DD modules for prediction-preserving program simplification with different models. The approach, SIVAND, is model-agnostic: it can be applied to any model by loading the model and using it to make a prediction for a task.
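The idea of prediction-preserving reduction with DD can be illustrated with a small, self-contained sketch. This is a simplified ddmin loop written for illustration, not the code in DD.py or MyDD.py; the names `split`, `ddmin`, `keeps_prediction`, and `toy_oracle` are hypothetical, and the oracle below is a toy stand-in for a real model query.

```python
def split(items, n):
    """Split items into n contiguous chunks of near-equal size."""
    k, m = divmod(len(items), n)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def ddmin(items, keeps_prediction):
    """Return a reduced sublist of items for which keeps_prediction still holds."""
    if not keeps_prediction(items):
        raise ValueError("the original input must satisfy the oracle")
    n = 2  # current granularity (number of chunks)
    while len(items) >= 2:
        chunks = split(items, n)
        reduced = False
        # 1) Try to reduce to a single chunk.
        for chunk in chunks:
            if keeps_prediction(chunk):
                items, n, reduced = chunk, 2, True
                break
        # 2) Otherwise try removing one chunk (keep its complement).
        if not reduced:
            for i in range(n):
                rest = [x for j, c in enumerate(chunks) if j != i for x in c]
                if keeps_prediction(rest):
                    items, n, reduced = rest, max(n - 1, 2), True
                    break
        # 3) Otherwise refine the granularity, or stop at single-item chunks.
        if not reduced:
            if n >= len(items):
                break
            n = min(len(items), n * 2)
    return items


def toy_oracle(tokens):
    # Assumption: a toy "model" that predicts onCreate whenever the name appears.
    return "onCreate" in tokens


tokens = "void onCreate ( ) { super . onCreate ( ) ; }".split()
print(ddmin(tokens, toy_oracle))  # ['onCreate']
```

In a real setting the oracle would query the loaded model and would typically also reject candidates that no longer parse.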

How to Start: To apply SIVAND (using the MethodName task as an example), first set <g_test_file> (the path to a file that contains all selected inputs) and <g_deltas_type> (token- or char-level deltas for DD) in helper.py. Then modify load_model_M() to load the target model (e.g., code2vec or code2seq) from <model_path>, and prediction_with_M() to return the predicted name, score, and loss value that <model> produces for an input <file_path>. Also update is_parsable() to check whether <code> is parsable, and load_method() to load a method for the target language (e.g., Java). Finally, run MyDD.py, which simplifies the programs one by one and saves all simplified traces in the dd_data/ folder.
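As a rough illustration of what these hooks might look like, here is a minimal sketch. The hook names come from the description above, but every signature and body is an assumption: `ToyModel` is a hypothetical stand-in for a real model, the path is a placeholder, and the brace check is only a crude substitute for a real Java parser. The actual helper.py in the repository may differ.

```python
# Hypothetical sketch of the hooks; signatures and bodies are assumptions.
g_test_file = "path/to/selected_inputs.txt"  # placeholder path to the selected inputs
g_deltas_type = "token"                      # "token" or "char" deltas for DD


class ToyModel:
    """Stand-in for a real model such as code2vec or code2seq."""

    def predict(self, code):
        # Toy rule: predict 'onCreate' with full confidence if the name occurs.
        hit = "onCreate" in code
        return ("onCreate", 1.0) if hit else ("unknown", 0.0)


def load_model_M(model_path=None):
    """Load the target model; a real version would restore it from model_path."""
    return ToyModel()


def load_method(file_path):
    """Read the source text of the method stored at file_path."""
    with open(file_path, encoding="utf-8") as f:
        return f.read()


def prediction_with_M(model, file_path):
    """Return (predicted name, score, loss) for the program at file_path."""
    name, score = model.predict(load_method(file_path))
    return name, score, 1.0 - score  # toy loss; a real model reports its own


def is_parsable(code):
    """Crude stand-in for a parser: non-empty code with balanced braces."""
    depth = 0
    for ch in code:
        depth += (ch == "{") - (ch == "}")
        if depth < 0:
            return False
    return depth == 0 and bool(code.strip())
```

Keeping the model behind these few functions is what lets the DD loop stay model-agnostic: only the hooks change when the model or language changes.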

More Details: See the models/dd-code2vec/ and models/dd-code2seq/ folders for how SIVAND works with the code2vec and code2seq models on the MethodName task for Java programs. Similarly, for the VarMisuse task (RNN & Transformer models, Python programs), see the models/dd-great/ folder for our modified code.

Motivating Example

| Motivating Example | Traces of Reduction |
| :-------------------------: | :-------------------------: |
| Example of an original and minimized method in which the target is to predict onCreate. | Reduction of a program while preserving the predicted method name onCreate by the code2vec model. |

The minimized example clearly shows that the model has learned to take shortcuts, in this case looking for the name in the function's body.

Experimental Settings

Tasks:
- MethodName (MN)
- VarMisuse (VM)
Models:
- [MN] code2vec & code2seq
- [VM] RNN & Transformer
Datasets:
- [MN] Java-Large
- [VM] Py150
Sample Inputs:
- [MN] Correctly predicted samples, Wrongly predicted samples
- [VM] Buggy (correct location and target; wrong location), Non-buggy (bug-free)
Delta Types:
- [MN] Token & Char
- [VM] Token

Results

The data/summary_result/ folder contains summary results of all experiments as CSV files. Each file has the following fields:

filename: ID for the input file of data/simplified_input folder
model: {code2vec, code2seq, RNN, or Transformer}
task: METHODNAME or VARIABLEMISUSE
filter_type:
- {tokencorrect, charcorrect or tokenwrong} for task == METHODNAME
- {buggycorrect, nonbuggycorrect, or buggywronglocation} for task == VARIABLEMISUSE
initial_score: score of actual program
final_score: score of minimal program
initial_loss: loss of actual program
final_loss: loss of minimal program
dd_pass: total/valid/correct DD steps for reduction
dd_time: total time spent for reduction
initial_program: actual raw program
final_program: minimal simplified program
initial_tokens: tokens in actual program
final_tokens: tokens in minimal program
len_{initial/final/minimal}_{tokens/chars}: number of corresponding {tokens/chars}
per_removed_{chars/tokens}: percentage of removed {chars/tokens}
attn_nodes: top-k AST nodes based on attention score {k ~= len_final_nodes}
final_nodes: all AST nodes in minimal program
common_nodes: common nodes between attention & reduction
len_{attn/final/common}_nodes: number of corresponding AST nodes
per_common_tokens: percentage of common nodes between attention & reduction
ground_truth: True (for correct prediction) or False (for wrong prediction)

Note that a value of <null>, -1, or <empty> indicates that the value is not available for that particular input/experiment.
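When analyzing these files, the unavailable markers need to be skipped before aggregating. The sketch below shows one way to do that with the standard csv module. It assumes the files have a header row with the field names listed above; `per_removed_tokens` is one expansion of the per_removed_{chars/tokens} pattern, the helper names `to_float` and `mean_by_model` are hypothetical, and the sample rows are invented placeholders, not results from the paper.

```python
import csv
import io

UNAVAILABLE = {"<null>", "<empty>", ""}


def to_float(raw):
    """Convert a raw CSV cell to float, or None if it is marked unavailable."""
    raw = raw.strip()
    if raw in UNAVAILABLE:
        return None
    value = float(raw)
    return None if value == -1 else value  # -1 also means "not available"


def mean_by_model(rows, field):
    """Average `field` per model, skipping unavailable values."""
    sums = {}
    for row in rows:
        value = to_float(row[field])
        if value is None:
            continue
        total, count = sums.get(row["model"], (0.0, 0))
        sums[row["model"]] = (total + value, count + 1)
    return {model: total / count for model, (total, count) in sums.items()}


# Invented placeholder rows for illustration only, NOT results from the paper.
sample = io.StringIO(
    "filename,model,per_removed_tokens\n"
    "f1,code2vec,50.0\n"
    "f2,code2vec,<null>\n"
    "f3,code2seq,-1\n"
    "f4,code2seq,25.0\n"
    "f5,code2vec,70.0\n"
)
rows = list(csv.DictReader(sample))
averages = mean_by_model(rows, "per_removed_tokens")
```

To use it on a real file, replace the in-memory sample with an opened file from data/summary_result/.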

| Summary of Results |
| :-------------------------: |
| Summary of reduction results in correctly predicted samples. |

Citation:

Understanding Neural Code Intelligence through Program Simplification

@inproceedings{rabin2021sivand,
  author = {Rabin, Md Rafiqul Islam and Hellendoorn, Vincent J. and Alipour, Mohammad Amin},
  title = {Understanding Neural Code Intelligence through Program Simplification},
  year = {2021},
  isbn = {9781450385626},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3468264.3468539},
  doi = {10.1145/3468264.3468539},
  booktitle = {Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  pages = {441--452},
  numpages = {12},
  location = {Athens, Greece},
  series = {ESEC/FSE 2021}
}

Other Works:

Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models [arXiv, GitHub]

Owner

Name: M.R.I. Rabin
Login: mdrafiqulrabin
Kind: user
Location: Houston, TX
Company: University of Houston
