cerebro

Cerebro: Static Subsuming Mutant Selection, IEEE Transactions on Software Engineering (TSE)

https://github.com/garghub/cerebro

Keywords

deep-learning encoder-decoder mutation-testing prediction

Repository

Basic Info

Host: GitHub
Owner: garghub
License: apache-2.0
Default Branch: main
Size: 721 MB

Statistics

Stars: 6
Watchers: 1
Forks: 2
Open Issues: 1
Releases: 0

Created over 4 years ago · Last pushed over 3 years ago

Cerebro: Static Subsuming Mutant Selection

This repo contains the code, dataset, and trained models for the paper Cerebro: Static Subsuming Mutant Selection, published in IEEE Transactions on Software Engineering (TSE).

The paper is available here:

The bib entry for citing the paper is available here:

The dataset is composed of the following:

1) Codebases gathered for 48 GNU Coreutils [1] programs written in C and 10 Java projects from Apache Commons Proper [2], Joda-Time [3], and Jsoup [4];

2) Mutant information in JSON format for every program/project, with Mutant ID, Source Code File Name, Mutation Type, and Line Number;

3) Subsuming Mutant Label information in JSON format, mapped to every mutant by ID, for every program/project;

4) Abstracted Code for every original source code file and mutant of every program/project; and

5) Mutant Annotation Sequences in pairs of lhs (input) and rhs (expected output) for all mutants in every program/project, with mappings between Sequence File Indexes and Mutant IDs, and between Sequences and Original Code File Indexes.

Tools/dependencies required before executing the code:

Apache Maven (available here: https://maven.apache.org/download.cgi)
srcML (available here: https://www.srcml.org/)

NOTE: before building, please modify the variables below in the data.java file to point to your own repository location and/or dependencies:

static String dirDataset = "D:/ag/github/Cerebro/dataset";

Commands to execute:

mvn clean package

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar [arguments]

options based on tasks:

to prepare dataset for model training:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep [language] [sequence-length] [abstraction-level]

where,

available options for [language] are c or java

[sequence-length] is the desired number of tokens in a sequence (numeric value) e.g. 25 / 50 / 100

available options for [abstraction-level] are full and partial

so, to create a dataset for projects in java, of sequence length 100 with full abstraction, the command below should be executed:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep java 100 full

to create a dataset for projects in c, of sequence length 50 with no abstraction (only code comments removed, i.e. the partial level), the command below should be executed:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep c 50 partial

to test the performance of the model by evaluating the model-generated sequences:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar test [language] [sequence-length] [abstraction-level]

values for [language], [sequence-length], and [abstraction-level] are the same as described above.

to generate XMLs for input in simulation:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar combinetosimulate [language] [sequence-length] [abstraction-level]

values for [language], [sequence-length], and [abstraction-level] are the same as described above.
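The three tasks above take the same arguments, so the command line can be assembled and checked in one place. The following is a minimal Python sketch and is not part of the repository: the helper name `cerebro_command` is hypothetical, and the default jar path is simply copied from the examples above.

```python
from typing import List

# Values documented in this README; anything else is rejected before Java runs.
TASKS = {"prep", "test", "combinetosimulate"}
LANGUAGES = {"c", "java"}
ABSTRACTION_LEVELS = {"full", "partial"}

# Jar location used in the examples above; adjust it to your own checkout.
DEFAULT_JAR = "D:/ag/github/Cerebro/code/target/cerebro-1.0.jar"


def cerebro_command(task: str, language: str, sequence_length: int,
                    abstraction_level: str, jar: str = DEFAULT_JAR) -> List[str]:
    """Build the argument list for one task (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    if sequence_length <= 0:
        raise ValueError("sequence length must be a positive number of tokens")
    if abstraction_level not in ABSTRACTION_LEVELS:
        raise ValueError(f"unknown abstraction level: {abstraction_level!r}")
    return ["java", "-jar", jar, task, language,
            str(sequence_length), abstraction_level]


if __name__ == "__main__":
    # Reproduces the java / 100 / full example from the text.
    print(" ".join(cerebro_command("prep", "java", 100, "full")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the jar has been built with `mvn clean package`.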

Where to find trained models in the repo?

the trained models are located at paths of the following form:

dataset/subsuming-mutant-prediction-[language]/smp/smp-[language]-[sequence-length]-[fold#]/model

e.g. the model trained for java projects with abstracted sequences of length 100 is located at:

dataset/subsuming-mutant-prediction-java/smp/smp-java-100-01/model
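To locate a model programmatically, the path pattern above can be filled in with a small helper. This is a sketch only: `model_dir` is a hypothetical name, and the two-digit zero padding of the fold number is an assumption inferred from the `01` in the example path.

```python
def model_dir(language: str, sequence_length: int, fold: int) -> str:
    """Return the repo-relative model directory for one language/length/fold."""
    # Assumption: fold numbers are zero-padded to two digits, as in "01".
    return (f"dataset/subsuming-mutant-prediction-{language}/smp/"
            f"smp-{language}-{sequence_length}-{fold:02d}/model")


if __name__ == "__main__":
    print(model_dir("java", 100, 1))
```

Directories with a `pa-` prefix, such as those listed in the note further down, are not produced by this helper.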

Tools/dependencies required to train/test the models:

seq2seq (available here: https://google.github.io/seq2seq/getting_started/#download-setup)
Tkinter (available here: https://docs.python.org/3.8/library/tkinter.html)
TensorFlow (available here: https://www.tensorflow.org/install/pip)
PyYAML (available here: https://pyyaml.org/wiki/LibYAML)
Perl (available here: https://www.cpan.org/modules/INSTALL.html)

for model training:

please refer to the script train.sh available at Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/train.sh

./train.sh [dirpath] [training-samples-num * epoch-num] [dirpath]/model [config] 1 [training-samples-num] [training-samples-num] 0

below is a sample usage for training a model for 10 epochs on projects in java with sequence length 50 and 135,903 training samples (so the second argument is 135,903 × 10 = 1,359,030):

./train.sh ../smp-java-50-01 1359030 ../smp-java-50-01/model length_51-g-1-2 1 135903 135903 0

please refer to configurations available in directory Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/configs.

for sequence lengths 25, 50, and 100, please use length_26-g-1-2, length_51-g-1-2, and length_101-g-1-2, respectively.
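The train.sh arguments follow mechanically from the sequence length, the number of training samples, and the number of epochs. The sketch below assembles them in Python; `train_args` is a hypothetical helper, the config names are written with an underscore as in the sample command above, and the constant arguments `1` and `0` are copied verbatim from the template because their meaning is not documented here.

```python
from typing import List

# Config names listed in this README, keyed by sequence length.
CONFIG_BY_LENGTH = {
    25: "length_26-g-1-2",
    50: "length_51-g-1-2",
    100: "length_101-g-1-2",
}


def train_args(dirpath: str, sequence_length: int,
               training_samples: int, epochs: int) -> List[str]:
    """Build the train.sh argument list (hypothetical helper)."""
    # Second argument of the template: training-samples-num * epoch-num.
    samples_times_epochs = training_samples * epochs
    config = CONFIG_BY_LENGTH[sequence_length]  # KeyError for other lengths
    return [
        "./train.sh", dirpath, str(samples_times_epochs), f"{dirpath}/model",
        config,
        "1",  # constant copied from the template
        str(training_samples), str(training_samples),
        "0",  # constant copied from the template
    ]


if __name__ == "__main__":
    # Reproduces the sample usage: java, length 50, 135,903 samples, 10 epochs.
    print(" ".join(train_args("../smp-java-50-01", 50, 135903, 10)))
```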

for model testing:

please refer to the script test.sh available at Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/test.sh

./test.sh [dirpath]/test [dirpath]/model [desired-generated-sequences-file-name]

below is a sample usage that runs the trained model at ../smp-java-50-01/model on the test set at ../smp-java-50-01/test and writes the generated sequences to the file genrhs-smp-java-50-01.txt:

./test.sh ../smp-java-50-01/test ../smp-java-50-01/model genrhs-smp-java-50-01.txt

note:

please note that a few models were larger than 100 MB, so each was split into 2 files in order to check it in. those models are listed below:

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-01/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-02/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-03/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-04/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-05/model/model.ckpt.data-00000-of-00001

in the aforementioned cases, model.ckpt.data-00000-of-00001 was divided into model.ckpt.data-00000-of-00001.001 and model.ckpt.data-00000-of-00001.002
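Before such a model can be loaded, the two parts have to be put back together. The sketch below assumes the parts are plain byte-wise slices of the original file, so that concatenating them in numeric order restores it; the README does not state which tool produced the split, and if an archiver was used, that tool is needed instead. `join_parts` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def join_parts(target: Path) -> Path:
    """Rebuild `target` by concatenating target.001, target.002, ... in order.

    Assumption: the parts are raw byte slices of the original file.
    """
    # Three-digit suffixes sort correctly as plain strings.
    parts = sorted(target.parent.glob(target.name + ".[0-9][0-9][0-9]"))
    if not parts:
        raise FileNotFoundError(f"no parts found for {target}")
    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return target
```

For example, `join_parts(Path("dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-01/model/model.ckpt.data-00000-of-00001"))` would write the combined file next to its parts.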

References

[1] GNU Coreutils. https://www.gnu.org/software/coreutils/ (last accessed April 24, 2021).

[2] Apache Commons Proper. https://commons.apache.org (last accessed April 24, 2021).

[3] Joda-Time. https://github.com/JodaOrg/joda-time/ (last accessed April 24, 2021).

[4] Jsoup. https://github.com/jhy/jsoup (last accessed April 24, 2021).

Owner

Name: Aayush Garg, PhD
Login: garghub
Kind: user
Location: Luxembourg
Company: Luxembourg Institute of Science and Technology (LIST)

Website: https://draayushgarg.github.io/
Twitter: AayushGarg4real
Repositories: 40
Profile: https://github.com/garghub
