cerebro
Cerebro: Static Subsuming Mutant Selection, IEEE Transactions on Software Engineering (TSE)
Cerebro: Static Subsuming Mutant Selection
This repository contains the code, dataset, and trained models for the paper "Cerebro: Static Subsuming Mutant Selection", published in IEEE Transactions on Software Engineering (TSE).
The bib entry for citing the paper is available here:
The dataset is composed of the following:
1) Codebase gathered for the 48 GNU Coreutils [1] programs in C language and 10 projects in Java from Apache Commons Proper [2], Joda-Time [3], and Jsoup [4];
2) Mutant information in JSON format for every program/project, with Mutant ID, Source Code File Name, Mutation Type, and Line #;
3) Subsuming Mutant Label information in JSON format, mapped to every mutant by ID, for every program/project;
4) Abstracted Code for every original source code file and mutant for every program/project; and
5) Mutant Annotation Sequences in pairs of lhs (input) and rhs (expected output) for all mutants in every project/program, with mappings between Sequence File Indexes and Mutant IDs, and Sequences and Original Code File Indexes.
Tools/dependencies that we require before executing the code:
- Apache Maven ( available here: https://maven.apache.org/download.cgi )
- srcML ( available here: https://www.srcml.org/ )
NOTE: before building, please modify the variable below in the data.java file to point to your repository location and/or dependencies:
static String dirDataset = "D:/ag/github/Cerebro/dataset";
Commands to execute:
mvn clean package
java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar [arguments]
options based on tasks:
to prepare dataset for model training:
java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep [language] [sequence-length] [abstraction-level]
where,
available options for [language] are c or java
[sequence-length] is the desired number of tokens in a sequence (numeric value) e.g. 25 / 50 / 100
available options for [abstraction-level] are full (abstracted code) and partial (no abstraction, only code comments removed)
so, to create the dataset for the projects in java with sequence length 100 and full abstraction, execute the command below:
java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep java 100 full
to create the dataset for the projects in c with sequence length 50 and no abstraction (only code comments removed), execute the command below:
java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep c 50 partial
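The prep invocations above differ only in their three arguments, so they can be generated in a loop. The snippet below is a minimal dry-run sketch: it only prints the commands rather than running them. The `CEREBRO_JAR` variable and its default path are assumptions for illustration, and it assumes that every combination of the listed languages, the example sequence lengths (25 / 50 / 100), and the abstraction levels is one you actually want to prepare.

```shell
# Assumed jar location (hypothetical default); adjust to your own checkout.
CEREBRO_JAR="${CEREBRO_JAR:-code/target/cerebro-1.0.jar}"

# Print one prep command per (language, sequence-length, abstraction-level).
print_prep_commands() {
  for lang in c java; do
    for len in 25 50 100; do
      for level in full partial; do
        echo "java -jar $CEREBRO_JAR prep $lang $len $level"
      done
    done
  done
}

print_prep_commands
```

To actually run the commands, the output can be piped to `sh` once the printed paths look right.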
to test the performance of the model by evaluating the model-generated sequences:
java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar test [language] [sequence-length] [abstraction-level]
values for [language], [sequence-length], and [abstraction-level] are the same as described above.
to generate XMLs for input in simulation:
java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar combinetosimulate [language] [sequence-length] [abstraction-level]
values for [language], [sequence-length], and [abstraction-level] are the same as described above.
Where to find trained models in the repo?
the trained models are available at the location below:
dataset/subsuming-mutant-prediction-[language]/smp/smp-[language]-[sequence-length]-[fold#]/model
e.g. the model trained for java projects with abstracted sequences of length 100 is available at:
dataset/subsuming-mutant-prediction-java/smp/smp-java-100-01/model
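The path pattern above can be filled in programmatically. The helper below is a small sketch, not part of the repository: `model_dir` is a hypothetical name, and the two-digit zero padding of the fold number is inferred from the `01` in the example path.

```shell
# Build the model directory for a language, sequence length, and fold.
# $1 = language (c or java), $2 = sequence length, $3 = fold number.
# Pass the fold without a leading zero (e.g. 1, not 01); printf pads it.
model_dir() {
  printf 'dataset/subsuming-mutant-prediction-%s/smp/smp-%s-%s-%02d/model\n' \
    "$1" "$1" "$2" "$3"
}

model_dir java 100 1
# prints dataset/subsuming-mutant-prediction-java/smp/smp-java-100-01/model
```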
Tools/dependencies that we require to train/test the models:
- seq2seq ( available here: https://google.github.io/seq2seq/getting_started/#download-setup )
- Tkinter (available here: https://docs.python.org/3.8/library/tkinter.html )
- TensorFlow ( available here: https://www.tensorflow.org/install/pip )
- PyYAML ( available here: https://pyyaml.org/wiki/LibYAML )
- Perl (available here: https://www.cpan.org/modules/INSTALL.html )
for model training:
please refer to the script train.sh available at Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/train.sh
./train.sh [dirpath] [training-samples-num * epoch-num] [dirpath]/model [config] 1 [training-samples-num] [training-samples-num] 0
below is a sample usage for training a model for 10 epochs for projects in java with sequence length 50 and 135,903 training samples (135,903 * 10 = 1,359,030):
./train.sh ../smp-java-50-01 1359030 ../smp-java-50-01/model length_51-g-1-2 1 135903 135903 0
please refer to the configurations available in the directory Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/configs.
for sequence lengths 25, 50, and 100, please use length_26-g-1-2, length_51-g-1-2, and length_101-g-1-2, respectively.
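The train.sh arguments follow from three numbers: the sequence length, the number of training samples, and the number of epochs. The sketch below assembles the command line and prints it without running anything. `train_command` is a hypothetical helper, and it assumes the config names follow the `length_<sequence-length + 1>-g-1-2` pattern seen in the sample command.

```shell
# Print the train.sh command for a dataset directory.
# $1 = dirpath, $2 = sequence length, $3 = training samples, $4 = epochs.
train_command() {
  dir=$1
  seq_len=$2
  samples=$3
  epochs=$4
  steps=$((samples * epochs))               # second argument of train.sh
  config="length_$((seq_len + 1))-g-1-2"    # assumed naming pattern
  echo "./train.sh $dir $steps $dir/model $config 1 $samples $samples 0"
}

train_command ../smp-java-50-01 50 135903 10
# prints ./train.sh ../smp-java-50-01 1359030 ../smp-java-50-01/model length_51-g-1-2 1 135903 135903 0
```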
for model testing:
please refer to the script test.sh available at Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/test.sh
./test.sh [dirpath]/test [dirpath]/model [desired-generated-sequences-file-name]
below is a sample usage that takes the trained model at ../smp-java-50-01/model and the test set at ../smp-java-50-01/test, and writes the generated sequences to the file genrhs-smp-java-50-01.txt:
./test.sh ../smp-java-50-01/test ../smp-java-50-01/model genrhs-smp-java-50-01.txt
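When testing several models, the three test.sh arguments can all be derived from the dataset directory. The sketch below only prints the command. `test_command` is a hypothetical helper, and naming the output file `genrhs-<directory name>.txt` simply mirrors the sample above; any file name can be used.

```shell
# Print the test.sh command for a dataset directory.
# $1 = dirpath containing the test/ and model/ subdirectories.
test_command() {
  dir=$1
  name=$(basename "$dir")   # e.g. smp-java-50-01
  echo "./test.sh $dir/test $dir/model genrhs-$name.txt"
}

test_command ../smp-java-50-01
# prints ./test.sh ../smp-java-50-01/test ../smp-java-50-01/model genrhs-smp-java-50-01.txt
```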
note:
please note that a few models were larger than 100MB in size, so each of them was split into 2 files to allow check-in. below are those models:
dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-01/model/model.ckpt.data-00000-of-00001
dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-02/model/model.ckpt.data-00000-of-00001
dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-03/model/model.ckpt.data-00000-of-00001
dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-04/model/model.ckpt.data-00000-of-00001
dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-05/model/model.ckpt.data-00000-of-00001
in the aforementioned cases, model.ckpt.data-00000-of-00001 was divided into model.ckpt.data-00000-of-00001.001 and model.ckpt.data-00000-of-00001.002
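Before using one of these models, the two parts need to be joined back into a single file. The sketch below assumes the parts are plain byte-wise chunks, so that concatenating them in order restores the original file. If they were produced by a tool with its own container format, that tool should be used instead. `join_parts` is a hypothetical helper name.

```shell
# Rejoin numbered parts (<file>.001, <file>.002, ...) into <file>.
# The glob expands in sorted order, so .001 is written before .002.
join_parts() {
  target=$1
  cat "$target".[0-9][0-9][0-9] > "$target"
}

# Example (run from the repository root):
#   join_parts dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-01/model/model.ckpt.data-00000-of-00001
```

Comparing the size of the joined file with the sum of the part sizes is a quick sanity check before loading the model.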
References
[1] GNU Coreutils. https://www.gnu.org/software/coreutils/, (last accessed April 24, 2021).
[2] Apache Commons Proper. https://commons.apache.org, (last accessed April 24, 2021).
[3] Joda-Time. https://github.com/JodaOrg/joda-time/, (last accessed April 24, 2021).
[4] Jsoup. https://github.com/jhy/jsoup, (last accessed April 24, 2021).
Owner
- Name: Aayush Garg, PhD
- Login: garghub
- Kind: user
- Location: Luxembourg
- Company: Luxembourg Institute of Science and Technology (LIST)
- Website: https://draayushgarg.github.io/
- Twitter: AayushGarg4real
- Repositories: 40
- Profile: https://github.com/garghub