https://github.com/andstor/peft-unit-test-generation-replication-package

Replication Package for "Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study"

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (8.9%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: andstor
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 158 MB

Statistics

Stars: 2
Watchers: 1
Forks: 0
Open Issues: 2
Releases: 0

Created about 2 years ago · Last pushed 11 months ago

Metadata Files

Readme, License

PEFT unit test generation replication package

Repository structure

This repository is organized as follows:

- /training: Contains all scripts for training.
- /generation: Contains the scripts for generation.
- /data: Contains the experiments' generated data.
- /evaluation: Contains the scripts for generating coverage data.
- /analysis: Contains all scripts used for data analysis.
- /figures: Contains all figures created during data analysis.
- /tables: Contains all tables created during data analysis.

Training

To train the models, we use the run_train.py CLI script. The script supports various arguments. See the training/README.md file for more information, along with the hyperparameters used in the paper.

training/
|-- run_train.py          Python CLI training script.
|-- utils/                Directory containing utility scripts.
|-- arguments/            Directory containing the supported arguments for the training script.
|-- deepspeed_configs/    Directory containing various DeepSpeed configurations.
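As background for the parameter-efficient methods trained here, the sketch below shows why LoRA trains so few parameters: for one frozen weight matrix of shape d_out x d_in, LoRA learns two low-rank factors, adding rank * (d_in + d_out) trainable parameters. The function name `lora_trainable_params` and the dimensions used are hypothetical and chosen only for illustration; they are not taken from the paper, from `run_train.py`, or from `params_data.csv`.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    while the original weight matrix stays frozen.
    """
    return rank * d_in + d_out * rank


# Illustrative numbers only (assumptions, not values from this repository).
d_in = d_out = 4096
rank = 16
full = d_in * d_out  # trainable parameters if the matrix were fully fine-tuned
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full: {full}, LoRA: {lora}, fraction: {lora / full:.4%}")
```

The actual trainable-parameter counts for each model depend on which layers are adapted and on the hyperparameters documented in training/README.md.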

Generation

To generate the unit tests, we use the run_gen.py CLI script. The script supports various arguments. See the generation/README.md file for more information.

generation/
|-- run_gen.py                     Python CLI generation script.
|-- stopping_criterias/            Directory containing utility scripts.
|-- arguments/                     Directory containing the supported arguments for the generation script.
|-- zero_inference_config.json     DeepSpeed-Inference configuration file.

Data

data/
|-- <dataset>/
|   |-- generated/                                       The generated unit tests from each experiment.
|   |   |-- <tuning_method>/                             The tuning method used. Full pre-trained, fine-tuning, LoRA, IA^3, and prompt tuning.
|   |       |-- <namespace>/                             The organization that created the base model.
|   |           |-- <model_name>/                        The name of the model.
|   |               |-- 0000[i]-of-0000[n].test.jsonl    JSONL file with generated unit tests.
|   |-- fixed/                                           Same as "generated" but with fixed data.
|   |-- coverage/                                        The coverage data of the generated unit tests.
|   |   |-- <tuning_method>/                             The tuning method used. Full pre-trained, fine-tuning, LoRA, IA^3, and prompt tuning.
|   |       |-- <namespace>/                             The organization that created the base model.
|   |           |-- <model_name>/                        The name of the model.
|   |               |-- jacoco.jsonl                     JSONL file with JaCoCo report data.
|   |-- coverage_branch.csv                              CSV file containing the branch coverage of the generated unit tests.
|   |-- coverage_instruction.csv                         CSV file containing the instruction coverage of the generated unit tests.
|   |-- passing_rate.csv                                 CSV file containing the percentage of the generated unit tests that are runnable.
|   |-- scores.csv                                       CSV file containing the CodeBLEU scores of the experiments.
|   |-- valid_syntax.csv                                 CSV file containing the fraction of generated code with valid syntax.
|-- params_data.csv                                      CSV file with count of trainable parameters for each model.
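The layout above can be walked programmatically. The following is a minimal sketch, assuming the nesting `<dataset>/generated/<tuning_method>/<namespace>/<model_name>/*.test.jsonl` described in the tree. The helper name `iter_generated` and the keys of the yielded dictionaries are hypothetical and not part of the repository, and no assumption is made about the fields inside each JSONL record.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_generated(data_root: str, dataset: str) -> Iterator[dict]:
    """Yield one dict per JSONL record, tagged with where it came from.

    Hypothetical helper: assumes the nesting
    <data_root>/<dataset>/generated/<tuning_method>/<namespace>/<model_name>/*.test.jsonl
    """
    base = Path(data_root) / dataset / "generated"
    # Three wildcard levels: tuning method, namespace, model name.
    for path in sorted(base.glob("*/*/*/*.test.jsonl")):
        tuning_method, namespace, model_name = path.relative_to(base).parts[:3]
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue  # skip blank lines between records
                yield {
                    "tuning_method": tuning_method,
                    "model": f"{namespace}/{model_name}",
                    "record": json.loads(line),
                }
```

Calling `iter_generated("data", dataset_dir)` with the name of one dataset directory would then stream every generated record together with its tuning method and model, which is a convenient starting point for grouping results per experiment.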

Analysis

We first fix the generated data using the fix_data.ipynb notebook. We then calculate the CodeBLEU scores of the fixed data using the calc_similarity.ipynb notebook. Once code coverage has been calculated (see the Evaluation section), we compute the passing rate and coverage statistics using the calc_execution_metrics.ipynb notebook. Finally, we analyze the data and generate the plots using the plots.ipynb notebook.

analysis/
|-- java-universal-parser/           Directory containing the Java parser used to validate generated code.
|-- fix_data.ipynb                   Jupyter Notebook file used to fix the generated data. Also calculates the syntactic validity of the generated code.
|-- calc_execution_metrics.ipynb     Jupyter Notebook file used to calculate the statistics of the passing rate and coverage results.
|-- calc_similarity.ipynb            Jupyter Notebook file used to calculate the CodeBLEU scores of the fixed data.
|-- plots.ipynb                      Jupyter Notebook file containing the Python code used to analyze the extracted data and generate the resulting plots.
|-- tables.ipynb                     Jupyter Notebook file containing the Python code used to analyze the extracted data and generate the resulting tables.

Evaluation

Code coverage is calculated using the evaluate_humaneval-x.py script. Due to potential security issues with executing arbitrary generated code, we use Docker. Execute at your own risk. See the evaluation/README.md file for container build instructions.

evaluation/
|-- humaneval-x/                            Directory containing the scripts for evaluating the Humaneval-X codes.
|   |-- Dockerfile                          Dockerfile for building the evaluation environment.
|   |-- evaluate_tests.py                   Python script for evaluating the generated codes.
|   |-- pom.xml                             Maven project file for building the evaluation environment.
|-- methods2test_runnable/                  Directory containing the scripts for evaluating the runnable methods2test codes.
|   |-- Dockerfile                          Dockerfile for building the evaluation environment.
|   |-- evaluate_tests.py                   Python script for evaluating the generated codes.
|   |-- validate_buildable.py               Python script for validating the buildable methods2test codes.
|   |-- validate_runnable.py                Python script for validating the runnable methods2test codes.
|   |-- find_golden_commits.py              Python script for finding the golden commits in the methods2test repository.
|   |-- package_runnable.py                 Python script for packaging the runnable methods2test codes.
|   |-- src/                                Directory containing the source code for the runnable methods2test codes.
|   |   |-- jacoco_report.py                Python script for generating the Jacoco report.
|   |   |-- java_descriptor_converter.py    Python script for converting Java descriptors.
|   |   |-- java_utils.py                   Python script for Java utility functions.
|   |   |-- surefire_report.py              Python script for generating the Surefire report.
|   |   |-- test_executer.py                Python script for executing the tests.
|   |-- output/                             Directory for storing the intermediate results.
|   |   |-- commits_[split].jsonl           JSONL file with commits for buildable methods2test test repositories.
|   |   |-- buildable_[split].jsonl         JSONL file with the buildable methods2test codes.
|   |   |-- runnable_[split].jsonl          JSONL file with the runnable methods2test codes.
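JaCoCo reports coverage as pairs of covered and missed counters (for example, for instructions and branches), and a coverage ratio is covered / (covered + missed). The sketch below illustrates only that arithmetic. The record layout and the field names `covered` and `missed` are assumptions made for this example and may not match the schema of the jacoco.jsonl files in this repository.

```python
from typing import Optional


def coverage_ratio(covered: int, missed: int) -> Optional[float]:
    """Return covered / (covered + missed), or None when nothing was measured."""
    total = covered + missed
    if total == 0:
        return None  # e.g. code without branches has no branch coverage
    return covered / total


# Hypothetical counters for a single test run (field names are assumptions).
counters = {
    "INSTRUCTION": {"covered": 30, "missed": 10},
    "BRANCH": {"covered": 0, "missed": 0},
}
ratios = {
    name: coverage_ratio(c["covered"], c["missed"])
    for name, c in counters.items()
}
print(ratios)  # {'INSTRUCTION': 0.75, 'BRANCH': None}
```

Returning `None` instead of 0.0 for an empty counter keeps code that has no branches from being counted as uncovered when averages are computed later.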

Replication

Follow the setup instructions within each directory. To replicate the experiments, follow the steps below:

1. Train the models using the run_train.py script.
2. Construct the methods2test_runnable evaluation dataset by following the instructions in the evaluation/methods2test_runnable/README.md file.
3. Generate the unit tests for the methods2test_runnable dataset and the humaneval-x dataset using the run_gen.py script.
4. Fix the generated data using the fix_data.ipynb notebook.
5. Calculate the CodeBLEU scores using the calc_similarity.ipynb notebook.
6. Execute generated tests and collect coverage data by following instructions for running evaluation of the methods2test_runnable dataset and the humaneval-x dataset. See the respective README.md files for details.
7. Calculate the statistics of the passing rate and coverage results using the calc_execution_metrics.ipynb notebook.
8. Analyze the data and generate the figures using the plots.ipynb notebook.
9. Analyze the data and generate tables by running the tables.ipynb notebook.

Because deep learning results vary between runs, we provide both the trained models and the generated results. The results are available in the data/ directory. Metadata and links to the trained models can be found here. Datasets are available at: methods2test_small, methods2test_meta, methods2test_runnable.

Owner

Name: André Storhaug
Login: andstor
Kind: user
Location: Trondheim 🇳🇴
Company: NTNU

Website: https://andre.storhaug.no
Repositories: 87
Profile: https://github.com/andstor

🎓 CS PhD student @ Norwegian University of Science and Technology (NTNU)

GitHub Events

Total

Watch event: 1
Push event: 35
Public event: 1
Pull request event: 2
Create event: 2

Last Year

Watch event: 1
Push event: 35
Public event: 1
Pull request event: 2
Create event: 2

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 0
Total pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 0
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 2

Top Authors

Issue Authors

Pull Request Authors

dependabot[bot] (2)

Top Labels

Issue Labels

Pull Request Labels

dependencies (2) python (1)

Dependencies

analysis/requirements.txt pypi

codebleu *
ipykernel *
ipywidgets *
matplotlib *
pandarallel *
tqdm *
transformers *
tree-sitter-java *

generation/requirements.txt pypi

accelerate >=0.12.0
datasets >=2.14.0
deepspeed *
evaluate *
peft *
protobuf *
scikit-learn *
sentencepiece *
torch >=1.3
transformers ==4.41.0

training/requirements.txt pypi

accelerate >=0.12.0
datasets >=2.14.0
evaluate *
peft *
protobuf *
scikit-learn *
sentencepiece *
torch >=1.3
transformers ==4.41.0

.github/workflows/build_docker.yml actions

actions/attest-build-provenance v2 composite
actions/checkout v4 composite
docker/build-push-action f2a1d5e99d037542a71f64918e516c093c6f3fc4 composite
docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1 composite
docker/metadata-action 9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7 composite

evaluation/humaneval-x/Dockerfile docker

python 3.12 build

evaluation/methods2test_runnable/Dockerfile docker

python 3.12 build

evaluation/humaneval-x/pom.xml maven

junit:junit 4.13.2 test

evaluation/humaneval-x/requirements.txt pypi

datasets *
pandas *
tqdm *

evaluation/methods2test_runnable/requirements.txt pypi

datasets * test
gitpython * test
ipywidgets * test
numpy * test
pandas * test
tqdm * test

utilities/java-universal-parser/requirements.txt pypi

antlr4-python3-runtime *
antlr4-tools *
