https://github.com/andstor/peft-unit-test-generation-replication-package

Replication Package for "Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study"

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.9%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: andstor
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 158 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 2
  • Releases: 0
Created about 2 years ago · Last pushed 11 months ago
Metadata Files
Readme License

README.md

PEFT unit test generation replication package

Repository structure

This repository is organized as follows:

  • /training: Contains all scripts for training.
  • /generation: Contains the scripts for generation.
  • /data: Contains the experiments' generated data.
  • /evaluation: Contains the scripts for generating coverage data.
  • /analysis: Contains all scripts used for data analysis.
  • /figures: Contains all figures created during data analysis.
  • /tables: Contains all tables created during data analysis.

Training

To train the models, we use the run_train.py CLI script, which supports various arguments. See the training/README.md file for details and for the hyperparameters used in the paper.

training/
|-- run_train.py        Python CLI training script.
|-- utils/              Directory containing utility scripts.
|-- arguments/          Directory containing the supported arguments for the training script.
|-- deepspeed_configs/  Directory containing various DeepSpeed configurations.
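The experiments compare full fine-tuning with parameter-efficient methods such as LoRA, and data/params_data.csv records the number of trainable parameters per model. As a back-of-the-envelope sketch (not code from the repository), LoRA freezes a weight matrix W of shape d_out × d_in and trains two low-rank factors B (d_out × r) and A (r × d_in), so it adds r·(d_in + d_out) trainable parameters per adapted matrix instead of d_in·d_out. The layer size and rank below are hypothetical, not the paper's hyperparameters.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA keeps W frozen and learns B (d_out x rank) and A (rank x d_in),
    so the low-rank update B @ A has rank * (d_in + d_out) trainable entries.
    """
    return rank * (d_in + d_out)


def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


if __name__ == "__main__":
    # Hypothetical example: a 4096 x 4096 projection adapted with rank 8.
    lora = lora_trainable_params(4096, 4096, rank=8)
    full = full_finetune_params(4096, 4096)
    print(lora, full, f"{lora / full:.2%}")  # prints: 65536 16777216 0.39%
```

Summed over every adapted matrix, this ratio is why parameter-efficient methods train only a small fraction of the model's weights.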

Generation

To generate the unit tests, we use the run_gen.py CLI script, which supports various arguments. See the generation/README.md file for more information.

generation/
|-- run_gen.py                  Python CLI generation script.
|-- stopping_criterias/         Directory containing stopping-criteria utility scripts.
|-- arguments/                  Directory containing the supported arguments for the generation script.
|-- zero_inference_config.json  DeepSpeed-Inference configuration file.
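The contents of stopping_criterias/ are not described here. As one hypothetical example of the kind of rule a stopping criterion for unit test generation could apply, the sketch below reports completion once the first opened curly brace has been closed again, i.e. once a complete Java method body has been emitted. The function name and the rule itself are illustrative assumptions; braces in string and char literals are skipped, but comments are not handled.

```python
def java_body_complete(text: str) -> bool:
    """Return True once the first opened curly brace has been closed again.

    Hypothetical stopping rule: braces inside "..." string literals and
    '...' char literals are ignored; comments are NOT handled.
    """
    depth = 0
    quote = None  # delimiter of the literal we are inside, or None
    escaped = False
    for ch in text:
        if quote is not None:
            if escaped:
                escaped = False  # this char is escaped, e.g. \" or \\
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None  # literal closed
            continue
        if ch in ('"', "'"):
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:  # ignore stray closing braces
            depth -= 1
            if depth == 0:
                return True
    return False


if __name__ == "__main__":
    partial = 'public void testAdd() {\n  assertEquals("}", add(1, 2));\n'
    print(java_body_complete(partial))          # False: body still open
    print(java_body_complete(partial + "}\n"))  # True: body closed
```

With Hugging Face transformers, a check like this would typically be wrapped in a `StoppingCriteria` subclass that decodes the newly generated tokens; this sketch does not claim that run_gen.py works that way.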

Data

data/
|-- <dataset>/
| |-- generated/                       The generated unit tests from each experiment.
| | |-- <tuning_method>/               The tuning method used: pre-trained, full fine-tuning, LoRA, IA^3, or prompt tuning.
| | |-- <namespace>/                   The organization that created the base model.
| | |-- <model_name>/                  The name of the model.
| | |-- 0000[i]-of-0000[n].test.jsonl  JSONL file with generated unit tests.
| |-- fixed/                           Same structure as generated/, but containing the fixed data.
| |-- coverage/                        The coverage data of the generated unit tests.
| | |-- <tuning_method>/               The tuning method used: pre-trained, full fine-tuning, LoRA, IA^3, or prompt tuning.
| | |-- <namespace>/                   The organization that created the base model.
| | |-- <model_name>/                  The name of the model.
| | |-- jacoco.jsonl                   JSONL file with JaCoCo report data.
| |-- coverage_branch.csv              CSV file containing the branch coverage of the generated unit tests.
| |-- coverage_instruction.csv         CSV file containing the instruction coverage of the generated unit tests.
| |-- passing_rate.csv                 CSV file containing the percentage of the generated unit tests that are runnable.
| |-- scores.csv                       CSV file containing the CodeBLEU scores of the experiments.
| |-- valid_syntax.csv                 CSV file containing the fraction of generated code with valid syntax.
|-- params_data.csv                    CSV file with the count of trainable parameters for each model.
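A minimal sketch for reading one experiment's generated tests, assuming that <tuning_method>/<namespace>/<model_name>/ are nested directory levels under data/<dataset>/generated/ and that every shard ends in .test.jsonl. The helper name is hypothetical, and the record fields are deliberately left untouched because their schema is not documented here.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_generated(data_dir: Path, dataset: str, tuning_method: str,
                   namespace: str, model_name: str) -> Iterator[dict]:
    """Yield every record from all *.test.jsonl shards of one experiment.

    Assumed layout:
    data/<dataset>/generated/<tuning_method>/<namespace>/<model_name>/
    """
    run_dir = data_dir / dataset / "generated" / tuning_method / namespace / model_name
    # Zero-padded shard names sort correctly as plain strings.
    for shard in sorted(run_dir.glob("*.test.jsonl")):
        with shard.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():  # skip blank lines
                    yield json.loads(line)
```

For example, `list(iter_generated(Path("data"), "<dataset>", "<tuning_method>", "<namespace>", "<model_name>"))` with the placeholders replaced by real directory names. Swapping "generated" for "fixed" reads the fixed data, which shares the same structure.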

Analysis

We first fix the generated data using the fix_data.ipynb notebook. We then calculate the CodeBLEU scores of the fixed data using the calc_similarity.ipynb notebook. Once code coverage has been calculated (see the Evaluation section), we compute the passing-rate and coverage statistics using the calc_execution_metrics.ipynb notebook. Finally, we analyze the data and generate the plots and tables using the plots.ipynb and tables.ipynb notebooks.

analysis/
|-- java-universal-parser/        Directory containing the Java parser used to validate generated code.
|-- fix_data.ipynb                Jupyter Notebook file used to fix the generated data. Also calculates the syntactic validity of the generated code.
|-- calc_execution_metrics.ipynb  Jupyter Notebook file used to calculate the statistics of the passing rate and coverage results.
|-- calc_similarity.ipynb         Jupyter Notebook file used to calculate the CodeBLEU scores of the fixed data.
|-- plots.ipynb                   Jupyter Notebook file containing the Python code used to analyze the extracted data and generate the resulting plots.
|-- tables.ipynb                  Jupyter Notebook file containing the Python code used to analyze the extracted data and generate the resulting tables.
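passing_rate.csv holds the percentage of generated unit tests that are runnable, and calc_execution_metrics.ipynb computes it. The notebook's exact logic is not reproduced here; the sketch below only illustrates that definition, assuming hypothetical per-test records with the keys "tuning_method", "model", and a boolean "runnable".

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def passing_rate(records: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Percentage of runnable generated tests per (tuning_method, model).

    Assumes hypothetical record keys "tuning_method", "model", and a
    boolean "runnable"; the real notebook's schema may differ.
    """
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    passed: Dict[Tuple[str, str], int] = defaultdict(int)
    for r in records:
        key = (r["tuning_method"], r["model"])
        total[key] += 1
        passed[key] += bool(r["runnable"])  # True counts as 1
    return {k: 100.0 * passed[k] / total[k] for k in total}
```

Grouping by tuning method and model mirrors how the data/ tree separates experiments.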

Evaluation

Code coverage is calculated using the evaluate_humaneval-x.py script. Because executing arbitrary generated code poses security risks, the evaluation runs inside Docker. Execute at your own risk. See the evaluation/README.md file for container build instructions.

evaluation/
|-- humaneval-x/                      Directory containing the scripts for evaluating the HumanEval-X code.
| |-- Dockerfile                      Dockerfile for building the evaluation environment.
| |-- evaluate_tests.py               Python script for evaluating the generated code.
| |-- pom.xml                         Maven project file for building the evaluation environment.
|-- methods2test_runnable/            Directory containing the scripts for evaluating the runnable methods2test code.
| |-- Dockerfile                      Dockerfile for building the evaluation environment.
| |-- evaluate_tests.py               Python script for evaluating the generated code.
| |-- validate_buildable.py           Python script for validating the buildable methods2test code.
| |-- validate_runnable.py            Python script for validating the runnable methods2test code.
| |-- find_golden_commits.py          Python script for finding the golden commits in the methods2test repository.
| |-- package_runnable.py             Python script for packaging the runnable methods2test code.
| |-- src/                            Directory containing the source code for the runnable methods2test code.
| | |-- jacoco_report.py              Python script for generating the JaCoCo report.
| | |-- java_descriptor_converter.py  Python script for converting Java descriptors.
| | |-- java_utils.py                 Python script for Java utility functions.
| | |-- surefire_report.py            Python script for generating the Surefire report.
| | |-- test_executer.py              Python script for executing the tests.
| |-- output/                         Directory for storing the intermediate results.
| | |-- commits_[split].jsonl         JSONL file with commits for buildable methods2test test repositories.
| | |-- buildable_[split].jsonl       JSONL file with the buildable methods2test code.
| | |-- runnable_[split].jsonl        JSONL file with the runnable methods2test code.
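The instruction and branch coverage files are derived from JaCoCo reports. JaCoCo reports each metric (INSTRUCTION, BRANCH, LINE, ...) as a counter of missed and covered items, and coverage is covered / (missed + covered). The sketch below applies that formula; the record shape is a hypothetical stand-in, since the fields stored in jacoco.jsonl are not documented here.

```python
from typing import Optional


def coverage_ratio(counter: dict) -> Optional[float]:
    """Coverage for one JaCoCo counter: covered / (missed + covered).

    Returns None for an empty counter, e.g. code with no branches,
    so that "no branches" is not confused with 0% branch coverage.
    """
    total = counter["missed"] + counter["covered"]
    return counter["covered"] / total if total else None


# Hypothetical record shape; the actual jacoco.jsonl fields may differ.
record = {
    "INSTRUCTION": {"missed": 12, "covered": 36},
    "BRANCH": {"missed": 2, "covered": 2},
}

if __name__ == "__main__":
    print(coverage_ratio(record["INSTRUCTION"]), coverage_ratio(record["BRANCH"]))
    # prints: 0.75 0.5
```

Returning None rather than 0.0 leaves the decision of how to aggregate branch-free code to the analysis step.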

Replication

Follow the setup instructions within each directory. To replicate the experiments, follow the steps below:

  1. Train the models using the run_train.py script.
  2. Construct the methods2test_runnable evaluation dataset by following the instructions in the evaluation/methods2test_runnable/README.md file.
  3. Generate the unit tests for the methods2test_runnable dataset and the humaneval-x dataset using the run_gen.py script.
  4. Fix the generated data using the fix_data.ipynb notebook.
  5. Calculate the CodeBLEU scores using the calc_similarity.ipynb notebook.
  6. Execute generated tests and collect coverage data by following instructions for running evaluation of the methods2test_runnable dataset and the humaneval-x dataset. See the respective README.md files for details.
  7. Calculate the statistics of the passing rate and coverage results using the calc_execution_metrics.ipynb notebook.
  8. Analyze the data and generate the figures using the plots.ipynb notebook.
  9. Analyze the data and generate tables by running the tables.ipynb notebook.
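Steps 4, 5, 7, 8, and 9 are notebooks, so they can in principle be run headlessly with `jupyter nbconvert --to notebook --execute --inplace`. The sketch below is a hypothetical convenience driver, not part of the repository. It assumes the notebooks live under analysis/ and run without interactive input, and it splits the notebooks around step 6 (Docker-based test execution), which it does not automate.

```python
import subprocess
from typing import List

# Notebook steps from the replication list above, split around step 6.
PRE_EVALUATION = [
    "analysis/fix_data.ipynb",         # step 4
    "analysis/calc_similarity.ipynb",  # step 5
]
POST_EVALUATION = [
    "analysis/calc_execution_metrics.ipynb",  # step 7
    "analysis/plots.ipynb",                   # step 8
    "analysis/tables.ipynb",                  # step 9
]


def nbconvert_commands(notebooks: List[str]) -> List[List[str]]:
    """Build one command per notebook that executes it and saves outputs in place."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
        for nb in notebooks
    ]


def run_notebooks(notebooks: List[str], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in nbconvert_commands(notebooks):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing notebook.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_notebooks(PRE_EVALUATION)  # dry run: only prints the commands
```

A full run would call `run_notebooks(PRE_EVALUATION, dry_run=False)`, then perform the evaluation of step 6, then call `run_notebooks(POST_EVALUATION, dry_run=False)`.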

Because deep learning results can vary between runs, we provide both the trained models and the generated results. The results are available in the data/ directory. Metadata and links to the trained models can be found here. Datasets are available at: methods2test_small, methods2test_meta, methods2test_runnable.

Owner

  • Name: André Storhaug
  • Login: andstor
  • Kind: user
  • Location: Trondheim 🇳🇴
  • Company: NTNU

🎓 CS PhD student @ Norwegian University of Science and Technology (NTNU)

GitHub Events

Total
  • Watch event: 1
  • Push event: 35
  • Public event: 1
  • Pull request event: 2
  • Create event: 2
Last Year
  • Watch event: 1
  • Push event: 35
  • Public event: 1
  • Pull request event: 2
  • Create event: 2

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 2
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (2)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (2)
  • python (1)

Dependencies

analysis/requirements.txt pypi
  • codebleu *
  • ipykernel *
  • ipywidgets *
  • matplotlib *
  • pandarallel *
  • tqdm *
  • transformers *
  • tree-sitter-java *
generation/requirements.txt pypi
  • accelerate >=0.12.0
  • datasets >=2.14.0
  • deepspeed *
  • evaluate *
  • peft *
  • protobuf *
  • scikit-learn *
  • sentencepiece *
  • torch >=1.3
  • transformers ==4.41.0
training/requirements.txt pypi
  • accelerate >=0.12.0
  • datasets >=2.14.0
  • evaluate *
  • peft *
  • protobuf *
  • scikit-learn *
  • sentencepiece *
  • torch >=1.3
  • transformers ==4.41.0
.github/workflows/build_docker.yml actions
  • actions/attest-build-provenance v2 composite
  • actions/checkout v4 composite
  • docker/build-push-action f2a1d5e99d037542a71f64918e516c093c6f3fc4 composite
  • docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1 composite
  • docker/metadata-action 9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7 composite
evaluation/humaneval-x/Dockerfile docker
  • python 3.12 build
evaluation/methods2test_runnable/Dockerfile docker
  • python 3.12 build
evaluation/humaneval-x/pom.xml maven
  • junit:junit 4.13.2 test
evaluation/humaneval-x/requirements.txt pypi
  • datasets *
  • pandas *
  • tqdm *
evaluation/methods2test_runnable/requirements.txt pypi
  • datasets * test
  • gitpython * test
  • ipywidgets * test
  • numpy * test
  • pandas * test
  • tqdm * test
utilities/java-universal-parser/requirements.txt pypi
  • antlr4-python3-runtime *
  • antlr4-tools *