https://github.com/amazon-science/migrationbench
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 7.7%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: amazon-science
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 2.75 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
MigrationBench
1. 📖 Overview
MigrationBench is a library to assess code migration success in an automated and robust way.
1.1 MigrationBench: Dataset and Evaluation Framework
The name MigrationBench is used for both the dataset and the evaluation framework for code migration success:
- 🤗 MigrationBench is a large-scale code migration benchmark dataset at the repository level, across multiple programming languages.
  - The current and initial release includes java 8 repositories with the maven build system, as of May 2025.
- MigrationBench (the current Github package) is the evaluation framework to assess code migration success, from java 8 to 17 or any other long-term support (LTS) version.
The evaluation is an approximation for functional equivalence by checking the following:
1. The repo is able to build and pass all tests
2. Compiled classes' major versions are consistent with the target java version
   - 52 and 61 for java 8 and 17 respectively
3. Test methods are invariant after code migration
4. Number of test cases is non-decreasing after code migration
5. The repos' dependency libraries match their latest major versions
   - Optional for minimal migration by definition, while
   - Required for maximal migration
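As an illustration of the second check, the major version of a compiled class can be read from the first eight bytes of its `.class` file: a 4-byte magic number `0xCAFEBABE`, a 2-byte minor version and a 2-byte major version, all big-endian. The sketch below is not the framework's own implementation; the names `class_major_version` and `matches_target` are hypothetical and chosen only for this example.

```python
import struct

# Class-file major versions for the two java releases named above.
JAVA_TO_MAJOR = {8: 52, 17: 61}


def class_major_version(data: bytes) -> int:
    """Return the major version stored in a compiled .class file header."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    # Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("missing 0xCAFEBABE magic number")
    return major


def matches_target(data: bytes, java_version: int) -> bool:
    """Check whether class bytes were compiled for the given java version."""
    return class_major_version(data) == JAVA_TO_MAJOR[java_version]
```

In practice one would read each file under the build output directory with `open(path, "rb").read(8)` and pass the bytes to `matches_target`.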
1.2 SDFeedback: Migration with LLMs
SDFeedback is a separate Github package to conduct code migration with LLMs as a baseline solution, and it relies on the current package for the final evaluation.
- It builds an ECR image and then
- It runs both code migration and final evaluation with Elastic Map Reduce (EMR) Serverless in a scalable way.
2. 🤗 MigrationBench Datasets
There are three datasets in 🤗 MigrationBench:
- All repositories included in the datasets are available on GitHub, under the MIT or Apache-2.0 license.
| Index | Dataset | Size | Notes |
|-------|-----------------------------------------------|-------|-----------------------------------------------------------------------------------------------------|
| 1 | 🤗 AmazonScience/migration-bench-java-full | 5,102 | Each repo has a test directory or at least one test case |
| 2 | 🤗 AmazonScience/migration-bench-java-selected | 300 | A subset of 🤗 migration-bench-java-full |
| 3 | 🤗 AmazonScience/migration-bench-java-utg | 4,814 | The unit test generation (utg) dataset, disjoint with 🤗 migration-bench-java-full|
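
The datasets in the table can presumably be loaded with the Hugging Face `datasets` library, which appears in this repository's dependency list. A minimal sketch, assuming the dataset IDs are exactly those listed above; the short keys (`full`, `selected`, `utg`) and the helper name `load_migration_bench` are hypothetical, and the available splits and columns are not stated here, so the example simply returns whatever `load_dataset` provides.

```python
# Dataset IDs as listed in the table above.
DATASET_IDS = {
    "full": "AmazonScience/migration-bench-java-full",
    "selected": "AmazonScience/migration-bench-java-selected",
    "utg": "AmazonScience/migration-bench-java-utg",
}


def load_migration_bench(name: str = "selected"):
    """Download one dataset from the Hugging Face Hub (requires network access)."""
    # Imported lazily so the ID table can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_IDS[name])
```

Calling `load_migration_bench("selected")` and printing the result shows the splits and columns that are actually present.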
3. Code Migration Evaluation
We support running code migration evaluation for MigrationBench in two modes:
1. Single eval mode: For a single repository and
2. Batch eval mode: For multiple repositories
3.1 Get Started
To get started with code migration evaluation from java 8 to 17,
under either minimal migration or maximal migration
(See the arXiv paper for the definition):
3.1.1 Basic Setup
Verify you have java 17, maven 3.9.6 and conda (optional) locally:
```
# java
~ $ java --version
openjdk 17.0.15 2025-04-15 LTS
OpenJDK Runtime Environment Corretto-17.0.15.6.1 (build 17.0.15+6-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.15.6.1 (build 17.0.15+6-LTS, mixed mode, sharing)
```
```
# maven
~ $ mvn --version
Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven home: /usr/local/bin/apache-maven-3.9.6
Java version: 17.0.15, vendor: Amazon.com Inc., runtime: /usr/lib/jvm/java-17-amazon-corretto.x86_64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.10.236-208.928.amzn2int.x86_64", arch: "amd64", family: "unix"
```
```
# conda (Optional)
$ conda --version
conda 25.1.1
```
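
The same verification can be scripted. The sketch below parses the first line of the version output shown above; it is not part of MigrationBench, and the helper names (`parse_java_major`, `parse_maven_version`, `run_version`) are hypothetical. It also accepts the older `1.8.0_...` numbering that java 8 prints, as a convenience.

```python
import re
import shutil
import subprocess
from typing import List, Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the java major version, e.g. 17 from 'openjdk 17.0.15 ...'."""
    match = re.search(r'(?:openjdk|java)(?: version)? "?(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # java 8 and earlier report themselves as 1.8, 1.7, ...
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def parse_maven_version(version_output: str) -> Optional[str]:
    """Extract the maven version string, e.g. '3.9.6'."""
    match = re.search(r"Apache Maven (\d+(?:\.\d+)*)", version_output)
    return None if match is None else match.group(1)


def run_version(command: List[str]) -> Optional[str]:
    """Run a version command; return its combined output, or None if not installed."""
    if shutil.which(command[0]) is None:
        return None
    done = subprocess.run(command, capture_output=True, text=True, check=False)
    return done.stdout + done.stderr
```

For example, `parse_java_major(run_version(["java", "--version"]) or "")` returns `None` when java is missing or its output is not recognized.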
3.1.2 Install MigrationBench
```
git clone https://github.com/amazon-science/MigrationBench.git
cd MigrationBench

# They're optional if one doesn't need a conda env
export CONDA_ENV=migration-bench
conda create -n $CONDA_ENV python=3.9
conda activate $CONDA_ENV

pip install -r requirements.txt -e .
```
Next, to run a single job or a batch of jobs, refer to the file-level comments in src/migraiton_bench/run_eval.py.
3.2 Single Eval
To run eval for a single repository, provide the Github url, a git diff file and optionally more flags:
3.2.1 Unsuccessful Eval
```
cd .../src/migraiton_bench

GITHUB_URL=https://github.com/0xShamil/java-xid
GIT_DIFF_FILE=...

python run_eval.py --github_url $GITHUB_URL --git_diff_filename $GIT_DIFF_FILE
```
One may see the following output, as the git diff file is invalid:
...
[single] Migration success (count) `False`: `('https://github.com/0xShamil/java-xid', '...')`.
...
3.2.2 Successful Eval
python run_eval.py --github_url $GITHUB_URL --require_compiled_java_major_version 52
By redirecting the code migration target to java 8
(through require_compiled_java_major_version = 52),
it should succeed without any code changes:
...
[single] Migration success (count) `True`: `('https://github.com/0xShamil/java-xid', None)`.
...
3.3 Batch Eval
To run eval in batch mode for multiple repositories,
one can provide a predictions file in the json format.
3.3.1 Sample Predictions File
For each repo, one needs to provide the Github url and the git diff content or file:
$ cat predictions.json
[
{
"github_url": "https://github.com/0xShamil/java-xid",
"git_diff_file": "eval/testdata/java-xid.diff"
},
{
"github_url": "https://github.com/0xShamil/java-xid",
"git_diff": ""
}
]
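
A predictions file with this shape can also be produced programmatically. The sketch below uses only the three keys shown in the sample (`github_url`, `git_diff`, `git_diff_file`). The helper names are hypothetical, and the rule that each entry carries exactly one of `git_diff` or `git_diff_file` is an assumption drawn from the sample, not a documented requirement.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def make_prediction(
    github_url: str,
    git_diff: Optional[str] = None,
    git_diff_file: Optional[str] = None,
) -> Dict[str, str]:
    """Build one entry, holding either inline diff content or a diff file path."""
    # Assumption: exactly one of the two diff fields is given per entry.
    if (git_diff is None) == (git_diff_file is None):
        raise ValueError("provide exactly one of git_diff or git_diff_file")
    entry = {"github_url": github_url}
    if git_diff_file is not None:
        entry["git_diff_file"] = git_diff_file
    else:
        entry["git_diff"] = git_diff
    return entry


def write_predictions(entries: List[Dict[str, str]], path: str) -> None:
    """Write the entries as a json list, matching the sample layout."""
    Path(path).write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
```

Calling `make_prediction` once with `git_diff_file="eval/testdata/java-xid.diff"` and once with `git_diff=""`, then passing both entries to `write_predictions`, reproduces the sample file.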
3.3.2 Run Batch Eval
```
cd .../src/migraiton_bench

PREDICTIONS=predictions.json
python run_eval.py --predictions_filename $PREDICTIONS  # --require_compiled_java_major_version 52
```
One may see the following output, without valid git diff content or file:
...
[batch] Final eval result: Success = 0 out of 2.
...
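
When the evaluation is driven from a script, the summary lines can be extracted from the captured log. A minimal sketch, assuming the log lines have exactly the wording shown in the sample outputs for single and batch eval; the function names are hypothetical and the patterns would need adjusting if the log format differs.

```python
import re
from typing import Optional, Tuple

# Patterns copied from the sample outputs in sections 3.2 and 3.3.
BATCH_RE = re.compile(r"\[batch\] Final eval result: Success = (\d+) out of (\d+)")
SINGLE_RE = re.compile(r"\[single\] Migration success \(count\) `(True|False)`")


def parse_batch_result(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (successes, total) from a batch eval log, or None if absent."""
    match = BATCH_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_single_result(log_text: str) -> Optional[bool]:
    """Return the success flag from a single eval log, or None if absent."""
    match = SINGLE_RE.search(log_text)
    return None if match is None else match.group(1) == "True"
```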
4. 📚 Citation
@misc{liu2025migrationbenchrepositorylevelcodemigration,
title={MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8},
author={Linbo Liu and Xinle Liu and Qiang Zhou and Lin Chen and Yihan Liu and Hoan Nguyen and Behrooz Omidvar-Tehrani and Xi Shen and Jun Huan and Omer Tripp and Anoop Deoras},
year={2025},
eprint={2505.09569},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2505.09569},
}
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
- Watch event: 3
- Delete event: 3
- Issue comment event: 1
- Push event: 35
- Public event: 1
- Pull request event: 12
- Create event: 5
Last Year
- Watch event: 3
- Delete event: 3
- Issue comment event: 1
- Push event: 35
- Public event: 1
- Pull request event: 12
- Create event: 5
Dependencies
- com.github.javaparser:javaparser-core 3.25.10
- junit:junit 4.8.2 test
- Microsoft.AspNetCore.ApplicationInsights.HostingStartup 2.2.0
- Microsoft.AspNetCore.AzureAppServices.HostingStartup 8.0.6
- Microsoft.AspNetCore.AzureAppServicesIntegration 8.0.6
- Microsoft.AspNetCore.DataProtection.AzureKeyVault 3.1.24
- Microsoft.AspNetCore.DataProtection.AzureStorage 3.1.24
- Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv 6.0.31
- Microsoft.AspNetCore.SignalR.Redis 1.1.5
- Microsoft.Data.Sqlite 8.0.6
- Microsoft.Data.Sqlite.Core 8.0.6
- Microsoft.EntityFrameworkCore.Sqlite 8.0.6
- Microsoft.EntityFrameworkCore.Sqlite.Core 8.0.6
- Microsoft.EntityFrameworkCore.Tools 8.0.2
- Microsoft.Extensions.Caching.Redis 2.2.0
- Microsoft.Extensions.Configuration.AzureKeyVault 3.1.24
- Microsoft.Extensions.Logging.AzureAppServices 8.0.6
- Microsoft.VisualStudio.Web.BrowserLink 2.2.0
- Microsoft.VisualStudio.Web.CodeGeneration.Design 2.0.3
- Microsoft.AspNetCore.Diagnostics *
- Microsoft.AspNetCore.Owin 6.0.29
- Microsoft.AspNetCore.SystemWebAdapters 1.4.0
- boto3 ==1.34.4
- datasets ==3.6.0
- gitpython ==3.1.43
- javalang ==0.13.0
- nltk *
- numpy ==1.26.4
- packaging *
- parameterized ==0.8.1
- protobuf ==3.20.3
- pydantic ==2.6.4
- pylint ==2.14.5
- pyspark ==3.5.1
- pytz ==2024.1
- rank_bm25 *
- requests ==2.32.3