https://github.com/amazon-science/migrationbench

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.7%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: amazon-science
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 2.75 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License Code of conduct

README.md

MigrationBench


1. 📖 Overview

MigrationBench is a library to assess code migration success in an automated and robust way.

1.1 MigrationBench: Dataset and Evaluation Framework

The name MigrationBench is used for both the dataset and the evaluation framework for code migration success:

  1. 🤗 MigrationBench is a large-scale code migration benchmark dataset at the repository level, across multiple programming languages.
    • Current and initial release includes java 8 repositories with the maven build system, as of May 2025.
  2. MigrationBench (the current GitHub package) is the evaluation framework to assess code migration success, from java 8 to 17 or any other long-term support (LTS) version.

The evaluation is an approximation for functional equivalence by checking the following:

  1. The repo is able to build and pass all tests
  2. Compiled classes' major versions are consistent with the target java version
    • 52 and 61 for java 8 and 17 respectively
  3. Test methods are invariant after code migration
  4. Number of test cases is non-decreasing after code migration
  5. The repos' dependency libraries match their latest major versions
    • Optional for minimal migration by definition, while
    • Required for maximal migration
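The two test-related checks (test methods are invariant, and the number of test cases is non-decreasing) can be approximated by comparing the set of test methods before and after a migration. The sketch below is an illustration only and not the framework's implementation: it assumes JUnit-style `@Test` annotations, uses a simple regular expression rather than a real Java parser, and the helper names `extract_test_methods` and `original_tests_preserved` are hypothetical.

```python
import re

# Matches a JUnit-style test method: an @Test annotation (optionally with
# arguments), any further annotations, then modifiers/return type and the name.
# Assumption: annotation arguments contain no nested parentheses.
JUNIT_METHOD = re.compile(
    r"@Test\b(?:\s*\([^)]*\))?"        # @Test or @Test(expected = ...)
    r"(?:\s*@\w+(?:\s*\([^)]*\))?)*"   # other annotations, e.g. @Ignore
    r"\s+[\w<>\[\],.?\s]*?"            # modifiers and return type
    r"(\w+)\s*\("                      # method name, then its parameter list
)


def extract_test_methods(java_source: str) -> set:
    """Return the names of @Test-annotated methods found in the source text."""
    return set(JUNIT_METHOD.findall(java_source))


def original_tests_preserved(before: str, after: str) -> bool:
    """True if every original test method still exists after migration.

    The original set must be a subset of the new one, so the number of
    test methods cannot decrease either.
    """
    return extract_test_methods(before) <= extract_test_methods(after)


BEFORE = """
public class XidTest {
    @Test
    public void testParse() { }

    @Test(expected = IllegalArgumentException.class)
    public void testInvalid() { }
}
"""

# A migration that silently drops a test should be rejected.
AFTER_DROPPED = """
public class XidTest {
    @Test
    public void testParse() { }
}
"""

print(sorted(extract_test_methods(BEFORE)))
print(original_tests_preserved(BEFORE, BEFORE))
print(original_tests_preserved(BEFORE, AFTER_DROPPED))
```

Comparing names only is a deliberate simplification: it would not notice a test whose body was emptied, so a stricter check would also compare method bodies.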

1.2 SDFeedback: Migration with LLMs

SDFeedback is a separate GitHub package to conduct code migration with LLMs as a baseline solution, and it relies on the current package for the final evaluation.

  • It builds an ECR image, and then
  • It runs both code migration and final evaluation with Elastic Map Reduce (EMR) Serverless in a scalable way.

2. 🤗 MigrationBench Datasets

There are three datasets in 🤗 MigrationBench:

  • All repositories included in the datasets are available on GitHub, under the MIT or Apache-2.0 license.

| Index | Dataset | Size | Notes |
|-------|---------|------|-------|
| 1 | 🤗 AmazonScience/migration-bench-java-full | 5,102 | Each repo has a test directory or at least one test case |
| 2 | 🤗 AmazonScience/migration-bench-java-selected | 300 | A subset of 🤗 migration-bench-java-full |
| 3 | 🤗 AmazonScience/migration-bench-java-utg | 4,814 | The unit test generation (utg) dataset, disjoint with 🤗 migration-bench-java-full |

3. Code Migration Evaluation

We support running code migration evaluation for MigrationBench in two modes:

  1. Single eval mode: For a single repository and
  2. Batch eval mode: For multiple repositories

3.1 Get Started

To get started with code migration evaluation from java 8 to 17, under either minimal migration or maximal migration (See the arXiv paper for the definition):

3.1.1 Basic Setup

Verify you have java 17, maven 3.9.6 and conda (optional) locally:

```
# java
~ $ java --version
openjdk 17.0.15 2025-04-15 LTS
OpenJDK Runtime Environment Corretto-17.0.15.6.1 (build 17.0.15+6-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.15.6.1 (build 17.0.15+6-LTS, mixed mode, sharing)
```

```
# maven
~ $ mvn --version
Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven home: /usr/local/bin/apache-maven-3.9.6
Java version: 17.0.15, vendor: Amazon.com Inc., runtime: /usr/lib/jvm/java-17-amazon-corretto.x86_64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.10.236-208.928.amzn2int.x86_64", arch: "amd64", family: "unix"
```

```
# conda (Optional)
$ conda --version
conda 25.1.1
```
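This verification can also be scripted. The sketch below parses the first line of `java --version` and `mvn --version` output in the formats shown above. The helper names are hypothetical, and the regular expressions assume these output formats, which may differ across JDK vendors and older releases. `read_tool_output` is defined for convenience but not called here, since it needs the tools installed.

```python
import re
import subprocess


def read_tool_output(command: list) -> str:
    """Run a command such as ["java", "--version"] and return its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def java_major_version(version_output: str) -> int:
    """Extract the major version from `java --version` output, e.g. 17."""
    match = re.search(r"^(?:openjdk|java) (\d+)", version_output, re.MULTILINE)
    if match is None:
        raise ValueError("unrecognized `java --version` output")
    return int(match.group(1))


def maven_version(version_output: str) -> tuple:
    """Extract (major, minor, patch) from `mvn --version` output."""
    match = re.search(r"^Apache Maven (\d+)\.(\d+)\.(\d+)", version_output, re.MULTILINE)
    if match is None:
        raise ValueError("unrecognized `mvn --version` output")
    return tuple(int(part) for part in match.groups())


# Sample first lines copied from the outputs above.
JAVA_SAMPLE = "openjdk 17.0.15 2025-04-15 LTS"
MAVEN_SAMPLE = "Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)"

print(java_major_version(JAVA_SAMPLE))
print(maven_version(MAVEN_SAMPLE))
```

On a machine with both tools installed, `java_major_version(read_tool_output(["java", "--version"]))` would perform the live check.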

3.1.2 Install MigrationBench

```
git clone https://github.com/amazon-science/MigrationBench.git

cd MigrationBench

# They're optional if one doesn't need a conda env
export CONDA_ENV=migration-bench
conda create -n $CONDA_ENV python=3.9
conda activate $CONDA_ENV

pip install -r requirements.txt -e .
```

Next, to run a single job or a batch of jobs, refer to the file-level comments in src/migration_bench/run_eval.py.

3.2 Single Eval

To run eval for a single repository, provide the GitHub url, a git diff file and optionally more flags:

3.2.1 Unsuccessful Eval

```
# cd .../src/migration_bench

GITHUB_URL=https://github.com/0xShamil/java-xid
GIT_DIFF_FILE=...

python run_eval.py --github_url $GITHUB_URL --git_diff_filename $GIT_DIFF_FILE
```

One may see the following output, as the git diff file is invalid:

... [single] Migration success (count) `False`: `('https://github.com/0xShamil/java-xid', '...')`. ...

3.2.2 Successful Eval

python run_eval.py --github_url $GITHUB_URL --require_compiled_java_major_version 52

By redirecting the code migration target to java 8 (through require_compiled_java_major_version = 52), it should succeed without any code changes:

... [single] Migration success (count) `True`: `('https://github.com/0xShamil/java-xid', None)`. ...
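The value 52 is the class-file major version that the compiler emits for java 8 (61 for java 17). As a rough illustration of how a compiled class can be inspected for this, and not necessarily how MigrationBench does it, the sketch below reads the major version from the 8-byte header of a `.class` file. The function names are hypothetical. The header layout (magic number `0xCAFEBABE`, minor version, major version, all big-endian) comes from the JVM class-file format.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE  # first four bytes of every valid .class file


def class_major_version(class_bytes: bytes) -> int:
    """Read the major version from the header of a compiled .class file."""
    if len(class_bytes) < 8:
        raise ValueError("class file header is truncated")
    # Big-endian: u4 magic, u2 minor_version, u2 major_version.
    magic, _minor, major = struct.unpack(">IHH", class_bytes[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return major


def java_release(major: int) -> int:
    """Map a class-file major version to a Java release (52 -> 8, 61 -> 17)."""
    return major - 44


# Synthetic headers stand in for real files; with a real file one would pass
# the bytes read from disk instead, e.g. Path("Foo.class").read_bytes().
java8_header = struct.pack(">IHH", CLASS_MAGIC, 0, 52)
java17_header = struct.pack(">IHH", CLASS_MAGIC, 0, 61)

print(java_release(class_major_version(java8_header)))
print(java_release(class_major_version(java17_header)))
```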

3.3 Batch Eval

To run eval in batch mode for multiple repositories, one can provide a predictions file in JSON format.

3.3.1 Sample Predictions File

For each repo, one needs to provide the GitHub url and the git diff content or file:

    $ cat predictions.json
    [
        {
            "github_url": "https://github.com/0xShamil/java-xid",
            "git_diff_file": "eval/testdata/java-xid.diff"
        },
        {
            "github_url": "https://github.com/0xShamil/java-xid",
            "git_diff": ""
        }
    ]
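A predictions file in this shape can be generated programmatically. The sketch below is a minimal example that uses only the three keys shown in the sample (`github_url`, `git_diff_file`, `git_diff`). The helper names are hypothetical, and the rule that each entry carries exactly one of the two diff fields is an assumption drawn from the "content or file" wording, not a documented constraint.

```python
import json
import tempfile
from pathlib import Path


def make_prediction(github_url, git_diff=None, git_diff_file=None):
    """Build one entry with either inline diff content or a diff file path."""
    # Assumption: exactly one of the two diff fields is supplied per entry.
    if (git_diff is None) == (git_diff_file is None):
        raise ValueError("provide exactly one of git_diff or git_diff_file")
    entry = {"github_url": github_url}
    if git_diff is not None:
        entry["git_diff"] = git_diff
    else:
        entry["git_diff_file"] = git_diff_file
    return entry


def write_predictions(entries, path):
    """Write the list of entries as a JSON array."""
    Path(path).write_text(json.dumps(entries, indent=2), encoding="utf-8")


# Same two entries as the sample predictions file.
entries = [
    make_prediction("https://github.com/0xShamil/java-xid",
                    git_diff_file="eval/testdata/java-xid.diff"),
    make_prediction("https://github.com/0xShamil/java-xid", git_diff=""),
]

# Round-trip through a temporary file to confirm the JSON is well formed.
with tempfile.TemporaryDirectory() as tmp_dir:
    predictions_path = Path(tmp_dir) / "predictions.json"
    write_predictions(entries, predictions_path)
    reloaded = json.loads(predictions_path.read_text(encoding="utf-8"))

print(reloaded == entries)
```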

3.3.2 Run Batch Eval

```
# cd .../src/migration_bench

PREDICTIONS=predictions.json
python run_eval.py --predictions_filename $PREDICTIONS  # --require_compiled_java_major_version 52
```

One may see the following output, without valid git diff content or file:

... [batch] Final eval result: Success = 0 out of 2. ...
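When batch runs are scripted, the summary line can be pulled out of the captured log. The sketch below assumes the line keeps the exact wording shown above; the function name `parse_batch_result` is hypothetical and the log format is not a documented interface, so it may change between versions.

```python
import re

# Pattern for the summary line as it appears in the sample output.
BATCH_RESULT = re.compile(r"\[batch\] Final eval result: Success = (\d+) out of (\d+)")


def parse_batch_result(log_text: str):
    """Return (successes, total) from a batch eval log, or None if absent."""
    match = BATCH_RESULT.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample_log = "... [batch] Final eval result: Success = 0 out of 2. ..."
print(parse_batch_result(sample_log))
```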

4. 📚 Citation

    @misc{liu2025migrationbenchrepositorylevelcodemigration,
          title={MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8},
          author={Linbo Liu and Xinle Liu and Qiang Zhou and Lin Chen and Yihan Liu and Hoan Nguyen and Behrooz Omidvar-Tehrani and Xi Shen and Jun Huan and Omer Tripp and Anoop Deoras},
          year={2025},
          eprint={2505.09569},
          archivePrefix={arXiv},
          primaryClass={cs.SE},
          url={https://arxiv.org/abs/2505.09569},
    }

Owner

  • Name: Amazon Science
  • Login: amazon-science
  • Kind: organization

GitHub Events

Total
  • Watch event: 3
  • Delete event: 3
  • Issue comment event: 1
  • Push event: 35
  • Public event: 1
  • Pull request event: 12
  • Create event: 5
Last Year
  • Watch event: 3
  • Delete event: 3
  • Issue comment event: 1
  • Push event: 35
  • Public event: 1
  • Pull request event: 12
  • Create event: 5

Dependencies

src/migration_bench/common/testdata/pom.xml maven
src/migration_bench/common/testdata/subdir/pom.xml maven
src/migration_bench/common/testdata/subdir/subsubdir/pom.xml maven
src/migration_bench/lang/java/native/pom.xml maven
  • com.github.javaparser:javaparser-core 3.25.10
  • junit:junit 4.8.2 test
src/migration_bench/common/testdata/AspNetCoreMvcRejitApplication.csproj nuget
src/migration_bench/common/testdata/DotNetCoreAuthExamples.CustomPasswordHasher.csproj nuget
  • Microsoft.AspNetCore.ApplicationInsights.HostingStartup 2.2.0
  • Microsoft.AspNetCore.AzureAppServices.HostingStartup 8.0.6
  • Microsoft.AspNetCore.AzureAppServicesIntegration 8.0.6
  • Microsoft.AspNetCore.DataProtection.AzureKeyVault 3.1.24
  • Microsoft.AspNetCore.DataProtection.AzureStorage 3.1.24
  • Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv 6.0.31
  • Microsoft.AspNetCore.SignalR.Redis 1.1.5
  • Microsoft.Data.Sqlite 8.0.6
  • Microsoft.Data.Sqlite.Core 8.0.6
  • Microsoft.EntityFrameworkCore.Sqlite 8.0.6
  • Microsoft.EntityFrameworkCore.Sqlite.Core 8.0.6
  • Microsoft.EntityFrameworkCore.Tools 8.0.2
  • Microsoft.Extensions.Caching.Redis 2.2.0
  • Microsoft.Extensions.Configuration.AzureKeyVault 3.1.24
  • Microsoft.Extensions.Logging.AzureAppServices 8.0.6
  • Microsoft.VisualStudio.Web.BrowserLink 2.2.0
  • Microsoft.VisualStudio.Web.CodeGeneration.Design 2.0.3
src/migration_bench/common/testdata/HelloWorld.csproj nuget
  • Microsoft.AspNetCore.Diagnostics *
  • Microsoft.AspNetCore.Owin 6.0.29
  • Microsoft.AspNetCore.SystemWebAdapters 1.4.0
src/migration_bench/common/testdata/Naif.Blog.csproj nuget
src/migration_bench/common/testdata/ReflectSoftware.Facebook.Messenger.Common.csproj nuget
requirements.txt pypi
  • boto3 ==1.34.4
  • datasets ==3.6.0
  • gitpython ==3.1.43
  • javalang ==0.13.0
  • nltk *
  • numpy ==1.26.4
  • packaging *
  • parameterized ==0.8.1
  • protobuf ==3.20.3
  • pydantic ==2.6.4
  • pylint ==2.14.5
  • pyspark ==3.5.1
  • pytz ==2024.1
  • rank_bm25 *
  • requests ==2.32.3
setup.py pypi