https://github.com/chains-project/bump
A dataset of reproducible breaking dependency updates, SANER 2024 (https://doi.org/10.1109/SANER60148.2024.00024)
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, zenodo.org
- ✓ Committers with academic emails: 1 of 9 committers (11.1%) from academic institutions
- ✓ Institutional organization owner: organization chains-project has institutional domain (chains.proj.kth.se)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.7%) to scientific vocabulary
Repository
Statistics
- Stars: 20
- Watchers: 5
- Forks: 8
- Open Issues: 14
- Releases: 0
Metadata Files
README.md
BUMP Breaking Updates
Overview
Bump is a benchmark of breaking dependency updates. It can be downloaded from Zenodo.
A breaking update is defined as:
a pair of commits for a Java project, which we designate as the pre-commit and the breaking-commit, typically performed by bots such as
Dependabot and Renovate.
When we build the project with the pre-commit, compilation and test execution are successful,
while the build of the breaking-commit fails.
Each breaking-commit is a one-line change in the Maven pom file.
If you use Bump, please cite:
```bibtex
@inproceedings{bump2024,
  title = {BUMP: A Benchmark of Reproducible Breaking Dependency Updates},
  booktitle = {Proceedings of SANER},
  year = {2024},
  doi = {10.1109/SANER60148.2024.00024},
  author = {Frank Reyes and Yogya Gamage and Gabriel Skoglund and Benoit Baudry and Martin Monperrus},
  url = {http://arxiv.org/pdf/2401.09906},
}
```
Download BUMP
All breaking updates in Bump are stored within Docker images. They can be downloaded from Zenodo.
To easily download the Zenodo tar file and load the associated Docker images use the following commands:
Warning: You need a minimum of 250 GB of free disk space to load the images.
```bash
$ wget https://zenodo.org/records/10041883/files/bump.tar.gz
$ docker load -i bump.tar.gz # this loads 1142 images
$ docker images | wc -l
1142

# running a breaking commit
# docker run ghcr.io/chains-project/breaking-updates:{-pre,-breaking}
$ docker run ghcr.io/chains-project/breaking-updates:5769bdad76925da568294cb8a40e7d4469699ac3-breaking
```
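The image naming used in the `docker run` example can be captured in a small helper. This is a minimal sketch, not part of the BUMP tooling: the function name `image_refs` is hypothetical, and it assumes that each image tag is the breaking-commit SHA followed by `-pre` or `-breaking`, as the example command suggests.

```python
# Repository shown in the docker run example above.
REGISTRY_REPO = "ghcr.io/chains-project/breaking-updates"


def image_refs(breaking_commit_sha: str) -> dict:
    """Return the assumed image references for one breaking update.

    Assumption: tags follow the pattern <sha>-pre and <sha>-breaking.
    """
    sha = breaking_commit_sha.strip().lower()
    # A full Git SHA-1 is 40 hexadecimal characters.
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError(f"not a full commit SHA: {breaking_commit_sha!r}")
    return {
        "pre": f"{REGISTRY_REPO}:{sha}-pre",
        "breaking": f"{REGISTRY_REPO}:{sha}-breaking",
    }


if __name__ == "__main__":
    refs = image_refs("5769bdad76925da568294cb8a40e7d4469699ac3")
    print("docker run", refs["pre"])
    print("docker run", refs["breaking"])
```

Running the two printed commands one after the other should show the passing build followed by the failing one, provided the corresponding images have been loaded.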
Data format
Gathered data can be found as JSON files in the data folder.
There are 4 sub-folders inside the data folder.
* benchmark : contains the successfully reproduced breaking dependency updates.
* in-progress-reproductions : contains the potential breaking updates which have not yet been reproduced.
* sanity-check-failures : contains the data that are removed after the sanity-check procedure.
* unsuccessful-reproductions : contains the data regarding unsuccessful reproduction attempts.
Each file inside these folders is named according to the SHA of the (potential) breaking commit.
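Because files are named after the commit SHA, the status of a given breaking commit can be found by checking which sub-folder holds its file. The sketch below illustrates this; the function name `locate_record` is hypothetical, and it assumes the file name is exactly `<sha>.json`.

```python
from pathlib import Path

# The sub-folders of the data folder, as listed above.
SUBFOLDERS = (
    "benchmark",
    "in-progress-reproductions",
    "sanity-check-failures",
    "unsuccessful-reproductions",
)


def locate_record(data_dir, sha):
    """Return (sub-folder name, file path) for a commit SHA, or None.

    Assumption: each record is stored as <sha>.json.
    """
    for sub in SUBFOLDERS:
        candidate = Path(data_dir) / sub / f"{sha}.json"
        if candidate.is_file():
            return sub, candidate
    return None


if __name__ == "__main__":
    hit = locate_record("data", "5769bdad76925da568294cb8a40e7d4469699ac3")
    print(hit if hit else "no record found for this SHA")
```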
Each JSON file in the benchmark folder has the following format.
```json
{
  "url": "<github pr url>",
  "project": "<github_project>",
  "projectOrganisation": "<github_project_organisation>",
  "breakingCommit": "<sha>",
  "prAuthor": "{human|bot}",
  "preCommitAuthor": "{human|bot}",
  "breakingCommitAuthor": "{human|bot}",
  "updatedDependency": {
    "dependencyGroupID": "<group id>",
    "dependencyArtifactID": "<artifact id>",
    "previousVersion": "<label indicating the previous version of the dependency>",
    "newVersion": "<label indicating the new version of the dependency>",
    "dependencyScope": "{compile|provided|runtime|system|import}",
    "versionUpdateType": "{major|minor|patch|other}",
    "githubCompareLink": "<the github comparison link for the previous and breaking tag releases of the updated dependency if it exists>",
    "mavenSourceLinkPre": "<maven source jar link for the previous release of the updated dependency if it exists>",
    "mavenSourceLinkBreaking": "<maven source jar link for the breaking release of the updated dependency if it exists>",
    "updatedFileType": "{pom|jar}",
    "dependencySection": "{dependencies|dependencyManagement|buildPlugins|buildPluginManagement|profileBuildPlugins}"
  },
  "preCommitReproductionCommand": "<the command to compile and run tests without the breaking update commit>",
  "breakingUpdateReproductionCommand": "<the command to compile and run tests with the breaking update commit>",
  "javaVersionUsedForReproduction": "<the java version used for reproduction>",
  "failureCategory": "<the category of the root cause of the reproduction failure>"
}
```
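As an illustration of how this format can be consumed, the following sketch counts benchmark records by one of their fields. It is not part of the BUMP tooling: the function name `tally_field` is hypothetical, and the `data/benchmark` path is an assumption based on the folder layout described above.

```python
import json
from collections import Counter
from pathlib import Path


def tally_field(benchmark_dir, field="failureCategory"):
    """Count records by a field of the JSON format shown above.

    The field is looked up at the top level first and then inside
    "updatedDependency", so "versionUpdateType" also works.
    """
    counts = Counter()
    for path in sorted(Path(benchmark_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            record = json.load(fh)
        value = record.get(field)
        if value is None:
            value = record.get("updatedDependency", {}).get(field)
        counts[value if value is not None else "<missing>"] += 1
    return counts


if __name__ == "__main__":
    # Assumed location of the benchmark records in a repository checkout.
    for label, n in tally_field("data/benchmark").most_common():
        print(f"{label}: {n}")
```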
Workflow
The data gathering workflow is as follows:
* Stage 1 : Collect Java projects which meet the following criteria.
  * builds with Maven,
  * has at least 100 commits on the default branch,
  * created in the last 10 years,
  * has at least 3 contributors,
  * has at least 10 stars.
* Stage 2 : Identify the breaking updates.
* Stage 3 : Reproduce the failure locally under the assumptions documented below.
  * Assumptions:
    * We run Linux (kernel version and distribution to be documented)
    * We use Maven version 3.8.6
    * We run OpenJDK
    * As a starting point, we use Java 11
  * The reproduction can result in different successful outcomes based on the Maven goal where the failure happens. For example,
    * The compilation step fails after the dependency is updated, but not before.
      This is a successful reproduction corresponding to the label "COMPILATION_FAILURE".
    * The test step fails after the dependency is updated, but not before.
      This is a successful reproduction corresponding to the label "TEST_FAILURE".
    * The project build fails after the dependency is updated due to unresolved dependencies, but not before.
      This is a successful reproduction corresponding to the label "DEPENDENCY_RESOLUTION_FAILURE".
    * The project build fails after the dependency is updated due to enforcer rules violations, but not before.
      This is a successful reproduction corresponding to the label "ENFORCER_FAILURE".
    * The project build fails after the dependency is updated when executing the plugin dependency-lock-maven-plugin, but not before.
      This is a successful reproduction corresponding to the label "DEPENDENCY_LOCK_FAILURE".
    * The project build fails after the dependency is updated due to the activation of the failOnWarning option in the configuration file.
      This is a successful reproduction corresponding to the label "WERROR_FAILURE".
* Stage 4 : Build two Docker images for each successfully reproduced breaking update,
and isolate all environment / network requests by downloading them.
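The Stage 1 selection criteria can be read as a simple predicate over repository metadata. The sketch below is illustrative only: the `RepoInfo` type and its field names are hypothetical and do not come from the BreakingUpdateMiner, and "the last 10 years" is approximated as 3650 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RepoInfo:
    """Hypothetical container for the metadata that Stage 1 inspects."""
    builds_with_maven: bool
    default_branch_commits: int
    created_at: datetime
    contributors: int
    stars: int


def meets_stage1_criteria(repo: RepoInfo, now=None) -> bool:
    """Check the five Stage 1 criteria listed above."""
    now = now or datetime.now(timezone.utc)
    # Approximation: ten years taken as 3650 days, ignoring leap days.
    oldest_allowed = now - timedelta(days=3650)
    return (
        repo.builds_with_maven
        and repo.default_branch_commits >= 100
        and repo.created_at >= oldest_allowed
        and repo.contributors >= 3
        and repo.stars >= 10
    )


if __name__ == "__main__":
    example = RepoInfo(True, 250, datetime(2020, 5, 1, tzinfo=timezone.utc), 4, 12)
    print(meets_stage1_criteria(example, now=datetime(2024, 1, 1, tzinfo=timezone.utc)))
```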
After stage 4, by running the preCommitReproductionCommand, and the breakingUpdateReproductionCommand,
the successful build of the previous commit and the failing build of the breaking commit can be reproduced.
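The reproduction rule described in the workflow (the pre-commit build passes, the breaking-commit build fails with a recognised cause) can be sketched as follows. The function name is hypothetical, and the underscore-separated spelling of the labels is an assumption about how they appear in the failureCategory field.

```python
from typing import Optional

# Failure labels named in Stage 3 (underscore spelling is an assumption).
FAILURE_LABELS = frozenset({
    "COMPILATION_FAILURE",
    "TEST_FAILURE",
    "DEPENDENCY_RESOLUTION_FAILURE",
    "ENFORCER_FAILURE",
    "DEPENDENCY_LOCK_FAILURE",
    "WERROR_FAILURE",
})


def is_successful_reproduction(pre_commit_build_ok: bool,
                               breaking_failure_label: Optional[str]) -> bool:
    """Return True when a breaking update counts as reproduced.

    pre_commit_build_ok: compilation and tests passed on the pre-commit.
    breaking_failure_label: label of the breaking-commit failure, or None
    if the breaking-commit build did not fail.
    """
    return pre_commit_build_ok and breaking_failure_label in FAILURE_LABELS


if __name__ == "__main__":
    print(is_successful_reproduction(True, "COMPILATION_FAILURE"))
    print(is_successful_reproduction(True, None))
    print(is_successful_reproduction(False, "TEST_FAILURE"))
```

A failure that also occurs on the pre-commit is not attributed to the dependency update, which is why the first argument must be true.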
Tools
The BreakingUpdateMiner
In order to gather breaking dependency updates from GitHub, a tool called the
BreakingUpdateMiner is available.
You can build this tool locally using mvn package with Java 17.
You can then run the tool and print usage information with the command:
```bash
java -jar target/BreakingUpdateMiner.jar --help
```
The BreakingUpdateReproducer
In order to perform local reproduction once potential breaking updates have been found by the miner,
a tool called the BreakingUpdateReproducer is available.
You can build this tool locally using mvn package with Java 17.
You can then run the tool and print usage information with the command:
```bash
java -jar target/BreakingUpdateReproducer.jar --help
```
Owner
- Name: CHAINS research project at KTH Royal Institute of Technology
- Login: chains-project
- Kind: organization
- Website: https://chains.proj.kth.se
- Repositories: 9
- Profile: https://github.com/chains-project
"Consistent Hardening and Analysis of Software Supply Chains" at KTH, funded by SSF
GitHub Events
Total
- Watch event: 4
- Delete event: 20
- Push event: 85
- Pull request event: 58
- Fork event: 2
- Create event: 34
Last Year
- Watch event: 4
- Delete event: 20
- Push event: 85
- Pull request event: 58
- Fork event: 2
- Create event: 34
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| renovate[bot] | 2****] | 143 |
| Gabriel Skoglund | g****d@g****m | 77 |
| Frank Reyes | f****g@k****e | 67 |
| Yogya Tulip Gamage | 4****e | 57 |
| github-actions[bot] | 4****] | 49 |
| YogyaGamage | 4****e | 9 |
| Martin Monperrus | m****s@g****g | 9 |
| frankreyesSC | f****s@n****m | 3 |
| Lukas | l****s@f****h | 3 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 32
- Total pull requests: 107
- Average time to close issues: about 2 months
- Average time to close pull requests: 23 days
- Total issue authors: 6
- Total pull request authors: 5
- Average comments per issue: 1.22
- Average comments per pull request: 0.24
- Merged pull requests: 91
- Bot issues: 1
- Bot pull requests: 67
Past Year
- Issues: 0
- Pull requests: 39
- Average time to close issues: N/A
- Average time to close pull requests: 2 months
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 28
- Bot issues: 0
- Bot pull requests: 39
Top Authors
Issue Authors
- monperrus (14)
- gabrielskoglund (11)
- yogyagamage (3)
- renovate[bot] (3)
- frankreyesgarcia (3)
- snadi (1)
- MartinWitt (1)
- LukvonStrom (1)
Pull Request Authors
- renovate[bot] (139)
- yogyagamage (21)
- gabrielskoglund (7)
- frankreyesgarcia (5)
- monperrus (4)
- LukvonStrom (3)
Dependencies
- com.google.code.gson:gson 2.10
- com.squareup.okhttp3:okhttp 4.10.0
- info.picocli:picocli-codegen 4.7.0
- org.kohsuke:github-api 1.313
- junit:junit 4.13.2 test
- org.junit.jupiter:junit-jupiter 5.9.1 test