https://github.com/assert-kth/ghrb
A Repository of Real, Recent Java Bugs
Fork of coinse/GHRB
# GHRB: GitHub Recent Bugs
GitHub Recent Bugs (GHRB) is a collection of real-world bugs merged _after_
the OpenAI LLM training cutoff point (Sep. 2021), along with a command-line
interface to easily execute code. It was constructed to facilitate
evaluation of LLM-based automated debugging techniques, free from concern
of training data contamination.
## Setting up GHRB
First, build the Docker image:
```bash
cd Docker
docker build -t ghrb_framework .
```
Next, create a Docker container:
```bash
sh run_docker_container.sh
```
Inside the container, run the following commands to complete the setup:
```bash
cd /root/framework
chmod +x cli.py
cd debug
python collector.py
cd ..
```
Finally, check that `cli.py` runs correctly:
```bash
./cli.py -h
```
## Using GHRB
(Note that all directory paths given to the tool must be absolute paths
within the container.)
1. `info` - View information about a project or a particular bug.
   * Example: `./cli.py info -p gson`
2. `checkout` - Check out the buggy or fixed version of a bug.
   * Example: `./cli.py checkout -p gson -v 1b -w /root/framework/testing`
3. `compile` - Compile the code in a directory.
   * Example: `./cli.py compile -w /root/framework/testing`
4. `test` - Run the tests for a project.
   * Example: `./cli.py test -w /root/framework/testing`
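
The subcommands above chain naturally into a check out, compile, test cycle. The following is a minimal sketch of a Python driver for that cycle, assuming it is run inside the container from `/root/framework`. The helper names `build_commands` and `run_cycle` are hypothetical and not part of GHRB; only the flags shown in the examples above are used.

```python
import subprocess
from pathlib import PurePosixPath


def build_commands(project, version, workdir, cli="./cli.py"):
    """Return the checkout, compile and test invocations as argv lists.

    Hypothetical helper: it only assembles the commands documented above.
    """
    # GHRB expects absolute paths inside the container.
    if not PurePosixPath(workdir).is_absolute():
        raise ValueError(f"workdir must be an absolute path: {workdir}")
    return [
        [cli, "checkout", "-p", project, "-v", version, "-w", workdir],
        [cli, "compile", "-w", workdir],
        [cli, "test", "-w", workdir],
    ]


def run_cycle(project, version, workdir):
    """Run each step in order, stopping at the first failing command."""
    for argv in build_commands(project, version, workdir):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe
    # to try outside the container.
    for argv in build_commands("gson", "1b", "/root/framework/testing"):
        print(" ".join(argv))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact invocations before anything touches the working directory.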
## Using Fetch/Filter Scripts
Given a file that lists repository URLs, one per line, run
```
python filter_repo.py --repository_list
```
to collect repository metadata before gathering pull request information. The repository list file should look like:
```
https://github.com/coinse/GHRB
https://github.com/coinse/libro
...
```
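
As an illustration of the list format, the sketch below splits such a file into owner and repository name pairs. It is not taken from `filter_repo.py`; the function name `parse_repository_list` and the choice to reject non-GitHub lines are assumptions made for this example.

```python
from urllib.parse import urlparse


def parse_repository_list(text):
    """Parse one GitHub repository URL per line into (owner, name) pairs.

    Hypothetical helper, not part of GHRB.
    """
    repos = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parsed = urlparse(line)
        parts = [p for p in parsed.path.split("/") if p]
        # Expect exactly https://github.com/<owner>/<name>
        if parsed.netloc != "github.com" or len(parts) != 2:
            raise ValueError(f"not a GitHub repository URL: {line}")
        owner, name = parts
        repos.append((owner, name))
    return repos


if __name__ == "__main__":
    sample = "https://github.com/coinse/GHRB\nhttps://github.com/coinse/libro\n"
    print(parse_repository_list(sample))
```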
Note that the script automatically filters out non-English repositories and repositories in which Java makes up less than 90% of the code according to GitHub language statistics.
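
To make the 90% threshold concrete, here is a minimal sketch of such a check. It assumes the statistics arrive as a mapping from language name to bytes of code, which is the shape returned by GitHub's repository languages endpoint; whether `filter_repo.py` obtains or evaluates the numbers this way is an assumption, and the function names and byte counts are illustrative only.

```python
def java_share(language_bytes):
    """Return the fraction of code bytes attributed to Java (0.0 if empty)."""
    total = sum(language_bytes.values())
    if total == 0:
        return 0.0
    return language_bytes.get("Java", 0) / total


def passes_java_filter(language_bytes, threshold=0.9):
    """Keep a repository only if Java makes up at least `threshold` of it."""
    return java_share(language_bytes) >= threshold


if __name__ == "__main__":
    # Illustrative byte counts, not real data.
    mostly_java = {"Java": 9500, "Shell": 500}
    mixed = {"Java": 6000, "Kotlin": 4000}
    print(passes_java_filter(mostly_java))
    print(passes_java_filter(mixed))
```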
With the pull request information, gather the actual pull request data with:
```bash
python collect_raw_data.py --api_token --repository_file
```
where the repository file should have a format like:
```jsonc
[
  {
    "name": // name of the repo,
    "owner":
    {
      "login": // owner of the repo
    },
    "url": // full url for git clone
  },
]
```
The repository file should be included in the repository manually. Note that the metadata items must be wrapped in a list, even if there is only one. An example of such a file can be found under the `example/` directory.
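
Since the file is prepared by hand, a quick structural check can catch mistakes before a long collection run. The sketch below is not part of GHRB: `validate_repository_entries` is a hypothetical helper that checks only the fields shown in the format above, and the sample entry is made up from the `coinse/GHRB` URL used earlier. Note that Python's `json` module rejects comments and trailing commas, so a file read this way has to be plain JSON.

```python
import json


def validate_repository_entries(entries):
    """Return a list of problems; an empty list means the format matches."""
    if not isinstance(entries, list):
        return ["top-level value must be a list"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be an object")
            continue
        # "name" and "url" are plain strings.
        for key in ("name", "url"):
            if not isinstance(entry.get(key), str):
                problems.append(f"entry {i}: missing string field '{key}'")
        # "owner" is a nested object holding the "login" string.
        owner = entry.get("owner")
        if not isinstance(owner, dict) or not isinstance(owner.get("login"), str):
            problems.append(f"entry {i}: missing string field 'owner.login'")
    return problems


if __name__ == "__main__":
    sample = json.loads(
        '[{"name": "GHRB", "owner": {"login": "coinse"},'
        ' "url": "https://github.com/coinse/GHRB"}]'
    )
    print(validate_repository_entries(sample))
```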
## Publication
The companion preprint for our work is here: [http://arxiv.org/abs/2310.13229](http://arxiv.org/abs/2310.13229).