gfi-bot
[Work in Progress] ML-powered 🤖 for finding and labeling good first issues in your GitHub project!
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: osslab-pku
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Homepage: https://gfibot.io
- Size: 3.83 MB
Statistics
- Stars: 34
- Watchers: 6
- Forks: 7
- Open Issues: 12
- Releases: 0
Metadata Files
README.md
GFI-Bot
ML-powered 🤖 for finding and labeling good first issues in your GitHub project!
A paper introducing GFI-Bot is available (in the ESEC/FSE 2022 Demonstration Track):
- Hao He, Haonan Su, Wenxin Xiao, Runzhi He, and Minghui Zhou. 2022. GFI-Bot: Automated Good First Issue Recommendation on GitHub. In Proceedings of the 2022 ACM 30th Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, November 14-16, 2022. ACM. https://hehao98.github.io/files/2022-gfibot.pdf
The underlying ML approach is introduced in the following paper:
- Wenxin Xiao, Hao He, Weiwei Xu, Xin Tan, Jinhao Dong, and Minghui Zhou. 2022. Recommending Good First Issues in GitHub OSS Projects. In Proceedings of the 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 21–29, 2022. ACM. https://hehao98.github.io/files/2022-recgfi.pdf
See CITATIONS.bib for the BibTeX citations. We also provide an offline good first issue recommendation dataset at Zenodo.
Get Started
GFI-Bot is available at https://gfibot.io, where you can browse through existing good first issue recommendations or register your own repository for recommendation. GFI-Bot can be installed in GitHub repositories from the GitHub App page.
NOTE: GFI-Bot is currently in the pre-alpha stage. It is undergoing rapid development and is still highly unstable. We cannot guarantee that registered users and repositories will be preserved in the next release, and the bot may behave unexpectedly on GitHub. We will update this note once GFI-Bot reaches a certain level of maturity.
Roadmap
We describe our envisioned use cases for GFI-Bot in this documentation.
Currently, we are focusing on the following tasks:
1. Identifying an optimal training strategy
2. Improving user experience
Development
Project Organization
GFI-Bot is organized into four main modules:
- gfibot.data: Modules to periodically and incrementally collect latest issue statistics on registered GitHub projects.
- gfibot.model: Modules to periodically train GFI recommendation models based on issue statistics collected by gfibot.data.
- gfibot.backend: Modules to provide RESTful APIs for interaction with frontend and the GitHub App.
- frontend: A standalone JavaScript (or TypeScript?) project as our website. This website will be used both as the main portal of GFI-Bot and as a control panel for users to find recommended good first issues or track bot status for their projects.
All modules interact with a MongoDB instance for both reading and writing data (except frontend, which interacts with backend using RESTful APIs). The MongoDB instance serves as a "single source of truth" and is the main way to decouple different modules. It will be used to store and continuously update issue statistics, training progress and performance, recommendation results, etc.
Environment Setup
GFI-Bot uses poetry for dependency management. Run the following commands with poetry to set up a working environment.
shell script
poetry shell # activate a working virtual environment
poetry install # install all dependencies
pre-commit install # install pre-commit hooks
black . # format all Python code
pytest # run all tests to confirm this environment is working
Then, configure a MongoDB instance (4.2 or later) and specify its connection URL in pyproject.toml.
Database Schemas
As mentioned before, the MongoDB instance serves as a "single source of truth" and decouples different modules. Therefore, before you start working on any part of GFI-Bot, it is important to know what the data look like in MongoDB. For this purpose, we adopt mongoengine as an ORM-like layer to formally describe and enforce schemas for each MongoDB collection. All collections are defined as Python classes here.
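To give a feel for what "a collection defined as a Python class" means, here is a minimal sketch that uses a standard-library dataclass as a stand-in for a mongoengine document class, so that it runs without a database or third-party packages. The class name `IssueRecord` and all of its fields are hypothetical and are not taken from the GFI-Bot schema definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IssueRecord:
    """Hypothetical stand-in for one document in an issue collection."""

    owner: str  # repository owner, e.g. an organization login
    name: str  # repository name
    number: int  # issue number within the repository
    labels: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Reject records that violate the declared schema before any write."""
        if not self.owner or not self.name:
            raise ValueError("owner and name must be non-empty")
        if not isinstance(self.number, int) or self.number <= 0:
            raise ValueError("number must be a positive integer")
        if not all(isinstance(label, str) for label in self.labels):
            raise ValueError("labels must be strings")


if __name__ == "__main__":
    record = IssueRecord("osslab-pku", "gfi-bot", 1, ["good first issue"])
    record.validate()  # passes silently for a well-formed record
    print(record)
```

Declaring fields in one place and validating before writing is what lets independent modules trust the shape of the data they read back.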
Development Guidelines
Contributions should follow the existing conventions and style of the codebase as closely as possible. Please add type annotations for all class members, function parameters, and return values. When writing commit messages, please follow the Conventional Commits specification.
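The two guidelines can be illustrated together: the sketch below is a fully type-annotated helper that checks whether a commit header has the Conventional Commits shape `type(scope)!: description`. The helper name `is_conventional` and the list of accepted types are assumptions for this example (the specification itself only defines `feat` and `fix` and allows other types); this is not a tool shipped with GFI-Bot.

```python
import re

# Assumed type list (a commonly used set); the specification only
# defines "feat" and "fix" and permits additional types.
COMMIT_TYPES: str = "feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert"

# type, optional (scope), optional "!" for breaking changes, then ": description"
HEADER_PATTERN: re.Pattern[str] = re.compile(
    rf"(?P<type>{COMMIT_TYPES})"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)"
)


def is_conventional(header: str) -> bool:
    """Return True if the first line of a commit message matches the pattern."""
    return HEADER_PATTERN.fullmatch(header) is not None


if __name__ == "__main__":
    for header in ("feat(backend): add repository endpoint", "Added some stuff"):
        print(header, "->", is_conventional(header))
```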
Deployment
First, choose the GitHub projects of interest and specify them in pyproject.toml. Configure a list of GitHub access tokens (one per line) in tokens.txt. Providing more tokens lets GFI-Bot bootstrap faster. Run the following script to check that the tokens are configured correctly.
shell script
python -m gfibot.check_tokens
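For illustration, the sketch below shows one way line-separated tokens could be parsed and then handed out in rotation. The README does not say how GFI-Bot distributes requests across tokens, so the round-robin strategy and the function names `parse_tokens` and `round_robin` are assumptions; the token strings are placeholders, not real credentials.

```python
from itertools import cycle
from typing import Iterator, List


def parse_tokens(text: str) -> List[str]:
    """Return the non-empty, de-duplicated tokens from line-separated text."""
    tokens: List[str] = []
    for line in text.splitlines():
        token = line.strip()
        if token and token not in tokens:
            tokens.append(token)
    return tokens


def round_robin(tokens: List[str]) -> Iterator[str]:
    """Return an endless iterator that hands out tokens in rotation."""
    if not tokens:
        raise ValueError("at least one access token is required")
    return cycle(tokens)


if __name__ == "__main__":
    # Placeholder values standing in for the contents of tokens.txt.
    rotation = round_robin(parse_tokens("token-a\n\ntoken-b\ntoken-a\n"))
    print([next(rotation) for _ in range(3)])
```

Since GitHub API rate limits apply per token, spreading requests over several tokens is the usual reason a larger token list speeds up data collection.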
We provide scripts for building docker images in the production/ folder. You can choose to build docker images to quickly set up MongoDB and backend by following the README there.
Dataset Preparation
Next, run the following script to collect historical data for the projects of interest. The first run can take a long time (up to days) to finish, but subsequent runs perform quick incremental updates on the existing database. This script should be run periodically (e.g., as a scheduled background task) to ensure that the MongoDB database reflects the latest state of the specified repositories.
shell script
python -m gfibot.data.update --nprocess=4 # you can increase parallelism with more GitHub tokens
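The README does not describe how the update script decides what to fetch, so the following is only a generic sketch of a timestamp-based incremental update: records older than the last sync are skipped and newer ones are upserted. A plain dict stands in for the MongoDB collection, and the names `Issue` and `incremental_update` are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional, TypedDict


class Issue(TypedDict):
    number: int
    updated_at: datetime


def incremental_update(
    store: Dict[int, Issue], fetched: List[Issue], last_sync: Optional[datetime]
) -> Optional[datetime]:
    """Upsert issues changed after last_sync and return the new sync timestamp."""
    newest = last_sync
    for issue in fetched:
        if last_sync is not None and issue["updated_at"] <= last_sync:
            continue  # unchanged since the previous run, nothing to do
        store[issue["number"]] = issue  # insert or overwrite, keyed by issue number
        if newest is None or issue["updated_at"] > newest:
            newest = issue["updated_at"]
    return newest


if __name__ == "__main__":
    store: Dict[int, Issue] = {}
    first = [{"number": 1, "updated_at": datetime(2022, 1, 1)}]
    sync = incremental_update(store, first, None)
    print(sync, sorted(store))
```

The first run has no previous timestamp and therefore stores everything, which is consistent with the first run being the slow one.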
Then, build a dataset for training and prediction as follows. This script may also take a long time but can be accelerated with more processes.
shell script
python -m gfibot.data.dataset --since=2008.01.01 --nprocess=4
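The two flags used above can be parsed with the standard `argparse` module, as in the sketch below. This is not the actual argument parser of `gfibot.data.dataset`: the default values, the program name, and the helper names are assumptions; only the flag names and the `YYYY.MM.DD` date format are taken from the command above.

```python
import argparse
from datetime import datetime


def parse_since(value: str) -> datetime:
    """Parse a date written as YYYY.MM.DD, e.g. 2008.01.01."""
    try:
        return datetime.strptime(value, "%Y.%m.%d")
    except ValueError as exc:
        raise argparse.ArgumentTypeError(
            f"expected a date like 2008.01.01, got {value!r}"
        ) from exc


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the (assumed) dataset-building options."""
    parser = argparse.ArgumentParser(prog="dataset-sketch")
    # Defaults are illustrative assumptions, not the project's real defaults.
    parser.add_argument("--since", type=parse_since, default=datetime(2008, 1, 1))
    parser.add_argument("--nprocess", type=int, default=1)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--since=2008.01.01", "--nprocess=4"])
    print(args.since.date(), args.nprocess)
```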
Model Training
To train the model, run the following script.
shell script
python -m gfibot.model.predictor
Dataset Dump
The Zenodo dataset can be dumped using the following script. See Zenodo for more details about how to use the dumped dataset.
shell script
mongodump --uri=mongodb://localhost:27020 --db=gfibot --collection=dataset --query="{\"resolver_commit_num\":{\"\$ne\":-1}}" --gzip
mongodump --uri=mongodb://localhost:27020 --db=gfibot --collection=resolved_issue --query="{\"resolver_commit_num\":{\"\$ne\":-1}}" --gzip
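The backslashes in the commands above only protect the quotes and the `$` sign from the shell; the filter that mongodump receives is the JSON document `{"resolver_commit_num":{"$ne":-1}}`. The sketch below builds that string from a Python dict and mimics the filter on in-memory records. The helper names and the sample records are made up for illustration.

```python
import json
from typing import Any, Dict, List

# The same filter as in the mongodump commands, written as a Python dict.
QUERY: Dict[str, Any] = {"resolver_commit_num": {"$ne": -1}}


def to_query_string(query: Dict[str, Any]) -> str:
    """Serialize a filter to compact JSON suitable for a --query argument."""
    return json.dumps(query, separators=(",", ":"))


def keep_resolved(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Mimic the $ne filter: keep documents whose resolver_commit_num is not -1.

    As with MongoDB's $ne, documents that lack the field are also kept.
    """
    return [doc for doc in docs if doc.get("resolver_commit_num") != -1]


if __name__ == "__main__":
    print(to_query_string(QUERY))
    sample = [{"number": 1, "resolver_commit_num": 3},
              {"number": 2, "resolver_commit_num": -1}]
    print(keep_resolved(sample))
```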
Owner
- Name: Open Source Software Data Analytics Lab@PKU-SEI
- Login: osslab-pku
- Kind: organization
- Website: https://osslab-pku.github.io/
- Repositories: 6
- Profile: https://github.com/osslab-pku
Citation (CITATIONS.bib)
@inproceedings{GFI-Bot,
author = {Hao He and Haonan Su and Wenxin Xiao and Runzhi He and Minghui Zhou},
title = {{GFI-Bot}: {Automated} Good First Issue Recommendation on {GitHub}},
booktitle = {Proceedings of the 2022 ACM 30th Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, November 14-16, 2022},
url = {https://hehao98.github.io/files/2022-gfibot.pdf},
}
@inproceedings{DBLP:conf/icse/XiaoHXTDZ22,
author = {Wenxin Xiao and
Hao He and
Weiwei Xu and
Xin Tan and
Jinhao Dong and
Minghui Zhou},
title = {Recommending Good First Issues in GitHub {OSS} Projects},
booktitle = {44th {IEEE/ACM} International Conference on Software Engineering,
{ICSE} 2022, Pittsburgh, PA, USA, May 25-27, 2022},
pages = {1830--1842},
year = {2022},
url = {https://doi.org/10.1145/3510003.3510196},
doi = {10.1145/3510003.3510196},
timestamp = {Fri, 29 Jul 2022 09:36:18 +0200},
biburl = {https://dblp.org/rec/conf/icse/XiaoHXTDZ22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
GitHub Events
Total
- Watch event: 6
- Issue comment event: 4
- Fork event: 1
Last Year
- Watch event: 6
- Issue comment event: 4
- Fork event: 1