issue-report-classification-using-roberta
This repository contains the notebook for training and testing the classifiers for our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering.
https://github.com/collab-uniba/issue-report-classification-using-roberta
Science Score: 57.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: collab-uniba
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 3.86 MB
Statistics
- Stars: 8
- Watchers: 4
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Issue-Report-Classification-Using-RoBERTa
This repository contains the notebook for training and testing the classifiers for our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering.
The model is available on Hugging Face.
Link to Classifier 1 Colab notebook
Link to Classifier 2 Colab notebook
If you use the software in this repository, please consider citing our paper as follows:
@inproceedings{Colavito-2022,
title = {Issue Report Classification Using Pre-trained Language Models},
booktitle = {2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)},
author = {Colavito, Giuseppe and Lanubile, Filippo and Novielli, Nicole},
year = {2022},
month = may,
pages = {29--32},
doi = {10.1145/3528588.3528659},
abstract = {This paper describes our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering. We propose a supervised approach relying on fine-tuned BERT-based language models for the automatic classification of GitHub issues. We experimented with different pre-trained models, achieving the best performance with fine-tuned RoBERTa (F1 = .8591).},
keywords = {Issue classification, BERT, deep learning, labeling unstructured data, software maintenance and evolution},
}
Comparison with CodeBERT
As shown in the table, the two models are essentially equivalent. For CodeBERT, we removed the preprocessing step so that all code snippets are kept, since CodeBERT is also pre-trained on code. Note that even when the code is removed, CodeBERT's performance is slightly lower but remains close to RoBERTa's.

|             | CodeBERT precision | RoBERTa precision | CodeBERT recall | RoBERTa recall | CodeBERT f1-score | RoBERTa f1-score | Support |
|-------------|--------------------|-------------------|-----------------|----------------|-------------------|------------------|---------|
| bug         | 0.8729             | 0.8750            | 0.8997          | 0.8988         | 0.8861            | 0.8867           | 40122   |
| enhancement | 0.8729             | 0.8713            | 0.8716          | 0.8743         | 0.8722            | 0.8728           | 33073   |
| question    | 0.6664             | 0.6760            | 0.5528          | 0.5591         | 0.6043            | 0.6120           | 6943    |
| micro avg   | 0.8580             | 0.8591            | 0.8580          | 0.8591         | 0.8580            | 0.8591           | 80138   |
| macro avg   | 0.8041             | 0.8074            | 0.7747          | 0.7774         | 0.7875            | 0.7905           | 80138   |
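The macro averages above are the unweighted means of the per-class scores, while the micro average aggregates over all instances (so micro precision, recall, and F1 coincide in this single-label setting). A quick sanity check of the macro F1 values in Python:

```python
# Per-class F1 scores, taken from the comparison table
f1_codebert = {"bug": 0.8861, "enhancement": 0.8722, "question": 0.6043}
f1_roberta = {"bug": 0.8867, "enhancement": 0.8728, "question": 0.6120}

def macro_avg(scores):
    """Unweighted mean of per-class scores."""
    return sum(scores.values()) / len(scores)

print(round(macro_avg(f1_codebert), 4))  # 0.7875, matching the table
print(round(macro_avg(f1_roberta), 4))   # 0.7905, matching the table
```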
Combination of the filters
The combined filters are the following:
- The project must be more than 4 years old
- The project must have more than 1500 stars
- The issue must have a single label (and must still be available on GitHub)
Results are compared with those of a model trained on a random sample of the training set, with both models tested on the filtered test set.
| RANDOM SAMPLING | Precision | Recall | F1-score | Support |
|-----------------|-----------|--------|----------|---------|
| bug             | 0.8935    | 0.7937 | 0.8407   | 2094    |
| enhancement     | 0.8496    | 0.7569 | 0.8005   | 1164    |
| question        | 0.5140    | 0.8233 | 0.6329   | 600     |
| micro avg       | 0.7872    | 0.7872 | 0.7872   | 3858    |
| macro avg       | 0.7524    | 0.7913 | 0.7580   | 3858    |
| FILTERED        | Precision | Recall | F1-score | Support |
|-----------------|-----------|--------|----------|---------|
| bug             | 0.9032    | 0.7751 | 0.8342   | 2094    |
| enhancement     | 0.7763    | 0.8290 | 0.8018   | 1164    |
| question        | 0.5587    | 0.7617 | 0.6446   | 600     |
| micro avg       | 0.7893    | 0.7893 | 0.7893   | 3858    |
| macro avg       | 0.7461    | 0.7886 | 0.7602   | 3858    |
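F1 is the harmonic mean of precision and recall, so the F1-score columns in both tables can be reproduced directly from the other two columns; for example, for the question class:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# question class, values from the two tables above
print(round(f1(0.5140, 0.8233), 4))  # 0.6329 (random sampling)
print(round(f1(0.5587, 0.7617), 4))  # 0.6446 (filtered)
```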
Owner
- Name: Collaborative Development Group
- Login: collab-uniba
- Kind: organization
- Email: info@peopleware.ai
- Location: University of Bari, Italy
- Website: http://collab.di.uniba.it
- Repositories: 87
- Profile: https://github.com/collab-uniba
As a research group, we address challenges that must be overcome in collaborative environments, even when teams are distributed across time or distance.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Colavito"
    given-names: "Giuseppe"
  - family-names: "Lanubile"
    given-names: "Filippo"
    orcid: "https://orcid.org/0000-0003-3373-7589"
  - family-names: "Novielli"
    given-names: "Nicole"
    orcid: "https://orcid.org/0000-0003-1160-2608"
title: "Issue-Report-Classification-Using-RoBERTa"
version: 1.0.0
date-released: 2022-03-09
url: "https://github.com/collab-uniba/Issue-Report-Classification-Using-RoBERTa"
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0