issue-report-classification-using-roberta
This repository contains the notebook for training and testing the classifiers for our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering.
https://github.com/collab-uniba/issue-report-classification-using-roberta
Science Score: 57.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: collab-uniba
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 3.86 MB
Statistics
- Stars: 8
- Watchers: 4
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Issue-Report-Classification-Using-RoBERTa
This repository contains the notebook for training and testing the classifiers for our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering.
The model is available on Hugging Face.
Link to Classifier 1 Colab notebook
Link to Classifier 2 Colab notebook
If you use the software in this repository, please consider citing our paper as follows:
@inproceedings{Colavito-2022,
title = {Issue Report Classification Using Pre-trained Language Models},
booktitle = {2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)},
author = {Colavito, Giuseppe and Lanubile, Filippo and Novielli, Nicole},
year = {2022},
month = may,
pages = {29--32},
doi = {10.1145/3528588.3528659},
abstract = {This paper describes our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering. We propose a supervised approach relying on fine-tuned BERT-based language models for the automatic classification of GitHub issues. We experimented with different pre-trained models, achieving the best performance with fine-tuned RoBERTa (F1 = .8591).},
keywords = {Issue classification, BERT, deep learning, labeling unstructured data, software maintenance and evolution},
}
Comparison with CodeBERT
As shown in the table, the two models are essentially equivalent. For CodeBERT, we removed the preprocessing step so that all code snippets are kept, since CodeBERT is also pre-trained on code. Note that even when the code is removed, CodeBERT's performance is slightly lower but remains close to RoBERTa's.

|             | CodeBERT precision | RoBERTa precision | CodeBERT recall | RoBERTa recall | CodeBERT f1-score | RoBERTa f1-score | Support |
|-------------|--------------------|-------------------|-----------------|----------------|-------------------|------------------|---------|
| bug         | 0.8729             | 0.8750            | 0.8997          | 0.8988         | 0.8861            | 0.8867           | 40122   |
| enhancement | 0.8729             | 0.8713            | 0.8716          | 0.8743         | 0.8722            | 0.8728           | 33073   |
| question    | 0.6664             | 0.6760            | 0.5528          | 0.5591         | 0.6043            | 0.6120           | 6943    |
| micro avg   | 0.8580             | 0.8591            | 0.8580          | 0.8591         | 0.8580            | 0.8591           | 80138   |
| macro avg   | 0.8041             | 0.8074            | 0.7747          | 0.7774         | 0.7875            | 0.7905           | 80138   |
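The macro averages above are the unweighted means of the per-class scores, while the micro average aggregates over all instances (so micro precision, recall, and F1 coincide in this single-label setting). A quick sanity check of the macro F1 values in Python:

```python
# Per-class F1 scores, taken from the comparison table
f1_codebert = {"bug": 0.8861, "enhancement": 0.8722, "question": 0.6043}
f1_roberta = {"bug": 0.8867, "enhancement": 0.8728, "question": 0.6120}

def macro_avg(scores):
    """Unweighted mean of per-class scores."""
    return sum(scores.values()) / len(scores)

print(round(macro_avg(f1_codebert), 4))  # 0.7875, matching the table
print(round(macro_avg(f1_roberta), 4))   # 0.7905, matching the table
```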
Combination of the filters
The combined filters are the following:
- The project must be more than 4 years old
- The project must have more than 1500 stars
- The issue must have a single label (and must still be available on GitHub)
Results are compared with those of a model trained on a random sample of the training set, with both models tested on the filtered test set.
| RANDOM SAMPLING | Precision | Recall | F1-score | Support |
|-----------------|-----------|--------|----------|---------|
| bug             | 0.8935    | 0.7937 | 0.8407   | 2094    |
| enhancement     | 0.8496    | 0.7569 | 0.8005   | 1164    |
| question        | 0.5140    | 0.8233 | 0.6329   | 600     |
| micro avg       | 0.7872    | 0.7872 | 0.7872   | 3858    |
| macro avg       | 0.7524    | 0.7913 | 0.7580   | 3858    |
| FILTERED        | Precision | Recall | F1-score | Support |
|-----------------|-----------|--------|----------|---------|
| bug             | 0.9032    | 0.7751 | 0.8342   | 2094    |
| enhancement     | 0.7763    | 0.8290 | 0.8018   | 1164    |
| question        | 0.5587    | 0.7617 | 0.6446   | 600     |
| micro avg       | 0.7893    | 0.7893 | 0.7893   | 3858    |
| macro avg       | 0.7461    | 0.7886 | 0.7602   | 3858    |
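F1 is the harmonic mean of precision and recall, so the F1-score columns in both tables can be reproduced directly from the other two columns; for example, for the question class:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# question class, values from the two tables above
print(round(f1(0.5140, 0.8233), 4))  # 0.6329 (random sampling)
print(round(f1(0.5587, 0.7617), 4))  # 0.6446 (filtered)
```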
Owner
- Name: Collaborative Development Group
- Login: collab-uniba
- Kind: organization
- Email: info@peopleware.ai
- Location: University of Bari, Italy
- Website: http://collab.di.uniba.it
- Repositories: 87
- Profile: https://github.com/collab-uniba
As a research group, we address challenges that must be overcome in collaborative environments, even when teams are distributed across time or distance.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Colavito"
    given-names: "Giuseppe"
  - family-names: "Lanubile"
    given-names: "Filippo"
    orcid: "https://orcid.org/0000-0003-3373-7589"
  - family-names: "Novielli"
    given-names: "Nicole"
    orcid: "https://orcid.org/0000-0003-1160-2608"
title: "Issue-Report-Classification-Using-RoBERTa"
version: 1.0.0
date-released: 2022-03-09
url: "https://github.com/collab-uniba/Issue-Report-Classification-Using-RoBERTa"
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0