issue-report-classification-using-roberta

This repository contains the notebook for training and testing the classifiers for our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering.

https://github.com/collab-uniba/issue-report-classification-using-roberta

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: collab-uniba
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 3.86 MB
Statistics
  • Stars: 8
  • Watchers: 4
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

Issue-Report-Classification-Using-RoBERTa

This repository contains the notebook for training and testing the classifiers for our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering.

The model is available on HuggingFace

Link to Classifier 1 Colab notebook
Link to Classifier 2 Colab notebook

If you use the software in this repository, please consider citing our paper as follows:

```bibtex
@inproceedings{Colavito-2022,
  title     = {Issue Report Classification Using Pre-trained Language Models},
  booktitle = {2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)},
  author    = {Colavito, Giuseppe and Lanubile, Filippo and Novielli, Nicole},
  year      = {2022},
  month     = may,
  pages     = {29--32},
  doi       = {10.1145/3528588.3528659},
  abstract  = {This paper describes our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering. We propose a supervised approach relying on fine-tuned BERT-based language models for the automatic classification of GitHub issues. We experimented with different pre-trained models, achieving the best performance with fine-tuned RoBERTa (F1 = .8591).},
  keywords  = {Issue classification, BERT, deep learning, labeling unstructured data, software maintenance and evolution},
}
```

Comparison with CodeBERT

As shown in the table, the two models are essentially equivalent. For CodeBERT, we removed the preprocessing step so that all code is kept, since CodeBERT is also trained on code. Note that even when the code is removed, CodeBERT's performance is lower but remains close to RoBERTa's.

| | CodeBERT precision | RoBERTa precision | CodeBERT recall | RoBERTa recall | CodeBERT f1-score | RoBERTa f1-score | support |
|-------------|--------|--------|--------|--------|--------|--------|-------|
| bug         | 0.8729 | 0.8750 | 0.8997 | 0.8988 | 0.8861 | 0.8867 | 40122 |
| enhancement | 0.8729 | 0.8713 | 0.8716 | 0.8743 | 0.8722 | 0.8728 | 33073 |
| question    | 0.6664 | 0.6760 | 0.5528 | 0.5591 | 0.6043 | 0.6120 | 6943  |
| micro avg   | 0.8580 | 0.8591 | 0.8580 | 0.8591 | 0.8580 | 0.8591 | 80138 |
| macro avg   | 0.8041 | 0.8074 | 0.7747 | 0.7774 | 0.7875 | 0.7905 | 80138 |
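As a sanity check on the table, each f1-score is the harmonic mean of the corresponding precision and recall, and the macro average is the unweighted mean over the three classes. A minimal sketch recomputing the RoBERTa column from the precision/recall values above:

```python
# Recompute RoBERTa per-class F1 and the macro average from the
# precision/recall values reported in the comparison table.

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) per class for RoBERTa, copied from the table
roberta = {
    "bug": (0.8750, 0.8988),
    "enhancement": (0.8713, 0.8743),
    "question": (0.6760, 0.5591),
}

per_class_f1 = {label: f1(p, r) for label, (p, r) in roberta.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for label, score in per_class_f1.items():
    print(f"{label}: F1 = {score:.4f}")
print(f"macro avg F1 = {macro_f1:.4f}")  # matches the table's 0.7905
```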

Combination of the filters

The combined filters are the following:
- The project must be more than 4 years old.
- The project must have more than 1500 stars.
- The issue must have a single label (and must still be available on GitHub).

Results are compared with those of a model trained on a random sample of the training set and tested on the filtered test set.

| RANDOM SAMPLING | Precision | Recall | F1-score | Support |
|-----------------|-----------|--------|----------|---------|
| bug             | 0.8935    | 0.7937 | 0.8407   | 2094    |
| enhancement     | 0.8496    | 0.7569 | 0.8005   | 1164    |
| question        | 0.5140    | 0.8233 | 0.6329   | 600     |
| micro avg       | 0.7872    | 0.7872 | 0.7872   | 3858    |
| macro avg       | 0.7524    | 0.7913 | 0.7580   | 3858    |

| FILTERED        | Precision | Recall | F1-score | Support |
|-----------------|-----------|--------|----------|---------|
| bug             | 0.9032    | 0.7751 | 0.8342   | 2094    |
| enhancement     | 0.7763    | 0.8290 | 0.8018   | 1164    |
| question        | 0.5587    | 0.7617 | 0.6446   | 600     |
| micro avg       | 0.7893    | 0.7893 | 0.7893   | 3858    |
| macro avg       | 0.7461    | 0.7886 | 0.7602   | 3858    |
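The macro-average F1 rows of the two tables can be checked directly: each is the unweighted mean of the three per-class F1 scores. A short sketch using the values reported above:

```python
# Recompute the macro-average F1 rows of the two tables above
# from their per-class F1 scores.
random_sampling_f1 = {"bug": 0.8407, "enhancement": 0.8005, "question": 0.6329}
filtered_f1 = {"bug": 0.8342, "enhancement": 0.8018, "question": 0.6446}

def macro(scores: dict) -> float:
    """Unweighted mean over classes."""
    return sum(scores.values()) / len(scores)

print(f"random sampling macro F1 = {macro(random_sampling_f1):.4f}")  # 0.7580
print(f"filtered macro F1 = {macro(filtered_f1):.4f}")                # 0.7602
```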

Owner

  • Name: Collaborative Development Group
  • Login: collab-uniba
  • Kind: organization
  • Email: info@peopleware.ai
  • Location: University of Bari, Italy

As a research group, we address challenges that must be overcome in collaborative environments, even when participants are distributed across time or distance.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Colavito"
  given-names: "Giuseppe"
- family-names: "Lanubile"
  given-names: "Filippo"
  orcid: "https://orcid.org/0000-0003-3373-7589"
- family-names: "Novielli"
  given-names: "Nicole"
  orcid: "https://orcid.org/0000-0003-1160-2608"
title: "Issue-Report-Classification-Using-RoBERTa"
version: 1.0.0
date-released: 2022-03-09
url: "https://github.com/collab-uniba/Issue-Report-Classification-Using-RoBERTa"

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0