https://github.com/gssi/sala-satd

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: gssi
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 2.3 MB
Statistics
  • Stars: 0
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 11 months ago
Metadata Files
Readme License

README.md

SALA: Replication package

Article: Binary and multi-class classification of Self-Admitted Technical Debt: How far can we go?

Authors

  • Francesca Arcelli Fontana (Università degli studi di Milano–Bicocca)
  • Juri Di Rocco (Università degli studi dell’Aquila)
  • Davide Di Ruscio (Università degli studi dell’Aquila)
  • Amleto Di Salle (Gran Sasso Science Institute)
  • Phuong T. Nguyen (Università degli studi dell’Aquila)

Abstract

Context. Aiming for a trade-off between short-term efficiency and long-term stability, software teams resort to sub-optimal solutions, neglecting the best software development practices. Such solutions may induce technical debt (TD), triggering maintenance issues. To facilitate future fixing, developers flag problematic code with textual comments, resulting in Self-Admitted Technical Debt (SATD). Detecting SATD in source code is crucial since it helps programmers locate potentially erroneous snippets, allowing for suitable interventions and improving code quality. There are two main types of SATD detection: binary classification, which groups comments into SATD/Non-SATD categories, and multi-class classification, which groups them into multiple categories.

Objective. We attempt to understand to what extent state-of-the-art research has addressed the issue of detecting SATD, in both binary and multi-class classification. Based on this investigation, we also propose a practical approach for the detection of SATD using Large Language Models (LLMs).

Methods. First, we conducted a literature review to understand to what extent the two types of classification have been tackled by existing research. Second, we developed SALA, a dual-purpose tool built on top of Natural Language Processing (NLP) techniques and neural networks to deal with both types of classification. An empirical evaluation was performed to compare SALA with state-of-the-art baselines.

Results. The literature review reveals that while binary classification has been well studied, multi-class classification has not received adequate attention. The empirical evaluation shows that SALA achieves promising performance and outperforms the baselines with respect to various quality metrics.

Conclusion. We conclude that more effort needs to be spent to tackle multi-class classification of SATD. To this end, LLMs hold potential, although possible fine-tuning and prompt engineering strategies require more rigorous investigation.

Description

This repository contains the replication package for the qualitative and quantitative analyses.

In particular, the RQ1 folder contains the qualitative analysis. Meanwhile, the RQ2 and RQ3 folders contain the Python scripts and datasets used for binary and multi-class classification.
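The relationship between the two classification tasks can be illustrated with a minimal sketch: any multi-class label collapses into a binary SATD/non-SATD label, but not the other way round. The category names (`design`, `defect`), the helper `to_binary`, and the example comments below are hypothetical and chosen only for illustration; they are not taken from the repository's scripts or datasets.

```python
from collections import Counter

# Hypothetical label for comments that do not admit technical debt.
NON_SATD = "non-SATD"


def to_binary(category: str) -> str:
    """Collapse a multi-class label into the binary SATD / non-SATD scheme."""
    return NON_SATD if category == NON_SATD else "SATD"


# Made-up comments with made-up multi-class labels (not from the datasets).
labelled_comments = [
    ("TODO: refactor this class, it does too much", "design"),
    ("FIXME: crashes when the list is empty", "defect"),
    ("Returns the number of items in the cart", NON_SATD),
]

# Binary view of the same data: every debt category maps to "SATD".
binary_labels = [to_binary(category) for _, category in labelled_comments]
print(Counter(binary_labels))
```

A multi-class classifier therefore has to solve a strictly harder problem than a binary one, since it must also decide which kind of debt a comment admits.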

Owner

  • Name: Gran Sasso Science Institute
  • Login: gssi
  • Kind: organization
  • Location: L'Aquila (Italy)

International PhD school and a Center for advanced studies in physics, mathematics, computer science and social sciences.

GitHub Events

Total
  • Push event: 2
  • Create event: 2
Last Year
  • Push event: 2
  • Create event: 2

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 2
  • Total Committers: 2
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.5
Past Year
  • Commits: 2
  • Committers: 2
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.5
Top Committers
Name Email Commits
Juri Di Rocco j****o@g****m 1
Amleto Di Salle a****e@g****t 1
Committer Domains (Top 20 + Academic)
gssi.it: 1
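
The committer figures above can be reproduced with a short sketch. It assumes the Development Distribution Score is defined as one minus the top committer's share of all commits; this definition is an assumption, not something stated on this page, and the function name is hypothetical.

```python
def development_distribution_score(commits_per_committer):
    """Return 1 - (top committer's commits / total commits).

    Assumed definition of DDS; 0 means a single committer wrote everything.
    """
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_committer) / total


# Two committers with one commit each, as listed under Top Committers.
commits = [1, 1]
avg_commits = sum(commits) / len(commits)
dds = development_distribution_score(commits)
print(avg_commits, dds)
```

With one commit per committer, the top committer accounts for half of the commits, which gives the reported score of 0.5.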

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0