autofl

A tool for semantic annotation of files, packages, and projects based on their application domain.

https://github.com/sascezar/autofl

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 7
Created almost 3 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

AutoFL

License: GPL v3

Automatic source code file annotation using weak labeling.

Overview

AutoFL is a tool designed for automatic annotation of source code files through weak labeling techniques. It provides both an API and a web-based UI for easy analysis of projects across different languages.
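To make the idea of weak labeling more concrete, the sketch below scores a file's identifiers against per-label keyword lists and normalizes the hit counts into a label distribution. It is only an illustration of the general technique: the `TAXONOMY` dictionary, its keywords, and the `weak_label` function are hypothetical and are not taken from AutoFL's code, whose annotators may work quite differently.

```python
# Hypothetical taxonomy mapping each label to a few indicative keywords.
# These labels and keywords are illustrative assumptions, not AutoFL's taxonomy.
TAXONOMY = {
    "networking": {"socket", "http", "request"},
    "graphics": {"render", "pixel", "canvas"},
}


def weak_label(identifiers, taxonomy=TAXONOMY):
    """Return a label -> score mapping based on keyword hits among identifiers."""
    tokens = [identifier.lower() for identifier in identifiers]
    hits = {
        label: sum(1 for token in tokens if token in keywords)
        for label, keywords in taxonomy.items()
    }
    total = sum(hits.values())
    if total == 0:
        # No keyword matched: return all zeros instead of dividing by zero.
        return {label: 0.0 for label in taxonomy}
    return {label: count / total for label, count in hits.items()}
```

Calling `weak_label(["Socket", "render", "http"])` would assign two thirds of the score to the hypothetical `networking` label and one third to `graphics`. No manually labeled training files are involved, which is the point of weak labeling.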

Setup

To set up the repository along with its UI submodule, clone it using:

    git clone --recursive git@github.com:SasCezar/AutoFL.git AutoFL

Optional Model Setup

For advanced features like semantic-based labeling, download models as required. For example, to use w2v-so, download the model from here and place it in the data/models/w2v-so folder. Alternatively, you can provide a custom path in the configuration files.

Usage

To run the tool using Docker, navigate to the project directory (where the docker-compose.yaml file is located) and execute:

    docker compose up

API Endpoint

To analyze the files of a project, make a POST request to the following endpoint:

    curl -X POST -d '{"name": "<PROJECT_NAME>", "remote": "<PROJECT_REMOTE>", "languages": ["<PROGRAMMING_LANGUAGE>"]}' localhost:8000/label/files -H "content-type: application/json"

For instance, to analyze the project at https://github.com/mickleness/pumpernickel, use:

    curl -X POST -d '{"name": "pumpernickel", "remote": "https://github.com/mickleness/pumpernickel", "languages": ["java"]}' localhost:8000/label/files -H "content-type: application/json"
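The same request can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_label_request` and `send` are made up for this example, the `http://localhost:8000` base URL mirrors the curl call, and the 600-second timeout is an arbitrary assumption. Because the response format is not documented here, `send` simply returns the raw response body.

```python
import json
import urllib.request


def build_label_request(name, remote, languages, base_url="http://localhost:8000"):
    """Build (but do not send) the POST request for the /label/files endpoint."""
    payload = {"name": name, "remote": remote, "languages": list(languages)}
    return urllib.request.Request(
        url=f"{base_url}/label/files",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and return the raw response body as text.

    The timeout is an assumed value; analysis time depends on the project.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Same payload as the pumpernickel curl example.
request = build_label_request(
    name="pumpernickel",
    remote="https://github.com/mickleness/pumpernickel",
    languages=["java"],
)
```

With the Docker services running, `print(send(request))` would submit the analysis and print whatever the API returns.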

Web UI

AutoFL provides a web-based UI, accessible locally at http://localhost:8501.

[Figure: AutoFL UI]

For more details, check the UI repository.

Configuration

AutoFL uses Hydra to manage configurations. The configuration files can be found in the config folder. The main configuration file, main.yaml, allows you to customize various options:

  • local: Choose between local or Docker environments. Docker is the default.
  • taxonomy: Set the taxonomy for labeling. Currently supports gitranking. You can add custom taxonomies.
  • annotator: Specify the annotators to use. The default is simple, offering good results without dependencies on language models.
  • version_strategy: Select the versioning strategy. The default is latest.
  • dataloader: Choose the dataloader. The default is postgres.
  • writer: Set the writer for storing results. The default is postgres.

Additional configurations can be added by creating new files in the corresponding component folders.
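Hydra allows a config group choice to be overridden on the command line with `group=option` arguments. The sketch below mimics that behaviour with a plain dictionary so the effect of an override is easy to see. It does not use Hydra itself. `README_DEFAULTS` contains only the values named in the list above (`local` is omitted because its option names are not given), the real layout of `main.yaml` may differ, and `my_annotator` is a hypothetical option name.

```python
# Defaults as described in this README; the actual main.yaml may be structured differently.
README_DEFAULTS = {
    "taxonomy": "gitranking",
    "annotator": "simple",
    "version_strategy": "latest",
    "dataloader": "postgres",
    "writer": "postgres",
}


def apply_overrides(defaults, overrides):
    """Apply 'group=option' strings to a copy of the defaults (Hydra-like, not Hydra)."""
    config = dict(defaults)
    for override in overrides:
        key, separator, value = override.partition("=")
        if not separator or not value:
            raise ValueError(f"expected 'group=option', got {override!r}")
        if key not in config:
            raise KeyError(f"unknown config group: {key}")
        config[key] = value
    return config


# 'my_annotator' is a hypothetical annotator config added by the user.
config = apply_overrides(README_DEFAULTS, ["annotator=my_annotator"])
```

Here `config["annotator"]` becomes `my_annotator` while every other group keeps its default, which is the behaviour to expect when a new file is added to a component folder and selected by name.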

Functionalities

  • Annotation (UI/API/Script)
    • File-Level
    • Package-Level
    • Project-Level
  • Batch Analysis (Script Only)
  • Temporal Analysis (TODO)
  • Classification (TODO)

Supported Languages

  • Java
  • Python (untested)
  • C (untested)
  • C++ (untested)
  • C# (untested)

Development

AutoFL is composed of multiple components, as shown in the architecture diagram below:

[Figure: AutoFL architecture diagram]

Adding Support for New Languages

To add support for additional languages, a language-specific parser is required. You can use tree-sitter to develop a parser quickly.

Parser Details

The parser needs to be located in the parser/languages folder. It should extend the ParserBase class, which follows this structure:

```python
from abc import ABC
from pathlib import Path


class ParserBase(ABC):
    """
    Abstract class for a programming language parser.
    """

    def __init__(self, library_path: Path | str):
        """
        :param library_path: Path to the tree-sitter languages.so file. The file has to contain the
        language parser. See tree-sitter for more details
        """
        ...
```

To implement the parsing logic, create a class that handles extracting identifiers. For Python, the parser might look like:

```python
# ParserBase and Extension are assumed to be imported from the AutoFL code base.
class PythonParser(ParserBase, lang=Extension.python.name):
    """
    Python-specific parser using a generic grammar for multiple versions.
    Utilizes tree-sitter for AST extraction.
    """

    def __init__(self, library_path: Path | str):
        ...
```

A custom parser independent of tree-sitter can also be developed. For more details, refer to the implementation of ParserBase.
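As an illustration of what a tree-sitter-independent parser could do, the sketch below extracts identifiers from Python source with the standard-library `ast` module. It is a standalone function, not a `ParserBase` subclass, because the abstract methods a subclass must implement are not shown here. The function name `extract_identifiers` and the choice of which node types count as identifiers are assumptions made for this example.

```python
import ast


def extract_identifiers(source: str) -> list[str]:
    """Collect names of definitions, arguments, variables, and attributes.

    Identifiers are returned in ast.walk order (breadth-first), not source order.
    """
    tree = ast.parse(source)
    identifiers = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            identifiers.append(node.name)  # function or class being defined
        elif isinstance(node, ast.arg):
            identifiers.append(node.arg)  # function parameter
        elif isinstance(node, ast.Name):
            identifiers.append(node.id)  # variable or function reference
        elif isinstance(node, ast.Attribute):
            identifiers.append(node.attr)  # attribute or method name
    return identifiers
```

For the source `def load(path): return open(path).read()`, the extracted identifiers are `load`, `path`, `open`, and `read` (with `path` appearing twice). A real parser would wrap logic like this in a class that extends `ParserBase`.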

Known Issues

  • Dependency Installation: The setup process may take significant time (~10 minutes), and dependency installations might fail due to timeouts. This appears to be a network-related issue, and retrying often resolves it. Future updates will aim to simplify dependencies.
  • ~~Indefinite Analysis Loops~~: ~~In some projects, the analysis may loop indefinitely. This issue is currently under investigation.~~ This appears to be resolved in the latest version; we will keep monitoring for further occurrences.

Docker Image Availability

AutoFL is also available as a Docker image. You can pull the image from Docker Hub using:

    docker pull cezarsas/autofl

Find more details and updates at the Docker Hub page.

Disclaimer

This tool is in active development and may not function as expected in some cases. It has been tested primarily on Docker versions 24.0.7 and 25.0.0 for Ubuntu 22.04. Limited testing has been performed on Windows and macOS, where functionality may vary.

If you encounter any issues, please open an issue on GitHub, make a pull request, or contact me at c.a.sas@rug.nl.

Citation

If you find this tool useful, please cite our work:

Paper

    @article{sas2024multigranular,
      title = {Multi-granular Software Annotation using File-level Weak Labelling},
      author = {Cezar Sas and Andrea Capiluppi},
      journal = {Empirical Software Engineering},
      volume = {29},
      number = {1},
      pages = {12},
      year = {2024},
      url = {https://doi.org/10.1007/s10664-023-10423-7},
      doi = {10.1007/s10664-023-10423-7}
    }

Note: The code used in this paper is available at CodeGraphClassification. However, AutoFL provides enhanced features, is more user-friendly, and includes a UI.

Tool

    @software{sas2023autofl,
      author = {Sas, Cezar and Capiluppi, Andrea},
      month = oct,
      title = {{AutoFL}},
      url = {https://github.com/SasCezar/AutoFL},
      version = {0.5.0},
      year = {2024},
      doi = {10.5281/zenodo.13895493}
    }

Owner

  • Name: Cezar Sas
  • Login: SasCezar
  • Kind: user
  • Location: Groningen
  • Company: University of Groningen

PhD Graduate Student @ Groningen University. Interested in NLP and Machine Learning. Currently working on Software Representation Learning.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Sas"
  given-names: "Cezar"
  orcid: "https://orcid.org/0000-0002-3018-0140"
- family-names: "Capiluppi"
  given-names: "Andrea"
  orcid: "https://orcid.org/0000-0001-9469-6050"
title: "AutoFL"
version: 0.4.1
doi: "10.5281/zenodo.10255368"
date-released: 2023-09-01
url: "https://github.com/SasCezar/AutoFL"

GitHub Events

Total
  • Push event: 4
  • Pull request event: 1
Last Year
  • Push event: 4
  • Pull request event: 1

Issues and Pull Requests

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • SasCezar (5)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

docker/Dockerfile docker
  • python 3.10 build
pyproject.toml pypi
  • fastapi ^0.104.1
  • gitpython ^3.1.40
  • gunicorn ^21.2.0
  • hydra-core ^1.3.2
  • loguru ^0.7.1
  • more-itertools ^10.1.0
  • multiset ^3.0.1
  • pandas ^2.1.2
  • psycopg ^3.1.12
  • pydantic ^2.4.2
  • pydantic-mongo ^2.0.2
  • pymongo ^4.5.0
  • python >=3.10,<3.13
  • python-rake ^1.5.0
  • scikit-learn ^1.3.2
  • setuptools ^68.2.0
  • sqlalchemy ^2.0.21
  • tqdm ^4.66.1
  • tree-sitter ^0.20.2
  • uvicorn ^0.24.0