code-translator

Web Application for code-to-code generation

https://github.com/tob1n8tor/code-translator

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Tob1n8tor
  • Language: TeX
  • Default Branch: main
  • Size: 6.38 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

Code Translation

Web Application for translating code from one language (e.g., Java) to another (e.g., Python).

Description

Project Overview

This project aims to fine-tune existing models with the Hugging Face Transformers library for code translation tasks. The goal is to evaluate different models and datasets and to tune hyperparameters until the best-performing combination is found. The trained models, data, and results are stored in the model_training directory, which also includes an Excel file for model comparison. A detailed research paper explaining the methodology and findings is included in the /report directory.

Project structure

    /kabul
    ├── /backend                 # Django backend code
    │   ├── /api                 # Django REST API and the model used for translation
    │   ├── /backend             # Django project settings
    │   ├── Dockerfile           # Dockerfile for the Django backend
    │   ├── manage.py            # Django project entry point
    │   └── requirements.txt     # Python dependencies
    │
    ├── /frontend                # React frontend code
    │   ├── /public              # Public assets
    │   ├── /src                 # React components, including App.js and App.css for the UI
    │   ├── Dockerfile           # Dockerfile for the React frontend
    │   ├── firebase.json        # Firebase configuration for hosting and rewrites
    │   ├── package-lock.json    # Auto-generated lock file; ensures consistent installs
    │   └── package.json         # Project metadata and dependencies for the React frontend
    │
    ├── /model_training          # Trained models, data, and results
    │   ├── /benchmarking        # Model benchmarking results and evaluation code
    │   └── /training_codes      # Python notebooks for model training
    │
    ├── /report                  # Final project paper
    │   ├── Kabul_Code_Translation_Report.pdf
    │   └── Kabul_Code_Translation_Report.tex
    │
    ├── docker-compose.yml       # Docker Compose configuration for local setup
    └── README.md                # This README file

Getting started

Prerequisites

  • Docker Desktop installed on your machine.
    • Installation guide: https://www.docker.com/products/docker-desktop/
  • Docker Compose installed (comes with Docker Desktop).

Installation Steps

  1. Clone the repository:

      git clone https://gitlab.lrz.de/bpc-ws-2425/kabul.git
      cd kabul

  2. Build and start the containers using Docker Compose:

      docker-compose --profile dev up

    This command builds and starts both the React frontend and Django backend containers and configures networking between the two services.

  3. Access the Application

    Frontend (React): Open your browser and go to http://localhost:3000

Online Access

Our application is currently deployed online and can be accessed directly without requiring local setup. Visit the following link to explore the code translation functionality: https://code-translation.com

Model Fine-Tuning & Benchmarking

This project uses Hugging Face's transformers library to fine-tune pre-trained models for code translation tasks. The goal is to benchmark various models with different hyperparameters and datasets and to identify the best-performing configuration.
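One way to organize this kind of benchmark is to score every combination of options and keep the best one. The sketch below assumes an exhaustive grid, although the README does not say whether a full grid or a manual search was used. The model IDs come from this README. The dataset labels (`version_1` … `version_4`), learning rates, and epoch counts are hypothetical. `score` stands in for a function that would fine-tune and evaluate one configuration.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical search space: model IDs come from the README, but the dataset
# labels, learning rates, and epoch counts are illustrative assumptions.
SEARCH_SPACE: Dict[str, List] = {
    "model": ["Salesforce/codet5-small", "Salesforce/codet5-base", "facebook/bart-base"],
    "dataset": ["version_1", "version_2", "version_3", "version_4"],
    "learning_rate": [5e-5, 1e-4],
    "num_train_epochs": [3, 5],
}


def iter_configs(space: Dict[str, List]) -> Iterable[Dict]:
    """Yield every combination of the search space as a dict (full grid)."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def best_config(space: Dict[str, List], score: Callable[[Dict], float]) -> Tuple[Dict, float]:
    """Score every configuration and return the best one (higher score is better)."""
    scored = ((cfg, score(cfg)) for cfg in iter_configs(space))
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy scorer for demonstration only; a real one would fine-tune and benchmark.
    def toy_score(cfg: Dict) -> float:
        return cfg["learning_rate"] * cfg["num_train_epochs"]

    best, value = best_config(SEARCH_SPACE, toy_score)
    print(best, value)
```

With 3 models, 4 datasets, 2 learning rates, and 2 epoch settings, the grid has 48 configurations. Each configuration means one fine-tuning run, so the size of this product is the main cost of the approach.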

Training

The model training code is in the model_training/training_codes directory. Training involves loading the pre-trained model, preparing and tokenizing the training data, and fine-tuning the model on the target dataset. The code is written in Python and uses the Hugging Face transformers library.
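The toy sketch below makes the "preparing and tokenizing" step concrete by showing the kind of output it produces. It makes several assumptions:

  • It uses a T5-style task prefix (`translate Java to Python: ...`). The project's actual prompt format is not documented here.
  • It replaces a real subword tokenizer with a hypothetical regex tokenizer. In the actual notebooks this role would be played by the model's own tokenizer, for example one loaded with `AutoTokenizer.from_pretrained`.

The returned `input_ids`/`attention_mask` fields mirror what such tokenizers return when called with truncation and `max_length` padding.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Toy tokenizer (assumption): identifiers, integers, and single non-space characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def build_input(source_code: str, source_lang: str, target_lang: str) -> str:
    """Prefix the source snippet with a hypothetical T5-style task description."""
    return f"translate {source_lang} to {target_lang}: {source_code}"


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def build_vocab(
    texts: Iterable[str], specials: Tuple[str, ...] = ("<pad>", "<unk>", "</s>")
) -> Dict[str, int]:
    """Assign ids to special tokens first, then to each new token in order of appearance."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(text: str, vocab: Dict[str, int], max_length: int) -> Dict[str, List[int]]:
    """Map tokens to ids, truncate, append end-of-sequence, and pad to max_length."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]
    ids = ids[: max_length - 1] + [vocab["</s>"]]
    padding = max_length - len(ids)
    return {
        "input_ids": ids + [vocab["<pad>"]] * padding,
        "attention_mask": [1] * len(ids) + [0] * padding,  # 0 marks padding positions
    }
```

For example, `encode(build_input("int x = 1;", "Java", "Python"), vocab, 16)` yields 10 token ids plus an end-of-sequence id, followed by 5 padding ids that the attention mask excludes.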

Fine-Tuned Models

We fine-tuned the following base models for code translation tasks:
  • Salesforce/codet5-small (https://huggingface.co/Salesforce/codet5-small)
  • Salesforce/codet5-base (https://huggingface.co/Salesforce/codet5-base)
  • openai-community/gpt2 (https://huggingface.co/openai-community/gpt2)
  • microsoft/codebert-base (https://huggingface.co/microsoft/codebert-base)
  • facebook/bart-base (https://huggingface.co/facebook/bart-base)
  • alirezamsh/small100 (https://huggingface.co/alirezamsh/small100)
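These base models belong to different architecture families, which affects how each one is loaded and trained for translation:

  • CodeT5, BART, and SMaLL-100 are encoder-decoder models.
  • GPT-2 is decoder-only.
  • CodeBERT is encoder-only.

The sketch below records that mapping. The family labels follow the public model cards. The choice of transformers class per family is an assumption about a common approach, not a description of the project's notebooks; this applies in particular to wrapping encoder-only CodeBERT in `EncoderDecoderModel`, since it cannot generate text on its own. Class names are returned as strings so that the sketch runs without transformers installed.

```python
from typing import Dict

# Architecture family of each base model listed above (per the public model cards).
ARCHITECTURE: Dict[str, str] = {
    "Salesforce/codet5-small": "encoder-decoder",
    "Salesforce/codet5-base": "encoder-decoder",
    "facebook/bart-base": "encoder-decoder",
    "alirezamsh/small100": "encoder-decoder",
    "openai-community/gpt2": "decoder-only",
    "microsoft/codebert-base": "encoder-only",
}

# Assumed loading strategy per family (names of real transformers classes).
LOADER: Dict[str, str] = {
    "encoder-decoder": "AutoModelForSeq2SeqLM",
    "decoder-only": "AutoModelForCausalLM",
    "encoder-only": "EncoderDecoderModel",  # e.g. CodeBERT as encoder, paired with a decoder
}


def loader_for(model_id: str) -> str:
    """Return the name of the transformers class assumed suitable for fine-tuning model_id."""
    try:
        family = ARCHITECTURE[model_id]
    except KeyError:
        raise ValueError(f"unknown model: {model_id}") from None
    return LOADER[family]
```

In a real pipeline the returned name would select a call such as `AutoModelForSeq2SeqLM.from_pretrained(model_id)`. Decoder-only GPT-2 also needs the source and target snippets concatenated into one sequence for causal language-model training.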

Datasets

For model fine-tuning, we use the following source datasets:
  • https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval
  • https://huggingface.co/datasets/ziwenyd/transcoder-geeksforgeeks
  • https://huggingface.co/datasets/CM/codexglue_codetrans
  • https://leetcode.ca/all/problems.html

From these sources, we built four custom datasets:
  • Version 1: An unbalanced dataset created by combining the NTU-NLP-sg/xCodeEval and ziwenyd/transcoder-geeksforgeeks datasets
  • Version 2: An unbalanced dataset created by combining the NTU-NLP-sg/xCodeEval, ziwenyd/transcoder-geeksforgeeks, and CM/codexglue_codetrans datasets, plus some instances generated by ChatGPT
  • Version 3: A balanced dataset created by web scraping LeetCode answers
  • Version 4: A balanced dataset created by combining web-scraped LeetCode answers with the ziwenyd/transcoder-geeksforgeeks dataset

The custom datasets are stored in model_training/training_codes/datasets.
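The README does not say how Versions 3 and 4 were balanced, or what "balanced" is measured over. One plausible reading is an equal number of examples per translation direction. The minimal sketch below follows that reading and downsamples every direction to the size of the smallest one. It assumes a hypothetical schema in which each example is a dict with `source_lang`, `target_lang`, `source`, and `target` keys.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Dict[str, str]  # hypothetical schema: source_lang, target_lang, source, target


def balance_by_direction(examples: List[Example], seed: int = 42) -> List[Example]:
    """Downsample each (source_lang, target_lang) group to the size of the smallest group."""
    groups: Dict[Tuple[str, str], List[Example]] = defaultdict(list)
    for ex in examples:
        groups[(ex["source_lang"], ex["target_lang"])].append(ex)
    if not groups:
        return []
    smallest = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the sampled subset reproducible
    balanced: List[Example] = []
    for key in sorted(groups):  # sorted keys make the sampling order deterministic
        balanced.extend(rng.sample(groups[key], smallest))
    rng.shuffle(balanced)  # avoid long runs of a single direction during training
    return balanced
```

Downsampling discards data from the larger directions. Oversampling the smaller ones is the usual alternative when data is scarce, at the cost of repeated examples.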

Benchmarking

For benchmarking, we evaluate the fine-tuned models using ROUGE, TER, BERTScore, and FrugalScore. We used one test set per translation task (e.g., Java to Python, Python to Java) as well as a combined test set covering all translation tasks. The results are stored in an Excel file in the model_training/benchmarking/excelresult_files directory, which also visualizes them in several charts for easier comparison. The benchmarking code is included in the model_training/benchmarking directory.
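The sketch below illustrates two of the listed metrics on whitespace tokens. It computes sentence-level ROUGE-L F1 (based on the longest common subsequence) and a simplified TER. It then averages both per translation task and over a combined set, mirroring the per-task and combined test sets described above. Three caveats apply:

  • The TER variant omits TER's block shifts, so it is an upper bound on true TER.
  • Library implementations tokenize differently, so these numbers will not exactly match the benchmark results.
  • BERTScore and FrugalScore depend on pretrained neural models and are not sketched here.

The task labels in the usage example are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def _lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, computed one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    c, r = candidate.split(), reference.split()
    lcs = _lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def ter_no_shifts(candidate: str, reference: str) -> float:
    """Word-level edit distance divided by reference length (TER without block shifts)."""
    c, r = candidate.split(), reference.split()
    prev = list(range(len(r) + 1))
    for i, x in enumerate(c, start=1):
        curr = [i]
        for j, y in enumerate(r, start=1):
            # deletion, insertion, substitution (free when the words match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1] / max(len(r), 1)


def score_by_task(rows: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, float]]:
    """rows are (task, candidate, reference); returns mean scores per task and 'combined'."""
    per_task: Dict[str, List[Tuple[float, float]]] = {}
    for task, cand, ref in rows:
        per_task.setdefault(task, []).append((rouge_l_f1(cand, ref), ter_no_shifts(cand, ref)))
    per_task["combined"] = [s for scores in list(per_task.values()) for s in scores]
    return {
        task: {"rougeL": mean(s[0] for s in scores), "ter": mean(s[1] for s in scores)}
        for task, scores in per_task.items()
    }
```

For example, `ter_no_shifts("b c a", "a b c")` counts two edits, whereas true TER needs only one shift. This shows why the simplified variant overestimates the error on reordered code.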

Research Paper

A detailed report explaining the methodology, model evaluation, and results is included in the /report directory. This report provides insights into the model selection, fine-tuning process and model evaluation.

Troubleshooting

  • If you run into issues such as "react-scripts not found", first run npm install in the /frontend folder from a terminal, then run the docker-compose command again.

Owner

  • Name: Tobias Konieczny
  • Login: Tob1n8tor
  • Kind: user

Computer Science Student @TUM | neuromorphic computing @neuroTUM

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Konieczny
    given-names: Tobias
  - family-names: Yang
    given-names: Anne
  - family-names: Elfhaiel
    given-names: Malek
  - family-names: Meng
    given-names: Eric Yiyang
title: "Code Translation Model Fine-Tuning and Web Application"
version: 1.0
date-released: 2025-01-28

GitHub Events

Total
  • Push event: 2
  • Create event: 2
Last Year
  • Push event: 2
  • Create event: 2

Dependencies

backend/Dockerfile docker
  • python 3.10-slim build
docker-compose.yml docker
frontend/Dockerfile docker
  • node 22.11.0 build
frontend/package-lock.json npm
  • 1361 dependencies
frontend/package.json npm
  • @codemirror/lang-html ^6.4.9
  • @codemirror/lang-javascript ^6.2.2
  • @fortawesome/fontawesome-svg-core ^6.7.2
  • @fortawesome/free-solid-svg-icons ^6.7.2
  • @fortawesome/react-fontawesome ^0.2.2
  • @testing-library/jest-dom ^5.17.0
  • @testing-library/react ^13.4.0
  • @testing-library/user-event ^13.5.0
  • @uiw/react-codemirror ^4.23.6
  • axios ^1.7.7
  • codemirror ^6.0.1
  • prismjs ^1.29.0
  • react ^18.3.1
  • react-dom ^18.3.1
  • react-scripts 5.0.1
  • react-select ^5.8.3
  • react-simple-code-editor ^0.14.1
  • react-syntax-highlighter ^15.6.1
  • web-vitals ^2.1.4
backend/requirements.txt pypi
  • Django ==5.1.3
  • Jinja2 ==3.1.3
  • MarkupSafe ==2.1.5
  • PyJWT ==2.9.0
  • PyYAML ==6.0.2
  • annotated-types ==0.7.0
  • anyio ==4.6.2.post1
  • asgiref ==3.8.1
  • certifi ==2024.8.30
  • charset-normalizer ==3.4.0
  • click ==8.1.7
  • colorama ==0.4.6
  • django-cors-headers ==4.6.0
  • djangorestframework ==3.15.2
  • djangorestframework-simplejwt ==5.3.1
  • fastapi ==0.115.4
  • filelock ==3.16.1
  • fsspec ==2024.10.0
  • h11 ==0.14.0
  • huggingface-hub ==0.26.2
  • idna ==3.10
  • mpmath ==1.3.0
  • networkx ==3.2.1
  • numpy ==1.23.5
  • packaging ==24.2
  • peft *
  • pillow ==10.2.0
  • pydantic ==2.9.2
  • pydantic_core ==2.23.4
  • python-dotenv ==1.0.1
  • pytz ==2024.2
  • regex ==2024.11.6
  • requests ==2.32.3
  • safetensors ==0.4.5
  • setuptools ==70.0.0
  • sniffio ==1.3.1
  • sqlparse ==0.5.2
  • starlette ==0.41.2
  • sympy ==1.13.1
  • tokenizers ==0.20.3
  • torch *
  • torchaudio *
  • torchvision *
  • tqdm ==4.67.0
  • transformers ==4.46.2
  • typing_extensions ==4.12.2
  • tzdata ==2024.2
  • urllib3 ==2.2.3
  • uvicorn ==0.32.0