code-translator
Web Application for code-to-code generation
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Tob1n8tor
- Language: TeX
- Default Branch: main
- Size: 6.38 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Code Translation
Web application for translating code from one language (e.g. Java) to another language (e.g. Python).
Description
Project Overview
This project aims to fine-tune existing models using the Hugging Face library for code translation tasks. The goal is to evaluate different models and datasets, adjusting hyperparameters to find the best-performing combination. The trained models, data, and results are stored in the model_training directory, which also includes an Excel file for model comparison. Additionally, a detailed research paper explaining the methodology and findings is included.
Project structure
/kabul
|
├── /backend # Django backend code
| ├── /api # Django REST API and used model for translation
| ├── /backend # Django project settings
| ├── Dockerfile # Dockerfile for Django backend
| ├── manage.py # Django project entry point
| └── requirements.txt # Python dependencies
|
├── /frontend # React frontend code
| ├── /public # Public assets
| ├── /src # React components including App.js and App.css for the UI
| ├── Dockerfile # Dockerfile for React frontend
| ├── firebase.json # Firebase configuration for hosting and rewrites
| ├── package-lock.json # Auto-generated lock file for dependencies, ensures consistent installs
| └── package.json # Project metadata and dependencies for the React frontend
|
├── /model_training # Trained models, data, and results
| ├── /benchmarking # Model benchmarking results and code used for evaluation
| └── /training_codes # Python notebooks for model training
|
├── /report # Folder including final project paper
| ├── Kabul_Code_Translation_Report.pdf
| └── Kabul_Code_Translation_Report.tex
|
├── docker-compose.yml # Docker Compose configuration for local setup
└── README.md # This README file
Getting started
Prerequisites
- Docker Desktop installed on your machine.
- Installation guide: https://www.docker.com/products/docker-desktop/
- Docker Compose installed (comes with Docker Desktop).
Installation Steps
1. Clone the repository:
   git clone https://gitlab.lrz.de/bpc-ws-2425/kabul.git
   cd kabul
2. Build and start the containers using Docker Compose:
   docker-compose --profile dev up
   This command builds and starts both the React frontend and Django backend containers and configures networking between the two services.
3. Access the application:
   Frontend (React): open your browser and go to http://localhost:3000
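Once the containers are up, a quick scripted check can confirm that the frontend answers on the port given above. This is a minimal sketch using only the Python standard library; the helper name `is_reachable` and the timeout value are illustrative and not part of the project.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    # Port 3000 is the frontend address named in the installation steps.
    print("frontend reachable:", is_reachable("http://localhost:3000"))
```

If the check reports False right after `docker-compose --profile dev up`, the containers may simply still be building; retrying after a short wait is usually enough.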
Online Access
Our application is currently deployed online and can be accessed directly without requiring local setup. Visit the following link to explore the code translation functionality: https://code-translation.com
Model Fine-Tuning & Benchmarking
This project uses Hugging Face's transformers library to fine-tune pre-trained models for code translation tasks. The goal is to benchmark various models with different hyperparameters and datasets to identify the best-performing configuration.
Training
The model training code is included in the model_training/training_codes directory. The training process involves loading the pre-trained model, preparing and tokenizing the training data, and fine-tuning the model on the target dataset. The training code is written in Python and uses the Hugging Face transformers library.
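The three steps described above (load, tokenize, fine-tune) could look roughly like the following for a sequence-to-sequence model such as Salesforce/codet5-small. This is a sketch under stated assumptions, not the notebooks' actual code: the prompt format, the record field names (`source_lang`, `target_lang`, `source_code`, `target_code`), the output directory, and all hyperparameter values are hypothetical.

```python
from typing import Dict, List


def build_prompt(source_lang: str, target_lang: str, code: str) -> str:
    # Hypothetical prompt format; the project may frame its inputs differently.
    return f"translate {source_lang} to {target_lang}: {code}"


def tokenize_pairs(tokenizer, pairs: List[Dict[str, str]], max_length: int = 512):
    """Tokenize prompts as model inputs and target code as labels."""
    prompts = [
        build_prompt(p["source_lang"], p["target_lang"], p["source_code"])
        for p in pairs
    ]
    targets = [p["target_code"] for p in pairs]
    return tokenizer(
        prompts, text_target=targets, max_length=max_length, truncation=True
    )


def fine_tune(pairs: List[Dict[str, str]],
              model_name: str = "Salesforce/codet5-small",
              output_dir: str = "codet5-finetuned"):
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    # Step 1: load the pre-trained model and its tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Step 2: tokenize, then split the batch into per-example feature dicts.
    encoded = tokenize_pairs(tokenizer, pairs)
    dataset = [{k: v[i] for k, v in encoded.items()} for i in range(len(pairs))]

    # Step 3: fine-tune. All hyperparameter values here are placeholders.
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,
        per_device_train_batch_size=8,
        learning_rate=5e-5,
        report_to="none",
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    return trainer
```

Decoder-only or encoder-only base models (for example gpt2 or codebert-base) would need a different model class and data layout than this sequence-to-sequence sketch.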
Fine-Tuned Models
We fine-tuned the following base models for code translation tasks:
- Salesforce/codeT5-small (https://huggingface.co/Salesforce/codet5-small)
- Salesforce/codeT5-base (https://huggingface.co/Salesforce/codet5-base)
- openai-community/gpt2 (https://huggingface.co/openai-community/gpt2)
- microsoft/codebert-base (https://huggingface.co/microsoft/codebert-base)
- facebook/bart-base (https://huggingface.co/facebook/bart-base)
- alirezamsh/small100 (https://huggingface.co/alirezamsh/small100)
Datasets
For the model fine-tuning process, we use the following datasets:
- https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval
- https://huggingface.co/datasets/ziwenyd/transcoder-geeksforgeeks
- https://huggingface.co/datasets/CM/codexglue_codetrans
- https://leetcode.ca/all/problems.html
- Version 1: An unbalanced custom dataset created by combining the NTU-NLP-sg/xCodeEval and ziwenyd/transcoder-geeksforgeeks datasets
- Version 2: An unbalanced custom dataset created by combining the NTU-NLP-sg/xCodeEval, ziwenyd/transcoder-geeksforgeeks, and CM/codexglue_codetrans datasets, as well as some instances generated by ChatGPT
- Version 3: A balanced custom dataset created by web scraping LeetCode answers
- Version 4: A balanced custom dataset created by combining web-scraped LeetCode answers with the ziwenyd/transcoder-geeksforgeeks dataset
The custom datasets are stored in model_training/training_codes/datasets.
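The README does not define "balanced"; one plausible reading is that each translation direction contributes a similar number of examples. Under that assumption, the sketch below counts examples per direction and flags an imbalance. The field names `source_lang` and `target_lang` and the 10% tolerance are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def direction_counts(examples: Iterable[Dict[str, str]]) -> Counter:
    """Count examples per (source language, target language) direction."""
    return Counter((ex["source_lang"], ex["target_lang"]) for ex in examples)


def is_balanced(counts: Counter, tolerance: float = 0.1) -> bool:
    """True if the smallest direction has at least (1 - tolerance) of the largest."""
    if not counts:
        return True
    smallest, largest = min(counts.values()), max(counts.values())
    return smallest >= (1 - tolerance) * largest


examples = [
    {"source_lang": "Java", "target_lang": "Python"},
    {"source_lang": "Java", "target_lang": "Python"},
    {"source_lang": "Java", "target_lang": "Python"},
    {"source_lang": "Python", "target_lang": "Java"},
]
counts = direction_counts(examples)
print(counts[("Java", "Python")], counts[("Python", "Java")])
print("balanced:", is_balanced(counts))
```

A check like this is cheap to run before training, since a model fine-tuned on a skewed mix may favor the over-represented direction.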
Benchmarking
For benchmarking, we evaluate the performance of the fine-tuned models using the ROUGE score, TER score, BERTScore, and FrugalScore. The benchmarking results are stored as an Excel file inside the model_training/benchmarking/excel_result_files directory. We benchmarked the models on several test sets: one test set per translation task (e.g. Java to Python, Python to Java, etc.) as well as a combined test set that includes all translation tasks. For easier comparison, the results are also visualized using different charts in the Excel file. The benchmarking code is included in the model_training/benchmarking directory.
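To illustrate what a per-task and combined evaluation involves, here is a simplified unigram-overlap score (ROUGE-1 F1 over whitespace tokens), averaged per translation task and over all tasks. It is a teaching sketch, not the project's benchmarking code, and does not replace a full ROUGE implementation; the row fields `task`, `prediction`, and `reference` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between whitespace-tokenized strings."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def scores_by_task(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Average the score per translation task and over all rows ("combined")."""
    per_task = defaultdict(list)
    for row in rows:
        score = rouge1_f1(row["prediction"], row["reference"])
        per_task[row["task"]].append(score)
        per_task["combined"].append(score)
    return {task: sum(vals) / len(vals) for task, vals in per_task.items()}


rows = [
    {"task": "java-python", "prediction": "return a + b", "reference": "return a - b"},
    {"task": "python-java", "prediction": "return a + b ;", "reference": "return a + b ;"},
]
print(scores_by_task(rows))
```

Token-overlap metrics reward surface similarity rather than functional correctness, which is one reason to report them alongside embedding-based scores such as BERTScore.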
Research Paper
A detailed report explaining the methodology, model evaluation, and results is included in the /report directory. This report provides insights into the model selection, fine-tuning process and model evaluation.
Troubleshooting
- If you run into errors such as react-scripts not found, run npm install in the /frontend folder from a terminal first, then try the docker-compose command again.
Owner
- Name: Tobias Konieczny
- Login: Tob1n8tor
- Kind: user
- Repositories: 1
- Profile: https://github.com/Tob1n8tor
Computer Science Student @TUM | neuromorphic computing @neuroTUM
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Konieczny
    given-names: Tobias
  - family-names: Yang
    given-names: Anne
  - family-names: Elfhaiel
    given-names: Malek
  - family-names: Meng
    given-names: Eric Yiyang
title: "Code Translation Model Fine-Tuning and Web Application"
version: 1.0
date-released: 2025-01-28
GitHub Events
Total
- Push event: 2
- Create event: 2
Last Year
- Push event: 2
- Create event: 2
Dependencies
- python 3.10-slim build
- node 22.11.0 build
- 1361 dependencies
- @codemirror/lang-html ^6.4.9
- @codemirror/lang-javascript ^6.2.2
- @fortawesome/fontawesome-svg-core ^6.7.2
- @fortawesome/free-solid-svg-icons ^6.7.2
- @fortawesome/react-fontawesome ^0.2.2
- @testing-library/jest-dom ^5.17.0
- @testing-library/react ^13.4.0
- @testing-library/user-event ^13.5.0
- @uiw/react-codemirror ^4.23.6
- axios ^1.7.7
- codemirror ^6.0.1
- prismjs ^1.29.0
- react ^18.3.1
- react-dom ^18.3.1
- react-scripts 5.0.1
- react-select ^5.8.3
- react-simple-code-editor ^0.14.1
- react-syntax-highlighter ^15.6.1
- web-vitals ^2.1.4
- Django ==5.1.3
- Jinja2 ==3.1.3
- MarkupSafe ==2.1.5
- PyJWT ==2.9.0
- PyYAML ==6.0.2
- annotated-types ==0.7.0
- anyio ==4.6.2.post1
- asgiref ==3.8.1
- certifi ==2024.8.30
- charset-normalizer ==3.4.0
- click ==8.1.7
- colorama ==0.4.6
- django-cors-headers ==4.6.0
- djangorestframework ==3.15.2
- djangorestframework-simplejwt ==5.3.1
- fastapi ==0.115.4
- filelock ==3.16.1
- fsspec ==2024.10.0
- h11 ==0.14.0
- huggingface-hub ==0.26.2
- idna ==3.10
- mpmath ==1.3.0
- networkx ==3.2.1
- numpy ==1.23.5
- packaging ==24.2
- peft *
- pillow ==10.2.0
- pydantic ==2.9.2
- pydantic_core ==2.23.4
- python-dotenv ==1.0.1
- pytz ==2024.2
- regex ==2024.11.6
- requests ==2.32.3
- safetensors ==0.4.5
- setuptools ==70.0.0
- sniffio ==1.3.1
- sqlparse ==0.5.2
- starlette ==0.41.2
- sympy ==1.13.1
- tokenizers ==0.20.3
- torch *
- torchaudio *
- torchvision *
- tqdm ==4.67.0
- transformers ==4.46.2
- typing_extensions ==4.12.2
- tzdata ==2024.2
- urllib3 ==2.2.3
- uvicorn ==0.32.0