Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: NazaninTafreshi
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 111 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme Citation

README.md

SAMM Copilot

Thesis PDF Presentation PPTX Explanation Video

This repository contains all artifacts related to the Master's thesis "SAMM Copilot," completed at Bosch Connected Industry. The project focuses on creating Aspect Models (based on Semantic Aspect Meta Model (SAMM)) using Large Language Models (LLMs). Instead of using natural language text as input, this work leverages structured data in JSON to create the semantic model.

Key Resources:


Dataset for Fine-tuning

The dataset used to fine-tune the primary OpenAI model described in the thesis is located in the dataset/original_cleaned_data/ directory. These files are provided in the required .jsonl format:

This data can be directly uploaded to the OpenAI platform for fine-tuning purposes.


Training a Model

You can replicate the model training process using the provided data or experiment with alternative methods:

1. OpenAI Fine-tuning (Replicating Thesis Model)

  1. Use the .jsonl files provided in the Dataset section.
  2. Upload these files to the OpenAI fine-tuning platform.
  3. The exact seed number and configuration used during the thesis work are detailed in the full thesis text. However, the default fine-tuning settings suggested by OpenAI are generally effective.

2. Alternative Fine-tuning using Unsloth/Qwen

If you wish to experiment with fine-tuning a different model architecture (e.g., Qwen 2.5 Coder) using the Unsloth library, a Jupyter notebook is provided:


Inference and Reproducing Results

You can perform inference using the fine-tuned model in the following ways:

1. Java Inference Code (Recommended for Reproducibility)

The Java code located in the inference/ directory is designed to evaluate the test set and reproduce the results reported in the thesis.

  • Functionality: This code automatically implements the iterative prompting strategy described in the thesis.
  • Dependencies: Project dependencies are managed using Apache Maven. Refer to the pom.xml file within the inference directory.
  • Augmented Data: The inference process utilizes augmented data (SAMM Aspect Model). This data is provided as zip files within the dataset/augmented_data/ directory. You will need to extract these files before running the inference code.
  • Input Format: Requires an example JSON input. Examples can be found within the human evaluation section or detailed descriptions in the thesis text.

2. GPT Interface (e.g., LibreChat, OpenAI Playground)

Alternatively, you can interact with your fine-tuned model using standard GPT interfaces. You will need to provide appropriate prompts, potentially including few-shot examples based on the task format described in the thesis.


Utilities


Additional Artifacts

  • Drawings: Diagrams and illustrations created for the thesis are available in the drawings/ folder. See Citation section for attribution.
  • Experimental Data Generation: Code related to earlier data augmentation and generation experiments (which were found to be less effective) can be found in the data-generator/ folder.

Citation

If you use the work or artifacts from this repository in your research or projects, please cite the Master's thesis:

bibtex TBD

Owner

  • Name: Nazanin Mashhaditafreshi
  • Login: NazaninTafreshi
  • Kind: user
  • Location: Germany

MSc in CS @ TU Kaiserslautern

GitHub Events

Total
  • Public event: 1
  • Push event: 3
Last Year
  • Public event: 1
  • Push event: 3