kanelectra

Transformer model based on the Kolmogorov–Arnold Network (KAN), an alternative to the Multi-Layer Perceptron (MLP)

https://github.com/klassikcat/kanelectra

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Transformer model based on the Kolmogorov–Arnold Network (KAN), an alternative to the Multi-Layer Perceptron (MLP)

Basic Info
  • Host: GitHub
  • Owner: Klassikcat
  • Language: Python
  • Default Branch: main
  • Size: 310 KB
Statistics
  • Stars: 29
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed 8 months ago
Metadata Files
Readme Citation

README.md

ElectraModel based on Kolmogorov–Arnold Network

Introduction

Recently, Kolmogorov–Arnold Networks (KANs) have been introduced as a replacement for fully connected (FC) layers. According to the authors of the paper, KANs offer significant speed and performance advantages over traditional FC layers.

They are currently being actively applied in vision models and were recently applied to Transformer decoder models as well, reportedly yielding substantial improvements in both performance and throughput.

Although KANs' performance has been verified in various areas, I have not yet found examples of their application to the Transformer encoder. This repository aims to verify whether KANs can be applied to Transformer encoder models and to evaluate their performance.

Model Download link

Once the model training is complete, a download link will be provided on Hugging Face.

About KAN Electra

KAN Electra Model

KAN Electra replaces the fully connected layers in a typical Transformer Encoder model with Kolmogorov–Arnold Networks (KANs). This modification aims to leverage the speed and performance benefits of KANs to enhance the efficiency and effectiveness of the Transformer Encoder.
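To make the replacement concrete, here is a minimal, framework-free sketch of a KAN-style layer. It is illustrative only and not the repository's implementation: each input-to-output edge carries its own learnable univariate function, parameterized here as a cubic polynomial rather than the B-splines used in the original KAN paper.

```python
import random

class KANLayer:
    """Minimal KAN-style layer sketch (hypothetical, not the repo's code).

    In an MLP, each edge carries a scalar weight; in a KAN, each edge
    carries a learnable univariate function. Here that function is a
    cubic polynomial instead of the paper's B-splines, for brevity.
    """

    def __init__(self, in_dim, out_dim, degree=3, seed=0):
        rng = random.Random(seed)
        # coeffs[o][i][k] = k-th polynomial coefficient of edge (i -> o)
        self.coeffs = [[[rng.uniform(-0.1, 0.1) for _ in range(degree + 1)]
                        for _ in range(in_dim)] for _ in range(out_dim)]

    def forward(self, x):
        # Each output is a sum of univariate functions of each input,
        # mirroring the Kolmogorov-Arnold representation structure.
        out = []
        for edge_fns in self.coeffs:
            total = 0.0
            for xi, cks in zip(x, edge_fns):
                total += sum(c * xi ** k for k, c in enumerate(cks))
            out.append(total)
        return out

layer = KANLayer(in_dim=4, out_dim=2)
y = layer.forward([0.5, -1.0, 0.25, 2.0])
```

In the actual model, layers like this would stand in for the FC projections of the encoder, with the polynomial coefficients trained by backpropagation.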

In the encoder model, attention is implemented as self-attention using scaled dot-product attention.

Self-attention allows the model to weigh the importance of different words in a sentence relative to each other, regardless of their position. This is particularly useful for capturing long-range dependencies in the input sequence.

Scaled dot-product attention works as follows:

1. Query, Key, and Value matrices: the input embeddings are projected into three matrices: Query (Q), Key (K), and Value (V).
2. Dot product: Q is multiplied by the transpose of K to obtain a score matrix indicating the relevance of each word in the sequence to every other word.
3. Scaling: the scores are divided by the square root of the Key dimension, keeping the dot products from growing so large that the softmax saturates and its gradients vanish during backpropagation.
4. Softmax: the scaled scores are passed through a softmax to obtain attention weights, which determine the importance of each word in the context of the current word.
5. Weighted sum: the attention weights are used to compute a weighted sum of V, producing the final output of the attention mechanism.

This mechanism allows the model to focus on relevant parts of the input sequence, enhancing its ability to understand and generate complex patterns in the data.
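The steps above can be sketched in plain Python. This is a single-head toy version with hypothetical values, not the repository's implementation:

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores (step 4).
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention on plain lists.

    Q, K: seq_len x d_k row vectors; V: seq_len x d_v row vectors.
    """
    d_k = len(K[0])
    # Steps 2-3: dot product Q @ K^T, scaled by sqrt(d_k).
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d_k)
               for kr in K] for qr in Q]
    # Step 4: softmax turns scores into attention weights.
    weights = [softmax(row) for row in scores]
    # Step 5: weighted sum of the value rows.
    out = [[sum(w * vr[j] for w, vr in zip(wr, V))
            for j in range(len(V[0]))] for wr in weights]
    return out, weights

# Toy 2-token example: each query matches exactly one key.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` sums to 1, and each query attends most strongly to its matching key, illustrating how the mechanism redistributes focus across the sequence.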

Hyperparameters

Vocab

I use Google's google/electra-small-discriminator vocabulary for the English model and monologg/koelectra-base-v3-discriminator for the Korean model when pretraining KANElectra.

Data

Requirements

```text
python 3.8 or higher
torch >= 2.2.0
torch-tensorRT
hydra
lightning
transformers
```

CLI

Train

ElectraKAN uses Hydra from Meta to simplify parameter configuration and make parameters easy to override from the CLI. You can train ElectraKAN with your own code by using Hydra's syntax. Below is an example of training an Electra language model using a generator and discriminator.
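For illustration, a Hydra setup typically keeps a YAML config tree that the training script loads; the file name and field names below are hypothetical, not the repository's actual config layout:

```yaml
# configs/train.yaml (hypothetical layout; actual config names may differ)
model:
  hidden_size: 256
  num_layers: 12
train:
  batch_size: 64
  lr: 5e-4
```

With Hydra, any field can then be overridden directly on the command line, e.g. `python scripts/pretraining/train.py train.batch_size=128 train.lr=1e-4`.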

Train using Helm Charts

```shell
# Run training with default configuration
helm install electra-training ./helm/electra-training

# Run training with specific image tag
helm install electra-training ./helm/electra-training --set image.tag=v1.0.0

# Run training with custom configuration
helm install electra-training ./helm/electra-training -f custom-values.yaml

# Uninstall training job
helm uninstall electra-training
```

The Helm chart uses the following resources:

  • CPU: 8 vCPU
  • GPU: 4 GPUs
  • Memory: 8Gi (request) / 16Gi (limit)
  • Node: Scheduled on nodes with label env=batch

Docker

```shell
git clone https://github.com/Klassikcat/KANElectra
docker buildx build -t kanelectra:latest -f nvidia.Dockerfile .
docker run <image-name>
```

On Local Machine

```shell
python scripts/pretraining/train.py
```

Test(TODO)

```shell
python scripts/pretraining/test.py
```

TensorRT Convert

```shell
trtexec \
    --onnx=${your_onnx_engine_path} \
    --saveEngine=${engine_save_path} \
    --minShapes=1x512,1x512,1x512 \
    --optShapes=${opt_batch_size}x512,${opt_batch_size}x512,${opt_batch_size}x512 \
    --maxShapes=${max_batch_size}x512,${max_batch_size}x512,${max_batch_size}x512
```

Install Package

```shell
# Install from source
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"
```

After installation, you can use ElectraKAN in your Python code:

```python
from ElectraKAN import ElectraModel
from ElectraKAN.datamodule import ElectraKANDataModule
```

Owner

  • Name: Shin Jung Tae
  • Login: Klassikcat
  • Kind: user
  • Location: South Korea
  • Company: dealertire

Faster Than Light, Lighter than feather.

Citation (CITATION.cff)

cff-version: 1.1.0
references:
  - type: misc
    authors:
      - family-names: "Park"
        given-names: "Jangwon"
    title: "KoELECTRA: Pretrained ELECTRA Model for Korean"
    year: 2020
    version: 1.0.0
    publisher: "GitHub"
    url: "https://github.com/monologg/KoELECTRA"
  - type: inproceedings
    authors:
      - family-names: "Clark"
        given-names: "Kevin"
      - family-names: "Luong"
        given-names: "Minh-Thang"
      - family-names: "Le"
        given-names: "Quoc V."
      - family-names: "Manning"
        given-names: "Christopher D."
    title: "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators"
    year: 2020
    booktitle: "ICLR"
    url: "https://openreview.net/pdf?id=r1xMH1BtvB"
  - type: article
    authors:
      - family-names: "Liu"
        given-names: "Ziming"
      - family-names: "Wang"
        given-names: "Yixuan"
      - family-names: "Vaidya"
        given-names: "Sachin"
      - family-names: "Ruehle"
        given-names: "Fabian"
      - family-names: "Halverson"
        given-names: "James"
      - family-names: "Soljačić"
        given-names: "Marin"
      - family-names: "Hou"
        given-names: "Thomas Y"
      - family-names: "Tegmark"
        given-names: "Max"
    title: "KAN: Kolmogorov-Arnold Networks"
    year: 2024
    journal: "arXiv preprint arXiv:2404.19756"
    url: "https://arxiv.org/abs/2404.19756"

GitHub Events

Total
  • Watch event: 2
  • Push event: 56
  • Pull request event: 2
  • Create event: 1
Last Year
  • Watch event: 2
  • Push event: 56
  • Pull request event: 2
  • Create event: 1

Dependencies

  • requirements.txt (pypi)