rage

RagE (RAG Engine) - A tool supporting the construction and training of components of the Retrieval-Augmented-Generation (RAG) model. It also facilitates the rapid development of Q&A systems and chatbots following the RAG model.

https://github.com/anti-aii/rage

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.8%) to scientific vocabulary

Keywords

chatbot-framework embedding llm nlp qa-system rag ranker retrieval-augmented-generation vietnamese-nlp
Last synced: 6 months ago

Repository

RagE (RAG Engine) - A tool supporting the construction and training of components of the Retrieval-Augmented-Generation (RAG) model. It also facilitates the rapid development of Q&A systems and chatbots following the RAG model.

Basic Info
  • Host: GitHub
  • Owner: anti-aii
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 5.67 MB
Statistics
  • Stars: 7
  • Watchers: 1
  • Forks: 3
  • Open Issues: 2
  • Releases: 0
Topics
chatbot-framework embedding llm nlp qa-system rag ranker retrieval-augmented-generation vietnamese-nlp
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

RAG Engine: Retrieval Augmented Generation Engine

RagE (RAG Engine) is a tool designed to facilitate the construction and training of components of the Retrieval-Augmented Generation (RAG) model. It also offers algorithms to support retrieval and provides pipelines for evaluating models. Moreover, it supports the rapid development of question-answering systems and chatbots based on the RAG model.

We have completed the basic training pipeline, but with limited resources there is still much to be done. We are continuously updating and developing this library, and we will regularly publish the models we train.

Installation 🔥

We provide detailed instructions for using our models for inference. See the notebook.

1. Initialize the model

Let's initialize the SentenceEmbedding model

```python
import torch
from pyvi import ViTokenizer

from rage import SentenceEmbedding

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SentenceEmbedding(model_name="vinai/phobert-base-v2", torch_dtype=torch.float32,
                          aggregation_hidden_states=False, strategy_pooling="dense_first")
model.to(device)
```

```
SentenceEmbeddingConfig(model_base: {'model_type_base': 'RobertaModel', 'model_name': 'vinai/phobert-base-v2', 'type_backbone': 'mlm', 'required_grad_base_model': True, 'aggregation_hidden_states': False, 'concat_embeddings': False, 'dropout': 0.1, 'quantization_config': None}, pooling: {'strategy_pooling': 'dense_first'})
```

Then, we can show the number of parameters in the model.

```python
model.summary_params()
```

```
trainable params: 135588864 || all params: 135588864 || trainable%: 100.0
```

```python
model.summary()
```

```
+---------------------------+-------------+------------------+
| Layer (type)              | Params      | Trainable params |
+---------------------------+-------------+------------------+
| model (RobertaModel)      | 134,998,272 | 134998272        |
| pooling (PoolingStrategy) | 590,592     | 590592           |
| drp1 (Dropout)            | 0           | 0                |
+---------------------------+-------------+------------------+
```

Now we can use the SentenceEmbedding model to encode the input sentences; the output is a matrix of shape (batch, dim). We can also load weights that we have previously trained and saved.

```python
model.load("best_sup_general_embedding_phobert_2.pt", key=False)

sentences = ["Tôi đang đi học", "Bạn tên là gì?"]
sentences = list(map(lambda x: ViTokenizer.tokenize(x), sentences))
model.encode(sentences, batch_size=1, normalize_embedding="l2", return_tensors="np", verbose=1)
```

```
2/2 [==============================] - 0s 43ms/Sample
array([[ 0.00281098, -0.00829096, -0.01582766, ...,  0.00878178,
         0.01830498, -0.00459659],
       [ 0.00249859, -0.03076724,  0.00033016, ...,  0.01299141,
        -0.00984358, -0.00703243]], dtype=float32)
```
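Note that the inputs are word-segmented with pyvi's `ViTokenizer` before encoding: PhoBERT-based checkpoints are trained on word-segmented Vietnamese text, so skipping this step degrades embedding quality.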

2. Load model from Huggingface Hub

First, download a pretrained model.

```python
model = SentenceEmbedding.from_pretrained('anti-ai/VieSemantic-base')
```

Then, we encode the input sentences and compare their similarity.

```python
sentences = ["Nó rất thú vị", "Nó không thú vị ."]
output = model.encode(sentences, batch_size=1, return_tensors='pt')
torch.cosine_similarity(output[0].view(1, -1), output[1].view(1, -1)).cpu().tolist()
```

```
2/2 [==============================] - 0s 40ms/Sample
[0.5605039596557617]
```

3. Training

We provide training examples for the SentenceEmbedding, ReRanker, and LLM models, and you can reuse the parameter settings we found to work well for specific tasks and datasets. See the examples, and the sketch below for a general picture of what such training involves.
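RagE's own training scripts live in the examples linked above. As an illustration only, here is a minimal contrastive-training sketch in plain PyTorch with Hugging Face transformers; it does not use RagE's API, and the sentence pairs, temperature, and learning rate are placeholder assumptions.

```python
# A minimal sketch, NOT RagE's API: contrastive training of a sentence
# encoder with plain PyTorch + Hugging Face transformers. The sentence
# pairs and all hyperparameters below are illustrative placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base-v2")
encoder = AutoModel.from_pretrained("vinai/phobert-base-v2").to(device)
optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-5)

# Placeholder batch of (anchor, positive) Vietnamese sentence pairs,
# already word-segmented as PhoBERT expects.
pairs = [("Tôi đang đi học", "Tôi đang đến trường"),
         ("Bạn tên là gì ?", "Tên của bạn là gì ?")]

def embed(sentences):
    """Mean-pool the last hidden state into one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt").to(device)
    hidden = encoder(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)          # (B, H)

anchors = embed([a for a, _ in pairs])
positives = embed([p for _, p in pairs])

# In-batch-negatives (InfoNCE) loss: each anchor should match its own
# positive against every other positive in the batch.
scores = F.cosine_similarity(anchors.unsqueeze(1), positives.unsqueeze(0), dim=-1) / 0.05
loss = F.cross_entropy(scores, torch.arange(len(pairs), device=device))

loss.backward()
optimizer.step()
optimizer.zero_grad()
```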

4. ONNX

It is now possible to export SentenceEmbedding and ReRanker models to the ONNX format. We also provide an API for easy usage.

We can export to the .onnx format with the export_onnx function. Note that it is only supported for the SentenceEmbedding and ReRanker classes.

```python
model.export_onnx('model.onnx', opset_version=17, test_performance=True)
```

```
**** DONE ****
2024-10-30 17:20:29.981403140 [W:onnxruntime:, inference_session.cc:2039 Initialize] Serializing optimized model with Graph Optimization level greater than ORT_ENABLE_EXTENDED and the NchwcTransformer enabled. The generated model may contain hardware specific optimizations, and should only be used in the same environment the model was optimized in.
******** Test Performance ********
2774/2756 [==============================] - 143s 52ms/step - time: 0.7818
Average inference time: 1.70 seconds
Total inference time: 2 minutes and 22.39 seconds
```

To use SentenceEmbedding or ReRanker models in the ONNX format, call the load_onnx method, which returns an object of the corresponding SentenceEmbeddingOnnx or ReRankerOnnx class.

```python
model_onnx = SentenceEmbedding.load_onnx('model.onnx')
```

```
2024-10-30 10:50:22.721487149 [W:onnxruntime:, inference_session.cc:2039 Initialize] Serializing optimized model with Graph Optimization level greater than ORT_ENABLE_EXTENDED and the NchwcTransformer enabled. The generated model may contain hardware specific optimizations, and should only be used in the same environment the model was optimized in.
```

```python
model_onnx.encode(['xin chào', 'bạn tên là gì ạ?'])
```

```
2/2 [==============================] - 0s 14ms/Sample
array([[[ 0.19600058,  0.0093571 , -0.20171645, ..., -0.12414521,
          0.1908756 , -0.02904402],
        [ 0.07333153,  0.07584963, -0.01428957, ..., -0.0851631 ,
          0.14394096, -0.28628293]]], dtype=float32)
```
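As a quick sanity check, assuming the PyTorch `model` and `model_onnx` from the snippets above are both in scope, we can compare their embeddings on the same sentences; the tolerance below is an arbitrary choice.

```python
# Sanity-check sketch: compare ONNX and PyTorch embeddings on the same
# sentences. Assumes `model` (PyTorch) and `model_onnx` are both loaded
# and that encode defaults mirror the calls shown above.
import numpy as np

sentences = ['xin chào', 'bạn tên là gì ạ?']
emb_pt = model.encode(sentences, batch_size=1, return_tensors='np')
emb_onnx = model_onnx.encode(sentences)

# Graph optimizations can introduce small numeric drift, so compare with
# a loose tolerance rather than exact equality.
print(np.allclose(np.squeeze(emb_pt), np.squeeze(emb_onnx), atol=1e-4))
```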

5. List of pretrained models

This list will be updated with our prominent models, which primarily target the Vietnamese language. You can also access our datasets and pretrained models at https://huggingface.co/anti-ai.

| Model Name | Model Type | #params | Checkpoint |
| --- | --- | --- | --- |
| anti-ai/ViEmbedding-base | SentenceEmbedding | 135.5M | model |
| anti-ai/BioViEmbedding-base-unsup | SentenceEmbedding | 135.5M | model |
| anti-ai/VieSemantic-base | SentenceEmbedding | 135.5M | model |
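As a usage sketch, any of the listed checkpoints can be loaded through `from_pretrained` as in section 2. Here is a small semantic-search example that ranks a corpus against a query by cosine similarity; the sentences, the choice of checkpoint, and the use of word segmentation are illustrative assumptions.

```python
# Semantic-search sketch using one of the listed checkpoints; the
# query/corpus sentences are illustrative. Word segmentation with pyvi
# follows the pattern from section 1 for PhoBERT-based models.
import numpy as np
from pyvi import ViTokenizer
from rage import SentenceEmbedding

model = SentenceEmbedding.from_pretrained('anti-ai/VieSemantic-base')

corpus = ["Hôm nay trời đẹp", "Tôi thích học máy", "Bạn tên là gì?"]
query = "Tên của bạn là gì?"

corpus_emb = model.encode([ViTokenizer.tokenize(s) for s in corpus],
                          normalize_embedding="l2", return_tensors="np")
query_emb = model.encode([ViTokenizer.tokenize(query)],
                         normalize_embedding="l2", return_tensors="np")

# With L2-normalized embeddings, the dot product equals cosine similarity.
scores = corpus_emb @ query_emb[0]
best = int(np.argmax(scores))
print(corpus[best], scores[best])
```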

Contacts

If you have any questions about this repo, please contact me at nduc0231@gmail.com.

Owner

  • Name: anti-ai
  • Login: anti-aii
  • Kind: organization
  • Email: cocotechteam@gmail.com
  • Location: Vietnam

Citation (CITATION.cff)



GitHub Events

Total
  • Watch event: 1
  • Push event: 18
  • Pull request event: 1
  • Fork event: 1
Last Year
  • Watch event: 1
  • Push event: 18
  • Pull request event: 1
  • Fork event: 1

Dependencies

setup.py pypi