https://github.com/aieng-lab/llm-voicechat-demo

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: aieng-lab
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 2.39 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

llm-voicechat-demo

Installation

This project requires Python 3.9.18.

  1. Clone this repository:
       git clone https://github.com/aieng-lab/llm-voicechat-demo.git

  2. Create a virtual environment:

    • Using conda:
        conda create --name voicebot python==3.9.18
        conda activate voicebot
    • Using Python venv:
        python3.9 -m venv voicebot
        source voicebot/bin/activate
    • Or you can build and run a Docker image:
        cd llm-voicechat-demo
        docker build . -t bot_backend -f DockerProject/Dockerfile
        docker run -p 5000:5000 --runtime=nvidia --gpus all bot_backend
      If you face problems with your GPU when using Docker, see the related question on Stack Overflow.

      When you run the last command, all backend components will be downloaded.

      The model download shows no progress output and takes a while. If you're worried that the program is stuck, watch your network traffic to confirm that it is still downloading.

  3. Install the required libraries (even if you're using Docker, you still need a few libraries to run the GPU):
       pip install -r requirements.txt

  4. Open two terminal windows in the llm-voicechat-demo/Code directory:

    • In the first terminal, run the backend (if you're using Docker, skip this because the backend is already running, and just run the GUI):
        python FlaskSocketIO_backend.py
    • In the second terminal, run the GUI:
        python FlaskSocketIO_GUI.py
    • To use a predefined setting, pass the path to a .json parameters file to both commands. Example:
        python FlaskSocketIO_backend.py modes/vi_de_no_clicks.json
        python FlaskSocketIO_GUI.py modes/vi_de_no_clicks.json

How to use:

After executing the commands in the last installation step, wait until all components are loaded. Even if the GUI appears, make sure that the Flask backend is running, because loading the models takes longer than loading the GUI.

In the middle of the screen, a text field informs you about the current state of the program:

  • "Ich schlafe ..." ("I am sleeping"): The program is initialized and ready to be started.
  • "Ich spreche ..." ("I am speaking"): The generated speech is being played and plotted.
  • "Ich höre zu ..." ("I am listening"): The program is ready to record a voice query.
  • "Ich überlege was ich antworte ..." ("I am thinking about what to answer"): The query is being processed.
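
The four status messages can be read as the states of a small state machine. The sketch below is one possible way to model them. The `State` enum, the event names, and the transition table are assumptions inferred from this README for illustration; they are not taken from the repository's code.

```python
from enum import Enum


class State(Enum):
    """GUI status messages, keyed by hypothetical state names."""
    SLEEPING = "Ich schlafe ..."
    SPEAKING = "Ich spreche ..."
    LISTENING = "Ich höre zu ..."
    THINKING = "Ich überlege was ich antworte ..."


# Hypothetical events and transitions, inferred from the README text.
TRANSITIONS = {
    (State.SLEEPING, "start_conversation"): State.SPEAKING,  # welcome message plays
    (State.SPEAKING, "playback_finished"): State.LISTENING,  # microphone is activated
    (State.LISTENING, "query_recorded"): State.THINKING,     # audio goes to the backend
    (State.THINKING, "first_audio_chunk"): State.SPEAKING,   # answer starts playing
}


def next_state(state: State, event: str) -> State:
    """Return the follow-up state, or stay in place for unknown events."""
    return TRANSITIONS.get((state, event), state)


state = next_state(State.SLEEPING, "start_conversation")
print(state.value)
```

Keeping the transitions in a single table makes it easy to see that recording is only possible in the listening state.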

Once everything is loaded, start the program by pressing the "Starte neues Gespräch" ("Start new conversation") button.

The program does not record until "Ich höre zu ..." is shown. That is when you can speak your query.

The "Beende Antwort" ("End answer") button stops the current answer but keeps the chat history, including the stopped answer. Use the "Starte neues Gespräch" button to start a new conversation with a cleared chat history.

Both the GUI and the backend must be closed from their terminals.

Architecture:

The project is a WebSocket server with a REST API, built with Python Flask-SocketIO and split into a frontend and a backend.

  • Frontend:
    • Holds the GUI, designed using PyQt5.
    • Holds the SocketIO AsyncClient to communicate with the backend.
    • Controls both microphone and speaker to record and play audio.
    • Uses QRunnable tasks to run work in threads.
  • Backend:
    • Holds the FlaskSocketIO AsyncServer to communicate with the frontend.
    • Holds Speech-to-Text model.
    • Holds Text-to-Text model.
    • Holds Text-to-Speech model.
    • Runs the server on the main thread.
    • When a request is received, a new thread is created to process it and send back the response.
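
The thread-per-request behaviour of the backend can be illustrated with the standard library alone. This is a minimal sketch, not the project's implementation: `process_request`, `handle_request`, the event names, and the `emit` callback are hypothetical stand-ins for the Flask-SocketIO handlers and models.

```python
import threading
from typing import Callable, List, Tuple

Emit = Callable[[str, bytes], None]


def process_request(audio: bytes, emit: Emit) -> None:
    """Placeholder for the real work (speech-to-text, LLM, text-to-speech)."""
    emit("audio_chunk", audio[::-1])  # stand-in for generated speech
    emit("generation_end", b"")       # tell the client that generation is done


def handle_request(audio: bytes, emit: Emit) -> threading.Thread:
    """Start a worker thread so the server thread can return immediately."""
    worker = threading.Thread(target=process_request, args=(audio, emit), daemon=True)
    worker.start()
    return worker


sent: List[Tuple[str, bytes]] = []
worker = handle_request(b"abc", lambda event, data: sent.append((event, data)))
worker.join()  # only for this demo; a server would not block here
print(sent)
```

Handing the work to a separate thread keeps the main thread free to accept further events, such as a request to stop the current answer.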

How it works:

When "Starte neues Gespräch" is clicked, the frontend plays a pre-recorded welcome message and connects to the backend through the SocketIO client. Then the microphone is initialized and records the user's query.

After the audio is recorded, it is sent to the Flask-SocketIO server (backend) as a GET request.

The backend receives the request on a Flask route and creates a new thread to process it as follows:

  1. Transcribes the audio data using a speech-to-text model (WhisperLargeV2).
  2. Streams text generated from the transcription using a large language model (FastChat/Vicuna).
  3. Generates speech for each chunk of generated text using a text-to-speech model (thorsten/vits from XTT2/Coqui).
  4. Sends each generated speech chunk back to the frontend over SocketIO as audio bytes.
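
The four steps above can be sketched as a streaming pipeline. Everything here is illustrative: the models are replaced by simple callables, the event names are made up, and splitting the text stream at sentence boundaries is an assumption, since the README does not say how chunks are formed.

```python
import re
from typing import Any, Callable, Iterable, Iterator, List, Tuple


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group a token stream into sentence-sized chunks (assumed chunking rule)."""
    buffer = ""
    for token in tokens:
        buffer += token
        if re.search(r"[.!?]\s*$", buffer):  # a sentence just ended
            yield buffer.strip()
            buffer = ""
    if buffer.strip():                       # flush any unfinished remainder
        yield buffer.strip()


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], Iterable[str]],
    synthesize: Callable[[str], bytes],
    emit: Callable[[str, Any], None],
) -> None:
    """Transcribe, stream text, synthesize each chunk, and emit the audio."""
    text = transcribe(audio)                       # step 1
    for chunk in sentence_chunks(generate(text)):  # step 2
        emit("audio_chunk", synthesize(chunk))     # steps 3 and 4
    emit("generation_end", None)


events: List[Tuple[str, Any]] = []
run_pipeline(
    b"\x00\x01",
    transcribe=lambda audio: "hallo",
    generate=lambda text: iter(["Hallo", "!", " Wie", " geht", " es", "?"]),
    synthesize=lambda chunk: chunk.encode("utf-8"),
    emit=lambda name, payload: events.append((name, payload)),
)
print(events)
```

Synthesizing per chunk instead of waiting for the full answer lets playback begin while the language model is still generating.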

The frontend reformats and normalizes those bytes so that they can be played and plotted synchronously.
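
As a rough illustration of what reformatting and normalizing audio bytes can involve, the sketch below converts raw bytes to floating-point samples and scales them to full range. It assumes 16-bit little-endian mono PCM, which the README does not specify, and the function names are hypothetical.

```python
import array
import sys
from typing import List


def pcm16_to_floats(data: bytes) -> List[float]:
    """Convert 16-bit little-endian PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")                     # signed 16-bit integers
    samples.frombytes(data[: len(data) // 2 * 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()                         # input is assumed little-endian
    return [s / 32768.0 for s in samples]


def peak_normalize(samples: List[float]) -> List[float]:
    """Scale samples so that the loudest one has magnitude 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    return samples if peak == 0.0 else [s / peak for s in samples]


raw = b"\x00\x00\x00\x20\x00\xc0"  # samples 0, 8192, -16384
floats = pcm16_to_floats(raw)
print(floats, peak_normalize(floats))
```

Floating-point samples in a fixed range are convenient both for audio output and for drawing a waveform with a stable vertical scale.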

At the end of generation, the backend emits a signal to inform the frontend.

When the frontend receives the end-of-generation signal, it reactivates the microphone to record the next request.

Owner

  • Name: aieng-lab
  • Login: aieng-lab
  • Kind: organization

GitHub organization of the Chair for AI Engineering of the University of Passau

GitHub Events

Total
  • Push event: 4
Last Year
  • Push event: 4