speechllm

https://github.com/thomasthebaud/speechllm

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (3.8%) to scientific vocabulary

Scientific Fields

Artificial Intelligence and Machine Learning (Computer Science): 83% confidence

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: thomasthebaud
License: apache-2.0
Language: Python
Default Branch: main
Size: 3.4 MB

Statistics

Stars: 2
Watchers: 2
Forks: 2
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed 11 months ago

Metadata Files

Readme License

SpeechLLM

Repository adapted from https://github.com/skit-ai/SpeechLLM/tree/main

This fork is currently being reworked for connector analysis.

SpeechLLM is a multi-modal large language model (LLM) trained to analyze a speaker's turn in a conversation and predict metadata about it. The model uses a speech encoder to convert the speech signal into speech embeddings. These embeddings are combined with text instructions and passed to the LLM, which generates the predictions.
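The data flow described above can be sketched at the shape level. This is a toy illustration in plain Python, not the repository's code: the dimensions, the mean-pooling step, and the names `linear`, `connector`, and `build_llm_input` are all assumptions, chosen only to show how speech frames might be projected into the LLM embedding space and concatenated with instruction embeddings.

```python
import random


def linear(x, weight):
    """Multiply one vector (length d_in) by a d_in x d_out weight matrix."""
    return [sum(xi * w_row[j] for xi, w_row in zip(x, weight))
            for j in range(len(weight[0]))]


def connector(frames, weight, stride=2):
    """Hypothetical connector: mean-pool every `stride` frames, then project."""
    pooled = []
    for start in range(0, len(frames) - stride + 1, stride):
        group = frames[start:start + stride]
        pooled.append([sum(col) / stride for col in zip(*group)])
    return [linear(vec, weight) for vec in pooled]


def build_llm_input(instruction_embeds, speech_embeds):
    """Concatenate instruction and speech embeddings along the sequence axis."""
    return instruction_embeds + speech_embeds


random.seed(0)
ENC_DIM, LLM_DIM = 4, 6  # toy sizes (assumed, far smaller than a real model)

# Stand-ins for speech encoder output and embedded instruction tokens.
speech_frames = [[random.random() for _ in range(ENC_DIM)] for _ in range(10)]
weight = [[random.random() for _ in range(LLM_DIM)] for _ in range(ENC_DIM)]
instruction_embeds = [[0.0] * LLM_DIM for _ in range(3)]

speech_embeds = connector(speech_frames, weight, stride=2)
llm_input = build_llm_input(instruction_embeds, speech_embeds)
print(len(speech_embeds), len(llm_input), len(llm_input[0]))  # 5 8 6
```

Ten encoder frames pooled with a stride of 2 give five speech embeddings, which join the three instruction embeddings to form a sequence of eight vectors in the LLM dimension.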

The model takes a 16 kHz speech audio file as input and predicts the following:

1. SpeechActivity: whether the audio signal contains speech (True/False)
2. Transcript: ASR transcript of the audio
3. Gender of the speaker (Female/Male)
4. Age of the speaker (number)
5. Accent of the speaker (Africa/America/Celtic/Europe/Oceania/South-Asia/South-East-Asia)
6. Emotion of the speaker (Happy/Sad/Anger/Neutral/Frustrated)

Owner

Name: Thomas Thebaud
Login: thomasthebaud
Kind: user

Repositories: 14
Profile: https://github.com/thomasthebaud

GitHub Events

Total

Watch event: 1
Push event: 15
Pull request event: 1
Fork event: 2
Create event: 1

Last Year

Watch event: 1
Push event: 15
Pull request event: 1
Fork event: 2
Create event: 1

Dependencies

requirements.txt pypi

accelerate ==0.30.0
audio_recorder_streamlit ==0.0.8
datasets ==2.2.1
huggingface-hub ==0.23.0
jiwer ==3.0.3
librosa ==0.10.1
peft ==0.9.0
pytorch-lightning ==1.9.4
streamlit ==1.34.0
tokenizers ==0.19.1
torch ==2.0.1
torchaudio ==2.0.2
transformers ==4.41.2
wandb ==0.15.3

environment.yml pypi

cffi ==1.17.1
mutagen ==1.47.0
nvidia-cublas-cu11 ==11.11.3.6
nvidia-cuda-cupti-cu11 ==11.8.87
nvidia-cuda-nvrtc-cu11 ==11.8.89
nvidia-cuda-runtime-cu11 ==11.8.89
nvidia-cudnn-cu11 ==9.1.0.70
nvidia-cufft-cu11 ==10.9.0.58
nvidia-curand-cu11 ==10.3.0.86
nvidia-cusolver-cu11 ==11.4.1.48
nvidia-cusparse-cu11 ==11.7.5.86
nvidia-nccl-cu11 ==2.21.5
nvidia-nvtx-cu11 ==11.8.86
pillow ==11.0.0
pycparser ==2.22
pysoundfile ==0.9.0.post1
torch ==2.7.0
torchaudio ==2.7.0
torchvision ==0.22.0
triton ==3.3.0
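
The two manifests above pin different versions of torch and torchaudio (2.0.1 and 2.0.2 in requirements.txt, 2.7.0 in environment.yml); the listing does not say which file the project actually installs from. A small sketch for spotting such mismatches follows. The helper names `parse_pins` and `pin_conflicts` are hypothetical, and only a subset of the pins is reproduced for brevity.

```python
def parse_pins(lines):
    """Parse 'name ==version' lines into a {name: version} dict."""
    pins = {}
    for line in lines:
        name, sep, version = line.partition("==")
        if sep:  # skip lines without an exact pin
            pins[name.strip().lower()] = version.strip()
    return pins


def pin_conflicts(a, b):
    """Return packages pinned in both dicts with different versions."""
    return {name: (a[name], b[name])
            for name in sorted(a.keys() & b.keys())
            if a[name] != b[name]}


requirements = parse_pins([
    "torch ==2.0.1", "torchaudio ==2.0.2", "transformers ==4.41.2",
])
environment = parse_pins([
    "torch ==2.7.0", "torchaudio ==2.7.0", "triton ==3.3.0",
])

print(pin_conflicts(requirements, environment))
# {'torch': ('2.0.1', '2.7.0'), 'torchaudio': ('2.0.2', '2.7.0')}
```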
