Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (3.8%) to scientific vocabulary

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 83% confidence
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: thomasthebaud
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 3.4 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 5 months ago
Metadata Files
  • Readme
  • License

README.md

SpeechLLM

Repository adapted from https://github.com/skit-ai/SpeechLLM/tree/main

Currently a work in progress for connector analysis.

SpeechLLM is a multimodal large language model (LLM) trained to analyze a speaker's turn in a conversation and predict metadata about it. A speech encoder converts the speech signal into speech representations (embeddings); these embeddings, combined with text instructions, are then processed by the LLM to generate the predictions.
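
The README does not say how the speech embeddings are combined with the text instructions (the connector is exactly what this fork is analyzing). As a hedged illustration only, the sketch below assumes one common connector design: stack every `k` consecutive encoder frames, project them linearly to the LLM's embedding width, and splice the result between the embedded instruction tokens. The function names (`stack_frames`, `linear`, `build_llm_input`) and the toy weights are hypothetical; a real implementation would use PyTorch tensors and learned parameters.

```python
from typing import List

Matrix = List[List[float]]  # rows are frames/tokens, columns are features


def stack_frames(frames: Matrix, k: int) -> Matrix:
    """Concatenate every k consecutive encoder frames into one vector.

    This shortens the sequence by a factor of k; a trailing group with
    fewer than k frames is dropped.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    n_groups = len(frames) // k
    return [sum(frames[i * k:(i + 1) * k], []) for i in range(n_groups)]


def linear(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Affine map y = W x + b applied to each row; weight is (out_dim, in_dim)."""
    return [
        [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in x
    ]


def build_llm_input(prefix: Matrix, speech_frames: Matrix, suffix: Matrix,
                    k: int, weight: Matrix, bias: List[float]) -> Matrix:
    """Hypothetical connector: project the speech frames and splice them
    between the embedded instruction tokens (prefix + speech + suffix)."""
    speech = linear(stack_frames(speech_frames, k), weight, bias)
    width = len(bias)
    for row in prefix + suffix:
        if len(row) != width:
            raise ValueError("text embeddings must match the projected width")
    return prefix + speech + suffix


# Toy example: 6 encoder frames of width 2, stacked in pairs (width 4),
# projected to a 3-dimensional "LLM" embedding space.
frames = [[float(t), 1.0] for t in range(6)]
weight = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0]]
bias = [0.0, 0.0, 0.5]
prefix = [[0.1, 0.1, 0.1]]  # stand-in for embedded instruction tokens
suffix = [[0.2, 0.2, 0.2]]
seq = build_llm_input(prefix, frames, suffix, k=2, weight=weight, bias=bias)
print(len(seq))  # 5 = 1 prefix token + 3 speech positions + 1 suffix token
print(seq[1])    # [0.0, 1.0, 2.5]
```

Stacking frames before projecting is one way to shorten the audio sequence the LLM has to attend over; other connectors (pooling, convolutions, Q-Former-style query tokens) make different trade-offs.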

The model takes a 16 kHz speech audio file as input and predicts the following:
  1. SpeechActivity: whether the audio signal contains speech (True/False)
  2. Transcript: ASR transcript of the audio
  3. Gender of the speaker (Female/Male)
  4. Age of the speaker (number)
  5. Accent of the speaker (Africa/America/Celtic/Europe/Oceania/South-Asia/South-East-Asia)
  6. Emotion of the speaker (Happy/Sad/Anger/Neutral/Frustrated)
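
The exact text format the LLM emits is not given here. Assuming it produces a JSON object whose keys are the six field names listed above, a small parser like the following could turn the raw output into a typed record and reject labels outside the documented sets. The key names, the `SpeakerTurnPrediction` class and the `parse_prediction` helper are all hypothetical.

```python
import json
from dataclasses import dataclass

# Label sets as listed in the README; the JSON key names are assumed.
GENDERS = {"Female", "Male"}
ACCENTS = {"Africa", "America", "Celtic", "Europe", "Oceania",
           "South-Asia", "South-East-Asia"}
EMOTIONS = {"Happy", "Sad", "Anger", "Neutral", "Frustrated"}


@dataclass
class SpeakerTurnPrediction:
    speech_activity: bool
    transcript: str
    gender: str
    age: int
    accent: str
    emotion: str


def _as_bool(value) -> bool:
    """Accept a JSON boolean or the strings 'True'/'False' (any case)."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text not in ("true", "false"):
        raise ValueError(f"SpeechActivity must be True/False, got {value!r}")
    return text == "true"


def _check_label(field: str, value, allowed: set) -> str:
    """Raise if value is not one of the documented labels for this field."""
    if value not in allowed:
        raise ValueError(f"{field} {value!r} not in {sorted(allowed)}")
    return value


def parse_prediction(raw: str) -> SpeakerTurnPrediction:
    """Parse a hypothetical JSON output string into a validated record."""
    data = json.loads(raw)
    age = int(data["Age"])
    if age < 0:
        raise ValueError(f"Age must be non-negative, got {age}")
    return SpeakerTurnPrediction(
        speech_activity=_as_bool(data["SpeechActivity"]),
        transcript=str(data["Transcript"]),
        gender=_check_label("Gender", data["Gender"], GENDERS),
        age=age,
        accent=_check_label("Accent", data["Accent"], ACCENTS),
        emotion=_check_label("Emotion", data["Emotion"], EMOTIONS),
    )


raw_output = ('{"SpeechActivity": "True", "Transcript": "hello there", '
              '"Gender": "Female", "Age": "34", "Accent": "Europe", '
              '"Emotion": "Neutral"}')
pred = parse_prediction(raw_output)
print(pred.speech_activity, pred.age, pred.accent)  # True 34 Europe
```

Failing loudly on an unexpected label makes it easy to spot outputs where the LLM drifts from the instructed format instead of silently passing them downstream.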

Owner

  • Name: Thomas Thebaud
  • Login: thomasthebaud
  • Kind: user

GitHub Events

Total
  • Watch event: 1
  • Push event: 15
  • Pull request event: 1
  • Fork event: 2
  • Create event: 1
Last Year
  • Watch event: 1
  • Push event: 15
  • Pull request event: 1
  • Fork event: 2
  • Create event: 1

Dependencies

requirements.txt pypi
  • accelerate ==0.30.0
  • audio_recorder_streamlit ==0.0.8
  • datasets ==2.2.1
  • huggingface-hub ==0.23.0
  • jiwer ==3.0.3
  • librosa ==0.10.1
  • peft ==0.9.0
  • pytorch-lightning ==1.9.4
  • streamlit ==1.34.0
  • tokenizers ==0.19.1
  • torch ==2.0.1
  • torchaudio ==2.0.2
  • transformers ==4.41.2
  • wandb ==0.15.3
environment.yml pypi
  • cffi ==1.17.1
  • mutagen ==1.47.0
  • nvidia-cublas-cu11 ==11.11.3.6
  • nvidia-cuda-cupti-cu11 ==11.8.87
  • nvidia-cuda-nvrtc-cu11 ==11.8.89
  • nvidia-cuda-runtime-cu11 ==11.8.89
  • nvidia-cudnn-cu11 ==9.1.0.70
  • nvidia-cufft-cu11 ==10.9.0.58
  • nvidia-curand-cu11 ==10.3.0.86
  • nvidia-cusolver-cu11 ==11.4.1.48
  • nvidia-cusparse-cu11 ==11.7.5.86
  • nvidia-nccl-cu11 ==2.21.5
  • nvidia-nvtx-cu11 ==11.8.86
  • pillow ==11.0.0
  • pycparser ==2.22
  • pysoundfile ==0.9.0.post1
  • torch ==2.7.0
  • torchaudio ==2.7.0
  • torchvision ==0.22.0
  • triton ==3.3.0