Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (3.8%) to scientific vocabulary
Scientific Fields
Repository
Basic Info
- Host: GitHub
- Owner: thomasthebaud
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 3.4 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
SpeechLLM
Repository adapted from https://github.com/skit-ai/SpeechLLM/tree/main
Work in progress: currently being adapted for connector analysis.
SpeechLLM is a multi-modal Large Language Model (LLM) trained to analyze and predict metadata from a speaker's turn in a conversation. It uses a speech encoder to transform speech signals into speech embeddings. These embeddings, combined with text instructions, are then processed by the LLM to generate predictions.
The model takes a 16 kHz speech audio file as input and predicts the following:
1. SpeechActivity: whether the audio signal contains speech (True/False)
2. Transcript: ASR transcript of the audio
3. Gender of the speaker (Female/Male)
4. Age of the speaker (number)
5. Accent of the speaker (Africa/America/Celtic/Europe/Oceania/South-Asia/South-East-Asia)
6. Emotion of the speaker (Happy/Sad/Anger/Neutral/Frustrated)
Owner
- Name: Thomas Thebaud
- Login: thomasthebaud
- Kind: user
- Repositories: 14
- Profile: https://github.com/thomasthebaud
GitHub Events
Total
- Watch event: 1
- Push event: 15
- Pull request event: 1
- Fork event: 2
- Create event: 1
Last Year
- Watch event: 1
- Push event: 15
- Pull request event: 1
- Fork event: 2
- Create event: 1
Dependencies
- accelerate ==0.30.0
- audio_recorder_streamlit ==0.0.8
- datasets ==2.2.1
- huggingface-hub ==0.23.0
- jiwer ==3.0.3
- librosa ==0.10.1
- peft ==0.9.0
- pytorch-lightning ==1.9.4
- streamlit ==1.34.0
- tokenizers ==0.19.1
- torch ==2.0.1
- torchaudio ==2.0.2
- transformers ==4.41.2
- wandb ==0.15.3
- cffi ==1.17.1
- mutagen ==1.47.0
- nvidia-cublas-cu11 ==11.11.3.6
- nvidia-cuda-cupti-cu11 ==11.8.87
- nvidia-cuda-nvrtc-cu11 ==11.8.89
- nvidia-cuda-runtime-cu11 ==11.8.89
- nvidia-cudnn-cu11 ==9.1.0.70
- nvidia-cufft-cu11 ==10.9.0.58
- nvidia-curand-cu11 ==10.3.0.86
- nvidia-cusolver-cu11 ==11.4.1.48
- nvidia-cusparse-cu11 ==11.7.5.86
- nvidia-nccl-cu11 ==2.21.5
- nvidia-nvtx-cu11 ==11.8.86
- pillow ==11.0.0
- pycparser ==2.22
- pysoundfile ==0.9.0.post1
- torch ==2.7.0
- torchaudio ==2.7.0
- torchvision ==0.22.0
- triton ==3.3.0