https://github.com/bytedance/salmonn

SALMONN family: A suite of advanced multi-modal LLMs

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 3 of 6 committers (50.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (10.6%) to scientific vocabulary

Keywords

audio audio-processing audio-visual-understanding bytedance iclr2024 icml-2024 large-language-models multi-modal music research speech speech-recognition tsinghua-university video video-understanding

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: bytedance
License: apache-2.0
Default Branch: main
Homepage: https://bytedance.github.io/SALMONN/
Size: 58.4 MB

Statistics

Stars: 1,304
Watchers: 27
Forks: 101
Open Issues: 20
Releases: 0

Created over 2 years ago · Last pushed 6 months ago

Metadata Files

Readme, License, Code of conduct

SALMONN family: A suite of advanced multi-modal LLMs

🚀🚀 Welcome to the repo of SALMONN!

The SALMONN model family consists of a series of advanced multi-modal large language models. For more details, please refer to the corresponding branches.

🔥 News

[2025-07-08] We have open-sourced video-SALMONN 2! video-SALMONN 2 is a powerful audio-visual LLM that generates high-quality audio-visual video captions and achieves competitive performance on general video QA benchmarks.
[2025-06-01] We have open-sourced the QualiSpeech dataset, a speech quality assessment dataset with natural language reasoning. You can use QualiSpeech to develop your own audio LLM for speech quality assessment or to evaluate the low-level speech perception capabilities of existing audio LLMs. Feel free to download it here!
[2025-03-03] We have released the data processing scripts and finetuned model checkpoints for SALMONN for speech quality assessment! See here!
[2024-09-04] We have released the model and inference code for video-SALMONN! See here!
[2024-05-28] 🧳 We have released all the annotations (including 600k SQA/AQA data and 50k audio-based storytelling data) for the 3-stage training of SALMONN! Feel free to download them here!
[2024-04-07] 🤖 We have released all the code you need to train your own SALMONN! Try some cool things!
[2024-01-16] 💖 Our paper was accepted by ICLR 2024!
[2023-11-13] 🎁 We have released a 7B version of SALMONN at tsinghua-ee/SALMONN-7B and built the 7B demo here!
[2023-10-08] ✨ We have released the model checkpoint and the inference code for SALMONN-13B!

📖 Paper List

```bibtex
@inproceedings{sun2025videosalmonno1,
  title={{video-SALMONN-o1}: Reasoning-enhanced Audio-visual Large Language Model},
  author={Guangzhi Sun and Yudong Yang and Jimin Zhuang and Changli Tang and Yixuan Li and Wei Li and Zejun MA and Chao Zhang},
  booktitle={ICML},
  year={2025}
}

@article{tang2025video,
  title={{video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models}},
  author={Changli Tang and Yixuan Li and Yudong Yang and Jimin Zhuang and Guangzhi Sun and Wei Li and Zejun Ma and Chao Zhang},
  journal={arXiv preprint arXiv:2506.15220},
  year={2025}
}

@inproceedings{wang2024enabling,
  title={Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation},
  author={Wang, Siyin and Yu, Wenyi and Yang, Yudong and Tang, Changli and Li, Yixuan and Zhuang, Jimin and Chen, Xianzhao and Tian, Xiaohai and Zhang, Jun and Sun, Guangzhi and others},
  booktitle={Proc. ICASSP},
  address={Hyderabad},
  year={2025}
}

@inproceedings{wang2025qualispeech,
  title={QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions},
  author={Wang, Siyin and Yu, Wenyi and Chen, Xianzhao and Tian, Xiaohai and Zhang, Jun and Sun, Guangzhi and others},
  booktitle={Proc. ACL},
  address={Vienna},
  year={2025}
}

@inproceedings{sun2024videosalmonn,
  title={video-{SALMONN}: Speech-Enhanced Audio-Visual Large Language Models},
  author={Guangzhi Sun and Wenyi Yu and Changli Tang and Xianzhao Chen and Tian Tan and Wei Li and Lu Lu and Zejun MA and Yuxuan Wang and Chao Zhang},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=nYsh5GFIqX}
}

@inproceedings{tang2024salmonn,
  title={SALMONN: Towards Generic Hearing Abilities for Large Language Models},
  author={Changli Tang and Wenyi Yu and Guangzhi Sun and Xianzhao Chen and Tian Tan and Wei Li and Lu Lu and Zejun MA and Chao Zhang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=14rn7HpKVk}
}
```

Owner

Name: Bytedance Inc.
Login: bytedance
Kind: organization
Location: Singapore

Website: https://opensource.bytedance.com
Twitter: ByteDanceOSS
Repositories: 255
Profile: https://github.com/bytedance

GitHub Events

Total

Create event: 5
Commit comment event: 1
Issues event: 52
Watch event: 264
Delete event: 3
Member event: 1
Issue comment event: 59
Push event: 23
Pull request review event: 1
Pull request event: 17
Fork event: 27

Last Year

Create event: 5
Commit comment event: 1
Issues event: 52
Watch event: 264
Delete event: 3
Member event: 1
Issue comment event: 59
Push event: 23
Pull request review event: 1
Pull request event: 17
Fork event: 27

Committers

Last synced: 9 months ago

All Time

Total Commits: 58
Total Committers: 6
Avg Commits per committer: 9.667
Development Distribution Score (DDS): 0.552

Past Year

Commits: 34
Committers: 4
Avg Commits per committer: 8.5
Development Distribution Score (DDS): 0.5

Top Committers

Name	Email	Commits
Changli Tang	8****6	26
Yu-Doit	5****t	14
Brian Sun	g**4@n**k	8
Brian Sun	g**4@c**k	5
chan-ming	6****g	4
tangchangli	t**i@b**m	1
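
The all-time summary figures (58 commits, 9.667 average commits per committer, DDS 0.552) can be cross-checked against this table. The sketch below is an illustration, not ecosyste.ms code. It assumes the Development Distribution Score is one minus the top committer's share of all commits, a definition this page does not state. It also treats each name/email pair as a separate committer, which matches the reported total of 6. The dictionary keys are only labels.

```python
# Commit counts per committer identity, copied from the Top Committers table.
# Assumption: "Brian Sun" appears twice (different emails) and is counted as two identities.
commits = {
    "Changli Tang": 26,
    "Yu-Doit": 14,
    "Brian Sun (g**4@n**k)": 8,
    "Brian Sun (g**4@c**k)": 5,
    "chan-ming": 4,
    "tangchangli": 1,
}


def dds(counts):
    """Assumed DDS definition: 1 - (top committer's commits / total commits)."""
    return 1 - max(counts) / sum(counts)


total = sum(commits.values())          # all-time commit count
avg = total / len(commits)             # average commits per committer identity
score = dds(list(commits.values()))    # 1 - 26/58

print(f"total={total} avg={avg:.3f} dds={score:.3f}")
```

Under these assumptions the script reproduces the reported all-time values. The past-year DDS of 0.5 over 34 commits would correspond to a top committer with 17 commits in that window, but the page does not give a per-committer breakdown for that period.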

Committer Domains (Top 20 + Academic)

bytedance.com: 1
cam.ac.uk: 1
nete.eng.cam.ac.uk: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 78
Total pull requests: 40
Average time to close issues: 23 days
Average time to close pull requests: 20 days
Total issue authors: 61
Total pull request authors: 19
Average comments per issue: 1.18
Average comments per pull request: 0.3
Merged pull requests: 19
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 36
Pull requests: 22
Average time to close issues: 8 days
Average time to close pull requests: about 8 hours
Issue authors: 30
Pull request authors: 7
Average comments per issue: 0.86
Average comments per pull request: 0.0
Merged pull requests: 8
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

cathyliucx (4)
yt605155624 (4)
SaraAlthubaiti (3)
David19970306 (3)
zhanghanweii (3)
peggyxpxu (2)
deniro21 (2)
mohitd404 (2)
Dinxin (2)
qixueweigitbub (2)
URRealHero (1)
JustinYuu (1)
tuanad121 (1)
andeyeluguo (1)
ridingmower (1)

Pull Request Authors

BriansIDP (7)
shubham-gupta-30 (4)
hawkoli1987 (4)
TCL606 (4)
apu52 (3)
teinhonglo (2)
mohitd404 (2)
chan-ming (2)
alienishi (2)
ayushrakesh (1)
cotitan (1)
HimanshuMahto (1)
eltociear (1)
Killer2OP (1)
denglelaibh (1)
